
Huawei OceanStor Dorado V3 All-Flash Storage Systems

Technical White Paper

Issue 1.6
Date 2019-05-31

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2019. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.


Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: http://www.huawei.com
Email: [email protected]


Contents

1 Executive Summary.................................................................................................................. 1
2 Overview ................................................................................................................................... 2
2.1 OceanStor Dorado V3 Family ................................................................................................................................ 2
2.2 Customer Benefits ................................................................................................................................................. 4

3 System Architecture ................................................................................................................. 6


3.1 Concepts ............................................................................................................................................................... 6
3.1.1 Controller Enclosure ........................................................................................................................................... 6
3.1.2 Controller ........................................................................................................................................................... 8
3.1.3 Disk Enclosure ................................................................................................................................................... 9
3.1.4 Disk Domain ...................................................................................................................................................... 9
3.1.5 Storage Pool ...................................................................................................................................................... 11
3.1.6 RAID ................................................................................................................................................................12
3.2 Hardware Architecture ..........................................................................................................................................16
3.2.1 Product Models .................................................................................................................................................17
3.2.2 Huawei-Developed SSDs ...................................................................................................................................18
3.2.2.1 Wear Leveling ................................................................................................................................................19
3.2.2.2 Bad Block Management ..................................................................................................................................19
3.2.2.3 Data Redundancy Protection ...........................................................................................................................19
3.2.2.4 Background Inspection ...................................................................................................................................20
3.2.2.5 Support for SAS and NVMe............................................................................................................................20
3.2.3 Huawei-Developed Chips ..................................................................................................................................21
3.2.4 Hardware Scalability .........................................................................................................................................22
3.2.5 Hardware Architecture Highlights ......................................................................................................................26
3.3 Software Architecture ...........................................................................................................................................26
3.3.1 FlashLink ..........................................................................................................................................................29
3.3.1.1 Hot and Cold Data Separation .........................................................................................................................29
3.3.1.2 End-to-End I/O Priority...................................................................................................................................30
3.3.1.3 ROW Full-Stripe Write ...................................................................................................................................30
3.3.1.4 Global Garbage Collection ..............................................................................................................................31
3.3.1.5 Global Wear Leveling and Anti-Wear Leveling ................................................................................................32
3.3.2 Read Cache .......................................................................................................................................................33
3.3.3 I/O Process ........................................................................................................................................................34


3.3.3.1 Write Process ..................................................................................................................................................34


3.3.3.2 Read Process ..................................................................................................................................................36
3.3.4 Value-added Features .........................................................................................................................................36
3.3.5 Software Architecture Highlights .......................................................................................................................37

4 Smart Series Features ............................................................................................................. 38


4.1 SmartDedupe (Inline Deduplication) .....................................................................................................................38
4.2 SmartCompression (Inline Compression) ..............................................................................................................39
4.3 SmartThin (Intelligent Thin Provisioning) .............................................................................................................41
4.4 SmartQoS (Intelligent Quality of Service Control) ................................................................................................41
4.5 SmartVirtualization (Heterogeneous Virtualization) ...............................................................................................43
4.6 SmartMigration (Intelligent Data Migration) .........................................................................................................44
4.7 SmartMulti-Tenant for File (Multi-tenancy) ..........................................................................................................46
4.8 SmartQuota for File (Quota) .................................................................................................................................48

5 Hyper Series Features ............................................................................................................ 50


5.1 HyperSnap (Snapshot) ..........................................................................................................................................50
5.1.1 HyperSnap for Block .........................................................................................................................................50
5.1.2 HyperSnap for File ............................................................................................................................................53
5.2 HyperCDP (Continuous Data Protection) ..............................................................................................................54
5.3 HyperCopy (Copy) ...............................................................................................................................................56
5.4 HyperClone (Clone) .............................................................................................................................................60
5.4.1 HyperClone for Block ........................................................................................................................................60
5.4.2 HyperClone for File ...........................................................................................................................................62
5.5 HyperReplication (Remote Replication) ................................................................................................................64
5.5.1 HyperReplication/S for Block (Synchronous Remote Replication) ......................................................................64
5.5.2 HyperReplication/A for Block (Asynchronous Remote Replication) ...................................................................67
5.5.3 HyperReplication/A for File (Asynchronous Remote Replication) ......................................................................68
5.6 HyperMetro (Active-Active Layout) .....................................................................................................................71
5.6.1 HyperMetro for Block .......................................................................................................................................71
5.6.2 HyperMetro for File...........................................................................................................................................73
5.7 3DC for Block (Geo-Redundancy) ........................................................................................................................76
5.8 HyperVault for File (All-in-One Backup) ..............................................................................................................77
5.9 HyperLock for File (WORM) ...............................................................................................................................78

6 Cloud Series Features ............................................................................................................ 81


6.1 CloudReplication (Cloud Replication)...................................................................................................................81
6.2 CloudBackup (Cloud Backup) ..............................................................................................................................82

7 System Security and Data Encryption ................................................................................. 86


7.1 Data Encryption ...................................................................................................................................................86
7.2 Role-based Access Control ...................................................................................................................................87

8 System Management and Compatibility ............................................................................ 88


8.1 System Management.............................................................................................................................................88


8.1.1 DeviceManager .................................................................................................................................................88


8.1.2 CLI ...................................................................................................................................................................88
8.1.3 Call Home Service .............................................................................................................................................88
8.1.4 RESTful API .....................................................................................................................................................89
8.1.5 SNMP ...............................................................................................................................................................89
8.1.6 SMI-S ...............................................................................................................................................................89
8.1.7 Tools .................................................................................................................................................................89
8.2 Ecosystem and Compatibility................................................................................................................................90
8.2.1 Virtual Volume (VVol) .......................................................................................................................................90
8.2.2 OpenStack Integration .......................................................................................................................................90
8.2.3 Virtual Machine Plug-ins ...................................................................................................................................90
8.2.4 Host Compatibility ............................................................................................................................................90

9 Best Practices........................................................................................................................... 91
10 Appendix ............................................................................................................................... 92
10.1 More Information ...............................................................................................................................................92
10.2 Feedback ............................................................................................................................................................92


1 Executive Summary

Huawei OceanStor Dorado V3 all-flash storage systems are designed for enterprises'
mission-critical services. They use FlashLink®, a set of technologies dedicated to flash media,
to deliver a stable latency of 0.5 ms. The gateway-free HyperMetro feature provides an
end-to-end active-active data center solution, which can evolve smoothly into a geo-redundant
disaster recovery (DR) solution with 99.9999% solution-level reliability. Inline deduplication
and compression maximize the available capacity and reduce the total cost of ownership (TCO).
OceanStor Dorado V3 meets the requirements of enterprise applications such as databases,
virtual desktop infrastructure (VDI), and virtual server infrastructure (VSI), helping the
financial, manufacturing, and carrier industries evolve smoothly to all-flash storage.
This document describes and highlights the unique advantages of OceanStor Dorado V3 in
terms of its product positioning, hardware and software architecture, and features.


2 Overview

2.1 OceanStor Dorado V3 Family


2.2 Customer Benefits

2.1 OceanStor Dorado V3 Family


OceanStor Dorado V3 includes Dorado3000 V3, Dorado5000 V3 (NVMe), Dorado5000 V3
(SAS), Dorado6000 V3 (NVMe), Dorado6000 V3 (SAS), Dorado18000 V3 (NVMe), and
Dorado18000 V3 (SAS).

Figure 2-1 OceanStor Dorado3000 V3

Figure 2-2 OceanStor Dorado5000 V3


Figure 2-3 OceanStor Dorado6000 V3

Figure 2-4 OceanStor Dorado18000 V3


Figure 2-5 OceanStor Dorado NAS

For detailed product specifications, visit
http://e.huawei.com/en/products/cloud-computing-dc/storage/unified-storage/dorado-v3.

2.2 Customer Benefits


OceanStor Dorado V3's software architecture is optimized for flash media. Features such as
HyperSnap, HyperClone, HyperReplication/S, HyperReplication/A, HyperMetro, 3DC,
SmartQoS, SmartMigration, SmartThin, HyperCopy, HyperCDP, CloudReplication, and
CloudBackup provide ultimate performance and rich data protection.
OceanStor Dorado NAS provides rich file system features on the basis of OceanStor Dorado
V3's fast, stable and economical hardware. These features include HyperSnap, HyperClone,
HyperReplication/A, HyperMetro, HyperLock, SmartMulti-Tenant, SmartQuota, and
SmartPartition.
Specifically, OceanStor Dorado V3 provides the following benefits:
 Outstanding performance
For banks, customs, and securities institutions, OceanStor Dorado V3 is able to provide
high throughput at a latency lower than 0.5 ms, greatly improving service processing
efficiency and reducing the time window required for batch service processing.
 Scalability
OceanStor Dorado V3 supports both scale-out and scale-up to flexibly meet customers'
requirements for performance and capacity.
− To enhance performance, customers can scale out the system by adding controllers.
The IOPS and bandwidth increase linearly with the number of controllers, while the
latency is unaffected.
− To improve capacity, customers can scale up the system by adding disk enclosures.
 Stability and reliability
Reliability design is implemented at the component, system, and solution levels.
− Huawei-developed SSDs (HSSDs) implement two levels of reliability solutions:
low-density parity check (LDPC) inside flash chips and RAID between flash chips,
providing chip-level failure protection.
− Flash-dedicated technologies such as the Smart Matrix multi-controller architecture,
innovative RAID 2.0+ and RAID-TP, and FlashLink® eliminate single points of
failure, tolerate simultaneous failure of three disks, and improve the longevity of
flash chips.


− The gateway-free active-active solution achieves zero recovery time objective (RTO)
and recovery point objective (RPO) in the case of a site failure, ensuring business
continuity.
 Convergence and efficiency
Inline global deduplication and compression allow OceanStor Dorado V3 to reduce
customer capital expenditure (CAPEX) by 75% while providing the same available
capacity as traditional storage systems. Remote replication between OceanStor Dorado
V3 and Huawei converged storage systems can form a DR network containing both
all-flash and traditional storage systems. Heterogeneous virtualization enables OceanStor
Dorado V3 to take over resources from third-party storage systems.
 Fast and cost-effective cloud DR
The CloudReplication and CloudBackup features back up production data to the cloud
without any external gateway, providing a fast, cost-effective, and maintenance-free
cloud DR center.


3 System Architecture

3.1 Concepts
3.2 Hardware Architecture
3.3 Software Architecture

3.1 Concepts
3.1.1 Controller Enclosure
The OceanStor Dorado V3 controller enclosure contains storage controllers that process all
storage service logic. It provides core functions such as host access, device management, and
data services. A controller enclosure consists of a system subrack, controllers, interface
modules, power modules, BBUs, and management modules. OceanStor Dorado V3 supports 2
U, 3 U, and 6 U controller enclosures. The 2 U enclosure has integrated disks, while the 3 U
and 6 U enclosures do not.


Figure 3-1 OceanStor Dorado V3 2 U controller enclosure

1 System subrack 2 Disk


3 Power-BBU module 4 Controller (including interface modules)

Figure 3-2 OceanStor Dorado V3 3 U controller enclosure

1 System subrack 2 BBU


3 Controller 4 Power module


5 Management module 6 Interface module

Figure 3-3 OceanStor Dorado V3 6 U controller enclosure

1 System subrack 2 Controller


3 BBU 4 Power module
5 Management module 6 Interface module

3.1.2 Controller
An OceanStor Dorado V3 controller is a computing module consisting of the CPU, memory,
and main board. It processes storage services, receives configuration and management
commands, saves configuration data, connects to disk enclosures, and stores critical data onto
coffer disks.
Coffer disks can be either built-in or external ones. They store system data and cache data in
the event of a power failure on the storage system. For the Dorado3000 V3 and Dorado5000
V3 series, the first four disks on the controller enclosure are the coffer disks; for the
Dorado6000 V3 series, the first four disks on the first disk enclosure are the coffer disks. For
details about the coffer disk specifications and partitioning, see the OceanStor Dorado3000


V3, Dorado5000 V3 and Dorado6000 V3 Product Documentation and OceanStor Dorado18000 V3 Product Documentation.
Each controller enclosure has two or four controllers. Every two controllers form a pair for
high availability. If a single controller fails, the other controller takes over the storage services
to guarantee service continuity. The front-end I/O modules on the controllers provide host
access ports. The port types include 8 Gbit/s, 16 Gbit/s, or 32 Gbit/s Fibre Channel, 100GE,
40GE, 25GE, and 10GE.

3.1.3 Disk Enclosure


A disk enclosure of OceanStor Dorado V3 houses 25 x 2.5-inch SSDs. It consists of a system
subrack, expansion modules, power modules, and disks. A SAS disk enclosure provides four
SAS 3.0 x 4 expansion ports for scale-up, and an NVMe disk enclosure provides two PCIe 3.0
x 8 expansion ports for scale-up.

Figure 3-4 Disk enclosure

1 System subrack 2 Disk


3 Power module 4 Expansion module

3.1.4 Disk Domain


A disk domain consists of multiple disks. RAID groups select member disks from a disk
domain. OceanStor Dorado V3 can have one or more disk domains and supports disk domains
across controller enclosures (two at most). A dual-controller system supports up to four disk
domains, and a four-controller system supports up to eight disk domains. Each disk domain
can have SSDs of two different capacities.


Figure 3-5 Disk domain across controller enclosures

Figure 3-5 shows a dual-controller system. You can create a disk domain that contains all
disks in the system or create a separate disk domain for each controller enclosure.
When creating a disk domain, you must specify the hot spare policy and encryption type.
You can choose a high or low hot spare policy, or no hot spare policy at all. The policy can be
changed online.
 When you use a high hot spare policy, the disk domain reserves a large amount of hot spare
space for data reconstruction in the event of a disk failure. The hot spare space increases
non-linearly with the number of disks.
 When you use a low hot spare policy, which is the default setting, the disk domain
reserves a small amount of hot spare space (enough for the data on at least one disk) for
data reconstruction in the event of a disk failure. The hot spare space increases
non-linearly with the number of disks.
 If you do not use a hot spare policy, the system does not reserve any hot spare space.

Table 3-1 Relationship between the hot spare space and the number of disks (200 or fewer)

Number of Disks    Hot Spare Space (High Policy)    Hot Spare Space (Low Policy)
8 to 12            Capacity of 1 disk               Capacity of 1 disk
13 to 25           Capacity of 2 disks              Capacity of 2 disks
26 to 50           Capacity of 3 disks              Capacity of 2 disks
51 to 75           Capacity of 4 disks              Capacity of 3 disks
76 to 125          Capacity of 5 disks              Capacity of 3 disks
126 to 175         Capacity of 6 disks              Capacity of 4 disks
176 to 200         Capacity of 7 disks              Capacity of 4 disks
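The mapping in Table 3-1 can be expressed as a simple lookup. The Python sketch below is purely illustrative; the function name and the grouping of the merged low-policy rows are assumptions, not Huawei code.

```python
# Illustrative lookup of Table 3-1 (assumed helper, not Huawei code).
# Returns the reserved hot spare space as a number of whole-disk capacities.

HIGH_POLICY = [(12, 1), (25, 2), (50, 3), (75, 4), (125, 5), (175, 6), (200, 7)]
LOW_POLICY = [(12, 1), (50, 2), (125, 3), (200, 4)]   # assumed row grouping

def hot_spare_disks(disk_count: int, policy: str = "low") -> int:
    """Hot spare space (in disk capacities) for 8 to 200 disks."""
    if not 8 <= disk_count <= 200:
        raise ValueError("Table 3-1 covers 8 to 200 disks")
    if policy == "none":
        return 0
    table = HIGH_POLICY if policy == "high" else LOW_POLICY
    for upper_bound, disks in table:
        if disk_count <= upper_bound:
            return disks

print(hot_spare_disks(30, "high"))   # 3
print(hot_spare_disks(30, "low"))    # 2
```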

You can create either a standard or an encrypted disk domain. The encryption type cannot be
changed after the disk domain is created.
 A standard disk domain consists of non-self-encrypting drives or self-encrypting drives
(SEDs) on which encryption is disabled.
 An encrypted disk domain consists of only SEDs. You must configure the key
management service when you use an encrypted disk domain.


Figure 3-6 Creating a disk domain

3.1.5 Storage Pool


Storage pools, which are containers of storage resources, are created in disk domains. The
storage resources used by application servers are all from storage pools. Each disk domain can
have only one storage pool.
You must specify the RAID level when creating a storage pool. By default, a storage pool has
all the available capacity of the selected disk domain.
By default, a storage pool uses RAID 6, which meets the reliability requirements in most
scenarios while providing high performance and capacity utilization. When the capacity of a
single disk is large (for example, 8 TB), reconstruction of a single disk will take a long time,
which reduces reliability. In this case, RAID-TP can be used for higher reliability.


Figure 3-7 Creating a storage pool

3.1.6 RAID
OceanStor Dorado V3 uses a Huawei-proprietary erasure coding (EC) algorithm to implement
RAID 5, RAID 6, RAID-TP, and RAID 10*. RAID-TP is able to tolerate three faulty disks,
providing high system reliability.
providing high system reliability.

If you require the specifications marked by *, contact Huawei sales personnel.

OceanStor Dorado V3 uses the RAID 2.0+ block-level virtualization technology to implement
RAID. With this technology:
 Multiple SSDs form a disk domain.
 Each SSD is divided into fixed-size chunks (typically 4 MB per chunk) to facilitate
logical space management.
 Chunks from different SSDs constitute a chunk group (CKG) based on the
customer-configured RAID level.
Chunk groups support three redundancy configurations:
 RAID 5 uses the EC-1 algorithm and generates one copy of parity data for each stripe.
 RAID 6 uses the EC-2 algorithm and generates two copies of parity data for each stripe.
 RAID-TP uses the EC-3 algorithm and generates three copies of parity data for each
stripe.
A chunk group is further divided into smaller-granularity grains (typically 8 KB), which are the
smallest unit for data writes. OceanStor Dorado V3 adopts full-stripe write to avoid the extra
overhead generated by traditional RAID mechanisms. Figure 3-8 shows RAID mapping on
OceanStor Dorado V3.


Figure 3-8 RAID mapping on OceanStor Dorado V3
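To make the chunk and chunk group relationship concrete, the following Python sketch models how fixed-size chunks from different SSDs could be grouped into a CKG. The helper names, the data structures, and the simplified sizes are illustrative assumptions, not Huawei's implementation.

```python
# Simplified model of RAID 2.0+ layout concepts (names and sizes are illustrative).

CHUNK_SIZE = 4 * 1024 * 1024    # typical chunk size: 4 MB
GRAIN_SIZE = 8 * 1024           # typical grain size: 8 KB

def chunks_per_ssd(ssd_capacity_bytes: int) -> int:
    """Each SSD is sliced into fixed-size chunks for logical space management."""
    return ssd_capacity_bytes // CHUNK_SIZE

def build_chunk_group(ssd_ids, raid_level="RAID 6"):
    """Take one chunk from each SSD; the EC level decides how many hold parity."""
    parity_count = {"RAID 5": 1, "RAID 6": 2, "RAID-TP": 3}[raid_level]
    members = [(ssd_id, 0) for ssd_id in ssd_ids]        # (ssd_id, chunk_index)
    return {
        "data": members[:-parity_count],
        "parity": members[-parity_count:],
        "grains_per_chunk": CHUNK_SIZE // GRAIN_SIZE,    # smallest write unit
    }

ckg = build_chunk_group(list(range(8)), "RAID 6")        # 6 data + 2 parity chunks
print(len(ckg["data"]), len(ckg["parity"]), ckg["grains_per_chunk"])   # 6 2 512
print(chunks_per_ssd(960 * 10**9))                       # ~228881 chunks on a 960 GB SSD
```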

OceanStor Dorado V3 uses EC to support more member disks in a RAID group, improving
space utilization.

Table 3-2 Space utilization of RAID groups using EC

RAID Level    Member Disks (EC)    Space Utilization (EC)    Member Disks (Traditional)    Space Utilization (Traditional)
RAID 5        22+1                 95.6%                     7+1                           87.5%
RAID 6        21+2                 91.3%                     14+2                          87.5%
RAID-TP       20+3                 86.9%                     Not supported                 N/A
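The utilization figures above follow directly from M / (M + N). A quick illustrative check (printed to two decimal places; the table truncates to one):

```python
# Space utilization of an M+N RAID group is M / (M + N).
def utilization(m: int, n: int) -> float:
    return m / (m + n)

for label, m, n in [("RAID 5, EC (22+1)", 22, 1),
                    ("RAID 6, EC (21+2)", 21, 2),
                    ("RAID-TP, EC (20+3)", 20, 3),
                    ("RAID 5, traditional (7+1)", 7, 1),
                    ("RAID 6, traditional (14+2)", 14, 2)]:
    print(f"{label}: {utilization(m, n):.2%}")
# RAID 5, EC (22+1): 95.65%   RAID 6, EC (21+2): 91.30%   RAID-TP, EC (20+3): 86.96%
# RAID 5, traditional (7+1): 87.50%   RAID 6, traditional (14+2): 87.50%
```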

If a disk is faulty or is removed for a long time, the chunks on this disk are reconstructed. The
detailed procedure is as follows:
1. The disk becomes faulty and the chunks on it become unavailable.
2. The RAID level degrades for the chunk groups that contain the affected chunks.
3. The system allocates idle chunks from the storage pool for data reconstruction.
4. Based on the RAID level of the storage pool, the system uses the normal data columns
and parity data to restore the damaged data blocks and writes them to the idle chunks.
Because the faulty chunks are distributed across multiple chunk groups, all of the affected chunk
groups start reconstruction at the same time. In addition, the new chunks are from multiple
disks. This enables all disks in the disk domain to participate in reconstruction, fully utilizing
the I/O capability of all disks to improve the data reconstruction speed and shorten data
recovery time.
OceanStor Dorado V3 uses both common and dynamic RAID reconstruction methods to
prevent RAID level downgrade and ensure system reliability in various scenarios.
 Common reconstruction
A RAID group has M+N members (M indicates data columns and N indicates parity
columns). When the system has faulty disks, common reconstruction is triggered if the


number of normal member disks in the disk domain is still greater than or equal to M+N.
During reconstruction, the system uses idle chunks to replace the faulty ones in the
chunk groups and restores data to the new chunks. The RAID level remains M+N.
In Figure 3-9, D0, D1, D2, P, and Q form a chunk group. If disk 2 fails, a new chunk
D2_new on disk 5 is used to replace D2 on disk 2. In this way, D0, D1, D2_new, P, and
Q form a new chunk group and the system restores the data of D2 to D2_new.
After common reconstruction is complete, the number of RAID member disks remains
unchanged, maintaining the original redundancy level.

Figure 3-9 Common reconstruction

 Dynamic reconstruction
If the number of member disks in the disk domain is fewer than M+N, the system
reduces the number of data columns (M) and retains the number of parity columns (N)
during reconstruction. This method retains the RAID level by reducing the number of
data columns, ensuring system reliability.
During the reconstruction, the data on the faulty chunk is migrated to a new chunk group.
If the system only has M+N-1 available disks, the RAID level for the new chunk group
is (M-1)+N. The remaining normal chunks (M-1) and parity columns P and Q form a
new chunk group and the system calculates new parity columns P' and Q'.
In Figure 3-10, there are six disks (4+2). If disk 2 fails, data D2 in CKG0 is written to
the new CKG1 as new data (D2') and the RAID level is 3+2. D0, D1, and D3 form a new
3+2 CKG0 with new parity columns P' and Q'.
After the reconstruction is complete, the number of member disks in the RAID group is
decreased, but the RAID redundancy level remains unchanged.


Figure 3-10 Dynamic reconstruction
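As a simplified illustration of the recovery step, the sketch below rebuilds one lost column of a stripe from the surviving columns using single XOR parity. It stands in for the product's EC-based parity (which can tolerate up to three simultaneous failures) and is not Huawei's actual algorithm; all names are assumptions.

```python
# Minimal single-parity reconstruction sketch; the product's EC can tolerate
# up to three simultaneous failures, but the principle is the same.

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def make_stripe(data_blocks):
    """Full-stripe write: the parity column is computed over all data columns."""
    return data_blocks + [xor_blocks(data_blocks)]

def rebuild(stripe, failed_index):
    """Common reconstruction: recompute the failed column from the survivors
    and place it on an idle chunk (here simply returned)."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

stripe = make_stripe([bytes([i]) * 8 for i in range(4)])   # 4 data columns + parity
assert rebuild(stripe, failed_index=2) == stripe[2]        # lost column restored
print("reconstruction OK")
```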

The number of RAID members is automatically adjusted by the system based on the number
of disks in a disk domain. Factors such as capacity utilization, reliability, and reconstruction
speed are considered. Table 3-3 describes the relationship between the disks in a disk domain
and RAID members.

Table 3-3 Number of disks and RAID members

Number of Disks in a Disk Domain (X)    Number of RAID Members    Hot Spare Space (High Policy)
8 to 12                                 X-1                       Capacity of 1 disk
13 to 25                                X-2                       Capacity of 2 disks
26 or 27                                X-3                       Capacity of 3 disks
> 27                                    25                        Capacity of 3 disks or more

The number of RAID members (M+N) complies with the following rules:
1. If the number of faulty disks in a disk domain is less than or equal to the number of disks
in the hot spare space, the system does not trigger dynamic reconstruction.
2. A high capacity utilization should be guaranteed.
3. M+N should not exceed 25.
When the number of disks is less than 13, the hot spare space equals the capacity of one disk
and M+N is X-1. This ensures the highest possible capacity utilization.
When a disk domain has 13 to 25 disks, the hot spare space equals the capacity of two disks
and M+N is X-2. This setting avoids dynamic reconstruction when multiple disks fail.
When a disk domain has 26 or 27 disks, the hot spare space equals the capacity of three
disks and M+N is X-3. Dynamic reconstruction will not be triggered if up to three disks fail
(at different times).
When the number of disks is greater than 27, the maximum value of M+N will be 25. This
ensures a high capacity utilization while limiting read amplification caused by reconstruction.
For example, if a disk in a 30+2 RAID group becomes faulty, the system must read the chunks


from 30 disks to reconstruct each chunk in the affected chunk groups, resulting in significant
read amplification. To avoid this, the system limits M+N to 25.
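The member-count rules above can be summarized in a short function. The sketch below is illustrative only; the function name and the minimum of eight disks per disk domain are assumptions based on Table 3-3.

```python
# Sketch of the member-count rules above (assumed helper, not Huawei code).

def raid_members_and_spare(disk_count: int):
    """Return (M + N, hot spare space in disk capacities) under the high policy."""
    if disk_count < 8:
        raise ValueError("Table 3-3 starts at 8 disks per disk domain")
    if disk_count <= 12:
        return disk_count - 1, 1
    if disk_count <= 25:
        return disk_count - 2, 2
    if disk_count <= 27:
        return disk_count - 3, 3
    return 25, 3            # M+N is capped at 25; spare space is at least 3 disks

print(raid_members_and_spare(15))   # (13, 2) -> e.g. 11+2 with RAID 6
print(raid_members_and_spare(40))   # (25, 3) -> e.g. 23+2 with RAID 6
```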
When new disks are added to the system to expand capacity, the value of M+N increases with
the number of disks. All new data (including data generated by garbage collection) will be
written using the new RAID level, while the RAID level for the existing data remains
unchanged. For example, a disk domain has 15 disks and uses RAID 6; M+N is 11+2. If the
customer expands the domain to 25 disks, new data will be written to the new 21+2 chunk
groups, while the existing data is still in the original 11+2 chunk groups. When garbage
collection starts, the system will move the valid chunks in the original 11+2 chunk groups to
the 21+2 chunk groups and then reclaim the original chunk groups.
OceanStor Dorado V3 has the following advantages in terms of data redundancy and
recovery:
 Fast reconstruction
All disks in the disk domain participate in reconstruction. Test results show that
OceanStor Dorado V3 takes only 30 minutes to reconstruct 1 TB of data (when there is
no new data written to the system), whereas traditional RAID takes more than 2 hours.
 Multiple RAID levels available
OceanStor Dorado V3 supports RAID 5, RAID 6, and RAID-TP. You can choose the
RAID level that meets your needs. RAID-TP allows three faulty disks and provides the
highest reliability for mission-critical services.
 Intelligent selection of RAID member disks
If a disk has a persistent fault, the system can intelligently reduce the number of member
disks in the RAID group and use dynamic reconstruction to write new data with the
original RAID level instead of a lower level, avoiding reduction in data reliability.
 Append-only mechanism to ensure data consistency
OceanStor Dorado V3 appends data in full-stripe writes. This avoids the data
inconsistency caused by write holes in traditional RAID.

3.2 Hardware Architecture


OceanStor Dorado V3 series uses the Smart Matrix multi-controller architecture. Controller
enclosures can be scaled out to achieve linear increase in performance and capacity. Every
two controllers on a controller enclosure form a pair for high availability. Cache mirroring
channels are established between the two controllers using onboard PCIe 3.0 links. Multiple
controller enclosures are interconnected by PCIe 3.0 switches for scale-out. Controller
enclosures connect to disk enclosures via SAS 3.0 links for scale-up.
The disks on a controller enclosure have two ports to connect to two controllers. Both SAS
and NVMe SSDs are supported.
The backup battery units (BBUs) supply power to the system in the event of an unexpected
power outage, which allows the system to write cache data to coffer disks to prevent data loss.
The Huawei-developed SmartIO interface module provides 8 Gbit/s, 16 Gbit/s, and 32 Gbit/s
Fibre Channel ports as well as 25GE and 10GE ports. The system also supports 40GE and
100GE interface modules.


Figure 3-11 Smart Matrix multi-controller architecture

3.2.1 Product Models


The OceanStor Dorado V3 series products include OceanStor Dorado3000 V3, OceanStor
Dorado5000 V3, OceanStor Dorado6000 V3, and OceanStor Dorado18000 V3.

Table 3-4 OceanStor Dorado V3 product models

Model             Controller Enclosure                       Number of Controllers per Enclosure    Disk Type
Dorado3000 V3     2 U enclosure with integrated disks        2                                      SAS
Dorado5000 V3     2 U enclosure with integrated disks        2                                      NVMe or SAS
Dorado6000 V3     3 U independent enclosure without disks    2                                      NVMe or SAS
Dorado18000 V3    6 U independent enclosure without disks    2 or 4                                 NVMe or SAS
Dorado NAS        2 U enclosure with integrated disks        2                                      N/A

The Dorado3000 V3 or Dorado5000 V3 controller enclosure has integrated disks to achieve
high-density performance and capacity. The controller enclosure is 2 U high and houses two
controllers that are interconnected by the midplane.
Dorado5000 V3 supports both NVMe and SAS SSDs. With NVMe SSDs, PCIe switching
chips connect to 25 x 2.5-inch dual-port NVMe SSDs. With SAS SSDs, SAS switching chips
connect to 25 x 2.5-inch dual-port SAS SSDs.


Figure 3-12 Device architecture of Dorado5000 V3 with NVMe SSDs

Figure 3-13 Device architecture of Dorado3000 V3 and Dorado5000 V3 with SAS SSDs

Dorado6000 V3 and Dorado18000 V3 use independent controller enclosures that do not have
disks, allowing flexible scale-out and scale-up. Dorado6000 V3 uses 3 U controller enclosures,
each of which houses two controllers; Dorado18000 V3 uses 6 U controller enclosures, each
of which houses two or four controllers. Controllers within an enclosure are interconnected by
PCIe 3.0 channels on the midplane, while controllers on different enclosures are
interconnected by PCIe 3.0 switches to scale out the system. The controller enclosures can
connect to disk enclosures via SAS 3.0 links to scale up the system capacity.

3.2.2 Huawei-Developed SSDs


OceanStor Dorado V3 uses Huawei-developed SSDs (HSSDs) to maximize system
performance. HSSDs work perfectly with storage software to provide an optimal experience
across various service scenarios.
An SSD consists of a control unit and a storage unit (mainly flash memory chips). The control
unit contains an SSD controller, host interface, and dynamic random access memory (DRAM)
module. The storage unit contains only NAND flash chips.
Blocks and pages are the basic units for reading and writing data in the NAND flash.
 A block is the smallest erasure unit and generally consists of multiple pages.


 A page is the smallest programming and read unit. Its size is usually 4 KB, 8 KB, or 16
KB.
Operations on NAND flash include erase, program, and read. The program and read
operations are implemented at the page level, while the erase operations are implemented at
the block level. Before writing a page, the system must erase the entire block where the page
resides. Therefore, the system must migrate the valid data in the block to a new storage space
before erasing it. This process is called garbage collection (GC). SSDs can only tolerate a
limited number of program/erase (P/E) cycles. If a block on an SSD experiences more P/E
cycles than others, it will wear out more quickly. To ensure reliability and performance,
HSSDs leverage the following advanced technologies.
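The following Python sketch illustrates the garbage-collection cost described above: valid pages must be migrated before a block can be erased, and every erase consumes a P/E cycle. The class and function names are illustrative assumptions, not HSSD firmware.

```python
# Garbage-collection cost in a nutshell: valid pages must be migrated before a
# block can be erased, and every erase consumes one P/E cycle.

class Block:
    def __init__(self, pages_per_block: int = 256):
        self.pages = [None] * pages_per_block    # None = free, else "valid"/"invalid"
        self.erase_count = 0

    def valid_page_indexes(self):
        return [i for i, p in enumerate(self.pages) if p == "valid"]

def garbage_collect(victim: Block, free_block: Block) -> int:
    """Migrate valid pages to a free block, then erase the whole victim block."""
    moved = 0
    for _ in victim.valid_page_indexes():
        free_block.pages[moved] = "valid"        # copy still-valid data
        moved += 1
    victim.pages = [None] * len(victim.pages)    # erase is block-granular
    victim.erase_count += 1
    return moved                                 # migrated pages = extra writes

victim, spare = Block(), Block()
victim.pages[:4] = ["valid", "invalid", "invalid", "valid"]
print(garbage_collect(victim, spare), victim.erase_count)   # 2 pages moved, 1 erase
```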

3.2.2.1 Wear Leveling


The SSD controller uses software algorithms to monitor and balance the P/E cycles on blocks
in the NAND flash. This prevents over-used blocks from failing and extends the service life of
the NAND flash.
HSSDs support both dynamic and static wear leveling. Dynamic wear leveling enables the
SSD to write data preferentially to less-worn blocks to balance P/E cycles. Static wear
leveling allows the SSD to periodically detect blocks with fewer P/E cycles and reclaim their
data, ensuring that blocks storing cold data can participate in wear leveling.
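A minimal sketch of the two wear-leveling strategies follows, assuming simple per-block P/E counters; the function names and the P/E-gap threshold are illustrative assumptions, not Huawei's algorithm.

```python
# Dynamic wear leveling steers new writes to the least-worn free block; static
# wear leveling relocates cold data from little-worn blocks so they can rejoin
# the write rotation.

def pick_block_for_write(free_blocks):
    """Dynamic wear leveling: choose the free block with the fewest P/E cycles."""
    return min(free_blocks, key=lambda blk: blk["pe_cycles"])

def pick_block_for_static_leveling(used_blocks, pe_gap: int = 100):
    """Static wear leveling: if wear is uneven, return the least-worn used block
    whose (cold) data should be reclaimed and rewritten elsewhere."""
    cycles = [blk["pe_cycles"] for blk in used_blocks]
    if max(cycles) - min(cycles) < pe_gap:
        return None
    return min(used_blocks, key=lambda blk: blk["pe_cycles"])

free = [{"id": 1, "pe_cycles": 120}, {"id": 2, "pe_cycles": 80}]
used = [{"id": 3, "pe_cycles": 10}, {"id": 4, "pe_cycles": 300}]
print(pick_block_for_write(free)["id"])             # 2
print(pick_block_for_static_leveling(used)["id"])   # 3
```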

3.2.2.2 Bad Block Management


Defective blocks may appear when the NAND flash is manufactured or used; these are labeled
as bad blocks. HSSDs identify bad blocks according to the P/E cycles, error type, and
error frequency of the NAND flash. If a bad block exists, the SSD recovers the data on the
bad block by using the Exclusive-OR (XOR) redundancy check data between the NAND flash
memories, and saves it to a new block. Within the lifecycle of an HSSD, about 1.5% of blocks
may become bad blocks. HSSDs have reserved space to replace these bad blocks, ensuring
sufficient available capacity and user data security.

3.2.2.3 Data Redundancy Protection


HSSDs use multiple redundancy check methods to protect user data from bit flipping,
manipulation, or loss. Error correction code (ECC) and cyclic redundancy check (CRC) are
used in the DRAM of the SSDs to prevent data changes or manipulation; low-density parity
check (LDPC) and CRC are used in the NAND flash to prevent data loss caused by NAND
flash errors; XOR redundancy is used between NAND flash memories to prevent data loss
caused by flash failures.


Figure 3-14 Data redundancy check

LDPC uses linear codes defined by the check matrix to check and correct errors. When data is
written to pages on the NAND flash, the system calculates the LDPC verification information
and writes it to the pages with the user data. When data is read from the pages, LDPC verifies
and corrects the data.
HSSDs house a built-in XOR engine to implement redundancy protection between flash chips.
If a flash chip becomes faulty (page failure, block failure, die failure, or full chip failure),
redundancy check data is used to recover the data on the faulty blocks, preventing data loss.
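As a simplified illustration of these layered checks, the sketch below pairs a per-page CRC with an XOR stripe across flash chips; LDPC is omitted, and the helper names are assumptions rather than HSSD firmware.

```python
# A per-page CRC detects silent corruption, and an XOR stripe across flash chips
# rebuilds the data of a failed chip; LDPC is omitted here for brevity.

import zlib

def protect_page(data: bytes):
    """Store a CRC32 alongside the page so corruption can be detected on read."""
    return data, zlib.crc32(data)

def verify_page(data: bytes, crc: int) -> bool:
    return zlib.crc32(data) == crc

def xor_rebuild(chip_pages, failed_index: int) -> bytes:
    """XOR the surviving chips' pages to regenerate the failed chip's page."""
    out = bytearray(len(chip_pages[0]))
    for i, page in enumerate(chip_pages):
        if i == failed_index:
            continue
        for j, b in enumerate(page):
            out[j] ^= b
    return bytes(out)

data_chips = [b"\x11" * 16, b"\x22" * 16, b"\x33" * 16]
parity_chip = xor_rebuild(data_chips + [bytes(16)], failed_index=3)   # XOR of data
chips = data_chips + [parity_chip]

page, crc = protect_page(chips[0])
assert verify_page(page, crc)                                # CRC check passes
assert xor_rebuild(chips, failed_index=1) == b"\x22" * 16    # chip 1 recovered
print("redundancy checks OK")
```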

3.2.2.4 Background Inspection


If data is stored in NAND flash for a long time, data errors may occur due to read interference,
write interference, or random failures. HSSDs periodically read data from the NAND flash,
check for bit changes, and write data with bit changes to new pages. This process detects and
handles risks in advance, which effectively prevents data loss and improves data security and
reliability.

3.2.2.5 Support for SAS and NVMe


Huawei HSSDs support both SAS and NVMe ports. NVMe is a lighter-weight protocol than
SAS. Its software stack does not have a SCSI layer, reducing the number of protocol
interactions. In addition, NVMe does not require a SAS controller or SAS expander on the
hardware transmission path. The NVMe SSD directly connects to the CPU via the PCIe bus to
achieve lower latency. NVMe also supports greater concurrency and queue depth (64K
queues, each with a depth of 64K), fully exploiting SSD performance. The NVMe
HSSDs provide dual ports and are hot swappable, improving system performance, reliability,
and maintainability.


Figure 3-15 Transmission paths of NVMe and SAS SSDs

NVMe SSDs reduce the number of protocol interactions in a write request from four (with the
SAS protocol) to two.

Figure 3-16 SAS and NVMe protocol interactions

3.2.3 Huawei-Developed Chips


OceanStor Dorado V3 uses Huawei-developed chips, including SSD controller chips,
front-end interface chips (SmartIO chips), and baseboard management controller (BMC)
chips.
 SSD controller chip
HSSDs use new-generation enterprise-class controllers, which provide SAS 3.0 x2 and
PCIe 3.0 x4 ports in compliance with industry standards. The controllers feature high
performance and low power consumption, and use enhanced ECC and built-in RAID
technologies to extend the SSD service life to meet enterprise-level reliability
requirements. In addition, this 28 nm chip supports the latest DDR4, 12 Gbit/s


SAS, and 8 Gbit/s PCIe rates as well as Flash Translation Layer (FTL) hardware
acceleration to provide stable performance at a low latency for enterprise applications.
 SmartIO chip
Hi182x (IOC) is the first Huawei-developed storage interface chip. It integrates multiple
interface protocols such as 8 Gbit/s, 16 Gbit/s, or 32 Gbit/s Fibre Channel, 100GE, 40GE,
25GE, and 10GE to achieve excellent performance, high interface density, and flexible
configuration.
 BMC chip
Hi1710 is a BMC chip dedicated to the X86 CPU platform. It consists of the A9 CPU,
8051 co-processor, sensor circuits, control circuits, and interface circuits. It supports the
Intelligent Platform Management Interface (IPMI), which monitors and controls the
hardware components of the storage system, including system power control, controller
monitoring, interface module monitoring, power supply and BBU management, and fan
monitoring.

3.2.4 Hardware Scalability


OceanStor Dorado V3 supports both scale-up and scale-out.

Figure 3-17 Scale-out and scale-up

Scale-up
The controller and disk enclosures of OceanStor Dorado V3 are directly connected by
redundant SAS 3.0 links. For Dorado6000 V3 and Dorado18000 V3, disk enclosures use
dual-uplink networking; for Dorado5000 V3 (SAS), disk enclosures use single-uplink
networking.
In dual-uplink networking, both ports on each expansion module of a disk enclosure are used
as uplink ports to connect to a controller enclosure. That is, each disk enclosure is connected
to a controller enclosure using four ports. Dual-uplink networking can improve back-end
bandwidth and reduce latency, eliminating bottlenecks caused by links.


Figure 3-18 Dual-uplink networking

In single-uplink networking, one port on each expansion module of a disk enclosure is used as
the uplink port to connect to a controller enclosure. That is, each disk enclosure is connected
to a controller enclosure using two ports.
NVMe disk enclosures use 8 x 8 Gbit/s PCIe 3.0 expansion cables, which provide greater
transmission capability than SAS cables. Therefore, single-uplink networking using PCIe
cables is able to meet performance requirements.


Figure 3-19 Single-uplink networking for NVMe disk enclosures

For Dorado3000 V3 and Dorado5000 V3 (SAS), the 25 SSDs on the controller enclosure use
dual-uplink networking, while external disk enclosures use single-uplink networking to
connect to the controller enclosure.

Figure 3-20 Single-uplink networking for Dorado3000 V3 and Dorado5000 V3 (SAS)


It is recommended that you use disks of the same capacity when deploying the storage system
for the first time. You can later scale up by adding disks of the same or greater capacity as the
existing ones, reducing TCO.

Scale-out
The two or four controllers on an OceanStor Dorado V3 controller enclosure are
interconnected by the mirroring channels on the midplane, and controllers on different
controller enclosures are interconnected using PCIe 3.0 switches. Each controller has a 2-port
PCIe interface module that connects to two PCIe switches for redundancy. Faults on any
switch, controller, interface module, or link will not interrupt services.
The following figures show details of the network connections.

Figure 3-21 Scale-out data network connections

The scale-out management network is connected in a daisy-chain layout and manages both
the controllers and the PCIe switches. This saves ports on the management switches.


Figure 3-22 Scale-out management network connections

3.2.5 Hardware Architecture Highlights


 Outstanding performance
The hardware features end-to-end high-speed architecture, PCIe 3.0 buses, SAS 3.0 or
PCIe 3.0 x 4 disk ports, and 8 Gbit/s, 16 Gbit/s, or 32 Gbit/s Fibre Channel, 100GE,
40GE, 25GE, or 10GE front-end ports. Huawei-developed NVMe SSDs contribute to
high system performance at a low latency.
 Stable and reliable
Tens of thousands of these systems running on live networks have demonstrated the maturity
of the hardware and its fully redundant architecture. Stable and reliable PCIe hot swap
technology allows online maintenance and replacement of NVMe SSDs.
 Efficient
OceanStor Dorado V3 supports both scale-out and scale-up, and its controllers and disks
can be expanded online. Its I/O modules use a modular design and are hot swappable.
Both its front-end and back-end ports can be configured on demand.

3.3 Software Architecture


OceanStor Dorado V3 uses a version of the OceanStor OS that has been designed specifically
for SSDs and employs FlashLink® and comprehensive value-added features to provide
excellent performance, robust reliability, and high efficiency.


Figure 3-23 Software architecture of OceanStor Dorado V3

The software architecture of the storage controller mainly consists of the cluster &
management plane and service plane.
 The cluster & management plane provides a basic environment to run the system,
controls multi-controller scale-out, and manages alarms, performance, and user
operations.
 The service plane schedules storage service I/Os, permits data scale-out, and implements
controller software-related functions provided by FlashLink®, such as deduplication and
compression, redirect-on-write (ROW) full-stripe write, hot and cold data separation,
garbage collection, global wear leveling, and anti-wear leveling.


Figure 3-24 Dorado V3 + Dorado NAS logical framework

The Dorado NAS unit provides end-to-end file system services over the LUNs provided by
Dorado V3, featuring high reliability and performance.

Figure 3-25 Data deduplication and compression for Dorado NAS

The Dorado NAS unit leverages the inline deduplication and compression capability of
Dorado V3 to provide high data reduction ratio at a low latency for NAS services.


3.3.1 FlashLink
FlashLink® associates storage controllers with SSDs by using a series of technologies for
flash media, ensuring both reliability and performance of flash storage. The key technologies
of FlashLink® include hot and cold data separation, end-to-end I/O priority, ROW full stripe
write, global garbage collection, global wear leveling, and anti-wear leveling. These
techniques resolve problems such as performance jitter caused by write amplification and
garbage collection, and ensure a steady low latency and high IOPS of OceanStor Dorado V3.

3.3.1.1 Hot and Cold Data Separation


During garbage collection, an SSD must migrate the valid data in the blocks that are to be
reclaimed to a new storage space, and then erase the entire blocks to release their space. If all
the data in a block is invalid, the SSD can directly erase the whole block without migrating
data.
Data in the storage system is classified into hot and cold data by change frequency. For
example, metadata (hot) is updated more frequently and is more likely to cause garbage than
user data (cold). FlashLink® adds labels to data with different change frequencies (user data
and metadata) in the controller software, sends the data to SSDs, and writes the data to
dedicated blocks to separate hot and cold data. In this way, there is a high probability that all
data in a block is invalid, reducing the amount of data migration for garbage collection, and
improving SSD performance and reliability.

Figure 3-26 Hot and cold data separation (1)

In Figure 3-27, the red and gray blocks represent metadata and user data, respectively.
If metadata and user data are stored in the same blocks, the blocks may still contain a large
amount of valid user data after all the metadata becomes garbage, because metadata changes
more frequently than user data. When the system erases these blocks, it must migrate the valid
user data to new blocks, reducing garbage collection efficiency and system performance.
If metadata and user data are stored in different blocks, the system only needs to migrate a
small amount of data before erasing the metadata blocks. This significantly improves the
garbage collection efficiency.
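
The following is a minimal Python sketch of the idea described above: the controller tags each write with a stream label (metadata or user data), and SSD blocks are filled from per-stream pools so that data with similar change frequencies lands in the same blocks. The names (StreamAllocator, BLOCK_PAGES) and sizes are illustrative assumptions, not part of the actual FlashLink® implementation.

```python
# Illustrative sketch only: separating "hot" metadata and "cold" user data
# into different SSD blocks so that whole blocks tend to invalidate together.
BLOCK_PAGES = 4  # pages per SSD block (tiny value for the example)

class StreamAllocator:
    def __init__(self):
        # one open block per stream; a real system tracks many more streams
        self.open_blocks = {"metadata": [], "user": []}
        self.sealed_blocks = []

    def write(self, stream, page):
        block = self.open_blocks[stream]
        block.append(page)
        if len(block) == BLOCK_PAGES:          # block full: seal it
            self.sealed_blocks.append((stream, block[:]))
            self.open_blocks[stream] = []

alloc = StreamAllocator()
for i in range(8):
    alloc.write("metadata", f"meta-{i}")       # frequently updated
for i in range(4):
    alloc.write("user", f"user-{i}")           # rarely updated
# Sealed blocks contain only one kind of data, so when all metadata pages
# are invalidated, their blocks can be erased without migrating user data.
print(alloc.sealed_blocks)
```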


Figure 3-27 Hot and cold data separation (2)

3.3.1.2 End-to-End I/O Priority


To ensure stable latency for specific types of I/Os, OceanStor Dorado V3 controllers label
each I/O with a priority according to its type. This allows the system to schedule CPU and
other resources and queue I/Os by priority, offering an end-to-end I/O-priority-based latency
guarantee. Specifically, upon reception of multiple I/Os, SSDs check their priorities and
process higher-priority I/Os first.
OceanStor Dorado V3 classifies I/Os into five types and assigns their priorities in descending
order: read/write I/Os, advanced feature I/Os, reconstruction I/Os, cache flush I/Os, and
garbage collection I/Os. Control based on I/O priorities allows OceanStor Dorado V3 to
achieve optimal internal and external I/O response.

Figure 3-28 End-to-end I/O priority

On the left side in the preceding figure, various I/Os have the same priority and contend for
resources. After I/O priority adjustment, system resources are allocated by I/O priority.
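
As a rough illustration of priority-based scheduling (not Huawei's actual scheduler), the sketch below uses a priority queue in which host read/write I/Os are dispatched before reconstruction, cache flush, and garbage-collection I/Os. The numeric priority values are assumptions; only the ordering of the five I/O classes comes from the text above.

```python
import heapq

# Assumed numeric priorities (lower value = served first); the five classes
# follow the descending order given in the text above.
PRIORITY = {"read_write": 0, "advanced_feature": 1, "reconstruction": 2,
            "cache_flush": 3, "garbage_collection": 4}

queue = []
seq = 0  # tie-breaker to keep FIFO order within the same priority

def submit(io_type, payload):
    global seq
    heapq.heappush(queue, (PRIORITY[io_type], seq, io_type, payload))
    seq += 1

submit("garbage_collection", "gc batch 1")
submit("read_write", "host read LBA 0x100")
submit("cache_flush", "flush page 42")
submit("read_write", "host write LBA 0x200")

while queue:
    _, _, io_type, payload = heapq.heappop(queue)
    print(f"dispatch {io_type}: {payload}")
# Host read/write I/Os are dispatched first, garbage collection last.
```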

3.3.1.3 ROW Full-Stripe Write


OceanStor Dorado V3 uses ROW full-stripe write, which writes all new data to new blocks
instead of overwriting existing blocks. This greatly reduces the overhead on controller CPUs
and read/write loads on SSDs in a write process, improving system performance in various
RAID levels.


Figure 3-29 ROW full-stripe write

In Figure 3-29, the system uses RAID 6 (4+2) and writes new data blocks 1, 2, 3, and 4 to
modify existing data.
In traditional overwrite mode, the system must modify every chunk group where these blocks
reside. For example, when writing data block 3 to CKG2, the system must first read the
original data block d and the parity data P and Q. Then it calculates the new parity data P' and Q',
and writes P', Q', and data block 3 to CKG2. In ROW full-stripe write, the system uses the
data blocks 1, 2, 3, and 4 to calculate P and Q and writes them to a new chunk group. Then it
modifies the logical block addressing (LBA) pointer to point to the new chunk group. This
process does not need to read any existing data.
Typically, RAID 5 uses 22D+1P, RAID 6 uses 21D+2P, and RAID-TP uses 20D+3P, where D
indicates data columns and P indicates parity columns. Table 3-5 compares write
amplification on OceanStor Dorado V3 using these RAID levels.

Table 3-5 Amplification in ROW-based full-stripe write

RAID Level           Write Amplification of    Read Amplification of    Write Amplification of
                     Random Small I/Os         Random Small I/Os        Sequential I/Os
RAID 5 (22D+1P)      1.05 (23/22)              0                        1.05
RAID 6 (21D+2P)      1.10 (23/21)              0                        1.10
RAID-TP (20D+3P)     1.15 (23/20)              0                        1.15

The performance differences between RAID 5 and RAID 6, and between RAID 6 and
RAID-TP are only about 5%.
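
The write-amplification figures in Table 3-5 follow directly from the stripe geometry: a full stripe writes D data columns plus the parity columns, so the amplification is (D + parity)/D. A quick check in Python:

```python
# Reproduce the write-amplification values in Table 3-5: (D + parity) / D.
layouts = {"RAID 5": (22, 1), "RAID 6": (21, 2), "RAID-TP": (20, 3)}
for name, (d, p) in layouts.items():
    wa = (d + p) / d
    print(f"{name} ({d}D+{p}P): write amplification = {d + p}/{d} = {wa:.2f}")
# Output: 1.05, 1.10, 1.15 -- a difference of only about 5% per extra parity column.
```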

3.3.1.4 Global Garbage Collection


OceanStor Dorado V3 uses global garbage collection to reclaim the space occupied by invalid
data blocks after ROW full-stripe write. Garbage collection is triggered when the ratio of


garbage reaches a specified threshold. During garbage collection, the system migrates the
valid data in the target chunk group to a new chunk group. Then the system reclaims all
chunks in the target chunk group to release its space. At the same time, the system issues the
unmap or deallocate command to SSDs to mark the data in the corresponding LBA area as
invalid. The SSDs then reclaim the space. The garbage collection process is initiated by
storage controllers and takes effect on all SSDs.

Figure 3-30 Global garbage collection
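
A simplified sketch of the collection decision described above, assuming the controller tracks a per-chunk-group garbage ratio; the data structures and the 30% threshold are illustrative only.

```python
# Illustrative garbage-collection pass: pick chunk groups whose garbage ratio
# exceeds a threshold, migrate their remaining valid data, then free them.
GC_THRESHOLD = 0.3  # assumed value for the example

chunk_groups = {
    "CKG1": {"valid": 10, "invalid": 90},
    "CKG2": {"valid": 80, "invalid": 20},
    "CKG3": {"valid": 0,  "invalid": 100},
}

def garbage_ratio(ckg):
    total = ckg["valid"] + ckg["invalid"]
    return ckg["invalid"] / total

for name, ckg in chunk_groups.items():
    if garbage_ratio(ckg) >= GC_THRESHOLD:
        migrated = ckg["valid"]            # valid data moved to a new chunk group
        print(f"{name}: migrate {migrated} valid blocks, reclaim the whole group,"
              f" send unmap/deallocate to the SSDs")
    else:
        print(f"{name}: below threshold, skip")
```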

3.3.1.5 Global Wear Leveling and Anti-Wear Leveling


Unlike HDDs, SSDs can withstand only a limited number of program/erase (write) cycles.
Therefore, an all-flash storage system must balance the write load across disks to prevent
heavily used disks from wearing out prematurely. FlashLink® uses the controller software and
disk drivers to regularly query each SSD's wear level from the SSD controller and distributes
writes evenly across SSDs based on the result.

Figure 3-31 Global wear leveling

However, if SSDs are approaching the end of their service life (for example, the wear level
exceeds 80%), continuing to use global wear leveling could cause multiple SSDs to fail at the
same time and data to be lost. In this case, the system switches to anti-wear leveling to
avoid simultaneous failures. It selects the most severely worn SSD and preferentially writes
data to it as long as the SSD has free space. That SSD therefore reaches the end of its life
before the others and you are prompted to replace it earlier, avoiding simultaneous failures.


Figure 3-32 Global anti-wear leveling
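
The sketch below contrasts the two placement policies in plain Python: under wear leveling, new data goes to the least-worn SSD, while anti-wear leveling (triggered here at the 80% wear level mentioned above) deliberately directs writes to the most-worn SSD that still has free space. The data structures and selection function are illustrative.

```python
# Illustrative disk-selection policies for wear leveling vs. anti-wear leveling.
ANTI_WL_TRIGGER = 0.8  # wear level at which anti-wear leveling kicks in

ssds = [
    {"id": "SSD0", "wear": 0.82, "free": True},
    {"id": "SSD1", "wear": 0.81, "free": True},
    {"id": "SSD2", "wear": 0.75, "free": True},
]

def pick_target(ssds):
    candidates = [d for d in ssds if d["free"]]
    if max(d["wear"] for d in candidates) >= ANTI_WL_TRIGGER:
        # Anti-wear leveling: concentrate writes on the most worn SSD so it
        # reaches end of life (and is replaced) before the others.
        return max(candidates, key=lambda d: d["wear"])
    # Normal global wear leveling: spread writes to the least worn SSD.
    return min(candidates, key=lambda d: d["wear"])

print(pick_target(ssds)["id"])   # SSD0 is chosen once wear exceeds the trigger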

3.3.2 Read Cache


Read cache is added to Dorado V300R002 to accelerate read operations. It is a part of the
memory used to cache hot data. When the cached data is read, the system obtains data from
the cache instead of SSDs, accelerating read I/Os. Read cache, together with read prefetch and
cache eviction, greatly improves system performance.
Because the latencies of reading data from SSDs and from the memory are of the same order
of magnitude, the read cache is disabled on Dorado V3 in typical scenarios. However, in
scenarios with specific I/O characteristics (for example, sequential I/Os), enabling the read
cache will increase system performance significantly. You can choose to enable or disable the
read cache according to your own service types. For database services, such as OLTP in
Oracle and SQL Server databases, you are advised to enable the read cache. On Dorado6000 V3
and Dorado18000 V3 with 1 TB of memory per controller, the read cache is enabled by default.
For other device models, this function is disabled by default. When there is no write I/O, all
the system cache can be used as the read cache. In addition, the system reserves a minimum
space for the read cache to guarantee cache resources for read services when the write I/O
load is heavy.
 Read prefetch algorithm
OceanStor Dorado V3 uses an adaptive sequential I/O identification algorithm to identify
sequential I/Os among a large number of random I/Os. For these sequential I/Os, the
storage system executes prefetch algorithms to optimize system performance in various
application scenarios. OceanStor Dorado V3 supports intelligent, constant, and variable
prefetch algorithms. Intelligent prefetch automatically identifies the I/O characteristics,
based on which it determines whether to prefetch data and the prefetch length. In
addition, intelligent prefetch collects the outcome of the algorithm, such as the read
cache hit ratio and waste rate of prefetch data, to adjust the prefetch threshold and length,
ensuring proper performance for a variety of application scenarios. By default, the
storage system does not use prefetch. In scenarios with specific I/O models, you are
advised to enable the intelligent prefetch algorithm. You can also choose the constant or
variable prefetch algorithm as required.
 Cache eviction algorithm


When the cache usage reaches the threshold, the cache eviction algorithm calculates the
historical and current data access frequencies, and invokes the least recently used (LRU)
algorithm to evict unnecessary cached data.
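
The following sketch ties together the prefetch and eviction ideas above: an LRU-ordered read cache that, when it detects consecutive LBAs, prefetches the next few blocks. The detection rule, capacity, and prefetch length are simplifications for illustration, not the adaptive algorithm used by the array.

```python
from collections import OrderedDict

class ReadCache:
    """Toy LRU read cache with a naive sequential-prefetch rule."""
    def __init__(self, capacity=8, prefetch_len=2):
        self.capacity, self.prefetch_len = capacity, prefetch_len
        self.cache = OrderedDict()          # lba -> data, ordered by recency
        self.last_lba = None

    def _insert(self, lba, data):
        self.cache[lba] = data
        self.cache.move_to_end(lba)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used entry

    def read(self, lba, backend):
        if lba in self.cache:               # cache hit: no SSD access needed
            self.cache.move_to_end(lba)
            data = self.cache[lba]
        else:
            data = backend(lba)
            self._insert(lba, data)
        if self.last_lba is not None and lba == self.last_lba + 1:
            for next_lba in range(lba + 1, lba + 1 + self.prefetch_len):
                if next_lba not in self.cache:
                    self._insert(next_lba, backend(next_lba))  # prefetch ahead
        self.last_lba = lba
        return data

backend_reads = []
ssd = lambda lba: backend_reads.append(lba) or f"data@{lba}"
cache = ReadCache()
for lba in [10, 11, 12, 13]:
    cache.read(lba, ssd)
print("SSD reads issued:", backend_reads)   # later sequential reads hit prefetched data
```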

3.3.3 I/O Process


3.3.3.1 Write Process

Figure 3-33 Write I/O process

1. A controller receives write I/Os.


2. The write I/Os enter the storage system after passing through the protocol layer. The
system checks whether the I/Os belong to the local controller. If they do not belong to
the local controller, the system forwards them to the peer controller.
3. If the I/Os belong to the local controller, they are written to the local cache and mirrored
to the peer cache.
4. A write success is returned to the host.
5. The cache flushes data to the pool where the data will be deduplicated and compressed
(if deduplication and compression are disabled, the system jumps to step 6).
a. The pool divides the received data into data blocks with a fixed size (4 KB, 8 KB,
16 KB, or 32 KB).
b. For each data block, the pool calculates the fingerprint value and forwards the block
to the owning controller based on the fingerprint.
c. After receiving the data block, the pool of the owning controller searches for the
data block's fingerprint value in its fingerprint table.
d. If the fingerprint is found, the system obtains the location of the corresponding data
and compares that saved data to the new data block, byte by byte. If they are the
same, the system increases the reference count of the fingerprint and does not write
the new data block to the SSDs. If they differ, a hash conflict exists in the data


block, and the system compresses the new data and writes it to the SSDs rather than
deduplicating the data.
e. If the fingerprint table does not contain the same fingerprint, the new data is not
duplicate. The system adds the data block's fingerprint to the table, compresses the
data, and writes it to the SSDs.
f. The compression algorithm is LZ4 or ZSTD, and the granularity is 4 KB, 8 KB, 16
KB, or 32 KB. The compressed data is aligned by byte.
6. The pool combines the data into full stripes and writes them to the SSDs.
a. Compressed I/Os are combined into stripes whose size is an integer multiple of 8
KB.
b. When a stripe is full, the system calculates parity bits and writes the data and parity
bits to disks.
c. If the stripe is not full, 0s are added to the tail before the data is written to disks
(these 0s will be cleared subsequently during garbage collection).
d. Data is written to a new location every time and metadata mapping relationships are
updated.
e. After a message is returned indicating that I/Os are successfully written to disks, the
cache deletes the corresponding data pages.
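
A highly simplified sketch of step 6 above: compressed blocks are accumulated into a stripe of D data columns, zero-padded if the stripe is not full, and parity is computed before the stripe is written out. A single XOR parity column is used here purely for illustration; the real array computes RAID 5/6/TP parity, and the sizes and names are assumptions.

```python
# Toy full-stripe write: gather data columns, pad with zeros, add XOR parity.
COLUMN_SIZE = 8          # bytes per column in this toy example (8 KB on the array)
DATA_COLUMNS = 4         # D data columns followed by one parity column

def xor_parity(columns):
    parity = bytearray(COLUMN_SIZE)
    for col in columns:
        for i, b in enumerate(col):
            parity[i] ^= b
    return bytes(parity)

def build_stripe(compressed_blocks):
    cols = [blk.ljust(COLUMN_SIZE, b"\x00") for blk in compressed_blocks]
    while len(cols) < DATA_COLUMNS:            # stripe not full: pad with zero columns,
        cols.append(bytes(COLUMN_SIZE))        # reclaimed later by garbage collection
    return cols + [xor_parity(cols)]           # data columns followed by parity

stripe = build_stripe([b"blk1", b"blk2", b"blk3"])
for i, col in enumerate(stripe):
    label = "P" if i == DATA_COLUMNS else f"D{i}"
    print(label, col.hex())
```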

Figure 3-34 Data flow in a disk domain across controller enclosures

If a disk domain contains disks owned by multiple controller enclosures, host data will be
evenly distributed to all disks in the disk domain. In Figure 3-34, upon receiving a write
request from the host, the storage system performs hash calculation on the received data and
evenly distributes it to all disks in the disk domain based on the hash result.


3.3.3.2 Read Process

Figure 3-35 Read I/O process

1. A controller receives a read request.


2. The controller sends the request to the space management module, which determines
whether the I/O belongs to the local controller.
3. If the read request belongs to the local controller, the system proceeds with step 4. If the
read request does not belong to the local controller, the space management module
forwards the request to its owning controller.
4. The owning controller searches for requested data in its cache and returns the data to the
host if it is found.
5. If the controller cannot find the data in its cache, the request is sent to the pool.
6. The pool reads the requested data from disks and returns it to the host. If deduplication
and compression are enabled, the pool reads the data as follows:
a. The pool queries the LBA-fingerprint mapping table to obtain the fingerprint that
corresponds to the request.
b. The request is forwarded to the owning controller of the fingerprint according to the
fingerprint forwarding rules.
c. The owning controller of the fingerprint queries the fingerprint-storage location
mapping table and reads the data from the storage location. It then decompresses
and returns the data to the host.
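
For the deduplication-enabled read path in step 6, the lookups amount to two table walks, LBA to fingerprint and fingerprint to physical location, followed by decompression. A minimal sketch follows; zlib stands in for the array's LZ4/ZSTD algorithms and the table names are illustrative.

```python
import zlib

# Illustrative metadata for a dedup-enabled read: two mapping tables.
lba_to_fp = {0x1000: "fp-a1"}                       # LBA -> fingerprint
fp_to_location = {"fp-a1": ("SSD3", 0x7700)}        # fingerprint -> physical location
physical_store = {("SSD3", 0x7700): zlib.compress(b"hello dorado")}

def read_lba(lba):
    fp = lba_to_fp[lba]                   # step a: LBA -> fingerprint
    location = fp_to_location[fp]         # steps b/c: fingerprint -> storage location
    compressed = physical_store[location]
    return zlib.decompress(compressed)    # decompress before returning to the host

print(read_lba(0x1000))                   # b'hello dorado'
```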

3.3.4 Value-added Features


OceanStor Dorado V3 provides the following value-added features:


 Smart series software includes SmartDedupe, SmartCompression, SmartThin, SmartVirtualization, and SmartMigration, which improve storage efficiency and reduce user TCO.
 Hyper series software includes HyperSnap, HyperClone, HyperReplication, HyperMetro,
HyperVault, and HyperLock, which provide disaster recovery and data backup.
 Cloud series software includes CloudReplication and CloudBackup, which construct
cost-effective and maintenance-free cloud DR centers to reduce the OPEX.

3.3.5 Software Architecture Highlights


 Excellent performance
FlashLink® realizes efficient I/O scheduling, providing high performance and low
system latency.
 Stable and reliable
Innovative RAID algorithms, value-added features, and multi-level reliability solutions
ensure 99.9999% reliability and 24/7 stable service system operation.
 Efficient
Multiple efficiency-improving features, such as heterogeneous virtualization and inline
deduplication and compression, protect customers' investments.


4 Smart Series Features

4.1 SmartDedupe (Inline Deduplication)


4.2 SmartCompression (Inline Compression)
4.3 SmartThin (Intelligent Thin Provisioning)
4.4 SmartQoS (Intelligent Quality of Service Control)
4.5 SmartVirtualization (Heterogeneous Virtualization)
4.6 SmartMigration (Intelligent Data Migration)
4.7 SmartMulti-Tenant for File (Multi-tenancy)
4.8 SmartQuota for File (Quota)

4.1 SmartDedupe (Inline Deduplication)


SmartDedupe allows OceanStor Dorado V3 to delete duplicate data online before writing data
to flash media. The deduplication process is as follows:
The storage system divides the new data into blocks based on the deduplication granularity.
Then for each block, the system calculates its fingerprint and compares it with the existing
fingerprints. If the same fingerprint is found, the system reads the data corresponding to the
fingerprint and compares that saved data to the new data block, byte by byte. If they are the
same, the system increases the reference count of the fingerprint and does not write the new
data block to the SSDs. If the fingerprint is not found or byte-by-byte comparison is not
passed, the system writes the new data block to SSDs and records the mapping between the
fingerprint and storage location.


Figure 4-1 Working principle of deduplication

SmartDedupe on OceanStor Dorado V3 has the following highlights:


 OceanStor Dorado V3 supports 4 KB and 8 KB deduplication granularities. You can
enable or disable SmartDedupe on particular LUNs.
The deduplication ratio depends on the application scenarios and user data contents. For
applications that provide a high deduplication ratio (for example, VDI), it is
recommended that you enable SmartDedupe and use 8 KB deduplication granularity to
save space. In scenarios where the deduplication ratio is low, such as for databases, you
can disable SmartDedupe to improve performance.
 OceanStor Dorado V3 supports byte-by-byte comparison to ensure data reliability.
 OceanStor Dorado V3 can identify zero data, which occupies no storage space.
When an application reads data, zeros are returned if no mapping exists between the LBA and
a fingerprint. When an application writes all-zero data blocks, an internal zero page that
requires no storage space is used in place of the zero data, improving space utilization and
system performance.
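
A condensed sketch of the inline-deduplication decision described in this section, including zero-block detection and the byte-by-byte verification used to guard against fingerprint (hash) collisions. SHA-1 is used here purely for illustration; the white paper does not state which fingerprint function the array uses.

```python
import hashlib

BLOCK_SIZE = 8192                  # 8 KB deduplication granularity
ZERO_BLOCK = bytes(BLOCK_SIZE)

fingerprint_table = {}             # fingerprint -> (stored_block, reference_count)

def write_block(block):
    if block == ZERO_BLOCK:
        return "zero page (no space consumed)"
    fp = hashlib.sha1(block).hexdigest()
    if fp in fingerprint_table:
        stored, refs = fingerprint_table[fp]
        if stored == block:                      # byte-by-byte comparison passed
            fingerprint_table[fp] = (stored, refs + 1)
            return f"dedup hit, refcount={refs + 1}"
        return "hash conflict: store the new block without deduplication"
    fingerprint_table[fp] = (block, 1)           # new unique block
    return "new fingerprint recorded, block written to SSD"

data = b"x" * BLOCK_SIZE
print(write_block(data))           # new fingerprint recorded
print(write_block(data))           # dedup hit, refcount=2
print(write_block(ZERO_BLOCK))     # zero page
```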

4.2 SmartCompression (Inline Compression)


SmartCompression compresses data online before writing data to flash media. In addition,
compression is performed after deduplication, ensuring that no duplicate data is compressed
and improving compression efficiency. SmartCompression reduces the amount of data written
to SSDs and minimizes write amplification, improving the longevity of flash arrays.
The compression algorithm is a compute-intensive program. Inline compression consumes
significant CPU resources, affecting end-to-end performance of the system. Open-source
compression algorithms that feature high performance and low compression ratio are
commonly used in the industry, for example, LZ4, LZO, and Snappy. OceanStor Dorado V3
uses Fast LZ4 and ZSTD algorithms, which are enhanced versions of the open-source LZ4 and
ZSTD compression algorithms and double the compression efficiency without decreasing the
compression ratio.


Figure 4-2 Comparison between open-source and Fast LZ4 algorithms

The size of data blocks to be compressed can be 4 KB, 8 KB, 16 KB, or 32 KB. The
compressed data is aligned by byte, which improves the compression efficiency and saves the
storage space for compressed data. In the following figure, 8 KB data blocks are compressed,
converged into full stripes, and then written to disks.

Figure 4-3 Working principle of compression

The compression ratio of OceanStor Dorado V3 also depends on user data. For Oracle OLTP
applications, the compression ratio is between 1.5 and 7.9; for VDI applications, the
compression ratio is between 2.8 and 4. You can enable or disable SmartCompression for each
specific LUN. In applications that require high performance, you can disable this function.
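
The sketch below shows the general shape of inline compression at a fixed granularity with byte-aligned output. It uses Python's standard zlib in place of the array's Fast LZ4/ZSTD algorithms, so the ratio it prints is only indicative, and the function names are assumptions.

```python
import zlib

GRANULARITY = 8192  # compress in 8 KB blocks, as in the example above

def compress_blocks(data):
    """Split data into fixed-size blocks and compress each one independently."""
    out = []
    for offset in range(0, len(data), GRANULARITY):
        block = data[offset:offset + GRANULARITY]
        compressed = zlib.compress(block)          # stand-in for Fast LZ4/ZSTD
        # Keep the smaller of the two, byte-aligned (no padding to sector size).
        out.append(min(compressed, block, key=len))
    return out

sample = b"All-flash storage white paper. " * 600   # compressible text
blocks = compress_blocks(sample)
ratio = len(sample) / sum(len(b) for b in blocks)
print(f"{len(blocks)} blocks, overall compression ratio ~{ratio:.1f}:1")
```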


4.3 SmartThin (Intelligent Thin Provisioning)


OceanStor Dorado V3 supports thin provisioning, which enables the storage system to
allocate storage resources on demand. SmartThin does not allocate all capacity in advance.
Instead, it presents a virtual capacity that can be larger than the physical capacity and
allocates physical space only as it is actually needed. If the storage space is about to be
used up, SmartThin triggers storage resource pool expansion to add more space. The expansion
process is transparent to users and causes no system downtime.
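
Conceptually, thin provisioning is an on-demand mapping from virtual addresses to physical grains, as in this toy sketch. The grain size and class names are illustrative, not the array's internal structures.

```python
# Toy thin-provisioned volume: physical grains are allocated only on first write.
GRAIN = 64 * 1024  # 64 KB allocation granularity (example value)

class ThinLun:
    def __init__(self, virtual_capacity):
        self.virtual_capacity = virtual_capacity
        self.mapping = {}                 # grain index -> allocated physical grain

    def write(self, offset, length):
        for grain in range(offset // GRAIN, (offset + length - 1) // GRAIN + 1):
            self.mapping.setdefault(grain, f"physical-grain-{len(self.mapping)}")

    @property
    def allocated(self):
        return len(self.mapping) * GRAIN

lun = ThinLun(virtual_capacity=1 << 40)   # presents 1 TB to the host
lun.write(0, 128 * 1024)                  # host writes 128 KB
print(f"presented: {lun.virtual_capacity >> 30} GiB, "
      f"allocated: {lun.allocated // 1024} KiB")
```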
Application Scenarios
 SmartThin can help core service systems that have demanding requirements on business
continuity, such as bank transaction systems, to expand system capacity non-disruptively
without interrupting ongoing services.
 For services where the growth of application system data is hard to evaluate accurately,
such as email services and web disk services, SmartThin can assist with on-demand
physical space allocation, preventing wasted space.
 For mixed services that have diverse storage requirements, such as carrier services,
SmartThin can assist with physical space contention, achieving optimized space
allocation.

4.4 SmartQoS (Intelligent Quality of Service Control)


SmartQoS dynamically allocates storage system resources to meet the performance objectives
of applications. You can set upper limits on IOPS or bandwidth for specific applications.
Based on the upper limits, SmartQoS can accurately limit performance of these applications,
preventing them from contending for storage resources with critical applications.
SmartQoS uses LUN- or snapshot-specific I/O priority scheduling and the I/O traffic control
to guarantee the service quality.

I/O Priority Scheduling


This mechanism allocates resources based on application priorities. When storage resources
are insufficient, higher-priority applications are served first so that their SLAs are met.
You can configure an application's priority as high, medium, or low.


Figure 4-4 I/O priority scheduling process

I/O Traffic Control


This mechanism limits the IOPS or bandwidth of specific applications, preventing them from
affecting other applications. I/O traffic control is implemented based on hierarchical
management, objective distribution, and traffic control management.
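
Upper-limit traffic control of this kind is commonly implemented with a token bucket; the sketch below caps a LUN at a given IOPS value. This is a generic illustration, not a description of SmartQoS internals.

```python
import time

class IopsLimiter:
    """Generic token bucket: allow at most `iops` I/Os per second."""
    def __init__(self, iops):
        self.rate = iops
        self.tokens = float(iops)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False          # I/O must wait (queued) until tokens refill

limiter = IopsLimiter(iops=1000)
admitted = sum(limiter.allow() for _ in range(5000))
print(f"admitted {admitted} of 5000 back-to-back I/Os")  # roughly the bucket size
```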


Figure 4-5 Managing LUN or snapshot I/O queues

4.5 SmartVirtualization (Heterogeneous Virtualization)


OceanStor Dorado V3 uses SmartVirtualization to take over heterogeneous storage systems
(including other Huawei storage systems and third-party storage systems), protecting
customer investments. SmartVirtualization conceals the software and hardware differences
between the local and heterogeneous storage systems, allowing the local system to use and
manage the heterogeneous storage resources as its local resources. In addition,
SmartVirtualization can work with SmartMigration to migrate data from heterogeneous
storage systems online, facilitating device replacement.
The working principles of SmartVirtualization are as follows:
SmartVirtualization maps the heterogeneous storage system to the local storage system, which
then uses external device LUNs (eDevLUNs) to take over and manage the heterogeneous
resources. eDevLUNs consist of metadata volumes and data volumes. The metadata volumes
manage the data storage locations of eDevLUNs and use physical space provided by the local
storage system. The data volumes are logical presentations of external LUNs and use physical
space provided by the heterogeneous storage system. An eDevLUN on the local storage
system matches an external LUN on the heterogeneous storage system. Application servers
access data on the external LUNs via the eDevLUNs.


Figure 4-6 Heterogeneous storage virtualization

SmartVirtualization uses LUN masquerading to set the WWNs and Host LUN IDs of
eDevLUNs on OceanStor Dorado V3 to the same values as those on the heterogeneous storage
system. After data migration is complete, the host's multipathing software switches over the
LUNs online without interrupting services.
Application Scenarios
 Heterogeneous array takeover
As customers build data centers over time, the storage arrays they use may come from
different vendors. Storage administrators can leverage SmartVirtualization to manage
and configure existing devices, protecting investments.
 Heterogeneous data migration
The customer may need to replace storage systems whose warranty periods are about to
expire or whose performance does not meet service requirements. SmartVirtualization
and SmartMigration can migrate customer data to OceanStor Dorado V3 online without
interrupting host services.
For more information, see the OceanStor Dorado V3 Series V300R002 SmartVirtualization
Feature Guide.

4.6 SmartMigration (Intelligent Data Migration)


OceanStor Dorado V3 provides intelligent data migration based on LUNs. Data on a source
LUN can be completely migrated to a target LUN without interrupting ongoing services.
SmartMigration also supports data migration between a Huawei storage system and a
compatible heterogeneous storage system.
When the system receives new data during migration, it writes the new data to both the source
and target LUNs simultaneously and records data change logs (DCLs) to ensure data
consistency. After the migration is complete, the source and target LUNs exchange
information to allow the target LUN to take over services.
SmartMigration is implemented in two stages:


1. Data synchronization
a. Before migration, you must configure the source and target LUNs.
b. When migration starts, the source LUN replicates data to the target LUN.
c. During migration, the host can still access the source LUN. When the host writes
data to the source LUN, the system records the DCL.
d. The system writes the incoming data to both the source and target LUNs.
 If writing to both LUNs is successful, the system clears the record in the DCL.
 If writing to the target LUN fails, the storage system identifies the data that
failed to be synchronized according to the DCL and then copies the data to the
target LUN. After the data is copied, the storage system returns a write success
to the host.
 If writing to the source LUN fails, the system returns a write failure to notify
the host to re-send the data. Upon reception, the system only writes the data to
the source LUN.
2. LUN information exchange
After data replication is complete, host I/Os are suspended temporarily. The source and
target LUNs exchange information as follows:
a. Before LUN information is exchanged, the host uses the source LUN ID to identify
the source LUN. Because of the mapping relationship between the source LUN ID
and the source data volume ID used to identify physical space, the host can read the
physical space information about the source LUN. The mapping relationship also
exists between the target LUN ID and target data volume ID.
b. In LUN information exchange, the source and target LUN IDs remain unchanged
but the data volume IDs of the source and target LUNs are exchanged. This creates
a new mapping relationship between the source LUN ID and target data volume ID.
c. After the exchange, the host can still identify the source LUN using the source LUN
ID but reads physical space information about the target LUN due to the new
mapping relationship.
LUN information exchange is completed instantaneously, which does not interrupt
services.
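
The dual-write behavior during data synchronization can be summarized in a few lines of Python; the data change log (DCL) is reduced to a set of dirty extents and all names are illustrative.

```python
# Simplified dual-write during SmartMigration synchronization.
dcl = set()          # data change log: extents not yet consistent on the target

def host_write(extent, write_source, write_target):
    dcl.add(extent)                      # record the change before writing
    src_ok = write_source(extent)
    tgt_ok = write_target(extent)
    if src_ok and tgt_ok:
        dcl.discard(extent)              # both copies consistent: clear the log
        return "write success"
    if src_ok and not tgt_ok:
        # Target write failed: the extent stays in the DCL and is copied later.
        return "write success (extent left in DCL for re-synchronization)"
    return "write failure (host will retry)"

ok = lambda e: True
fail = lambda e: False
print(host_write("extent-7", ok, ok))    # clean dual write
print(host_write("extent-8", ok, fail))  # target failed, DCL keeps the delta
print("DCL:", dcl)
```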


Figure 4-7 LUN information exchange

Application Scenarios
 Storage system upgrade with SmartVirtualization
SmartMigration works with SmartVirtualization to migrate data from legacy storage
systems (from Huawei or other vendors) to new Huawei storage systems to improve
service performance and data reliability.
 Data migration for capacity, performance, and reliability adjustments
For more information, see the OceanStor Dorado V3 Series V300R002 SmartMigration
Feature Guide.

4.7 SmartMulti-Tenant for File (Multi-tenancy)


SmartMulti-Tenant allows the creation of multiple virtual storage systems (vStores) in a
physical storage system. vStores can share the same storage hardware resources in a
multi-protocol unified storage architecture, without affecting the data security or privacy of
each other.


SmartMulti-Tenant implements management, service, and network isolation, which prevents data access between vStores and ensures security.

Figure 4-8 Logical architecture of SmartMulti-Tenant

 Management isolation
Each vStore has its own administrator. vStore administrators can only configure and
manage their own storage resources through the GUI or RESTful API. vStore
administrators support role-based permission control. When being created, a vStore
administrator is assigned a role specific to its permissions.
 Service isolation
Each vStore has its own file systems, users, user groups, shares, and exports. Users can
only access file systems belonging to the vStore through logical interfaces (LIFs).
Service isolation includes: service data isolation (covering file systems, quotas, and
snapshots), service access isolation, and service configuration isolation (typically for
NAS protocol configuration).
− Service data isolation
System administrators assign different file systems to different vStores, thereby
achieving file system isolation. File system quotas and snapshots are isolated in the
same way.
− Service access isolation
Each vStore has its own NAS protocol instances, including the SMB service, NFS
service, and NDMP service.
− Service configuration isolation


Each vStore can have its own users, user groups, user mapping rules, security
policies, SMB shares, NFS shares, AD domain, DNS service, LDAP service, and
NIS service.
 Network isolation
VLANs and LIFs are used to isolate the vStore network, preventing illegal host access to
vStore's storage resources.
vStores use LIFs to configure services. A LIF belongs only to one vStore to achieve
logical port isolation. You can create LIFs from GE ports, 10GE ports, bond ports, or
VLANs.

4.8 SmartQuota for File (Quota)


In a NAS file service environment, resources are provided to departments, organizations, and
individuals as shared directories. Because each department or person has unique resource
requirements or limitations, storage systems must allocate and restrict resources, based on the
shared directories, in a customized manner. SmartQuota restricts and controls resource
consumption for directories, users, and user groups to address these requirements.
SmartQuota allows you to configure the following quotas:
 Space soft quota
Specifies a soft space limit. If any new data writes are performed and would result in this
limit being exceeded, the storage system reports an alarm. This alarm indicates that space
is insufficient and asks the user to delete unnecessary files or expand the quota. The user
can still continue to write data to the directory.
 Space hard quota
Specifies a hard space limit. If any new data writes are performed and would result in
this limit being exceeded, the storage system prevents the writes and reports an error.
 File soft quota
Specifies a soft limit on the file quantity. If the number of used files exceeds this limit,
the storage system reports an alarm. This alarm indicates that the file resources are
insufficient and asks the user to delete unnecessary files or expand the quota. The user
can still continue to create files or directories.
 File hard quota
Specifies a hard limit on the file quantity. If the number of used files for a quota exceeds
this limit, the storage system prevents the creation of new files or directories and reports
an error.
SmartQuota employs space and file hard quotas to restrict the maximum number of resources
available to each user. The process is as follows:
1. For each write I/O operation, SmartQuota checks whether the accumulated usage (the used
space and file quantity plus the space and file quantity to be added by this operation)
exceeds the preset hard quota.
− If yes, the write I/O operation fails.
− If no, follow-up operations can be performed.
2. After the write I/O operation is allowed, SmartQuota adds an incremental amount of
space and number of files to the previously used amount of space and number of files.
This is done separately.


3. SmartQuota updates the quota (used amount of space and number of files + incremental
amount of space and number of files) and allows the quota and I/O data to be written into
the file system.
The I/O operation and quota update succeed or fail at the same time, ensuring that the used
capacity is correct during each I/O check.

If the directory quota, user quota, and group quota are concurrently configured in a shared directory in
which you are performing operations, each write I/O operation will be restricted by these three quotas.
All quota types are checked; if any hard quota check fails, the I/O is rejected.
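
The check in step 1 can be expressed compactly: for every applicable quota (directory, user, and group), the already-used amount plus the increment from this I/O must not exceed the hard limit. A minimal sketch with illustrative numbers:

```python
# Illustrative hard-quota check applied to a single write I/O.
quotas = {
    "directory": {"space_used": 90, "space_hard": 100, "files_used": 10, "files_hard": 1000},
    "user":      {"space_used": 40, "space_hard": 50,  "files_used": 5,  "files_hard": 100},
    "group":     {"space_used": 10, "space_hard": 500, "files_used": 5,  "files_hard": 5000},
}

def check_write(space_delta, files_delta):
    for name, q in quotas.items():
        if (q["space_used"] + space_delta > q["space_hard"] or
                q["files_used"] + files_delta > q["files_hard"]):
            return f"rejected by {name} hard quota"
    for q in quotas.values():                  # all checks passed: commit the usage
        q["space_used"] += space_delta
        q["files_used"] += files_delta
    return "allowed"

print(check_write(space_delta=5, files_delta=1))    # allowed
print(check_write(space_delta=20, files_delta=0))   # rejected by directory hard quota
```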

SmartQuota clears a resource over-usage alarm only after the used resources drop below 90%
of the soft quota. This hysteresis prevents alarms from being frequently generated and cleared
when usage fluctuates slightly around the soft quota.


5 Hyper Series Features

5.1 HyperSnap (Snapshot)


5.2 HyperCDP (Continuous Data Protection)
5.3 HyperCopy (Copy)
5.4 HyperClone (Clone)
5.5 HyperReplication (Remote Replication)
5.6 HyperMetro (Active-Active Layout)
5.7 3DC for Block (Geo-Redundancy)
5.8 HyperVault for File (All-in-One Backup)
5.9 HyperLock for File (WORM)

5.1 HyperSnap (Snapshot)


5.1.1 HyperSnap for Block
Generally, snapshot is implemented using copy-on-write (COW) or ROW technology. COW
must reserve write space for snapshots. When the snapshot data is modified for the first time,
COW must copy the original data to the reserved space, which affects write performance of
hosts.
OceanStor Dorado V3 implements lossless snapshot using ROW. When snapshot data is
changed, OceanStor Dorado V3 writes new data to new locations and does not need to copy
the old data, reducing system I/O overhead. This prevents the performance deterioration
caused by COW.


Figure 5-1 ROW snapshot principle

In Figure 5-1, both the source LUN and snapshot use a mapping table to access the physical
space. The original data in the source LUN is ABCDE and is saved in sequence in the
physical space. The metadata of the snapshot is null. All read requests to the snapshot are
redirected to the source LUN.
 When the source LUN receives a write request that changes C to F, the new data is
written into a new physical space P5 instead of being overwritten in P2.
 In the mapping metadata of the source LUN, the system changes L2->P2 to L2->P5.
 If the snapshot must be modified, for example, A corresponding to L0 must be changed
to G, the system first writes G to P6 and then changes L0->P0 in the snapshot mapping
table to L0->P6. Data in the source LUN is changed to ABFDE and data in the snapshot
is changed to GBCDE.
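
The redirect-on-write behavior in Figure 5-1 can be mimicked with two small mapping tables. In the toy sketch below, the snapshot simply copies the source LUN's pointer table at creation time so that it keeps sharing the original physical blocks, whereas the real snapshot starts with empty metadata and redirects reads to the source LUN; all structures and names are illustrative.

```python
# Toy ROW snapshot: source and snapshot share physical blocks; new writes
# always go to new physical locations (redirect-on-write).
physical = {"P0": "A", "P1": "B", "P2": "C", "P3": "D", "P4": "E"}
source_map = {f"L{i}": f"P{i}" for i in range(5)}    # source LUN: ABCDE
snapshot_map = dict(source_map)                      # snapshot taken: copies only pointers
next_free = 5

def row_write(mapping, lba, data):
    """Redirect-on-write: new data always goes to a new physical location."""
    global next_free
    loc = f"P{next_free}"; next_free += 1
    physical[loc] = data
    mapping[lba] = loc

row_write(source_map, "L2", "F")     # host changes C -> F on the source LUN
row_write(snapshot_map, "L0", "G")   # writable snapshot changes A -> G

read = lambda mapping: "".join(physical[mapping[f"L{i}"]] for i in range(5))
print(read(source_map))    # ABFDE
print(read(snapshot_map))  # GBCDE
```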
HyperSnap implements writable snapshots by default. All snapshots are readable and writable,
and support snapshot copies and cascading snapshots. You can create a read-only copy of a
snapshot at a specific point in time, or leverage snapshot cascading to create child snapshots
for a parent snapshot. For multi-level cascading snapshots that share a source volume, they
can roll back each other and the source volume regardless of their cascading levels. This is
called cross-level rollback.
In Figure 5-2, Snapshot1 is created for the source volume at 9:00, and Snapshot1.snapshot0
is a cascading snapshot of Snapshot1 at 10:00. The system can roll back the source volume
using Snapshot1.snapshot0 or Snapshot1, or roll back Snapshot1 using
Snapshot1.snapshot0.


Figure 5-2 Cascading snapshot and cross-level rollback

HyperSnap supports timed snapshots, which can be triggered weekly, daily, or at a custom
interval (with a minimum interval of 30 seconds). The system supports multiple schedules and
each schedule can have multiple source LUNs. Snapshots of the source LUNs that share a
schedule are in the same consistency group.

Figure 5-3 Configuring timing snapshot

HyperSnap supports snapshot consistency groups. For LUNs that are dependent on each other,
you can create a snapshot consistency group for these LUNs to ensure data consistency. For
example, the data files, configuration files, and logs of an Oracle database are usually saved
on different LUNs. Snapshots for these LUNs must be created at the same time to guarantee
that the snapshot data is consistent in time.


Figure 5-4 Snapshot consistency group

5.1.2 HyperSnap for File


OceanStor Dorado NAS uses HyperSnap to quickly generate a consistent image, that is, a
duplicate, for a source file system at a certain point in time without interrupting services
running on the source file system. This duplicate is available immediately after being
generated, and reading or writing the duplicate does not impact the data on the source file
system. HyperSnap helps with online backups, data analysis, and application testing.
HyperSnap can:
 Create file system snapshots and back up these snapshots to tapes.
 Provide data backups of the source file system so that end users can restore accidentally
deleted files.
 Work together with HyperReplication and HyperVault for remote replication and backup.
HyperSnap works based on ROW file systems. In a ROW file system, new or modified data
does not overwrite the original data but instead is written to newly allocated storage space.
This ensures enhanced data reliability and high file system scalability. ROW-based
HyperSnap, used for file systems, can create snapshots in seconds. The snapshot data does not
occupy any additional disk space unless the source files are deleted or modified.

Technical Highlights
 Zero-duration backup window
A backup window refers to the maximum backup duration tolerated by applications
before data is lost. Traditional backup deteriorates file system performance, or can even
interrupt ongoing applications. Therefore, a traditional backup task can only be executed
after applications are stopped or if the workload is comparatively light. HyperSnap can
back up data online, and requires a backup window that takes almost zero time and does
not interrupt services.
 Snapshot creation within seconds
To create a snapshot for a file system, only the root node of the file system needs to be
copied and stored in caches and protected against power failure. This reduces the
snapshot creation time to seconds.
 Reduced performance loss
HyperSnap makes it easy to create snapshots for file systems, and only a small amount of
data needs to be stored on disks. Before releasing data space, the system checks whether the
data is protected by a snapshot. If it is, the system simply records the space of the data
blocks that are protected by the snapshot but have been deleted by the file system, which has
a negligible impact on system performance. Background space reclamation contends with file
system services for some CPU and memory resources only when a snapshot is deleted, and even
then the performance loss remains low.
 Less occupied disk capacity
The file system space occupied by a snapshot (a consistent duplicate) of the source file
system depends on the amount of data that changed after the snapshot was generated.
This space never exceeds the size of the file system at the snapshot point in time. For a
file system with little changed data, only a small storage space is required to generate a
consistent duplicate of the file system.
 Rapid snapshot data access
A file system snapshot is presented in the root directory of the file system as an
independent directory. Users can access this directory to quickly access the snapshot data.
If snapshot rollback is not required, users can easily access the data at the snapshot point
in time. Users can also recover data by copying the file or directory if the file data in the
file system is corrupted.
When using a Windows client to access a CIFS-based file system, a user can restore a file or
folder to its state at a specific snapshot point in time by right-clicking the file or folder,
choosing Restore previous versions from the shortcut menu, and selecting a version from the
displayed list of available snapshots of that file or folder.
 Quick file system rollback
Backup data generated by traditional offline backup tasks cannot be read online. A
time-consuming data recovery process is inevitable before a usable duplicate of the
source data at the backup point in time is available. HyperSnap can directly replace the
file system root with specific snapshot root and clear cached data to quickly roll the file
system back to a specific snapshot point in time.
You must exercise caution when using the rollback function because snapshots created
after the rollback point in time are automatically deleted after a file system rollback
succeeds.
 Continuous data protection by timing snapshots
HyperSnap enables users to configure policies to automatically create snapshots at
specific time points or at specific intervals.
The maximum number of snapshots for a file system varies depending on the product
model. If the upper limit is exceeded, the earliest snapshots are automatically deleted.
The file system also allows users to periodically delete snapshots.
As time elapses, snapshots are generated at multiple points, implementing continuous
data protection at a low cost. It must be noted that snapshot technology cannot achieve
real continuous data protection. The interval between two snapshots determines the
granularity of continuous data protection.

5.2 HyperCDP (Continuous Data Protection)


HyperCDP allows OceanStor Dorado V3 to generate high-density snapshots, which are also
called HyperCDP objects. The minimum interval of HyperCDP objects is 10 seconds, which
ensures continuous data protection and reduces the RPO. HyperCDP is based on the lossless
snapshot technology (multi-time-point and ROW). Each HyperCDP object matches a time
point of the source LUN. Dorado V3 supports HyperCDP schedules to meet customers'
backup requirements.


Figure 5-5 HyperCDP snapshot principles

Technical Highlights:
 Continuous protection, lossless performance
HyperCDP provides data protection at an interval of seconds, with zero impact on
performance and small space occupation.
 Support for scheduled tasks
You can specify HyperCDP schedules by day, week, month, or specific interval, meeting
different backup requirements.

Figure 5-6 HyperCDP schedule

 Intensive and persistent data protection


A single LUN supports 60,000 HyperCDP objects. The minimum interval is 10 seconds.
At this setting, continuous protection can be achieved for data within a week.
 Support for consistency groups


In database applications, the data, configuration files, and logs are usually saved on
different LUNs. The HyperCDP consistency group ensures data consistency between
these LUNs during restoration.
 HyperCDP duplicate for reads and writes
Hosts cannot read or write HyperCDP objects directly. To read a HyperCDP object, you
can create a duplicate for it and then map the duplicate to the host. The duplicate has the
same data as the source HyperCDP object and can be read and written by the host. In
addition, the duplicate can be rebuilt by a HyperCDP object at any time point to obtain
the data at that time.
There are some restrictions when HyperCDP is used with other features of OceanStor Dorado
V3, as listed in Table 5-1.

Table 5-1 Restrictions of HyperCDP used with other features

Feature Restriction
HyperSnap  Source LUNs of HyperSnap can be used as the source LUNs of
HyperCDP, but snapshot LUNs of HyperSnap cannot be used as
the source LUNs of HyperCDP.
 HyperCDP objects cannot be used as the source LUNs of
HyperSnap.
HyperMetro  Member LUNs of HyperMetro can be used as the source LUNs
of HyperCDP, but HyperCDP objects cannot be used as member
LUNs of HyperMetro.
 HyperCDP rollback cannot be performed during HyperMetro
synchronization.
HyperReplication  The primary and secondary LUNs of HyperReplication can be
used as the source LUNs of HyperCDP, but HyperCDP objects
cannot be used as the primary or secondary LUNs of
HyperReplication.
 HyperCDP rollback cannot be performed during
HyperReplication synchronization.
SmartMigration Source LUNs of HyperCDP and HyperCDP objects cannot be used
as the source or target LUNs of SmartMigration.
HyperClone Source LUNs of HyperClone can be used as the source LUNs of
HyperCDP. Before clone LUNs are split, they cannot be used as the
source LUNs of HyperCDP.
SmartVirtualization Heterogeneous LUNs cannot be used as the source LUNs of
HyperCDP.

5.3 HyperCopy (Copy)


OceanStor Dorado V300R002 supports HyperCopy, which allows the system to create a
complete physical copy of the source LUN's data on the target LUN. The source and target
LUNs that form a HyperCopy pair must have the same capacity. The target LUN can either be


empty or have existing data. If the target LUN has data, the data will be overwritten by the
source LUN during synchronization. After the HyperCopy pair is created, you can
synchronize data. During the synchronization, the target LUN can be read and written
immediately. HyperCopy supports consistency groups, incremental synchronization, and
incremental restoration, providing full backup for source LUNs. HyperCopy allows data copy
between controllers, but does not support copy between different arrays.
HyperCopy is typically applied to:
 Data backup and restoration
 Data analysis
 Data reproduction

Data Synchronization After HyperCopy Is Created


When data synchronization starts, the system generates an instant snapshot for the source
LUN, and then synchronizes the snapshot data to the target LUN. Any subsequent write
operations are recorded in a differential table. When synchronization is performed again, the
system compares the data of the source and target LUNs, and only synchronizes the
differential data to the target LUN. The data written to the target LUN between the two
synchronizations will be overwritten. To retain the existing data on the target LUN, you can
create a snapshot for it before synchronization.
The following figure illustrates the synchronization principle.

Figure 5-7 Data synchronization from the source LUN to the target LUN

Restoration
If the source LUN is damaged, data on the target LUN can be restored to the source LUN.
Restoration also supports full and incremental data synchronization. When restoration starts,
the system generates a snapshot for the target LUN and synchronizes the snapshot data to the
source LUN. For incremental restoration, the system compares the data of the source and
target LUNs, and only synchronizes the differential data.
The following figure illustrates the restoration principle.


Figure 5-8 Restoration from the target LUN to the source LUN

Immediate Read and Write


Read and write I/Os are processed in different ways when HyperCopy is or is not
synchronizing data.
 When HyperCopy is not synchronizing data:
The host reads and writes the source or target LUN directly.

Figure 5-9 Reads and writes when HyperCopy is not synchronizing data

 When HyperCopy is synchronizing data:


The host reads and writes the source LUN directly.


For read operations on the target LUN, if the requested data is hit on the target LUN (the
data has been synchronized), the host reads the data from the target LUN. If the
requested data is not hit on the target LUN (the data has not been synchronized), the host
reads the data from the snapshot of the source LUN.
For write operations on the target LUN, if a data block has been synchronized before the
new data is written, the system overwrites this block. If a data block has not been
synchronized, the system writes the new data to this block and stops synchronizing the
source LUN's data to it. This ensures that the target LUN can be read and written before
the synchronization is complete.

Figure 5-10 Reads and writes when HyperCopy is synchronizing data
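
The target-LUN read and write behavior during synchronization reduces to a per-block "synchronized" flag, roughly as in the sketch below (illustrative structures only, not the array's metadata layout).

```python
# Illustrative per-block state on the HyperCopy target LUN during synchronization.
synchronized = {0: True, 1: False, 2: False}   # block index -> already copied?
target_blocks = {0: "copy of src0"}            # data present on the target
source_snapshot = {0: "src0", 1: "src1", 2: "src2"}

def read_target(block):
    if synchronized[block]:                    # hit: data already on the target
        return target_blocks[block]
    return source_snapshot[block]              # miss: read from the source snapshot

def write_target(block, data):
    target_blocks[block] = data
    synchronized[block] = True                 # stop copying this block from the source

print(read_target(1))                          # served from the source snapshot
write_target(1, "new host data")
print(read_target(1))                          # now served from the target itself
```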


Consistency Group
You can add multiple HyperCopy pairs to a consistency group. When you synchronize or
restore a consistency group, data on all member LUNs is always at a consistent point in time,
ensuring data integrity and availability.

5.4 HyperClone (Clone)


5.4.1 HyperClone for Block
HyperClone generates a complete physical data copy of the source LUN or snapshot, which
can be used for development and testing without affecting the source LUN or snapshot.
After a clone LUN is created, it immediately shares the data with the source LUN and can be
mapped to hosts for data access. You can split the clone LUN to stop data sharing with the
source LUN and obtain a full physical copy of data. Hosts can read and write the clone LUN
non-disruptively during and after the splitting. You can also cancel the splitting before it is
complete to reclaim the storage space occupied by the physical copy and retain data sharing
between the source and clone LUNs.
HyperClone is implemented based on snapshots. When a clone LUN is created, the system
creates a readable and writable snapshot of the source LUN. The source and clone LUNs
share data. When an application server reads data from the clone LUN, it actually reads the
source LUN's data.

Figure 5-11 Clone LUN's data before data changes

When an application server writes new data to the source or clone LUN, the storage system
leverages ROW, which allocates a new storage space for the new data instead of overwriting
the data in the existing storage space. As shown in Figure 5-12, when the application server
attempts to modify data block A in the source LUN, the storage pool allocates a new block
(A1) to store the new data, and retains the original block A. Similarly, when the application
server attempts to modify block D in the clone LUN, the storage pool allocates a new block
(D1) to store the new data, and retains the original block D.


Figure 5-12 Clone LUN's data after data changes

When a clone LUN is split, the storage system copies the data that the clone LUN shares with
the source LUN to new data blocks, and retains the new data that has been written to the clone
LUN. After splitting, the association between the source and clone LUNs is canceled and the
clone LUN becomes an independent physical copy.

Figure 5-13 Clone LUN after splitting

OceanStor Dorado V3 supports consistent clones. For LUNs that are dependent on each other,
for example, LUNs that save the data files and logs of a database, you can create clones for
these LUNs' snapshots that were activated simultaneously to ensure data consistency between
the clones.
Both HyperClone and HyperCopy can create a complete copy of data. The following table
compares their similarities and differences.

Table 5-2 Comparison between HyperClone and HyperCopy

Item                    HyperClone                                 HyperCopy
Copy type               Clone LUN                                  Copy relationship between the
                                                                   source and target LUNs
Immediate availability  Yes                                        Yes
Synchronization mode    No synchronization                         Full and incremental
                                                                   synchronization and restoration
Consistency group       Not supported. To ensure consistency       Supported
                        of clones, you must create clones for
                        consistently activated snapshots of
                        the source LUNs.
Scope                   Clones cannot be created between           Data copy can be performed
                        different controller pairs or              between different controller
                        storage pools.                             pairs or storage pools.

5.4.2 HyperClone for File


HyperClone creates a clone file system, which is a copy, for a parent file system at a specified
point in time. Clone file systems can be shared to clients exclusively to meet the requirements
of rapid deployment, application tests, and DR drills.

Working Principle
A clone file system is a readable and writable copy taken from a point in time that is based on
redirect-on-write (ROW) and snapshot technologies.

Figure 5-14 Working principle of HyperClone for File


 As shown in Figure a, the storage system writes new or modified data onto the newly
allocated space of the ROW-based file system, instead of overwriting the original data.
The storage system records the point in time of each data write, indicating the write
sequence. The points in time are represented by serial numbers, in ascending order.
 As shown in Figure b, the storage system creates a clone file system as follows:
− Creates a read-only snapshot in the parent file system.
− Copies the root node of the snapshot to generate the root node of the clone file
system.
− Creates an initial snapshot in the clone file system.
This process is similar to the process of creating a read-only snapshot during which
no user data is copied. Snapshot creation can be completed in one or two seconds.
Before data is modified, the clone file system shares data with its parent file system.
 As shown in Figure c, modifying either the parent file system or the clone file system
does not affect the other system.
− When the application server modifies data block A of the parent file system, the
storage pool allocates new data block A1 to store new data. Data block A is not
released because it is protected by snapshots.
− When the application server modifies data block D of the clone file system, the
storage pool allocates new data block D1 to store new data. Data block D is not
released because its write time is earlier than the creation time of the clone file
system.
 Figure d shows the procedure for splitting a clone file system:
− Deletes all read-only snapshots from the clone file system.
− Traverses the data blocks of all objects in the clone file system, and allocates new
data blocks in the clone file system for the shared data by overwriting data. This
splits shared data.
− Deletes the associated snapshots from the parent file system.
After splitting is complete, the clone file system is independent of the parent file
system. The time required to split the clone file system depends on the size of the
shared data.

Technical Highlights
 Rapid deployment
In most scenarios, a clone file system can be created in seconds and can be accessed
immediately after being created.
 Saved storage space
A clone file system shares data with its parent file system and occupies extra storage
space only when it modifies shared data.
 Effective performance assurance
HyperClone has a negligible impact on system performance because a clone file system
is created based on the snapshot of the parent file system.
 Splitting a clone file system
After a clone file system and its parent file system are split, they become completely
independent of each other.

5.5 HyperReplication (Remote Replication)


5.5.1 HyperReplication/S for Block (Synchronous Remote Replication)
OceanStor Dorado V3 supports synchronous remote replication between storage systems.
HyperReplication/S writes each host's write I/O to both the primary and secondary LUNs
concurrently and returns a write success acknowledgement to the host after the data is
successfully written to both LUNs.
The general principles are as follows:
1. After a remote replication relationship is established between the primary and secondary
LUNs, an initial synchronization is implemented to replicate all data from the primary
LUN to the secondary LUN.
2. If the primary LUN receives a write request from the host during initial synchronization,
the new data is written to both the primary and secondary LUNs.
3. After initial synchronization, data on the primary LUN is the same as that on the
secondary LUN.
The following shows how I/Os are processed in synchronous remote replication.
1. The primary site receives a write request from the host. HyperReplication logs the
address information instead of the data content.
2. The data of the write request is written to both the primary and secondary LUNs. If the
LUNs use write-back, the data will be written to the cache.
3. HyperReplication waits for the write results of the primary and secondary LUNs. If the
data has been successfully written to the primary and secondary LUNs,
HyperReplication deletes the log. Otherwise, HyperReplication retains the log, and the
data block enters the interrupted state and will be replicated in the next synchronization.
4. HyperReplication returns the write result of the primary LUN to the host.

Figure 5-15 I/O processing in synchronous remote replication
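
The numbered steps above can be condensed into a short sketch. It is an illustrative outline under assumed object names (the LUN objects, their write_block() method, and the differential log are stand-ins, not the product's interfaces), showing the order of operations: log the address, write both copies in parallel, clear the log entry only if both writes succeed, and acknowledge the host based on the primary LUN's result.

# Simplified sketch of a synchronous-replication write path (illustrative only).
from concurrent.futures import ThreadPoolExecutor


class FakeLun:
    """Stand-in for a LUN that stores blocks in memory."""
    def __init__(self):
        self.blocks = {}

    def write_block(self, address, data):
        self.blocks[address] = data
        return True


def sync_replicated_write(primary_lun, secondary_lun, diff_log, address, data):
    # 1. Record the address of the write in the differential log (not the data itself).
    diff_log.add(address)

    # 2. Write the data to the primary and secondary LUNs concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary_ok, secondary_ok = list(
            pool.map(lambda lun: lun.write_block(address, data),
                     (primary_lun, secondary_lun)))

    # 3. If both writes succeeded, the copies are identical: drop the log entry.
    #    Otherwise keep it; the block is re-copied in the next synchronization.
    if primary_ok and secondary_ok:
        diff_log.discard(address)

    # 4. The host acknowledgement follows the primary LUN's write result.
    return primary_ok


primary, secondary, log = FakeLun(), FakeLun(), set()
assert sync_replicated_write(primary, secondary, log, 0x10, b"payload")
assert primary.blocks == secondary.blocks and not log

Because the host acknowledgement waits for both arrays, each write pays one inter-site round trip, which is the price of an RPO of zero.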

Technical Highlights
 Zero data loss
HyperReplication/S updates data in the primary and secondary LUNs simultaneously,
ensuring zero RPO.

 Split mode
HyperReplication/S supports split mode, where write requests of production hosts go
only to the primary LUN, and the difference between the primary and secondary LUNs is
recorded by the differential log. If you want to resume data consistency between the
primary and secondary LUNs, you can manually start synchronization, during which
data blocks marked as differential in the differential log are copied from the primary
LUN to the secondary LUN. The I/O processing is similar to that of the initial synchronization.
This mode meets user requirements such as temporary link maintenance, network
bandwidth expansion, and preserving the secondary LUN's data at a specific point in time.

 Quick response and recovery
HyperReplication/S immediately enters the Interrupted state in the case of a system fault,
such as a link failure or an I/O error caused by a fault on the primary or secondary LUN.
In the Interrupted state, I/Os are processed in a similar way to split mode. That is, data is
written only to the primary LUN and the data difference is recorded. If the primary LUN
fails, it cannot receive I/O requests from the production host. After the fault is rectified,
the HyperReplication/S pair is recovered based on the specified recovery policy. If the
policy is automatic recovery, the pair automatically enters the Synchronizing state and
incremental data is copied to the secondary LUN. If the policy is manual recovery, the
pair enters the To Be Recovered state and must be manually synchronized. Incremental
synchronization greatly reduces the recovery time of HyperReplication/S.
 Writable secondary LUN
When the secondary LUN is split or disconnected, you can cancel the write protection
for the secondary LUN to receive data from the host.
The write protection for the secondary LUN can be canceled only when the following
two conditions are met:
− The remote replication pair is in the split or interrupted state.
− Data on the secondary LUN is consistent with that on the primary LUN (when data
on the secondary LUN is inconsistent, the data is unavailable, and the secondary
LUN cannot be set to writable).
This function is used in the following scenarios:
− You want to use the data on the secondary LUN for analysis and mining without
affecting services on the primary LUN.
− The production storage system at the primary site is faulty but the secondary site
fails to take over services due to a primary/secondary switchover failure or
communication failure.
OceanStor Dorado V3 can record the difference between the primary and secondary
LUNs after host data is written to the secondary LUN. After the production storage
system at the primary site recovers, you can perform incremental synchronization to
quickly switch services back.

 Primary/secondary switchover
A primary/secondary switchover is the process where the primary and secondary LUNs
in a remote replication pair exchange roles.

A primary/secondary switchover depends on the secondary LUN's data status, which can be:
− Consistent: Data on the secondary LUN is a duplicate of the primary LUN's data at
the time when the last synchronization was performed. In this state, the secondary
LUN's data is available but not necessarily the same as the current data on the
primary LUN.
− Inconsistent: Data on the secondary LUN is not a duplicate of the primary LUN's
data at the time when the last synchronization was performed and, therefore, is
unavailable.
After a switchover, the primary LUN at the primary site becomes the new secondary
LUN, and the secondary LUN at the secondary site becomes the new
primary LUN. After the new primary LUN is mapped to the standby hosts at the
secondary site (this can be performed in advance), the standby hosts can take over
services and issue new I/O requests to the new primary LUN. A primary/secondary
switchover can be performed only when data on the secondary LUN is consistent with
that on the primary LUN. Incremental synchronization is performed after a
primary/secondary switchover.
Note the following:
− When the pair is in the normal state, a primary/secondary switchover can be
performed.
− In the split state, a primary/secondary switchover can be performed only when the
secondary LUN is set to writable.
 Consistency group
Medium- and large-size databases' data, logs, and modification information are stored on
different LUNs. If data on one of these LUNs is unavailable, data on the other LUNs is
also invalid. Consistency between multiple remote replication pairs must be considered
when remote disaster recovery solutions are implemented on these LUNs.
HyperReplication/S uses consistency groups to maintain the same synchronization pace
between multiple remote replication pairs.
A consistency group is a collection of multiple remote replication pairs that ensures data
consistency when a host writes data to multiple LUNs on a single storage system. After
data is written to a consistency group at the primary site, all data in the consistency
group is simultaneously copied to the secondary LUNs to ensure data integrity and
availability at the secondary site.
HyperReplication/S allows you to add multiple remote replication pairs to a consistency
group. When you set writable secondary LUNs for a consistency group or perform
splitting, synchronization, or primary/secondary switchover, the operation applies to all
members in the consistency group. If a link fault occurs, all member pairs are interrupted
simultaneously. After the fault is rectified, data synchronization is performed again to
ensure availability of the data on the secondary storage system.

5.5.2 HyperReplication/A for Block (Asynchronous Remote Replication)
OceanStor Dorado V3 supports asynchronous remote replication. After an asynchronous
remote replication pair is established between a primary LUN at the primary site and a
secondary LUN at the secondary site, initial synchronization is implemented. After the initial
synchronization, the data status of the secondary LUN becomes Synchronized or Consistent.
Then, I/Os are processed as follows:
1. The primary LUN receives a write request from a production host.
2. After data is written to the primary LUN, a write completion response is immediately
returned to the host.
3. Incremental data is automatically synchronized from the primary LUN to the secondary
LUN at the user-defined interval, which ranges from 3 seconds to 1,440 minutes. (If the
synchronization type is Manual, you must trigger synchronization manually.) Before
synchronization starts, a snapshot is generated for the primary and secondary LUNs
separately. The snapshot of the primary LUN ensures that the data read from the primary
LUN during the synchronization remains unchanged. The snapshot of the secondary
LUN backs up the secondary LUN's data in case an exception during synchronization
causes the data to become unavailable.
4. During the synchronization, data is read from the snapshot of the primary LUN and
copied to the secondary LUN. After the synchronization is complete, the snapshots of the
primary and secondary LUNs are deleted, and the next synchronization period starts.

Figure 5-16 Working principle of asynchronous remote replication
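
One synchronization period of the mechanism above can be outlined as follows. This is a hypothetical sketch, not product code: the pair object, snapshot helpers, and block-copy calls are assumed names, but the sequence matches the steps described (snapshot both ends, copy only the changed blocks from the primary snapshot, roll the secondary back to its snapshot if the copy is interrupted, then release both snapshots).

# Illustrative sketch of one asynchronous replication cycle (assumed object names).
import time


def async_replication_cycle(pair, interval_seconds=30):
    """Run one synchronization period for an asynchronous remote replication pair."""
    # Snapshot the primary LUN so the data read during the copy stays frozen,
    # and snapshot the secondary LUN as a fallback if the copy is interrupted.
    primary_snap = pair.primary.create_snapshot()
    secondary_snap = pair.secondary.create_snapshot()

    try:
        # Copy only the blocks changed since the previous cycle.
        for address in pair.changed_blocks_since_last_cycle():
            pair.secondary.write_block(address, primary_snap.read_block(address))
        pair.mark_cycle_complete()
    except Exception:
        # On failure, restore the secondary from its snapshot so that it always
        # holds a consistent (if slightly older) point in time.
        pair.secondary.rollback(secondary_snap)
        raise
    finally:
        primary_snap.delete()
        secondary_snap.delete()

    # Wait until the next user-defined period (3 seconds to 1,440 minutes) begins.
    time.sleep(interval_seconds)

Because the host is acknowledged before any inter-site copy takes place, the achievable RPO equals the configured interval rather than zero.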

Technical Highlights
 Data compression
Both Fibre Channel and IP links support data compression using the LZ4 algorithm,
which can be enabled or disabled as required. Data compression reduces the bandwidth
required by asynchronous remote replication. In a test of an Oracle OLTP application
over a 100 Mbit/s link, data compression reduced the required bandwidth by about half
(a small compression sketch is provided at the end of this section).
 Quick response to host requests
After a host writes data to the primary LUN at the primary site, the primary site
immediately returns a write success to the host before the data is written to the secondary
LUN. In addition, data is synchronized in the background, which does not affect access
to the primary LUN. HyperReplication/A does not synchronize incremental data from the
primary LUN to the secondary LUN in real time. Therefore, the amount of data loss
depends on the synchronization interval (ranging from 3 seconds to 1440 minutes; 30
seconds by default), which can be specified based on site requirements.
 Splitting, switchover of primary and secondary LUNs, and rapid fault recovery
HyperReplication/A supports splitting, synchronization, primary/secondary switchover,
and recovery after disconnection.
 Consistency group
Consistency groups apply to databases. Multiple LUNs, such as log LUNs and data
LUNs, can be added to a consistency group so that data on these LUNs is from a
consistent time in the case of periodic synchronization or fault. This facilitates data
recovery at the application layer.
 Interoperability with Huawei OceanStor converged storage systems
Developed on the OceanStor OS unified storage software platform, OceanStor Dorado
V3 is compatible with the replication protocols of all Huawei OceanStor converged
storage products. Remote replication can be created among different types of products to
construct a highly flexible disaster recovery solution.
 Support for fan-in
HyperReplication of OceanStor Dorado V3 supports data replication from 64 storage
devices to one storage device for central backup (64:1 replication ratio, which is four to
eight times that provided by other vendors). This implements disaster recovery resource
sharing and greatly reduces the disaster recovery cost.
 Support for cloud replication
OceanStor Dorado V3 supports CloudReplication, which works with Dedicated
Enterprise Storage Service (DESS) on HUAWEI CLOUD to constitute cloud DR
solutions. You can purchase HUAWEI CLOUD resources on demand to build your DR
centers without the need for on-premises equipment rooms or O&M teams, reducing
costs and improving efficiency.
For more information, see the OceanStor Dorado V3 Series V300R002 HyperReplication
Feature Guide.
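
As a rough illustration of the data-compression highlight above, the snippet below uses the open-source LZ4 frame implementation from the Python lz4 package to compress a sample buffer and report the ratio. The sample data and the resulting ratio are purely illustrative; the actual bandwidth saving depends on how compressible the replicated data is.

# Rough illustration of LZ4 frame compression (requires the "lz4" package: pip install lz4).
import lz4.frame

# A sample payload standing in for a batch of replication data.
# Real savings depend entirely on the data; database pages and logs are
# usually far more compressible than already-compressed content.
payload = b"ORDERS|2019-05-31|OK|" * 4096

compressed = lz4.frame.compress(payload)
ratio = len(payload) / len(compressed)

print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.1f}:1, so the link needs roughly "
      f"{100 / ratio:.0f}% of the uncompressed bandwidth")

assert lz4.frame.decompress(compressed) == payload   # compression is lossless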

5.5.3 HyperReplication/A for File (Asynchronous Remote Replication)
HyperReplication/A supports the long-distance data disaster recovery of file systems. It copies
all content of a primary file system to the secondary file system. This implements remote
disaster recovery across data centers and minimizes the performance deterioration caused by
remote data transmission. HyperReplication/A also applies to file systems within a storage
system for local data disaster recovery, data backup, and data migration.
HyperReplication/A implements data replication based on the file system object layer, and
periodically synchronizes data between primary and secondary file systems. All data changes
made to the primary file system since the last synchronization will be synchronized to the
secondary file system.

Working Principle
 Object layer-based replication
HyperReplication/A implements data replication based on the object layer. The files,
directories, and file properties of file systems consist of objects. Object layer-based
replication copies objects from the primary file system to the secondary file system
without considering complex file-level information, such as dependency between files
and directories, and file operations, simplifying the replication process.
 Periodical replication based on ROW
HyperReplication/A implements data replication based on ROW snapshots.
− Periodic replication improves replication efficiency and bandwidth utilization.
During a replication period, the data that was written most recently is always copied.
For example, if data in the same file location is modified multiple times, the data
written last is copied.
− File systems and their snapshots employ ROW to process data writes. Regardless of
whether a file system has a snapshot, data is always written to the new address
space, and service performance will not decrease even if snapshots are created.
Therefore, HyperReplication/A has only a slight impact on production service
performance.
Written data is periodically replicated to the secondary file system in the background.
Replication periods are defined by users. The addresses, rather than the content of incremental
data blocks in each period, are recorded. During each replication period, the secondary file
system is incomplete before all incremental data is completely transferred to the secondary
file system.
After the replication period ends and the secondary file system becomes a point of data
consistency, a snapshot is created for the secondary file system. If the next replication period
is interrupted because the production center malfunctions or the link goes down,
HyperReplication/A can restore the secondary file system data to the last snapshot point,
ensuring consistent data.

Figure 5-17 Working principle of HyperReplication/A for File

1. The production storage system receives a write request from a production host.
2. The production storage system writes the new data to the primary file system and
immediately sends a write acknowledgement to the host.
3. When a replication period starts, HyperReplication/A creates a snapshot for the primary
file system.
4. The production storage system reads and replicates snapshot data to the secondary file
system based on the incremental information received since the last synchronization.
5. After incremental replication is complete, the content of the secondary file system is the
same as the snapshot of the primary file system. The secondary file system becomes the
point of data consistency.

Technical Highlights
 Splitting and incremental resynchronization
If you want to suspend data replication from the primary file system to the secondary file
system, you can split the remote replication pair. For HyperReplication/A, splitting will
stop the ongoing replication process and later periodic replication.
After splitting, if the host writes new data, the incremental information will be recorded.
You can start a synchronization session after splitting. During resynchronization, only
incremental data is replicated.
Splitting applies to device maintenance scenarios, such as storage array upgrades and
replication link changes. In such scenarios, splitting can reduce the number of concurrent
tasks so that the system becomes more reliable. The replication tasks will be resumed or
restarted after maintenance.
 Automatic recovery
If data replication from the primary file system to the secondary file system is interrupted
due to a fault, remote replication enters the interrupted state. If the host writes new data
when remote replication is in this state, the incremental information will be recorded.

After the fault is rectified, remote replication is automatically recovered, and incremental
resynchronization is automatically implemented.
 Readable and writable secondary file system and incremental failback
Normally, a secondary file system is readable but not writable. When accessing a
secondary file system, the host reads the data on snapshots generated during the last
backup. After the next backup is completed, the host reads the data on the new snapshots.
A readable and writable secondary file system applies to scenarios in which backup data
must be accessed during replication.
You can set a secondary file system to readable and writable if the following conditions
are met:
− Initial synchronization has been implemented. For HyperReplication/A, data on the
secondary file system is in the complete state after initial synchronization.
− The remote replication pair is in the split or interrupted state.
If data is being replicated from the primary file system to the secondary file system
(the data is inconsistent on the primary and secondary file systems) and you set the
secondary file system to readable and writable, HyperReplication/A restores the
data in the secondary file system to the point in time at which the last snapshot was
taken.
After the secondary file system is set to readable and writable, HyperReplication/A
records the incremental information about data that the host writes to the secondary
file system for subsequent incremental resynchronization. After replication recovery,
you can replicate incremental data from the primary file system to the secondary
file system or from the secondary file system to the primary file system (a
primary/secondary switchover is required before synchronization). Before a
replication session starts, HyperReplication/A restores target end data to a point in
time at which a snapshot was taken and the data was consistent with source end data.
Then, HyperReplication/A performs incremental resynchronization from the source
end to the target end.
Readable and writable secondary file systems are commonly used in disaster
recovery scenarios.
 Primary/Secondary switchover
Primary/secondary switchover exchanges the roles of the primary and secondary file
systems. These roles determine the direction in which the data is copied. Data is always
copied from the primary file system to the secondary file system.
Primary/secondary switchover is commonly used for failback during disaster recovery.
 Quick response to host I/Os
All I/Os generated during file system asynchronous remote replication are processed in
the background. A write success acknowledgement is returned immediately after host
data is written to the cache. Incremental information is recorded and snapshots are
created only when data is flushed from cache to disks. Therefore, host I/Os can be
responded to quickly.

5.6 HyperMetro (Active-Active Layout)


5.6.1 HyperMetro for Block
HyperMetro, an array-level active-active technology provided by OceanStor Dorado V3,
enables two storage systems to work in active-active mode in two locations within 100 km
from each other, such as in the same equipment room or in the same city. HyperMetro
supports both Fibre Channel and IP networking (10GE). It allows two LUNs from separate
storage arrays to maintain real-time data consistency and to be accessible to hosts. If one
storage array fails, hosts automatically choose the path to the other storage array for service
access. If the links between storage arrays fail and only one storage array can be accessed by
hosts, the arbitration mechanism uses a quorum server deployed at a third location to
determine which storage array continues providing services.

Figure 5-18 Active-active arrays

Technical Features of HyperMetro


 Gateway-free active-active solution
Simple networking makes deployment easy. The gateway-free design improves
reliability and performance because there is one less possible failure point and the 0.5 ms
latency caused by a gateway is avoided.
 Active-active mode
Hosts in different data centers can read or write data in the same LUN simultaneously,
implementing load balancing across data centers.
 Site access optimization
UltraPath is optimized specifically for active-active scenarios. It can identify region
information to reduce cross-site access, reducing latency. UltraPath can read data from
the local or remote storage array. However, when the local storage array is working
properly, UltraPath preferentially reads data from and writes data to the local storage
array, preventing data read and write across data centers.
 FastWrite
In a common SCSI write process, a write request goes back and forth between two data
centers twice to complete two interactions, namely Write Alloc and Write Data.
FastWrite optimizes the storage transmission protocol and reserves cache space on the
destination array for receiving write requests. Write Alloc is omitted and only one
interaction is required. FastWrite halves the time required for data synchronization
between two arrays, improving the overall performance of the HyperMetro solution
(see the latency sketch at the end of this list).
 Service granularity-based arbitration
If links between two sites fail, HyperMetro can enable some services to run
preferentially in data center A and others in data center B based on service configurations.
Compared with traditional arbitration where only one data center provides services,
HyperMetro improves resource usage of hosts and storage systems and balances service
loads. Service granularity-based arbitration is implemented based on LUNs or
consistency groups. Generally, a service belongs to only one LUN or consistency group.
 Automatic link quality adaptation
If multiple links exist between two data centers, HyperMetro automatically balances
loads among links based on the quality of each link. The system dynamically monitors
link quality and adjusts the load ratio of the links to reduce the retransmission ratio and
improve network performance.
 Compatibility with other features
HyperMetro can work with existing features such as HyperSnap, SmartThin,
SmartDedupe, and SmartCompression.
 Active and standby quorum servers
The quorum servers can be either physical or virtual machines. HyperMetro can have
two quorum servers working in active/standby mode to eliminate single point of failure
and guarantee service continuity.
 Expansion to 3DC
HyperMetro can work with HyperReplication/A to form a geo-redundant architecture.
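
The effect of FastWrite can be shown with simple arithmetic. The figures below are assumptions chosen for the example (they are not measured product numbers); the point is that removing the Write Alloc interaction removes one of the two inter-site round trips from every mirrored write.

# Back-of-the-envelope latency model for a cross-site mirrored write (illustrative).
def mirrored_write_latency_ms(rtt_ms, local_write_ms, round_trips):
    """Host-visible latency: local write time plus the inter-site round trips."""
    return local_write_ms + round_trips * rtt_ms


RTT_MS = 1.0          # assumed round-trip time over ~100 km of fibre plus switching
LOCAL_WRITE_MS = 0.2  # assumed local cache-write time

standard_scsi = mirrored_write_latency_ms(RTT_MS, LOCAL_WRITE_MS, round_trips=2)  # Write Alloc + Write Data
fastwrite = mirrored_write_latency_ms(RTT_MS, LOCAL_WRITE_MS, round_trips=1)      # Write Data only

print(f"standard SCSI: {standard_scsi:.1f} ms, FastWrite: {fastwrite:.1f} ms")
# With these assumptions the inter-site portion of the write is halved, which is
# where the "halves the synchronization time" claim comes from.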

5.6.2 HyperMetro for File


HyperMetro presents the file systems of two storage systems to hosts as a single file
system on one storage system and keeps the data in both file systems consistent. Data is
read from and written to the primary storage system and synchronized to the secondary
storage system in real time. If the primary storage system fails, HyperMetro switches
services to the secondary storage system at the vStore level, without losing data or
interrupting applications.
HyperMetro provides the following benefits:
 High availability with geographic protection
 Easy management
 Minimal risk of data loss, reduced system downtime, and quick disaster recovery
 Negligible disruption to users and client applications
HyperMetro supports both Fibre Channel and IP networking.

Figure 5-19 Architecture of HyperMetro for File

Technical Highlights
 Gateway-free solution
With the gateway-free design, host I/O requests do not need to be forwarded by storage
gateway, avoiding corresponding I/O forwarding latency and gateway failures and
improving reliability. In addition, the design simplifies the cross-site high availability
(HA) network, making maintenance easier.
 Simple networking
The data replication, configuration synchronization, and heartbeat detection links share
the same network, simplifying the networking. Either IP or Fibre Channel links can be
used between storage systems, making it possible for HyperMetro to work on all-IP
networks, improving cost-effectiveness.
 vStore-based HyperMetro
Traditional cross-site HA solutions typically deploy cluster nodes at two sites to
implement cross-site HA. These solutions, however, have limited flexibility in resource
configuration and distribution. HyperMetro can establish pair relationships between two
vStores at different sites, implementing real-time mirroring of data and configurations.
Each vStore pair has an independent arbitration result, providing true cross-site HA
capabilities at the vStore level. HyperMetro also enables applications to run more
efficiently at two sites, ensuring better load balancing. A vStore pair includes a primary
vStore and a secondary vStore. If either of the storage systems in the HyperMetro
solution fail or if the links connecting them go down, HyperMetro implements
arbitration on a per vStore pair basis. Paired vStores are mutually redundant, maintaining
service continuity in the event of a storage system failure.

Figure 5-20 vStore-based HyperMetro architecture

 Automatic recovery
If site A breaks down, site B becomes the primary site. Once site A recovers, HyperMetro
automatically initiates resynchronization. When resynchronization is complete, the
HyperMetro pair returns to its normal state. If site B then breaks down, site A becomes
the primary site again to maintain host services.
 Easy upgrade
To use the HyperMetro feature, upgrade your storage system software to the latest
version and purchase the required feature license. You can establish a HyperMetro
solution between the upgraded storage system and another storage system, without the
need for extra data migration. Users are free to include HyperMetro in initial
configurations or add it later as required.
 FastWrite
In a common SCSI write process, a write request goes back and forth twice between two
data centers to complete two interactions, Write Alloc and Write Data. FastWrite
optimizes the storage transmission protocol and reserves cache space on the destination
array for receiving write requests, while Write Alloc is omitted and only one interaction
is required. FastWrite halves the time required for data synchronization between two
arrays, improving the overall performance of the HyperMetro solution.
 Self-adaptation to link quality
If there are multiple links between two data centers, HyperMetro automatically
implements load balancing among these links based on quality. The system dynamically
monitors link quality and adjusts the load ratio between links to minimize the
retransmission rate and improve network performance.
 Compatibility with other features
HyperMetro can be used with SmartThin, SmartQoS, and SmartCache. HyperMetro can
also work with HyperVault, HyperSnap, and HyperReplication to form a more complex
and advanced data protection solution, such as the Disaster Recovery Data Center
Solution (Geo-Redundant Mode), which uses HyperMetro and HyperReplication.
 Dual quorum servers
HyperMetro supports dual quorum servers. If one quorum server fails, its services are
seamlessly switched to the other, preventing a single point of failure (SPOF) and
improving the reliability of the HyperMetro solution.

5.7 3DC for Block (Geo-Redundancy)


3DC supports flexible networking using HyperMetro, synchronous remote replication, and
asynchronous remote replication, including:
 Cascading network in synchronous + asynchronous mode
 Parallel network in synchronous + asynchronous mode
 Cascading network in asynchronous + asynchronous mode
 Parallel network in asynchronous + asynchronous mode
 Star network in synchronous + asynchronous mode
 Star network in HyperMetro + asynchronous mode

Figure 5-21 3DC networking

Technical Highlights:
 Two HyperMetro or synchronous remote replication sites can be flexibly expanded to
3DC without requiring external gateways.
 In the star topology, only incremental synchronization is required in the event of any site
failure.
 The star topology supports centralized configuration and management at a single site.

5.8 HyperVault for File (All-in-One Backup)


OceanStor Dorado NAS provides an all-in-one backup feature called HyperVault to
implement file system data backup and recovery within or between storage systems.
HyperVault can work in either of the following modes:
 Local backup
Data backup within a storage system. HyperVault works with HyperSnap to periodically
back up a file system, generate backup copies, and retain these copies based on
user-configured policies. By default, five backup copies are retained for a file system.
 Remote backup
Data backup between storage systems. HyperVault works with HyperReplication to
periodically back up a file system. The process is as follows:
a. A backup snapshot is created for the primary storage system.
b. The incremental data between the backup snapshot and its previous snapshot is
synchronized to the secondary storage system.
c. After data is synchronized, a snapshot is created on the secondary storage system.
By default, 35 snapshots can be retained on the backup storage system.
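
The retention behaviour described above (five local copies and 35 remote snapshots by default) amounts to keeping the newest N backup copies and expiring the rest. A minimal sketch, using hypothetical snapshot objects:

# Illustrative retention-policy pruning for periodic backup copies.
from collections import namedtuple

Snapshot = namedtuple("Snapshot", ["name", "created_at"])


def prune_backup_copies(snapshots, copies_to_retain):
    """Return (kept, expired): keep the newest N copies, expire the rest."""
    ordered = sorted(snapshots, key=lambda s: s.created_at, reverse=True)
    return ordered[:copies_to_retain], ordered[copies_to_retain:]


# Local backup keeps 5 copies by default; remote backup keeps 35.
local_copies = [Snapshot(f"fs01_backup_{i:02d}", created_at=i) for i in range(7)]
kept, expired = prune_backup_copies(local_copies, copies_to_retain=5)

assert len(kept) == 5
assert all(k.created_at > e.created_at for e in expired for k in kept)
print("expire:", [s.name for s in expired])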

Technical Highlights
 High cost efficiency
HyperVault can be seamlessly integrated into the primary storage system and provide
data backup without additional backup software. Huawei-developed storage management
software, OceanStor DeviceManager, allows you to configure flexible backup policies
and efficiently perform data backup.
 Fast data backup
HyperVault works with HyperSnap to achieve second-level local data backup. For
remote backup, the system performs a full backup the first time and then backs up only
incremental data blocks. This allows HyperVault to provide faster data backup than
software that backs up all data every time.
 Fast data recovery
HyperVault uses snapshot rollback technology to implement local data recovery, without
requiring additional data resolution. This allows it to achieve second-level data recovery.
Remote recovery, which is incremental data recovery, can be used when local recovery
cannot meet requirements. Each copy of backup data is a logically full backup of service
data. The backup data is saved in its original format and can be accessed immediately.
 Simple management
Only one primary storage system, one backup storage system, and native management
software, OceanStor DeviceManager, are required. This mode is simpler and easier to
manage than old network designs, which contain primary storage, backup software, and
backup media.

5.9 HyperLock for File (WORM)


With the explosive growth of information, increasing importance is being placed on secure
data access and use. To comply with laws and regulations, important data such as court case
documents, medical records, and financial documents must remain readable but not writable
within a specific period. Therefore, measures must be taken to prevent such data from being
tampered with. In the storage industry, Write Once Read Many (WORM) is the most common
method used to archive and back up data, ensure secure data access, and prevent data
tampering.
Huawei's WORM feature is called HyperLock. A file protected by WORM can enter the
read-only state immediately after data is written to it. In the read-only state, the file can be
read, but cannot be deleted, modified, or renamed. WORM can prevent data from being
tampered with, meeting the data security requirements of enterprises and organizations.
File systems that WORM has been configured for are called WORM file systems and can
only be configured by administrators. There are two WORM modes:

 Regulatory Compliance WORM (WORM-C for short): applies to archive scenarios
where data protection mechanisms are implemented to comply with laws and
regulations.
 Enterprise WORM (WORM-E): mainly used by enterprises for internal control.

Working Principle
With WORM, data can be written to a file only once, after which the file cannot be rewritten,
modified, deleted, or renamed. Once a common file system is protected by WORM, the files
in the WORM file system are read-only within the protection period. After WORM file
systems are created, they must be mapped to application servers using the NFS or CIFS protocol.
WORM enables files in a WORM file system to be shifted between initial state, locked state,
appending state, and expired state, preventing important data from being tampered with
within a specified period. Figure 5-22 shows how a file shifts from one state to another.

Figure 5-22 File state shifting

1. Initial to locked: A file can be shifted from the initial state to the locked state using the
following methods:
− If the automatic lock mode is enabled, the file automatically enters the locked state
after a change is made and a specific period of time expires.
− You can manually set the file to the locked state. Before locking the file, you can
specify a protection period for the file or use the default protection period.
2. Locked to locked: In the locked state, you can manually extend the protection periods of
files. Protection periods cannot be shortened.
3. Locked to expired: After the WORM file system compliance clock reaches the file
overdue time, the file shifts from the locked state to the expired state.
4. Expired to locked: You can extend the protection period of a file to shift it from the
expired state to the locked state.
5. Locked to appending: You can delete the read-only permission of a file to shift it from
the locked state to the appending state.
6. Appending to locked: You can manually set a file in the appending state to the locked
state to ensure that it cannot be modified.
7. Expired to appending: You can manually set a file in the expired state to the appending
state.
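
The state shifts listed above form a small state machine. The sketch below is an illustrative model only; the state names follow the description, but the class and its methods are assumptions rather than the product's interface.

# Minimal model of WORM file state transitions (illustrative; names are assumptions).
ALLOWED_TRANSITIONS = {
    ("initial",   "locked"),     # locked manually or automatically after the lock delay
    ("locked",    "expired"),    # compliance clock passes the protection end time
    ("expired",   "locked"),     # protection period extended again
    ("locked",    "appending"),  # read-only permission removed; data may only be appended
    ("appending", "locked"),     # re-locked manually
    ("expired",   "appending"),  # set to appending after expiry
}


class WormFile:
    def __init__(self, name):
        self.name = name
        self.state = "initial"
        self.protection_end = None

    def transition(self, new_state, compliance_clock=None):
        if (self.state, new_state) not in ALLOWED_TRANSITIONS:
            raise ValueError(f"{self.state} -> {new_state} is not permitted")
        if (new_state == "expired" and compliance_clock is not None
                and self.protection_end is not None
                and compliance_clock < self.protection_end):
            raise ValueError("protection period has not ended yet")
        self.state = new_state

    def can_modify(self):
        # Locked files are read-only; appending files accept appends only;
        # expired files may be read or deleted but their content stays immutable.
        return self.state in ("initial", "appending")


doc = WormFile("case_record.pdf")
doc.protection_end = 2029
doc.transition("locked")
assert not doc.can_modify()
doc.transition("expired", compliance_clock=2030)
doc.transition("locked")          # extending the protection period re-locks the file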
You can save files to WORM file systems and set the WORM properties of the files to the
locked state based on service requirements. Figure 5-23 shows the reads and writes of files in
all states in a WORM file system.

Figure 5-23 Read and write of files in a WORM file system


6 Cloud Series Features

6.1 CloudReplication (Cloud Replication)


OceanStor Dorado V3 supports CloudReplication, which works with Dedicated Enterprise
Storage Service (DESS) on HUAWEI CLOUD to constitute cloud DR solutions. You can
purchase HUAWEI CLOUD resources on demand to build your DR centers without the need
for on-premises equipment rooms or O&M teams, reducing costs and improving efficiency.
Dorado5000 V3 can serve as the DESS array on the cloud. The CloudReplication license and
DESS authentication license must be installed.
When used as on-premises arrays, all the Dorado models support interconnection with the
cloud. The CloudReplication license must be installed. In addition, CloudReplication also
supports the OceanStor V5 series converged storage systems.

Figure 6-1 CloudReplication architecture

Technical Highlights:
 Data is replicated to the cloud in asynchronous mode. CloudReplication inherits all
functions of HyperReplication/A.
 DESS supports interconnection with OceanStor converged storage systems.
 No on-premises DR center or O&M team is required. Cloud DR resources can be
purchased or expanded on demand.
Application Scenarios:
 If you only have a production center, you can set up a remote DR center on HUAWEI
CLOUD at a low cost, implementing remote protection for production data.
 If you have a production center and a DR center, you can upgrade the protection level to
3DC with a remote DR center on HUAWEI CLOUD.

6.2 CloudBackup (Cloud Backup)


CloudBackup of OceanStor Dorado V3 allows the system to back up LUNs or LUN
consistency groups to the public cloud or local NAS or object storage. Based on the Cloud
Server Backup Service (CSBS) of HUAWEI CLOUD, quick recovery from the cloud is
supported, with no need for backup servers on the cloud.
Remote data backup and recovery on the cloud or local data center can be implemented
without external backup servers, simplifying backup solutions and reducing the purchase and
maintenance costs.
The local NAS devices supported by CloudBackup include Huawei OceanStor 9000,
OceanStor V3/V5 series, FusionStorage, and OceanStor 9000 Object. The public cloud
storage supported by CloudBackup includes HUAWEI CLOUD Object Storage Service (OBS)
and AWS S3.

Figure 6-2 Typical CloudBackup networking

CloudBackup supports:
 LUN backup
 Consistency group backup
 Data restoration to the source LUN or other existing LUNs
 Data restoration to the source LUN consistency group or other existing LUN consistency
groups
 Backup data compression, which reduces the required backup bandwidth and backup
storage space
 Resumable data transfer. If a network fault occurs during backup to the cloud, data
transfer can be resumed once the network recovers.
 Offline backup based on the Data Express Service (DES) of HUAWEI CLOUD. Data is
first backed up to the Teleport device of DES. Then the Teleport device is transported to
the nearest data center of HUAWEI CLOUD, where the data is imported to the specified
OBS S3 buckets. This improves data transfer efficiency for the initial backup, and only
incremental backups are required in subsequent operations.
Backup data flow and principles:

1. The system creates a read-only snapshot for the LUN or consistency group to be backed
up.
2. CloudBackup reads data from the read-only snapshot and transfers it to the specified
local NAS share or object storage on the public cloud. If the backup source is a
consistency group, CloudBackup reads data from each read-only snapshot in the
consistency group.
When CloudBackup is reading data, it compares the data with that of the read-only
snapshot created in the last backup, and only transfers the differential data to the backup
storage.
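
The incremental behaviour described above can be sketched as a block-level diff between the current backup snapshot and the previous one. In the sketch, snapshot contents are modelled as simple address-to-data mappings, which is an assumption for illustration; a real system tracks changed blocks in snapshot metadata rather than comparing block contents, but the outcome is the same: only changed or new blocks are transferred.

# Illustrative incremental-backup diff: transfer only blocks that changed
# since the previous backup snapshot.
def differential_blocks(current_snapshot, previous_snapshot):
    """Yield (address, data) for blocks that are new or changed since the last backup."""
    for address, data in current_snapshot.items():
        if previous_snapshot.get(address) != data:
            yield address, data


previous = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
current = {0: b"AAAA", 1: b"BBBC", 2: b"CCCC", 3: b"DDDD"}   # one changed, one new block

to_transfer = dict(differential_blocks(current, previous))
assert to_transfer == {1: b"BBBC", 3: b"DDDD"}
# Only these two blocks are sent to the NAS share or object bucket, which is
# why every backup after the first full copy is incremental.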
Restoration data flow and principles:

1. Select the desired backup image from the local NAS share or public cloud. (The data set
generated when a LUN or consistency group is backed up is a backup image. A LUN or
consistency group has multiple backup images at different time points.)
2. Select the LUN or consistency group to be restored.

3. Restore the data. During restoration, CloudBackup reads data from the specified backup
image on the local NAS share or public cloud and writes the data to the LUN or
consistency group.
Technical Highlights:
 Data can be backed up without purchasing external backup servers.
 Backup to the cloud is achieved. With BCManager and CSBS, data can be quickly
recovered, and customers can perform tests and analysis on source LUNs' data on the
cloud.
 Data can be backed up to local NAS and object storage.


7 System Security and Data Encryption

7.1 Data Encryption


OceanStor Dorado V3 can work with self-encrypting drives (SEDs) and Internal Key
Manager to implement encryption of data at rest and ensure data security.

Internal Key Manager


Internal Key Manager is OceanStor Dorado V3's built-in key management system. It
generates, updates, backs up, restores, and destroys keys, and provides hierarchical key
protection. Internal Key Manager is easy to deploy, configure, and manage. It is
recommended if certification is not required and the key management system is only used by
the storage systems in a data center.

SED
SEDs provide two-layer security protection by using an authentication key (AK) and a data
encryption key (DEK).
 An AK authenticates the identity during disk initialization.
 A DEK encrypts and decrypts data when data is written to or read from SEDs.
AK mechanism: After data encryption has been enabled, the storage system activates the
AutoLock function of SEDs and uses AKs assigned by a key manager. SED access is
protected by AutoLock and only the storage system itself can access its SEDs. When the
storage system accesses an SED, it acquires an AK from the key manager. If the AK is
consistent with the SED's, the SED decrypts the DEK for data encryption/decryption. If the
AKs do not match, all read and write operations will fail.
DEK mechanism: After the AutoLock authentication is successful, the SED uses its hardware
circuits and internal DEK to encrypt or decrypt the data that is written or read. When you
write data, the data is encrypted by the DEK of the AES encryption engine into ciphertext,
and then written to the system. When you read data, the system decrypts the requested data
into plaintext using the DEK. The DEK cannot be acquired separately, which means that the
original information on an SED cannot be read directly after it is removed from the storage
system.
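
The two-layer AK/DEK idea can be mirrored in a short conceptual sketch using the open-source cryptography package (AES-GCM). This is only an analogue of the mechanism described above: a real SED performs AES in drive hardware, never exposes its DEK, and its AutoLock behaviour is defined by the drive firmware, none of which the toy class below reproduces.

# Conceptual analogue of the SED AK/DEK mechanism (requires the "cryptography" package).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class ToySed:
    def __init__(self, authentication_key: bytes):
        self._ak = authentication_key                     # assigned by the key manager
        self._dek = AESGCM.generate_key(bit_length=256)   # never leaves the "drive"
        self._cipher = AESGCM(self._dek)

    def _check_ak(self, ak: bytes):
        if ak != self._ak:                                # AutoLock: wrong AK, no I/O
            raise PermissionError("AutoLock: authentication key mismatch")

    def write(self, ak: bytes, plaintext: bytes) -> bytes:
        self._check_ak(ak)
        nonce = os.urandom(12)
        return nonce + self._cipher.encrypt(nonce, plaintext, None)   # stored as ciphertext

    def read(self, ak: bytes, stored: bytes) -> bytes:
        self._check_ak(ak)
        nonce, ciphertext = stored[:12], stored[12:]
        return self._cipher.decrypt(nonce, ciphertext, None)


ak = os.urandom(32)
sed = ToySed(ak)
stored = sed.write(ak, b"confidential record")
assert sed.read(ak, stored) == b"confidential record"
# With a wrong AK (for example, the disk moved to another system), every read and write fails.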

7.2 Role-based Access Control


OceanStor Dorado V3 supports role-based access control to manage user permissions. Roles
are classified into default and user-defined roles.
 Default roles

Table 7-1 Default roles and permission

Default Role | Permission
Super administrator | Has all permissions of the system.
Administrator | Has all permissions except user management and security configuration permissions.
Security administrator | Has the security configuration permission, including security rule management, audit management, and KMC management.
Network administrator | Has the network management permission, including management on physical ports, logical ports, VLANs, and failover groups.
SAN resource administrator | Has the SAN resource management permission, including management on storage pools, LUNs, mapping views, hosts, and ports.
Data protection administrator | Has the data protection management permission, including management on local data protection, remote data protection, and HyperMetro schemes.
Backup administrator | Has the data backup management permission, including management on local data and mapping views.

 User-defined roles: The system allows you to define permissions as required. You can
specify the role when creating a user account.


8 System Management and Compatibility

8.1 System Management


OceanStor Dorado V3 provides device management interfaces and integrated northbound
management interfaces. Device management interfaces include a graphical management
interface (DeviceManager) and a command-line interface (CLI). Northbound interfaces are
RESTful interfaces, supporting SMI-S, SNMP, evaluation tools, and third-party network
management plug-ins. For details, refer to the compatibility list of OceanStor Dorado V3.

8.1.1 DeviceManager
DeviceManager is a common GUI management system for Huawei OceanStor storage systems
and is accessed through a web page. The GUI uses HTTP to communicate with Dorado V3. Most
system operations can be executed on DeviceManager, but certain operations must be run in
the CLI.

8.1.2 CLI
The CLI allows administrators and other system users to perform supported operations. You
can configure key-based SSH access so that users can log in to the CLI remotely and run
scripts from a remote host without saving passwords in those scripts.
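
As an illustration of key-based scripted access, the snippet below uses the open-source paramiko library to run a CLI command over SSH with a private key instead of a stored password. The host name, user name, key path, and the command string are placeholders for the example; refer to the product's CLI reference for actual command syntax.

# Illustrative key-based SSH scripting (requires the "paramiko" package).
import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()          # trust hosts already present in known_hosts

# Placeholder address, user, and key path; no password is stored in the script.
client.connect("array.example.com",
               username="cli_user",
               key_filename="/home/ops/.ssh/id_rsa")

# Placeholder command string for illustration only.
stdin, stdout, stderr = client.exec_command("show system general")
print(stdout.read().decode())

client.close()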

8.1.3 Call Home Service


In traditional service support mode, technical support personnel provide local services
manually. Faults may not be detected quickly and information may not be communicated
correctly. Call Home is a remote maintenance expert system. Using the secure and
controllable network connections between devices and Huawei technical support centers, Call
Home enables Huawei to monitor the health status of customers' devices, 24/7. If a fault
occurs, the fault information is automatically and immediately sent to Huawei technical
support, shortening fault discovery and handling time.
After the built-in Call Home service is enabled on the DeviceManager, the pre-installed
eService Agent on devices periodically collects information and sends the information to

Issue 1.6 (2019-05-31) Copyright © Huawei Technologies Co., Ltd. 88


Huawei OceanStor Dorado V3 All-Flash Storage
Systems
Technical White Paper 8 System Management and Compatibility

Huawei technical support. Customers must ensure that devices can be connected to Huawei
technical support over a network. HTTP proxy is supported.
The following information is collected:
 Device performance statistics
 Device running data
 Device alarm data
All data is sent to Huawei technical support in text mode over HTTPS. Records of sent
information can be sent to the Syslog server for security audit. If data cannot be uploaded due
to network interruption, devices can save the last day's data files (up to 5 MB per controller)
and send them when the network recovers. The files that are not uploaded can be exported for
troubleshooting by using the command line.
The information sent to Huawei technical support can be used to provide the following
functions.
 Alarm monitoring: Device alarms are monitored 24/7. If a fault occurs on a device,
Huawei technical support is notified within 1 minute and a troubleshooting ticket is
dispatched to engineers. This helps customers locate and resolve problems quickly.
 In conjunction with big data analysis technologies and device fault libraries across the
world, fault prevention and fast fault troubleshooting are supported.
 Based on industry application workload models, optimal device configurations and
performance optimization suggestions are provided.

8.1.4 RESTful API


RESTful APIs of OceanStor Dorado V3 enable system automation, development, query, and
resource allocation over HTTPS. With RESTful APIs, you can use third-party
applications to control and manage arrays and develop flexible management solutions for
Dorado V3.
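
A hedged sketch of how such RESTful automation typically looks is shown below using the Python requests library. The management address, endpoint paths, header name, and JSON fields are placeholders invented for the illustration and are not the product's actual REST interface; consult the OceanStor Dorado V3 REST API reference for the real resource paths and authentication flow.

# Generic HTTPS/REST automation sketch (placeholder endpoints, not the product API).
import requests

ARRAY = "https://array.example.com:8088"        # placeholder management address

session = requests.Session()
session.verify = "/path/to/array_ca.pem"        # validate the array's certificate

# Hypothetical login call returning a session token.
login = session.post(f"{ARRAY}/api/sessions",
                     json={"username": "admin", "password": "********"},
                     timeout=30)
login.raise_for_status()
session.headers["X-Auth-Token"] = login.json().get("token", "")

# Hypothetical query listing LUNs, e.g. for a monitoring or automation script.
luns = session.get(f"{ARRAY}/api/luns", timeout=30)
luns.raise_for_status()
for lun in luns.json().get("data", []):
    print(lun.get("name"), lun.get("capacity"))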

8.1.5 SNMP
SNMP interfaces can be used to report alarms and connect to northbound management
interfaces.

8.1.6 SMI-S
SMI-S interfaces support hardware and service configuration and connect to northbound
management interfaces.

8.1.7 Tools
OceanStor Dorado V3 provides a diverse set of tools for pre-sales assessment and post-sales
delivery. These tools can be accessed through the web, SmartKit, DeviceManager,
SystemReporter, and eService, and they effectively help deploy, monitor, analyze, and
maintain OceanStor Dorado V3.


8.2 Ecosystem and Compatibility


8.2.1 Virtual Volume (VVol)
OceanStor Dorado V3 supports VVol 1.0, which includes new objects such as Protocol
Endpoint (PE) LUN, VVol, and VVol SNAP. The VVol object supports cascading snapshot,
differential bitmap, and LUN data copy. To quickly deploy VMs, you can create a VVol
snapshot for the VM template and then create snapshots for the VVol snapshot to generate
multiple VMs using the same data image.
When a VM that has snapshots is cloned, data can be copied by the host or storage system.
 When the host copies the VM data, it can query the area where the VVol object stores
data and perform full copy. Then the host can query the differences between the
snapshots and the VM and copy the differential data.
 When the storage system copies the VM data, it uses its own full copy and differential
copy capabilities to copy the data to the new VM directly. Data can be copied between
different controllers, controller enclosures, and storage pools.
VMware uses the VASA Provider plug-in to detect and use storage capabilities to deploy,
migrate, and clone VMs quickly.
Each VM is stored in multiple VVols. VMware can clone, migrate, or configure traffic control
policies for individual VMs. The storage system completes the data copy operations directly
without occupying host bandwidth, greatly improving VM management efficiency.

8.2.2 OpenStack Integration


Huawei releases the latest OpenStack Cinder Driver for OceanStor Dorado V3 in the OpenStack
community. Vendors of commercial OpenStack distributions can obtain and integrate this
Cinder Driver, allowing their products to support OceanStor Dorado V3.
OceanStor Dorado V3 provides four versions of OpenStack Cinder Driver: OpenStack Juno,
Kilo, Liberty, and Mitaka. In addition, OceanStor Dorado V3 supports commercial versions of
OpenStack such as Huawei FusionSphere OpenStack, Red Hat OpenStack Platform, and
Mirantis OpenStack. For details, see
http://support-open.huawei.com/ready/pages/user/compatibility/support-matrix.jsf.

8.2.3 Virtual Machine Plug-ins


OceanStor Dorado V3 supports various VM plug-ins. For details, see
http://support-open.huawei.com/ready/pages/user/compatibility/support-matrix.jsf.

8.2.4 Host Compatibility


OceanStor Dorado V3 supports mainstream host components, including operating systems,
virtualization software, HBAs, volume management, and cluster software. OceanStor Dorado
V3 supports a wider range of operating systems and VM platforms for mainstream database
software. For details, see
http://support-open.huawei.com/ready/pages/user/compatibility/support-matrix.jsf.


9 Best Practices

Huawei continuously collects requirements from key customers in major industries and
summarizes the typical high-performance storage applications and the challenges these
customers face. This helps Huawei provide best practices that are tested and verified together
with application suppliers.
For best practices, visit
http://e.huawei.com/en/products/cloud-computing-dc/storage/unified-storage/dorado-v3.


10 Appendix

10.1 More Information


You can obtain more information about OceanStor Dorado V3 at the following site:
http://e.huawei.com/en/products/cloud-computing-dc/storage/unified-storage/dorado-v3
You can also visit our official website to get more information about Huawei storage:
http://e.huawei.com/en/products/cloud-computing-dc/storage
For after-sales support, visit our technical support website:
http://support.huawei.com/enterprise/en
For pre-sales support, visit the following website:
http://e.huawei.com/en/how-to-buy/contact-us
You can also contact your local Huawei office:
http://e.huawei.com/en/branch-office

10.2 Feedback
Huawei welcomes your suggestions for improving our documentation. If you have comments,
send your feedback to [email protected].
Your suggestions will be seriously considered and we will make necessary changes to the
document in the next release.
