Huawei Flash Storage
Issue 1.6
Date: 2019-05-31
Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: [email protected]
Contents
1 Executive Summary
2 Overview
2.1 OceanStor Dorado V3 Family
2.2 Customer Benefits
9 Best Practices
10 Appendix
10.1 More Information
10.2 Feedback
1 Executive Summary
Huawei OceanStor Dorado V3 all-flash storage systems are designed for enterprises'
mission-critical services. They use FlashLink®, a series of technologies dedicated to flash
media, to achieve a stable latency of 0.5 ms. The gateway-free HyperMetro feature provides an end-to-end active-active
data center solution, which can smoothly evolve to the geo-redundant disaster recovery (DR)
solution to achieve 99.9999% solution-level reliability. Inline deduplication and compression
maximize the available capacity and reduce the total cost of ownership (TCO). OceanStor
Dorado V3 meets the requirements of enterprise applications such as databases, virtual
desktop infrastructure (VDI), and virtual server infrastructure (VSI), helping the financial,
manufacturing, and carrier industries evolve smoothly to all-flash storage.
This document describes and highlights the unique advantages of OceanStor Dorado V3 in
terms of its product positioning, hardware and software architecture, and features.
2 Overview
− The gateway-free active-active solution achieves zero recovery time objective (RTO)
and recovery point objective (RPO) in the case of a site failure, ensuring business
continuity.
Convergence and efficiency
Inline global deduplication and compression allow OceanStor Dorado V3 to reduce
customer capital expenditure (CAPEX) by 75% while providing the same available
capacity as traditional storage systems. Remote replication between OceanStor Dorado
V3 and Huawei converged storage systems can form a DR network containing both
all-flash and traditional storage systems. Heterogeneous virtualization enables OceanStor
Dorado V3 to take over resources from third-party storage systems.
Fast and cost-effective cloud DR
The CloudReplication and CloudBackup features back up production data to the cloud
without any external gateway, providing a fast, cost-effective, and maintenance-free
cloud DR center.
3 System Architecture
3.1 Concepts
3.2 Hardware Architecture
3.3 Software Architecture
3.1 Concepts
3.1.1 Controller Enclosure
The OceanStor Dorado V3 controller enclosure contains storage controllers that process all
storage service logic. It provides core functions such as host access, device management, and
data services. A controller enclosure consists of a system subrack, controllers, interface
modules, power modules, BBUs, and management modules. OceanStor Dorado V3 supports 2
U, 3 U, and 6 U controller enclosures. The 2 U enclosure has integrated disks, while the 3 U
and 6 U enclosures do not.
3.1.2 Controller
An OceanStor Dorado V3 controller is a computing module consisting of the CPU, memory,
and main board. It processes storage services, receives configuration and management
commands, saves configuration data, connects to disk enclosures, and stores critical data onto
coffer disks.
Coffer disks can be built-in or external. They store system data and cache data in
the event of a power failure on the storage system. For the Dorado3000 V3 and Dorado5000
V3 series, the first four disks on the controller enclosure are the coffer disks; for the
Dorado6000 V3 series, the first four disks on the first disk enclosure are the coffer disks. For
details about the coffer disk specifications and partitioning, see the OceanStor Dorado3000
Figure 3-5 shows a dual-controller system. You can create a disk domain that contains all
disks in the system or create a separate disk domain for each controller enclosure.
When creating a disk domain, you must specify the hot spare policy and encryption type.
You can choose a high or low hot spare policy, or no hot spare policy at all. The policy can be
changed online.
When you use a high hot spare policy, the disk domain reserves a large amount of hot spare space for
data reconstruction in the event of a disk failure. The hot spare space increases
non-linearly with the number of disks.
When you use a low hot spare policy, which is the default setting, the disk domain
reserves a small amount of hot spare space (enough for the data on at least one disk) for
data reconstruction in the event of a disk failure. The hot spare space increases
non-linearly with the number of disks.
If you do not use a hot spare policy, the system will not reserve hot spare space.
Table 3-1 Relationship between the hot spare space and the number of disks (less than 200)
Number of Disks | Hot Spare Space Under the High Policy | Hot Spare Space Under the Low Policy
8 to 12         | Capacity of 1 disk                    | Capacity of 1 disk
13 to 25        | Capacity of 2 disks                   | Capacity of 1 disk
26 to 50        | Capacity of 3 disks                   | Capacity of 2 disks
51 to 75        | Capacity of 4 disks                   | Capacity of 2 disks
76 to 125       | Capacity of 5 disks                   | Capacity of 3 disks
126 to 175      | Capacity of 6 disks                   | Capacity of 3 disks
176 to 200      | Capacity of 7 disks                   | Capacity of 4 disks
You can create either a standard or an encrypted disk domain. The encryption type cannot be
changed after the disk domain is created.
A standard disk domain consists of non-self-encrypting drives or self-encrypting drives
(SEDs) on which encryption is disabled.
An encrypted disk domain consists of only SEDs. You must configure the key
management service when you use an encrypted disk domain.
3.1.6 RAID
OceanStor Dorado V3 uses a Huawei proprietary algorithm, Erasure Code (EC), to implement
RAID 5, RAID 6, RAID-TP, and RAID 10*. RAID-TP is able to tolerate three faulty disks,
providing high system reliability.
OceanStor Dorado V3 uses the RAID 2.0+ block-level virtualization technology to implement
RAID. With this technology:
Multiple SSDs form a disk domain.
Each SSD is divided into fixed-size chunks (typically 4 MB per chunk) to facilitate
logical space management.
Chunks from different SSDs constitute a chunk group (CKG) based on the
customer-configured RAID level.
Chunk groups support three redundancy configurations:
RAID 5 uses the EC-1 algorithm and generates one copy of parity data for each stripe.
RAID 6 uses the EC-2 algorithm and generates two copies of parity data for each stripe.
RAID-TP uses the EC-3 algorithm and generates three copies of parity data for each
stripe.
A chunk group is further divided into smaller-granularity grains (typically 8 KB), which are the
smallest unit for data writes. OceanStor Dorado V3 adopts full-stripe write to avoid extra
overhead generated in traditional RAID mechanisms. Figure 3-8 shows RAID mapping on
OceanStor Dorado V3.
OceanStor Dorado V3 uses EC to support more member disks in a RAID group, improving
space utilization.
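As an illustration of this mapping, the following minimal Python sketch models how chunks from several SSDs form a chunk group. The structures and names are illustrative assumptions, not the actual implementation.

```python
# Minimal sketch of RAID 2.0+ chunk-group formation (illustrative only).
CHUNK_SIZE = 4 * 1024 * 1024   # each SSD is divided into fixed-size 4 MB chunks
GRAIN_SIZE = 8 * 1024          # grains are the smallest unit for data writes

def build_chunk_group(ssd_ids, raid_level):
    """Take one chunk from each listed SSD to form a chunk group (CKG)."""
    parity = {"RAID 5": 1, "RAID 6": 2, "RAID-TP": 3}[raid_level]
    return {
        "chunks": [(ssd, "next free chunk") for ssd in ssd_ids],
        "data_columns": len(ssd_ids) - parity,         # M
        "parity_columns": parity,                      # N
        "grains_per_chunk": CHUNK_SIZE // GRAIN_SIZE,  # 512 grains of 8 KB
    }

ckg = build_chunk_group([0, 1, 2, 3, 4], "RAID 5")     # a 4+1 chunk group
```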
If a disk is faulty or is removed for a long time, the chunks on this disk are reconstructed. The
detailed procedure is as follows:
1. The disk becomes faulty and the chunks on it become unavailable.
2. The RAID level degrades for the chunk groups that contain the affected chunks.
3. The system allocates idle chunks from the storage pool for data reconstruction.
4. Based on the RAID level of the storage pool, the system uses the normal data columns
and parity data to restore the damaged data blocks and writes them to the idle chunks.
Because the faulty chunks are distributed across multiple chunk groups, all of the affected chunk
groups start reconstruction at the same time. In addition, the new chunks are from multiple
disks. This enables all disks in the disk domain to participate in reconstruction, fully utilizing
the I/O capability of all disks to improve the data reconstruction speed and shorten data
recovery time.
OceanStor Dorado V3 uses both common and dynamic RAID reconstruction methods to
prevent RAID level downgrade and ensure system reliability in various scenarios.
Common reconstruction
A RAID group has M+N members (M indicates data columns and N indicates parity
columns). When the system has faulty disks, common reconstruction is triggered if the
number of normal member disks in the disk domain is still greater than or equal to M+N.
During reconstruction, the system uses idle chunks to replace the faulty ones in the
chunk groups and restores data to the new chunks. The RAID level remains M+N.
In Figure 3-9, D0, D1, D2, P, and Q form a chunk group. If disk 2 fails, a new chunk
D2_new on disk 5 is used to replace D2 on disk 2. In this way, D0, D1, D2_new, P, and
Q form a new chunk group and the system restores the data of D2 to D2_new.
After common reconstruction is complete, the number of RAID member disks remains
unchanged, maintaining the original redundancy level.
Dynamic reconstruction
If the number of member disks in the disk domain is fewer than M+N, the system
reduces the number of data columns (M) and retains the number of parity columns (N)
during reconstruction. This method retains the RAID level by reducing the number of
data columns, ensuring system reliability.
During the reconstruction, the data on the faulty chunk is migrated to a new chunk group.
If the system only has M+N-1 available disks, the RAID level for the new chunk group
is (M-1)+N. The remaining normal chunks (M-1) and parity columns P and Q form a
new chunk group and the system calculates new parity columns P' and Q'.
In Figure 3-10, there are six disks (4+2). If disk 2 fails, data D2 in CKG0 is written to
the new CKG1 as new data (D2') and the RAID level is 3+2. D0, D1, and D3 form a new
3+2 CKG0 with new parity columns P' and Q'.
After the reconstruction is complete, the number of member disks in the RAID group is
decreased, but the RAID redundancy level remains unchanged.
The number of RAID members is automatically adjusted by the system based on the number
of disks in a disk domain. Factors such as capacity utilization, reliability, and reconstruction
speed are considered. Table 3-3 describes the relationship between the disks in a disk domain
and RAID members.
The number of RAID members (M+N) complies with the following rules:
1. If the number of faulty disks in a disk domain is less than or equal to the number of disks
in the hot spare space, the system does not trigger dynamic reconstruction.
2. A high capacity utilization should be guaranteed.
3. M+N should not exceed 25.
When the number of disks (X) is less than 13, the hot spare space equals the capacity of one
disk and M+N is X-1. This ensures the highest possible capacity utilization.
When a disk domain has 13 to 25 disks, the hot spare space equals the capacity of two disks
and M+N is X-2. This setting avoids dynamic reconstruction when multiple disks fail.
When a disk domain has 26 or 27 disks, the hot spare space equals the capacity of three
disks and M+N is X-3. Dynamic reconstruction will not be triggered if up to three disks fail
(at different times).
When the number of disks is greater than 27, the maximum value of M+N will be 25. This
ensures a high capacity utilization while limiting read amplification caused by reconstruction.
For example, if a disk in a 30+2 RAID group becomes faulty, the system must read the chunks
from 30 disks to reconstruct each chunk in the affected chunk groups, resulting in great read
amplification. To avoid this, the system limits M+N to 25.
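The rules above can be condensed into a small function. This is a sketch of the stated rules only; the actual algorithm also weighs capacity utilization, reliability, and reconstruction speed.

```python
def raid_members(x: int) -> int:
    """Return M+N for a disk domain of x disks, per the rules above."""
    if x < 13:
        return x - 1   # hot spare space: 1 disk
    if x <= 25:
        return x - 2   # hot spare space: 2 disks
    if x <= 27:
        return x - 3   # hot spare space: 3 disks
    return 25          # capped at 25 to limit reconstruction read amplification

# With 15 disks and RAID 6, M+N = 13, that is, 11+2 (matches the example below).
assert raid_members(15) == 13 and raid_members(30) == 25
```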
When new disks are added to the system to expand capacity, the value of M+N increases with
the number of disks. All new data (including data generated by garbage collection) will be
written using the new RAID level, while the RAID level for the existing data remains
unchanged. For example, a disk domain has 15 disks and uses RAID 6; M+N is 11+2. If the
customer expands the domain to 25 disks, new data will be written to the new 21+2 chunk
groups, while the existing data is still in the original 11+2 chunk groups. When garbage
collection starts, the system will move the valid chunks in the original 11+2 chunk groups to
the 21+2 chunk groups and then reclaim the original chunk groups.
OceanStor Dorado V3 has the following advantages in terms of data redundancy and
recovery:
Fast reconstruction
All disks in the disk domain participate in reconstruction. Test results show that
OceanStor Dorado V3 takes only 30 minutes to reconstruct 1 TB of data (when there is
no new data written to the system), whereas traditional RAID takes more than 2 hours.
Multiple RAID levels available
OceanStor Dorado V3 supports RAID 5, RAID 6, and RAID-TP. You can choose the
RAID level that meets your needs. RAID-TP allows three faulty disks and provides the
highest reliability for mission-critical services.
Intelligent selection of RAID member disks
If a disk has a persistent fault, the system can intelligently reduce the number of member
disks in the RAID group and use dynamic reconstruction to write new data with the
original RAID level instead of a lower level, avoiding reduction in data reliability.
Appending mechanism to ensure data consistency
OceanStor Dorado V3 uses appending in full-stripe writes. This avoids data
inconsistency in traditional RAID caused by write holes.
Figure 3-13 Device architecture of Dorado3000 V3 and Dorado5000 V3 with SAS SSDs
Dorado6000 V3 and Dorado18000 V3 use independent controller enclosures that do not have
disks, allowing flexible scale-out and scale-up. Dorado6000 V3 uses 3 U controller enclosures,
each of which houses two controllers; Dorado18000 V3 uses 6 U controller enclosures, each
of which houses two or four controllers. Controllers within an enclosure are interconnected by
PCIe 3.0 channels on the midplane, while controllers on different enclosures are
interconnected by PCIe 3.0 switches to scale out the system. The controller enclosures can
connect to disk enclosures via SAS 3.0 links to scale up the system capacity.
A page is the smallest programming and read unit. Its size is usually 4 KB, 8 KB, or 16
KB.
Operations on NAND flash include erase, program, and read. The program and read
operations are implemented at the page level, while the erase operations are implemented at
the block level. Pages cannot be overwritten in place: before rewriting a page, the system must
erase the entire block where the page resides. Therefore, the system must migrate the valid data in the block to a new storage space
before erasing it. This process is called garbage collection (GC). SSDs can only tolerate a
limited number of program/erase (P/E) cycles. If a block on an SSD experiences more P/E
cycles than others, it will wear out more quickly. To ensure reliability and performance,
HSSDs leverage the following advanced technologies.
LDPC uses linear codes defined by the check matrix to check and correct errors. When data is
written to pages on the NAND flash, the system calculates the LDPC verification information
and writes it to the pages with the user data. When data is read from the pages, LDPC verifies
and corrects the data.
HSSDs house a built-in XOR engine to implement redundancy protection between flash chips.
If a flash chip becomes faulty (page failure, block failure, die failure, or full chip failure),
redundancy check data is used to recover the data on the faulty blocks, preventing data loss.
NVMe SSDs reduce the number of interactions in a write request from 4 (in a SAS protocol)
to 2.
SAS, and 8 Gbit/s PCIe rates as well as Flash Translation Layer (FTL) hardware
acceleration to provide stable performance at a low latency for enterprise applications.
SmartIO chip
Hi182x (IOC) is the first Huawei-developed storage interface chip. It integrates multiple
interface protocols, such as 8 Gbit/s, 16 Gbit/s, and 32 Gbit/s Fibre Channel, 100GE, 40GE,
25GE, and 10GE to achieve excellent performance, high interface density, and flexible
configuration.
BMC chip
Hi1710 is a BMC chip dedicated to the x86 CPU platform. It consists of the A9 CPU,
8051 co-processor, sensor circuits, control circuits, and interface circuits. It supports the
Intelligent Platform Management Interface (IPMI), which monitors and controls the
hardware components of the storage system, including system power control, controller
monitoring, interface module monitoring, power supply and BBU management, and fan
monitoring.
Scale-up
The controller and disk enclosures of OceanStor Dorado V3 are directly connected by
redundant SAS 3.0 links. For Dorado6000 V3 and Dorado18000 V3, disk enclosures use
dual-uplink networking; for Dorado5000 V3 (SAS), disk enclosures use single-uplink
networking.
In dual-uplink networking, both ports on each expansion module of a disk enclosure are used
as uplink ports to connect to a controller enclosure. That is, each disk enclosure is connected
to a controller enclosure using four ports. Dual-uplink networking can improve back-end
bandwidth and reduce latency, eliminating bottlenecks caused by links.
In single-uplink networking, one port on each expansion module of a disk enclosure is used as
the uplink port to connect to a controller enclosure. That is, each disk enclosure is connected
to a controller enclosure using two ports.
NVMe disk enclosures use 8 x 8 Gbit/s PCIe 3.0 expansion cables, which provide greater
transmission capability than SAS cables. Therefore, single-uplink networking using PCIe
cables is able to meet performance requirements.
For Dorado3000 V3 and Dorado5000 V3 (SAS), the 25 SSDs on the controller enclosure use
dual-uplink networking, while external disk enclosures use single-uplink networking to
connect to the controller enclosure.
It is recommended that you use disks of the same capacity when deploying the storage system
for the first time. You can later scale up by adding disks of the same or greater capacity as the
existing ones, reducing TCO.
Scale-out
The two or four controllers on an OceanStor Dorado V3 controller enclosure are
interconnected by the mirroring channels on the midplane, and controllers on different
controller enclosures are interconnected using PCIe 3.0 switches. Each controller has a 2-port
PCIe interface module that connects to two PCIe switches for redundancy. Faults on any
switch, controller, interface module, or link will not interrupt services.
The following figures show details of the network connections.
The scale-out management network is connected in a daisy-chain layout, which manages both
the controllers and PCIe switches. This saves ports on the management switches.
The software architecture of the storage controller mainly consists of the cluster &
management plane and service plane.
The cluster & management plane provides a basic environment to run the system,
controls multi-controller scale-out, and manages alarms, performance, and user
operations.
The service plane schedules storage service I/Os, enables data scale-out, and implements
controller software-related functions provided by FlashLink®, such as deduplication and
compression, redirect-on-write (ROW) full-stripe write, hot and cold data separation,
garbage collection, global wear leveling, and anti-wear leveling.
The Dorado NAS unit provides end-to-end file system services over the LUNs provided by
Dorado V3, featuring high reliability and performance.
The Dorado NAS unit leverages the inline deduplication and compression capability of
Dorado V3 to provide high data reduction ratio at a low latency for NAS services.
3.3.1 FlashLink
FlashLink® associates storage controllers with SSDs by using a series of technologies for
flash media, ensuring both reliability and performance of flash storage. The key technologies
of FlashLink® include hot and cold data separation, end-to-end I/O priority, ROW full stripe
write, global garbage collection, global wear leveling, and anti-wear leveling. These
techniques resolve problems such as performance jitter caused by write amplification and
garbage collection, and ensure a steady low latency and high IOPS of OceanStor Dorado V3.
In Figure 3-27, the red and gray blocks represent metadata and user data, respectively.
If metadata and user data are stored in the same blocks, the blocks may still contain a large
amount of valid user data after all the metadata becomes garbage, because metadata changes
more frequently than user data. When the system erases these blocks, it must migrate the valid
user data to new blocks, reducing garbage collection efficiency and system performance.
If metadata and user data are stored in different blocks, the system only needs to migrate a
small amount of data before erasing the metadata blocks. This significantly improves the
garbage collection efficiency.
On the left side in the preceding figure, various I/Os have the same priority and contend for
resources. After I/O priority adjustment, system resources are allocated by I/O priority.
In Figure 3-29, the system uses RAID 6 (4+2) and writes new data blocks 1, 2, 3, and 4 to
modify existing data.
In traditional overwrite mode, the system must modify every chunk group where these blocks
reside. For example, when writing data block 3 to CKG2, the system must first read the
original data block d and the parity data P and Q. Then it calculates new parity data P' and Q',
and writes P', Q', and data block 3 to CKG2. In ROW full-stripe write, the system uses the
data blocks 1, 2, 3, and 4 to calculate P and Q and writes them to a new chunk group. Then it
modifies the logical block addressing (LBA) pointer to point to the new chunk group. This
process does not need to read any existing data.
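The difference in I/O cost can be illustrated with a short sketch. The counts below assume the RAID 6 (4+2) example above; the function names are illustrative.

```python
# Illustrative I/O counts for updating one block in a RAID 6 (4+2) stripe.
def overwrite_update_ios():
    """Traditional overwrite: read old data, P, and Q, then write all three back."""
    return {"reads": 3, "writes": 3}

def row_full_stripe_ios(data_columns=4, parity_columns=2):
    """ROW: buffer new blocks until a full stripe forms, then write data + parity."""
    return {"reads": 0, "writes": data_columns + parity_columns}

# ROW writes 6 blocks for 4 new data blocks (1.5 writes per block, no reads),
# versus 3 reads + 3 writes for every single block in traditional overwrite mode.
```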
Typically, RAID 5 uses 22D+1P, RAID 6 uses 21D+2P, and RAID-TP uses 20D+3P, where D
indicates data columns and P indicates parity columns. Table 3-5 compares write
amplification on OceanStor Dorado V3 using these RAID levels.
The performance differences between RAID 5 and RAID 6, and between RAID 6 and
RAID-TP are only about 5%.
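This figure can be checked with quick arithmetic: assuming the write amplification of a full-stripe write is (D+P)/D, the three levels differ by roughly 5% at each step.

```python
# Full-stripe write amplification = (data + parity columns) / data columns.
for name, d, p in [("RAID 5", 22, 1), ("RAID 6", 21, 2), ("RAID-TP", 20, 3)]:
    print(f"{name}: {(d + p) / d:.3f}")
# RAID 5: 1.045, RAID 6: 1.095, RAID-TP: 1.150 -> successive steps differ by ~5%
```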
The system starts garbage collection when the amount of garbage reaches a specified threshold. During garbage collection, the system migrates the
valid data in the target chunk group to a new chunk group. Then the system reclaims all
chunks in the target chunk group to release its space. At the same time, the system issues the
unmap or deallocate command to SSDs to mark the data in the corresponding LBA area as
invalid. The SSDs then reclaim the space. The garbage collection process is initiated by
storage controllers and takes effect on all SSDs.
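A minimal sketch of this controller-driven process follows, with illustrative structures (a chunk group holding grains, and an unmap callback standing in for the SSD command):

```python
# Sketch of controller-initiated garbage collection on one chunk group (CKG).
def collect_chunk_group(ckg, new_ckg, unmap):
    """Migrate valid grains to a new CKG, reclaim all chunks, unmap on the SSDs."""
    for grain in ckg["grains"]:
        if grain["valid"]:
            new_ckg["grains"].append(grain)      # move valid data first
    for ssd_id, lba_range in ckg["chunks"]:
        unmap(ssd_id, lba_range)                 # mark the old LBA area invalid
    ckg["grains"], ckg["chunks"] = [], []        # space released to the pool
```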
However, if SSDs are approaching the end of their life, for example, when the wear level
exceeds 80%, multiple SSDs may fail simultaneously and data may be lost if global wear
leveling is still used. In this case, the system enables anti-wear leveling to avoid
simultaneous failures. The system selects the most severely worn SSD and writes data onto it
as long as it has idle space. This wears that SSD out faster than the others, and you are
prompted to replace it sooner, avoiding simultaneous failures.
When the cache usage reaches the threshold, the cache eviction algorithm calculates the
historical and current data access frequencies, and invokes the least recently used (LRU)
algorithm to evict unnecessary cached data.
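A minimal sketch of LRU eviction using Python's OrderedDict; note that the real algorithm also weighs historical and current access frequencies, which this sketch omits.

```python
from collections import OrderedDict

class LruCache:
    """Minimal LRU eviction sketch; the real cache also weighs access frequency."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, key, value):
        if key in self.pages:
            self.pages.move_to_end(key)     # most recently used moves to the end
        self.pages[key] = value
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
```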
block, and the system compresses the new data and writes it to the SSDs rather than
deduplicating the data.
e. If the fingerprint table does not contain the same fingerprint, the new data is not
duplicate. The system adds the data block's fingerprint to the table, compresses the
data, and writes it to the SSDs.
f. The compression algorithm is LZ4 or ZSTD, and the granularity is 4 KB, 8 KB, 16
KB, or 32 KB. The compressed data is aligned by byte.
6. The pool combines the data into full stripes and writes them to the SSDs.
a. Compressed I/Os are combined into stripes whose size is an integer multiple of 8
KB.
b. When a stripe is full, the system calculates parity bits and writes the data and parity
bits to disks.
c. If the stripe is not full, 0s are added to the tail before the data is written to disks
(these 0s will be cleared subsequently during garbage collection).
d. Data is written to a new location every time and metadata mapping relationships are
updated.
e. After a message is returned indicating that I/Os are successfully written to disks, the
cache deletes the corresponding data pages.
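Steps 5 and 6 can be summarized in a short sketch. SHA-256 stands in for the real fingerprint calculation and zlib for LZ4/ZSTD; both substitutions, and all names, are illustrative.

```python
import hashlib, zlib

fingerprint_table = {}   # fingerprint -> location of the stored block
stripe_buffer = []       # compressed blocks waiting to form a full stripe

def write_block(block: bytes):
    """Deduplicate, compress, and queue a data block for a full-stripe write."""
    fp = hashlib.sha256(block).digest()      # stand-in fingerprint calculation
    if fp in fingerprint_table:
        return fingerprint_table[fp]         # duplicate: reference existing data
    compressed = zlib.compress(block)        # stand-in for LZ4/ZSTD
    stripe_buffer.append(compressed)         # combined into full stripes (step 6)
    location = ("stripe", len(stripe_buffer) - 1)
    fingerprint_table[fp] = location         # record the new fingerprint
    return location
```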
If a disk domain contains disks owned by multiple controller enclosures, host data will be
evenly distributed to all disks in the disk domain. In Figure 3-34, upon receiving a write
request from the host, the storage system performs hash calculation on the received data and
evenly distributes it to all disks in the disk domain based on the hash result.
The size of data blocks to be compressed can be 4 KB, 8 KB, 16 KB, or 32 KB. The
compressed data is aligned by byte, which improves the compression efficiency and saves the
storage space for compressed data. In the following figure, 8 KB data blocks are compressed,
converged into full stripes, and then written to disks.
The compression ratio of OceanStor Dorado V3 also depends on user data. For Oracle OLTP
applications, the compression ratio is between 1.5 and 7.9; for VDI applications, the
compression ratio is between 2.8 and 4. You can enable or disable SmartCompression for each
specific LUN. In applications that require high performance, you can disable this function.
SmartVirtualization uses LUN masquerading to set the WWNs and Host LUN IDs of
eDevLUNs on OceanStor Dorado V3 to the same values as those on the heterogeneous storage
system. After data migration is complete, the host's multipathing software switches over the
LUNs online without interrupting services.
Application Scenarios
Heterogeneous array takeover
As customers build data centers over time, the storage arrays they use may come from
different vendors. Storage administrators can leverage SmartVirtualization to manage
and configure existing devices, protecting investments.
Heterogeneous data migration
The customer may need to replace storage systems whose warranty periods are about to
expire or whose performance does not meet service requirements. SmartVirtualization
and SmartMigration can migrate customer data to OceanStor Dorado V3 online without
interrupting host services.
For more information, see the OceanStor Dorado V3 Series V300R002 SmartVirtualization
Feature Guide.
1. Data synchronization
a. Before migration, you must configure the source and target LUNs.
b. When migration starts, the source LUN replicates data to the target LUN.
c. During migration, the host can still access the source LUN. When the host writes
data to the source LUN, the system records the change in a data change log (DCL).
d. The system writes the incoming data to both the source and target LUNs.
If writing to both LUNs is successful, the system clears the record in the DCL.
If writing to the target LUN fails, the storage system identifies the data that
failed to be synchronized according to the DCL and then copies the data to the
target LUN. After the data is copied, the storage system returns a write success
to the host.
If writing to the source LUN fails, the system returns a write failure to notify
the host to re-send the data. Upon reception, the system only writes the data to
the source LUN.
2. LUN information exchange
After data replication is complete, host I/Os are suspended temporarily. The source and
target LUNs exchange information as follows:
a. Before LUN information is exchanged, the host uses the source LUN ID to identify
the source LUN. Because of the mapping relationship between the source LUN ID
and the source data volume ID used to identify physical space, the host can read the
physical space information about the source LUN. The mapping relationship also
exists between the target LUN ID and target data volume ID.
b. In LUN information exchange, the source and target LUN IDs remain unchanged
but the data volume IDs of the source and target LUNs are exchanged. This creates
a new mapping relationship between the source LUN ID and target data volume ID.
c. After the exchange, the host can still identify the source LUN using the source LUN
ID but reads physical space information about the target LUN due to the new
mapping relationship.
LUN information exchange is completed instantaneously, which does not interrupt
services.
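Conceptually, the exchange amounts to swapping the data volume IDs behind the unchanged LUN IDs, as in this minimal sketch (IDs are illustrative):

```python
# Sketch of LUN information exchange (illustrative IDs).
lun_to_volume = {"source_lun": "source_volume", "target_lun": "target_volume"}

def exchange_lun_info(mapping):
    """Swap the data volume IDs; the host-visible LUN IDs never change."""
    mapping["source_lun"], mapping["target_lun"] = (
        mapping["target_lun"], mapping["source_lun"])

exchange_lun_info(lun_to_volume)
# The host still addresses "source_lun" but now reaches the target volume's data.
```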
Application Scenarios
Storage system upgrade with SmartVirtualization
SmartMigration works with SmartVirtualization to migrate data from legacy storage
systems (from Huawei or other vendors) to new Huawei storage systems to improve
service performance and data reliability.
Data migration for capacity, performance, and reliability adjustments
For more information, see the OceanStor Dorado V3 Series V300R002 SmartMigration
Feature Guide.
Management isolation
Each vStore has its own administrator. vStore administrators can only configure and
manage their own storage resources through the GUI or RESTful API. vStore
administrators support role-based permission control. When a vStore administrator is
created, it is assigned a role that defines its permissions.
Service isolation
Each vStore has its own file systems, users, user groups, shares, and exports. Users can
only access file systems belonging to the vStore through logical interfaces (LIFs).
Service isolation includes: service data isolation (covering file systems, quotas, and
snapshots), service access isolation, and service configuration isolation (typically for
NAS protocol configuration).
− Service data isolation
System administrators assign different file systems to different vStores, thereby
achieving file system isolation. File system quotas and snapshots are isolated in the
same way.
− Service access isolation
Each vStore has its own NAS protocol instances, including the SMB service, NFS
service, and NDMP service.
− Service configuration isolation
Each vStore can have its own users, user groups, user mapping rules, security
policies, SMB shares, NFS shares, AD domain, DNS service, LDAP service, and
NIS service.
Network isolation
VLANs and LIFs are used to isolate the vStore network, preventing illegal host access to
vStore's storage resources.
vStores use LIFs to configure services. A LIF belongs only to one vStore to achieve
logical port isolation. You can create LIFs from GE ports, 10GE ports, bond ports, or
VLANs.
3. SmartQuota updates the quota (used space and file count plus the incremental space and
file count) and allows the quota and I/O data to be written into
the file system.
The I/O operation and quota update succeed or fail together, ensuring that the used
capacity is correct at each I/O check.
If the directory quota, user quota, and group quota are concurrently configured in a shared directory in
which you are performing operations, each write I/O operation will be restricted by these three quotas.
All types of quota are checked. If the hard quota of one type of quota does not pass the check, the I/O
will be rejected.
SmartQuota does the following to clear alarms: When the used resource of a user is lower
than 90% of the soft quota, SmartQuota clears the resource over-usage alarm. In this way,
even if the used resource fluctuates slightly above or below the soft quota, alarms are not
frequently generated or cleared.
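The hard-quota check and the alarm hysteresis can be sketched together as follows. This models a single space quota only, omitting file counts and the combined directory/user/group checks described above.

```python
def check_write(used, incoming, soft_quota, hard_quota, alarm_on):
    """Return (allowed, alarm_on) after one I/O check against a single quota."""
    if used + incoming > hard_quota:
        return False, alarm_on          # hard quota would be exceeded: reject I/O
    used += incoming
    if used > soft_quota:
        alarm_on = True                 # raise the resource over-usage alarm
    elif used < 0.9 * soft_quota:
        alarm_on = False                # clear only below 90%: avoids flapping
    return True, alarm_on
```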
In Figure 5-1, both the source LUN and snapshot use a mapping table to access the physical
space. The original data in the source LUN is ABCDE and is saved in sequence in the
physical space. The metadata of the snapshot is null. All read requests to the snapshot are
redirected to the source LUN.
When the source LUN receives a write request that changes C to F, the new data is
written into a new physical space P5 instead of being overwritten in P2.
In the mapping metadata of the source LUN, the system changes L2->P2 to L2->P5.
If the snapshot must be modified, for example, A corresponding to L0 must be changed
to G, the system first writes G to P6 and then changes L0->P0 in the snapshot mapping
table to L0->P6. Data in the source LUN is changed to ABFDE and data in the snapshot
is changed to GBCDE.
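The mechanism in Figure 5-1 can be modeled with two mapping tables. One detail the text leaves implicit is how the snapshot retains the old mapping once the source is rewritten; this sketch preserves it at write time, which is one plausible reading.

```python
# Sketch of ROW snapshot mapping (L = logical address, P = physical space).
source = {"L0": "P0", "L1": "P1", "L2": "P2", "L3": "P3", "L4": "P4"}  # ABCDE
snapshot = {}   # null metadata: snapshot reads redirect to the source LUN

def read_snapshot(lba):
    return snapshot.get(lba, source[lba])

def write_source(lba, new_location):
    if lba not in snapshot:
        snapshot[lba] = source[lba]   # keep the old mapping for the snapshot
    source[lba] = new_location        # redirect; P2 is never overwritten

write_source("L2", "P5")             # C -> F goes to new space P5
assert read_snapshot("L2") == "P2"   # the snapshot still sees the old data
```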
HyperSnap implements writable snapshots by default. All snapshots are readable and writable,
and support snapshot copies and cascading snapshots. You can create a read-only copy of a
snapshot at a specific point in time, or leverage snapshot cascading to create child snapshots
for a parent snapshot. For multi-level cascading snapshots that share a source volume, they
can roll back each other and the source volume regardless of their cascading levels. This is
called cross-level rollback.
In Figure 5-2, Snapshot1 is created for the source volume at 9:00, and Snapshot1.snapshot0
is a cascading snapshot of Snapshot1 at 10:00. The system can roll back the source volume
using Snapshot1.snapshot0 or Snapshot1, or roll back Snapshot1 using
Snapshot1.snapshot0.
HyperSnap supports timed snapshots, which can be triggered weekly, daily, or at a custom
interval (with a minimum interval of 30 seconds). The system supports multiple schedules and
each schedule can have multiple source LUNs. Snapshots of the source LUNs that share a
schedule are in the same consistency group.
HyperSnap supports snapshot consistency groups. For LUNs that are dependent on each other,
you can create a snapshot consistency group for these LUNs to ensure data consistency. For
example, the data files, configuration files, and logs of an Oracle database are usually saved
on different LUNs. Snapshots for these LUNs must be created at the same time to guarantee
that the snapshot data is consistent in time.
Technical Highlights
Zero-duration backup window
A backup window refers to the maximum backup duration tolerated by applications
before data is lost. Traditional backup deteriorates file system performance, or can even
interrupt ongoing applications. Therefore, a traditional backup task can only be executed
after applications are stopped or if the workload is comparatively light. HyperSnap can
back up data online, and requires a backup window that takes almost zero time and does
not interrupt services.
Snapshot creation within seconds
To create a snapshot for a file system, only the root node of the file system needs to be
copied and stored in caches and protected against power failure. This reduces the
snapshot creation time to seconds.
Reduced performance loss
HyperSnap makes it easy to create snapshots for file systems. Only a small amount of
data needs to be stored on disks. After a snapshot is created, the system checks whether
data is protected by a snapshot before releasing the data space. If the data is protected by
a snapshot, the system records the space of the data block that is protected by the
snapshot but is deleted by the file system. This results in a negligible impact on system
performance. Background data space reclamation contends for some CPU and memory
resources with file system services only when the snapshot is deleted. However, the
performance loss remains low.
Less occupied disk capacity
The file system space occupied by a snapshot (a consistent duplicate) of the source file
system depends on the amount of data that changed after the snapshot was generated.
This space never exceeds the size of the file system at the snapshot point in time. For a
file system with little changed data, only a small storage space is required to generate a
consistent duplicate of the file system.
Rapid snapshot data access
A file system snapshot is presented in the root directory of the file system as an
independent directory. Users can access this directory to quickly access the snapshot data.
If snapshot rollback is not required, users can easily access the data at the snapshot point
in time. Users can also recover data by copying the file or directory if the file data in the
file system is corrupted.
If using a Windows client to access a CIFS-based file system, a user can restore a file or
folder to the state at a specific snapshot point in time. To be specific, a user can
right-click the desired file or folder, choose Restore previous versions from the
short-cut menu, and select one option for restoration from the displayed list of available
snapshots containing the previous versions of the file or folder.
Quick file system rollback
Backup data generated by traditional offline backup tasks cannot be read online. A
time-consuming data recovery process is inevitable before a usable duplicate of the
source data at the backup point in time is available. HyperSnap can directly replace the
file system root with specific snapshot root and clear cached data to quickly roll the file
system back to a specific snapshot point in time.
You must exercise caution when using the rollback function because snapshots created
after the rollback point in time are automatically deleted after a file system rollback
succeeds.
Continuous data protection by timing snapshots
HyperSnap enables users to configure policies to automatically create snapshots at
specific time points or at specific intervals.
The maximum number of snapshots for a file system varies depending on the product
model. If the upper limit is exceeded, the earliest snapshots are automatically deleted.
The file system also allows users to periodically delete snapshots.
As time elapses, snapshots are generated at multiple points, implementing continuous
data protection at a low cost. Note that snapshot technology cannot achieve true
continuous data protection: the interval between two snapshots determines the
granularity of continuous data protection.
Technical Highlights
Continuous protection, lossless performance
HyperCDP provides data protection at an interval of seconds, with zero impact on
performance and small space occupation.
Support for scheduled tasks
You can specify HyperCDP schedules by day, week, month, or specific interval, meeting
different backup requirements.
Consistency group
In database applications, the data, configuration files, and logs are usually saved on
different LUNs. The HyperCDP consistency group ensures data consistency between
these LUNs during restoration.
HyperCDP duplicate for reads and writes
Hosts cannot read or write HyperCDP objects directly. To read a HyperCDP object, you
can create a duplicate for it and then map the duplicate to the host. The duplicate has the
same data as the source HyperCDP object and can be read and written by the host. In
addition, the duplicate can be rebuilt from a HyperCDP object at any point in time to obtain
the data at that time.
There are some restrictions when HyperCDP is used with other features of OceanStor
Dorado V3.
Feature | Restriction
HyperSnap | Source LUNs of HyperSnap can be used as the source LUNs of HyperCDP, but snapshot LUNs of HyperSnap cannot be used as the source LUNs of HyperCDP. HyperCDP objects cannot be used as the source LUNs of HyperSnap.
HyperMetro | Member LUNs of HyperMetro can be used as the source LUNs of HyperCDP, but HyperCDP objects cannot be used as member LUNs of HyperMetro. HyperCDP rollback cannot be performed during HyperMetro synchronization.
HyperReplication | The primary and secondary LUNs of HyperReplication can be used as the source LUNs of HyperCDP, but HyperCDP objects cannot be used as the primary or secondary LUNs of HyperReplication. HyperCDP rollback cannot be performed during HyperReplication synchronization.
SmartMigration | Source LUNs of HyperCDP and HyperCDP objects cannot be used as the source or target LUNs of SmartMigration.
HyperClone | Source LUNs of HyperClone can be used as the source LUNs of HyperCDP. Before clone LUNs are split, they cannot be used as the source LUNs of HyperCDP.
SmartVirtualization | Heterogeneous LUNs cannot be used as the source LUNs of HyperCDP.
The target LUN can be empty or have existing data. If the target LUN has data, the data will be overwritten by the
source LUN during synchronization. After the HyperCopy pair is created, you can
synchronize data. During the synchronization, the target LUN can be read and written
immediately. HyperCopy supports consistency groups, incremental synchronization, and
incremental restoration, providing full backup for source LUNs. HyperCopy allows data copy
between controllers, but does not support copy between different arrays.
HyperCopy is typically applied to:
Data backup and restoration
Data analysis
Data reproduction
Figure 5-7 Data synchronization from the source LUN to the target LUN
Restoration
If the source LUN is damaged, data on the target LUN can be restored to the source LUN.
Restoration also supports full and incremental data synchronization. When restoration starts,
the system generates a snapshot for the target LUN and synchronizes the snapshot data to the
source LUN. For incremental restoration, the system compares the data of the source and
target LUNs, and only synchronizes the differential data.
The following figure illustrates the restoration principle.
Figure 5-8 Restoration from the target LUN to the source LUN
Figure 5-9 Reads and writes when HyperCopy is not synchronizing data
For read operations on the target LUN, if the requested data is hit on the target LUN (the
data has been synchronized), the host reads the data from the target LUN. If the
requested data is not hit on the target LUN (the data has not been synchronized), the host
reads the data from the snapshot of the source LUN.
For write operations on the target LUN, if a data block has been synchronized before the
new data is written, the system overwrites this block. If a data block has not been
synchronized, the system writes the new data to this block and stops synchronizing the
source LUN's data to it. This ensures that the target LUN can be read and written before
the synchronization is complete.
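A minimal sketch of this read redirection and write handling on the target LUN (structures are illustrative):

```python
# Sketch of target-LUN reads/writes while a HyperCopy pair is not fully synced.
def read_target(lba, synchronized, target, source_snapshot):
    if lba in synchronized:
        return target[lba]            # hit: already copied to the target LUN
    return source_snapshot[lba]       # miss: redirect to the source's snapshot

def write_target(lba, data, synchronized, target):
    target[lba] = data                # new host data lands on the target
    synchronized.add(lba)             # stop copying this block from the source
```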
Consistency Group
You can add multiple HyperCopy pairs to a consistency group. When you synchronize or
restore a consistency group, data on all member LUNs is always at a consistent point in time,
ensuring data integrity and availability.
When an application server writes new data to the source or clone LUN, the storage system
leverages ROW, which allocates a new storage space for the new data instead of overwriting
the data in the existing storage space. As shown in Figure 5-12, when the application server
attempts to modify data block A in the source LUN, the storage pool allocates a new block
(A1) to store the new data, and retains the original block A. Similarly, when the application
server attempts to modify block D in the clone LUN, the storage pool allocates a new block
(D1) to store the new data, and retains the original block D.
When a clone LUN is split, the storage system copies the data that the clone LUN shares with
the source LUN to new data blocks, and retains the new data that has been written to the clone
LUN. After splitting, the association between the source and clone LUNs is canceled and the
clone LUN becomes an independent physical copy.
OceanStor Dorado V3 supports consistent clones. For LUNs that are dependent on each other,
for example, LUNs that save the data files and logs of a database, you can create clones for
these LUNs' snapshots that were activated simultaneously to ensure data consistency between
the clones.
Both HyperClone and HyperCopy can create a complete copy of data. The following table
compares their similarities and differences.
Working Principle
A clone file system is a readable and writable copy taken from a point in time that is based on
redirect-on-write (ROW) and snapshot technologies.
As shown in Figure a, the storage system writes new or modified data onto the newly
allocated space of the ROW-based file system, instead of overwriting the original data.
The storage system records the point in time of each data write, indicating the write
sequence. The points in time are represented by serial numbers, in ascending order.
As shown in Figure b, the storage system creates a clone file system as follows:
− Creates a read-only snapshot in the parent file system.
− Copies the root node of the snapshot to generate the root node of the clone file
system.
− Creates an initial snapshot in the clone file system.
This process is similar to the process of creating a read-only snapshot during which
no user data is copied. Snapshot creation can be completed in one or two seconds.
Before data is modified, the clone file system shares data with its parent file system.
As shown in Figure c, modifying either the parent file system or the clone file system
does not affect the other system.
− When the application server modifies data block A of the parent file system, the
storage pool allocates new data block A1 to store new data. Data block A is not
released because it is protected by snapshots.
− When the application server modifies data block D of the clone file system, the
storage pool allocates new data block D1 to store new data. Data block D is not
released because its write time is earlier than the creation time of the clone file
system.
Figure d shows the procedure for splitting a clone file system:
− Deletes all read-only snapshots from the clone file system.
− Traverses the data blocks of all objects in the clone file system, and allocates new
data blocks in the clone file system to hold copies of the shared data. This
splits the shared data.
− Deletes the associated snapshots from the parent file system.
After splitting is complete, the clone file system is independent of the parent file
system. The time required to split the clone file system depends on the size of the
shared data.
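The write-time rule described in Figure c can be sketched as follows, using the write-sequence serial numbers introduced earlier; names and values are illustrative.

```python
# Sketch of ROW writes against a clone file system, using write serial numbers.
CLONE_CREATED_AT = 100   # write-sequence serial number when the clone was made

def modify_clone_block(block, new_data, next_serial):
    """Writes go to new blocks; shared blocks stay intact for the parent."""
    shared = block["serial"] < CLONE_CREATED_AT   # written before the clone
    new_block = {"data": new_data, "serial": next_serial}
    # `block` is not released when shared: the parent still references it.
    return new_block, shared
```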
Technical Highlights
Rapid deployment
In most scenarios, a clone file system can be created in seconds and can be accessed
immediately after being created.
Saved storage space
A clone file system shares data with its parent file system and occupies extra storage
space only when it modifies shared data.
Effective performance assurance
HyperClone has a negligible impact on system performance because a clone file system
is created based on the snapshot of the parent file system.
Splitting a clone file system
After a clone file system and its parent file system are split, they become completely
independent of each other.
Technical Highlights
Zero data loss
HyperReplication/S updates data in the primary and secondary LUNs simultaneously,
ensuring zero RPO.
Split mode
HyperReplication/S supports split mode, where write requests of production hosts go
only to the primary LUN, and the difference between the primary and secondary LUNs is
recorded by the differential log. If you want to restore data consistency between the
primary and secondary LUNs, you can manually start synchronization, during which
data blocks marked as differential in the differential log are copied from the primary
LUN to the secondary LUN. The I/O processing is similar to the initial synchronization.
This mode meets user requirements such as temporary link maintenance, network
bandwidth expansion, and saving data at a certain time on the secondary LUN.
Primary/secondary switchover
A primary/secondary switchover is the process where the primary and secondary LUNs
in a remote replication pair exchange roles.
Primary/secondary switchover depends on the secondary LUN's data status, which can be:
− Consistent: Data on the secondary LUN is a duplicate of the primary LUN's data at
the time when the last synchronization was performed. In this state, the secondary
LUN's data is available but not necessarily the same as the current data on the
primary LUN.
− Inconsistent: Data on the secondary LUN is not a duplicate of the primary LUN's
data at the time when the last synchronization was performed and, therefore, is
unavailable.
In the preceding figure, the primary LUN at the primary site becomes the new secondary
LUN after the switchover, and the secondary LUN at the secondary site becomes the new
primary LUN. After the new primary LUN is mapped to the standby hosts at the
secondary site (this can be performed in advance), the standby hosts can take over
services and issue new I/O requests to the new primary LUN. A primary/secondary
switchover can be performed only when data on the secondary LUN is consistent with
that on the primary LUN. Incremental synchronization is performed after a
primary/secondary switchover.
Note the following:
− When the pair is in the normal state, a primary/secondary switchover can be
performed.
− In the split state, a primary/secondary switchover can be performed only when the
secondary LUN is set to writable.
Consistency group
Medium- and large-size databases' data, logs, and modification information are stored on
different LUNs. If data on one of these LUNs is unavailable, data on the other LUNs is
also invalid. Consistency between multiple remote replication pairs must be considered
when remote disaster recovery solutions are implemented on these LUNs.
HyperReplication/S uses consistency groups to maintain the same synchronization pace
between multiple remote replication pairs.
A consistency group is a collection of multiple remote replication pairs that ensures data
consistency when a host writes data to multiple LUNs on a single storage system. After
data is written to a consistency group at the primary site, all data in the consistency
group is simultaneously copied to the secondary LUNs to ensure data integrity and
availability at the secondary site.
HyperReplication/S allows you to add multiple remote replication pairs to a consistency
group. When you set writable secondary LUNs for a consistency group or perform
Technical Highlights
Data compression
Both Fibre Channel and IP links support data compression by using the LZ4 algorithm,
which can be enabled or disabled as required. Data compression reduces the bandwidth
required by asynchronous remote replication. In a test of an Oracle OLTP
application over a 100 Mbit/s link, data compression saved half of the bandwidth.
Quick response to host requests
After a host writes data to the primary LUN at the primary site, the primary site
immediately returns a write success to the host before the data is written to the secondary
LUN. In addition, data is synchronized in the background, which does not affect access
to the primary LUN. HyperReplication/A does not synchronize incremental data from the
primary LUN to the secondary LUN in real time. Therefore, the amount of data loss
depends on the synchronization interval (ranging from 3 seconds to 1440 minutes; 30
seconds by default), which can be specified based on site requirements.
Splitting, switchover of primary and secondary LUNs, and rapid fault recovery
HyperReplication/A supports splitting, synchronization, primary/secondary switchover,
and recovery after disconnection.
Consistency group
Consistency groups apply to databases. Multiple LUNs, such as log LUNs and data
LUNs, can be added to a consistency group so that data on these LUNs is from a
consistent time in the case of periodic synchronization or fault. This facilitates data
recovery at the application layer.
Interoperability with Huawei OceanStor converged storage systems
Developed on the OceanStor OS unified storage software platform, OceanStor Dorado
V3 is compatible with the replication protocols of all Huawei OceanStor converged
storage products. Remote replication can be created among different types of products to
construct a highly flexible disaster recovery solution.
Support for fan-in
HyperReplication of OceanStor Dorado V3 supports data replication from 64 storage
devices to one storage device for central backup (64:1 replication ratio, which is four to
eight times that provided by other vendors). This implements disaster recovery resource
sharing and greatly reduces the disaster recovery cost.
Support for cloud replication
OceanStor Dorado V3 supports CloudReplication, which works with Dedicated
Enterprise Storage Service (DESS) on HUAWEI CLOUD to constitute cloud DR
solutions. You can purchase HUAWEI CLOUD resources on demand to build your DR
centers without the need for on-premises equipment rooms or O&M teams, reducing
costs and improving efficiency.
For more information, see the OceanStor Dorado V3 Series V300R002 HyperReplication
Feature Guide.
Incremental changes made to the primary file system since the last synchronization will be synchronized to the
secondary file system.
Working Principle
Object layer-based replication
HyperReplication/A implements data replication based on the object layer. The files,
directories, and file properties of file systems consist of objects. Object layer-based
replication copies objects from the primary file system to the secondary file system
without considering complex file-level information, such as dependency between files
and directories, and file operations, simplifying the replication process.
Periodical replication based on ROW
HyperReplication/A implements data replication based on ROW snapshots.
− Periodic replication improves replication efficiency and bandwidth utilization.
During a replication period, the data that was written most recently is always copied.
For example, if data in the same file location is modified multiple times, the data
written last is copied.
− File systems and their snapshots employ ROW to process data writes. Regardless of
whether a file system has a snapshot, data is always written to the new address
space, and service performance will not decrease even if snapshots are created.
Therefore, HyperReplication/A has only a slight impact on production service
performance.
Written data is periodically replicated to the secondary file system in the background.
Replication periods are defined by users. Only the addresses, rather than the content, of the
incremental data blocks in each period are recorded.
system is incomplete before all incremental data is completely transferred to the secondary
file system.
After the replication period ends and the secondary file system becomes a point of data
consistency, a snapshot is created for the secondary file system. If the next replication period
is interrupted because the production center malfunctions or the link goes down,
HyperReplication/A can restore the secondary file system data to the last snapshot point,
ensuring consistent data.
1. The production storage system receives a write request from a production host.
2. The production storage system writes the new data to the primary file system and
immediately sends a write acknowledgement to the host.
3. When a replication period starts, HyperReplication/A creates a snapshot for the primary
file system.
4. The production storage system reads and replicates snapshot data to the secondary file
system based on the incremental information received since the last synchronization.
5. After incremental replication is complete, the content of the secondary file system is the
same as the snapshot of the primary file system. The secondary file system becomes the
point of data consistency.
Technical Highlights
Splitting and incremental resynchronization
If you want to suspend data replication from the primary file system to the secondary file
system, you can split the remote replication pair. For HyperReplication/A, splitting stops the ongoing replication process and suspends subsequent periodic replication.
After splitting, if the host writes new data, the incremental information will be recorded.
You can start a synchronization session after splitting. During resynchronization, only
incremental data is replicated.
Splitting applies to device maintenance scenarios, such as storage array upgrades and
replication link changes. In such scenarios, splitting can reduce the number of concurrent
tasks so that the system becomes more reliable. The replication tasks will be resumed or
restarted after maintenance.
Automatic recovery
If data replication from the primary file system to the secondary file system is interrupted
due to a fault, remote replication enters the interrupted state. If the host writes new data
when remote replication is in this state, the incremental information will be recorded.
After the fault is rectified, remote replication recovers automatically and incremental resynchronization is performed (see the sketch after this list).
Readable and writable secondary file system and incremental failback
Normally, a secondary file system is readable but not writable. When accessing a secondary file system, the host reads the data on the snapshots generated during the last synchronization. After the next synchronization is complete, the host reads the data on the new snapshots.
A readable and writable secondary file system applies to scenarios in which backup data
must be accessed during replication.
You can set a secondary file system to readable and writable if the following conditions
are met:
− Initial synchronization has been implemented. For HyperReplication/A, data on the
secondary file system is in the complete state after initial synchronization.
− The remote replication pair is in the split or interrupted state.
If data is being replicated from the primary file system to the secondary file system
(the data is inconsistent on the primary and secondary file systems) and you set the
secondary file system to readable and writable, HyperReplication/A restores the
data in the secondary file system to the point in time at which the last snapshot was
taken.
After the secondary file system is set to readable and writable, HyperReplication/A
records the incremental information about data that the host writes to the secondary
file system for subsequent incremental resynchronization. After replication recovery,
you can replicate incremental data from the primary file system to the secondary
file system or from the secondary file system to the primary file system (a
primary/secondary switchover is required before synchronization). Before a
replication session starts, HyperReplication/A restores target end data to a point in
time at which a snapshot was taken and the data was consistent with source end data.
Then, HyperReplication/A performs incremental resynchronization from the source
end to the target end.
Readable and writable secondary file systems are commonly used in disaster
recovery scenarios.
Primary/Secondary switchover
Primary/secondary switchover exchanges the roles of the primary and secondary file
systems. These roles determine the direction in which the data is copied. Data is always
copied from the primary file system to the secondary file system.
Primary/secondary switchover is commonly used for failback during disaster recovery.
Quick response to host I/Os
All I/Os generated during file system asynchronous remote replication are processed in
the background. A write success acknowledgement is returned immediately after host
data is written to the cache. Incremental information is recorded and snapshots are
created only when data is flushed from the cache to disks. Therefore, the system responds to host I/Os quickly.
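A minimal sketch of the incremental bookkeeping behind splitting and automatic recovery (an assumed structure, not the shipped design): while a pair is split or interrupted, only the addresses of new writes are recorded, so resynchronization copies just those blocks.

    # Illustrative delta tracking for split/interrupted pairs; assumed design.

    class ReplicationPair:
        def __init__(self):
            self.state = "normal"
            self.dirty = set()           # addresses written while not replicating

        def host_write(self, address, data, primary):
            primary.write(address, data)              # host I/O always succeeds
            if self.state in ("split", "interrupted"):
                self.dirty.add(address)               # record the address only

        def resynchronize(self, primary, secondary):
            # Only blocks written during the split/interruption are copied.
            for address in sorted(self.dirty):
                secondary.write(address, primary.read(address))
            self.dirty.clear()
            self.state = "normal"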
HyperMetro provides active-active storage for two data centers located a short distance from each other, such as in the same equipment room or in the same city. HyperMetro supports both Fibre Channel and IP (10GE) networking. It allows two LUNs from separate storage arrays to maintain real-time data consistency and remain accessible to hosts. If one
storage array fails, hosts automatically choose the path to the other storage array for service
access. If the links between storage arrays fail and only one storage array can be accessed by
hosts, the arbitration mechanism uses a quorum server deployed at a third location to
determine which storage array continues providing services.
Service granularity-based arbitration
If the links between two sites fail, HyperMetro can keep some services running preferentially in data center A and others in data center B, based on service configurations. Compared with traditional arbitration, where only one data center continues to provide services, this improves the resource usage of hosts and storage systems and balances service loads. Arbitration is performed at the granularity of LUNs or consistency groups; generally, a service belongs to only one LUN or consistency group (see the arbitration sketch after this list).
Automatic link quality adaptation
If multiple links exist between two data centers, HyperMetro automatically balances
loads among links based on the quality of each link. The system dynamically monitors
link quality and adjusts the load ratio of the links to reduce the retransmission ratio and
improve network performance.
Compatibility with other features
HyperMetro can work with existing features such as HyperSnap, SmartThin,
SmartDedupe, and SmartCompression.
Active and standby quorum servers
The quorum servers can be either physical or virtual machines. HyperMetro can use two quorum servers working in active/standby mode to eliminate a single point of failure and guarantee service continuity.
Expansion to 3DC
HyperMetro can work with HyperReplication/A to form a geo-redundant architecture.
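The service granularity-based arbitration described above can be pictured roughly as follows (a hedged sketch; the real quorum protocol is more involved): each LUN or consistency group is arbitrated independently, so services can keep running at both sites.

    # Illustrative per-LUN/consistency-group arbitration; not the real protocol.

    def arbitrate(services, sites_with_quorum):
        """services: {name: preferred_site}; sites_with_quorum: sites that can
        still reach the quorum server after the inter-site links fail."""
        result = {}
        for name, preferred in services.items():
            if preferred in sites_with_quorum:
                result[name] = preferred              # honor the configuration
            else:
                result[name] = next(iter(sites_with_quorum))   # fail over
        return result

    # OLTP prefers site A, analytics prefers site B; both sites still reach
    # the quorum server, so the load stays balanced across the data centers.
    print(arbitrate({"oltp": "A", "analytics": "B"}, {"A", "B"}))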
Technical Highlights
Gateway-free solution
With the gateway-free design, host I/O requests do not need to be forwarded by a storage gateway, which avoids the corresponding I/O forwarding latency and gateway failure points and improves reliability. The design also simplifies the cross-site high availability (HA) network, making maintenance easier.
Simple networking
The data replication, configuration synchronization, and heartbeat detection links share
the same network, simplifying the networking. Either IP or Fibre Channel links can be used between storage systems, so HyperMetro can run on all-IP networks, which improves cost-effectiveness.
vStore-based HyperMetro
Traditional cross-site HA solutions typically deploy cluster nodes at two sites to
implement cross-site HA. These solutions, however, have limited flexibility in resource
configuration and distribution. HyperMetro can establish pair relationships between two
vStores at different sites, implementing real-time mirroring of data and configurations.
Each vStore pair has an independent arbitration result, providing true cross-site HA
capabilities at the vStore level. HyperMetro also enables applications to run more
efficiently at two sites, ensuring better load balancing. A vStore pair includes a primary
vStore and a secondary vStore. If either storage system in the HyperMetro solution fails, or if the links connecting them go down, HyperMetro performs arbitration on a per vStore pair basis. Paired vStores are mutually redundant, maintaining service continuity in the event of a storage system failure.
Automatic recovery
If site A breaks down, site B becomes the primary site. Once site A recovers, HyperMetro
automatically initiates resynchronization. When resynchronization is complete, the
HyperMetro pair returns to its normal state. If site B then breaks down, site A becomes
the primary site again to maintain host services.
Easy upgrade
To use the HyperMetro feature, upgrade your storage system software to the latest
version and purchase the required feature license. You can establish a HyperMetro
solution between the upgraded storage system and another storage system, without the
need for extra data migration. Users are free to include HyperMetro in initial
configurations or add it later as required.
FastWrite
In a standard SCSI write process, a write request travels between the two data centers twice to complete two interactions: Write Alloc and Write Data. FastWrite optimizes the storage transmission protocol by reserving cache space on the destination array for receiving write requests, so the Write Alloc interaction is omitted and only one round trip is required. FastWrite halves the time required for data synchronization between the two arrays, improving the overall performance of the HyperMetro solution (see the sketch after this list).
Self-adaptation to link quality
If there are multiple links between two data centers, HyperMetro automatically
implements load balancing among these links based on quality. The system dynamically
monitors link quality and adjusts the load ratio between links to minimize the
retransmission rate and improve network performance.
Compatibility with other features
HyperMetro can be used with SmartThin, SmartQoS, and SmartCache. HyperMetro can
also work with HyperVault, HyperSnap, and HyperReplication to form a more complex
and advanced data protection solution, such as the Disaster Recovery Data Center
Solution (Geo-Redundant Mode), which uses HyperMetro and HyperReplication.
Dual quorum servers
HyperMetro supports dual quorum servers. If one quorum server fails, its services are
seamlessly switched to the other, preventing a single point of failure (SPOF) and
improving the reliability of the HyperMetro solution.
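As a back-of-the-envelope model of the FastWrite optimization above (the message names and latency figure are assumptions for illustration):

    # Simplified model of the two cross-site write flows; purely illustrative.

    CROSS_SITE_RTT_MS = 1.0        # assumed round-trip time between the arrays

    def standard_scsi_write():
        # Round trip 1: Write Alloc asks the remote array to allocate buffers.
        # Round trip 2: Write Data sends the payload and waits for the ack.
        return 2 * CROSS_SITE_RTT_MS

    def fastwrite():
        # Cache space is reserved in advance on the destination array, so the
        # Write Alloc exchange is skipped and only the data round trip remains.
        return 1 * CROSS_SITE_RTT_MS

    print(standard_scsi_write(), fastwrite())    # FastWrite halves the latency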
Technical Highlights:
Two HyperMetro or synchronous remote replication sites can be flexibly expanded to
3DC without requiring external gateways.
In the star topology, only incremental synchronization is required in the event of any site
failure.
The star topology supports centralized configuration and management at a single site.
Technical Highlights
High cost efficiency
HyperVault is seamlessly integrated into the primary storage system and provides data backup without additional backup software. Huawei-developed storage management software, OceanStor DeviceManager, allows you to configure flexible backup policies and perform data backup efficiently.
Fast data backup
HyperVault works with HyperSnap to achieve second-level local data backup. For remote backup, the system performs a full backup the first time and then backs up only incremental data blocks, so HyperVault backs up data faster than backup software that copies all data every time (see the sketch after this list).
Fast data recovery
HyperVault uses snapshot rollback technology to implement local data recovery without additional data parsing, achieving second-level data recovery. Remote recovery, which is incremental, can be used when local recovery cannot meet requirements. Each copy of backup data is a logically full backup of service data; it is saved in its original format and can be accessed immediately.
Simple management
Only a primary storage system, a backup storage system, and the native management software, OceanStor DeviceManager, are required. This is simpler and easier to manage than traditional backup deployments, which involve primary storage, backup software, and backup media.
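A rough sketch of the snapshot-driven backup mentioned under "Fast data backup" (the helper names are hypothetical; HyperVault's internals are not public): after the initial full copy, only blocks that differ between the two most recent snapshots are transferred.

    # Illustrative full-then-incremental backup cycle; assumed helper names.

    def hypervault_backup(source, target, prev_snap):
        snap = source.create_snapshot()         # second-level local backup point
        if prev_snap is None:
            blocks = snap.all_blocks()          # first run: full remote backup
        else:
            blocks = snap.diff(prev_snap)       # later runs: changed blocks only
        for address in blocks:
            target.write(address, snap.read(address))
        return snap                             # becomes prev_snap next time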
Working Principle
With WORM, data can be written to a file only once and cannot be rewritten, modified, deleted, or renamed. When a common file system is protected by WORM, files in the WORM file system are read-only within the protection period. After WORM file systems are created, they must be mapped to application servers using the NFS or CIFS protocol.
WORM enables files in a WORM file system to be shifted between initial state, locked state,
appending state, and expired state, preventing important data from being tampered with
within a specified period. Figure 5-22 shows how a file shifts from one state to another.
1. Initial to locked: A file can be shifted from the initial state to the locked state using the
following methods:
− If the automatic lock mode is enabled, the file automatically enters the locked state
after a change is made and a specific period of time expires.
− You can manually set the file to the locked state. Before locking the file, you can
specify a protection period for the file or use the default protection period.
2. Locked to locked: In the locked state, you can manually extend the protection periods of
files. Protection periods cannot be shortened.
3. Locked to expired: After the WORM file system compliance clock reaches the file
overdue time, the file shifts from the locked state to the expired state.
4. Expired to locked: You can extend the protection period of a file to shift it from the
expired state to the locked state.
5. Locked to appending: You can remove the read-only permission of a file to shift it from the locked state to the appending state.
6. Appending to locked: You can manually set a file in the appending state to the locked
state to ensure that it cannot be modified.
7. Expired to appending: You can manually set a file in the expired state to the appending
state.
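The transitions above can be summarized as a small state machine (a sketch under assumptions; the storage system enforces these rules internally):

    # Illustrative WORM file state machine mirroring transitions 1-7 above.

    ALLOWED = {
        ("initial",   "lock"):          "locked",     # 1: manual or automatic lock
        ("locked",    "extend"):        "locked",     # 2: protection period extended
        ("locked",    "expire"):        "expired",    # 3: compliance clock passes
        ("expired",   "extend"):        "locked",     # 4: re-protect an expired file
        ("locked",    "allow_append"):  "appending",  # 5: remove read-only flag
        ("appending", "lock"):          "locked",     # 6: re-lock an appending file
        ("expired",   "allow_append"):  "appending",  # 7: append to an expired file
    }

    def transition(state, action):
        new_state = ALLOWED.get((state, action))
        if new_state is None:
            raise ValueError("WORM forbids '%s' in state '%s'" % (action, state))
        return new_state

    assert transition("initial", "lock") == "locked"
    # Note there is no action for shortening a protection period: it is impossible.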
You can save files to WORM file systems and set the WORM properties of the files to the
locked state based on service requirements. Figure 5-23 shows the reads and writes of files in
all states in a WORM file system.
Technical Highlights:
Data is replicated to the cloud in asynchronous mode. CloudReplication inherits all
functions of HyperReplication/A.
DESS supports interconnection with OceanStor converged storage systems.
No on-premises DR center or O&M team is required. Cloud DR resources can be
purchased or expanded on demand.
Application Scenarios:
If you only have a production center, you can set up a remote DR center on HUAWEI
CLOUD at a low cost, implementing remote protection for production data.
If you have a production center and a DR center, you can upgrade the protection level to
3DC with a remote DR center on HUAWEI CLOUD.
CloudBackup supports:
LUN backup
Consistency group backup
Data restoration to the source LUN or other existing LUNs
Data restoration to the source LUN consistency group or other existing LUN consistency
groups
Backup data compression, which reduces the required backup bandwidth and backup
storage space
Resumable data transfer. If a network fault occurs during backup to the cloud, data
transfer can be resumed once the network recovers.
Offline backup based on the Data Express Service (DES) of HUAWEI CLOUD. Data is
first backed up to the Teleport device of DES. Then the Teleport device is transported to
the nearest data center of HUAWEI CLOUD, where the data is imported to the specified
OBS S3 buckets. This improves data transfer efficiency for the initial backup, and only
incremental backups are required in subsequent operations.
Backup data flow and principles:
1. The system creates a read-only snapshot for the LUN or consistency group to be backed
up.
2. CloudBackup reads data from the read-only snapshot and transfers it to the specified
local NAS share or object storage on the public cloud. If the backup source is a
consistency group, CloudBackup reads data from each read-only snapshot in the
consistency group.
When CloudBackup is reading data, it compares the data with that of the read-only
snapshot created in the last backup, and only transfers the differential data to the backup
storage.
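A hedged sketch of the differential transfer in step 2 (the object keys and helper names are invented for illustration): each changed block is compressed before upload, which is where the bandwidth and storage savings come from.

    import zlib

    # Illustrative differential backup upload; object keys are hypothetical.

    def backup_lun(snapshot, last_snapshot, upload):
        """upload(key, payload) would PUT to a local NAS share or an OBS bucket."""
        for address in snapshot.all_blocks():
            data = snapshot.read(address)
            if last_snapshot is not None and last_snapshot.read(address) == data:
                continue                        # unchanged since the last backup
            # Compression reduces both backup bandwidth and storage space.
            upload("backup/block-%d" % address, zlib.compress(data))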
Restoration data flow and principles:
1. Select the desired backup image from the local NAS share or public cloud. (The data set
generated when a LUN or consistency group is backed up is a backup image. A LUN or
consistency group has multiple backup images at different time points.)
2. Select the LUN or consistency group to be restored.
3. Restore the data. During restoration, CloudBackup reads data from the specified backup
image on the local NAS share or public cloud and writes the data to the LUN or
consistency group.
Technical Highlights:
Data can be backed up without purchasing external backup servers.
Data can be backed up to the cloud. With BCManager and CSBS, data can be recovered quickly, and customers can test and analyze source LUN data on the cloud.
Data can be backed up to local NAS and object storage.
SED
SEDs provide two layers of security protection by using an authentication key (AK) and a data encryption key (DEK).
An AK authenticates the identity during disk initialization.
A DEK encrypts and decrypts data when it is written to or read from SEDs.
AK mechanism: After data encryption is enabled, the storage system activates the AutoLock function of SEDs and uses AKs assigned by a key manager. SED access is protected by AutoLock, and only the storage system itself can access its SEDs. When the storage system accesses an SED, it acquires the AK from the key manager. If the AK matches the SED's, the SED unlocks the DEK for data encryption and decryption. If the AKs do not match, all read and write operations fail.
DEK mechanism: After AutoLock authentication succeeds, the SED uses its hardware circuits and internal DEK to encrypt or decrypt the data that is written or read. When you write data, the AES encryption engine encrypts the data into ciphertext using the DEK before it is stored. When you read data, the SED decrypts the requested data into plaintext using the DEK. The DEK cannot be acquired separately, so the original information on an SED cannot be read directly after the disk is removed from the storage system.
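The two-layer protection can be modeled roughly as follows (a conceptual sketch: real SEDs implement this in dedicated hardware with AES engines, and the keys and cipher here are placeholders):

    # Conceptual model of SED AutoLock (AK) plus data encryption (DEK).

    def xor_cipher(data, key):
        # Toy stand-in for the hardware AES engine; only shows where the DEK acts.
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    class SelfEncryptingDrive:
        def __init__(self, ak, dek):
            self._ak = ak            # authentication key checked by AutoLock
            self._dek = dek          # data encryption key; never leaves the drive
            self.unlocked = False

        def unlock(self, ak_from_key_manager):
            # The storage system fetches the AK from the key manager; on a
            # mismatch the drive stays locked and all I/O fails.
            self.unlocked = (ak_from_key_manager == self._ak)
            return self.unlocked

        def write(self, plaintext):
            if not self.unlocked:
                raise PermissionError("AutoLock: AK mismatch, I/O rejected")
            return xor_cipher(plaintext, self._dek)    # stored as ciphertext

        def read(self, ciphertext):
            if not self.unlocked:
                raise PermissionError("AutoLock: AK mismatch, I/O rejected")
            return xor_cipher(ciphertext, self._dek)   # returned as plaintext

    sed = SelfEncryptingDrive(ak=b"ak", dek=b"dek")
    sed.unlock(b"ak")                        # correct AK: AutoLock opens the disk
    assert sed.read(sed.write(b"secret")) == b"secret"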
User-defined roles: The system allows you to define permissions as required. You can
specify the role when creating a user account.
8.1.1 DeviceManager
DeviceManager is a common GUI management system for Huawei OceanStor systems and is accessed through a web page. The GUI uses HTTP to communicate with Dorado V3. Most system operations can be performed in DeviceManager, but certain operations must be run in the CLI.
8.1.2 CLI
The CLI allows administrators and other system users to perform supported operations. You can configure key-based SSH access, which lets users run scripts from a remote host and log in to the CLI remotely without saving passwords in the scripts.
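For example, a script on a remote host could log in with a key instead of a password, roughly as follows (a sketch: the address, user name, key path, and command are placeholders, and paramiko is just one possible SSH client library):

    import paramiko

    # Key-based login keeps passwords out of the script; the key pair must
    # first be registered for the CLI user on the storage system.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(
        hostname="dorado-mgmt.example.com",    # placeholder management address
        username="cli_admin",                  # placeholder CLI user
        key_filename="/home/ops/.ssh/id_rsa",  # private key instead of a password
    )
    stdin, stdout, stderr = client.exec_command("show system general")  # example command
    print(stdout.read().decode())
    client.close()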
The eService remote maintenance function periodically collects device information and reports it to Huawei technical support. Customers must ensure that devices can connect to Huawei technical support over a network. HTTP proxy is supported.
The following information is collected:
Device performance statistics
Device running data
Device alarm data
All data is sent to Huawei technical support in text format over HTTPS. Records of sent information can be forwarded to a Syslog server for security auditing. If data cannot be uploaded due to a network interruption, devices retain the last day's data files (up to 5 MB per controller) and send them when the network recovers. Files that were not uploaded can also be exported through the command line for troubleshooting.
The information sent to Huawei technical support can be used to provide the following
functions.
Alarm monitoring: Device alarms are monitored 24/7. If a fault occurs on a device,
Huawei technical support is notified within 1 minute and a troubleshooting ticket is
dispatched to engineers. This helps customers locate and resolve problems quickly.
Fault prevention and fast troubleshooting are supported by big data analysis technologies and global device fault libraries.
Based on industry application workload models, optimal device configurations and
performance optimization suggestions are provided.
8.1.5 SNMP
SNMP interfaces can be used to report alarms and connect to northbound management
interfaces.
8.1.6 SMI-S
SMI-S interfaces support hardware and service configuration and connect to northbound
management interfaces.
8.1.7 Tools
OceanStor Dorado V3 provides diversified tools for pre-sales assessment and post-sales
delivery. These tools can be accessed through the web, SmartKit, DeviceManager, SystemReporter, and eService, and they effectively help deploy, monitor, analyze, and maintain OceanStor Dorado V3.
9 Best Practices
10 Appendix
10.2 Feedback
Huawei welcomes your suggestions for improving our documentation. If you have comments,
send your feedback to [email protected].
We will carefully consider your suggestions and make any necessary changes to the document in the next release.