
Data Storage Technology

Basic Storage Technologies

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any
means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

HUAWEI and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of
their respective holders.

Notice
The purchased products, services and features are stipulated by the contract made
between Huawei and the customer. All or part of the products, services and features
described in this document may not be within the purchase scope or the usage scope.
Unless otherwise specified in the contract, all statements, information, and
recommendations in this document are provided "AS IS" without warranties,
guarantees or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has
been made in the preparation of this document to ensure accuracy of the contents, but
all statements, information, and recommendations in this document do not constitute
a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.


Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129
People's Republic of China
Website: http://e.huawei.com


Contents

1 Basic Storage Technologies
1.1 Intelligent Storage Components
1.1.1 Controller Enclosure
1.1.2 Disk Enclosure
1.1.3 Expansion Module
1.1.4 HDD
1.1.5 SSD
1.1.6 Interface Module
1.1.7 Quiz
1.2 RAID Technologies
1.2.1 Traditional RAID
1.2.2 RAID 2.0+
1.2.3 Other RAID Technologies
1.2.4 Quiz
1.3 Storage System Architecture
1.3.1 Storage System Architecture Evolution
1.3.2 Storage System Expansion Methods
1.3.3 Huawei Storage Product Architecture
1.3.4 Quiz
1.4 Storage Network Architecture
1.4.1 DAS
1.4.2 NAS
1.4.3 SAN
1.4.4 Distributed Storage
1.4.5 Quiz
1.5 Common Storage Protocols
1.5.1 SAS and SATA
1.5.2 PCIe and NVMe
1.5.3 RDMA and IB
1.5.4 Quiz

1 Basic Storage Technologies

1.1 Intelligent Storage Components


1.1.1 Controller Enclosure
1.1.1.1 Controller Enclosure Design
A controller enclosure contains controllers and is the core component of a storage
system.
The controller enclosure uses a modular design with a system subrack, controllers (with
built-in fan modules), BBUs, power modules, management modules, and interface
modules.
 The system subrack integrates a backplane to provide signal and power connectivity
among modules.
 The controller is a core module for processing storage system services.
 BBUs protect storage system data by providing backup power during failures of the
external power supply.
 The AC power module supplies power to the controller enclosure, allowing the
enclosure to operate normally even at maximum power consumption.
 The management module provides management, maintenance, and serial ports.
 Interface modules provide service or management ports and are field replaceable
units.
In computer science, data is a generic term for all media, such as numbers, letters,
symbols, and analog parameters, that can be input to and processed by computer
programs. Computers store and process a wide range of objects that generate complex
data.

1.1.1.2 Controller Enclosure Components


 A controller is the core component of a storage system. It processes storage services,
receives configuration management commands, saves configuration data, connects
to disks, and saves critical data to coffer disks.
− The controller CPU and cache process I/O requests from the host and manage
storage system RAID.
− Each controller has built-in disks to store system data. These disks also store
cache data during power failures. Disks on different controllers are redundant with
each other.
 Front-end (FE) ports provide service communication between application servers and
the storage system for processing host I/Os.

 Back-end (BE) ports connect a controller enclosure to a disk enclosure and provide
disks with channels for reading and writing data.
 A cache is a memory chip on a disk controller. It provides fast data access and is a
buffer between the internal storage and external interfaces.
 An engine is a core component that drives a program or system on an electronic
platform and usually provides support for one or more programs or systems.
 Coffer disks store user data, system configurations, logs, and dirty data in the cache
to protect against unexpected power outages.
− Built-in coffer disk: Each controller of Huawei OceanStor Dorado V6 has one or
two built-in SSDs as coffer disks. See the product documentation for more
details.
− External coffer disk: The storage system automatically selects four disks as coffer
disks. Each coffer disk provides 2 GB space to form a RAID 1 group. The
remaining space can store service data. If a coffer disk is faulty, the system
automatically replaces the faulty coffer disk with a normal disk for redundancy.
 Power module: The controller enclosure employs an AC power module for its normal
operations.
− A 4 U controller enclosure has four power modules (PSU 0, PSU 1, PSU 2, and
PSU 3). PSU 0 and PSU 1 form a power plane to power controllers A and C and
provide mutual redundancy. PSU 2 and PSU 3 form the other power plane to
power controllers B and D and provide mutual redundancy. It is recommended
that you connect PSU 0 and PSU 2 to one PDU and PSU 1 and PSU 3 to another
PDU for maximum reliability.
− A 2 U controller enclosure has two power modules (PSU 0 and PSU 1) to power
controllers A and B. The two power modules form a power plane and provide
mutual redundancy. Connect PSU 0 and PSU 1 to different PDUs for maximum
reliability.

1.1.2 Disk Enclosure


1.1.2.1 Disk Enclosure Design
The disk enclosure uses a modular design with a system subrack, expansion modules,
power modules, and disks.
 The system subrack integrates a backplane to provide signal and power connectivity
among modules.
 The expansion module provides expansion ports to connect to a controller enclosure
or another disk enclosure for data transmission.
 The power module supplies power to the disk enclosure, allowing the enclosure to
operate normally even at maximum power consumption.
 Disks provide storage space for the storage system to save service data, system data,
and cache data. Specific disks are used as coffer disks.

1.1.3 Expansion Module


1.1.3.1 Expansion Module
Each expansion module provides one P0 and one P1 expansion port to connect to a
controller enclosure or another disk enclosure for data transmission.

1.1.3.2 CE Switch
Huawei CloudEngine series fixed switches are next-generation Ethernet switches for data
centers and provide high performance, high port density, and low latency. The switches
use a flexible front-to-rear or rear-to-front design for airflow and support IP SANs and
distributed storage networks.

1.1.3.3 Fibre Channel Switch


Fibre Channel switches are high-speed network transmission relay devices that transmit
data over optical fibers. They accelerate transmission and protect against interference.
Fibre Channel switches are used on FC SANs.

1.1.3.4 Device Cables


A serial cable connects the serial port of the storage system to the maintenance terminal.
Mini SAS HD cables connect to expansion ports on controller and disk enclosures. There
are mini SAS HD electrical cables and mini SAS HD optical cables.
An active optical cable (AOC) connects a PCIe port on a controller enclosure to a data
switch.
100G QSFP28 cables are for direct connection between controllers or for connection to
smart disk enclosures.
25G SFP28 cables are for front-end networking.
Fourteen Data Rate (FDR) cables are dedicated for 56 Gbit/s IB interface modules.
Optical fibers connect the storage system to Fibre Channel switches. One end of the
optical fiber connects to a Fibre Channel host bus adapter (HBA), and the other end
connects to the Fibre Channel switch or the storage system. An optical fiber uses LC
connectors at both ends. MPO-4*DLC optical fibers are dedicated for 8 Gbit/s Fibre
Channel interface modules with 8 ports and 16 Gbit/s Fibre Channel interface modules
with 8 ports, and are used to connect the storage system to Fibre Channel switches.

1.1.4 HDD
1.1.4.1 HDD Structure
 A platter is coated with magnetic material on both surfaces. The polarity of the
magnetic grains on each surface represents a binary information unit, or bit.
 A read/write head reads and writes data for platters. It changes the polarities of
magnetic grains on the platter surface to save data.
 The actuator arm moves the read/write head to the specified position.
 The spindle has a motor and bearing underneath. It rotates the platters so that the
specified position on a platter moves under the read/write head.

 The control circuit controls the speed of the platter and movement of the actuator
arm, and delivers commands to the head.

1.1.4.2 HDD Design


Each disk platter has two read/write heads to read and write data on the two surfaces of
the platter.
Airflow prevents the head from touching the platter, so the head can move between
tracks at a high speed. A long distance between the head and the platter results in weak
signals, and a short distance may cause the head to rub against the platter surface. The
platter surface must therefore be smooth and flat. Any foreign matter or dust will
shorten the distance and cause the head to rub against the magnetic surface. This will
result in permanent data corruption.
Working principles:
 The read/write head starts in the landing zone near the platter spindle.
 The spindle connects to all of the platters and a motor. The spindle motor rotates at
a constant speed to drive the platters.
 When the spindle rotates, there is a small gap between the head and the platter.
This is called the flying height of the head.
 The head is attached to the end of the actuator arm, which drives the head to the
specified position above the platter where data needs to be written or read.
 The head reads and writes data in binary format on the platter surface. The read
data is stored in the disk buffer and then transmitted to the program.

1.1.4.3 Data Organization on a Disk


 Platter surface: Each platter of a disk has two valid surfaces to store data. All valid
surfaces are numbered in sequence, starting from 0 for the top. A surface number in
a disk system is also referred to as a head number because each valid surface has a
read/write head.
 Track: Tracks are the concentric circles for data recording around the spindle on a
platter. Tracks are numbered from the outermost circle to the innermost one,
starting from 0. Each platter surface has 300 to 1024 tracks. New types of large-
capacity disks have even more tracks on each surface. The tracks per inch (TPI) on a
platter are generally used to measure the track density. Tracks are only magnetized
areas on the platter surfaces and are invisible to human eyes.
 Cylinder: A cylinder is formed by tracks with the same number on all platter surfaces
of a disk. The heads of each cylinder are numbered from top to bottom, starting
from 0. Data is read and written based on cylinders. Head 0 in a cylinder reads and
writes data first, and then the other heads in the same cylinder read and write data
in sequence. After all heads in a cylinder have completed reads and writes, the heads
move to the next cylinder. Selection of cylinders is a mechanical switching process
called seek. The position of heads in a disk is generally indicated by the cylinder
number instead of the track number.
 Sector: Each track is divided into smaller units called sectors to arrange data orderly.
A sector is the smallest storage unit that can be independently addressed in a disk.
Tracks may vary in the number of sectors. A sector can generally store 512 bytes of
user data, but some disks can be formatted into even larger sectors of 4 KB.
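The cylinder-head-sector layout described in the list above can be illustrated with a short sketch. The following Python snippet uses a hypothetical disk geometry (not taken from any specific product) only to show the fill order: all heads of one cylinder are used before the heads seek to the next cylinder.

```python
# Sketch of CHS ordering with a hypothetical geometry: logical sector n is
# mapped to (cylinder, head, sector), filling every head of a cylinder
# before a seek to the next cylinder is needed.

HEADS = 4                # hypothetical: 2 platters x 2 surfaces
SECTORS_PER_TRACK = 63   # hypothetical: sectors on a track are numbered from 1

def chs(logical_sector: int):
    """Map a 0-based logical sector number to a (cylinder, head, sector) tuple."""
    cylinder = logical_sector // (HEADS * SECTORS_PER_TRACK)
    head = (logical_sector // SECTORS_PER_TRACK) % HEADS
    sector = logical_sector % SECTORS_PER_TRACK + 1
    return cylinder, head, sector

print(chs(0))     # (0, 0, 1): head 0 of cylinder 0 is used first
print(chs(63))    # (0, 1, 1): next head in the same cylinder, no seek
print(chs(252))   # (1, 0, 1): cylinder 0 is full, a seek to cylinder 1 occurs
```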

1.1.4.4 Disk Capacity


Disks may have one or multiple platters. However, a disk allows only one head to read
and write data at a time. As a result, increasing the number of platters and heads can
only improve the disk capacity. The throughput or I/O performance of the disk will not
change.
Disk capacity = Number of cylinders x Number of heads x Number of sectors per track x
Sector size. The unit is MB or GB. The disk capacity is determined by the capacity of a
single platter and the number of platters.
The superior processing speed of a CPU over a disk forces the CPU to wait until the disk
completes a read/write operation before issuing a new command. Adding a cache to the
disk to improve the read/write speed solves this problem.
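A worked example of the capacity formula above, using hypothetical geometry values chosen only for illustration:

```python
# Disk capacity = cylinders x heads x sectors per track x sector size
cylinders = 16_383
heads = 16
sectors_per_track = 63
sector_size = 512    # bytes

capacity_bytes = cylinders * heads * sectors_per_track * sector_size
print(f"{capacity_bytes / 1000**3:.2f} GB")   # about 8.46 GB (decimal)
```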

1.1.4.5 Disk Performance Factors


 Rotation speed: This is the number of platter revolutions per minute (rpm). When
data is being read or written, the platter rotates while the head stays still. Fast
platter rotations shorten data transmission time. When processing sequential I/Os,
the actuator arm avoids frequent seeking, so the rotation speed is the primary factor
in determining throughput and IOPS.
 Seek speed: The actuator arm must change tracks frequently for random I/Os. Track
changes take much longer time than data transmission. An actuator arm with a
faster seek speed can therefore improve the IOPS of random I/Os.
 Single platter capacity: A larger capacity for a single platter increases data storage
within a unit of space for a higher data density. A higher data density with the same
rotation and seek speed gives disks better performance.
 Port speed: In theory, the current port speed is enough to support the maximum
external transmission bandwidth of disks. The seek speed is the bottleneck for
random I/Os with port speed having little impact on performance.

1.1.4.6 Average Access Time


 The average seek time is the average time required for a head to move from its
initial position to a specified platter track. This is an important metric for the internal
transfer rate of a disk and should be as short as possible.
 The average latency time is how long a head must wait for a sector to move to the
specified position after the head has reached the desired track. The average latency
is generally half of the time required for the platter to rotate a full circle. Faster
rotations therefore decrease latency.

1.1.4.7 Data Transfer Rate


The data transfer rate of a disk refers to how fast the disk can read and write data. It
includes the internal and external data transfer rates and is measured in MB/s.
 Internal transfer rate is also called sustained transfer rate. It is the highest rate at
which a head reads and writes data. This excludes the seek time and the delay for
the sector to move to the head. It is a measurement based on an ideal situation
where the head does not need to change the track or read a specified sector, but
reads and writes all sectors sequentially and cyclically on one track.
 External transfer rate is also called burst data transfer rate or interface transfer rate.
It refers to the data transfer rate between the system bus and the disk buffer and
depends on the disk port type and buffer size.

1.1.4.8 Disk IOPS and Transmission Bandwidth


IOPS is calculated using the seek time, rotation latency, and data transmission time.
 Seek time: The shorter the seek time, the faster the I/O. The current average seek
time is 3 to 15 ms.
 Rotation latency: It refers to the time required for the platter to rotate the sector of
the target data to the position below the head. The rotation latency depends on the
rotation speed. Generally, the latency is half of the time required for the platter to
rotate a full circle. For example, the average rotation latency of a 7200 rpm disk is
about 4.17 ms (60 x 1000/7200/2), and the average rotation latency of a 15000 rpm
disk is about 2 ms.
 Data transmission time: It is the time required for transmitting the requested data
and can be calculated by dividing the data size by the data transfer rate. For
example, the data transfer rate of IDE/ATA disks can reach 133 MB/s and that of
SATA II disks can reach 300 MB/s.
 Random I/Os require the head to change tracks frequently. The data transmission
time is much shorter than the time for track changes. In this case, data transmission
time can be ignored.
Theoretically, the maximum IOPS of a disk can be calculated using the following formula:
IOPS = 1000 ms/(Seek time + Rotation latency). The data transmission time is ignored.
For example, if the average seek time is 3 ms, the theoretical maximum IOPS for 7200
rpm, 10k rpm, and 15k rpm disks is 140, 167, and 200, respectively.
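The latency and IOPS figures above can be reproduced with the following short calculation. It is a direct sketch of the stated formulas; the 3 ms average seek time is the example value used in the text.

```python
# Rotation latency is half a revolution: 60 * 1000 / rpm / 2 (in ms).
# Theoretical maximum IOPS = 1000 / (seek time + rotation latency),
# ignoring the data transmission time as described above.

def rotation_latency_ms(rpm: int) -> float:
    return 60 * 1000 / rpm / 2

def max_iops(rpm: int, seek_ms: float = 3.0) -> float:
    return 1000 / (seek_ms + rotation_latency_ms(rpm))

for rpm in (7200, 10_000, 15_000):
    print(f"{rpm} rpm: latency {rotation_latency_ms(rpm):.2f} ms, "
          f"max IOPS {max_iops(rpm):.0f}")
# 7200 rpm: latency 4.17 ms, max IOPS 140
# 10000 rpm: latency 3.00 ms, max IOPS 167
# 15000 rpm: latency 2.00 ms, max IOPS 200
```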

1.1.4.9 Transmission Mode


Parallel transmission:
 Parallel transmission features high efficiency, short distances, and low frequency.
 In long-distance transmission, using multiple lines is more expensive than using a
single line.
 Long-distance transmission requires thicker conducting wires to reduce signal
attenuation, but it is difficult to bundle them into a single cable.
 In long-distance transmission, the time for data on each line to reach the peer end
varies due to wire resistance or other factors. The next transmission can be initiated
only after data on all lines has reached the peer end.
 High transmission frequency causes serious circuit oscillation and generates
interference between the lines. The frequency of parallel transmission must therefore
be carefully set.
Serial transmission:

 Serial transmission is less efficient than parallel transmission, but is generally faster
with potential increases in transmission speed from increasing the transmission
frequency.
 Serial transmission is used for long-distance transmission and is now common for
peripheral interfaces. The PCIe interface is a typical example of serial transmission:
the transmission rate of a single lane is up to 2.5 Gbit/s (PCIe 1.0).

1.1.4.10 Disk Ports


Disks are classified into IDE, SCSI, SATA, SAS, and Fibre Channel disks by port. These disks
also differ in their mechanical bases.
IDE and SATA disks use the ATA mechanical base and are suitable for single-task
processing.
SCSI, SAS, and Fibre Channel disks use the SCSI mechanical base and are suitable for
multi-task processing.
Comparison:
 SCSI disks provide faster processing than ATA disks under high data throughput.
 ATA disks overheat during multi-task processing due to the frequent movement of
the read/write head.
 SCSI disks provide higher reliability than ATA disks.
IDE disk port:
 Multiple ATA versions have been released, including ATA-1 (IDE), ATA-2 (Enhanced
IDE/Fast ATA), ATA-3 (Fast ATA-2), ATA-4 (ATA33), ATA-5 (ATA66), ATA-6
(ATA100), and ATA-7 (ATA133).
 ATA ports have several advantages and disadvantages:
− Their strengths are their low price and good compatibility.
− Their disadvantages are their low speed, limited applications, and strict
restrictions on cable length.
− The transmission rate of the PATA port is also inadequate for current user needs.
SATA port:
 During data transmission, the data and signal lines are separated and use
independent transmission clock frequencies. The transmission rate of SATA is far
higher than that of PATA.
 Advantages:
− A SATA port generally has 7+15 pins, uses a single channel, and transmits data
faster than ATA.
− SATA uses the cyclic redundancy check (CRC) for instructions and data packets
to ensure data transmission reliability.
− SATA surpasses ATA in interference protection.
SCSI port:
 SCSI disks were developed to replace IDE disks to provide higher rotation speed and
transmission rate. SCSI was originally a bus-type interface and worked independently
of the system bus.

 Advantages:
− It is applicable to a wide range of devices. One SCSI controller card can connect
to 15 devices simultaneously.
− It provides high performance with multi-task processing, low CPU usage, fast
rotation speed, and a high transmission rate.
− SCSI disks support diverse applications as external or built-in components with
hot-swappable replacement.
 Disadvantages:
− High cost and complex installation and configuration.
SAS port:
 SAS is similar to SATA in its use of a serial architecture for a high transmission rate
and streamlined internal space with shorter internal connections.
 SAS improves the efficiency, availability, and scalability of the storage system. It is
backward compatible with SATA for the physical and protocol layers.
 Advantages:
− SAS is superior to SCSI in its transmission rate, anti-interference, and longer
connection distances.
 Disadvantages:
− SAS disks are more expensive.
Fibre Channel port:
 Fibre Channel was originally designed for network transmission rather than disk
ports. It has gradually been applied to disk systems in pursuit of higher speed.
 Advantages:
− Easy to upgrade. Supports optical fiber cables with a length over 10 km.
− Large bandwidth
− Strong universality
 Disadvantages:
− High cost
− Complex to build

1.1.5 SSD
1.1.5.1 SSD Overview
Traditional disks use magnetic materials to store data, but SSDs use NAND flash with
cells as storage units. NAND flash is a non-volatile random access storage medium that
can retain stored data after the power is turned off. It quickly and compactly stores
digital information.
SSDs eliminate high-speed rotational components for higher performance, lower power
consumption, and zero noise.
SSDs do not have mechanical parts, but this does not mean that they have an infinite life
cycle. Because NAND flash is a non-volatile medium, original data must be erased before
new data can be written. However, there is a limit to how many times each cell can be
erased. Once the limit is reached, data reads and writes become invalid on that cell.

1.1.5.2 SSD Architecture


The host interface is the protocol and physical interface for the host to access an SSD.
Common interfaces are SATA, SAS, and PCIe.
The SSD controller is the core SSD component for read and write access between a host
and the back-end media and for protocol conversion, table entry management, data
caching, and data checking.
DRAM is the cache for the flash translation layer (FTL) entries and data.
NAND flash is a non-volatile random access storage medium that stores data.
Multiple channels work concurrently, with time-division multiplexing of the flash
granules on each channel. TCQ and NCQ are also supported, allowing simultaneous
responses to multiple I/O requests.

1.1.5.3 NAND Flash


Internal storage units of NAND flash include LUNs, planes, blocks, pages, and cells.
NAND flash stores data using floating gate transistors. The threshold voltage changes
based on the number of electric charges stored in a floating gate. Data is then
represented using the read voltage of the transistor threshold.
 A LUN is the smallest physical unit that can be independently encapsulated and
typically contains multiple planes.
 A plane has an independent page register. It typically contains 1,000 to 2,000 odd-
numbered or even-numbered blocks.
 A block is the smallest erasure unit and generally consists of multiple pages.
 A page is the smallest programming and read unit and is usually 16 KB.
 A cell is the smallest physical storage unit within a page. A cell corresponds to a
floating gate transistor that stores one or multiple bits.
A page is the basic unit of programming and reading, and a block is the basic unit of
erasing.
Each P/E cycle causes some damage to the insulation layer of the floating gate transistor.
If block erasure or programming fails, the block is labeled a bad block. When the number
of bad blocks reaches a threshold (4%), the NAND flash reaches the end of its service
life.

1.1.5.4 SLC, MLC, TLC, and QLC


NAND flash chips have the following classifications based on the number of bits stored in
a cell:
 A single level cell (SLC) can store one bit of data: 0 or 1.
 A multi level cell (MLC) can store two bits of data: 00, 01, 10, and 11.
 A triple level cell (TLC) can store three bits of data: 000, 001, 010, 011, 100, 101, 110,
and 111.
 A quad level cell (QLC) can store four bits of data: 0000, 0001, 0010, 0011, 0100,
0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, and 1111.

These four types of cells have similar costs but store different amounts of data.
Originally, the capacity of an SSD was only 64 GB or smaller. Now, a TLC SSD can store
up to 2 TB of data. However, each cell type has a different life cycle, resulting in different
SSD reliability. The life cycle is also an important factor in selecting SSDs.
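The number of voltage states a cell must distinguish grows exponentially with the number of bits stored, as the following minimal sketch shows (endurance and capacity figures are not modeled here):

```python
# Each additional bit per cell doubles the number of voltage states that
# must be distinguished (2**bits), which is why higher-density cell types
# generally have shorter life cycles.
for name, bits in (("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)):
    print(f"{name}: {bits} bit(s) per cell, {2 ** bits} voltage states")
```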

1.1.5.5 Flash Chip Data Relationship


This slide shows the logic diagram of a flash chip (Toshiba 3D-TLC).
 A page is logically formed by 146,688 cells. Each page can store 16 KB of content
and 1952 bytes of ECC data. A page is the minimum I/O unit of the flash chip.
 Every 768 pages form a block. Every 1478 blocks form a plane.
 A flash chip consists of two planes, with one storing blocks with odd sequence
numbers and the other storing even sequence numbers. The two planes can be
operated concurrently.
ECC must be performed on the data stored in the NAND flash, so the page size in the
NAND flash is not exactly 16 KB, but includes an extra group of bytes. For example, the
actual size of a 16 KB page is 16,384 + 1,952 bytes: the 16,384 bytes are for data
storage, and the 1,952 bytes store the data check codes for ECC.

1.1.5.6 Address Mapping Management


The logical block address (LBA) may refer to an address of a data block or the data block
that the address indicates.
PBA: physical block address
The host accesses the SSD through the LBA. Each LBA generally represents a sector of
512 bytes. The host OS accesses the SSD in units of 4 KB. The basic unit for the host to
access the SSD is called host page.
The flash page of an SSD is the basic unit for the SSD controller to access the flash chip,
which is also called the physical page. Each time the host writes a host page, the SSD
controller writes it to a physical page and records their mapping relationship.
When the host reads a host page, the SSD finds the requested data according to the
mapping relationship.
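The mapping behavior described above can be sketched with a highly simplified flash translation layer. The class and names below are hypothetical; a real FTL also handles garbage collection, wear leveling, and persistence of the mapping table.

```python
# Hypothetical, simplified FTL sketch: every host-page write goes to a new
# physical page and updates the mapping table; the previously mapped
# physical page becomes invalid (garbage) data.

class SimpleFTL:
    def __init__(self):
        self.flash = {}        # physical page -> stored data (the NAND)
        self.l2p = {}          # host (logical) page -> physical page
        self.next_free = 0     # next never-written physical page (no GC here)
        self.invalid = set()   # physical pages holding aged/invalid data

    def write(self, host_page: int, data) -> None:
        if host_page in self.l2p:               # rewrite: old page becomes garbage
            self.invalid.add(self.l2p[host_page])
        phys = self.next_free
        self.next_free += 1
        self.flash[phys] = data
        self.l2p[host_page] = phys              # record the new mapping

    def read(self, host_page: int):
        return self.flash[self.l2p[host_page]]  # follow the mapping to the data

ftl = SimpleFTL()
ftl.write(0, "A")      # host page 0 -> physical page 0
ftl.write(0, "A2")     # rewrite: data goes to physical page 1, page 0 is garbage
print(ftl.read(0), ftl.invalid)   # A2 {0}
```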

1.1.5.7 SSD Read and Write Process


SSD write process:
 The SSD controller connects to eight flash dies through eight channels. For better
explanation, the figure shows only one block in each die. Each 4 KB square in the
blocks represents a page.
− The host writes 4 kilobytes to the block of channel 0 to occupy one page.
− The host continues to write 16 kilobytes. This example shows 4 kilobytes being
written to each block of channels 1 through 4.
− The host continues to write data to the blocks until all blocks are full.
 When the blocks on all channels are full, the SSD controller selects a new block to
write data in the same way.
 Green indicates valid data and red indicates invalid data. Unnecessary data in the
blocks becomes aged or invalid, and its mapping relationship is replaced.

 For example, host page A was originally stored in flash page X, and the mapping
relationship was A to X. Later, the host rewrites the host page. Flash memory does
not overwrite data, so the SSD writes the new data to a new page Y, establishes the
new mapping relationship of A to Y, and cancels the original mapping relationship.
The data in page X becomes aged and invalid, which is also known as garbage data.
 The host continues to write data to the SSD until it is full. In this case, the host
cannot write more data unless the garbage data is cleared.
SSD read process:
 An 8-fold increase in read speed depends on whether the read data is evenly
distributed in the blocks of each channel. If the 32 KB data is stored in the blocks of
channels 1 through 4, the read speed can only support a 4-fold improvement at
most. That is why smaller files are transmitted at a slower rate.
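A rough sketch of the channel-parallelism point above. The channel count and the assumption that the speedup is bounded by the number of channels the data spans are illustrative simplifications.

```python
# The achievable read speedup is limited by how many channels the requested
# data actually spans, not by the total number of channels in the SSD.
TOTAL_CHANNELS = 8

def read_speedup(channels_spanned: int) -> int:
    return min(channels_spanned, TOTAL_CHANNELS)

print(read_speedup(8))   # data spread over all 8 channels -> up to 8x
print(read_speedup(4))   # 32 KB stored on only 4 channels -> at most 4x
```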

1.1.5.8 SSD Performance Advantages


Short response time: Traditional HDDs have lower efficiency in data transmission because
they waste time with seeking and mechanical latency. SSDs use NAND flash to eliminate
seeking and mechanical latency for far faster responses to read and write requests.
High read/write efficiency: HDDs perform random read/write operations by moving the
head back and forth, resulting in low read/write efficiency. In contrast, SSDs calculate
data storage locations with an internal controller to reduce the mechanical operations
and streamline read/write processing.
In addition, deploying a large number of SSDs grants enormous advantages in power
efficiency.

1.1.6 Interface Module


1.1.6.1 GE Interface Modules
A GE electrical interface module has four electrical ports with 1 Gbit/s and is used for
HyperMetro quorum networking.
A 40GE interface module provides two optical ports with 40 Gbit/s for connecting storage
devices to application servers.
A 100GE interface module provides two optical ports with 100 Gbit/s for connecting
storage devices to application servers.

1.1.6.2 SAS Expansion Module and RDMA Interface Module


A 25 Gbit/s RDMA interface module provides four optical ports with 25 Gbit/s for direct
connections between two controller enclosures.
A 100 Gbit/s RDMA interface module provides two optical ports with 100 Gbit/s for
connecting controller enclosures to scale-out switches or smart disk enclosures. SO stands
for scale-out and BE stands for back-end in the labels.
A 12 Gbit/s SAS expansion module provides four mini SAS HD expansion ports with 4 x
12 Gbit/s to connect controller enclosures to 2 U SAS disk enclosures.

1.1.6.3 SmartIO Interface Modules


SmartIO interface modules support 8, 10, 16, 25, and 32 Gbit/s optical modules, which
respectively provide 8 Gbit/s Fibre Channel, 10GE, 16 Gbit/s Fibre Channel, 25GE, and 32
Gbit/s Fibre Channel ports. SmartIO interface modules connect storage devices to
application servers.
The optical module rate must match the rate on the interface module label. Otherwise,
the storage system will report an alarm and the port will become unavailable.

1.1.6.4 PCIe and 56 Gbit/s IB Interface Modules


A PCIe interface module provides two PCIe ports for connecting controller enclosures to
PCIe switches and exchanging control and data flows between the controller enclosures.
Indicators:
 Interface module power indicator
 Link/Speed indicator of a PCIe port
 PCIe port
 Handle
The 56 Gbit/s IB interface module provides two IB ports with a transmission rate of 4 x 14
Gbit/s.
Indicators:
 Power indicator/Hot swap button
 Link indicator of a 56 Gbit/s IB port
 Active indicator of a 56 Gbit/s IB port
 56 Gbit/s IB port
 Handle

1.1.6.5 Fibre Channel and FCoE Interface Modules


A 16 Gbit/s Fibre Channel interface module has two physical ports, which are converted
into eight 16 Gbit/s Fibre Channel ports by dedicated cables. Each port provides a
transmission rate of 16 Gbit/s. They serve as the service ports between the storage
system and application server to receive data exchange commands from the application
server.
A 10 Gbit/s FCoE interface module provides two FCoE ports with 10 Gbit/s, which connect
the storage system to the application server for data transmission.
The 10 Gbit/s FCoE interface module supports only direct connections.

1.1.7 Quiz
1. (Multiple-answer question) Which of the following are SSD types?
A. SLC
B. MLC
C. TLC
D. QLC
2. (Multiple-answer question) Which of the following affect HDD performance?
A. Disk capacity
B. Rotation speed
C. Data transfer rate
D. Average access time
3. (True or false) SSDs of different types have different anti-wear capabilities that
deliver different levels of reliability. Therefore, the anti-wear capability is an
important evaluation item in selecting SSDs.
4. (True or false) In OceanStor V3 storage systems, controller enclosure indicators and
disk enclosure indicators show the running status of controller enclosures and disk
enclosures. Checking these indicators allows you to promptly know the status of each
component.
5. (True or false) Before replacing an interface module of an OceanStor V3 storage
system, you must power off the module.

1.2 RAID Technologies


1.2.1 Traditional RAID
1.2.1.1 Basic Concept of RAID
Redundant Array of Independent Disks (RAID) combines multiple physical disks into one
logical disk in different ways to improve read/write performance and data security.
Functionality of RAID:
 Combines multiple physical disks into one logical disk array to provide larger storage
capacity.
 Divides data into blocks and concurrently writes/reads data to/from multiple disks to
improve disk access efficiency.
 Provides mirroring or parity for fault tolerance.
Hardware RAID and software RAID can be implemented in storage devices.
 Hardware RAID uses a dedicated RAID adapter, disk controller, or storage processor.
The RAID controller has a built-in processor, I/O processor, and memory to improve
resource utilization and data transmission speed. The RAID controller manages
routes and buffers, and controls data flows between the host and the RAID array.
Hardware RAID is usually used in servers.
 Software RAID has no built-in processor or I/O processor but relies on a host
processor. Therefore, a low-speed CPU cannot meet the requirements for RAID
implementation. Software RAID is typically used in enterprise-class storage devices.
Disk striping: Space in each disk is divided into multiple strips of a specific size. Data is
also divided into blocks based on strip size when data is being written.
 Strip: A strip consists of one or more consecutive sectors in a disk, and multiple strips
form a stripe.
 Stripe: A stripe consists of strips of the same location or ID on multiple disks in the
same array.
RAID generally provides two methods for data protection.

 One is storing data copies on another redundant disk to improve data reliability and
read performance.
 The other is parity. Parity data is additional information calculated using user data.
For a RAID array that uses parity, an additional parity disk is required. The XOR
(symbol: ⊕) algorithm is used for parity.
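The XOR-based parity mentioned above can be demonstrated with a few lines of Python (the byte values are arbitrary; a real array performs this calculation strip by strip):

```python
# Parity is the XOR of the data strips in a stripe. If any single strip is
# lost, XORing the surviving strips with the parity restores it.
d0, d1, d2 = 0b1010_1100, 0b0101_0011, 0b1111_0000   # arbitrary data bytes
parity = d0 ^ d1 ^ d2

recovered_d1 = d0 ^ d2 ^ parity    # suppose the disk holding d1 has failed
assert recovered_d1 == d1
print(bin(parity), bin(recovered_d1))
```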

1.2.1.2 RAID 0
RAID 0, also referred to as striping, provides the best storage performance among all
RAID levels. RAID 0 uses the striping technology to distribute data to all disks in a RAID
array.

Figure 1-1 Working principles of RAID 0


A RAID 0 array contains at least two member disks. A RAID 0 array divides data into
blocks of different sizes ranging from 512 bytes to megabytes (usually multiples of 512
bytes) and concurrently writes the data blocks to different disks. The preceding figure
shows a RAID 0 array consisting of two disks (drives). The first two data blocks are
written to stripe 0: the first data block is written to strip 0 in disk 1, and the second data
block is written to strip 0 in disk 2. Then, the next data block is written to the next strip
(strip 1) in disk 1, and so forth. In this mode, I/O loads are balanced among all disks in
the RAID array. As the data transfer speed on the bus is much higher than the data read
and write speed on disks, data reads and writes on disks can be considered as being
processed concurrently.
A RAID 0 array provides a large-capacity disk with high I/O processing performance.
Before the introduction of RAID 0, there was a technology similar to RAID 0, called Just a
Bundle Of Disks (JBOD). JBOD refers to a large virtual disk consisting of multiple disks.
Unlike RAID 0, JBOD does not concurrently write data blocks to different disks. JBOD uses
another disk only when the storage capacity in the first disk is used up. Therefore, JBOD
provides a total available capacity which is the sum of capacities in all disks but provides
the performance of individual disks.

In contrast, RAID 0 searches the target data block and reads data in all disks upon
receiving a data read request. The preceding figure shows a data read process. A RAID 0
array provides a read/write performance that is directly proportional to disk quantity.
Best Practices for RAID 0
When a system sends a data I/O request to a logical drive (a RAID 0 array) formed by
three drives, the request is converted into three operations, one for each of the three
physical drives.
In this RAID 0 array, a data request is concurrently processed on all of the three drives.
Theoretically, drive read and write rates triple in that operations are performed
concurrently on three disks. In practice, drive read and write rates increase by less than
this due to a variety of factors, such as bus bandwidth. However, the parallel
transmission rate of large amounts of data improves remarkably over the serial
transmission rate.
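The round-robin distribution used by RAID 0 can be sketched as follows. The disk count and strip size are hypothetical; the point is only that consecutive data blocks land on different disks.

```python
# RAID 0 striping sketch: data block n of a large write lands on disk
# (n % N) in strip (n // N). There is no redundancy, only distribution.
NUM_DISKS = 3          # hypothetical number of member disks
STRIP_SIZE_KB = 64     # hypothetical strip size

def locate(block_index: int):
    """Return the (disk, strip) position of a data block."""
    return block_index % NUM_DISKS, block_index // NUM_DISKS

for n in range(6):
    disk, strip = locate(n)
    print(f"block {n} -> disk {disk}, strip {strip}")
# blocks 0-2 fill stripe 0 across the three disks, blocks 3-5 fill stripe 1
```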

1.2.1.3 RAID 1
RAID 1, also referred to as mirroring, maximizes data security. A RAID 1 array uses two
identical disks including one mirror disk. When data is written to a disk, a copy of the
same data is stored in the mirror disk. When the source (physical) disk fails, the mirror
disk takes over services from the source disk to maintain service continuity. The mirror
disk is used as a backup to provide high data reliability.
The amount of data stored in a RAID 1 array is only equal to the capacity of a single disk,
and a copy of the data is retained on the other disk. That is, each gigabyte of data
requires 2 gigabytes of disk space. Therefore, a RAID 1 array consisting of two disks has
a space utilization of 50%.

Figure 1-2 Working principles of RAID 1


Unlike RAID 0 which utilizes striping to concurrently write different data to different
disks, RAID 1 writes same data to each disk so that data in all member disks is consistent.

As shown in the preceding figure, data blocks D 0, D 1, and D 2 are to be written to disks.
D 0 and the copy of D 0 are written to the two disks (disk 1 and disk 2) at the same time.
Other data blocks are also written to the RAID 1 array in the same way by mirroring.
Generally, a RAID 1 array provides write performance of a single disk.
A RAID 1 array reads data from the data disk and the mirror disk at the same time to
improve read performance. If one disk fails, data can be read from the other disk.
A RAID 1 array provides read performance which is the sum of the read performance of
the two disks. When a RAID array degrades, its performance decreases by half.
Best Practices for RAID 1
Consider a system that sends a data I/O request to a logical drive (a RAID 1 array)
formed by two drives.
When data is written to drive 0, the same data is automatically copied to drive 1.
In a data read operation, data is read from drive 0 and drive 1 at the same time.

1.2.1.4 RAID 3
RAID 3 is similar to RAID 0 but uses dedicated parity stripes. In a RAID 3 array, a
dedicated disk (parity disk) is used to store the parity data of strips in other disks in the
same stripe. If incorrect data is detected or a disk fails, data in the faulty disk can be
recovered using the parity data. RAID 3 applies to data-intensive or single-user
environments where data blocks need to be continuously accessed for a long time. RAID
3 writes data to all member data disks. However, when new data is written to any disk,
RAID 3 recalculates and rewrites parity data. Therefore, when a large amount of data
from an application is written, the parity disk in a RAID 3 array needs to process heavy
workloads. Parity operations have certain impact on the read and write performance of a
RAID 3 array. In addition, the parity disk is subject to the highest failure rate in a RAID 3
array due to heavy workloads. A write penalty occurs when just a small amount of data is
written to multiple disks, which does not improve disk performance as compared with
data writes to a single disk.

Figure 1-3 Working principles of RAID 3



RAID 3 uses a single disk for fault tolerance and performs parallel data transmission.
RAID 3 uses striping to divide data into blocks and writes XOR parity data to the last disk
(parity disk).
The write performance of RAID 3 depends on the amount of changed data, the number
of disks, and the time required to calculate and store parity data. If a RAID 3 array
consists of N member disks of the same rotational speed and write penalty is not
considered, its sequential I/O write performance is theoretically slightly inferior to N – 1
times that of a single disk when full-stripe write is performed. (Additional time is
required to calculate redundancy check.)
In a RAID 3 array, data is read by stripe. Data blocks in a stripe can be read concurrently
because all member disks are accessed in parallel.
RAID 3 performs parallel data reads and writes. The read performance of a RAID 3 array
depends on the amount of data to be read and the number of member disks.

1.2.1.5 RAID 5
RAID 5 is improved based on RAID 3 and consists of striping and parity. In a RAID 5 array,
data is written to disks by striping. In a RAID 5 array, the parity data of different strips is
distributed among member disks instead of a parity disk.
Similar to RAID 3, a write penalty occurs when just a small amount of data is written.

Figure 1-4 Working principles of RAID 5


The write performance of a RAID 5 array depends on the amount of data to be written
and the number of member disks. If a RAID 5 array consists of N member disks of the
same rotational speed and write penalty is not considered, its sequential I/O write
performance is theoretically slightly inferior to N – 1 times that of a single disk when full-
stripe write is performed. (Additional time is required to calculate redundancy check.)
In a RAID 3 or RAID 5 array, if a disk fails, the array changes from the online (normal)
state to the degraded state until the faulty disk is reconstructed. If a second disk also
fails, the data in the array will be lost.

Best Practices for RAID 5


In a RAID 5 array, for example, PA is the parity data of A0, A1, and A2, and PB is the
parity data of B0, B1, and B2.
RAID 5 does not back up the data stored, but instead, stores data and its parity data on
different member drives in the array. Damaged data on any drive in a RAID 5 array can
be restored using remaining data and the related parity data.
RAID 5 can be considered as a compromise between RAID 0 and RAID 1.
RAID 5 is less secure than RAID 1, but it offers greater drive space utilization than RAID 1
as multiple data blocks share the same parity data, resulting in lower storage costs.
RAID 5 provides slightly lower data read and write rates than RAID 0. However, its write
performance is higher than the write performance when data is written to a single drive.
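As a quick numeric illustration of RAID 5 space utilization and the small-write penalty discussed above (the disk count and capacity are hypothetical, and the factor of four backend I/Os per small host write is the commonly cited rule of thumb, not a figure from this document):

```python
# RAID 5 sketch: usable capacity is (N - 1)/N of the raw capacity, and a
# small (partial-stripe) write typically costs 4 disk I/Os: read old data,
# read old parity, write new data, write new parity.
N = 5                    # hypothetical number of member disks
disk_capacity_tb = 4     # hypothetical capacity per disk, in TB

usable_tb = (N - 1) * disk_capacity_tb
print(f"usable: {usable_tb} TB of {N * disk_capacity_tb} TB raw "
      f"({(N - 1) / N:.0%} utilization)")

small_host_writes = 1000
print(f"{small_host_writes} small host writes -> about "
      f"{small_host_writes * 4} backend disk I/Os")
```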

1.2.1.6 RAID 6
Data protection mechanisms of all RAID arrays previously discussed considered only
failures of individual disks (excluding RAID 0). The time required for reconstruction
increases along with the growth of disk capacities. It may take several days instead of
hours to reconstruct a RAID 5 array consisting of large-capacity disks. During the
reconstruction, the array is in the degraded state, and the failure of any additional disk
will cause the array to be faulty and data to be lost. This is why some organizations or
units need a dual-redundancy system. In other words, a RAID array should tolerate
failures of up to two disks while maintaining normal access to data. Such dual-
redundancy data protection can be implemented in the following ways:
 The first one is multi-mirroring. Multi-mirroring is a method of storing multiple
copies of a data block in redundant disks when the data block is stored in the
primary disk. This means heavy overheads.
 The second one is a RAID 6 array. A RAID 6 array protects data by tolerating failures
of up to two disks even at the same time.
The formal name of RAID 6 is distributed double-parity (DP) RAID. It is essentially an
improved RAID 5, and also consists of striping and distributed parity. RAID 6 supports
double parity, which means that:
 When user data is written, double parity calculation needs to be performed.
Therefore, RAID 6 provides the slowest data writes among all RAID levels.
 Additional parity data takes storage spaces in two disks. This is why RAID 6 is
considered as an N + 2 RAID.
Currently, RAID 6 is implemented in different ways. Different methods are used for
obtaining parity data.
RAID 6 P+Q

Figure 1-5 Working principles of RAID 6 P+Q


 When a RAID 6 array uses P+Q parity, P and Q represent two independent parity
data. P and Q parity data is obtained using different algorithms. User data and parity
data are distributed in all disks in the same stripe.
 As shown in the figure, P 1 is obtained by performing an XOR operation on D 0, D 1,
and D 2 in stripe 0, P 2 is obtained by performing an XOR operation on D 3, D 4, and
D 5 in stripe 1, and P 3 is obtained by performing an XOR operation on D 6, D 7, and
D 8 in stripe 2.
 Q 1 is obtained by performing a GF transform and then an XOR operation on D 0, D
1, and D 2 in stripe 0, Q 2 is obtained by performing a GF transform and then an
XOR operation on D 3, D 4, and D 5 in stripe 1, and Q 3 is obtained by performing a
GF transform and then an XOR operation on D 6, D 7, and D 8 in stripe 2.
 If a strip on a disk fails, data on the failed disk can be recovered using the P parity
value. The XOR operation is performed between the P parity value and other data
disks. If two disks in the same stripe fail at the same time, different solutions apply
to different scenarios. If the Q parity data is not in any of the two faulty disks, the
data can be recovered to data disks, and then the parity data is recalculated. If the Q
parity data is in one of the two faulty disks, data in the two faulty disks must be
recovered by using both the formulas.
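A minimal sketch of the P parity path described above. Only the XOR-based P calculation and single-strip recovery are shown; recovering from a double failure additionally requires the Galois-field (GF) arithmetic behind Q, which is omitted here.

```python
# RAID 6 P+Q sketch (P only): P is the XOR of the data strips in a stripe,
# so a single lost strip is recovered by XORing P with the survivors.
# Handling two simultaneous failures needs the GF-based Q parity as well.
from functools import reduce
from operator import xor

stripe = [0x11, 0x22, 0x44]            # D0, D1, D2 (arbitrary byte values)
p = reduce(xor, stripe)                # P = D0 ^ D1 ^ D2

lost = 1                               # the disk holding D1 has failed
survivors = [d for i, d in enumerate(stripe) if i != lost]
recovered = reduce(xor, survivors, p)
assert recovered == stripe[lost]
print(hex(p), hex(recovered))          # 0x77 0x22
```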
RAID 6 DP

Figure 1-6 Working principles of RAID 6 DP


 RAID 6 DP also has two independent parity data blocks. The first parity data is the
same as the first parity data of RAID 6 P+Q. The second parity data is the diagonal
parity data obtained through diagonal XOR operation. Horizontal parity data is
obtained by performing an XOR operation on user data in the same stripe. As shown
in the preceding figure, P 0 is obtained by performing an XOR operation on D 0, D 1,
D 2, and D 3 in stripe 0, and P 1 is obtained by performing an XOR operation on D4,
D5, D6, and D 7 in stripe 1. Therefore, P 0 = D 0 ⊕ D 1 ⊕ D 2 ⊕ D 3, P 1 = D 4 ⊕


D 5 ⊕ D 6 ⊕ D 7, and so on.
 The second parity data block is obtained by performing an XOR operation on
diagonal data blocks in the array. The process of selecting data blocks is relatively
complex. DP 0 is obtained by performing an XOR operation on D 0 in disk 1 in stripe
0, D 5 in disk 2 in stripe 1, D 10 in disk 3 in stripe 2, and D 15 in disk 4 in stripe 3. DP
1 is obtained by performing an XOR operation on D 1 in disk 2 in stripe 0, D 6 in disk
3 in stripe 1, D 11 in disk 4 in stripe 2, and P 3 in the first parity disk in stripe 3. DP 2
is obtained by performing an XOR operation on D 2 in disk 3 in stripe 0, D 7 in disk 4
in stripe 1, P 2 in the first parity disk in stripe 2, and D 12 in disk 1 in stripe 3.
Therefore, DP 0 = D 0 ⊕ D 5 ⊕ D 10 ⊕ D 15, DP 1 = D 1 ⊕ D 6 ⊕ D 11 ⊕ P 3,
and so on.
 A RAID 6 array tolerates failures of up to two disks.
 A RAID 6 array provides relatively poor performance no matter whether DP or P+Q is
implemented. Therefore, RAID 6 applies to the following two scenarios:
− Data is critical and should be consistently in online and available state.
− Large-capacity (generally > 2 TB) disks are used. The reconstruction of a large-
capacity disk takes a long time, and data will be inaccessible for a long time if two
disks fail at the same time. A RAID 6 array tolerates the failure of another disk
during the reconstruction of one disk, so some enterprises prefer a dual-
redundancy RAID array for their large-capacity disks.
Best Practices for RAID 6
For example, PA is the first parity data block of data blocks A0, A1, and A2; QA is the
second parity data block of data blocks A0, A1, and A2; PB is the first parity data block
of data blocks B0, B1, and B2; and QB is the second parity data block of data blocks B0,
B1, and B2.
Data blocks and parity data blocks are distributed to each RAID 6 member drive. When
any one or two member drives fail, the RAID controller card restores or regenerates the
lost data using data from other member drives.

1.2.1.7 RAID 10
For most enterprises, RAID 0 is not really a practical choice, while RAID 1 is limited by
disk capacity utilization. RAID 10 provides the optimal solution by combining RAID 1 and
RAID 0. In particular, RAID 10 provides superior performance by eliminating write penalty
in random writes.
A RAID 10 array consists of an even number of disks. User data is written to half of the
disks and mirror copies of user data are retained in the other half of disks. Mirroring is
performed based on stripes.

Figure 1-7 Working principles of RAID 10


As shown in the figure, physical disks 1 and 2 form a RAID 1 array, and physical disks 3
and 4 form another RAID 1 array. The two RAID 1 sub-arrays form a RAID 0 array.
When data is written to the RAID 10 array, data blocks are concurrently written to sub-
arrays by mirroring. As shown in the figure, D 0 is written to physical disk 1, and its copy
is written to physical disk 2.
If one disk in each of the two RAID 1 sub-arrays fails (such as disk 2 and disk 4), access
to data in the RAID 10 array remains normal, because complete copies of the data on
the faulty disks are retained on the other two disks (disk 3 and disk 1). However, if both
disks in the same RAID 1 sub-array (such as disks 1 and 2) fail at the same time, data
will be inaccessible.
Theoretically, RAID 10 tolerates failures of half of the physical disks. However, in the
worst case, failures of two disks in the same sub-array may also cause data loss.
Generally, RAID 10 protects data against the failure of a single disk.
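The following small Python sketch illustrates the placement just described for a four-disk RAID 10 array: blocks are striped across the two RAID 1 sub-arrays and mirrored within each sub-array. The function name and disk numbering are illustrative assumptions.

```python
# Minimal RAID 10 placement sketch: 4 disks = 2 mirrored pairs striped together.
DISK_PAIRS = [(1, 2), (3, 4)]   # each tuple is a RAID 1 sub-array

def place_block(block_index):
    """Return (primary_disk, mirror_disk, stripe) for a logical block."""
    pair = DISK_PAIRS[block_index % len(DISK_PAIRS)]   # RAID 0 striping across pairs
    stripe = block_index // len(DISK_PAIRS)
    return pair[0], pair[1], stripe

for i in range(4):   # D0..D3
    primary, mirror, stripe = place_block(i)
    print(f"D{i}: stripe {stripe}, written to disk {primary}, mirrored to disk {mirror}")
# D0 -> disk 1 (mirror on disk 2), D1 -> disk 3 (mirror on disk 4), and so on.
```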
Best Practices for RAID 10
As shown in the figure, drives 0 and 1 form subarray 0, and drives 2 and 3 form subarray 1. The drives in each subarray mirror each other.
When a system sends a data I/O request to a drive, the request is distributed to the two
subarrays for concurrent processing in the same way as in RAID 0. When a system writes
data to drive 0, the same data is automatically copied to drive 1 in the same way as RAID
1. Similarly, when a system writes data to drive 2, the same data is automatically copied
to drive 3.
1.2.1.8 RAID 50
RAID 50 combines RAID 0 and RAID 5. Two RAID 5 sub-arrays form a RAID 0 array. The
two RAID 5 sub-arrays are independent of each other. A RAID 50 array requires at least
six disks because a RAID 5 sub-array requires at least three disks.
Figure 1-8 Working principles of RAID 50
As shown in the figure, disks 1, 2, and 3 form a RAID 5 sub-array, and disks 4, 5, and 6
form another RAID 5 sub-array. The two RAID 5 sub-arrays form a RAID 0 array.
A RAID 50 array tolerates failures of multiple disks at the same time. However, failures of
two disks in the same RAID 5 sub-array will cause data loss.
Best Practices for RAID 50
As shown in the figure, PA is the parity data of A0, A1, and A2; PB is the parity data of B0, B1, and B2, and so forth.
RAID 50, also known as RAID 5+0, combines distributed parity (RAID 5) with striping
(RAID 0). A RAID 50 array comprises several RAID 5 subarrays. Data is accessed and
stored on each RAID 5 subarray in the same way as in RAID 0. With the redundancy
provided by RAID 5, when any drive in any subarray becomes faulty, a RAID 50 array can
continue running normally and restore data on the faulty drive. In addition, services can
still run as normal during the replacement of any member drive. RAID 50 can therefore
tolerate one faulty drive in each of multiple subarrays at the same time, which is
impossible with RAID 5. Furthermore, as data is distributed over multiple subarrays, RAID
50 provides superior data read and write performance.
1.2.2 RAID 2.0+
1.2.2.1 RAID Evolution
As a mature and reliable data protection standard for disks, RAID has long been a basic technology in storage systems. However, as data storage requirements and per-disk capacity keep growing, the drawbacks of traditional RAID are becoming more pronounced, particularly in the reconstruction of large-capacity disks.
Traditional RAID has two major drawbacks: a high risk of data loss and a material impact on services.
 High risk of data loss: Ever-increasing disk capacities lead to longer reconstruction times and a higher risk of data loss. Redundancy protection is degraded during reconstruction, and data will be lost if any additional disk or data block fails. Therefore, a longer reconstruction duration means a higher risk of data loss.
 Material impact on services: During reconstruction, member disks are busy with reconstruction and deliver poor service performance, which affects the operation of upper-layer services.
To solve the preceding problems of traditional RAID and leverage the development of virtualization technologies, the following alternative solutions emerged:
 LUN virtualization: A traditional RAID array is further divided into small units. These
units are regrouped into storage spaces accessible to hosts.
 Block virtualization: Disks in a storage pool are divided into small data blocks. A
RAID array is created using these data blocks so that data can be evenly distributed
to all disks in the storage pool. Then, resources are managed based on data blocks.
1.2.2.2 Basic Principles of RAID 2.0+
RAID 2.0+ divides each physical disk into multiple chunks (CKs). CKs from different disks form a chunk group (CKG), and the CKs within a CKG are protected by a RAID relationship. Multiple CKGs form a large storage resource pool, from which resources are allocated to hosts.
Implementation mechanism of RAID 2.0+:
 Multiple SSDs form a storage pool.
 Each SSD is then divided into chunks (CKs) of a fixed size (typically 4 MB) for logical
space management.
 CKs from different SSDs form chunk groups (CKGs) based on the RAID policy
specified on DeviceManager.
 CKGs are further divided into grains (typically 8 KB). Grains are mapped to LUNs for
refined management of storage resources.
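As a rough illustration of the mechanism above, the Python sketch below carves toy disks into fixed-size CKs, forms one CKG from CKs on different disks, and counts the grains that CKG could expose to LUNs. The function names, the random CK selection, and the unrealistically small disk size are assumptions for illustration only; they do not reflect how DeviceManager actually allocates space.

```python
import random

CK_SIZE = 4 * 1024 * 1024      # typical chunk size from the text: 4 MB
GRAIN_SIZE = 8 * 1024          # typical grain size from the text: 8 KB

def carve_disk_into_cks(disk_id, disk_bytes):
    """Split one SSD's capacity into fixed-size chunks (CKs)."""
    return [(disk_id, ck_index) for ck_index in range(disk_bytes // CK_SIZE)]

def build_ckg(free_cks_per_disk, raid_width):
    """Form one CKG by taking a free CK from `raid_width` different disks."""
    chosen_disks = random.sample(list(free_cks_per_disk), raid_width)
    return [free_cks_per_disk[d].pop() for d in chosen_disks]

# Toy pool: 8 SSDs of 64 MB each (unrealistically small, for illustration only).
pool = {d: carve_disk_into_cks(d, 64 * 1024 * 1024) for d in range(8)}

ckg = build_ckg(pool, raid_width=5)          # e.g. a 4+1 RAID 5 policy
grains_per_ckg = len(ckg) * CK_SIZE // GRAIN_SIZE
print("CKG members (disk, CK index):", ckg)
print("grains available for LUN mapping:", grains_per_ckg)
```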
RAID 2.0+ outperforms traditional RAID in the following aspects:
 Service load balancing to avoid hot spots: Data is evenly distributed to all disks in the
resource pool, protecting disks from early end of service life due to excessive writes.
 Fast reconstruction to reduce risk window: When a disk fails, the valid data in the
faulty disk is reconstructed to all other functioning disks in the resource pool (fast
many-to-many reconstruction), efficiently resuming redundancy protection.
 Reconstruction load balancing among all disks in the resource pool to minimize the
impact on upper-layer applications.
1.2.2.3 RAID 2.0+ Composition
1. Disk Domain
A disk domain is a combination of disks (which can be all the disks in the array). After the disks are combined and hot spare capacity is reserved, each disk domain provides storage resources for storage pools.
With traditional RAID, a RAID array must be created before disk space can be allocated to service hosts. However, there are restrictions on creating a RAID array: it must consist of disks of the same type, size, and rotational speed, and it typically contains no more than 12 disks.
Huawei RAID 2.0+ is implemented differently: a disk domain is created first. A disk domain is a collection of disks, and a disk can belong to only one disk domain. One or more disk domains can be created in an OceanStor storage system. A disk domain may look similar to a RAID array in that both consist of disks, but there are significant differences. A RAID array consists of disks of the same type, size, and rotational speed, and these disks are associated with a RAID level. In contrast, a disk domain can consist of more than 100 disks of up to three types, and each disk type is associated with a storage tier. For example, SSDs are associated with the high performance tier, SAS disks with the performance tier, and NL-SAS disks with the capacity tier. A storage tier does not exist if the disk domain contains no disks of the corresponding type. A disk domain also separates one set of disks from another, fully isolating faults and keeping performance and storage resources independent. No RAID level is specified when a disk domain is created; that is, the data redundancy protection method is not fixed at this point. RAID 2.0+ instead provides more flexible and fine-grained redundancy protection. The storage space formed by the disks in a disk domain is divided into storage pools of a smaller granularity and into hot spare space shared among storage tiers. The system automatically sets the hot spare space based on the hot spare policy (high, low, or none) configured by the administrator for the disk domain and on the number of disks at each storage tier. In a traditional RAID array, by contrast, an administrator must designate a specific disk as the hot spare disk.
2. Storage Pool and Storage Tier
A storage pool is a storage resource container. The storage resources used by
application servers are all from storage pools.
A storage tier is a collection of storage media providing the same performance level
in a storage pool. Different storage tiers manage storage media of different
performance levels and provide storage space for applications that have different
performance requirements.
A storage pool created based on a specified disk domain dynamically allocates CKs
from the disk domain to form CKGs according to the RAID policy of each storage tier
for providing storage resources with RAID protection to applications.
A storage pool can be divided into multiple tiers based on disk types.
When creating a storage pool, a user is allowed to specify a storage tier and related
RAID policy and capacity for the storage pool.
OceanStor storage systems support RAID 1, RAID 10, RAID 3, RAID 5, RAID 50, and
RAID 6 and related RAID policies.
The capacity tier consists of large-capacity SATA and NL-SAS disks. DP RAID 6 is
recommended.
3. Disk Group
An OceanStor storage system automatically divides disks of each type in each disk
domain into one or more disk groups (DGs) according to disk quantity.
One DG consists of disks of only one type.
CKs in a CKG are allocated from different disks in a DG.
DGs are internal objects automatically configured by OceanStor storage systems and
typically used for fault isolation. DGs are not presented externally.
4. Logical Drive
A logical drive (LD) is a disk that is managed by a storage system and corresponds to
a physical disk.
5. CK
A chunk (CK) is a fixed-size space carved from a physical disk in a disk domain. CKs are the basic units for forming a RAID group (CKG).
6. CKG
A chunk group (CKG) is a logical storage unit that consists of CKs from different
disks in the same DG based on the RAID algorithm. It is the minimum unit for
allocating resources from a disk domain to a storage pool.
All CKs in a CKG are allocated from the disks in the same DG. A CKG has RAID
attributes, which are actually configured for corresponding storage tiers. CKs and
CKGs are internal objects automatically configured by storage systems. They are not
presented externally.
7. Extent
Each CKG is divided into logical storage spaces of a specific and adjustable size called
extents. Extent is the minimum unit (granularity) for migration and statistics of hot
data. It is also the minimum unit for space application and release in a storage pool.
An extent belongs to a volume or LUN. A user can set the extent size when creating
a storage pool. After that, the extent size cannot be changed. Different storage pools
may consist of extents of different sizes, but one storage pool must consist of extents
of the same size.
8. Grain
When a thin LUN is created, extents are divided into 64 KB blocks which are called
grains. A thin LUN allocates storage space by grains. Logical block addresses (LBAs)
in a grain are consecutive.
Grains are mapped to thin LUNs. A thick LUN does not involve grains.
9. Volume and LUN
A volume is an internal management object in a storage system.
A LUN is a storage unit that can be directly mapped to a host for data reads and
writes. A LUN is the external embodiment of a volume.
A volume organizes all extents and grains of a LUN and applies for and releases
extents to increase and decrease the actual space used by the volume.
1.2.3 Other RAID Technologies
1.2.3.1 Huawei Dynamic RAID Algorithm
When a flash component fails, Huawei dynamic RAID algorithm can proactively recover
the data in the faulty flash component and keep providing RAID protection for the data.
This RAID algorithm dynamically adjusts the number of data blocks in a RAID array to
meet system reliability and capacity requirements. If a chunk is faulty and no chunk is
available from disks outside the disk domain, the system dynamically reconstructs the
original N + M chunks to (N - 1) + M chunks. When a new SSD is inserted, the system
migrates data from the (N - 1) + M chunks to the newly constructed N + M chunks for
efficient disk utilization.
Dynamic RAID adopts the erasure coding (EC) algorithm. When only SSDs are used, it can dynamically adjust the number of CKs in a CKG to meet system reliability and capacity requirements.
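The following toy Python sketch captures only the N+M adjustment idea described above (shrinking a CKG to (N-1)+M when no spare chunk is available and restoring it after a new SSD is added); it is a simplified illustration, not Huawei's actual dynamic RAID or EC algorithm.

```python
def rebuild_ckg(n, m, spare_chunk_available):
    """Toy model of dynamic RAID: keep M parity chunks, shrink N if no spare CK exists."""
    if spare_chunk_available:
        return n, m                # normal rebuild onto a spare chunk keeps N + M
    return n - 1, m                # no spare: rebuild the CKG as (N - 1) + M

def restore_ckg(n, m, original_n):
    """When a new SSD is inserted, migrate back to the original N + M layout."""
    return (original_n, m) if n < original_n else (n, m)

layout = rebuild_ckg(n=8, m=2, spare_chunk_available=False)
print("degraded layout:", layout)                              # (7, 2)
print("after new SSD:", restore_ckg(*layout, original_n=8))    # (8, 2)
```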
1.2.3.2 RAID-TP
RAID protection is essential for a storage system to deliver consistently high reliability and performance. However, as disk capacities increase drastically, reconstruction times become difficult to control, which challenges the reliability of RAID protection.
RAID-TP achieves optimal performance, reliability, and capacity utilization.
Customers have to purchase larger-capacity disks to replace existing ones during system upgrades. In such cases, one system may contain disks of different capacities. How can optimal capacity utilization be maintained in a system that uses a mix of disks with different capacities?
RAID-TP uses Huawei's optimized FlexEC algorithm that allows the system to tolerate
failures of up to three disks, improving reliability while allowing a longer reconstruction
time window.
RAID-TP with FlexEC algorithm reduces the amount of data read from a single disk by
70%, as compared with traditional RAID, minimizing the impact on system performance.
In a typical 4:2 RAID 6 array, the capacity utilization is about 67%. The capacity
utilization of a Huawei OceanStor all-flash storage system with 25 disks is improved by
20% on this basis.
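As a quick sanity check of the utilization figures, the calculation below assumes that a 25-disk RAID-TP pool uses a hypothetical 22+3 stripe; the actual stripe width is not stated in the text, so the 22+3 split is only an illustrative assumption.

```python
def utilization(data_cols, parity_cols):
    """Fraction of raw capacity usable for data in an N+M layout."""
    return data_cols / (data_cols + parity_cols)

print(f"4+2 RAID 6:             {utilization(4, 2):.0%}")   # about 67%, as stated above
print(f"22+3 RAID-TP (assumed): {utilization(22, 3):.0%}")  # 88%, roughly 20 points higher
```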
1.2.4 Quiz
1. What is the difference between a strip and stripe?
2. Which RAID level would you recommend if a user focuses on reliability and random
write performance?
3. (True or false) Data access will remain unaffected if any disk in a RAID 10 array fails.
4. (Multiple-answer question) What are the advantages of RAID 2.0+?
A. Automatic load balancing to reduce fault rate
B. Fast thin reconstruction to reduce dual-disk failure rate
C. Intelligent fault troubleshooting to maintain system reliability
D. Virtual storage pools to simplify storage planning and management
5. (Multiple-answer question) Which of the following storage pool technologies use
extents of RAID 2.0+ as basic units?
A. Applying for space
B. Space release
C. Statistics collection
D. Data migration
1.3 Storage System Architecture
1.3.1 Storage System Architecture Evolution
Storage systems evolved from a single controller to dual controllers backing each other up, first processing their own tasks separately and later processing data concurrently. Parallel symmetric processing across multiple controllers came next. Distributed storage has since become widely used thanks to the development of cloud computing and big data.
Currently, single-controller storage is rare. Most entry-level and mid-range storage
systems use dual-controller architecture, while most mission-critical storage systems use
multi-controller architecture.
Single-controller Storage:
 External disk array with RAID controllers: Using a disk chassis, a disk array virtualizes internal disks into logical disks through RAID controllers and then connects to the SCSI interface on a host through an external SCSI connection.
 If a storage system has only one controller module, that module is a single point of failure (SPOF).
Dual-controller Storage:
 Currently, dual-controller architecture is mainly used in mainstream entry-level and
mid-range storage systems.
 There are two working modes: Active-Standby and Active-Active.
− Active-Standby
This mode is also called high availability (HA). Only one controller works at a time, while the other waits, synchronizes data, and monitors services. If the active controller fails, the standby controller takes over its services. Before the takeover, the active controller is powered off or restarted to prevent split brain; its bus ownership is released, and the standby controller then takes over the back-end and front-end buses.
− Active-Active
Two controllers are working at the same time. Each connects to all back-end
buses, but each bus is managed by only one controller. Each controller manages
half of all back-end buses. If one controller is faulty, the other takes over all
buses. This is more efficient than Active-Standby.
Mid-range Storage Architecture Evolution:
 Mid-range storage systems always use an independent dual-controller architecture.
Controllers are usually of modular hardware.
 The evolution of mid-range storage mainly focuses on the rate of host interfaces and
disk interfaces, and the number of ports.
 The common trend is the convergence of SAN and NAS storage services.
Multi-controller Storage:
 Most mission-critical storage systems use multi-controller architecture.
 The main architecture models are as follows:
− Bus architecture
− Hi-Star architecture
− Direct-connection architecture
− Virtual matrix architecture
Mission-critical storage architecture evolution:
 In 1990, EMC launched Symmetrix, a full bus architecture. A parallel bus connected
front-end interface modules, cache modules, and back-end disk interface modules
for data and signal exchange in time-division multiplexing mode.
 In 2000, HDS adopted the switching architecture for Lightning 9900 products. Front-
end interface modules, cache modules, and back-end disk interface modules were
connected on two redundant switched networks, increasing communication channels
to dozens of times more than that of the bus architecture. The internal bus was no
longer a performance bottleneck.
 In 2003, EMC launched the DMX series based on a full mesh architecture. All modules were connected in point-to-point mode, which theoretically provided larger internal bandwidth but added system complexity and limited scalability.
 In 2009, to reduce hardware development costs, EMC launched the distributed
switching architecture by connecting a separated switch module to the tightly
coupled dual-controller of mid-range storage systems. This achieved a balance
between costs and scalability.
 In 2012, Huawei launched the Huawei OceanStor 18000 series, a mission-critical
storage product also based on distributed switching architecture.
Storage Software Technology Evolution:
A storage system combines unreliable, low-performance disks and, through effective management, provides highly reliable, high-performance storage along with sharing, easy management, and convenient data protection. Storage system software has evolved from basic RAID and cache, to data protection features such as snapshot and replication, to dynamic resource management that improves data management efficiency, and to deduplication and tiered storage that improve storage efficiency.
Distributed Storage Architecture:
 A distributed storage system organizes local HDDs and SSDs of general-purpose
servers into a large-scale storage resource pool, and then distributes data to multiple
data storage servers.
 Huawei's current distributed storage draws on Google's approach: a distributed file system is built across multiple servers, and storage services are implemented on top of that file system.
 Most storage nodes are general-purpose servers. Huawei OceanStor 100D is
compatible with multiple general-purpose x86 servers and Arm servers.
− Protocol: storage protocol layer. The block, object, HDFS, and file services
support local mounting access over iSCSI or VSC, S3/Swift access, HDFS access,
and NFS access respectively.
− VBS: block access layer of FusionStorage Block. User I/Os are delivered to VBS
over iSCSI or SCSI.
− EDS-B: provides block services with enterprise features, and receives and
processes I/Os from VBS.
− EDS-F: provides the HDFS service.
− Metadata Controller (MDC): The metadata control device controls distributed
cluster node status, data distribution rules, and data rebuilding rules.
− Object Storage Device (OSD): the component that stores user data in the distributed cluster.
− Cluster Manager (CM): manages cluster information.
1.3.2 Storage System Expansion Methods
Service data continues to increase with the continued development of enterprise
information systems and the ever-expanding scale of services. The initial configuration of
storage systems is often not enough to meet these demands. Storage system capacity
expansion has become a major concern of system administrators. There are two capacity
expansion methods: scale-up and scale-out.
Scale-up:
 This traditional vertical expansion architecture continuously adds storage disks into
the existing storage systems to meet demands.
 Advantage: simple operation at the initial stage
 Disadvantage: As the storage system scale increases, resource increase reaches a
bottleneck.
Scale-out:
 This horizontal expansion architecture adds controllers to meet demands.
 Advantage: As the scale increases, the unit price decreases and the efficiency is
improved.
 Disadvantage: The complexity of software and management increases.
Huawei SAS disk enclosure is used as an example.
 Port consistency: In a loop, the EXP port of an upper-level disk enclosure is connected
to the PRI port of a lower-level disk enclosure.
 Dual-plane networking: Expansion module A connects to controller A, while
expansion module B connects to controller B.
 Symmetric networking: On controllers A and B, symmetric ports and slots are
connected to the same disk enclosure.
 Forward and backward connection networking: Expansion module A uses forward
connection, while expansion module B uses backward connection.
 Cascading depth: The number of cascaded disk enclosures in a loop cannot exceed
the upper limit.
Huawei smart disk enclosure is used as an example.
 Port consistency: In a loop, the EXP (P1) port of an upper-level disk enclosure is
connected to the PRI (P0) port of a lower-level disk enclosure.
 Dual-plane networking: Expansion board A connects to controller A, while expansion
board B connects to controller B.
 Symmetric networking: On controllers A and B, symmetric ports and slots are
connected to the same disk enclosure.
 Forward connection networking: Both expansion modules A and B use forward
connection.
 Cascading depth: The number of cascaded disk enclosures in a loop cannot exceed
the upper limit.
IP scale-out is used for Huawei OceanStor V3 and V5 entry-level and mid-range series,
Huawei OceanStor V5 Kunpeng series, and Huawei OceanStor Dorado V6 series. IP scale-
out integrates TCP/IP, Remote Direct Memory Access (RDMA), and Internet Wide Area
RDMA Protocol (iWARP) to implement service switching between controllers, which
complies with the all-IP trend of the data center network.
PCIe scale-out is used for Huawei OceanStor 18000 V3 and V5 series, and Huawei
OceanStor Dorado V3 series. PCIe scale-out integrates PCIe channels and the RDMA
technology to implement service switching between controllers.
PCIe scale-out: features high bandwidth and low latency.
IP scale-out: employs standard data center technologies (such as ETH, TCP/IP, and
iWARP) and infrastructure, and boosts the development of Huawei's proprietary chips for
entry-level and mid-range products.
Next, let's move on to I/O read and write processes of the host. The scenarios are as
follows:
 Local Write Process
− A host delivers write I/Os to engine 0.
− Engine 0 writes the data into the local cache, implements mirror protection, and
returns a message indicating that data is written successfully.
− Engine 0 flushes dirty data to disks. If the target disk is local to engine 0, engine 0 directly delivers the write I/Os.
− If the target disk is on a remote device, engine 0 transfers the I/Os to the engine
(engine 1 for example) where the disk resides.
− Engine 1 writes dirty data onto disks.
 Non-local Write Process
− A host delivers write I/Os to engine 2.
− After detecting that the LUN is owned by engine 0, engine 2 transfers the write
I/Os to engine 0.
− Engine 0 writes the data into the local cache, implements mirror protection, and
returns a message to engine 2, indicating that data is written successfully.
− Engine 2 returns the write success message to the host.
− Engine 0 flushes dirty data to disks. If the target disk is local to engine 0, engine 0 directly delivers the write I/Os.
− If the target disk is on a remote device, engine 0 transfers the I/Os to the engine
(engine 1 for example) where the disk resides.
− Engine 1 writes dirty data onto disks.
 Local Read Process
− A host delivers read I/Os to engine 0.
− If the read I/Os are hit in the cache of engine 0, engine 0 returns the data to the
host.
− If the read I/Os are not hit in the cache of engine 0, engine 0 reads data from the disk. If the target disk is local to engine 0, engine 0 reads the data directly from the disk.
− After the read I/Os are hit locally, engine 0 returns the data to the host.
− If the target disk is on a remote device, engine 0 transfers the I/Os to the engine
(engine 1 for example) where the disk resides.
− Engine 1 reads data from the disk.
− Engine 1 accomplishes the data read.
− Engine 1 returns the data to engine 0 and then engine 0 returns the data to the
host.
 Non-local Read Process
− The LUN is not owned by the engine that delivers read I/Os, and the host
delivers the read I/Os to engine 2.
− After detecting that the LUN is owned by engine 0, engine 2 transfers the read
I/Os to engine 0.
− If the read I/Os are hit in the cache of engine 0, engine 0 returns the data to
engine 2.
− Engine 2 returns the data to the host.
− If the read I/Os are not hit in the cache of engine 0, engine 0 reads data from the disk. If the target disk is local to engine 0, engine 0 reads the data directly from the disk.
− After the read I/Os are hit locally, engine 0 returns the data to engine 2 and then
engine 2 returns the data to the host.
− If the target disk is on a remote device, engine 0 transfers the I/Os to engine 1
where the disk resides.
− Engine 1 reads data from the disk.
− Engine 1 completes the data read.
− Engine 1 returns the data to engine 0, engine 0 returns the data to engine 2, and
then engine 2 returns the data to the host.
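The read and write flows above all follow the same rule: the receiving engine forwards an I/O to the engine that owns the LUN, and the owning engine then deals with the engine that holds the target disk. The Python sketch below models that forwarding decision only; the engine names, ownership table, and print statements are assumptions for illustration.

```python
# Illustrative sketch of the forwarding logic described above: an engine that
# receives an I/O forwards it to the LUN's owning engine, which caches/mirrors
# the data and later flushes it to the engine holding the target disk.
LUN_OWNER = {"lun0": "engine0"}          # assumed ownership table
DISK_LOCATION = {"disk7": "engine1"}     # assumed disk placement

def write_io(receiving_engine, lun, disk):
    owner = LUN_OWNER[lun]
    if receiving_engine != owner:                       # non-local write
        print(f"{receiving_engine}: forward write to {owner}")
    print(f"{owner}: write to local cache, mirror, ack host")
    flush_target = DISK_LOCATION[disk]
    if flush_target == owner:
        print(f"{owner}: flush dirty data to local disk {disk}")
    else:                                               # remote disk
        print(f"{owner}: forward dirty data to {flush_target}")
        print(f"{flush_target}: write dirty data to disk {disk}")

write_io("engine2", "lun0", "disk7")    # non-local write with a remote target disk
```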
1.3.3 Huawei Storage Product Architecture
Huawei entry-level and mid-range storage products use dual-controller architecture by
default. Huawei mission-critical storage products use architecture with multiple
controllers. OceanStor Dorado V6 SmartMatrix architecture integrates the advantages of
scale-up and scale-out architectures. A single system can be expanded to a maximum of
32 controllers, greatly improving E2E high reliability. The architecture ensures zero service
interruption when seven out of eight controllers are faulty, providing 99.9999% high
availability. It is a perfect choice to carry key service applications in finance,
manufacturing, and carrier industries.
SmartMatrix makes breakthroughs in the mission-critical storage architecture that
separates computing and storage resources. Controller enclosures are completely
separated from and directly connected to disk enclosures. The biggest advantage is that
controllers and storage devices can be independently expanded and upgraded, which
greatly improves storage system flexibility, protects customers' investments in the long
term, reduces storage risks, and guarantees service continuity.
 Front-end full interconnection
− Dorado 8000 and 18000 V6 support FIMs, which can be simultaneously accessed
by four controllers in a controller enclosure.
− Upon reception of host I/Os, the FIM directly distributes the I/Os to appropriate
controllers.
 Full interconnection among controllers
− Controllers in a controller enclosure are connected by 100 Gbit/s (40 Gbit/s for
Dorado 3000 V6) RDMA links on the backplane.
− For scale-out to multiple controller enclosures, any two controllers can be
directly connected to avoid data forwarding.
 Back-end full interconnection
− Dorado 8000 and 18000 V6 support BIMs, which allow a smart disk enclosure to
be connected to two controller enclosures and accessed by eight controllers
simultaneously. This technique, together with continuous mirroring, allows the
system to tolerate failure of 7 out of 8 controllers.
− Dorado 3000, 5000, and 6000 V6 do not support BIMs. Disk enclosures
connected to Dorado 3000, 5000, and 6000 V6 can be accessed by only one
controller enclosure. Continuous mirroring is not supported.
The storage system supports three types of disk enclosures: SAS, smart SAS, and smart
NVMe. Currently, they cannot be used together on one storage system. Smart SAS and
smart NVMe disk enclosures use the same networking mode. In this mode, a controller
enclosure uses the shared 2-port 100 Gbit/s RDMA interface module to connect to a disk
enclosure. Each interface module connects to the four controllers in the controller
enclosure through PCIe 3.0 x16. In this way, each disk enclosure can be simultaneously
accessed by all four controllers, achieving full interconnection between the disk enclosure
and the four controllers. A smart disk enclosure has two groups of uplink ports and can
connect to two controller enclosures at the same time. This allows the two controller
enclosures (eight controllers) to simultaneously access a disk enclosure, implementing
full interconnection between the disk enclosure and eight controllers. When full
interconnection between disk enclosures and eight controllers is implemented, the system
can use continuous mirroring to tolerate failure of 7 out of 8 controllers without service
interruption.
Huawei storage provides E2E global resource sharing:
 Symmetric architecture
− All products support host access in active-active mode. Requests can be evenly
distributed to each front-end link.
− They eliminate controller ownership of LUNs, making LUNs easier to use and balancing loads. This is achieved by dividing a LUN into multiple slices that are evenly distributed to all controllers using the DHT algorithm.
− Mission-critical products reduce latency with intelligent FIMs that divide host I/Os by LUN slice and send each request to its target controller.
 Shared port
− A single port is shared by four controllers in a controller enclosure.
− Loads are balanced without host multipathing.
 Global cache
− The system directly writes received I/Os (in one or two slices) to the cache of the
corresponding controller and sends an acknowledgement to the host.
− The intelligent read cache of all controllers participates in prefetch and cache hit
of all LUN data and metadata.
FIMs of Huawei OceanStor Dorado 8000 and 18000 V6 series storage adopt the Huawei-developed Hi1822 chip to connect to all controllers in a controller enclosure via four internal links, and each front-end port provides a communication link for the host. If any controller restarts during an upgrade, services are seamlessly switched to another controller without impacting hosts or interrupting links. The host is unaware of controller faults, and the switchover is completed within 1 second.
The FIM has the following features:
 Failure of a controller does not disconnect the front-end link, and the host is unaware of the controller failure.
 When a controller fails, the PCIe link between the FIM and that controller is disconnected, and the FIM detects the controller failure.
 Service switchover is performed between the controllers, and the FIM redistributes host requests to the other controllers.
 The switchover takes about 1 second, which is much shorter than a switchover performed by multipathing software (10–30s).
In global cache mode, host data is directly written into linear space logs, and the logs
directly copy the host data to the memory of multiple controllers using RDMA based on a
preset copy policy. The global cache consists of two parts:
 Global memory: memory of all controllers (four controllers in the figure). This is
managed in a unified memory address, and provides linear address space for the
upper layer based on a redundancy configuration policy.
 WAL: new write cache of the log type
The global pool uses RAID 2.0+, full-strip write of new data, and shared RAID groups
between multiple strips.
Another feature is back-end sharing, which includes sharing of back-end interface
modules within an enclosure and cross-controller enclosure sharing of back-end disk
enclosures.
Active-Active Architecture with Full Load Balancing:
 Even distribution of unhomed LUNs
− Data on LUNs is divided into 64 MB slices. The slices are distributed to different virtual nodes based on the hash result of the LUN ID and LBA (see the sketch after this list).
 Front-end load balancing
− UltraPath selects appropriate physical links to send each slice to the
corresponding virtual node.
− The front-end interconnect I/O modules forward the slices to the corresponding
virtual nodes.
− Front-end: If there is no UltraPath or FIM, the controllers forward I/Os to the
corresponding virtual nodes.
 Global write cache load balancing
− The data volume is balanced.
− Data hotspots are balanced.
 Global storage pool load balancing
− Usage of disks is balanced.
− The wear degree and lifecycle of disks are balanced.
− Data is evenly distributed.
− Hotspot data is balanced.
 Three cache copies
− The system supports two or three copies of the write cache.
− Three-copy requires an extra license.
− Only mission-critical storage systems support three copies.
 Three copies tolerate simultaneous failure of two controllers.
− Failure of two controllers does not cause data loss or service interruption.
 Three copies tolerate failure of one controller enclosure.
− With three copies, data is mirrored in a controller enclosure and across controller
enclosures.
− Failure of a controller enclosure does not cause data loss or service interruption.
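The slice-based balancing in the list above can be approximated by a simple hash placement, as in the Python sketch below. The hash function, the number of virtual nodes, and the modulo placement are assumptions for illustration; the real system uses its own DHT and virtual-node layout.

```python
import hashlib

SLICE_SIZE = 64 * 1024 * 1024     # 64 MB slices, as described above
VIRTUAL_NODES = 16                # assumed number of virtual nodes

def virtual_node_for(lun_id, lba):
    """Map a (LUN ID, LBA) pair to a virtual node via its slice index."""
    slice_index = lba // SLICE_SIZE
    key = f"{lun_id}:{slice_index}".encode()
    digest = hashlib.md5(key).digest()                 # stand-in for the real DHT hash
    return int.from_bytes(digest[:4], "big") % VIRTUAL_NODES

# Consecutive slices of the same LUN land on different virtual nodes.
for slice_no in range(4):
    lba = slice_no * SLICE_SIZE
    print(f"LUN 1, slice {slice_no} -> virtual node {virtual_node_for(1, lba)}")
```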
Key reliability technologies of Huawei storage products:
 Continuous mirroring
− Dorado V6's mission-critical storage systems support continuous mirroring. In the
event of a controller failure, the system automatically selects new controllers for
mirroring.
− Continuous mirroring includes all devices in back-end full interconnection.
 Back-end full interconnection
− Controllers are directly connected to disk enclosures.
− Dorado V6's mission-critical storage systems support back-end full
interconnection.
− BIMs + two groups of uplink ports on the disk enclosures achieve full
interconnection of the disk enclosures to eight controllers.
 Continuous mirroring and back-end full interconnection allow the system to tolerate
failure of seven out of eight controllers.
Host service switchover when a single controller is faulty: When FIMs are used,
failure of a controller will not disconnect front-end ports from hosts, and the hosts
are unaware of the controller failure, ensuring high availability. When a controller
fails, the FIM port chip detects that the PCIe link between the FIM and the controller
is disconnected. Then service switchover is performed between the controllers, and
the FIM redistributes host I/Os to other controllers. This process is completed within
seconds and does not affect host services. In comparison, when non-shared interface
modules are used, a link switchover must be performed by the host's multipathing
software in the event of a controller failure, which takes a longer time (10 to 30
seconds) and reduces reliability.
1.3.4 Quiz
1. (True or false) Scale-up is a method in which disk enclosures are continuously added
to existing storage systems to handle increasing data volumes.
2. What are the differences between scale-up and scale-out?
3. If any controller of an OceanStor V3 storage system is faulty, the other controller can
seamlessly take over services using the host multipathing software to ensure service
continuity.
4. (Multiple-answer question) Which of the following specifications can be selected
when RH2288 V3 is fully configured with disks?
A. 8 disks
B. 12 disks
C. 16 disks
D. 25 disks
5. (Multiple-answer question) Which operating systems are supported by Huawei
OceanStor 5300 V3 block storage?
A. Windows
B. Linux
C. FusionSphere
D. VMware
1.4 Storage Network Architecture
1.4.1 DAS
Direct-attached storage (DAS) connects one or more storage devices to servers. These
storage devices provide block-level data access for the servers. Based on the locations of
storage devices and servers, DAS is classified into internal DAS and external DAS. SCSI
cables are used to connect hosts and storage devices.
JBOD, short for Just a Bunch Of Disks, logically connects several physical disks in series to
increase capacity but does not provide data protection. JBOD can resolve the insufficient
capacity expansion issue caused by limited disk slots of internal storage. However, it
offers no redundancy, resulting in poor reliability.
For a smart disk array, the controller provides RAID and large-capacity cache, enables the
disk array to have multiple functions, and is equipped with dedicated management
software.
1.4.2 NAS
Enterprises need to store large amounts of data and share it over a network, and network-attached storage (NAS) is a good choice for this. NAS connects storage devices to the existing network and provides data and file services.
For a server or host, a NAS device is an external device and can be flexibly deployed
through a network. In addition, NAS provides file-level sharing rather than block-level
sharing, which makes it easier for clients to access NAS over a network. UNIX and
Microsoft Windows users can seamlessly share data using NAS or File Transfer Protocol
(FTP). When NAS is used for data sharing, UNIX uses NFS and Windows uses CIFS.
NAS has the following characteristics:
 NAS provides storage resources through file-level data access and sharing, enabling
users to quickly share files with minimum storage management costs.
 NAS is preferred for file sharing storage because it does not require multiple file
servers.
 NAS also helps eliminate bottlenecks in access to general-purpose servers.
 NAS uses network and file sharing protocols for archiving and storage. These
protocols include TCP/IP for data transmission as well as CIFS and NFS for remote
file services.
A general-purpose server runs a general-purpose operating system and can carry any
application. Unlike general-purpose servers, NAS is dedicated for file services and
provides file sharing services for other operating systems using open standard protocols.
NAS devices are optimized based on general-purpose servers in aspects such as file
service functions, storage, and retrieval. To improve the high availability of NAS devices,
some vendors also provide the NAS clustering function.
The components of a NAS device are as follows:
 NAS engine (CPU and memory)
 One or more NICs for network connection, for example, GE NIC and 10GE NIC
 An optimized operating system for NAS function management
 NFS and CIFS protocols
 Disk resources that use industry-standard storage protocols such as ATA, SCSI, and
FC
 NAS protocols include NFS, CIFS, FTP, HTTP, and NDMP.
 The NFS protocol is a traditional stateless file sharing protocol for UNIX. If a fault
occurs, NFS connections can be automatically recovered.
 The CIFS protocol is a traditional file sharing protocol in the Microsoft environment.
It is a stateful protocol based on the Server Message Block (SMB) protocol. If a fault
occurs, CIFS connections cannot be automatically recovered. CIFS is integrated into
the operating system and does not require additional software. Moreover, CIFS sends
only a small amount of redundant information, so it delivers higher transmission
efficiency than NFS.
 The File Transfer Protocol (FTP) is one of the protocols in the TCP/IP protocol suite.
The FTP protocol contains the FTP server and FTP client. The FTP server is used to
store files. Users can use the FTP client to access resources on the FTP server through
FTP.
 The Hypertext Transfer Protocol (HTTP) is an application-layer protocol used to
transfer hypermedia documents (such as HTML). It is designed for communication
between a Web browser and a Web server, but can also be used for other purposes.
 The Network Data Management Protocol (NDMP) provides an open standard for
NAS network backup. NDMP enables data to be directly written to tapes without
being backed up by backup servers, improving the speed and efficiency of NAS data
protection.
NFS Protocol
NFS is short for Network File System. This network file sharing protocol is defined by the IETF and is widely used in Linux/UNIX environments.
NFS is a client/server application that uses remote procedure call (RPC) for communication between computers. Users can store and update files on a remote NAS just as they do on local PCs. A system requires an NFS client to connect to an NFS server. NFS is transport-independent and runs over TCP or UDP. Users or system administrators can use NFS to mount an entire file system or part of one (any directory or subdirectory hierarchy). Access to the mounted file system can be controlled using permissions, for example, read-only or read-write permissions.
Differences between NFSv3 and NFSv4:
 NFSv4 is a stateful protocol. It implements the file lock function and can obtain the
root node of a file system without the help of the NLM and MOUNT protocols.
NFSv3 is a stateless protocol. It requires the NLM protocol to implement the file lock
function.
 NFSv4 has enhanced security and supports RPCSEC-GSS identity authentication.
 NFSv4 provides only two requests: NULL and COMPOUND. All operations are
integrated into COMPOUND. A client can encapsulate multiple operations into one
COMPOUND request based on actual requests to improve flexibility.
 The namespace of the NFSv4 file system is changed. A root file system (fsid=0) must be set on the server, and other file systems are mounted under the root file system for export.
 Compared with NFSv3, the cross-platform feature of NFSv4 is enhanced.
Working principles of NFS: Like other file sharing protocols, NFS also uses the C/S
architecture. However, NFS provides only the basic file processing function and does not
provide any TCP/IP data transmission function. The TCP/IP data transmission function can
be implemented only by using the Remote Procedure Call (RPC) protocol. NFS file
systems are completely transparent to clients. Accessing files or directories in an NFS file
system is the same as accessing local files or directories.
One program can use RPC to request a service from a program located in another
computer over a network without having to understand the underlying network
protocols. RPC assumes the existence of a transmission protocol such as Transmission
Control Protocol (TCP) or User Datagram Protocol (UDP) to carry the message data
between communicating programs. In the OSI network communication model, RPC
traverses the transport layer and application layer. RPC simplifies development of
applications.
RPC works based on the client/server model. The requester is a client, and the service
provider is a server. The client sends a call request with parameters to the RPC server and
waits for a response. On the server side, the process remains in a sleep state until the call
request arrives. Upon receipt of the call request, the server obtains the process
parameters, outputs the calculation results, and sends the response to the client. Then,
the server waits for the next call request. The client receives the response and obtains call
results.
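To make the RPC request/response pattern concrete, here is a minimal sketch using Python's built-in xmlrpc modules. This is generic RPC shown for illustration only; NFS itself relies on ONC RPC (Sun RPC), not XML-RPC.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    """Procedure exposed by the RPC server."""
    return a + b

# Server side: register a procedure and wait for call requests.
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: send a call request with parameters and wait for the response.
proxy = ServerProxy("http://localhost:8000")
print(proxy.add(2, 3))   # the client receives the response and obtains the result: 5
server.shutdown()
```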
One of the typical applications of NFS is using the NFS server as internal shared storage
in cloud computing. The NFS client is optimized based on cloud computing to provide
better performance and reliability. Cloud virtualization software (such as VMware)
optimizes the NFS client, so that the VM storage space can be created on the shared
space of the NFS server.
NDMP Protocol
The backup process of the traditional NAS storage is as follows:
A NAS device is a closed storage system. The Client Agent of the backup software can
only be installed on the production system instead of the NAS device. In the traditional
network backup process, data is read from a NAS device through the CIFS or NFS sharing
protocol, and then transferred to a backup server over a network.
Such a mode occupies network, production system and backup server resources, resulting
in poor performance and an inability to meet the requirements for backing up a large
amount of data.
The NDMP protocol is designed for the data backup system of NAS devices. It enables
NAS devices, without any backup client agent, to send data directly to the connected disk
devices or the backup servers on the network for backup.
There are two networking modes for NDMP:
On a 2-way network, the backup medium is connected directly to a NAS storage system
instead of to a backup server. In a backup process, the backup server sends a backup
command to the NAS storage system through the Ethernet. The system then directly
backs up data to the tape library it is connected to.
In the NDMP 2-way backup mode, data flows are transmitted directly to backup media,
greatly improving the transmission performance and reducing server resource usage.
However, a tape library is connected to a NAS storage device, so the tape library can
back up data only for the NAS storage device to which it is connected.
Tape libraries are expensive. To enable different NAS storage devices to share tape
devices, NDMP also supports the 3-way backup mode.
In the 3-way backup mode, a NAS storage system can transfer backup data to a NAS
storage device connected to a tape library through a dedicated backup network. Then,
the storage device backs up the data to the tape library.
Working principles of CIFS: CIFS runs on top of TCP/IP and allows Windows computers to
access files on UNIX computers over a network.
CIFS Protocol
In 1996, Microsoft renamed SMB to CIFS and added many new functions. Now, CIFS
includes SMB1, SMB2, and SMB3.0.
CIFS has high requirements on network transmission reliability, so it usually runs over TCP/IP.
CIFS is mainly used for the Internet and by Windows hosts to access files or other
resources over the Internet. CIFS allows Windows clients to identify and access shared
resources. With CIFS, clients can quickly read, write, and create files in storage systems as
on local PCs. CIFS helps maintain a high access speed and a fast system response even
when many users simultaneously access the same shared file.
The CIFS protocol applies to file sharing. Two typical application scenarios are as follows:
 File sharing service
− CIFS is commonly used in file sharing service scenarios such as enterprise file
sharing.
 Hyper-V VM application scenario
− SMB can be used to share mirrors of Hyper-V virtual machines promoted by
Microsoft. In this scenario, the failover feature of SMB 3.0 is required to ensure
service continuity upon a node failure and to ensure the reliability of VMs.
File protocol comparison
 NFS
− Application scenario: Linux and UNIX environments, including the non-domain environment, the Lightweight Directory Access Protocol (LDAP)^a domain environment, and the network information service (NIS)^b domain environment
− Transmission protocol: TCP or UDP
− Work mode: C/S architecture, requiring client software
 CIFS
− Application scenario: Windows environments, including the non-domain environment and the Active Directory (AD)^c domain environment, as well as Linux environments where SMB clients are installed
− Transmission protocol: TCP
− Work mode: C/S architecture, with client software integrated into operating systems
 FTP
− Application scenario: No restrictions on operating systems
− Transmission protocol: TCP
− Work mode: C/S architecture, with client software integrated into operating systems
 HTTP
− Application scenario: No restrictions on operating systems
− Transmission protocol: TCP
− Work mode: B/S architecture
a: LDAP is a domain environment in Linux and is used to construct a directory-based user authentication system.
b: NIS is a domain environment in Linux and can centrally manage the directory service of system databases.
c: AD is a domain environment in Windows and can centrally manage computers, servers, and users.
1.4.3 SAN
1.4.3.1 IP SAN Technologies
NIC + Initiator software: Host devices such as servers and workstations use standard NICs
to connect to Ethernet switches. iSCSI storage devices are also connected to the Ethernet
switches or to the NICs of the hosts. The initiator software installed on hosts virtualizes
NICs into iSCSI cards. The iSCSI cards are used to receive and transmit iSCSI data packets,
implementing iSCSI and TCP/IP transmission between the hosts and iSCSI devices. This
mode uses standard Ethernet NICs and switches, eliminating the need for adding other
adapters. Therefore, this mode is the most cost-effective. However, the mode occupies
host resources when converting iSCSI packets into TCP/IP packets, increasing host
operation overheads and degrading system performance. The NIC + initiator software mode is therefore suitable for scenarios with relatively low I/O and bandwidth performance requirements for data access.
TOE NIC + initiator software: The TOE NIC processes the functions of the TCP/IP protocol
layer, and the host processes the functions of the iSCSI protocol layer. Therefore, the TOE
NIC significantly improves the data transmission rate. Compared with the pure software
mode, this mode reduces host operation overheads and requires minimal network
construction expenditure. This is a trade-off solution.
iSCSI HBA:
 An iSCSI HBA is installed on the host to implement efficient data exchange between
the host and the switch and between the host and the storage device. Functions of
the iSCSI protocol layer and TCP/IP protocol stack are handled by the host HBA,
occupying the least CPU resources. This mode delivers the best data transmission
performance but requires high expenditure.
 The iSCSI communication system inherits some of SCSI's features. The iSCSI
communication involves an initiator that sends I/O requests and a target that
responds to the I/O requests and executes I/O operations. After a connection is set
up between the initiator and target, the target controls the entire process as the
primary device. An iSCSI target is usually an iSCSI disk array or iSCSI tape library.
 The iSCSI protocol defines a set of naming and addressing methods for iSCSI
initiators and targets. All iSCSI nodes are identified by their iSCSI names. This method
distinguishes iSCSI names from host names.
 iSCSI uses iSCSI Qualified Name (IQN) to identify initiators and targets. Addresses
change with the relocation of initiator or target devices, but their names remain
unchanged. When setting up a connection, an initiator sends a request. After the
target receives the request, it checks whether the iSCSI name contained in the
request is consistent with that bound with the target. If the iSCSI names are
consistent, the connection is set up. Each iSCSI node has a unique iSCSI name. One
iSCSI name can be used in the connections from one initiator to multiple targets.
Multiple iSCSI names can be used in the connections from one target to multiple
initiators.
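As an aside on IQN formatting, the snippet below checks whether a string follows the common iqn.yyyy-mm.reversed-domain[:identifier] pattern defined for iSCSI names; the sample names are made up, and the simplified regular expression is an assumption rather than a full validator.

```python
import re

# Common IQN pattern: "iqn." + year-month + "." + reversed domain + optional ":" identifier.
IQN_PATTERN = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9.-]+(?::.+)?$")

def is_valid_iqn(name):
    return bool(IQN_PATTERN.match(name))

# Hypothetical initiator and target names for illustration.
print(is_valid_iqn("iqn.2021-06.com.example.host:initiator01"))    # True
print(is_valid_iqn("iqn.2021-06.com.example.storage:target.lun0"))  # True
print(is_valid_iqn("my-host-01"))                                   # False: not an IQN
```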
Logical ports are created based on bond ports, VLAN ports, or Ethernet ports. Logical
ports are virtual ports that carry host services. A unique IP address is allocated to each
logical port for carrying its services.
 Bond port: To improve reliability of paths for accessing file systems and increase
bandwidth, you can bond multiple Ethernet ports on the same interface module to
form a bond port.
 VLAN: VLANs logically divide the physical Ethernet ports or bond ports of a storage
system into multiple broadcast domains. On a VLAN, when service data is being sent
or received, a VLAN ID is configured for the data so that the networks and services
of VLANs are isolated, further ensuring service data security and reliability.
 Ethernet port: Physical Ethernet ports on an interface module of a storage system.
Bond ports, VLANs, and logical ports are created based on Ethernet ports.
IP address failover: A logical IP address fails over from a faulty port to an available port.
In this way, services are switched from the faulty port to the available port without
interruption. The faulty port takes over services back after it recovers. This task can be
completed automatically or manually. IP address failover applies to IP SAN and NAS.
During the IP address failover, services are switched from the faulty port to an available
port, ensuring service continuity and improving the reliability of paths for accessing file
systems. Users are not aware of this process.
The essence of IP address failover is a service switchover between ports. The ports can be
Ethernet ports, bond ports, or VLAN ports.
 Ethernet port–based IP address failover: To improve the reliability of paths for
accessing file systems, you can create logical ports based on Ethernet ports.
Host services are running on logical port A of Ethernet port A. The corresponding IP
address is "a". Ethernet port A fails and thereby cannot provide services. After IP
address failover is enabled, the storage system will automatically locate available
Ethernet port B, delete the configuration of logical port A that corresponds to
Ethernet port A, and create and configure logical port A on Ethernet port B. In this
way, host services are quickly switched to logical port A on Ethernet port B. The
service switchover is executed quickly. Users are not aware of this process.
 Bond port-based IP address failover: To improve the reliability of paths for accessing
file systems, you can bond multiple Ethernet ports to form a bond port. When an
Ethernet port that is used to create the bond port fails, services are still running on
the bond port. The IP address fails over only when all Ethernet ports that are used to
create the bond port fail.
Multiple Ethernet ports are bonded to form bond port A. Logical port A created
based on bond port A can provide high-speed data transmission. When both Ethernet
ports A and B fail due to various causes, the storage system will automatically locate
bond port B, delete logical port A, and create the same logical port A on bond port B.
In this way, services are switched from bond port A to bond port B. After Ethernet
ports A and B recover, services will be switched back to bond port A if failback is
enabled. The service switchover is executed quickly, and users are not aware of this
process.
 VLAN-based IP address failover: You can create VLANs to isolate different services.
− To implement VLAN-based IP address failover, you must create VLANs, allocate
a unique ID to each VLAN, and use the VLANs to isolate different services. When
an Ethernet port on a VLAN fails, the storage system will automatically locate an
available Ethernet port with the same VLAN ID and switch services to the
available Ethernet port. After the faulty port recovers, it takes over the services.
− VLAN names, such as VLAN A and VLAN B, are automatically generated when
VLANs are created. The actual VLAN names depend on the storage system
version.
− Ethernet ports and their corresponding switch ports are divided into multiple
VLANs, and different IDs are allocated to the VLANs. The VLANs are used to
isolate different services. VLAN A is created on Ethernet port A, and the VLAN
ID is 1. Logical port A that is created based on VLAN A can be used to isolate
services. When Ethernet port A fails due to various causes, the storage system
will automatically locate VLAN B and the port whose VLAN ID is 1, delete logical
port A, and create the same logical port A based on VLAN B. In this way, the
port where services are running is switched to VLAN B. After Ethernet port A
recovers, the port where services are running will be switched back to VLAN A if
failback is enabled.
− An Ethernet port can belong to multiple VLANs. When the Ethernet port fails, all
VLANs will fail. Services must be switched to ports of other available VLANs. The
service switchover is executed quickly, and users are not aware of this process.
1.4.3.2 IP SAN Protocols – SCSI and iSCSI
SCSI Protocol
Small Computer System Interface (SCSI) is an extensive protocol family. The SCSI protocol defines a model and the necessary instruction sets that allow different devices to exchange information within this framework.
SCSI reference documents cover devices, models, and links.
SCSI architecture documents discuss the basic architecture models SAM and SPC and
describe the SCSI architecture in detail, covering topics like the task queue model and
basic common instruction model.
SCSI device implementation documents cover the implementation of specific devices,
such as the block device (disk) SBC and stream device (tape) SSC instruction systems.
SCSI transmission link implementation documents discuss FCP, SAS, iSCSI, and FCoE, and
describe in detail the implementation of the SCSI protocol on media.
SCSI Logical Topology
The SCSI logical topology includes initiators, targets, and LUNs.
Initiator: SCSI is essentially a client/server (C/S) architecture in which a client acts as an
initiator to send request instructions to a SCSI target. Generally, a host acts as an
initiator.
Target: processes SCSI instructions. It receives and parses instructions from a host. For
example, a disk array functions as a target.
LUN: a namespace resource described by a SCSI target. A target may include multiple
LUNs, and attributes of the LUNs may be different. For example, LUN#0 may be a disk,
and LUN#1 may be another device.
The initiator and target of SCSI constitute a typical C/S model. Each instruction is
implemented through the request/response mode. The initiator sends SCSI requests. The
target responds to the SCSI requests, provides services through LUNs, and provides a task
management function.
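As an illustration of this request/response model, the following Python sketch mimics an initiator issuing write and read requests to a target that serves them from its LUNs. It is a conceptual toy only: real SCSI commands are binary command descriptor blocks (CDBs) carried over a transport such as FCP, SAS, or iSCSI.

# Toy illustration of the SCSI initiator/target (client/server) model.
class Target:
    def __init__(self, luns):
        self.luns = luns  # LUN number -> backing data (bytearray)

    def handle(self, lun, opcode, **kwargs):
        # Serve one request against one LUN and return a response.
        if lun not in self.luns:
            return {"status": "CHECK CONDITION", "sense": "LOGICAL UNIT NOT SUPPORTED"}
        if opcode == "READ":
            off, length = kwargs["offset"], kwargs["length"]
            return {"status": "GOOD", "data": bytes(self.luns[lun][off:off + length])}
        if opcode == "WRITE":
            off, data = kwargs["offset"], kwargs["data"]
            self.luns[lun][off:off + len(data)] = data
            return {"status": "GOOD"}
        return {"status": "CHECK CONDITION", "sense": "INVALID COMMAND"}

# Initiator side: every instruction follows the request/response pattern.
target = Target({0: bytearray(1024)})                 # LUN#0 is a 1 KiB "disk"
target.handle(0, "WRITE", offset=0, data=b"hello")
print(target.handle(0, "READ", offset=0, length=5))   # {'status': 'GOOD', 'data': b'hello'}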
SCSI Initiator Model
SCSI initiator logical layers in different operating systems:
On Windows, a SCSI initiator includes three logical layers: storage/tape driver, SCSI port,
and mini port. The SCSI port implements the basic framework processing procedures for
SCSI, such as device discovery and namespace scanning.
On Linux, a SCSI initiator includes three logical layers: SCSI device driver, scsi_mod middle
layer, and SCSI adapter driver (HBA). The scsi_mod middle layer processes SCSI device-
irrelevant and adapter-irrelevant processes, such as exceptions and namespace
maintenance. The HBA driver provides link implementation details, such as SCSI
instruction packaging and unpacking. The device driver implements specific SCSI device
drivers, such as the famous SCSI disk driver, SCSI tape driver, and SCSI CD-ROM device
driver.
The structure of Solaris comprises the SCSI device driver, SSA middle layer, and SCSI
adapter driver, which is similar to the structure of Linux/Windows.
The AIX architecture is structured in three layers: SCSI device driver, SCSI middle layer,
and SCSI adaptation driver.
SCSI Target Model
Based on the SCSI architecture, a target includes three layers: port layer, middle layer,
and device layer.
A PORT model in a target packages or unpackages SCSI instructions on links. For
example, a PORT can package instructions into FCP, iSCSI, or SAS frames, or unpackage
instructions from those formats.
A device model in a target serves as a SCSI instruction analyser. It tells the initiator what
device the current LUN is by processing INQUIRY commands, and processes I/Os through
READ/WRITE.
The middle layer of a target maintains models such as LUN space, task set, and task
(command). There are two ways to maintain LUN space. One is to maintain a global LUN
for all PORTs, and the other is to maintain a LUN space for each PORT.
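The difference between the two approaches can be shown with a small, hypothetical Python sketch: a global LUN space gives every port the same view, whereas per-port LUN spaces give each port its own view (an initiator discovers its view with the REPORT LUNS command).

# Sketch of the two ways a target can maintain its LUN space (hypothetical structures).
global_lun_space = {0: "disk_0", 1: "disk_1"}

# Option 1: one global LUN space shared by all target ports.
ports_global = {"port_A": global_lun_space, "port_B": global_lun_space}

# Option 2: an independent LUN space per port.
ports_per_port = {
    "port_A": {0: "disk_0"},                 # initiators on port A see only LUN 0
    "port_B": {0: "disk_1", 1: "disk_2"},
}

def report_luns(ports, port):
    # What an initiator logged in through `port` would discover.
    return sorted(ports[port].keys())

print(report_luns(ports_global, "port_A"))    # [0, 1] - same view on every port
print(report_luns(ports_per_port, "port_A"))  # [0]    - view depends on the port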
SCSI Protocol and Storage Systems
The SCSI protocol is the basic protocol used for communication between hosts and
storage devices.
The controller sends a signal to the bus processor requesting to use the bus. After the
request is accepted, the controller's high-speed cache sends data. During this process, the
bus is occupied by the controller and other devices connected to the same bus cannot use
it. However, the bus processor can interrupt the data transfer at any time and allow other
devices to use the bus for operations of a higher priority.
A SCSI controller is like a small CPU with its own command set and cache. The special
SCSI bus architecture can dynamically allocate resources to tasks run by multiple devices
in a computer. In this way, multiple tasks can be processed at the same time.
SCSI Protocol Addressing
A traditional SCSI controller is connected to a single bus. Therefore, only one bus ID is
allocated. An enterprise-level server may be configured with multiple SCSI controllers, so
there may be multiple SCSI buses. In a storage network, each FC HBA or iSCSI network
adapter is connected to a bus. A bus ID must therefore be allocated to each bus to
distinguish between them.
To address devices connected to a SCSI bus, SCSI device IDs and LUNs are used. Each
device on the SCSI bus must have a unique device ID. The HBA on the server also has its
own device ID: 7. Each bus, including the bus adapter, supports a maximum of 8 or 16
device IDs. The device ID is used to address devices and identify the priority of the devices
on the bus.
Each storage device may include sub-devices, such as virtual disks and tape drives. As a
result, LUN IDs are used to address sub-devices in a storage device.
A ternary description (bus ID, target device ID, and LUN ID) is used to identify a SCSI
target.
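As a simple illustration, the ternary address can be modeled as a tuple. Linux exposes a related four-part address, [host:channel:target:lun], in tools such as lsscsi; the sketch below only models the ternary form described here and is not tied to any specific operating system.

# Sketch of the ternary SCSI address (bus ID, target device ID, LUN ID).
from collections import namedtuple

ScsiAddress = namedtuple("ScsiAddress", ["bus_id", "target_id", "lun_id"])

def address_of(bus_id, target_id, lun_id):
    # Device IDs 0-7 (or 0-15) are valid on a bus; ID 7 is conventionally the HBA itself.
    if not 0 <= target_id <= 15:
        raise ValueError("target device ID must be between 0 and 15")
    return ScsiAddress(bus_id, target_id, lun_id)

disk = address_of(bus_id=0, target_id=3, lun_id=0)
print(disk)  # ScsiAddress(bus_id=0, target_id=3, lun_id=0)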
1.4.3.3 FC SAN Technologies
FC HBA: An FC HBA converts SCSI packets into Fibre Channel frames in hardware, so the conversion does not consume host CPU resources.
Here are some key concepts in FC networking:
 Fibre Channel Routing (FCR) provides connectivity to devices in different fabrics
without merging the fabrics. Different from E_Port cascading of common switches,
after switches are connected through an FCR switch, the two fabric networks are not
converged and are still two independent fabrics. The link switch between two fabrics
functions as a router.
 FC Router: a switch running the FC-FC routing service.
 EX_Port: A type of port that functions like an E_Port, but does not propagate fabric
services or routing topology information from one fabric to another.
 Backbone fabric: fabric of a switch running the FC router service.
 Edge fabric: fabric that connects a Fibre Channel router.
 Inter-fabric link (IFL): the link between an E_Port and an EX-Port, or a VE_Port and a
VEX-Port.
A zone is a set of ports or devices that communicate with each other. A zone member
can only access other members of the same zone. A device can reside in multiple zones.
You can configure basic zones to control the access permission of each device or port.
Moreover, you can set traffic isolation zones. When there are multiple ISLs (E_Ports), an
ISL only transmits the traffic destined for ports that reside in the same traffic isolation
zone.
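The access rule (two members can communicate only if they share at least one zone) is easy to express in code. The following Python sketch uses invented WWPN and zone names and only illustrates the membership check, not how a fabric actually enforces zoning.

# Sketch of zone-based access control (hypothetical WWPNs and zone names).
zones = {
    "zone_db":  {"wwpn_host_1", "wwpn_array_port_1"},
    "zone_bkp": {"wwpn_host_2", "wwpn_array_port_1"},
}

def can_access(member_a, member_b):
    # True if both members appear together in at least one zone.
    return any(member_a in z and member_b in z for z in zones.values())

print(can_access("wwpn_host_1", "wwpn_array_port_1"))  # True  - same zone
print(can_access("wwpn_host_1", "wwpn_host_2"))        # False - no common zone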
1.4.3.4 FC SAN Protocols – FC and FCoE
1.4.3.4.1 FC Protocol
FC can be referred to as the FC protocol, FC network, or FC interconnection. As FC
delivers high performance, it is becoming more commonly used for front-end host access
on point-to-point and switch-based networks. The FC protocol suite borrows familiar concepts from TCP/IP and Ethernet networking, such as FC switching, FC switches, FC routing, FC routers, and the Fabric Shortest Path First (FSPF) routing algorithm.
FC protocol structure:
 FC-0: defines physical connections and selects different physical media and data rates
for protocol operations. This maximizes system flexibility and allows for existing
cables and different technologies to be used to meet the requirements of different
systems. Copper cables and optical cables are commonly used.
 FC-1: records the 8-bit/10-bit transmission code to balance the transmission bit
stream. The encoding also serves as a mechanism for carrying data and detecting errors. 8-bit/10-bit encoding helps reduce component design costs and ensures sufficient transition density for reliable clock recovery. Note: 8-bit/10-bit encoding is also used by IBM ESCON.
 FC-2: includes the following items for sending data over the network.
− How data should be split into small frames
− How much data should be sent at a time (flow control)
− Where frames should be sent (including defining service levels based on
applications)
 FC-3: defines advanced functions such as striping (data is transferred through
multiple channels), multicast (one message is sent to multiple targets), and group
query (multiple ports are mapped to one node). When FC-2 defines functions for a
single port, FC-3 can define functions across ports.
 FC-4: maps upper-layer protocols. FC performance is mapped to an IP address, a SCSI
protocol, or an ATM protocol. The SCSI mapping (FCP) is the most commonly used upper-layer protocol over FC.
Like the Ethernet, FC provides the following network topologies:
 Point-to-point:
The simplest topology that allows direct communication between two nodes (usually
a storage device and a server).
 FC-AL:
− Similar to the Ethernet shared bus topology but is in arbitrated loop mode rather
than bus connection mode. Each device is connected to another device end to
end to form a loop.
− Data frames are transmitted hop by hop in the arbitrated loop and the data
frames can be transmitted only in one direction at any time. As shown in the
figure, node A needs to communicate with node H. After node A wins the
arbitration, it sends data frames to node H. However, the data frames are
transmitted clockwise in the sequence of B-C-D-E-F-G-H, which is inefficient.
 Fabric:
− Similar to an Ethernet switching topology, a fabric topology is a mesh switching matrix.
− The forwarding efficiency is much greater than in FC-AL.
− FC devices are connected to fabric switches through optical fibres or copper
cables to implement point-to-point communication between nodes.
FC frees the workstation from the management of every port. Each port manages its own
point-to-point connection to the fabric, and other fabric functions are implemented by FC
switches. There are seven types of ports in FC networks.
 Device (node) port:
− N_Port: Node port. A fabric device can be directly attached.
− NL_Port: Node loop port. A device can be attached to a loop.
 Switch port:
− E_Port: Expansion port (connecting switches).
− F_Port: A port of a fabric device used to connect to the N_Port.
− FL_Port: Fabric loop port.
− G_Port: A generic port that can be converted into an E_Port or F_Port.
− U_Port: A universal port used to describe automatic port detection.
FCoE Protocol
FCoE: defines the mapping from FC to IEEE 802.3 Ethernet. It uses the Ethernet's physical
and data link layers and the FC's network, service, and protocol layers.
FCoE has the following characteristics:
 Organization: Submitted to the American National Standards Institute (ANSI) T11
committee for approval in 2008. Cooperation with the IEEE is required.
 Objective: To use the scalability of the Ethernet and retain the high reliability and
efficiency of FC
 Other challenges: When FC and the Ethernet are used together, there can be
problems related to packet loss, path redundancy, failover, frame segmentation and
reassembly, and non-blocking transmission.
 FC delivers poor compatibility and is not applicable for long-distance transmission.
FCoE has the same problems.
FCoE retains the protocol stack above FC-2 and replaces FC-0 and FC-1 with the Ethernet's physical and link layers. The original FC-2 is further divided into the following sub-layers:
 FC-2V: FC-2 virtual sub-layer
 FC-2M: FC-2 multiplexer sub-layer
 FC-2P: FC-2 physical sub-layer
The FC_BB_E mapping protocol requires that FCoE uses lossless Ethernet for transmission
at the bottom layer and carries FC frames in full-duplex and lossless mode. The Ethernet
protocol is used on the physical line.
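Conceptually, FCoE encapsulation wraps a complete FC frame inside an Ethernet frame that carries the FCoE EtherType (0x8906) together with start-of-frame and end-of-frame delimiters. The Python sketch below shows this layering in a highly simplified form; the MAC addresses and delimiter values are placeholders, not wire-accurate definitions.

# Conceptual sketch of FCoE encapsulation (simplified; not a wire-accurate implementation).
from dataclasses import dataclass

FCOE_ETHERTYPE = 0x8906

@dataclass
class FcFrame:
    header: bytes   # FC-2 frame header (routing, exchange and sequence information)
    payload: bytes  # e.g. an encapsulated SCSI command or data

@dataclass
class FcoeFrame:
    dst_mac: bytes
    src_mac: bytes
    ethertype: int
    sof: int        # start-of-frame delimiter (placeholder value below)
    fc_frame: FcFrame
    eof: int        # end-of-frame delimiter (placeholder value below)

def encapsulate(fc_frame, dst_mac, src_mac):
    # Wrap an FC frame for transport over lossless (converged enhanced) Ethernet.
    return FcoeFrame(dst_mac, src_mac, FCOE_ETHERTYPE, sof=0x2E, fc_frame=fc_frame, eof=0x41)

frame = encapsulate(FcFrame(header=b"\x00" * 24, payload=b"SCSI command..."),
                    dst_mac=b"\x0e\xfc\x00\x00\x00\x01",
                    src_mac=b"\x0e\xfc\x00\x00\x00\x02")
print(hex(frame.ethertype))  # 0x8906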
Comparison between FC and FCoE:
 FC-0 defines the bearer media type, and FC-1 defines the frame encoding and
decoding mode. The two layers need to be defined during transmission over the FC
SAN network. FCoE runs on the Ethernet. Therefore, the Ethernet link layer replaces
the preceding two layers.
 Different environments: The FC protocol runs on the traditional FC SAN storage
network, while FCoE runs on the Ethernet.
 Different channels: The FC protocol runs on the FC network, and all packets are
transmitted through FC channels. There are various protocol packets, such as IP and
ARP packets, on the Ethernet. To transmit FCoE packets, a virtual FC needs to be
created on the Ethernet.
 Compared with the FC protocol, the FIP initialization protocol is used for FCoE to
discover the FCoE VLAN, establish virtual links with an FCoE Forwarder (FCF), and maintain those links.
FCoE requires support from other protocols. Ethernet tolerates packet loss, but the FC protocol does not. Because FCoE carries FC traffic over Ethernet, it inherits the requirement that packets must not be lost. To ensure that FCoE runs properly on an Ethernet network, the Ethernet must be enhanced to prevent packet loss. The enhanced Ethernet is called Converged Enhanced Ethernet (CEE).
1.4.3.5 Comparison Between IP SAN and FC SAN
Review:
 Protocol: Fibre Channel/iSCSI. The SAN architectures that use the two protocols are
FC SAN and IP SAN.
 Raw device access: suitable for traditional database access.
 Dependence on the application host to provide file access. Share access requires the
support of cluster software, which causes high overheads in processing access
conflicts, resulting in poor performance. In addition, it is difficult to support sharing
in heterogeneous environments.
 High performance, high bandwidth, and low latency, but high cost and poor
scalability
Comparison between FC SAN and IP SAN:
 To solve the poor scalability issue of DAS, storage devices can be networked using FC
SAN to support connection to more than 100 servers.
 IP SAN is designed to address the management and cost challenges of FC SAN. IP
SAN requires only a few hardware configurations and the hardware is widely used.
Therefore, the cost of IP SAN is much lower than that of FC SAN. Most hosts have
been configured with appropriate NICs and switches, which are also suitable
(although not perfect) for iSCSI transmission. High-performance IP SAN requires
dedicated iSCSI HBAs and high-end switches.
1.4.4 Distributed Storage
A distributed storage system organizes local HDDs and SSDs of general-purpose servers
into large-scale storage resource pools, and then distributes data to multiple data storage
servers.
10GE, 25GE, and IB networks are generally used as the backend networks of distributed
storage. The frontend network is usually a GE, 10GE, or 25GE network.
Network planes and their functions:
Management plane: interconnects with the customer's management network for system management and maintenance.
 BMC plane: connects to management ports of management or storage nodes to
enable remote device management.
 Storage plane: An internal plane used for service data communication among all
nodes in the storage system.
 Service plane: interconnects with customers' applications and accesses storage
devices through standard protocols such as iSCSI and HDFS.
 Replication plane: enables data synchronization and replication among replication
nodes.
 Arbitration plane: communicates with the HyperMetro quorum server. This plane is
planned only when the HyperMetro function is planned for the block service.
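As a purely hypothetical illustration of how these planes might be planned, the sketch below maps each plane to its purpose and an example subnet. The subnet values and names are invented for illustration and do not represent a recommended or real configuration.

# Hypothetical network-plane plan for a distributed storage cluster (illustrative only).
network_planes = {
    "management":  {"purpose": "system management and maintenance",      "subnet": "192.168.10.0/24"},
    "bmc":         {"purpose": "remote device management via BMC ports", "subnet": "192.168.20.0/24"},
    "storage":     {"purpose": "internal data exchange between nodes",   "subnet": "172.16.0.0/24"},
    "service":     {"purpose": "application access over iSCSI or HDFS",  "subnet": "10.0.0.0/24"},
    "replication": {"purpose": "data replication between clusters",      "subnet": "10.0.1.0/24"},
    "arbitration": {"purpose": "HyperMetro quorum communication",        "subnet": "10.0.2.0/24"},
}

def planes_for(keyword):
    # Find the planes whose purpose mentions a given keyword.
    return [name for name, p in network_planes.items() if keyword in p["purpose"]]

print(planes_for("replication"))  # ['replication']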
 Key software components and their functions:
 FusionStorage Manager (FSM): a management process of Huawei distributed storage
that provides operation and maintenance (O&M) functions, such as alarm
management, monitoring, log management, and configuration. It is recommended
that this module be deployed on two nodes in active/standby mode.
 Virtual Block Service (VBS): a process that provides the distributed storage access
point service through SCSI or iSCSI interfaces and enables application servers to
access distributed storage resources
 Object Storage Device (OSD): a component of Huawei distributed storage for storing
user data in distributed clusters.
 REP: data replication network.
 Enterprise Data Service (EDS): a component that processes I/O services sent from
VBS.
1.4.5 Quiz
6. (Multiple-answer question) Which of the following networks are included in a
distributed storage network topology?
A. Management network
B. Front-end service network
C. Front-end storage network
D. Back-end storage network
7. (Multiple-answer question) Which of the following protocols are commonly used for
SAN?
A. Fibre Channel
B. iSCSI
C. CIFS
D. NFS
8. (Multiple-answer question) Which of the following statements are true about SAN
and NAS?
A. SAN is mostly used in databases to store structured data.
B. NAS is used to store unstructured data. For example, it is used in shared file
servers for sharing key office documents and centralized storage among
departments.
C. DAS, NAS, and SAN are all external storage that can easily implement storage
resource sharing.
D. NAS places a server's file system in the memory so that the server can share
data over a network.
9. (True or false) SCSI only transfers data between a server and a storage device.
10. (True or false) iSCSI results in a higher data transmission latency than Fibre Channel.
1.5 Common Storage Protocols
1.5.1 SAS and SATA
1.5.1.1 SAS
SAS, also called Serial Attached SCSI, is a serial SCSI standard. A serial SAS interface is
simple and hot swappable, and allows a high transfer speed and processing efficiency.
SAS cables are narrower than parallel SCSI cables and address the electronic interference
problem resulting from the latter. As a result, SAS cables save space and improve heat
dissipation and ventilation for servers that use SAS disks.
Advantages of SAS:
 Lower cost
− A SAS backplane supports both SAS and SATA disks, reducing the cost of using
different types of disks.
− There is no need to design different products based on the SCSI and SATA
standards. Cost is further reduced as cabling becomes simpler and PCB layers
become fewer.
− System integrators no longer need to purchase different backplanes and cables
for different disks.
− More devices can be connected.
− A SAS expander expands the number of devices that you can connect together in
a SAS system. Each expander can connect to multiple ports. Each port can
connect to SAS devices, hosts, and other SAS expanders.
 High reliability
− SAS device reliability is equal to that of SCSI and Fibre Channel disks and is
superior to that of SATA disks.
− Like the older parallel SCSI, SAS also uses the verified SCSI command set.
 High performance
− SAS provides a high unidirectional port rate.
 Compatibility with SATA
− SATA disks can be installed directly in a SAS environment.
− Mixed use of SATA and SAS disks in the same system is in line with the prevalent
tiered storage strategy.
− The SAS architecture comprises six layers. From the lowest to the highest they
are the physical layer, PHY layer, link layer, port layer, transport layer, and
application layer. Each layer provides specific functions.
 Physical layer: defines hardware, such as cables, connectors, and transceivers.
 PHY layer: contains the lowest-level protocols, such as encoding schemes and power
supply and reset sequences.
 Link layer: describes how to control connection management at the PHY layer,
primitives, CRC, scrambling and descrambling, and rate matching.
 Port layer: describes the link layer and transport layer interfaces, including how to
request, disconnect, and form connections.
 Transport layer: defines how to encapsulate transmitted commands, status, and data
into SAS frames and how to decapsulate SAS frames.
 Application layer: describes in detail how to use SAS for a variety of applications.
Characteristics of SAS:
 SAS implements full duplex (bidirectional) communications as each SAS cable
contains four SAS physical links, including two input links and two output links. This
way, SAS can read and write data at the same time, improving data throughput. In
contrast, the older parallel SCSI implements unidirectional communications. When a
device receives and needs to respond to a data packet sent from the parallel SCSI, a
new SCSI communication link must be established after the previous link is
disconnected.
 SAS improves over parallel SCSI in the following ways:
− SAS uses the serial communication mode to provide higher throughput and has
the potential to enable further performance enhancement in the future.
− Four narrow ports can be aggregated into a wide port to provide higher
throughput.
Scalability of SAS:
 SAS uses expanders to provide flexible scalability. One SAS domain supports a
maximum of 16,384 disk devices.
 A SAS expander is an interconnection device in a SAS domain and functions similarly
to an Ethernet switch. A SAS expander expands the number of end devices that you
can connect together in a SAS system, reducing the cost in HBAs. Each expander can
connect to a maximum of 128 end devices or 128 expanders. A SAS domain primarily
comprises SAS expanders, end devices, and connectors (SAS cables).
− A SAS expander tracks the addresses of all SAS drives via a routing table.
− End devices include an initiator (usually a SAS HBA) and targets (SAS or SATA
disk, or HBAs in target mode).
 A loop cannot be formed in a SAS domain to ensure end devices can be normally
identified.
 In practice, to ensure high bandwidth, the number of end devices connected to an
expander is far fewer than 128.
SAS cables and connections:
 Most storage vendors now use SAS cables for connections between disk enclosures
and between disk and controller enclosures. A SAS cable aggregates four
independent physical links (narrow ports) into a wide port to provide higher
bandwidth. Each of the four independent links can transmit at 12 Gbit/s. As a result,
a wide port can transmit at 48 Gbit/s (see the quick calculation after this list). To keep the data flow over a SAS cable under
its maximum bandwidth, the total number of disks connected to a SAS loop must be
limited.
 For Huawei storage devices, a SAS loop comprises up to 168 disks, or seven 24-slot
disk enclosures, but all disks in the loop must be traditional SAS disks. However,
SSDs are becoming more common as they provide much higher transmission speeds
than SAS disks. As the best practice, a loop comprises up to 96 SSDs, or four 24-slot
disk enclosures.
 SAS cables include mini-SAS cables (6 Gbit/s per link) and mini-SAS High Density
cables (12 Gbit/s per link).
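A quick back-of-the-envelope calculation shows why loops are capped, as referenced above. The wide-port figure reuses the numbers given in this section (four 12 Gbit/s narrow links); the per-SSD throughput is an illustrative assumption rather than a measured value.

# Rough check of the SAS loop sizing figures above (illustrative arithmetic only).
LINK_GBITS = 12            # per narrow link (SAS 3.0)
LINKS_PER_WIDE_PORT = 4

wide_port_gbits = LINK_GBITS * LINKS_PER_WIDE_PORT
print(wide_port_gbits)     # 48 Gbit/s per wide port

# If each SSD can stream roughly 4 Gbit/s (about 0.5 GB/s, an assumed figure), only
# around a dozen SSDs can saturate one wide port, which is why SSD loops are capped
# well below the 168-disk guideline for HDDs (e.g. 96 SSDs in four 24-slot enclosures).
ASSUMED_SSD_GBITS = 4
print(wide_port_gbits // ASSUMED_SSD_GBITS)  # 12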
1.5.1.2 SATA
SATA, also called Serial ATA, is a type of computer bus used to transfer data between a
main board and storage devices such as disks and CD-ROM drives. Rather than being a mere improvement over PATA (Parallel ATA), SATA uses a brand-new bus architecture.
At the physical layer, the SAS interface is compatible with the SATA interface. SATA is
effectively a subset of the SAS interface standard, and a SAS controller can directly control SATA disks. Therefore, SATA disks can be used directly in a SAS environment. However, SAS disks cannot be used in a SATA environment, because a SATA controller cannot control SAS disks.
At the protocol layer, SAS supports three types of protocols that are used to transfer data
between specific types of interconnected devices.
 Serial SCSI Protocol (SSP): for transmitting SCSI commands
 Serial Management Protocol (SMP): for managing and maintaining interconnected
devices
 Serial ATA Tunneled Protocol (STP): for data transfer between SAS and SATA
With these three protocols, SAS works seamlessly with SATA and some SCSI devices.
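A simplified way to picture this is a per-device protocol selection step, sketched below in Python. Real HBAs and expanders negotiate the protocol during connection setup; the mapping function here is only an illustration of which protocol serves which device type.

# Sketch of SAS transport protocol selection per attached device (illustrative only).
def protocol_for(device_type):
    return {
        "sas_disk":  "SSP",  # Serial SCSI Protocol: SCSI commands to SAS drives
        "sata_disk": "STP",  # Serial ATA Tunneled Protocol: tunnels SATA traffic through the SAS domain
        "expander":  "SMP",  # Serial Management Protocol: manages and maintains interconnect devices
    }.get(device_type, "unsupported")

for dev in ("sas_disk", "sata_disk", "expander"):
    print(dev, "->", protocol_for(dev))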
1.5.2 PCIe and NVMe
1.5.2.1 PCIe
Intel first proposed the concept of PCI in 1991.
Characteristics of PCI:
 The bus architecture is simple, resulting in low costs and easy designs.
 The parallel bus provides poor scalability and limits the number of connected
devices.
 Connections between multiple devices considerably undermine the effective bus
bandwidth and slow down data transfer.
Advances in processor technology are driving the replacement of parallel bus with high-
speed differential bus in the interconnection field. In contrast to single-ended parallel
signals, high-speed differential signals are used for higher clock rates. This led to the
birth of the PCIe bus.
PCIe, also called PCI Express, is a high-performance and high-bandwidth serial
communication and interconnection standard. It was first proposed by Intel and then
developed by the Peripheral Component Interconnect Special Interest Group (PCI-SIG) to
replace the bus-based communication architecture.
PCIe improves over the traditional PCI bus in the following ways:
 Dual simplex channel, high bandwidth, and high transfer speeds:
− A transmission mode (separated RX and TX) similar to the full duplex mode is
implemented.
− Higher transfer speeds are provided: 2.5 Gbit/s in PCIe 1.0, 5 Gbit/s in PCIe 2.0, 8
Gbit/s in PCIe 3.0, 16 Gbit/s in PCIe 4.0, and up to 32 Gbit/s in PCIe 5.0 per lane (see the throughput sketch after this list).
− Bandwidth increases almost linearly with the number of lanes.
 Compatibility:
− PCIe is compatible with PCI at the application layer. Upgraded PCIe versions are
also compatible with existing PCI software.
 Ease-of-use:
− PCIe provides the hot-swap functionality. A PCIe bus interface slot contains the
hot-swap detection signal, supporting hot plugging and hot swapping in a similar way to USB.
 Error processing and reporting:
− A PCIe bus uses a layered architecture where the application layer processes and
reports errors.
 Multiple virtual lanes in each physical link:
− Each physical link contains multiple virtual lanes (in theory, eight virtual lanes
are allowed for independent communication control). This way, PCIe supports
QoS for each virtual lane and provides excellent traffic quality control.
 Lower I/O pin count, smaller physical footprint, and reduced crosstalk:
− A typical PCI bus data cable has at least 50 I/O pins, while a PCIe x1 link has
only four I/O pins. Fewer I/O pins mean a smaller physical footprint, greater
clearance between I/O pins, and reduced crosstalk.
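As referenced in the transfer-rate item above, usable per-lane throughput can be estimated from the raw rate and the line-code overhead (8b/10b for PCIe 1.0 and 2.0, 128b/130b from PCIe 3.0 onward). The Python sketch below gives approximate figures and ignores packet and protocol overhead.

# Approximate per-lane and x16 throughput per PCIe generation (rough estimate only).
GENERATIONS = {            # generation: (raw rate, coding efficiency)
    "PCIe 1.0": (2.5, 8 / 10),
    "PCIe 2.0": (5.0, 8 / 10),
    "PCIe 3.0": (8.0, 128 / 130),
    "PCIe 4.0": (16.0, 128 / 130),
    "PCIe 5.0": (32.0, 128 / 130),
}

def throughput_gbytes(gen, lanes=1):
    rate, eff = GENERATIONS[gen]
    return rate * eff * lanes / 8   # raw Gbit/s -> usable GB/s

for gen in GENERATIONS:
    print(f"{gen}: x1 ~ {throughput_gbytes(gen):.2f} GB/s, x16 ~ {throughput_gbytes(gen, 16):.1f} GB/s")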
Why PCIe? The PCIe standard is still evolving with an outlook to provide higher
throughput for future systems by leveraging the latest technologies. In addition, PCIe is
also being developed to facilitate the transition from PCI to PCIe by remaining backward compatible with existing PCI software through layered protocols and drivers. PCIe features
point-to-point connection, high reliability, tree network, full duplex, and frame-based
transmission.
The PCIe architecture comprises the physical layer, data link layer, transaction layer, and
application layer.
 The physical layer determines the physical characteristics of the bus. In future
development, the performance of a PCIe bus can be further improved by increasing
the speed or changing the encoding or decoding scheme. Such changes only affect
the physical layer, facilitating upgrades.
 The data link layer plays a vital role in ensuring the correctness and reliability of data
packets transmitted over a PCIe bus. The data link layer checks whether a data
packet is completely and correctly encapsulated, adds the SN and CRC code to the
data, and then uses the ACK/NACK handshake protocol for error detection and
correction.
 The transaction layer receives read and write requests sent from the application
layer, or itself creates an encapsulated request packet and transmits it to the data
link layer. This type of data packet is called a transaction layer packet (TLP). The
transaction layer also receives inbound TLPs from the data link layer, associates them with the related software requests, and then transmits them to the application layer for processing.
 The application layer is designed by users based on actual needs. Other layers must
comply with the protocol requirements.
1.5.2.2 NVMe
NVMe, also called Non-Volatile Memory Express, is designed for PCIe SSDs. A direct
connection between the native PCIe lane and the CPU can eliminate the latency caused
by communications between the external controller (PCH) of the SATA and SAS
interfaces and the CPU.
NVMe serves as a logical protocol interface, a command standard, and a protocol
throughout the entire storage process. By fully utilizing the low latency and parallelism of
PCIe lanes and the parallelism of modern processors, platforms, and applications, NVMe
aims to remarkably improve the read and write performance of SSDs, optimize the high
latency caused by advanced host controller interfaces (AHCIs) at controllable storage
costs, and ultimately unleash performance far beyond what SSDs can achieve behind a SATA/AHCI interface.
NVMe protocol stack:
 I/O transmission path
− A SAS-based all-flash storage array: I/Os are transmitted from a front-end server
to a CPU through a front-end interface protocol (Fibre Channel or IP), then from
the CPU to a SAS chip through a PCIe link and a PCIe switch, and further to a
SAS expander and a SAS SSD.
− A Huawei NVMe-based all-flash storage system that supports end-to-end NVMe:
Data I/Os are transmitted from a front-end server to a CPU through a front-end
interface protocol (FC-NVMe or NVMe over RDMA). Then, data is transmitted
directly to an NVMe SSD through 100G RDMA. CPUs of the NVMe-based all-
flash storage system communicate directly with NVMe SSDs via a shorter
transmission path, reducing latency and improving transmission efficiency.
 Software protocol parsing
− SAS- and NVMe-based all-flash storage systems differ greatly in protocol
interaction during data writes. If the SAS back-end SCSI protocol is used, four
protocol interactions are required to complete a data write operation. However,
Huawei NVMe-based all-flash storage systems require only two protocol
interactions, making them twice as efficient as SAS-based all-flash storage
systems in terms of processing write requests.
Advantages of NVMe:
 Low latency: Commands are executed without register reads in the I/O path, resulting in low I/O latency.
 High bandwidth: A PCIe x4 link can provide roughly 4 GB/s of throughput per drive (with PCIe 3.0).
 High IOPS: NVMe increases the maximum queue depth from 32 to 64,000, remarkably improving the IOPS capability of SSDs (see the quick calculation after this list).
 Low power consumption: The automatic switchover between power consumption
modes and dynamic power management greatly reduce power consumption.
 Wide driver applicability: NVMe has addressed the driver inapplicability problem
between different PCIe SSDs.
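The queue-depth jump in the High IOPS item above is easiest to appreciate with a quick calculation. The sketch below uses the commonly cited NVMe limits (up to 65,535 I/O queues with up to 65,536 entries each), which are in the same spirit as the 64,000 figure above; the exact limits depend on the controller implementation.

# Rough comparison of the AHCI and NVMe queueing models (commonly cited figures).
ahci = {"queues": 1,     "depth_per_queue": 32}
nvme = {"queues": 65535, "depth_per_queue": 65536}

def max_outstanding_commands(model):
    return model["queues"] * model["depth_per_queue"]

print("AHCI:", max_outstanding_commands(ahci))  # 32
print("NVMe:", max_outstanding_commands(nvme))  # 4294901760 (about 4.3 billion)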
Huawei OceanStor Dorado all-flash storage systems use NVMe-oF to share SSDs, and
provide 32G FC-NVMe and NVMe over 100G RDMA networking modes. This way, the
same network protocol is used for front-end network connections, back-end disk
enclosure connections, and scale-out controller interconnections.
Remote Direct Memory Access (RDMA) uses appropriate hardware and network
technologies to allow remote direct memory access between NICs of servers, resulting in
high bandwidth, low latency, and low resource consumption. However, the RDMA-
dedicated IB network architecture is incompatible with a live network, making it
expensive. RDMA over Converged Ethernet (RoCE) effectively resolves this problem. RoCE
is a network protocol that implements RDMA over an Ethernet network. There are two
RoCE versions, RoCE v1 and RoCE v2. RoCE v1 is an Ethernet link layer protocol and does
not allow communications in different Ethernet broadcast domains. RoCE v2 is a network
layer protocol which means that RoCE v2 packets can be routed.
1.5.3 RDMA and IB
1.5.3.1 RDMA
RDMA, also called Remote Direct Memory Access, is a method of transferring data
between buffers of applications running on servers over a network.
Comparison between the traditional DMA and RDMA:
 The traditional DMA transfers I/Os over its internal bus, while RDMA transfers data
directly between buffers of applications at two endpoints over a network.
 Unlike the traditional network transmission mode, RDMA transfers data without the
involvement of an operating system or protocol stack.
 RDMA implements end-to-end data transmission with ultra-low latency and ultra-
high throughput without the involvement of CPU and operating system resources. As
a result, resources are saved for data processing and migration.
Currently, there are three types of RDMA networks: IB, RoCE, and iWARP. IB is designed
for RDMA to ensure hardware-based reliable transmission. RoCE and iWARP run over
Ethernet. All three support specific verbs APIs.
 IB is a next-generation network protocol that has supported RDMA since its
emergence. NICs and switches that support this technology are required.
 RoCE is a network protocol that allows RDMA over an Ethernet network. RoCE has a
lower-layer Ethernet header and an upper-layer IB transport header (carrying the payload). This
This way, RoCE allows RDMA on standard Ethernet infrastructure (switches). NICs
are unique and must support RoCE. RoCE v1 is an RDMA protocol implemented
based on the Ethernet link layer. Switches must support flow control technologies,
such as priority-based flow control (PFC), to ensure reliable transmission at the
physical layer. RoCE v2 is implemented at the UDP layer in the Ethernet TCP or IP
protocol architecture.
 iWARP is layered on the Transmission Control Protocol (TCP) to implement RDMA.
The functions supported by IB and RoCE are not supported by iWARP. This way,
iWARP allows RDMA on standard Ethernet infrastructure (switches). NICs must
support iWARP if CPU offload is used. Otherwise, all iWARP stacks can be
implemented in software, and most RDMA performance advantages become
void.
1.5.3.2 IB
The IB technology is designed for interconnections and communications between servers
(for example, replication and distributed work), between a server and a storage device
(for example, SAN and DAS), and between a server and a network (for example, LAN,
WAN, and the Internet).
IB defines a set of devices used for system communications, including channel adapters
(CAs), switches, and routers. CAs are used to connect to other devices and include host
channel adapters (HCAs) and target channel adapters (TCAs).
Characteristics of IB:
 Standard-based protocol: IB is an open standard designed by the InfiniBand Trade
Association (IBTA), which was established in 1999 and has 225 member companies.
Principal IBTA members include Agilent, Dell, HP, IBM, InfiniSwitch, Intel, Mellanox,
Network Appliance, and Sun Microsystems. More than 100 other members also
support the development and promotion of IB.
 Speed: IB provides high speeds.
 Memory: Servers that support IB use HCAs to convert the IB protocol to an internal
PCI-X or PCI Express bus. An IB HCA supports RDMA and is also called a kernel
bypass. RDMA also applies to clusters. It uses a virtual addressing solution to allow a
server to identify and use memory resources provided by other servers without the
involvement of any OS kernels.
 Transport offload: RDMA facilitates transport offload that moves data packet routing
from an operating system to a chip, reducing workloads for the processor. An 80
GHz processor is required to process data at a transmission speed of 10 Gbit/s in the
operating system.
An IB system comprises CAs, a switch, a router, a repeater, and connected links. CAs
include HCAs and TCAs.
 An HCA connects a host processor to the IB architecture.
 A TCA connects an I/O adapter to the IB architecture.
IB in storage: The IB front-end network is used to exchange data with clients and
transmit data through the IPoIB protocol. The IB back-end network is used for data
interactions between nodes in a storage system. The RPC module uses RDMA to
synchronize data between nodes.
The IB architecture comprises the application layer, transport layer, network layer, link
layer, and physical layer. The following describes the functions of each layer:
 The transport layer is responsible for in-order packet delivery, partitioning, channel
multiplexing, and transport services. The transport layer also sends, receives, and
reassembles data packet segments.
 The network layer handles routing of packets from one subnet to another. Each of
the packets that are sent between source and target nodes contains a Global Route
Header (GRH) and a 128-bit IPv6 address. A standard global 64-bit identifier is also
embedded at the network layer and this identifier is unique among all subnets.
Through the exchange of such identifier values, data can be transmitted across
multiple subnets.
 The link layer encompasses packet layout, point-to-point link operations, and
switching within a local subnet. There are two types of packets within the link layer,
data transmission and network management packets. Network management packets
are used for operation control, subnet indication, and fault tolerance in relation to
device enumeration. Data transmission packets are used for data transmission. The
maximum size of each packet is 4 KB. In each specific device subnet, the direction
and exchange of each packet are implemented by a local subnet manager with a 16-
bit identifier address.
 The physical layer defines three link speeds, 1X, 4X, and 12X. Each individual link
provides a signaling rate of 2.5 Gbit/s, 10 Gbit/s, and 30 Gbit/s, respectively. IBA
therefore allows multiple connections at 30 Gbit/s. In the full duplex serial
communication mode, a 1X bidirectional connection only requires 4 wires and a 12X
connection requires only 48 wires.
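The aggregate rates above follow directly from the per-lane signaling rate, as this small calculation shows.

# Aggregate IB signaling rate by link width (figures taken from the text above).
LANE_GBITS = 2.5

for width in (1, 4, 12):
    print(f"{width}X link: {LANE_GBITS * width:g} Gbit/s")
# 1X link: 2.5 Gbit/s
# 4X link: 10 Gbit/s
# 12X link: 30 Gbit/s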
1.5.4 Quiz
1. (Multiple-answer question) Which of the following are Fibre Channel network
topologies?
A. Fibre Channel arbitrated loop (FC-AL)
B. Fibre Channel point-to-point (FC-P2P)
C. Fibre Channel switch (FC-SW)
D. Fibre Channel dual-switch
2. (Multiple-answer question) Which PCIe versions are available?
A. PCIe 1.0
B. PCIe 2.0
C. PCIe 3.0
D. PCIe 4.0
3. (Multiple-answer question) Which of the following are file sharing protocols?
A. HTTP
B. iSCSI
C. NFS
D. CIFS
4. (Multiple-answer question) Which of the following operations are involved in the
CIFS protocol?
A. Protocol handshake
B. Security authentication
C. Shared connection
D. File operation
E. Disconnection
5. (Multiple-answer question) Which NFS versions are available?
A. NFSv1
B. NFSv2
C. NFSv3
D. NFSv4