Reliability, Availability, and Serviceability on PowerMax 2500 and 8500 Arrays
H19561
White Paper
Abstract
This white paper provides an overview of the reliability, availability, and
serviceability hardware and software features of the PowerMax 2500
and 8500 storage systems.
Copyright
The information in this publication is provided as is. Dell Inc. makes no representations or warranties of any kind with respect
to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular
purpose.
Use, copying, and distribution of any software described in this publication requires an applicable software license.
Copyright © 2023 Dell Inc. or its subsidiaries. Published in the USA April 2023 H19561.
Dell Inc. believes the information in this document is accurate as of its publication date. The information is subject to change
without notice.
Contents
Executive summary
Introduction
Terminology
Active-active architecture
Replication
Serviceability
Unisphere
Summary
References
Executive summary
Overview

Today's mission-critical environments demand more than redundancy. They require non-disruptive operations, non-disruptive upgrades, and being "always online." They require high-end performance, handling all workloads, predictable or not, under all conditions.
They require the added protection of increased data availability provided by local
snapshot replication and continuous remote replication.
Reliability, availability, and serviceability (RAS) features are crucial for enterprise
environments that require always-on availability. PowerMax arrays are architected for six-
nines (99.9999%) availability. The many redundant features discussed in this document
are factored into the calculation of overall system availability. This includes redundancy in
the back end, cache memory, front-end, and fabric, as well as the types of RAID
protections given to volumes on the back end. Calculations may also include time to
replace failed or failing FRUs (field replaceable units). In turn, this also considers
customer service levels, replacement rates of the various FRUs, and hot sparing
capability for drives.
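As a point of reference, an availability target translates directly into an annual downtime budget. The short Python sketch below is illustrative arithmetic only, not Dell's availability model, which also weighs the redundancy, FRU replacement, and sparing factors described above:

```python
# Convert an availability percentage into an annual downtime budget.
# Illustrative arithmetic only; the actual availability model factors
# in redundancy, FRU replacement times, and sparing (see text).

MINUTES_PER_YEAR = 365.25 * 24 * 60

def downtime_minutes_per_year(availability: float) -> float:
    """Return the allowed downtime in minutes/year for a given availability."""
    return MINUTES_PER_YEAR * (1.0 - availability)

for label, a in [("five nines", 0.99999), ("six nines", 0.999999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.2f} minutes/year")
# six nines -> about 0.53 minutes (roughly 32 seconds) of downtime per year
```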
We value your feedback

Dell Technologies and the authors of this document welcome your feedback on this document. Contact the Dell Technologies team by email.
Note: For links to other documentation for this topic, see the Dell Technologies Info Hub.
Introduction
PowerMax 2500 and 8500 arrays are built on the 30-year legacy of PowerMax and
Symmetrix core platform reliability, availability, and serviceability.
This makes PowerMax 2500 and 8500 series arrays the ideal choice for critical
applications and 24x7x365 environments that demand uninterrupted access to
information.
PowerMax array components have a mean time between failure (MTBF) of several
hundred thousand to millions of hours for a minimal component failure rate. A redundant
design allows systems to remain online and operational during component replacement.
All critical components are fully redundant, and PowerMax OS constantly scans and
reports on all component and environmental conditions.
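For readers who want to relate MTBF figures to yearly failure expectations, the following sketch applies the common constant-failure-rate (exponential) assumption. The example MTBF values are hypothetical, not published component specifications:

```python
# Annualized failure rate (AFR) implied by an MTBF, under the common
# constant-failure-rate (exponential) assumption. Example figures are
# hypothetical, not published component specs.
import math

HOURS_PER_YEAR = 8766  # 365.25 days

def afr(mtbf_hours: float) -> float:
    """Probability that a component fails within one year, given its MTBF."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

for mtbf in (500_000, 2_000_000):  # hundreds of thousands to millions of hours
    print(f"MTBF {mtbf:>9,} h -> AFR {afr(mtbf):.2%}")
# 500,000 h -> about 1.74% per year; 2,000,000 h -> about 0.44% per year
```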
PowerMax OS validates the integrity of data throughout its lifetime in the array.
Terminology
Terminology mapping, PowerMax 2500 & 8500 versus PowerMax 2000 & 8000:
• Dynamic Media Enclosure (DME): A drive enclosure connected to the dynamic fabric that provides storage capacity of 48 drive slots per DME and can scale independently of compute. The PowerMax 2000 & 8000 equivalent is the Disk Array Enclosure (DAE).
PowerMax 2500

PowerMax 2500 provides a maximum capacity of 4 PBe and can operate in open systems, mainframe, or mixed environments. It is an NVMe back-end array that supports one or two node pairs.
This array must maintain a 1:1 ratio between the number of node pairs and DMEs.
Customers can scale out or add a node pair and DME to an existing minimum
configuration system with one node pair and one DME. The maximum system
configuration is two node pairs and two DMEs.
Memory: 2667 MHz DDR4 per node pair (48 slots), with up to 24 PMEM slots per node pair.
Boot Drive: The system has a 64G boot drive and an embedded battery to support vault.
Connectivity: The system supports up to 64 FE ports and the system has an integrated
service processor.
Interconnect: There is no InfiniBand switch in the PowerMax 2500. It uses 100Gb/s inter-
director links with a PCIe (PCI Express) Gen3 NVMe Backend.
Scale: Up to two node pairs per system, supporting up to 96 2.5” NVMe SSD drives. Flex RAID supports RAID 1 (1+1 mirroring); RAID 5 4+1, 8+1, and 12+1; and RAID 6 8+2 and 12+2, with one spare per Disk Group (DG) per 50 drives and two per 100. Up to eight IOMs per node pair are supported.
Data Efficiency: The array provides inline back-end compression and deduplication with hardware assist, plus Data at Rest Encryption (D@RE) with self-encrypting drives (SEDs).
PowerMax 8500

PowerMax 8500 provides a maximum capacity of 18 PBe and can operate in open systems, mainframe, or mixed environments. The PowerMax 8500 is the most scalable NVMe back-end array, scaling from one to eight node pairs.
The PowerMax 8500 can have one to eight node pairs per system. Node pairs and DMEs can be added (scaled out) independently, provided the node pair to DME ratio stays between 1:2 and 2:1. The maximum system configuration is eight node pairs and eight DMEs.
Memory: 48 DDR4 DRAM slots per node pair and PMEM for metadata.
Boot Drive: 64G boot drive and an embedded battery to support vault.
Connectivity: The system supports up to 256 FE ports and has an integrated service
processor.
Interconnect: The 8500 uses an EDR InfiniBand fabric with 100 Gb/s links on a 36-port switch, 100 Gb/s active optical cables with integrated QSFP transceivers, and two InfiniBand fabric access modules (FAMs) per node pair.
Scale: Up to eight node pairs per system are supported, giving 384 NVMe drives per system when the full eight (Fornax) DMEs are installed and populated.
Configuration: The base system is one node pair + one DME + two fabric switches + an SPS pair + a service tray and a PDU (Power Distribution Unit) pair. Node pairs and DMEs can be added independently. Flex RAID supports RAID 1 (1+1); RAID 5 8+1 and 12+1; and RAID 6 8+2 and 12+2, with one spare per Disk Group.
Data Efficiency: Inline back-end compression and deduplication with an offload board, plus Data at Rest Encryption (D@RE) with a self-encrypting drive (SED) back end.
Building blocks and hardware expansion

In combination with the above configurations, the basic building blocks of PowerMax 2500 and 8500 arrays are a node pair and a Dynamic Media Enclosure (DME). System builds also include:
• Flexible RAID: Provides more capacity and flexible upgrades
• Dynamic Fabric: Provides direct access across the system from any node
• Support for open systems and mainframe environments
• Self-encrypting Drives (SEDs)
• Hardware Root of Trust (HWRoT)
• Persistent Memory (PMEM)
Active-active architecture
Both PowerMax 2500 and 8500 series arrays are designed with a multi-node active-active
scale out architecture using industry standard, end-to-end NVMe components. This gives
direct access across the system from any node.
• Direct access to system metadata on any node
• Direct access to data cache on any node
• Direct access to data on any drive/media
Figure 3. Dynamic dual fabric connectivity with multi-Node and DME access
The dynamic dual fabric, built on new internal NVMe-oF topologies featuring NVMe/RDMA over 100 Gb/s InfiniBand (NVMe/IB) and NVMe/PCIe fabrics, allows the compute and back-end storage elements to exist as independent endpoints on a large internal storage area network. These individual compute and storage endpoints can be placed into shared logical and physical resource pools, disaggregating the storage and compute resources in the system.
In this active-active architecture, all node endpoints can access all storage endpoints in the DMEs using the system's high-speed NVMe-oF topologies, coupled with the Mellanox BlueField SoC (system on chip), which supports hardware-accelerated NVMe-oF directly to NVMe drives. This creates 'any to any' node/drive access, making the platform a true active-active, share-everything system design.
The redundant InfiniBand Virtual Matrix allows the directors to communicate with each
other directly.
The directors operate in parallel on separate tasks, making PowerMax a Massively Parallel Processing (MPP) system (up to 576 CPU cores).
It also allows the emulation code on each director to access memory on all other
directors, forming a single global memory address space.
The memory on the other directors is addressable but slightly slower, making this a Non-
Uniform Memory Access (NUMA) system.
Note: Dynamic Fabric on the PowerMax 2500 is added with the second node pair; there are no fabric modules in single node pair systems. Fabric modules are cross-connected (PCIe) within the node pair of a PowerMax 2500, with drives PCIe-attached and RDMA (Remote Direct Memory Access) access through the fabric modules.
Cache

In PowerMax 2500/8500 systems, the cache sub-system is a services layer responsible for allocating cache/memory resources to the IO flows and internal "non-data" flows within the PowerMax operating system.
In the new arrays, memory utilization has been increased and cache density improved, while providing optimal memory locality for performance-sensitive workloads.

The cache sub-system is also responsible for ensuring control slot/metadata consistency and recovery across all kinds of software and hardware faults.
Cache is mirrored within the node pair using dual cast on PCIe. In both PowerMax 2500 and 8500 systems, the CPU uses the inter-node bus for fast write mirroring across nodes using DMA (Direct Memory Access).

Cache operations on the dynamic fabric use RDMA (Remote Direct Memory Access); CPUs on the cache nodes are not needed for cache access over the fabric.

Fabric Access Modules (FAMs) use DMA from remote node cache. FAMs offload interconnect and data access, and can also offload remote drive reads (reads into cache on remote nodes).
Memory for write-miss IO is typically allocated out of mirrored cache, which also services reads on cached writes.

Memory for read-miss and prefetch IO is typically allocated out of non-mirrored cache. This reduces vault infrastructure and improves cache efficiency.
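The mirrored versus non-mirrored allocation policy described above can be summarized schematically. The sketch below uses hypothetical names and is only a conceptual model of the decision, not the actual cache sub-system logic:

```python
# Schematic sketch of the cache allocation policy described above.
# All names are hypothetical; the real cache sub-system is far more
# involved (slot states, metadata consistency, recovery).

MIRRORED, NON_MIRRORED = "mirrored", "non-mirrored"

def choose_pool(io_type: str) -> str:
    """Pick a cache pool: dirty write data must survive a node loss, so
    it is mirrored across the node pair; read data can be re-fetched
    from the drives, so it uses the non-mirrored pool (less to vault)."""
    if io_type == "write-miss":
        return MIRRORED          # dual-cast over PCIe within the node pair
    if io_type in ("read-miss", "prefetch"):
        return NON_MIRRORED      # reduces vault infrastructure
    raise ValueError(f"unknown IO type: {io_type}")

for t in ("write-miss", "read-miss", "prefetch"):
    print(t, "->", choose_pool(t))
```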
There are numerous code scans that work together to ensure that data in cache is clean,
and that the infrastructure that supports it is working correctly.
General cache integrity scrubbing also runs on each EM emulation, validating cache slot linkage and control slot sanity, and verifying transient-state slots.
In PowerMax 2500 and 8500, scans are now multi-threaded for greater efficiency and
performance.
Vaulting

The system vaults to save all persistent data to the flash drives when it is shutting down or going offline, then restores the data when the system reboots. The purpose of vaulting is to prevent data loss. During a vault, memory is saved locally (it does not traverse the fabric), and D@RE is incorporated at the drive level. As cache sizes have grown, the time required to move all cached data to a persistent state has also increased. Vaulting is designed to limit the time needed to power off the system if it must switch to a battery supply.
Persistent Memory

PowerMax 2500 and 8500 arrays use a type of memory known as persistent memory (PMEM), which combines the durability of flash storage while approaching the performance of more expensive DRAM. Instead of using electrons to store information, it uses heat to change the cell's state from amorphous to crystalline, changing its resistance. This allows PMEM to retain its data even after power is removed.
PMEM comes in the form of AEP (Apache Pass) DIMMs (Dual In-Line Memory Modules), which are used to store persistent metadata in Apache Pass (3D XPoint NVDIMM) memory. This is achieved using an Intel proprietary modification to the DDR4 spec, giving access times in the tens of nanoseconds.
PMEM is used to store system metadata, which comes in various forms such as data
structures, pointers, scatter gather lists, and so on. Efficient metadata processing requires
that the CPU has quick access to the metadata, meaning that it needs to be accessible in
memory within the system. The use of PMEM to store metadata improves system
efficiency as the systems scale out.
With PMEM, more efficient data vaulting is achieved, because the data is persistent and
does not require vaulting during shutdowns or other vault trigger scenarios. This results in
lower system demands when moving data to the self-encrypting NVMe vault drives. This
improves vaulting speed and lowers the keep alive time requirements for the internal
battery backup mechanism.
Vault triggers

State changes that require the system to vault are referred to as vault triggers. There are two types of vault triggers: internal availability triggers and external availability triggers.
• Vault drive availability: The NVMe vault drives are used for storage of metadata
under normal conditions, and for storing any data that is being saved during the
vaulting process. PowerMax systems can withstand failure and replacement of
NVMe vault drives without impact to processing. However, if the overall available
NVMe vault drive space in the system is reduced to the minimum needed to store the required copies of global memory, the need-to-vault (NTV) process triggers. This ensures that all data is saved before any further loss of vault drive space occurs.
• Global memory (GM) availability: When both directors of a mirrored director pair are unhealthy, either logically or environmentally, NTV triggers because of GM unavailability.
• Fabric availability: When both the fabric switches are environmentally unhealthy,
NTV triggers because of fabric unavailability.
• DME trigger: If the system has lost access to the whole DME or DMEs, including
dual-initiator failure, and loss of access causes configured RAID members to
become non-accessible, the system vaults.
Power-up operation

During power-up, the data is written back to global memory to restore the system. When the system is powered on, the startup program does the following:
• Initializes the hardware and the environmental system
• Restores the global memory from the saved data while checking the integrity of the
data. This is accomplished by taking sections from each copy of global memory that
was saved during the power-down operation and combining them into a single
complete copy of global memory. If there are any data integrity issues in a section
of the first copy that was saved, then that section is extracted from the second copy
during this process.
• Performs a cleanup, data structure integrity, and initialization of needed global
memory data structures
At the end of the startup program, the system resumes normal operation when the SPS
modules are recharged enough to support another vault operation. If any condition is not
safe, the system does not resume operation and calls Customer Support for diagnosis
and repair. In this state, Dell Support can communicate with the system and find out the
reason for not resuming normal operation and address it.
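The copy-merging restore described in the power-up steps above can be sketched as follows. This is a toy model with hypothetical per-section checksums; the real restore logic is internal to PowerMaxOS:

```python
# Toy sketch of the global-memory restore described above: sections are
# taken from the first vault copy, falling back to the second copy when
# a section fails its integrity check. Checksums here are hypothetical.
import hashlib

def checksum(section: bytes) -> bytes:
    return hashlib.sha256(section).digest()

def restore_global_memory(copy_a, copy_b, sums):
    """copy_a/copy_b: lists of saved sections; sums: expected checksums."""
    restored = []
    for a, b, expected in zip(copy_a, copy_b, sums):
        if checksum(a) == expected:
            restored.append(a)        # section in the first copy is intact
        elif checksum(b) == expected:
            restored.append(b)        # extract the section from the second copy
        else:
            raise RuntimeError("section unrecoverable from either copy")
    return b"".join(restored)

sections = [b"gm-section-0", b"gm-section-1"]
sums = [checksum(s) for s in sections]
damaged = [b"gm-section-0", b"CORRUPTED!!!"]     # copy A lost one section
print(restore_global_memory(damaged, sections, sums))
```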
Converged Emulation
Emulation types

It is important to understand the legacy emulation types. In prior PowerMax systems, the emulations were IM, EDS, DS, FA, RF, RE, and EF.
• IM refers to Infrastructure Management which placed common infrastructure tasks
on a separate instance to optimize CPU resources. It provided all the environmental
monitoring and servicing.
• EDS stands for Enginuity Data Services and bridged the gap between the Front End and Back End emulations. It ran algorithms for Read (Optimized Read Miss)/Write and provided the multi-threading infrastructure for core services like replication and virtual provisioning, along with data reduction services.
• DS refers to the back-end emulation.
• FA stands for Fibre Adapter and runs the Fibre Channel emulation.
• RF refers to SRDF over Fibre Channel.
• RE refers to SRDF over GigE.
• EF refers to the FICON emulation for mainframe connectivity.
New emulation instance

New to the PowerMax 2500 and 8500 platforms is converged emulation, with two new emulation instances: OR (Open Systems and RDF) and EM (EDS and IM emulations combined into a single executable).
With PowerMax 2500 and 8500 platforms, multiple emulations run on a single instance,
which allows greater flexibility of sharing and optimizing resources among different
connectivity types on the same SLIC.
With emulation convergence, support for multiple data protocols on a physical port is
achieved.
Resource sharing

Converged emulations share memory and CPU, as directed. Many common functions are merged: OSHA (Open Systems Host Adapter) protocols share Executer, memory, fabric, and other threads, and Infrastructure Management (IM) functions are merged into EDS.

The emulation convergence maps the legacy emulations pertaining to the Open Systems Host Adapter (OSHA) and RDF into one converged emulation, 'OR.'

The back-end emulation remains the same, mapping to DN on the PowerMax 2500 and 8500 series arrays.
Global access and BEaaS

In PowerMax 2500 and 8500 arrays, media enclosures are accessible globally using a central fabric. This gives 'any to any' access, meaning all IOMs can drive IO to all the disks, which improves back-end load balance and resiliency.

BEaaS (Backend as a Service) picks the IO path based on IO complexity and load balancing.

This model improves rebuild efficiency and offers flexible capacity expansion while meeting the availability and reliability numbers of traditional RAID.
With PowerMax 2500 and 8500 and Flex RAID, the sparing model changes. In place of spare drives, there are spare hypers. The spares are provisioned at the Disk Group level, so all drives that belong to that Disk Group share the spare capacity. This reduces the spare requirement at the system level and speeds up rebuilds due to any-to-any drive access.
Direct Member Sparing (DMS), which is used to back up data from a failed drive to spares
while host IOs are running, is also supported on PowerMax 2500 and 8500 platforms.
When a drive fails, all the hypers on the failed drive will be rebuilt to spare hypers on
different drives, in the same Disk Group.
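The spare-hyper model can be illustrated with a toy placement sketch. The layout and round-robin policy below are hypothetical; the point is that a failed drive's hypers rebuild in parallel onto many surviving drives in the same Disk Group rather than funneling into a single dedicated spare:

```python
# Toy illustration of spare hypers: the failed drive's hypers are
# rebuilt onto spare capacity spread across the surviving drives of the
# same Disk Group, so rebuild IO fans out instead of hitting one spare.
# The round-robin placement policy here is hypothetical.
from itertools import cycle

def plan_rebuild(hypers: list[str], survivors: list[str]) -> dict[str, str]:
    """Round-robin each hyper of the failed drive onto a different
    surviving drive's spare capacity."""
    targets = cycle(survivors)
    return {h: next(targets) for h in hypers}

plan = plan_rebuild(
    hypers=[f"drive-07/hyper-{i}" for i in range(6)],     # failed drive
    survivors=["drive-01", "drive-02", "drive-03", "drive-04", "drive-05"],
)
for hyper, target in plan.items():
    print(hyper, "->", target)
```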
Flexible RAID
New RAID distribution model

A new RAID distribution model is enabled by the disaggregation of compute and storage in the PowerMax system. It provides active-active RAID protection across storage pools in the system DMEs.
Flexible RAID provides all compute nodes in the system with active-active access to the
storage resources that are distributed across all DMEs in the system. This new design
reduces the RAID overhead of the system, allowing for much higher system capacity
while using fewer drives. Rebuild efficiency is improved by over 2x with this new model.
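The capacity effect of wider RAID groups is simple arithmetic; the sketch below compares the usable fraction for the RAID widths this document mentions:

```python
# Parity overhead for the RAID widths mentioned in this document: wider
# groups dedicate a smaller fraction of capacity to parity, which is
# how the new distribution model raises usable capacity per drive.

def usable_fraction(data_members: int, parity_members: int) -> float:
    return data_members / (data_members + parity_members)

for name, d, p in [("RAID 5 4+1", 4, 1), ("RAID 5 8+1", 8, 1),
                   ("RAID 5 12+1", 12, 1), ("RAID 6 8+2", 8, 2),
                   ("RAID 6 12+2", 12, 2)]:
    print(f"{name}: {usable_fraction(d, p):.0%} usable")
# RAID 5 12+1 -> about 92% usable, versus 80% for 4+1
```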
Data protection schemes

Flexible RAID allows the new PowerMax systems to deliver the highest levels of storage capacity, performance, resiliency, and efficiency. This technology provides more usable storage capacity by leveraging granular storage media, load balancing, and several RAID protection schemes built on RAID 1, 5, and 6.
RAID 5

RAID 5 is an industry-standard data protection mechanism with rotating parity across all members of the RAID 5 set. If a drive failure occurs, the missing data is rebuilt by reading the remaining drives in the RAID group and performing XOR calculations.
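The XOR rebuild can be demonstrated in a few lines. The following is a minimal sketch with toy two-byte "drives," not the array's actual rebuild code:

```python
# Minimal sketch of RAID 5 rebuild: parity is the XOR of all data
# members, so any single missing member is the XOR of the survivors
# and the parity.

def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2, d3 = b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"
parity = xor_blocks(d0, d1, d2, d3)       # written to the parity member

# The drive holding d2 fails: rebuild it from the remaining members + parity.
rebuilt = xor_blocks(d0, d1, d3, parity)
assert rebuilt == d2
print("rebuilt member:", rebuilt.hex())
```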
RAID 6

RAID 6 enables the rebuilding of data if two drives fail within a RAID group. Our implementation of RAID 6 calculates two types of parity, which allows the data to be reconstructed even when two drives within the same RAID group fail. Horizontal parity is identical to RAID 5 parity and is calculated from the data across all disks in the RAID group. Diagonal parity is calculated on a diagonal subset of data members. For applications without demanding performance needs, RAID 6 provides the highest data availability. PowerMax 2500 and 8500 systems support RAID 6 in 8+2 and 12+2 configurations.
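The interplay of the two parity types can be illustrated with a toy example. The sketch below computes horizontal and diagonal parity over a small block matrix and recovers two blocks lost in the same row via their diagonals; real RAID 6 implementations handle the general two-drive case by iterating between the two parity sets:

```python
# Toy sketch of dual parity: horizontal parity (per row, RAID 5 style)
# plus diagonal parity (per diagonal). Two blocks lost in the same row
# defeat row parity alone, but each sits on a different diagonal and
# can be recovered from the diagonal parity.

ROWS, COLS = 4, 4
data = [[(r * 7 + c * 13 + 1) & 0xFF for c in range(COLS)] for r in range(ROWS)]

row_parity = [0] * ROWS                  # horizontal (RAID 5 style) parity
diag_parity = [0] * (ROWS + COLS - 1)    # diagonal parity
for r in range(ROWS):
    for c in range(COLS):
        row_parity[r] ^= data[r][c]
        diag_parity[r + c] ^= data[r][c]

lost = [(0, 1), (0, 3)]  # two failures in one row: row parity is not enough
for r, c in lost:
    # XOR the surviving members of the block's diagonal with the
    # diagonal parity to reconstruct the missing block.
    recovered = diag_parity[r + c]
    for rr in range(ROWS):
        for cc in range(COLS):
            if rr + cc == r + c and (rr, cc) != (r, c):
                recovered ^= data[rr][cc]
    assert recovered == data[r][c]
print("both lost blocks recovered via diagonal parity")
```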
RAID 1

RAID 1 is an industry-standard data protection scheme consisting of two drives containing exact copies, or mirrors, of the data. No parity is required because both RAID members hold full copies of the data, allowing the system to retrieve data from either disk. Dell PowerMax implements RAID 1 (1+1), mirroring the data across two drives.
PowerMax 2500

PowerMax 2500 uses Intel® Xeon® Gold 5218 processors, 2.8 GHz with 16 cores: 64 cores per node pair, for a total of 128 cores per system.
I/O Modules

I/O modules share some common attributes. All are hot swappable, meaning online add/remove/replace of the modules is supported. I/O modules have either two ports numbered 0-1 or four ports numbered 0-3.
• Odd director I/O module ports are numbered top to bottom.
• Even director I/O module ports are numbered bottom to top.

Each Fabric Access Module (FAM) has two ports.
• FAM-1 is numbered left to right.
• FAM-2 is numbered right to left.

Each Management Switch Module (MSM) has four ports.
• MSM-1 is numbered left to right.
• MSM-2 is numbered right to left.
PowerMax 8500

PowerMax 8500 uses Intel® Xeon® Gold 6254 processors, 3.9 GHz with 18 cores: 72 cores per node pair, for a total of 576 cores per system.
Array specifications
CPU
• PowerMax 2500: Intel® Xeon® Gold 5218, 2.8 GHz with 16 cores; 64 cores per engine/node pair; 128 cores per system.
• PowerMax 8500: Intel® Xeon® Gold 6254, 3.9 GHz with 18 cores; 72 cores per engine/node pair; 576 cores per system.

Max front-end IOMs per node pair
• PowerMax 2500: 8
• PowerMax 8500: 8

Front-end I/O modules and protocols
• PowerMax 2500: 4 x 32 Gb/s (FC, FICON, SRDF); 4 x 25GbE (iSCSI, SRDF, NVMe/TCP); 4 x 10GbE (iSCSI, SRDF, NVMe/TCP); 1 x zHyperLink port (MF, zHyperLink)
• PowerMax 8500: 4 x 32 Gb/s (FC, FICON, SRDF); 4 x 25GbE (iSCSI, SRDF, NVMe/TCP); 4 x 10GbE (iSCSI, SRDF, NVMe/TCP); 1 x zHyperLink port (MF, zHyperLink)

Max software file nodes
• PowerMax 2500: 4 (1 per node, 2 per node pair)
• PowerMax 8500: 8 (1 per node, 2 per node pair)

Minimum required to support Cloud Mobility
• PowerMax 2500: 2 x 25GbE (2 ports out of each 25GbE SLIC)
• PowerMax 8500: 2 x 25GbE (2 ports out of each 25GbE SLIC)
Security features
PowerMax 2500 and 8500 series platforms are the world's most secure storage platforms (based on Dell internal analysis of the cybersecurity capabilities of Dell PowerMax versus competitive mainstream arrays supporting open systems and mainframe storage, March 2022), with cyber resiliency at the core of the product design, providing customers with the highest levels of data protection and cybersecurity in the industry.
Prevention is key to our cyber security strategy. Our latest storage platforms are aligned
with the National Institute of Standards and Technologies (NIST) cybersecurity framework
and are centered on the following principles.
Identification and protection

The PowerMax is built to prevent unauthorized access to system resources. Each model incorporates security features and access controls to protect an organization's data. These features include:
• The Hardware Root of Trust (HWRoT) represents the foundation on which all secure operations of PowerMax depend. HWRoT contains the keys used for cryptographic functions and enables a secure boot process, preventing system boot if firmware has been tampered with. It uses cryptographic keys stored in One Time Programmable fused memory provisioned within Dell manufacturing.
• The fused keys are used for authenticating the digital signature for the Dell signed
firmware. It authenticates the signature of the firmware during update and boot
protecting against unauthorized firmware. HWRoT functionality is deployed on the
nodes, DME enclosures, and Control Station.
• Secure Boot prevents loading tampered firmware along the boot process. It establishes and extends a firmware trust chain expanding beyond the HWRoT boundary, using cryptographic authentication for subsequent firmware loads and boot loaders based on Dell signatures. It also includes UEFI Secure Boot. Secure Boot functionality is deployed on the nodes and DME enclosures.
• Secure access controls and tamper proof audit logs protect data from unauthorized
access through secure logs of all events on PowerMax.
• Hardware-based Data Encryption through FIPS 140-2 level 2 certified self-
encrypting drives (SEDs) ensures protection when a drive is removed from the
system. The lifecycle of Encryption Key is entirely within the drive with no access
from the outside.
• Secure firmware updates require a digital signature before updates can be applied, preventing the loading of unauthorized firmware that could compromise the system. Dell digitally signs all firmware packages and verifies them using cryptographic keys. This functionality is within nodes, DMEs, and Control Stations.
• Multi-Factor Authentication for Admin Access (MFA) provides two-factor authentication for management access using RSA SecurID. A time-sensitive token is combined with the user's password to verify identity during the authentication process.
PowerMax also provides granular protection at scale and accelerated recovery from cyberattacks, with support for 65 million secure snapshots at a minimum 10-minute interval.
CloudIQ detection and response

Dell CloudIQ is a powerful application used to track system health through pattern recognition and advanced analytics. Through CloudIQ cybersecurity, users can define legal configurations for PowerMax, monitor the system, and receive alerts if the array is out of compliance. CloudIQ can also track data patterns and detect anomalies, including changes to data reduction rates, to determine whether ransomware or malware may have infected the system. When suspicious anomalies are detected, CloudIQ alerts users to take corrective action.
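One way to picture this kind of anomaly detection is a simple statistical test on the data reduction ratio, since already-encrypted data does not compress or deduplicate well. The sketch below is a toy z-score check, not CloudIQ's actual analytics:

```python
# Toy anomaly check in the spirit of the monitoring described above:
# flag a sudden drop in the data reduction ratio, which can indicate
# already-encrypted (for example, ransomware-affected) data being
# written. This is NOT CloudIQ's algorithm, just an illustrative test.
from statistics import mean, stdev

def reduction_ratio_alert(history: list[float], latest: float,
                          threshold: float = 3.0) -> bool:
    """Alert when the latest ratio deviates from history by > threshold sigmas."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

history = [3.1, 3.0, 3.2, 3.1, 3.0, 3.2, 3.1]   # steady ~3:1 reduction
print(reduction_ratio_alert(history, 3.1))       # False: normal variation
print(reduction_ratio_alert(history, 1.2))       # True: encrypted-looking data
```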
Data at Rest Encryption

Data at Rest Encryption (D@RE) protects data confidentiality by adding back-end encryption to the entire array. D@RE provides hardware-based, on-array, back-end encryption. Back-end encryption protects information from unauthorized access when drives are removed from the system.
All configured drives are encrypted, including data drives, spares, and drives with no
provisioned volumes.
If D@RE is configured, security is enabled on the SED.
• The system configures an Authentication Key (passphrase) on each SED during initialization.
• The Authentication Key is required every time a SED is power cycled, to unlock the drive for media access.
• The lifecycle of the Encryption Key is entirely within the drive, with no access from the outside.
• The same drive feature is used on the vault drives.
Encryption keys are kept on the drives rather than in PowerMaxOS.
D@RE incorporates RSA Embedded Key Manager for key management. With D@RE,
keys are self-managed, and there is no need to replicate keys across volume snapshots
or remote sites. RSA Embedded Key Manager provides a separate, unique Data
Encryption Key (DEK) for each drive in the array, including spare drives.
By securing data on enterprise storage, D@RE ensures that the potential exposure of
sensitive data on discarded, misplaced, or stolen media is reduced or eliminated. If the
key used to encrypt the data is secured, encrypted data cannot be read. In addition to
protecting against threats related to physical removal of media, media can readily be
repurposed by destroying the encryption key used for securing the data previously stored
on that media.
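The crypto-erase idea can be illustrated with a toy model: each drive holds its own data encryption key, and destroying that key renders the stored ciphertext unreadable. The cipher below is a toy SHA-256 counter keystream chosen only to keep the demo self-contained; real SEDs use hardware AES:

```python
# Toy illustration of crypto-erase: each drive has its own data
# encryption key (DEK); destroying the key renders the ciphertext on
# the media unreadable. Toy keystream cipher for demo purposes only.
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (symmetric: encrypts
    and decrypts). NOT the cipher used by real self-encrypting drives."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

deks = {"drive-01": secrets.token_bytes(32)}     # unique DEK per drive
stored = keystream_xor(deks["drive-01"], b"sensitive user data")

assert keystream_xor(deks["drive-01"], stored) == b"sensitive user data"
del deks["drive-01"]   # "destroying" the key: the ciphertext is now
                       # unreadable, so the media can be safely repurposed
```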
D@RE:
• Is compatible with all PowerMaxOS features.
• Allows for encryption of any supported local drive types or volume emulations.
• Delivers powerful encryption without performance degradation or disruption to
existing applications or infrastructure.
D@RE can also be deployed with external key managers using the Key Management Interoperability Protocol (KMIP), which allows key management to be separated from the PowerMax array. KMIP is an industry standard that defines message formats for the manipulation of cryptographic keys on a key management server. An external key manager provides support for consolidated key management and allows integration of a PowerMax array with an existing key management infrastructure.
T10 Data Integrity Field

PowerMax 2500 and 8500 series platforms protect against data corruption with the industry-standard T10 Data Integrity Field (DIF) block cyclic redundancy code (CRC) for track formats. For open systems, this enables a host-generated DIF CRC to be stored with user data and used for end-to-end data integrity validation. Additional protections for address and control fault modes provide increased levels of protection against faults. These protections are defined in user-definable blocks supported by the T10 standard. Address and write status information is stored in the extra bytes in the application tag and reference tag portion of the block CRC.
PowerMaxOS further increases data integrity with T10-DIF+, which has additional bits for detecting stale data address faults, control faults, and sector signature faults that are not detected by standard T10-DIF. T10-DIF+ checking is performed every time data is moved: across the internal fabric, to or from drives, and on the way back to the host on reads.
On the back end, the T10-DIF codes for the expected data are stored, and the checksums are verified when the data is read. In addition, a one-byte checksum for each 8 K of data is kept in the track table (not stored with the data) and is used for independent validation of the data against the last version written to the array. This provides protection against situations such as:
• Detecting reads from the wrong block: The data and checksum stored together are
fine, but it was from the wrong address. In this case, the additional checksum will
not match.
• RAID disagreement: Each data block and the parity block of the RAID group have
valid checksums and no errors, but the parity does not match with the data. In this
case, each of the data blocks can be validated to determine if a data block or the
parity block are stale.
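A hedged sketch of DIF-style validation follows. It builds a simplified 8-byte protection tuple (guard CRC, application tag, reference tag) and shows how a guard mismatch flags corruption while a reference tag mismatch flags a read from the wrong block; field usage here is simplified relative to the T10 standard:

```python
# Simplified sketch of T10 DIF-style validation. An 8-byte protection
# tuple (2-byte guard CRC, 2-byte application tag, 4-byte reference
# tag) travels with each block: the guard catches corruption; the
# reference tag catches reads from the wrong address.

def crc16_t10dif(data: bytes) -> int:
    """CRC-16 with the T10 DIF polynomial 0x8BB7 (no reflection)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def make_pi(block: bytes, app_tag: int, lba: int) -> tuple[int, int, int]:
    return crc16_t10dif(block), app_tag, lba & 0xFFFFFFFF

def verify_pi(block: bytes, pi: tuple[int, int, int], expected_lba: int) -> str:
    guard, _app_tag, ref_tag = pi
    if crc16_t10dif(block) != guard:
        return "guard mismatch: data corrupted"
    if ref_tag != expected_lba & 0xFFFFFFFF:
        return "reference tag mismatch: read from the wrong block"
    return "ok"

block = bytes(512)
pi = make_pi(block, app_tag=0, lba=1000)
print(verify_pi(block, pi, expected_lba=1000))   # ok
print(verify_pi(block, pi, expected_lba=1001))   # wrong-block read detected
```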
Recovery

Both PowerMax 2500 and 8500 series systems continue to use secure immutable snapshots to provide the industry's most granular cyber recovery at scale, maximizing
data recovery from a cyberattack. Administrators can set snapshot policies for up to 65
million secure snapshots to optimize recovery point objectives (RPO) and minimize data
loss. Several options also exist for native cyber recovery from a secure vault for
mainframe and open systems data storage on these platforms. Security is built into the
hardware with Hardware Root of Trust (HWRoT), secure boot, self-encrypting drives,
anomaly detection, multi-factor authentication, and other features.
Hardware design

PowerMax architecture is highly redundant and able to survive multiple component failures while continuing to operate, ensuring uncompromised data integrity. All components are hot swappable and have high MTBF. There is no single point of failure (SPOF) with PowerMax.
Compute nodes are completely power independent and dual node failures are extremely
unlikely. Nodes are serviced from the front, negating the need to touch any cables. Each
node has 1+1 power supplies, meaning that the system can endure loss of a PSU from
each node, or a whole power zone, and continue running. Memory remains accessible on
the fabric for many types of node failures. Each node has dedicated dual batteries to
facilitate system up time during the vault process. In cases of total power loss, the vault
operation is fast, simple, and data is preserved.
Power related

Built-in redundancy makes these scenarios very unlikely, but a node can nonetheless go offline:
• If both power supplies for the node fail at the same time (different zones).
• Due to loss of both AC power feeds on a node (that is, zone power failure, or a cable damaged or unplugged, or both).
• Due to loss of AC power to one power supply while a power supply in the other zone also fails in the same node.
Component related

A node can also go offline due to a component failure on the compute module (processor, memory, PCB short, or another key component), or failure of three or more fans on the compute module at the same time, all of which are extremely unlikely.
Other causes

All nodes of any system can be impacted by data center events such as flood, fire, high humidity, power failure, loss of climate control, and other disasters.

Basic system resiliency and data protection can also be compromised by random physical damage, such as a forklift crashing into the system.
Replication
Local replication

Dell TimeFinder delivers point-in-time copies of volumes that can be used for backups, decision support, data warehouse refreshes, or any other process that requires parallel access to the production data.
Previous VMAX families offered multiple TimeFinder products, each with their own
characteristics and use cases. These traditional products required a target volume to
retain snapshot or clone data.
TimeFinder SnapVX provides the best aspects of the traditional TimeFinder offerings
combined with increased scalability and ease-of-use.
Snapshots can be cascaded from linked targets, and targets can be linked to snapshots of
linked targets. There is no limit to the number of levels of cascading, and the cascade can
be broken.
Remote replication

The Dell Symmetrix Remote Data Facility (SRDF) family of products offers a range of array-based disaster recovery, parallel processing, and data migration solutions for Dell storage systems, including PowerMaxOS for PowerMax 2500 and PowerMax 8500.
SRDF disaster recovery solutions use “active, remote” mirroring and dependent-write
logic to create consistent copies of data. Dependent-write consistency ensures
transactional consistency when the applications are restarted at the remote location. You
can tailor your SRDF solution to meet various Recovery Point Objectives and Recovery
Time Objectives.
Serviceability
Both PowerMax 2500 and 8500 platforms are based on a modular design that makes it
easier for field personnel to add or remove components as required, providing greater
flexibility and scale.
As in previous generations, redundant components span the array to ensure that the
system stays online 24x7x365. Field Replaceable Units (FRUs) ensure seamless hot
swap-out capabilities in case of component failure.
Both systems are remotely monitored and 'dial home' to Dell Technologies using the Secure Connect Gateway, where corrective action takes place depending on the event code.
Unisphere
With PowerMax 2500 and 8500 platforms, Unisphere provides a serviceability application
that supports deployment of update packages for PowerMax embedded applications and
integrates functionality that was previously supported by the stand-alone vApp manager
application.
From Unisphere, the user can perform the following update-related operations:
• Download an update app package, which consists of all the guest components (with built-in package integrity verification).
• Run a pre-upgrade health check on the app package.
• View the health check result.
• Delete the downloaded app package.
• Install the app package.
• View the update installation progress details.
The Serviceability application is accessed from the main dashboard. It is broken into three
areas:
• Serviceability
• Applications
• Updates
Remote support
Secure Connect Gateway (SCG)

SCG is an enterprise monitoring technology that is delivered as an appliance and as a stand-alone application. It monitors devices and proactively detects hardware issues that may occur. Depending on the service contract, it also automates support request creation for issues that are detected on the monitored devices. See Secure Connect Gateway capabilities available with Dell Technologies services contracts.
Supported products include Dell servers, storage, chassis, networking and data protection devices, virtual machines, and converged or hyperconverged appliances.
Based on the device type and model, secure connect gateway automatically collects the
telemetry that is required to troubleshoot the issue that is detected. The collected
telemetry helps technical support to provide a proactive and personalized support
experience. For information about the telemetry collected, see the Secure Connect
Gateway 5.x — Virtual Edition Reportable Items available at https://www.dell.com/SCG-
VE-docs.
Connectivity Hub

For remote access to PowerMax 2500 and 8500 arrays, Connectivity Hub replaces ServiceLink. It facilitates hardware, gateway, and site monitoring and management, and enables remote sessions.
It also replaces the MFT (Managed File Transfer) portal for MFT file transfers.
Remote Support Agent (RSA) is an essential and mandatory Windows service that must be installed on the user's laptop to facilitate launching remote support from within Connectivity Hub.
Summary
PowerMax 2500 and 8500 series arrays are the most secure, highest-performing, efficient, and scalable enterprise storage in Dell Technologies history. They incorporate high availability, redundancy, and serviceability, and can operate in open systems, mainframe, or mixed environments.
References
Dell Technologies documentation

The following Dell Technologies documentation provides other information related to this document. Access to these documents depends on your login credentials. If you do not have access to a document, contact your Dell Technologies representative.
• PowerMax and VMAX Info Hub
• Dell PowerMax 2500 and 8500: TimeFinder SnapVX Snapshots and Clones
• Dell PowerMax Data at Rest Encryption
• Dell PowerMax Cybersecurity
• Remote replication with SRDF