Rubrik Hardware Failure Scenarios
Rubrik can withstand a number of failures at the Brik (appliance) and hardware
component level. A Brik represents a group of nodes that operate independently of each
other.
The table below details what happens at each level of hardware failure.
Power Supply Unit Failure: A Brik offers dual power supplies for redundancy. If one power supply fails, the system fails over to the remaining power supply. A failure of both power supply units is similar to a Brik failure (see the bottom of the chart).
Transceiver Failure: If a transceiver fails, the system fails over to the other transceiver (assuming both transceivers are plugged in).
Hard Disk Failure: A cluster (minimum of three nodes) can withstand the concurrent failure of up to two hard disk drives. The system continues to handle data ingest and operational tasks while simultaneously re-creating copies of the data stored on the failed drives to maintain three-way replication. As more nodes are added to the cluster, the three copies of data are spread across the cluster to tolerate a node or Brik failure.
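The availability guarantee behind three-way replication can be sketched in a few lines: each chunk of data lives on three different disks, so it remains readable as long as at least one replica survives any set of concurrent failures. The disk names and placement below are purely illustrative assumptions, not Rubrik's actual data layout.

```python
# Illustrative sketch (assumed model, not Rubrik code): with three-way
# replication, a chunk stays readable while any one of its replicas survives.

def chunk_available(replica_disks, failed_disks):
    """A chunk is readable if at least one of its replica disks is healthy."""
    return any(d not in failed_disks for d in replica_disks)

# Hypothetical placement: chunk -> the three disks holding its replicas.
placement = {
    "chunk-A": {"disk-1", "disk-4", "disk-7"},
    "chunk-B": {"disk-2", "disk-5", "disk-8"},
    "chunk-C": {"disk-3", "disk-6", "disk-9"},
}

# Two concurrent disk failures: every chunk still has a surviving replica.
failed = {"disk-1", "disk-4"}
assert all(chunk_available(disks, failed) for disks in placement.values())
```

With three replicas per chunk, any two concurrent disk failures leave at least one copy intact, which is why the cluster then re-copies the lost replicas in the background to restore the full replication factor.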
DIMM Failure: A DIMM failure will make a node unavailable to the cluster. Removing the offending DIMM can allow the node to continue operating; however, this is not recommended.
NIC Failure: The failure of an entire NIC within a Brik is similar to a node failure. Data ingest and operational tasks (backup, archival, replication, reporting, etc.) are redistributed to the remaining healthy nodes, and data and metadata are rebuilt in the background to maintain three-way replication. If only one port of the NIC fails, the system fails over to the other port, provided both cables are plugged in.
SSD Failure: An SSD failure is handled similarly to a node failure. Data ingest and operational tasks (backup, archival, replication, reporting, etc.) are redistributed to the remaining healthy nodes, and data and metadata are rebuilt in the background to maintain three-way replication.
Node Failure: A cluster can tolerate the concurrent failure of at most one node. During a node failure, data ingest and operational tasks (backup, archival, replication, reporting, etc.) are redistributed to the remaining healthy nodes, and data is rebuilt in the background.
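The background rebuild after a node failure can be sketched as follows: any chunk left with fewer than three replicas gets re-copied onto a healthy node that does not already hold it. The node names and placement model are assumptions for illustration only, not Rubrik's implementation.

```python
# Illustrative sketch (assumed model, not Rubrik code): re-home the replicas
# that were lost with a failed node until each chunk is three-way replicated.

def rebuild(placement, healthy_nodes, failed_node, target=3):
    """Return a new placement with the failed node's replicas re-homed."""
    rebuilt = {}
    for chunk, nodes in placement.items():
        survivors = set(nodes) - {failed_node}
        # Candidate hosts: healthy nodes that don't already hold this chunk.
        candidates = [n for n in sorted(healthy_nodes) if n not in survivors]
        while len(survivors) < target and candidates:
            survivors.add(candidates.pop(0))
        rebuilt[chunk] = survivors
    return rebuilt

# One chunk loses a replica when node-1 fails; node-4 picks it up.
placement = {"chunk-A": {"node-1", "node-2", "node-3"}}
healthy = {"node-2", "node-3", "node-4"}
new_placement = rebuild(placement, healthy, failed_node="node-1")
assert new_placement["chunk-A"] == {"node-2", "node-3", "node-4"}
```

This also shows why the single-node failure limit matters: the rebuild needs enough surviving healthy nodes to host three distinct replicas of every chunk.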
299 South California Ave. #250 Palo Alto, CA 94306 [email protected] | www.rubrik.com