NetApp ONTAP HA
ONTAP HA Overview:
If a node fails, its partner can take over the failed node's storage and continue to serve data from it. The partner gives back the storage when the node is brought back online.
The HA pair controller configuration consists of a pair of matching FAS/AFF storage controllers (a local node and a partner node). Each node is connected to the other’s disk shelves. When one node in the HA pair fails, the surviving partner detects the failure and takes over all data processing from that controller.
Takeover is the process in which a node takes control of its partner’s storage. An automatic takeover occurs when:
▪ A system failure occurs on a node, and the node cannot reboot.
▪ Heartbeat messages are not received from the node’s partner.
▪ The remote management device (Service Processor) detects a failure of the partner node.
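The three takeover conditions can be read as an any-of check. The sketch below is a simplified Python model, not ONTAP code: `PartnerStatus`, its fields, and `takeover_reasons` are hypothetical names, and real failure detection (heartbeat timing, Service Processor alerts) happens inside ONTAP itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartnerStatus:
    """Hypothetical snapshot of what a node knows about its HA partner."""
    system_failed: bool       # the partner hit a system failure
    can_reboot: bool          # the partner is able to reboot on its own
    heartbeat_received: bool  # heartbeat messages are still arriving
    sp_reports_failure: bool  # the Service Processor detected a partner failure


def takeover_reasons(status: PartnerStatus) -> List[str]:
    """Return every automatic-takeover condition that currently holds."""
    reasons = []
    if status.system_failed and not status.can_reboot:
        reasons.append("system failure and partner cannot reboot")
    if not status.heartbeat_received:
        reasons.append("no heartbeat from partner")
    if status.sp_reports_failure:
        reasons.append("Service Processor detected partner failure")
    return reasons


def should_take_over(status: PartnerStatus) -> bool:
    # In this model, any single condition is enough to trigger a takeover.
    return len(takeover_reasons(status)) > 0


healthy = PartnerStatus(system_failed=False, can_reboot=True,
                        heartbeat_received=True, sp_reports_failure=False)
print(should_take_over(healthy))  # False
```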
Automatic Takeover and Giveback:
The automatic takeover and giveback operations can work together to reduce and avoid client outages.

By default, if one node in the HA pair panics, reboots, or halts, the partner node automatically takes over and then returns storage when the affected node reboots. The HA pair then resumes a normal operating state.
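The default cycle (takeover on panic, reboot, or halt, then giveback once the affected node is back) behaves like a small state machine. This Python sketch is a simplified model under stated assumptions: `HAState`, the event names, and the `auto_giveback` flag are hypothetical, and with `auto_giveback=False` the pair simply stays in takeover until a manual giveback.

```python
from enum import Enum


class HAState(Enum):
    NORMAL = "normal"            # each node serves its own storage
    TAKEN_OVER = "taken over"    # the partner serves the affected node's storage
    GIVING_BACK = "giving back"  # storage is being returned to its owner


# Events that trigger an automatic takeover, per the text.
TAKEOVER_EVENTS = {"panic", "reboot", "halt"}


def next_state(state: HAState, event: str, auto_giveback: bool = True) -> HAState:
    """Simplified model of the default automatic takeover and giveback cycle."""
    if state is HAState.NORMAL and event in TAKEOVER_EVENTS:
        return HAState.TAKEN_OVER
    if state is HAState.TAKEN_OVER and event == "boot_complete" and auto_giveback:
        return HAState.GIVING_BACK
    if state is HAState.GIVING_BACK and event == "giveback_complete":
        return HAState.NORMAL
    return state  # any other event leaves the pair where it is


state = HAState.NORMAL
for event in ["panic", "boot_complete", "giveback_complete"]:
    state = next_state(state, event)
print(state)  # HAState.NORMAL
```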
HA Policy Overview:
Each aggregate is assigned an HA policy, which determines how storage failover operations occur for the aggregate and its volumes. The two options, CFO (controller failover) and SFO (storage failover), determine the aggregate control sequence ONTAP uses during storage failover and giveback operations.
Although the terms CFO and SFO are sometimes used informally to
refer to storage failover (takeover and giveback) operations, they
actually represent the HA policy assigned to the aggregates. For
example, the terms SFO aggregate or CFO aggregate simply refer to
the aggregate’s HA policy assignment.
• Manually initiated takeovers optimize performance by relocating SFO (non-root) aggregates serially to the partner before takeover. During the giveback process, aggregates are given back serially after the taken-over system boots and the management applications come online, enabling the node to receive its aggregates.
• Because aggregate relocation operations entail reassigning aggregate disk ownership and shifting control from a node to its partner, only aggregates with an HA policy of SFO are eligible for aggregate relocation.
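The ordering rules for SFO and CFO aggregates can be sketched in a few lines of Python. This is a simplified model, assuming (as NetApp documents) that the root aggregate carries the CFO policy and is given back first so the taken-over node can boot; `aggr0_node1` is a hypothetical root aggregate name, while `aggr_prod1` is the data aggregate from this document's example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Aggregate:
    name: str
    ha_policy: str  # "CFO" for the root aggregate, "SFO" for non-root aggregates


def relocation_candidates(aggrs: List[Aggregate]) -> List[Aggregate]:
    """Only SFO aggregates are eligible for aggregate relocation."""
    return [a for a in aggrs if a.ha_policy == "SFO"]


def planned_takeover_order(aggrs: List[Aggregate]) -> List[str]:
    """SFO aggregates are relocated one at a time first; CFO aggregates move with the takeover."""
    sfo = [a.name for a in relocation_candidates(aggrs)]
    cfo = [a.name for a in aggrs if a.ha_policy == "CFO"]
    return sfo + cfo


def giveback_order(aggrs: List[Aggregate]) -> List[str]:
    """The CFO (root) aggregate returns first so the node can boot; SFO aggregates follow one at a time."""
    cfo = [a.name for a in aggrs if a.ha_policy == "CFO"]
    sfo = [a.name for a in relocation_candidates(aggrs)]
    return cfo + sfo


aggrs = [Aggregate("aggr0_node1", "CFO"), Aggregate("aggr_prod1", "SFO")]
print(planned_takeover_order(aggrs))  # ['aggr_prod1', 'aggr0_node1']
print(giveback_order(aggrs))          # ['aggr0_node1', 'aggr_prod1']
```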
Epsilon is true on Node1 (master node).
FlexVol volume volp1 resides in the aggr_prod1 aggregate.
Check the HA failover status: both nodes are connected to their partner.
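Checking the failover status can also be scripted. The Python sketch below parses text shaped like `storage failover show` output and reports whether every node can take over and is connected to its partner. The sample text is illustrative only, modeled on the command's usual column layout; `cluster1-02` is an assumed partner name, column widths and state wording can differ between ONTAP releases, and the function names are hypothetical.

```python
from typing import List

SAMPLE_OUTPUT = """\
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cluster1-01    cluster1-02    true     Connected to cluster1-02
cluster1-02    cluster1-01    true     Connected to cluster1-01
2 entries were displayed.
"""


def parse_failover_show(text: str) -> List[dict]:
    """Extract node rows; header, separator, and footer lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # node, partner, takeover possible, state
        if len(parts) == 4 and parts[2] in ("true", "false"):
            node, partner, possible, state = parts
            rows.append({
                "node": node,
                "partner": partner,
                "takeover_possible": possible == "true",
                "state": state,
            })
    return rows


def ha_pair_healthy(rows: List[dict]) -> bool:
    """True if every node can take over and reports a connection to its partner."""
    return bool(rows) and all(
        r["takeover_possible"] and r["state"].startswith("Connected to")
        for r in rows
    )


rows = parse_failover_show(SAMPLE_OUTPUT)
print(ha_pair_healthy(rows))  # True
```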
Using System
HA Planned Failover:
You should move epsilon if you expect that any manually initiated
takeovers could result in your storage system being one unexpected
node failure away from a cluster-wide loss of quorum.
In some instances, performing the takeover can result in a cluster that is one unexpected node failure away from cluster-wide loss of quorum. This can occur if the node being taken over holds epsilon or if the node with epsilon is not healthy. To maintain a more resilient cluster, you can transfer epsilon to a healthy node that is not being taken over. Typically, this would be the HA partner.
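To see why moving epsilon to the HA partner helps, here is a simplified Python model of the quorum rule: the cluster keeps quorum with a strict majority of healthy nodes, or with exactly half if that half includes the epsilon holder. It is a sketch under assumptions: it ignores two-node clusters (which rely on cluster HA rather than epsilon), network partitions, and any automatic epsilon reassignment your ONTAP release may perform; `node1` through `node4` and the function names are hypothetical.

```python
from typing import Optional, Set


def has_quorum(healthy: Set[str], all_nodes: Set[str], epsilon: Optional[str]) -> bool:
    """Strict majority of nodes, or exactly half if that half holds epsilon."""
    up = len(healthy & all_nodes)
    total = len(all_nodes)
    if 2 * up > total:
        return True
    return 2 * up == total and epsilon in healthy


def quorum_critical_nodes(healthy: Set[str], all_nodes: Set[str], epsilon: Optional[str]) -> Set[str]:
    """Healthy nodes whose single unexpected failure would break quorum."""
    return {n for n in healthy if not has_quorum(healthy - {n}, all_nodes, epsilon)}


def pick_epsilon_holder(taken_over: str, partner: str, healthy: Set[str]) -> Optional[str]:
    """Prefer the healthy HA partner; otherwise any other healthy node."""
    if partner in healthy and partner != taken_over:
        return partner
    candidates = sorted(n for n in healthy if n != taken_over)
    return candidates[0] if candidates else None


nodes = {"node1", "node2", "node3", "node4"}
up = nodes - {"node1"}  # node1 is being taken over and holds epsilon
print(sorted(quorum_critical_nodes(up, nodes, "node1")))  # ['node2', 'node3', 'node4']
new_epsilon = pick_epsilon_holder("node1", "node2", up)
print(sorted(quorum_critical_nodes(up, nodes, new_epsilon)))  # ['node2']
```

With epsilon left on the taken-over node1, losing any of the three remaining nodes breaks quorum in this model; after moving epsilon to the partner node2, only node2's own failure would.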
As defined by the LIF failover policy, the VIFMGR unit (an RDB unit) fails over to the partner node, and the nas2 LIF is no longer on its home node (its "is home" status is false).
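If the takeover node experiences a failure or a power outage during the giveback process, that process stops and the takeover node returns to takeover mode until the failure is repaired or the power is restored. However, this depends on the stage of giveback in which the failure occurred. If the node encountered a failure or a power outage during the partial-giveback state (after it has given back the root aggregate), it does not return to takeover mode. Instead, the node returns to partial-giveback mode. If this occurs, complete the process by repeating the giveback operation.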
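The interrupted-giveback rule reduces to one question: had the root aggregate already been given back when the failure hit? A minimal Python sketch of that decision follows; the function name and return strings are hypothetical.

```python
def state_after_giveback_interruption(root_aggregate_given_back: bool) -> str:
    """Where the takeover node lands if it fails or loses power during giveback."""
    if root_aggregate_given_back:
        # Partial-giveback state: the root aggregate is already back with its owner,
        # so the node does not return to takeover mode.
        return "partial-giveback: repeat the giveback to finish"
    # Interrupted before the root aggregate was returned.
    return "takeover: wait until the failure is repaired or power is restored"


print(state_after_giveback_interruption(True))
```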
After you configure all aspects of your HA pair, you need to verify that it is operating as expected in maintaining uninterrupted access to both nodes' storage during takeover and giveback operations. Throughout the takeover process, the local (or takeover) node should continue serving the data normally provided by the partner node. During giveback, control and delivery of the partner’s storage should return to the partner node.
Run the storage failover giveback command to give back the node manually.

The local node returns ownership to the partner node when issues are resolved, when the partner node boots up, or when giveback is initiated.

In this discussion, Node A has taken over Node B. Any issues on Node B have been resolved, and it is ready to resume serving data.
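For the Node A / Node B scenario, the manual giveback can be prepared as a short command sequence. This Python sketch only builds the ONTAP CLI strings (`storage failover show`, `storage failover giveback -ofnode`, `storage failover show-giveback`); it does not connect to a cluster, and `nodeB` is a hypothetical name standing in for Node B. Confirm the exact syntax against the command reference for your ONTAP release before running anything.

```python
from typing import List


def manual_giveback_commands(taken_over_node: str) -> List[str]:
    """Build, but do not execute, the CLI steps for a manual giveback."""
    if not taken_over_node:
        raise ValueError("the name of the taken-over node is required")
    return [
        "storage failover show",                                 # confirm the pair is in takeover
        f"storage failover giveback -ofnode {taken_over_node}",  # return the node's storage
        "storage failover show-giveback",                        # follow per-aggregate giveback progress
    ]


for cmd in manual_giveback_commands("nodeB"):
    print(cmd)
```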
During giveback, the CFO (root) aggregate is returned first, and ONTAP will then relocate the SFO aggregate. Once all CFO and SFO aggregates have been relocated, the node waits for applications to come online.
Revert the failed-over LIF to its home node: even after a successful giveback, the nas2 LIF still uses port e0d on node cluster1-01.
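Finding LIFs that are still away from their home port after giveback is a comparison of current and home locations. The Python sketch below builds a `network interface revert` command for each such LIF. The `Lif` record, the SVM name `svm1`, and nas2's home location (cluster1-02, port e0d) are hypothetical; the text only states that nas2 is still on port e0d of cluster1-01.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Lif:
    """Hypothetical record of a LIF's home and current location."""
    vserver: str
    name: str
    home_node: str
    home_port: str
    curr_node: str
    curr_port: str

    @property
    def is_home(self) -> bool:
        return (self.curr_node, self.curr_port) == (self.home_node, self.home_port)


def revert_commands(lifs: List[Lif]) -> List[str]:
    """Build a `network interface revert` command for each LIF away from home."""
    return [
        f"network interface revert -vserver {lif.vserver} -lif {lif.name}"
        for lif in lifs
        if not lif.is_home
    ]


nas2 = Lif("svm1", "nas2", "cluster1-02", "e0d", "cluster1-01", "e0d")
for cmd in revert_commands([nas2]):
    print(cmd)  # network interface revert -vserver svm1 -lif nas2
```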