
NetApp ONTAP High Availability

ONTAP HA Overview:

Cluster nodes are configured in high-availability (HA) pairs for fault
tolerance and nondisruptive operations. If a node fails or if you need
to bring a node down for routine maintenance, its partner can take
over its storage and continue to serve data from it. The partner gives
back storage when the node is brought back online.

The HA pair controller configuration consists of a pair of matching
FAS/AFF storage controllers (local node and partner node). Each of
these nodes is connected to the other’s disk shelves. When one node
in an HA pair encounters an error and stops processing data, its
partner detects the failed status of the partner and takes over all data
processing from that controller.
Takeover is the process in which a node assumes control of its
partner’s storage.

Giveback is the process in which the storage is returned to the partner.

An internal HA interconnect allows each node to continually check


whether its partner is functioning and to mirror log data for the other’s
nonvolatile memory. When a write request is made to a node, it is
logged in NVRAM on both nodes before a response is sent back to the
client or host. On failover, the surviving partner commits the failed
node’s uncommitted write requests to disk, ensuring data
consistency.

SENTHILKUMAR MUTHUSAMY | SAN MASTERS 1



Connections to the other controller’s storage media allow each node


to access the other’s storage in the event of a takeover. Network path
failover mechanisms ensure that clients and hosts continue to
communicate with the surviving node.

By default, takeovers occur automatically in any of the following
situations:

▪ A software or system failure occurs on a node that leads to a panic.
▪ A system failure occurs on a node, and the node cannot reboot.
▪ Heartbeat messages are not received from the node’s partner.
▪ The remote management device (Service Processor) detects failure
of the partner node.
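Before relying on automatic takeover, you can confirm from the ONTAP CLI that each node is able to take over its partner. A minimal sketch (node names follow the cluster1 examples used later in this document; output is abbreviated):

```shell
# Verify that storage failover is enabled and that each node reports
# "Connected to <partner>" with Takeover Possible = true
cluster1::> storage failover show
```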



Hardware Assisted Takeover:

Enabled by default, the hardware-assisted takeover feature can speed
up the takeover process by using a node’s remote management device
(Service Processor).

When the remote management device detects a failure, it quickly


initiates the takeover rather than waiting for ONTAP to recognize that
the partner’s heartbeat has stopped. If a failure occurs without this
feature enabled, the partner waits until it notices that the node is no
longer giving a heartbeat, confirms the loss of heartbeat, and then
initiates the takeover.


The hardware-assisted takeover feature uses the following process to


avoid that wait:

1. The remote management device monitors the local system


for certain types of failures.
2. If a failure is detected, the remote management device
immediately sends an alert to the partner node.
3. Upon receiving the alert, the partner initiates takeover.
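Hardware-assisted takeover can be inspected and tuned from the CLI. A sketch; the `hwassist` command and option shown below exist in current ONTAP releases, but verify exact names against your release’s documentation:

```shell
# Show hardware-assisted takeover status and statistics for the nodes
cluster1::> storage failover hwassist show

# Enable (or re-enable) hardware-assisted takeover on a node
cluster1::> storage failover modify -node cluster1-01 -hwassist true
```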

Automatic Takeover and Giveback:

The automatic takeover and giveback operations can work together to
reduce and avoid client outages.

By default, if one node in the HA pair panics, reboots, or halts, the
partner node automatically takes over and then returns storage when
the affected node reboots. The HA pair then resumes a normal
operating state.
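Automatic giveback is controlled per node with `storage failover modify`. A sketch, assuming the default behavior described above:

```shell
# Enable automatic giveback so storage returns to the repaired node
# once it reboots
cluster1::> storage failover modify -node * -auto-giveback true

# Confirm the setting
cluster1::> storage failover show -fields auto-giveback
```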

HA Policy Overview:

ONTAP automatically assigns an HA policy of CFO (controller failover)
or SFO (storage failover) to an aggregate. This policy determines how
storage failover operations occur for the aggregate and its volumes.

The two options, CFO and SFO, determine the aggregate control
sequence ONTAP uses during storage failover and giveback
operations.


Although the terms CFO and SFO are sometimes used informally to
refer to storage failover (takeover and giveback) operations, they
actually represent the HA policy assigned to the aggregates. For
example, the terms SFO aggregate or CFO aggregate simply refer to
the aggregate’s HA policy assignment.

HA policies affect takeover and giveback operations as follows:

• Aggregates created on ONTAP systems (except for the root
aggregate containing the root volume) have an HA policy of SFO.
Manually initiated takeover is optimized for performance by
relocating SFO (non-root) aggregates serially to the partner before
takeover. During the giveback process, aggregates are given back
serially after the taken-over system boots and the management
applications come online, enabling the node to receive its
aggregates.

• Because aggregate relocation operations entail reassigning
aggregate disk ownership and shifting control from a node to its
partner, only aggregates with an HA policy of SFO are eligible for
aggregate relocation.

• The root aggregate always has an HA policy of CFO and is given
back at the start of the giveback operation. This is necessary to allow
the taken-over system to boot. All other aggregates are given back
serially after the taken-over system completes the boot process and
the management applications come online, enabling the node to
receive its aggregates.
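You can see which HA policy each aggregate carries directly from the CLI; the root aggregate should report cfo and data aggregates sfo. A sketch:

```shell
# List each aggregate's HA policy (root aggregate: cfo, data aggregates: sfo)
cluster1::> storage aggregate show -fields ha-policy
```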
Manual Takeover:

You should move epsilon if you expect that any manually initiated
takeovers could result in your storage system being one unexpected
node failure away from a cluster-wide loss of quorum.


To perform planned maintenance, you must take over one of the
nodes in an HA pair. Cluster-wide quorum must be maintained to
prevent unplanned client data disruptions for the remaining nodes. In
some instances, performing the takeover can result in a cluster that is
one unexpected node failure away from cluster-wide loss of quorum.

This can occur if the node being taken over holds epsilon or if the node
with epsilon is not healthy. To maintain a more resilient cluster, you
can transfer epsilon to a healthy node that is not being taken over.
Typically, this would be the HA partner.
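Epsilon is viewed and moved at the advanced privilege level. A hedged sketch of moving epsilon off the node being taken over (node names follow this document’s examples; verify the exact syntax against your ONTAP release):

```shell
cluster1::> set -privilege advanced

# See which node currently holds epsilon
cluster1::*> cluster show -fields epsilon

# Move epsilon from the node being taken over to its healthy partner
cluster1::*> cluster modify -node cluster1-02 -epsilon false
cluster1::*> cluster modify -node cluster1-01 -epsilon true

cluster1::*> set -privilege admin
```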

Screenshot: epsilon is true on Node1 (the master node). During a
planned failover, we can change the epsilon node.


Screenshots of the example environment:

• aggr_prod1 is owned by the cluster1-02 node.
• FlexVol volume volp1 resides in the aggr_prod1 aggregate.
• The NFS data protocol service, svm_prod1, uses two LIFs, one
mapped to port e0d of each node.
• Volume volp1 is mounted on a Linux server.


As per our example:

1. volp1 resides in aggr_prod1, which is owned by the cluster1-02
node. Create files in that share.

Check the HA failover status; both nodes are connected to their
partner node. Using System Manager, you can also manage HA: both
nodes are online and ready to take over.
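The ownership shown in the screenshots can also be confirmed from the CLI. A sketch using this document’s example names (volp1, aggr_prod1, svm_prod1):

```shell
# Which aggregate does the volume live in?
cluster1::> volume show -vserver svm_prod1 -volume volp1 -fields aggregate

# Which node currently owns that aggregate?
cluster1::> storage aggregate show -aggregate aggr_prod1
```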


HA Planned Failover:

As described under Manual Takeover, transfer epsilon to a healthy
node that is not being taken over (typically the HA partner) before
initiating the takeover, so that the planned takeover does not leave
the cluster one unexpected node failure away from a cluster-wide loss
of quorum.
Initiate the planned failover (a manual takeover) of the partner node
(cluster1-02).

Check the failover status: the node relocates its SFO aggregates to
cluster1-01.
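The planned takeover shown above can be issued and monitored from the CLI. A sketch (node names follow this document’s example):

```shell
# Take over the partner's storage for planned maintenance
cluster1::> storage failover takeover -ofnode cluster1-02

# Watch the SFO aggregates being relocated to cluster1-01
cluster1::> storage failover show-takeover
```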


Optimized takeover of the partner is in progress: the cluster1-02 node
is being taken over by its partner, and its SFO aggregates are being
relocated.

Once cluster1-02 has relocated both its SFO and CFO aggregates to
the partner node, it enters the taken-over state. The partner node has
been taken over successfully.


After the takeover by the cluster1-01 node, you can see that the
aggr_prod1 aggregate is now owned by cluster1-01 (before the
takeover it was owned by cluster1-02).

From the UNIX host, you can still access the NFS shares.

As per the defined LIF failover policy, the VIFMGR unit (an RDB unit)
fails the LIF over to the partner node (the nas2 LIF’s home-node status
is false). The LIF failover completes successfully: the nas2 LIF has
failed over to cluster1-01.
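The LIF failover can be confirmed from the CLI; after takeover the nas2 LIF should report a current node of cluster1-01 and is-home false. A sketch:

```shell
# Show where each data LIF is currently hosted and whether it is home
cluster1::> network interface show -vserver svm_prod1 -fields curr-node,is-home
```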


Before the takeover, the nas2 LIF was hosted on the cluster1-02
node’s e0d port.

Manual Giveback:

You can perform a normal giveback, a giveback in which you terminate
processes on the partner node, or a forced giveback.

If the takeover node experiences a failure or a power outage during
the giveback process, that process stops and the takeover node
returns to takeover mode until the failure is repaired or the power is
restored.

However, this depends on the stage of giveback in which the failure
occurred. If the node encountered a failure or a power outage during
the partial-giveback state (after it has given back the root aggregate),
it will not return to takeover mode. Instead, the node returns to
partial-giveback mode. If this occurs, complete the process by
repeating the giveback operation.


If giveback is vetoed, you must check the EMS messages to determine
the cause. Depending on the reason or reasons, you can decide
whether you can safely override the vetoes.

After you configure all aspects of your HA pair, you need to verify that
it is operating as expected in maintaining uninterrupted access to both
nodes' storage during takeover and giveback operations. Throughout
the takeover process, the local (or takeover) node should continue
serving the data normally provided by the partner node. During
giveback, control and delivery of the partner’s storage should return
to the partner node.

Run the storage failover giveback command to give back the node
manually.
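A sketch of the manual giveback, including the veto override discussed above (node names follow the document’s examples; override vetoes only after confirming in EMS that it is safe):

```shell
# Return storage to the repaired partner node
cluster1::> storage failover giveback -ofnode cluster1-02

# If giveback is vetoed and EMS shows the veto is safe to ignore,
# it can be overridden (use with care)
cluster1::> storage failover giveback -ofnode cluster1-02 -override-vetoes true
```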
How Giveback Works:

The local node returns ownership to the partner node when issues are
resolved, when the partner node boots up, or when giveback is
initiated.

The following process takes place in a normal giveback operation. In
this discussion, Node A has taken over Node B. Any issues on Node B
have been resolved and it is ready to resume serving data.

1. Any issues on Node B are resolved, and it displays the following
message: Waiting for giveback
2. The giveback is initiated by the storage failover giveback command
or by automatic giveback if the system is configured for it. This
initiates the process of returning ownership of Node B’s aggregates
and volumes from Node A back to Node B.
3. Node A returns control of the root aggregate first.
4. Node B completes the process of booting up to its normal
operating state.
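The per-aggregate progression above (root/CFO aggregate first, then the SFO aggregates) can be observed while a giveback runs. A sketch:

```shell
# Show giveback status per aggregate; the root (CFO) aggregate is
# returned first, then SFO aggregates once the node has booted
cluster1::> storage failover show-giveback
```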



Initiate the giveback manually once the partner node is up, then check
the HA giveback status. The CFO aggregate is relocated first, followed
by the SFO aggregates. Once all CFO and SFO aggregates are
relocated, the node waits for the management applications to come
online. The node reconnects to cluster1-01 while the giveback of the
SFO aggregates is in progress.


Giveback is successful and both nodes are connected to their partner.

After the successful giveback, the nas2 LIF still uses the cluster1-01
node’s e0d port. Revert the failed-over LIF to its original home node
and port.
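Reverting the LIF can be sketched as follows (assuming the LIF’s home node and home port are already defined in its configuration):

```shell
# Send the nas2 LIF back to its home node and home port
cluster1::> network interface revert -vserver svm_prod1 -lif nas2

# Confirm is-home is now true
cluster1::> network interface show -vserver svm_prod1 -lif nas2 -fields curr-node,is-home
```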

