Storage Knowledge Base - Failed Disk Replacement in NetApp

This document discusses disk replacement procedures in NetApp storage systems and potential complications that may arise. It describes two scenarios: 1) A replaced disk is still shown as broken, which could be due to improper labeling or a faulty disk. Contacting NetApp for guidance is recommended. 2) Two disks fail in the same RAID group without spare disks available, putting data at high risk. Replacing disks ASAP from another system or contacting NetApp for immediate replacement is advised. Always maintaining adequate spare disks is important for avoiding data loss during rebuilds or replacement delays.

Uploaded by

panwar14


1/18/2019 Storage Knowledge Base : Failed disk replacement in NetApp


Storage Knowledge Base

Monday, January 20, 2014

Failed disk replacement in NetApp

Disk failures are very common in a storage environment, and as storage administrators we come across them regularly. How often depends on how many disks your storage systems have: the more disks you manage, the more often you will see a failure.
I have written this post with RAID-DP and FC-AL disks in mind, because RAID-DP is always better than RAID4 and we don’t use SCSI loops. By design, RAID-DP protects against a double disk failure in a single RAID group. In other words, you will not lose data even if 2 disks in a single RG fail at the same time or one after another.
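To make that protection rule concrete, here is a minimal Python sketch. The function name and the state labels are my own, chosen for illustration only, and are not ONTAP terminology or output. The only rule it encodes is the one stated above: a RAID-DP group keeps its data with up to 2 failed disks.

```python
# Maximum number of failed disks one RAID-DP group tolerates without
# data loss, as stated in the post.
RAID_DP_MAX_FAILED = 2


def raid_group_state(failed_disks: int) -> str:
    """Classify a RAID-DP group by its number of failed disks.

    The returned labels are illustrative, not ONTAP output.
    """
    if failed_disks < 0:
        raise ValueError("failed_disks cannot be negative")
    if failed_disks == 0:
        return "healthy"
    if failed_disks <= RAID_DP_MAX_FAILED:
        return "degraded, data intact"
    return "data loss"
```

With 2 failed disks the group still holds its data but has no protection left, which is why the later scenarios treat a dual failure as urgent.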
Like any other storage system, ONTAP uses a disk from the spare pool to rebuild the data from the surviving disks as soon as it encounters a failed disk, and it sends an AutoSupport message to NetApp for parts replacement. Once NetApp receives the AutoSupport, they initiate the RMA process and the part is delivered to the address listed for the failed system in NetApp's records. Once the disk arrives, you either change it yourself or ask a NetApp engineer to come onsite and change it. Either way, as soon as you replace the disk, your system finds the new working disk and adds it to the spare pool.
Now wasn’t that pretty simple and straightforward? Oh yes, because we are using software-based disk ownership and disk auto-assignment is turned on. It is much like your baby having a cold and calling the GP himself to get it cured, rather than asking you to take care of him. But what if there are further complications?
Now I will cover what else can get in the way.
Scenario 1:
I have replaced my drive and the light shows green or amber, but “sysconfig -r” still shows the drive as broken?
Sometimes we face this problem because either the system was not able to label the disk properly or the replacement disk itself is not good. The first thing to try is to label the disk correctly. If that doesn’t work, try replacing it with another disk, ideally a known good one. If that doesn’t work either, just contact NetApp and follow their guidelines.
To relabel the disk from "BROKEN" to "SPARE", first note down the broken disk ID, which you can get from “aggr status -r”. Then go to advanced mode with “priv set advanced” and run “disk unfail” followed by that disk ID. At this stage your filer will throw some 3-4 errors on the console, in syslog or as SNMP traps, depending on how you have configured it. This was the final step, and the disk should now be good, which you can confirm with “disk show” for detailed status or with “sysconfig -r”. If the status change doesn’t show at first, give the system a few seconds to recognize it.
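The relabeling steps above can be written down as an ordered list of console commands. This is only a sketch: the helper name is hypothetical, any disk ID you pass in (for example "0a.23") is a made-up value, and the commands are simply the ones quoted in this post. It builds the list as text and does not run anything on a filer.

```python
from typing import List


def unfail_commands(disk_id: str) -> List[str]:
    """Return the console commands from Scenario 1, in order.

    disk_id is the broken disk ID noted from "aggr status -r".
    """
    if not disk_id.strip():
        raise ValueError("disk_id must not be empty")
    return [
        "aggr status -r",           # find the broken disk ID
        "priv set advanced",        # switch to advanced mode
        f"disk unfail {disk_id}",   # relabel the disk from BROKEN to SPARE
        "disk show",                # confirm the detailed status
        "sysconfig -r",             # confirm the disk is no longer broken
    ]
```

Printing the list before a maintenance window is a cheap way to double-check the disk ID you are about to act on.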
Scenario 2:
Two disks have failed in the same RAID group and I don’t have any spare disk in my system.
In this case you are really in big trouble. You should always have at least one spare disk available in your system, and NetApp recommends a 1:28 ratio, i.e. one spare for every 28 disks. With a dual disk failure you have a very high chance of losing your data if another disk goes while you are rebuilding the data onto a spare disk or while you are waiting for new disks to arrive.
So always keep a minimum of 2 spare disks available in your system. One spare is also fine, and the system will not complain about spares, but if you leave the system with only one spare disk, the Maintenance Center will not work and the system will not scan any disk for potential failure.
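Putting the two spare rules together, a quick calculation might look like the sketch below. It assumes the 1:28 ratio is rounded up to whole disks and that the 2-spare minimum always applies; both the rounding and the way the two rules are combined are my reading of the guidance above, not an official NetApp formula.

```python
import math

DISKS_PER_SPARE = 28  # the 1:28 ratio quoted in the post
MIN_SPARES = 2        # the post's minimum so Maintenance Center keeps working


def recommended_spares(total_disks: int) -> int:
    """Return a spare count satisfying both rules from the post."""
    if total_disks <= 0:
        raise ValueError("total_disks must be positive")
    by_ratio = math.ceil(total_disks / DISKS_PER_SPARE)
    return max(MIN_SPARES, by_ratio)
```

For example, under these assumptions a system with 56 disks would keep 2 spares, and one with 57 disks would keep 3.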
Coming back to the situation above, a dual disk failure with no spares available: your best bet is to ring NetApp and get the failed disks replaced ASAP. If you think you are losing your patience, select a disk of the same type in another healthy system, do a “disk fail” on it, remove it, and use it to replace a failed disk in the affected system.
After adding the disk to the other filer, if it shows a partial/failed volume, make sure that the volume reported as partial/failed belongs to the newly inserted disk by using the “vol status -v” and “vol status -r” commands. If so, just destroy the volume with the “vol destroy” command and then zero out the disk with “disk zero spares”.
This exercise will not take more than 15 minutes (except disk zeroing, which depends on your disk type and capacity), and you will end up with a single disk failure in each of 2 systems, each of which can survive another disk failure. But what if you don’t do that and keep running your system with a dual disk failure? Your system will shut itself down after 24 hours. Yes, it will shut itself down, without any failover, to get your attention. There is a registry setting that controls how long your system should run after a disk failure, but I think 24 hours is a good time, and you shouldn’t increase or decrease it unless you don’t care about the data sitting there or anyone accessing it.
Scenario 3:
My drive failed but there is no disk with amber lights
This often happens because the disk’s electronics have failed and the system can no longer recognize the disk as part of it. So in this situation you first have to find out the disk name. There are a couple of methods to find out which disk has failed:
a) “sysconfig -r”: look for the broken disk list
b) From the AutoSupport message, check for the failed disk ID
c) “fcadmin device_map”: look for a disk with “xxx” or “BYP” message
d) In /etc/messages, look for a failed or bypassed disk warning, which gives the disk ID
Once you have identified the failed disk ID, run “disk fail” followed by the disk ID and check whether you see an amber light. If not, use “blink_on” followed by the disk ID in advanced mode to turn on the disk LED. If that fails, turn on the adjacent disks’ lights with the same blink_on command so you can identify the disk correctly. Alternatively, you can use the led_on command instead of blink_on to turn on the LEDs of the disks adjacent to the defective disk rather than its red LED.
If you use the auto-assign function, the system will assign the new disk to the spare pool automatically; otherwise use the “disk assign” command followed by the disk ID to assign the disk to the system.
Scenario 4:
Disk LED remains orange after replacing failed disk
This happens because you were in a hurry and didn’t give the system enough time to recognize the change. When the failed disk is removed from its slot, the disk LED remains lit until Enclosure Services notices and corrects it, which generally takes around 30 seconds after the failed disk is removed.
As you have already done it, use the led_off command from advanced mode. If that doesn’t work because the system believes the LED is off when it is actually on, simply turn the LED on and then back off again using the “led_on” and then “led_off” commands, each followed by the disk ID.
Scenario 5:
Disk reconstruction failed
RAID reconstruction on a new disk can fail for a number of reasons, including an enclosure access error, a file system disk not responding or missing, a spare disk not responding or missing, or something else. However, the most common reason for this failure is outdated firmware on the newly inserted disk.
Check whether the newly inserted disk has the same firmware as the other disks. If not, first update the firmware on the newly inserted disk, and then reconstruction should finish successfully.
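If you have already collected the firmware revision of each disk by whatever means you prefer, the comparison itself is easy to script. The sketch below is only an illustration: the function name is hypothetical, the input is a plain dictionary you would fill in yourself, and treating the most common revision as the reference is my assumption rather than a NetApp rule.

```python
from collections import Counter
from typing import Dict, List


def disks_with_odd_firmware(firmware_by_disk: Dict[str, str]) -> List[str]:
    """Return disks whose firmware differs from the most common revision.

    firmware_by_disk maps a disk ID to its firmware revision string.
    """
    if not firmware_by_disk:
        return []
    # Use the revision carried by most disks as the reference.
    reference, _count = Counter(firmware_by_disk.values()).most_common(1)[0]
    return sorted(
        disk for disk, rev in firmware_by_disk.items() if rev != reference
    )
```

Called with made-up values such as {"0a.16": "NA01", "0a.17": "NA01", "0a.18": "NA00"}, it would flag only "0a.18" as the disk to update before retrying reconstruction.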
Scenario 6:
Disk reconstruction stuck at 0% or failed to start
This might be due to an error or due to a limitation in ONTAP, i.e. no more than 2 reconstructions should be running at the same time. The error you might hit at times arises because the RAID group was in a degraded state and the system went through an unclean shutdown, so parity is marked inconsistent and needs to be recomputed after boot. However, parity recomputation requires all data disks to be present in the RAID group, and we already have a failed disk in the RG, so the aggregate is marked WAFL_inconsistent. You can confirm this condition with the “aggr status -r” command.

If this is the case, you have to run wafliron by giving the command “aggr wafliron start” followed by the aggregate name while you are in advanced mode. Make sure you contact NetApp before starting wafliron, as it will unmount all the volumes hosted in the aggregate until the first phase of tests is completed. The time wafliron takes to complete the first phase depends on lots of variables, such as the size of the volume/aggregate/RG, the number of files/snapshots/LUNs and lots of other things, so you can’t predict how long it will take; it might be 1 hr or it might be 4-5 hrs. So if you are going to run wafliron, contact NetApp first.

Posted by Rajat Garg at 7:38 AM

1 comment:

Anonymous October 21, 2014 at 3:26 AM

Can you also let us know how we can use disk replace to replace mismatched disk types, like FC and ATA, in an aggregate? Is it possible?

http://rajat926.blogspot.com/2014/01/failed-disk-replacement-in-netapp.html
