Storage Knowledge Base - Failed Disk Replacement in NetApp
http://rajat926.blogspot.com/2014/01/failed-disk-replacement-in-netapp.html 1/2
1/18/2019 Storage Knowledge Base : Failed disk replacement in NetApp
blink_on command. Alternatively, you can use the led_on command instead of blink_on to turn on the LEDs of the disks adjacent to the
defective disk rather than its red LED.
If you use the auto-assign function, the system will assign the disk to the spare pool automatically; otherwise use the “disk assign ”
command to assign the disk to the system.
Scenario 4:
Disk LED remains orange after replacing failed disk
This happens when the disk is swapped too quickly and the system has not had enough time to recognize the change. When the
failed disk is removed from its slot, the disk LED remains lit until Enclosure Services notices and corrects it, which generally takes
around 30 seconds after the failed disk is removed.
If you have already replaced the disk, use the led_off command from advanced mode. If that doesn’t work because the system
believes the LED is off when it is actually on, simply turn the LED on and then back off again using the “led_on ” and then
“led_off ” commands.
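The recovery order described above can be sketched as a small decision helper. This is only an illustration: the function name and its parameters are hypothetical, and the command arguments are left out because the text does not give them. The 30-second figure is the approximate value quoted above.

```python
# Hypothetical helper: lists the actions for a disk LED that stays orange.
# Only the command names from the text are used; their arguments are omitted.
ENCLOSURE_SETTLE_SECONDS = 30  # "around 30 seconds" per the text above


def led_recovery_steps(seconds_since_removal, led_still_on_after_led_off=False):
    """Return the ordered list of actions for a disk LED that stays lit."""
    steps = []
    remaining = ENCLOSURE_SETTLE_SECONDS - seconds_since_removal
    if remaining > 0:
        # Give Enclosure Services time to notice the removed disk.
        steps.append(f"wait {remaining}s for Enclosure Services")
    # First attempt: turn the LED off directly (advanced mode).
    steps.append("led_off")
    if led_still_on_after_led_off:
        # The system thinks the LED is already off, so toggle it on, then off.
        steps.extend(["led_on", "led_off"])
    return steps


if __name__ == "__main__":
    for step in led_recovery_steps(10, led_still_on_after_led_off=True):
        print(step)
```

The helper only decides the order of actions; it does not talk to the storage system.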
Scenario 5:
Disk reconstruction failed
RAID reconstruction on a new disk can fail for a number of reasons, including an enclosure access error, a file system
disk not responding or missing, or a spare disk not responding or missing. However, the most common reason for this failure
is outdated firmware on the newly inserted disk.
Check whether the newly inserted disk has the same firmware as the other disks. If not, first update the firmware on the newly
inserted disk; reconstruction should then finish successfully.
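The firmware comparison can be sketched as below, assuming you have already collected the firmware revision strings for the new disk and for the other disks of the same model. The function name is hypothetical and the revision strings in the example are made up for illustration; how you gather them from the system is not covered here.

```python
from collections import Counter


def firmware_update_target(new_disk_fw, peer_fws):
    """Return the revision the new disk should be brought to, or None.

    new_disk_fw: firmware revision string of the newly inserted disk.
    peer_fws: revision strings of the other disks (same model assumed).
    """
    if not peer_fws:
        return None  # nothing to compare against
    # Treat the most common revision among the peers as the reference.
    target, _count = Counter(peer_fws).most_common(1)[0]
    return None if new_disk_fw == target else target


if __name__ == "__main__":
    # Illustrative (made-up) revision strings.
    print(firmware_update_target("NA01", ["NA03", "NA03", "NA02"]))
```

A return value of None means the new disk already matches the other disks, so firmware is probably not the cause of the failed reconstruction.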
Scenario 6:
Disk reconstruction stuck at 0% or failed to start
This might be caused by an error or by a limitation in ONTAP, i.e. no more than 2 reconstructions should be running at the same
time. The error you may encounter arises when the RAID group was in a degraded state and the system went through an unclean
shutdown: parity is then marked inconsistent and needs to be recomputed after boot. However, parity recomputation requires all
data disks to be present in the RAID group, and because the RG already has a failed disk, the aggregate will be marked as
WAFL_inconsistent. You can confirm this condition with the “aggr status -r” command.
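The two causes described above can be told apart with a simple check like the following sketch. The function name is hypothetical, and matching on the words "wafl inconsistent" in the pasted command output is an assumption about how the condition is displayed; verify it against what your system actually prints.

```python
MAX_CONCURRENT_RECONSTRUCTIONS = 2  # limit as stated in the text above


def diagnose_stalled_reconstruction(running_reconstructions, aggr_status_output):
    """Suggest a likely cause for a reconstruction stuck at 0%.

    running_reconstructions: number of reconstructions currently running.
    aggr_status_output: text pasted from the "aggr status -r" command.
    """
    # Assumption: the inconsistent state shows up as "wafl inconsistent"
    # (or "WAFL_inconsistent") somewhere in the output.
    normalized = aggr_status_output.lower().replace("_", " ")
    if "wafl inconsistent" in normalized:
        return "aggregate marked WAFL inconsistent: contact NetApp about wafliron"
    if running_reconstructions >= MAX_CONCURRENT_RECONSTRUCTIONS:
        return "reconstruction limit reached: wait for a running one to finish"
    return "cause unknown: investigate further"


if __name__ == "__main__":
    print(diagnose_stalled_reconstruction(1, "aggr1 online raid_dp, wafl inconsistent"))
```

The inconsistent-aggregate check comes first because it needs action from you, while the concurrency limit resolves itself once a running reconstruction finishes.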
If this is the case, you have to run wafliron by giving the “aggr wafliron start ” command while in advanced mode. Make sure
you contact NetApp before starting wafliron, as it will unmount all volumes hosted in the aggregate until the first phase of tests
has completed. The time wafliron takes to complete the first phase depends on many variables, such as the size of the
volume/aggregate/RG and the number of files/snapshots/LUNs, so you can’t predict how long it will take; it might be 1 hr or
4-5 hrs. So if you are running wafliron, contact NetApp first.