
Replace a Boot Drive in a 108NL Node

May 10, 2012


994-0033-01REVB
Copyright © 2012 Isilon Systems LLC. All rights reserved.
Due to ongoing product development, innovation, and revision, the information contained in this document is subject to
change without notice. Isilon will publish updates and revisions to this document as needed. The products documented
herein are provided under End User License Agreements. Refer to the agreements for details governing the use of Isilon
products.
To comment on this documentation, submit your feedback to [email protected].
Field replace a boot drive

Follow this procedure to replace a failed boot drive in the field.


This node contains two flash boot drives. The boot drives contain data vital to the health of the node, including the
OneFS operating system, backups of the node journal, and more. The two boot drives mirror each other to ensure a
source of uncorrupted data in the event of a boot drive failure.
Because there are two boot drives, first accurately identify the failed drive, and then replace the failed drive.

Attention If this procedure is not followed accurately, data loss and severe disruption of cluster operations can
occur. Perform every step in this procedure; if the system does not respond as expected, contact Isilon Product
Support.

Replace a failed boot drive by following these steps:


1. Identify the failed boot drive.
2. Install the boot drive replacement.
3. Verify the successful installation of the replacement boot drive.
This chapter includes the following sections:

■ Install the boot drive replacement
■ Return the failed part to Isilon
■ Where to go for support

Install the boot drive replacement


Identify and remove the failed boot drive, and then install and test the replacement drive.

Write a sentinel file


Because replacing the wrong boot drive severely affects cluster stability, you must be able to verify that the correct
drive was replaced. Before replacing the failed drive, write a sentinel file to the root partition. After the failed
drive is replaced, the file is readable only if the correct boot drive was replaced.

Procedure

◆ Use the output of isi devices as the content of the file. Write a sentinel file to the root partition by typing the
command:
isi devices > /sentinel.txt


Identify a failed boot drive


When performing a boot drive replacement on a node with two boot drives, first determine which of the two drives
must be replaced.

Procedure

1. Using a serial cable, connect to the node you are going to work on.
2. View boot drive information by typing the following command:
atacontrol list
The following information appears:
ATA channel 0:
Master: no device present
Slave: no device present
ATA channel 1:
Master: ad2 <SanDisk SSD P4 8GB/SSD 8.10> Serial ATA v1.0 II
Slave: ad3 <SanDisk SSD P4 8GB/SSD 8.10> Serial ATA v1.0 II

The boot drives are listed under ATA channel 1. In the previous example, both boot drives are healthy. If one
of the boot drives has failed, the display reads no device present for that drive.
Determine whether the failed boot drive is the ad2 or ad3 device, and then use the following table to determine
the location of the boot drive inside the node.

Boot Order    OneFS Drive ID    Board Drive Slot Inside Node
Master        ad2               J3
Slave         ad3               J4

Make note of the board drive slot that contains the failed boot drive.

! Caution If both drives appear to have failed, do not continue. Contact Isilon Product Support immediately.
3. If both drives appear to be healthy, one of the drives may have partially failed. To identify a partially failed drive,
check the status of the individual partition mirrors by typing the following command:
gmirror status
From left to right, the output displays the name of each mirror, the status of the mirror relationship, and the
component IDs for each boot drive.
The following example shows the boot drive partition layout in a healthy node. The mirrors for each partition
show:
■ a value of COMPLETE in the Status column.
■ the component IDs for both boot drives in the Components column. The component IDs are a combination of
the OneFS Drive ID, and the partition number (the number following the letter p). Both boot drives are listed
for each mirror with the exception of the var-crash mirror, which only lists the slave drive.

Note The partition numbers in the display may differ from the following example.

Name                    Status    Components
mirror/root0            COMPLETE  ad3p4
                                  ad2p4
mirror/var-crash        COMPLETE  ad3p10
mirror/mfg              COMPLETE  ad3p9
                                  ad2p10
mirror/journal-backup   COMPLETE  ad3p8
                                  ad2p8
mirror/var1             COMPLETE  ad3p7
                                  ad2p7
mirror/var0             COMPLETE  ad3p6
                                  ad2p6
mirror/root1            COMPLETE  ad3p5
                                  ad2p5

The following example shows the boot drive partition layout as it appears in the event of a failed boot drive. A
failed boot drive forces the mirrors for a partition to show:
■ a value of DEGRADED in the Status column.
■ only the component ID of the healthy boot drive in the Components column. The failed boot drive does not
appear.

Attention DEGRADED does not refer to a specific drive, but to the mirror relationship between the drives. If a
drive appears in the Components column next to the DEGRADED status, it is healthy and should not be removed.

Name                    Status    Components
mirror/root0            DEGRADED  ad2p4
mirror/var-crash        COMPLETE  ad3p10
mirror/mfg              COMPLETE  ad3p9
                                  ad2p10
mirror/journal-backup   COMPLETE  ad3p8
                                  ad2p8
mirror/var1             COMPLETE  ad3p7
                                  ad2p7
mirror/var0             DEGRADED  ad2p6
mirror/root1            COMPLETE  ad3p5
                                  ad2p5

In the previous example, ad3p4 is missing from the degraded partition mirror/root0, and ad3p6 is missing
from the degraded partition mirror/var0. The missing drive, ad3, is the failed drive.
Determine which drive has failed. Use the previous table to determine which board drive slot contains the failed
boot drive and make a note of the number (J3 or J4).

Attention If both drives have failed, do not continue. Contact Isilon Product Support.
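
The reasoning in step 3 can be sketched in code. The following Python example is an illustration only, not an Isilon tool: it assumes that the output of gmirror status has been captured as text in the layout shown in the examples above, and the names parse_gmirror_status, drive_of, and suspect_drives are hypothetical. The slot mapping comes from the table in step 2.

```python
import re

# Slot mapping copied from the table in this procedure.
SLOT_BY_DRIVE = {"ad2": "J3", "ad3": "J4"}


def parse_gmirror_status(text):
    """Return {mirror_name: (status, [components])} from captured output.

    Assumes the layout shown in this document: a mirror row holds the
    name, the status, and the first component; additional components
    follow on continuation rows.
    """
    mirrors = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Name":
            continue  # skip blank lines and the header row
        if fields[0].startswith("mirror/"):
            current = fields[0]
            mirrors[current] = (fields[1], fields[2:])
        elif current is not None:
            mirrors[current][1].extend(fields)
    return mirrors


def drive_of(component):
    """Map a component ID such as 'ad3p10' to its drive ID ('ad3')."""
    match = re.match(r"(ad\d+)p\d+$", component)
    return match.group(1) if match else None


def suspect_drives(mirrors):
    """Return the drives that are missing from at least one DEGRADED mirror."""
    missing = set()
    for status, components in mirrors.values():
        if status != "DEGRADED":
            continue  # COMPLETE mirrors, including var-crash, are ignored
        present = {drive_of(c) for c in components}
        missing |= set(SLOT_BY_DRIVE) - present
    return missing
```

Applied to the degraded example above, suspect_drives returns only ad3, which the table maps to slot J4. If the result is empty or names both drives, the sketch has not identified a single failed drive; follow the written procedure and contact Isilon Product Support rather than relying on it.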

Shut down the node


Shut down the node before performing maintenance.

Procedure
1. Connect to an available node in the cluster with a serial cable or network drop.
2. From the node that you connected to, open a secure shell (SSH) connection to the node that is to be shut down by
typing the command:
ssh [node IP address]
If you do not know the IP address of the node, type the command isi status to display it.
3. Shut down the node by typing the following command:
shutdown -p now
4. Verify that the node is shut down by typing the following command:
isi status
Confirm that the node has a D (Down) status. See node 3 in the following example.
ID |IP Address |DASR| In Out Total| Used / Size |Used / Size
---+---------------+----+-----+-----+-----+------------------+-
1|10.53.217.201 | OK | 48M| 0| 48M| 19G/ 6.2T(< 1%)|(No SSDs)
2|10.53.217.202 | OK | 46M| 0| 46M| 23G/ 6.2T(< 1%)|(No SSDs)
3|10.53.217.203 |D---| n/a| n/a| n/a| n/a/ n/a( n/a)|n/a/n/a( n/a)
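
The check in step 4 can also be expressed as a short script. This Python sketch is illustrative only and assumes that the isi status output has been captured as text with the pipe-separated node rows shown above. The function names node_health and is_down are hypothetical.

```python
def node_health(status_text):
    """Map node ID -> health field (the DASR column) for each node row.

    Assumes node rows look like the example in this document:
    ID | IP address | DASR | ... with fields separated by '|'.
    """
    health = {}
    for line in status_text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Node rows start with a numeric ID; header and rule lines do not.
        if len(cols) >= 3 and cols[0].isdigit():
            health[int(cols[0])] = cols[2]
    return health


def is_down(health, node_id):
    """True if the health field for the node starts with D (Down)."""
    return health.get(node_id, "").startswith("D")
```

With the example output above, is_down reports True for node 3 and False for nodes 1 and 2. A node ID that does not appear in the output is reported as not down, so the sketch never confirms a shutdown that it cannot see.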

Unplug AC power from the node


Interrupt standby power to the boot drives by disconnecting AC power from the node.
Procedure
◆ Remove all AC power connections to the node.

Slide the node out of the rack


Slide the node away from the rack to access the contents of the node.
Procedure
1. Remove the retaining screws that secure the node to the rack cabinet.
2. Slide the node from the rack cabinet to fully extend the slide rails and provide clear access to the node. Do not
remove the node from the slide rails.

Remove the node top panel


Remove the top panel to gain access to the contents of the node.
Procedure
1. Properly ground yourself to prevent electrostatic discharge from damaging the node. For example, attach an ESD
strap to your wrist and the node chassis.
2. Loosen the captive screw securing the node top panel.
3. Slide the top panel toward the rear of the node, and lift the top panel to access the node interior.

Replace the failed boot drive


After the node is open, locate the failed boot drive. Remove and replace the failed drive with a new drive.

[Figure: node interior. Callouts: 1. Boot drives; 2. Boot drive retainer]

Procedure

1. Remove the SSD boot drive retainer.


2. Locate the two board drive slots that contain the boot drives. The slots are labeled J3 and J4. Gently pull the failed
boot drive from the board drive slot that you noted earlier (J3 or J4).

[Figure: board drive slots. Callouts: 1. J3 connector; 2. J4 connector]
3. Insert the replacement boot drive into the empty boot drive slot and gently press down to secure the drive.
4. Replace the boot drive retainer to secure the drives.

Install the node top panel


Secure the top panel, and then return the node to the rack.

Procedure

1. Place the top panel on the node so that the front edge of the top panel is about one inch behind the drive bays and
then slide the top panel forward into place.

! Caution The chassis intrusion switch can be damaged if the top panel is slid too far back on the node.
2. Tighten the captive top panel screw to secure the top panel to the node.


Return the node to the rack


Return the node to the rack after all work is complete.

Procedure

1. Slide the node back into the rack cabinet.


2. Secure the node to the rack cabinet.

Isolate the node from the cluster


Isolate the node from the cluster by disconnecting it from the InfiniBand network.

Procedure

1. Label the InfiniBand cables to ensure that they are reconnected properly later.
2. Disconnect the InfiniBand cables from the back of the node.
3. Connect directly to the node using a serial cable.

Power on the node


Ensure that the node is connected to power and turn it on.

Procedure

◆ Power on the node by pressing the ON/OFF button on the back panel of the node. It is located just left of center,
toward the upper part of the back panel.

Verify sentinel file


Confirm that the correct boot drive was replaced by locating the sentinel file that you wrote to the root partition.

Procedure

1. Locate the sentinel file in the root partition by typing the following command:
cat /sentinel.txt
If the contents of the sentinel file appear, you replaced the correct boot drive. If the file is missing, do not
continue. Contact Isilon Product Support.
2. Remove the file by typing the following command:
rm /sentinel.txt

Verify healthy boot drives


Verify that the new drive is active and healthy after the replacement boot drive is installed in the node.

Procedure

1. Verify that the boot drives are healthy by typing the following command:
gmirror status
The following information appears:
Name                    Status    Components
mirror/root0            COMPLETE  ad3p3
                                  ad2p4
mirror/var-crash        COMPLETE  ad3p9
mirror/mfg              COMPLETE  ad3p8
                                  ad2p10
mirror/journal-backup   COMPLETE  ad3p7
                                  ad2p8
mirror/var1             COMPLETE  ad3p6
                                  ad2p7
mirror/var0             COMPLETE  ad3p5
                                  ad2p6
mirror/root1            COMPLETE  ad3p4
                                  ad2p5

Confirm that the values in the Status column all read COMPLETE.
2. Verify boot drive information by typing the following command:
atacontrol list
The following information appears:
ATA channel 0:
Master: no device present
Slave: no device present
ATA channel 1:
Master: ad2 <SanDisk SSD P4 8GB/SSD 8.10> Serial ATA v1.0 II
Slave: ad3 <SanDisk SSD P4 8GB/SSD 8.10> Serial ATA v1.0 II

Ensure that both boot drives are listed.
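
The atacontrol list check can be sketched as follows. This Python example is an illustration under stated assumptions, not part of OneFS: it assumes the command output has been captured as text in the layout shown above, that the boot drives are on ATA channel 1, and that the Master and Slave positions map to slots J3 and J4 as in the table in "Identify a failed boot drive". The function names are hypothetical.

```python
import re

# Slot for each position on ATA channel 1, per the table in this procedure.
SLOT_BY_POSITION = {"Master": "J3", "Slave": "J4"}


def boot_drive_presence(text, channel=1):
    """Return {"Master": device or None, "Slave": device or None}.

    A position reported as "no device present" maps to None.
    """
    found = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"ATA channel (\d+):", line)
        if header:
            current = int(header.group(1))
            continue
        entry = re.match(r"(Master|Slave):\s+(.*)", line)
        if entry and current == channel:
            rest = entry.group(2)
            if rest.startswith("no device present"):
                found[entry.group(1)] = None
            else:
                found[entry.group(1)] = rest.split()[0]  # for example "ad2"
    return found


def missing_boot_slots(presence):
    """List the board slots whose boot drive is not reported."""
    return [SLOT_BY_POSITION[pos]
            for pos in ("Master", "Slave")
            if presence.get(pos) is None]
```

For the healthy output above, missing_boot_slots returns an empty list. If the Slave line read no device present, it would return J4 only.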

Connect the node back to the cluster


Reconnect all InfiniBand cables to add the node back into the cluster.

Procedure

◆ Connect all InfiniBand cables to the rear of the node.


The cluster automatically accepts the node; it is unnecessary to power cycle the node.

Return the failed part to Isilon


Return the failed part to Isilon Product Support.

Procedure

1. Contact Isilon Product Support to notify them that you are returning a failed part.
2. Package the failed part in the packaging materials provided with the replacement part.
3. Attach the return label that was included with the replacement part.
4. When filling in the RMA number, use the support case number provided by Isilon Product Support.
5. Ship the failed part to the address specified on the return label.

Where to go for support


Contact Isilon Product Support for any questions about Isilon products.

Local: 1-206-777-7970
Toll Free: 1-866-276-0723
Email: [email protected]
Online: isilon.com/support
Japan Support: 03-5358-7180
