Processor Controller Module (PCM) Replacement for the FAS2040
For NetApp Authorized Service Engineers
README FIRST
New Battery Process for Processor Controller Module (PCM) Replacement:
1) The replacement PCM comes with a pre-installed NVMEM battery that may be discharged.
2) Effective 8-May-2015, a new process is in place to ship a new NVMEM battery (p/n X1845A-R6) in a separate box
to replace the battery that is pre-installed on the rev "A", 3244A-R5 replacement PCM.
- No separate battery is shipped for a rev "B", 3244B PCM.
3) Check your dispatch data to see if a battery is mentioned or the p/n X1845A-R6 is listed. On rare occasions a
separate NVMEM battery might not have shipped. In this case, the battery from the original PCM should be
moved to the replacement PCM.
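The battery rules above can be summarized as a small decision helper. This is only a sketch for checking your understanding of the process, not a NetApp tool: the function name and return strings are hypothetical, and the rev "B" branch assumes the pre-installed battery is used because no separate battery ships for that revision.

```python
def select_nvmem_battery(pcm_revision: str, separate_battery_shipped: bool) -> str:
    """Return which NVMEM battery to install in the replacement PCM.

    Hypothetical helper; names and return strings are illustrative only.
    """
    rev = pcm_revision.strip().upper()
    if rev == "B":
        # No separate battery is shipped for a rev "B" PCM (assumption:
        # the pre-installed battery is therefore the one to use).
        return "pre-installed battery"
    if rev == "A":
        if separate_battery_shipped:
            # Normal case since 8-May-2015: use the separately boxed battery.
            return "shipped battery X1845A-R6"
        # Rare case: no separate battery shipped, move the original one over.
        return "battery from original PCM"
    raise ValueError(f"unknown PCM revision: {pcm_revision!r}")


if __name__ == "__main__":
    print(select_nvmem_battery("A", separate_battery_shipped=True))
```

Always confirm against the dispatch data whether p/n X1845A-R6 is listed before relying on any such shortcut.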
No "Failed" disks can exist in the target node in an HA config, or the disk reassign will not execute. The AP covers this.
If this system has SAN attached Tape drives, confirm a storage admin is available to remap the switch if the SAN
Tape is using on-board FC Adapter.
Known Bugs/Issues - Bug Table and Notes Below
Bug | Note | Description | First Fixed Release
TSB-1110-04 | 1 | TSB-1110-04 is an internal Bulletin: When the disk reassign is performed on the partner (HA-takeover) the GB must be immediately performed, a TO/GB from the repaired node is required to sync the system-IDs. | See Note 1
590488 | 2 | In “disruptive” MB w/NVMEM replacements, a TO/GB from the repaired node is req'd. | See Note 2
489060 | 3 | NDMP, Qtree-SnapMirror, Vol-SnapMirror or SnapVault processes can hang TO/GB | See Note 3
Note (TSB-1110-04): For ONTAP 8.0.5 or higher and 8.1.3 or higher the console message to perform an additional TO/GB can be ignored.
(The bugs listed in the TSB are fixed in ONTAP releases 8.0.5 and 8.1.3 and higher, but the console message was not
removed.)
Bug Notes:
1 In some versions of ONTAP, when the 'disk reassign' command is executed from the partner, ONTAP may print out a
warning that states 2 things.
(i) The giveback must be done right away - IF a GB will not be immediately performed, the disk reassign needs to be
postponed.
(ii) A second TO/GB should be performed from the repaired node. This is covered in the AP. (TSB-1110-04)
2 IF this system has a partner AND the partner did NOT take over this controller, it is still necessary to sync the new system-IDs
by executing a TO/GB from the repaired node, although no console message is displayed. This is covered in the AP.
3 The AP will cover asking the customer if they are running these processes. If so, there is a link on how to disable them.
README FIRST
This AP has been updated to include commands for systems running "Cluster-Mode" (C-Mode) ONTAP.
● The login name for C-Mode systems is "admin", not "root".
● The ONTAP version and mode is listed in your dispatch!
● C-Mode: Has two console command shells, clustershell and nodeshell. The default shell is clustershell.
IF clustershell, the console prompt includes a double colon ( :: ). Ex(1): cluster ::> Ex(2): cluster ::storage>
● To switch from clustershell to nodeshell, enter 'run local' at the ::> prompt, then the double colons (::) are
removed. To exit nodeshell, enter 'exit' or Ctrl-D.
● From clustershell, nodeshell commands can be entered by prefacing the 7-Mode command with "run local".
Ex: cluster::> run local sysconfig -v Note: not all 7-Mode commands are supported in C-Mode.
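When reviewing a console log, the shell rules above can be sketched as follows. This is an illustrative helper only: the function names are hypothetical, and the detection relies solely on the double colon described above.

```python
def is_clustershell_prompt(prompt: str) -> bool:
    """True if the prompt contains the clustershell double colon (::)."""
    return "::" in prompt


def as_clustershell_command(seven_mode_cmd: str) -> str:
    """Preface a 7-Mode (nodeshell) command with 'run local' for clustershell."""
    return f"run local {seven_mode_cmd.strip()}"


if __name__ == "__main__":
    for prompt in ("cluster ::>", "cluster ::storage>", "LOADER-A>"):
        print(prompt, is_clustershell_prompt(prompt))
    print(as_clustershell_command("sysconfig -v"))
```

Remember that prefacing a command with "run local" does not make it supported; not every 7-Mode command exists in C-Mode.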
This AP has been updated to include additional commands and procedures for a system configured with NSE
(NetApp Storage Encryption ) disks.
● NSE is supported in DOT 8.1 and higher, 7-mode only - No cluster mode support at this time.
● The DOT version and mode is listed in your dispatch!
● The "badging" on the NSE disk canister is embossed as compared to standard disk badging. See picture >> here
● Presently all NSE systems are HA configured, and all disks in all shelves must be NSE disks.
[Fig 2: FAS2040 2U chassis, Rear View, with labels A, AC and B. The " ! " LED is ON when hardware failures are detected or if controller failover is disabled.]
HA (Active-Active) Configurations:
2 PCMs, (A & B)
Non-HA Configurations:
1 PCM in the bottom slot
Fibre Channel Ports: 0a, 0b Ethernet Ports: e0a, e0b, e0c, e0d
Fig 3
5. FC port configuration, disk list and the system date are captured prior to
removing the original Controller.
6. Compact Flash (CF) Card needs to be moved from the Original PCM to the
Replacement PCM.
2 Check the state of the node by viewing the console port responses from the controller (from each controller in an HA (Active-Active) configuration). An HA
config requires two controller assemblies installed in the same physical chassis. Detailed messages here> Appliance Check
3 Non-HA Controller Configuration: If the console response is "login" or "password" or the <system prompt>, the end-user
will have to issue a 'halt' on the system for proper shutdown. Work with NGS if you have questions.
NOTE The "LOADER" prompt will include -A if attached to the top controller or -B if attached to the bottom controller.
NOTE HA-config Status Command: After logging in, "cf status" will display the state of the HA . Example of >> cf status cmd
WARNING for HA configurations:
STOP! If the failure has caused a HA failover you may have been dispatched on the surviving controller's serial number, not the
failed one.
4 Dual Controller Configuration
a) If both controllers are UP and Online: the end-user will have to issue a cf takeover from the partner node if
controller failover is active, or halt the target controller if controller failover is disabled. Work with NGS if you have questions.
5 If the console response is "LOADER-A|B>", go to Section IV.
6 Continue with Section III on next page.
LOADER-A>
b) Disable the auto-giveback option if enabled from the partner node. (copy-n-paste)
2 The date and time are stored in the system PROM in Greenwich Mean Time (GMT), also known as Coordinated Universal Time (UTC).
At the LOADER> prompt, enter: show date Record on paper the system's GMT time and the local time to determine the
number of hours (and minutes) the local time is ahead of or behind GMT.
3 Enter: printenv This command displays (and captures) all boot environmental variables.
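The offset between the recorded local time and GMT can be worked out by hand, or checked with a short sketch such as the one below. The function names are hypothetical and the sample times are made up for illustration; only the "local time minus GMT" arithmetic comes from the step above.

```python
from datetime import datetime, timedelta


def gmt_offset(gmt: datetime, local: datetime) -> timedelta:
    """Offset of local time relative to GMT (positive = ahead of GMT)."""
    return local - gmt


def format_offset(offset: timedelta) -> str:
    """Render an offset as +HH:MM or -HH:MM."""
    total_minutes = int(offset.total_seconds() / 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"


if __name__ == "__main__":
    # Illustrative values only: GMT from 'show date', local from a wall clock.
    recorded_gmt = datetime(2015, 5, 8, 14, 30)
    recorded_local = datetime(2015, 5, 8, 9, 30)
    print(format_offset(gmt_offset(recorded_gmt, recorded_local)))
```

Keep the recorded offset with your notes; it is needed again when the RTC on the replacement PCM is set.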
STEP 5: Press CTRL-G to enter the BMC shell. (Example only)
LOADER> ^G (CTRL-G)
=== OEMCLP v1.0.0 BMC v1.5 ===
bmc shell ->
STOP From the output, verify how the "ipaddr" is configured. If it shows a valid IP address, you will need to run "BMC
setup" on the replacement MB later in this AP.
No "BMC setup" will be required if the "ipaddr" value is "0.0.0.0", which means the BMC is not configured in this system.
b) Enter: 'exit' to return to "LOADER>" prompt.
7 Continue with Section IV on next page.
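The "ipaddr" check above amounts to a simple test, sketched here for clarity. The function name is hypothetical; the only rule taken from the AP is that "0.0.0.0" means the BMC is not configured and any other valid address means "BMC setup" must be run later.

```python
import ipaddress


def needs_bmc_setup(ipaddr_value: str) -> bool:
    """Return True if the recorded BMC 'ipaddr' means BMC setup is required.

    Raises ValueError if the value is not a valid IP address at all.
    """
    addr = ipaddress.ip_address(ipaddr_value.strip())
    # 0.0.0.0 is the "unspecified" address: BMC not configured.
    return not addr.is_unspecified


if __name__ == "__main__":
    print(needs_bmc_setup("0.0.0.0"))
    print(needs_bmc_setup("10.1.2.3"))  # illustrative address only
```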
9 From the *> prompt enter: fcadmin config to log the configuration of the integrated FC host adapters.
a) Check if "0a" and "0b" Adapter ports are configured as a "target". If so, it will need to be verified later.
17 Go to Section V, "Remove the cables and extract the PCM" on next page.
[Fig 6: FAS2040 PCM, Bottom View. The NetApp label is on the top side.]
4 Exchange the CF cards between the PCMs. The one marked "O" should now be in the replacement PCM.
5 Go to Section VII, "Replace the NVMEM Battery" on next page.
[Fig 7: FAS2040 PCM, battery and cable connector.]
Press the tab to release the connector. The battery is held into the module by Velcro tape.
The connector is next to the heat sink, which may be hot; let it cool.
VIII. FAS2040: Partially Reinsert the Replacement PCM and Reconnect the cables
Step Action Description
1 Partially insert the PCM into the slot so that the cables can be attached- DO NOT engage the backplane yet.
2 Cables: Fully insert each cable that was removed to its proper port until it clicks in. Test by pulling on them. Especially the
FC and SAS ports!
4 IF you miss the window to abort the autoboot, look for this message: "Press CTRL-C for boot menu" and complete steps
4a-4c, otherwise if at the "LOADER" prompt, skip to step 5.
a. Immediately Press ^C (CTRL-C) to access the "Boot menu".
b. If a 'System ID mismatch' warning message below is displayed, answer : y
.......
.......
*******************************
*                             *
* Press Ctrl-C for Boot Menu. *
*                             *
*******************************
^C                                      <-- STEP 4a: Press "CTRL-C"
Boot Menu will be available.
Restoring /var from /cfcard/x86/freebsd/varfs.tgz
WARNING: System id mismatch. This usually occurs when replacing CF or NVRAM cards!
Override system id? {y|n} [n] y         <-- STEP 4b: Enter: y
c. Next, drop to the LOADER prompt from the Boot Menu by following the linked process > here
5 Continue with Section IX on next page.
IX. FAS2040: Set date and time on the RTC (cont.)
Step Action Description
6 At the LOADER-A|B> prompt enter: show date to display the date and time in GMT on the new PCM
7 The original motherboard's GMT time and local time should have been recorded in Section IV. If you don't have it, you can
obtain the GMT time from the partner node, or another NetApp appliance or any Unix Server using: date -u (The "-u" option
displays the time in GMT/UTC) The new motherboard's Real Time Clock (RTC) must be set within 2 minutes of the time
displayed (which is GMT time) for users to be able to re-connect to this appliance.
NOTE Detailed instructions for another method of obtaining the time in GMT and setting the date and time is here> RTC Check
8 To set the time issue: set time hh:mm:ss Set the time in GMT using 24 hour format - Do not set the time to local time.
NOTE If this maintenance period spans across the midnight hour in GMT time, the DATE will also need to be set.
9 To change the date, issue: set date mm/dd/yyyy (mm = 2-digit month, dd = 2-digit Day, yyyy = 4-digit Year)
10 If the date or time was changed, issue: show date again to verify the GMT date and time are correct.
11 Go to Section X, "Verify Battery Status" on next page.
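The RTC steps above can be sanity-checked with a sketch like the following. It is illustrative only: the function names are hypothetical, while the 2-minute tolerance and the 'set date mm/dd/yyyy' and 'set time hh:mm:ss' formats are taken from the steps above.

```python
from datetime import datetime, timedelta
from typing import List

MAX_SKEW = timedelta(minutes=2)  # tolerance stated in step 7


def rtc_within_tolerance(rtc_gmt: datetime, reference_gmt: datetime,
                         max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the RTC reading is within max_skew of the reference GMT time."""
    return abs(rtc_gmt - reference_gmt) <= max_skew


def loader_set_commands(target_gmt: datetime) -> List[str]:
    """Build the LOADER commands to set date and time (GMT, 24-hour format)."""
    return [
        f"set date {target_gmt:%m/%d/%Y}",
        f"set time {target_gmt:%H:%M:%S}",
    ]


if __name__ == "__main__":
    # Illustrative values: RTC from 'show date', reference from 'date -u'.
    rtc = datetime(2015, 5, 8, 23, 50, 0)
    reference = datetime(2015, 5, 8, 23, 59, 30)
    if not rtc_within_tolerance(rtc, reference):
        for cmd in loader_set_commands(reference):
            print(cmd)
```

The date command matters mainly when the maintenance window crosses midnight GMT; otherwise setting the time alone is usually sufficient.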
STEP 4: Press CTRL-G to enter the BMC shell.
LOADER-A> ^G (CTRL-G)
=== OEMCLP v1.0.0 BMC v1.5 ===
bmc shell ->
STEP 1: Enter: boot_diags
LOADER-A> boot_diags
Loading X86_ELF/diag/diag.krn:..0x200000/12629600 0xe0b660/4226832 0x1213570/8 Entry at 0x00200000
Starting program at 0x00200000
Diagnostic Monitor
version: 5.4.3
built: Tue Oct 6 15:00:13 PDT 2009
--------------------------------------
all Run all system diagnostics
mb FAS2040 motherboard diagnostic
mem Main memory diagnostic
cf-card CompactFlash controller diagnostic
stress System wide stress diagnostic
Commands:
Config (print a list of configured PCI devices)
Default (restore all options to default settings)
Exit (exit diagnostics)
Help (print this commands list)
Options (print current option settings)
Version (print the diagnostic version)
Run <diag ... diag> (run selected diagnostics)
Options:
Count <number> (loop selected diagnostic(s) (number) of passes)
Loop <yes|no> (loop selected diagnostic(s))
Status <yes|no> (print status messages)
Stop <yes|no> (stop-on-error / keep running)          <-- NOTE: New RUN command options
Xtnd <yes|no> (extended tests / regular tests)
Mchk <auto|off|on|halt> (machine check control)
Cpu <0|1> (default cpu)
Seed <number> (random seed (0:use machine generated number))
CompactFlash Diagnostic
------------------------
.....
****** Comprehensive CompactFlash test .......... PASSED
Pass = 1, Current date = Saturday Jul 15 09:46:39 2011
Note that the Comprehensive Memory test and the Comprehensive CompactFlash test report "PASSED".
WARNING: System id mismatch. This usually occurs when replacing CF or NVRAM cards!
Override system id? {y|n} [n] y         <-- STEP 1a): Enter: y
.....
NOTE If the replacement PCM fails to boot to the Maintenance menu, confirm the original Boot Device (CF Card) was moved from the
original MB to the replacement. Engage NGS for assistance.
STOP If the system reports the battery voltage is too low or a critical failure, do NOT proceed - Do NOT bypass the system stop.
Engage NGS for assistance.
Under NO CIRCUMSTANCES bypass the system halt to "boot" the system on a NVMEM battery voltage issue.
2 Review the fcadmin config output from Section IV. If any onboard Adapters (0a, 0b) were configured as "target" verify they are
still configured by entering: fcadmin config If one or more adapters need to be set as a "target" follow steps 2a-2b. If all
are OK, skip to step 3.
a) For each Adapter to be configured as a target enter: fcadmin config -t target <HA> Issue one command per
adapter. This example configures Adapter ports '0a' and '0b' as targets:
NOTE If the adapter that needs to be changed to a target is listed as "online", it must be off-lined first before it can be
changed. Issue: fcadmin offline <HA>
b) Enter: fcadmin config to confirm the changed FC Adapters are displaying as PENDING: (target) ports.
3 If any FC cables were disconnected from adapters '0a' or '0b' due to boot issue, firmly reconnect them now. Must click in.
4 Continue with Section XII on next page.
Example Only: In this example, the local System ID for the new Controller is 1743755272. The old MB System ID was 122217803
(disk show -v from Section IV). The disks need to be reassigned to the local System ID.
*> disk show -v
Local System ID: 1743755272
DISK OWNER POOL SERIAL NUMBER HOME
-------- ------------------ ----- -------------------- ------------------
0c.00.0 tsst-2 (142217816) Pool0 3LM17RW900009750Q6SF tsst-2 (142217816)
0c.00.1 tsst-2 (142217816) Pool0 3LM1623E00009750Q7YT tsst-2 (142217816)
.....
.....
0a.41 tsst-2 (142217816) Pool0 JLVT29GC tsst-2 (142217816)
0a.43 tsst-2 (142217816) Pool0 JLVT7BUC tsst-2 (142217816)
.....
0b.21 tsst-1 (122217803) Pool0 JLVT0KDC tsst-1 (122217803)
0b.18 tsst-1 (122217803) Pool0 JLVT2HZC tsst-1 (122217803)
.....
.....
0d.01.6 tsst-1 (122217803) Pool0 9QJ75925 tsst-1 (122217803)
0d.01.10 tsst-1 (122217803) Pool0 9QJ74TQG tsst-1 (122217803)
0d.01.9 tsst-1 (122217803) Pool0 9QJ758NQ tsst-1 (122217803)
0d.01.5 tsst-1 (122217803) Pool0 9QJ74VNZ tsst-1 (122217803)
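If the console log is captured to a file, the disks still owned by the old System ID can be picked out of the 'disk show -v' listing with a short parsing sketch such as the one below. The function names and the regular expression are assumptions based only on the example layout above; real output may differ between ONTAP versions.

```python
import re
from typing import List, NamedTuple


class DiskRow(NamedTuple):
    disk: str
    owner: str
    owner_sysid: str
    pool: str
    serial: str
    home: str
    home_sysid: str


# Matches rows such as:
# 0b.21 tsst-1 (122217803) Pool0 JLVT0KDC tsst-1 (122217803)
_ROW = re.compile(
    r"^(\S+)\s+(\S+)\s+\((\d+)\)\s+(\S+)\s+(\S+)\s+(\S+)\s+\((\d+)\)\s*$"
)


def parse_disk_show(output: str) -> List[DiskRow]:
    """Parse 'disk show -v' rows; header, ruler and '.....' lines are skipped."""
    rows = []
    for line in output.splitlines():
        match = _ROW.match(line.strip())
        if match:
            rows.append(DiskRow(*match.groups()))
    return rows


def disks_owned_by(output: str, sysid: str) -> List[str]:
    """Return disk names whose OWNER System ID equals sysid."""
    return [row.disk for row in parse_disk_show(output) if row.owner_sysid == sysid]


SAMPLE = """\
DISK OWNER POOL SERIAL NUMBER HOME
-------- ------------------ ----- -------------------- ------------------
0c.00.0 tsst-2 (142217816) Pool0 3LM17RW900009750Q6SF tsst-2 (142217816)
.....
0b.21 tsst-1 (122217803) Pool0 JLVT0KDC tsst-1 (122217803)
0d.01.6 tsst-1 (122217803) Pool0 9QJ75925 tsst-1 (122217803)
"""

if __name__ == "__main__":
    print(disks_owned_by(SAMPLE, "122217803"))
```

This only helps with reading the listing; the reassignment itself is still performed on the console as described in this AP.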
Disk ownership will be updated on all disks previously belonging to Filer with sysid 122217803.
Would you like to continue (y/n)? y         <-- Enter: y
7-Mode only: A console message will be displayed for each disk changing ownership (System ID).
STOP If the console messages stated that the giveback must be completed immediately, do not enter any other commands
on the partner node until "after" the disk ownership on the down node is verified and the giveback is completed.
A10 Continue with step 2 on next page.
CAUTION IF the system reports: "Partner node must not be in Takeover mode during disk reassignment
from maintenance mode. Serious problems could result!!", the system sees the controller as an HA configuration.
(i) Confirm with the end-user that the partner did not take over and continue with step B3, OR
(ii) IF the partner did take over, enter the appropriate response to "Abort/Cancel" the disk reassignment and
then follow Procedure-A on previous page.
B2 IF Single Controller configuration, follow steps (i)-(ii). IF the partner did NOT takeover, skip to step B3.
(i) Enter: y to question "Would you like to continue (y/n)?"
(ii) Skip to step 2.
2 From the console port on "target" controller on which you replaced the MB (in maintenance mode):
a) Enter: disk show -s <old-sysID> No disks or V-Series LUNs should be listed as shown in console window below.
(The "-s old-sysID" was specified in the disk reassign step1)
IF any disks/V-LUNs are listed, a reservation may not have released, continue with step 2(b). IF no output, skip to step 3.
b) Not all disks re-assigned: Re-issue the disk reassign, from the node it was entered on, to see if the reservation releases.
Then repeat the disk show -s command in step 2(a). IF disks/V-LUNs are still listed in the output, call Support.
3 Continue with Section XIV on next page.
5 At the maintenance mode prompt: " * >", enter: halt to exit to LOADER-A|B.
2 Go to Section XVI, "Boot the Operating System - 'giveback' if applicable" on next page.
4 Log in to the PARTNER node (7-Mode=root, C-Mode=admin). Engage end-user for password.
5 Check takeover status by entering the appropriate command shown for the specified ONTAP Mode. If the status shows "partner not ready", you may
have to wait 2-4 minutes for the NVRAMs to synchronize.
7-Mode:        partner(takeover)> cf status
Cluster-Mode:  cluster::> run local cf status
6 IF Pre-ONTAP 8.2 and 7-Mode: Ask the customer if there are any heavy NDMP, SnapMirror or SnapVault processes running. If
Yes, they should be disabled due to bug 489060. The procedure to disable the processes is here.
7 Enter the proper controller giveback command(s) based on the mode running as follows:
NOTE A giveback cannot be completed if any of the following exist: "a failed disk" or "Open CIFS sessions" or "partner not ready".
IF FAILED disk: Physically disengage the failed disk (leave the disk in the slot until the replacement is received).
IF Open CIFS sessions: Check with customer how to close out CIFS sessions. Terminating CIFS can cause loss of data.
IF partner "not ready": Wait 5 minutes for the NVMEMs to synchronize.
IF the giveback fails for any other reason, contact NGS.
7-Mode:        partner(takeover)> cf giveback
Cluster-Mode:  cluster::> storage failover giveback -fromnode local
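The mode-dependent login and partner-node commands from steps 4-7 can be kept in one lookup table, sketched below as a quick reference. The dictionary and function names are hypothetical; the login names and command strings are the ones listed in this AP.

```python
PARTNER_COMMANDS = {
    "7-Mode": {
        "login": "root",
        "status": "cf status",
        "giveback": "cf giveback",
    },
    "Cluster-Mode": {
        "login": "admin",
        "status": "run local cf status",
        "giveback": "storage failover giveback -fromnode local",
    },
}


def partner_command(mode: str, action: str) -> str:
    """Look up the partner-node entry for an ONTAP mode and action.

    Raises KeyError for an unknown mode or action rather than guessing.
    """
    return PARTNER_COMMANDS[mode][action]


if __name__ == "__main__":
    # The ONTAP mode is listed in the dispatch.
    for action in ("login", "status", "giveback"):
        print(action, "->", partner_command("Cluster-Mode", action))
```

Failing loudly on an unknown mode is deliberate: entering a 7-Mode command at a clustershell prompt (or the reverse) should be caught before anything is typed on the console.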
a) IF Cluster-Mode: Confirm the "giveback" status of the storage; refer to this doc > ONTAP 8 failover show
NOTE If the "giveback" is incomplete, wait 2 minutes and re-check. If still not complete after 10 minutes, contact Support.
Do not proceed to next step if 'incomplete or partial giveback'!
10 IF Motherboard was Replaced and the partner printed out the following highlighted message after the "disk reassign" command
was executed, go to Step 11. If no message reported, skip to step 12 on next page.
disk reassign: A giveback must be done immediately following a reassign of partner disks.
After the partner node becomes operational, do a takeover and giveback of
this node to complete the disk reassign process.
STOP IF this system has a partner controller, but the partner did not take over (disks were assigned in maintenance mode), continue
with step 10a (Ref Internal TSB-1209-02). Note, the console message above is not displayed when disks are reassigned in
maintenance mode.
a) Log in to the "repaired node" (target) and re-enable "controller failover" using proper command syntax below (copy-and-paste).
7-Mode:        target> cf enable
Cluster-Mode (1st cmd is for 2-node clusters ONLY, 2nd cmd is for 3 or more node clusters):
               cluster::> cluster ha modify -configured true
               OR
               cluster::> storage failover modify -node local -enabled true
11 From the "repaired node", execute a takeover using the proper command below to sync the sys-IDs.
a) Wait! 60 seconds for 7-Mode or 90 seconds for Cluster-Mode after takeover reports complete- Then check takeover
status by entering the appropriate command shown for the specified ONTAP Mode.
7-Mode Cluster-Mode
b) After the appropriate Wait period in step 11a) and the cf status reports: "Ready for giveback" , enter the proper
"giveback" command below. This is the final synchronization of the system-Ids across the HA pair.
7-Mode Cluster-Mode
7-Mode:        target> cf status
               Controller Failover enabled, XYZ is up.
Cluster-Mode:  cluster::> storage failover show
               Follow step (i).
(i) For ONTAP Cluster Mode, storage failover show should not show any "partial" givebacks. If there are, wait
another 60 seconds and recheck. Some large systems may take up to 10 minutes to complete.
Click > ONTAP 8 failover show to see examples of output. Issues? Call NGS.
12 IF Cluster-Mode: Follow steps (a-b) below, otherwise skip to Step 13.
a) From the clustershell on each node, enter the command below to list the logical interfaces that are not on their home
server and port.
Cluster-Mode
cluster::> net int show -is-home false
Step 3): Enter 'bmc setup' to configure the BMC
fas2050cl1-rtp> bmc setup
Please enter the gratuitous ARP Interval for the BMC [10 sec (max 60)]:
XIX. FAS2040: Controller registration, Enable options, Submit logs and Part Return
Step Action Description
NOTE Service entitlements break when the MB is swapped because the new motherboard changes the system serial number.
1 Ask end-user if using "AutoSupport"? If YES, perform step 1(a). If NO, perform step 1(b).
a) ASUP system: Request end-user to send NetApp an ASUP Message from the target (repaired) node so the configuration
setup can be verified and the new system serial number can be registered by NGS. If the target system is not UP, send
ASUP from its partner. Use the corresponding command for the version of ONTAP running. Enter your dispatch's 7-digit
FSO number (begins with 5).
b) If ASUP is disabled: Call NGS CSR and provide the new MB serial number so they can register it as the new system s/n.
2 IF NDMP, SnapMirror or SnapVault options were disabled, enable them now. Refer to page 2 of doc > > here
3 Ask the customer if they are using Operations Manager. If so, can they still access the controllers? If not, see bug > > 583160
4 C-Mode Only: Re-enable "auto-giveback" options if they were disabled on either node. C-Mode command here
5 Email the console log with the NetApp Reference Number in the Subject Line to [email protected]
6 Place the defective part in the antistatic bag and seal the box.
7 Follow the return shipping instructions on the box to ship the part(s) back to NetApp’s RMA processing center. If the
shipping label is missing see process to obtain a shipping label here Missing Shipping Label?
8 Verify with customer that the system is OK and if working with NGS ask them if it is OK to be released.
9 Close dispatch per Rules of Engagement.