Cisco UCS C-Series Servers Troubleshooting Guide

First Published: 2011-03-04


Last Modified: 2016-07-13

Americas Headquarters
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
USA
http://www.cisco.com
Tel: 408 526-4000
800 553-NETS (6387)
Fax: 408 527-0883
© 2016 Cisco Systems, Inc. All rights reserved.
CONTENTS

Preface
Audience
Conventions
Related Cisco UCS Documentation

CHAPTER 1 Introduction
Guidelines for Troubleshooting

CHAPTER 2 Troubleshooting Server Hardware or Software Issues
Troubleshooting Operating System and Drivers Installation
Troubleshooting Disk Drive and RAID Issues
Disk Drive/RAID Configuration Issues
Configuring Multiple (Redundant) RAID controllers
RHEL 5.4 64-bit Recommended Installation with RAID (C200)
DIMM Memory Issues
Memory Terms and Acronyms
Troubleshooting DIMM Errors
Correct Installation of DIMMs
Troubleshooting DIMM Errors Using Cisco IMC CLI
Troubleshooting DIMM errors using Cisco IMC GUI
Troubleshooting Degraded DIMM Errors
Troubleshooting Inoperable DIMMs Errors
Recommended Solutions for DIMM Issues
Troubleshooting Server and Memory Issues
Troubleshooting Communication Issues
“No Signal” on vKVM and Physical Video Connection

CHAPTER 3 Troubleshooting Utilities
Troubleshooting Problems with Host Upgrade Utility
Troubleshooting Problems with Cisco UCS Server Configuration Utility (SCU)

CHAPTER 4 Contacting Customer Support
Gathering Information Before Calling Support
Using the Cisco CIMC GUI to Export Technical Support Data
Using the Cisco CIMC GUI to Display SEL Events
Using Cisco IMC GUI to Display Sensor Readings
Using Cisco IMC GUI to Display CIMC Log
Using Command Line Interface (CLI) to Collect show-tech Details

Preface
This preface includes the following sections:

• Audience
• Conventions
• Related Cisco UCS Documentation

Audience
This guide is intended primarily for data center administrators with responsibilities and expertise in one or
more of the following:
• Server administration
• Storage administration
• Network administration
• Network security

Conventions
Text Type: Indication

GUI elements: GUI elements such as tab titles, area names, and field labels appear in this font. Main titles such as window, dialog box, and wizard titles appear in this font.

Document titles: Document titles appear in this font.

TUI elements: In a Text-based User Interface, text the system displays appears in this font.

System output: Terminal sessions and information that the system displays appear in this font.

CLI commands: CLI command keywords appear in this font. Variables in a CLI command appear in this font.

[]: Elements in square brackets are optional.

{x | y | z}: Required alternative keywords are grouped in braces and separated by vertical bars.

[x | y | z]: Optional alternative keywords are grouped in brackets and separated by vertical bars.

string: A nonquoted set of characters. Do not use quotation marks around the string or the string will include the quotation marks.

<>: Nonprinting characters such as passwords are in angle brackets.

[]: Default responses to system prompts are in square brackets.

!, #: An exclamation point (!) or a pound sign (#) at the beginning of a line of code indicates a comment line.

Note Means reader take note. Notes contain helpful suggestions or references to material not covered in the
document.

Tip Means the following information will help you solve a problem. The tips information might not be
troubleshooting or even an action, but could be useful information, similar to a Timesaver.

Timesaver Means the described action saves time. You can save time by performing the action described in the
paragraph.

Caution Means reader be careful. In this situation, you might perform an action that could result in equipment
damage or loss of data.

Warning IMPORTANT SAFETY INSTRUCTIONS


This warning symbol means danger. You are in a situation that could cause bodily injury. Before you
work on any equipment, be aware of the hazards involved with electrical circuitry and be familiar with
standard practices for preventing accidents. Use the statement number provided at the end of each warning
to locate its translation in the translated safety warnings that accompanied this device.
SAVE THESE INSTRUCTIONS

Related Cisco UCS Documentation


Documentation Roadmaps
For a complete list of all B-Series documentation, see the Cisco UCS B-Series Servers Documentation Roadmap
available at the following URL: http://www.cisco.com/go/unifiedcomputing/b-series-doc.
For a complete list of all C-Series documentation, see the Cisco UCS C-Series Servers Documentation Roadmap
available at the following URL: http://www.cisco.com/go/unifiedcomputing/c-series-doc.
For information on supported firmware versions and supported UCS Manager versions for the rack servers
that are integrated with the UCS Manager for management, refer to Release Bundle Contents for Cisco UCS
Software.

Other Documentation Resources


Follow Cisco UCS Docs on Twitter to receive document update notifications.

CHAPTER 1
Introduction
This chapter includes the following sections:

• Guidelines for Troubleshooting

Guidelines for Troubleshooting


When you troubleshoot issues with a C-Series Rack-Mount Server or any component in it, we recommend
that you follow the guidelines in the following table.

Guideline: Take screenshots of the fault or error message dialog box and other relevant areas.
Description: These screenshots provide visual cues about the state of the C-Series server when the problem occurred. If your computer does not have software to take screenshots, check the documentation for your operating system, as it may include this functionality.

Guideline: Record the steps that you took directly before the issue occurred.
Description: If you have access to screen or keystroke recording software, repeat the steps you took and record what occurs. If you do not have access to this type of software, repeat the steps you took and make detailed notes of the steps and what happens after each step.

Guideline: Enter the show tech-support command.
Description: The information about the current state of the server is very helpful to the Cisco Technical Assistance Center (TAC) and frequently provides the information needed to identify the source of the problem.

CHAPTER 2
Troubleshooting Server Hardware or Software Issues
This chapter includes the following sections:

• Troubleshooting Operating System and Drivers Installation
• Troubleshooting Disk Drive and RAID Issues
• DIMM Memory Issues
• Troubleshooting Server and Memory Issues
• Troubleshooting Communication Issues

Troubleshooting Operating System and Drivers Installation


Table 1: Operating System and Driver Issues

Issue Recommended Solution

Issue:
• Basic server configuration steps
• Steps for CIMC or BMC configuration
• BIOS settings information
• BIOS upgrade steps
• CIMC or BMC firmware upgrade steps
Recommended Solution:
• For information on the correct server hardware guides, see http://www.cisco.com/en/US/products/ps10493/prod_installation_guides_list.html
• For information on the correct server GUI and CLI configuration guides, see http://www.cisco.com/en/US/products/ps10739/products_installation_and_configuration_guides_list.html


Issue: The Windows 2003 R2 64-bit install is not starting because the system is not seeing the install CD on the C200 servers.
Recommended Solution:
• Set the boot order in the BIOS so that the server boots from the install CD.
• Use this virtual media installation process as an alternative installation process. If a list of drivers is needed, it is also available here: http://www.cisco.com/en/US/docs/unified_computing/ucs/c/sw/os/install/2003-vmedia-install.html

Issue: Slow performance (slow mouse and keyboard) on C200 or C210 servers when running Windows 2008 R2.
Recommended Solution: There is a known issue with the Intel 82576 driver included with Windows 2008 R2. Update to the latest Intel driver for this chipset at the following link: https://downloadcenter.intel.com/product/32261/Intel-82576-Gigabit-Ethernet-Controller

Issue: Installation of the Windows 2008 R2 OS failed with error message:
The computer restarted unexpectedly or encountered an unexpected error. Windows installation cannot proceed.
Recommended Solution: On the C200 server, Windows 2008 R2 install fails with the Intel Quad Port NIC. Start the install without the NIC and put it in after the install is complete. Also, see this forum message: https://supportforums.cisco.com/message/3179297

Issue: VMware ESX/ESXi on C200, C210, or C250 failed.
Recommended Solution:
• The onboard NIC might be disabled or not recognized. Check the BIOS to ensure the onboard NICs are enabled.
• It is possible that the device ID of the Intel NIC is wrong. Use the Host Upgrade Utility to update the LOM firmware.
• Download the latest ISO of the SCU from Cisco.com for the specific server.

Issue: Running Windows 2008 R2, Task Manager shows multiple spikes.
Recommended Solution: Go to this URL and update the drivers to the latest version: http://www.cisco.com/en/US/docs/unified_computing/ucs/overview/guide/UCS_rack_roadmap.html

Issue: The ESXi installation does not recognize the LOM or NIC Ethernet ports.
Recommended Solution:
• Update when the LOM is used for ESXi.
• Update when add-on adapters are used for ESXi.


Issue: The ESXi update does not recognize the NICs.
Recommended Solution: Update the LOM firmware using the Cisco Host Upgrade Utility.
Download the 1.2.x version from this link: http://www.cisco.com/en/US/docs/unified_computing/ucs/c/sw/lomug/install/LOMUG.html
Download the 1.3.x version from this link: http://www.cisco.com/en/US/docs/unified_computing/ucs/c/sw/lomug/1.3.x/install/HUUUG.html

Issue: Unable to install older OS.
Recommended Solution: Different C-Series servers support different versions of OS. Use the following link to see the matrix of supported operating systems: http://www.cisco.com/en/US/products/ps10477/prod_technical_reference_list.html

Issue: Cannot upgrade BIOS on the system with no OS.
Recommended Solution: Use the BIOS upgrade instructions in the hardware installation and service guide for your server. Go to: http://www.cisco.com/en/US/products/ps10493/prod_installation_guides_list.html

Issue: With ESXi installed on the drives, unable to boot from the partition.
Recommended Solution: Review the documentation at the following link: http://www.VMware.com

Issue: CIMC defaults to DHCP and will not retain the IP address.
Recommended Solution: Review the documentation at the following link: http://www.cisco.com/en/US/products/ps10739/products_installation_and_configuration_guides_list.html

Issue: System becomes unresponsive during BIOS POST.
Recommended Solution: When the system boots, if the system is hanging at LSI, waiting for user input, follow the instructions on the screen. Possible reasons would be:
• Battery HW missing or disabled. This warning can be disabled by entering D to disable this message during the next boot. This bypasses the warning and the system will not hang for this reason.
• The message could be about importing a foreign configuration. A foreign configuration could be imported by pressing F. An alternative procedure is to enter the config utility (press Ctrl+C) and enter the WebBIOS, which is the LSI config utility. Preview the foreign configuration and decide if it should be imported.


Issue: Drives are not detected or the system hangs when the adapter ROM for the ICH10R SATA Software RAID scans the SATA ports.
Recommended Solution:
• ICH10R is SATA controller software embedded in the motherboard on the C200 and C210 servers only. There is no adapter. It might not see a SAS drive because it does not support SAS drives. Only SATA drives are supported.
• The cable from the HDD backplane must be connected to the motherboard to use ICH10R.

Issue: The drives are not detected or the system hangs when the adapter ROM for the LSI RAID Controller scans the SAS/SATA Drives.
Recommended Solution:
• ICH10R is SATA controller software embedded in the motherboard on the C200 and C210 servers only. There is no adapter. It might not see a SAS drive because it does not support SAS drives. Only SATA drives are supported.
• The onboard ICH10R controller is not compatible with VMware software. Use an add-on controller card in this case.
• The cable from the HDD backplane must be connected to the motherboard to use ICH10R.
• Make sure all the drives are plugged in properly (reseat the drives if needed).

Issue: The Operating System does not boot.
Recommended Solution:
• Make sure that the correct virtual drive on which the OS is installed is selected in the LSI WebBIOS. Do this by entering the LSI WebBIOS using Ctrl+H during system boot up. In the LSI WebBIOS menu, navigate to the virtual drive menu and get a list of the virtual drives. Choose the virtual drive as the boot drive by selecting it.
• Make sure that you have properly selected the boot device in the system BIOS setup by pressing F2. Navigate to the boot devices screen and make sure the LSI RAID controller appears before all of the other bootable devices attached to the server. We recommend that this be the third bootable device in the list.


Troubleshooting Disk Drive and RAID Issues


Disk Drive/RAID Configuration Issues
Table 2: RAID Configuration Issues

Issue Recommended Solution

Issue: Windows does not detect hard drives.
Recommended Solution: LSI drivers may not be bundled with the Windows OS version being installed. These drivers must be installed during the installation process. During the install process, if the hard drives fail to be detected, use the load driver option to point the installer to the correct drivers for the LSI controller in the system. The drivers can be loaded using a USB drive. When loaded, the hard drives are displayed and the hard drive for the OS can be selected.

Issue: Installing Windows 2008 64-bit and RAID controller had issues.
Recommended Solution: LSI drivers are not bundled in Windows 2008 64-bit. These must be installed during the installation process. During the install process, if the hard drives fail to be detected, use the load driver option to point the installer to the correct drivers for the LSI controller in the system. The drivers can be loaded using a USB drive. When loaded, the hard drives are displayed and the hard drive for the OS can be selected.

Issue: Unable to install ESX on server with only the onboard controller.
Recommended Solution: The LSI hardware RAID controller is required.


Issue:
• Unable to see the LSI RAID controller in the BOOT environment.
• Unable to access the onboard RAID controller.
Recommended Solution:
• During the BIOS POST, the LSI option ROM should be displayed. The LSI RAID controller can be configured using Ctrl+H to create virtual drives. When configured, the BIOS should list the RAID controller in the boot device menu. To verify, enter the BIOS POST menu by pressing F2. Confirm that the LSI RAID controller is listed in the boot device menu.
• If, after completing the above process, the LSI RAID card is not detected, power off the system and reseat the LSI card. Make sure that the cables are connected to the backplane and then follow the above procedure to verify that the LSI card is seen in the BIOS Setup menu.
• If reseating the card does not solve the problem, replace the LSI controller (the card could be bad) and verify that the new card is seen during BIOS POST.

Issue: VMware does not show the local drive during installation.
Recommended Solution: VMware supports a maximum partition size of 2 TB. Resize the partition so that it does not exceed the 2 TB limit.

Issue: The RAID controller card is not working.
Recommended Solution: Verify that the card installed is supported for this server. If supported, follow the steps listed in "Unable to see the LSI RAID controller in the BOOT environment" (above).


Issue: Problem with setup of the RAID6 virtual device and installation of Windows 2003 X64.
Recommended Solution:
• When the system boots up and the LSI Option ROM screen displays, press Ctrl+H to enter the LSI option ROM screen.
• Choose the Configuration Wizard and follow the instructions to configure the RAID 6 array group. (RAID 6 needs a minimum of three drives.) Once RAID 6 is created, initialize the virtual drives (full initialization) on which the OS is to be installed.
• After the virtual drive is initialized, the virtual drive on which the OS is to be installed must be set as the boot drive.
• Go to the virtual drive menu and choose the virtual drive number and click Set Virtual drive. This is very important because Windows will report an error message during install if this is not set.
• When the Windows 2003 installation is started, follow the instructions on the screen to install the LSI controller drivers on Win2003. The LSI drivers need to be copied on a floppy disc and the floppy drive connected to the server. During install, press F6 to install the drivers. This is a very important step to follow for Windows LSI driver installation. This will ensure that the LSI virtual drive is seen during the install process.

Issue: Unable to see HDD.
Recommended Solution:
• If not able to see the LSI controller during system boot up, follow the instructions in Unable to see LSI controller (above) to ensure the LSI controller is seen during BIOS bootup.
• If the LSI controller does not see the hard drives, ensure they are properly plugged in and making contact and that the green LED is visible. If still not seen, insert a different HDD (in case of a bad HDD).
• Note that the BIOS will not see the physical drives plugged in the boot device menu. It will only display the RAID controller which points to the virtual drive (set as the boot virtual disk). Make sure to configure the virtual drives using the LSI WebBIOS to ensure the RAID controller is seen in the boot device menu of the BIOS setup.


Issue: Problem setting up the RAID configuration.
Recommended Solution:
• During system boot, enter the WebBIOS by pressing Ctrl+H. Use the Configuration Wizard and follow the screen instructions to create the RAID configurations.
• Check the BIOS and CIMC version and upgrade to the latest version. Get the upgrade software at the following link: http://www.cisco.com/cisco/software/navigator.html
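
As a rough aid for the 2 TB partition limit noted in Table 2, the following Python sketch estimates the usable capacity of a planned virtual drive and flags configurations that would exceed the limit. The function names, the reading of 2 TB as 2 x 1024^4 bytes, and the capacity formulas (the commonly used ones for RAID 0, 1, 5, and 6 with identical drives) are assumptions made for illustration; they are not taken from Cisco, LSI, or VMware documentation.

```python
# Illustrative sketch only. All names and the byte value of "2 TB" are assumptions.
TWO_TB_BYTES = 2 * 1024**4  # assumed interpretation of the 2 TB partition limit


def usable_capacity(raid_level: int, drive_count: int, drive_bytes: int) -> int:
    """Return the approximate usable bytes of an array of identical drives."""
    if raid_level == 0:
        return drive_count * drive_bytes          # striping, no redundancy
    if raid_level == 1:
        if drive_count != 2:
            raise ValueError("this sketch models RAID 1 as a two-drive mirror")
        return drive_bytes                         # one drive's worth of space
    if raid_level == 5:
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drive_count - 1) * drive_bytes     # one drive's worth of parity
    if raid_level == 6:
        if drive_count < 3:
            raise ValueError("RAID 6 needs at least three drives (per this guide)")
        return (drive_count - 2) * drive_bytes     # two drives' worth of parity
    raise ValueError(f"RAID level {raid_level} is not modeled in this sketch")


def exceeds_partition_limit(size_bytes: int, limit: int = TWO_TB_BYTES) -> bool:
    """True if a partition of size_bytes would be larger than the assumed limit."""
    return size_bytes > limit
```

For example, four 1 TB (10^12 byte) drives in RAID 5 give roughly 3 x 10^12 usable bytes, which this check reports as over the limit, while the same drives in RAID 6 stay under it.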

Configuring Multiple (Redundant) RAID controllers


Cisco does not support multiple (redundant) RAID controllers that automatically fail over if one RAID
controller fails. It is possible, however, to recover from a RAID controller failure: install a new RAID card
of the same type and model.
Configuration data about a RAID array is stored on the disks being managed by the controller. A new
controller can import those configurations from the disks to restore proper RAID operation. Each disk has its own
copy of the metadata. For example, if there are 16 disks in an array, each disk can contain its own copy of the metadata.
Detailed steps are available in the LSI document 80-00156-01_RevH_SAS_SW_UG.pdf.
This document is available from the Documents & Downloads section of the LSI support site at this URL:
http://www.lsi.com
When configuring the RAID card for the first time, the step “Import foreign config” in the file provides details
on how to import the RAID configuration from previously configured disks.

RHEL 5.4 64-bit Recommended Installation with RAID (C200)


To ensure that the RAID drives are properly recognized, complete the following steps:

Procedure

Step 1 Follow the normal installation process of RHEL 5.4 i386 from the ISO or DVD.
Step 2 At the prompt, enter the command:
boot: linux dd noprobe=ata1 noprobe=ata2 noprobe=ata3 noprobe=ata4

Step 3 Mount the megaraid driver and map it from the virtual media. The .img file is emulated as a floppy. The file
is also on the driver CD available on CCO, at the path Drivers\Linux\Storage\Intel\ICH10R\RHEL\RHEL5.4
from the root.
Step 4 At the “before installation starts” step, the system will ask whether you want to add any additional drivers.
Step 5 Provide the drivers (usually the mapped file will be /dev/sdb, because it is a floppy).
Step 6 Continue the installation.
Step 7 When the system looks for storage, it should list the RAID as “LSI MegaSR”.


DIMM Memory Issues


Types of DIMM Errors
Cisco UCS Servers can detect and report correctable and uncorrectable DIMM errors.
Correctable DIMM Errors
DIMMs with correctable errors are not disabled and are available for the OS to use. The total memory
and effective memory are the same (memory mirroring is taken into account). These correctable errors
are reported in Cisco IMC as degraded once they exceed pre-determined error thresholds.

Uncorrectable DIMM Errors


Uncorrectable errors generally cannot be fixed, and may make it impossible for the application or
operating system to continue execution. A DIMM with an uncorrectable error is disabled if DIMM
blacklisting is enabled or if the DIMM fails during BIOS POST on reboot, and the OS does not see that
memory. In this case, the Cisco IMC operState for the DIMM is inoperable.
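
The distinction between the two error types can be summarized as a small decision function. The Python sketch below is illustrative only: the threshold value, the field names, and the "operable" state label are hypothetical, because this guide does not state the actual thresholds or internal counters that Cisco IMC uses.

```python
from dataclasses import dataclass

# Hypothetical value; the real pre-determined thresholds are not given in this guide.
CORRECTABLE_THRESHOLD = 100


@dataclass
class DimmErrors:
    correctable: int                    # count of correctable errors seen
    uncorrectable: int                  # count of uncorrectable errors seen
    failed_post: bool = False           # DIMM failed during BIOS POST on reboot
    blacklisting_enabled: bool = False  # DIMM blacklisting is turned on


def dimm_state(e: DimmErrors) -> str:
    """Classify a DIMM following the rules described in the text above."""
    # An uncorrectable error disables the DIMM when blacklisting is enabled
    # or when the DIMM fails BIOS POST.
    if e.uncorrectable > 0 and (e.blacklisting_enabled or e.failed_post):
        return "inoperable"
    # Correctable errors are reported as degraded once they exceed the threshold.
    if e.correctable > CORRECTABLE_THRESHOLD:
        return "degraded"
    return "operable"  # assumed label for a DIMM with no reportable condition
```

A degraded DIMM in this model is still usable by the OS, while an inoperable one is not visible to it.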

A problem with the DIMM memory can cause a server to fail to boot or cause the server to run below its
capabilities. If DIMM issues are suspected, consider the following:
• DIMMs tested, qualified, and sold by Cisco are the only DIMMs supported on your system. Third-party
DIMMs are not supported, and if they are present, Cisco technical support will ask you to replace them
with Cisco DIMMs before continuing to troubleshoot a problem.
• Check if the malfunctioning DIMM is supported on that model of server. Refer to the server’s installation
guide and technical specifications to verify whether you are using the correct combination of server,
CPU and DIMMs.
• Check if the malfunctioning DIMM is seated correctly in the slot. Remove and reseat the DIMMs.
• All Cisco servers have either a required or recommended order for installing DIMMs. Refer to the
server’s installation guide and technical specifications to verify that you are adding the DIMMs
appropriately for a given server type.
• If the replacement DIMMs have a maximum speed lower than those previously installed, all DIMMs in
a server run at the slower speed or do not work at all. All of the DIMMs in a server should be of the same
type for optimal performance.
• The number and size of DIMMs should be the same for all CPUs in a server. Mismatching DIMM
configurations can degrade system performance.
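
To illustrate the speed and symmetry guidelines above, the following Python sketch computes the speed at which a mixed set of DIMMs would run (that of the slowest installed DIMM) and checks whether every CPU has the same DIMM population. The data layout and function names are hypothetical; in practice the values would come from the server inventory, and a mismatched set might not work at all.

```python
from collections import Counter


def effective_speed_mhz(dimm_speeds_mhz):
    """Return the speed all DIMMs would run at: the slowest installed DIMM."""
    if not dimm_speeds_mhz:
        raise ValueError("no DIMMs given")
    return min(dimm_speeds_mhz)


def population_matches(dimms_by_cpu):
    """Return True if every CPU has the same number and sizes of DIMMs.

    dimms_by_cpu maps a CPU label to a list of DIMM sizes in MB.
    """
    # Counter compares both how many DIMMs there are and what sizes they have.
    profiles = [Counter(sizes) for sizes in dimms_by_cpu.values()]
    return all(profile == profiles[0] for profile in profiles)
```

For instance, effective_speed_mhz([1866, 1600, 1866]) returns 1600, which shows why one slower replacement DIMM lowers the speed of the whole server.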

Memory Terms and Acronyms


Table 3: Memory Terms and Acronyms

Acronym Meaning

DIMM Dual In-line Memory Module


DRAM Dynamic Random Access Memory

ECC Error Correction Code

LVDIMM Low voltage DIMM

MCA Machine Check Architecture

MEMBIST Memory Built-In Self Test

MRC Memory Reference Code

POST Power On Self Test

SPD Serial Presence Detect

DDR Double Data Rate

CAS Column Address Strobe

RAS Row Address Strobe

Troubleshooting DIMM Errors

Correct Installation of DIMMs


Verify that the DIMMs are installed correctly.
In the first example in the following figure, a DIMM is correctly inserted and latched. Unless there is a small
bit of dust blocking one of the contacts, this DIMM should function correctly. The second example shows a
DIMM that is mismatched with the key for its slot. That DIMM cannot be inserted in this orientation and
must be rotated to fit into the slot. In the third example, the left side of the DIMM seems to be correctly seated
and the latch is fully connected, but the right side is just barely touching the slot and the latch is not seated
into the notch on the DIMM. In the fourth example, the left side is again fully inserted and seated, and the
right side is partially inserted and incompletely latched.

Figure 1: Installation of DIMMs

Troubleshooting DIMM Errors Using Cisco IMC CLI


You can check memory information to identify possible DIMM errors in the Cisco IMC CLI.

Procedure

        Command or Action                       Purpose
Step 1  Server# scope chassis                   Enters chassis command mode.
Step 2  Server /chassis # show dimm [detail]    Displays memory properties.


The following example shows how to check memory information using the Cisco IMC CLI:

Server# scope chassis


Server /chassis# show dimm detail

Name DIMM_A1:
Capacity: Failed
Channel Speed (MHz): NA
Channel Type: NA
Memory Type Detail: NA
Bank Locator: NA
Visibility: NA
Operability: NA
Manufacturer: NA
Part Number: NA
Serial Number: NA
Asset Tag: NA
Data Width: NA
Name DIMM_A2:
Capacity: Not Installed
Channel Speed (MHz): NA
Channel Type: NA
Memory Type Detail: NA
Bank Locator: NA
Visibility: NA
Operability: NA
Manufacturer: NA
Part Number: NA
Serial Number: NA
Asset Tag: NA
Data Width: NA
...
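
When many DIMMs are installed, scanning this output by eye is error-prone. The following Python sketch parses text in the format shown above and lists the DIMMs whose Capacity is Failed. It assumes the output has been saved as plain text in exactly this layout; the function names are hypothetical, and the sketch does not connect to Cisco IMC.

```python
def parse_dimm_detail(text):
    """Parse 'show dimm detail' style text into {dimm_name: {field: value}}."""
    dimms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and the truncation marker
        if line.startswith("Name ") and line.endswith(":"):
            # A line such as "Name DIMM_A1:" starts a new DIMM record.
            current = line[len("Name "):-1].strip()
            dimms[current] = {}
        elif current is not None and ":" in line:
            # Field lines look like "Capacity: Failed"; split on the first colon.
            key, _, value = line.partition(":")
            dimms[current][key.strip()] = value.strip()
    return dimms


def failed_dimms(dimms):
    """Return the names of DIMMs whose Capacity field reads 'Failed'."""
    return [name for name, fields in dimms.items()
            if fields.get("Capacity") == "Failed"]
```

Applied to the example output, failed_dimms would report DIMM_A1, while DIMM_A2 (Not Installed) is left out because an empty slot is not a failure.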

Troubleshooting DIMM errors using Cisco IMC GUI


You can determine the type of DIMM errors being experienced using the Cisco IMC GUI.

Procedure

Step 1 In the Navigation pane, click the Server tab.


Step 2 On the Server tab, click Inventory.
Step 3 In the Inventory pane, click the Memory tab.
Step 4 In the Memory Summary area, review the summary information about memory.
A list of DIMMs is displayed. Corrupt or bad DIMMs are displayed as Failed.
Step 5 Replace the corrupt or bad DIMM with a good DIMM.

Troubleshooting Degraded DIMM Errors


DIMMs with correctable errors are not disabled and are available for the OS to use. The total memory and
effective memory are the same (memory mirroring is taken into account). These correctable errors are reported
in Cisco IMC as degraded.
If you see a correctable error reported in Cisco IMC, you can clear the report by resetting the BMC.
Resetting the BMC only hides the correctable error on the DIMM; it does not repair the DIMM. To troubleshoot
the DIMM physically, see Troubleshooting Inoperable DIMMs Errors.


Use the following Cisco IMC CLI commands to reset BMC:

Procedure

Step 1
Command or Action: Server # scope chassis
Purpose: Enters chassis configuration mode.

Step 2
Command or Action: Server /chassis # show dimm
Purpose: Shows whether any DIMMs have correctable errors. DIMMs with correctable errors display their capacity as Failed. Clear the DIMM error flag by running the error correction code (ECC) command.

Step 3
Command or Action: Server /chassis # scope reset-ecc
Purpose: Enters error correction code configuration mode.

Step 4
Command or Action: Server /chassis/reset-ecc # set enabled yes
Purpose: Enables ECC.

Step 5
Command or Action: Server /chassis/reset-ecc * # commit
Purpose: Commits the transaction to the system configuration.

The following example shows how to view and reset the DIMM error flag:
Server# scope chassis
Server /chassis # show dimm
Name Capacity Channel Speed (MHz) Channel Type
-------------------- --------------- ------------------- ---------------
DIMM_A1 Failed NA NA
DIMM_A2 Ignored/Disa... NA NA
DIMM_B1 16384 MB 1866 DDR3
DIMM_B2 16384 MB 1866 DDR3
DIMM_C1 16384 MB 1866 DDR3
DIMM_C2 16384 MB 1866 DDR3
DIMM_D1 16384 MB 1866 DDR3
DIMM_D2 16384 MB 1866 DDR3
DIMM_E1 16384 MB 1866 DDR3
DIMM_E2 16384 MB 1866 DDR3
DIMM_F1 16384 MB 1866 DDR3
DIMM_F2 16384 MB 1866 DDR3
DIMM_G1 16384 MB 1866 DDR3
DIMM_G2 16384 MB 1866 DDR3
DIMM_H1 16384 MB 1866 DDR3
DIMM_H2 16384 MB 1866 DDR3

Clear DIMM Error flag:


Server/chassis# top
Server/chassis# scope reset-ecc
Server/chassis /reset-ecc # set enabled yes
Server/chassis /reset-ecc *# commit
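
The tabular show dimm output above can also be screened with a short script. This Python sketch assumes the column layout shown in the example, with the speed and type columns each holding a single token, and uses hypothetical names throughout. It returns the DIMMs whose capacity column is not a size in MB, such as Failed or Ignored/Disa...

```python
import re

# A healthy row shows its capacity as "<number> MB", for example "16384 MB".
SIZE_PATTERN = re.compile(r"^\d+ MB$")


def dimms_needing_attention(show_dimm_text):
    """Return {dimm_name: capacity_text} for rows whose capacity is not a size."""
    flagged = {}
    for line in show_dimm_text.splitlines():
        tokens = line.split()
        # Skip prompts, the header row, and the dashed separator row.
        if len(tokens) < 4 or not tokens[0].startswith("DIMM_"):
            continue
        # Everything between the name and the last two columns is the capacity.
        capacity = " ".join(tokens[1:-2])
        if not SIZE_PATTERN.match(capacity):
            flagged[tokens[0]] = capacity
    return flagged
```

Rows reported this way are candidates for the reset or the physical troubleshooting steps described in this section; the script itself changes nothing on the server.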

Troubleshooting Inoperable DIMMs Errors


DIMMs with uncorrectable errors are disabled and the OS on the server does not see that memory. If a DIMM
or DIMMs fail while the system is up, the OS could crash unexpectedly. Cisco IMC shows the DIMMs as
inoperable in the case of uncorrectable DIMM errors. These errors are not correctable using software. You
can identify a bad DIMM and remove it to allow the server to boot, for example, when the BIOS fails to pass
POST because of one or more bad DIMMs.


To view and identify a bad DIMM using the Cisco IMC GUI, see Troubleshooting DIMM errors using Cisco
IMC GUI.

Procedure

Step 1 Remove the inoperable DIMM from the system.


Step 2 Install a single DIMM (preferably a tested good DIMM) or a DIMM pair in the first usable slot for the first
processor (minimum requirement for POST success).
Step 3 Re-attempt to boot the system.
Step 4 If the BIOS POST is still unsuccessful, repeat steps 1 through 3 using a different DIMM in step 2.
Step 5 If the BIOS POST is successful, continue adding memory. Follow the population rules for that server model.
If the system can successfully pass the BIOS POST in some memory configurations but not others, use that
information to help isolate the source of the problem.
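
The isolation procedure above amounts to a simple loop: find one DIMM that lets the system pass POST, then add the remaining DIMMs one at a time and note which ones cause POST to fail. The Python sketch below models that loop. The post_passes callback is hypothetical and stands in for physically installing the given DIMMs and attempting to boot; nothing here talks to real hardware, and the sketch tests single DIMMs rather than pairs.

```python
def isolate_bad_dimms(candidates, post_passes):
    """Return (good, bad) lists following a one-at-a-time isolation procedure.

    candidates:  DIMM identifiers to test, in the order they will be tried.
    post_passes: callable that takes the list of installed DIMMs and returns
                 True if the BIOS POST succeeds with that set installed.
    """
    good, bad = [], []
    for dimm in candidates:
        if post_passes(good + [dimm]):
            good.append(dimm)  # POST succeeded: keep it and continue adding memory
        else:
            bad.append(dimm)   # POST failed: remove it and try a different DIMM
    return good, bad
```

A DIMM that lands in the bad list is only a suspect: the failure could also come from the slot it was placed in, so retest it in a slot that is known to work before replacing it.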

Recommended Solutions for DIMM Issues


The following table lists guidelines and recommended solutions for troubleshooting DIMM issues.

Table 4: DIMM Issues

Issue Recommended Solution


Issue: DIMM is not recognized.
Recommended Solution:
Verify that the DIMM is in a slot that supports an active CPU.
Verify that the DIMM is sourced from Cisco. Third-party memory is not supported in Cisco UCS.

Issue: DIMM does not fit in slot.
Recommended Solution:
Verify that the DIMM is supported on that server model.
Verify that the DIMM is oriented correctly in the slot. DIMMs and their slots are keyed and only seat in one of the two possible orientations.

Issue: The DIMM is reported as bad in the SEL, POST, or LEDs, or the DIMM is reported as inoperable in Cisco IMC.
Recommended Solution:
Verify that the DIMM is supported on that server model.
Verify that the DIMM is populated in its slot according to the population rules for that server model.
Verify that the DIMM is seated fully and correctly in its slot. Reseat it to ensure good contact and rerun POST.
Verify that the DIMM is the problem by trying it in a slot that is known to be functioning correctly.
Verify that the slot for the DIMM is not damaged by trying a DIMM that is known to be functioning correctly in the slot.
Reset the BMC.

Cisco UCS C-Series Servers Troubleshooting Guide


16
Troubleshooting Server Hardware or Software Issues
Troubleshooting Server and Memory Issues

Issue Recommended Solution


The DIMM is reported as degraded in the GUI Reset the BMC.
or CLI, or is running slower than expected. Reseat the rack server in the chassis.

The DIMM is reported as overheating. Verify that the DIMM is seated fully and correctly in its
slot. Reseat it to assure a good contact and rerun POST.
Verify that all empty HDD bays, server slots, and power
supply bays use blanking covers to assure that the air is
flowing as designed.
Verify that the server air baffles are installed to assure that
the air is flowing as designed.
Verify that any needed CPU air blockers are installed to
assure that the air is flowing as designed.

Troubleshooting Server and Memory Issues


Table 5: Server and Memory Issues

Server Related Issues

Issue: Every several days, the server requires a hard boot.
Recommended Solution:
• For instructions on updating the BIOS, go to:
http://www.cisco.com/en/US/products/ps10493/prod_installation_guides_list.html
• For CIMC upgrade instructions in the GUI or CLI configuration guides for the correct FW release, go to:
http://www.cisco.com/en/US/products/ps10739/products_installation_and_configuration_guides_list.html

Issue: Host is unreachable via IP, the CIMC works but KVM shows a blank screen.
Recommended Solution: Upgrade the CIMC firmware and BIOS.

Memory Configuration Issues

Issue: Memory fault LED is amber on a new server.
Recommended Solution: Upgrade the CIMC and BIOS.

Issue: Memory errors on a previously working server.
Recommended Solution:
• Replace any DIMM with a reported error.
• Upgrade the BIOS.

Troubleshooting Communication Issues


“No Signal” on vKVM and Physical Video Connection
If you receive a “No Signal” message from both the vKVM and the physical video connection immediately at boot,
the PCI riser card might not be properly seated on the motherboard. To resolve the issue, complete these steps:

Procedure

Step 1 Power off the server and disconnect the power cord.
Step 2 Confirm that all cards are properly seated.
Step 3 Connect the power cord and power on the server.

CHAPTER 3
Troubleshooting Utilities
This chapter includes the following sections:

• Troubleshooting Problems with Host Upgrade Utility, page 19
• Troubleshooting Problems with Cisco UCS Server Configuration Utility (SCU), page 19

Troubleshooting Problems with Host Upgrade Utility


The following table elaborates on problems that you might encounter with Host Upgrade Utility.

Issue: While upgrading firmware, the Host Upgrade Utility screen might freeze or black out.
Recommended Solution: Press Return to exit the utility.

Troubleshooting Problems with Cisco UCS Server Configuration Utility (SCU)

The following table elaborates on the problems that you could face while working with Cisco UCS SCU, and
the recommended solutions.

Issue: SCU displays the following error message even when a virtual USB device is mapped or when a physical USB
device is connected:
No USB Disk on Key detected
Recommended Solution:
• For USB devices that are mapped through Vmedia, use the USB reset option from the Vmedia user interface.
(Virtual Media Session > Details > USB Reset)
• For USB devices that are physically connected, check the vendor or product information. Try connecting a
different USB device.

Issue: After installing Microsoft Windows operating systems, the KVM mouse does not work. The Windows Device
Manager displays a yellow bang for the USB human interface device.
Recommended Solution: Determine the version of the CIMC running in your environment. Older versions of the
CIMC can cause this issue.

Issue: After the RAID configuration process is complete, the new disks that are created are not updated in the
Inventory data.

Issue: Network tests in the Diagnostic tool run on only Broadcom and Intel cards.

Issue: Installing Microsoft Windows 2008 fails and the following error message is displayed:
Selected disk has MBR partition table. On EFI systems, Windows can only be installed to GPT disks.
Recommended Solution: This problem occurs when the EFI CDROM device for virtual drives is used to boot the
Windows 2008 image. Use the CDROM device from BIOS ‘CDROM order’.

Issue: After installing a Microsoft Windows operating system using UCS SCU, the Windows Device Manager shows
some devices with a yellow bang.
Recommended Solution: This problem occurs for one of the following reasons:
• The devices are not supported by Cisco UCS SCU.
• You have not selected the device drivers in the SCU GUI.

Issue: The Windows setup fails and the following error message is displayed:
Inaccessible boot device
Recommended Solution: This error is displayed when you have not selected a device driver for a boot controller
in the user interface.

Issue: Installation of RHEL 6 is interrupted and an error message is displayed.
Recommended Solution: This error is displayed when the DHCP option is selected during the installation, and
DHCP does not provide an IP address.

CHAPTER 4
Contacting Customer Support
This chapter includes the following sections:

• Gathering Information Before Calling Support, page 21
• Using the Cisco CIMC GUI to Export Technical Support Data, page 21
• Using the Cisco CIMC GUI to Display SEL Events, page 23
• Using Cisco IMC GUI to Display Sensor Readings, page 23
• Using Cisco IMC GUI to Display CIMC Log, page 23
• Using Command Line Interface (CLI) to Collect show-tech Details, page 24

Gathering Information Before Calling Support


If you cannot isolate the issue to a particular component, consider the following questions. They can be helpful
when you contact the Cisco Technical Assistance Center (TAC).
• Was the server working before the problem occurred?
• Was this a newly installed server?
• Was this server installed onsite or did it arrive assembled from Cisco?
• Has the memory been reseated?
• Was the server powered off or moved from one location to another?
• Have there been any recent hardware or software upgrades? If so, list them.

When contacting Cisco TAC for issues, you should always capture the tech-support output from the Cisco
CIMC CLI or the Technical Support Data from the Cisco CIMC GUI.

Using the Cisco CIMC GUI to Export Technical Support Data


You can generate a summary report that contains configuration information, logs, and diagnostics from the
Cisco CIMC GUI.


To generate a summary report, follow these steps:

Procedure

Step 1 In the Navigation pane, click the Admin tab.


Step 2 On the Admin tab, click Utilities.
Step 3 In the Actions area of the Utilities pane, click Export Technical Support Data.
Step 4 In the Export Technical Support Data dialog box, complete these fields:
Name: Export Technical Support Data to drop-down list
Description: The remote server type. This can be one of the following:
• TFTP Server
• FTP Server
• SFTP Server
• SCP Server
• HTTP Server
Note If you choose SCP or SFTP as the remote server type while performing this action, a pop-up window is
displayed with the message Server (RSA) key fingerprint is <server_finger_print_ID> Do you wish to continue?.
Click Yes or No depending on the authenticity of the server fingerprint. The fingerprint is based on the host's
public key and helps you to identify or verify the host you are connecting to.

Name: Server IP/Hostname field
Description: The IP address or hostname of the server on which the support data file should be stored. Depending
on the setting in the Export Technical Support Data to drop-down list, the name of the field may vary.

Name: Path and Filename field
Description: The path and filename that the system should use when exporting the file to the remote server.
Note If the server includes any of the supported network adapter cards, the data file also includes technical
support data from the adapter card.

Name: Username
Description: The username the system should use to log in to the remote server. This field does not apply if the
protocol is TFTP or HTTP.

Name: Password
Description: The password for the remote server username. This field does not apply if the protocol is TFTP or
HTTP.

Step 5 Click Export.
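
The field rules in Step 4 can be summarized in a small validation routine, which may be useful when preparing export settings for many servers. The Python sketch below is hypothetical: ExportRequest and validate are illustrative names, not part of Cisco IMC. The five remote server types and the rule that Username and Password do not apply to TFTP or HTTP come from the field descriptions above. Treating credentials as mandatory for FTP, SFTP, and SCP is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Remote server types listed in the Export Technical Support Data dialog box.
PROTOCOLS = {"TFTP", "FTP", "SFTP", "SCP", "HTTP"}
# Protocols for which the Username and Password fields do not apply.
NO_CREDENTIALS = {"TFTP", "HTTP"}


@dataclass
class ExportRequest:
    """Hypothetical container for the dialog box fields."""
    protocol: str
    server: str                      # IP address or hostname of the remote server
    path: str                        # path and filename on the remote server
    username: Optional[str] = None
    password: Optional[str] = None


def validate(req: ExportRequest) -> List[str]:
    """Return a list of problems; an empty list means the request looks complete."""
    problems: List[str] = []
    proto = req.protocol.upper()
    if proto not in PROTOCOLS:
        problems.append(f"unsupported protocol: {req.protocol}")
        return problems
    if not req.server:
        problems.append("server IP/hostname is required")
    if not req.path:
        problems.append("path and filename are required")
    if proto in NO_CREDENTIALS:
        if req.username or req.password:
            problems.append(f"credentials do not apply to {proto}")
    elif not (req.username and req.password):
        # Assumption: the remaining protocols always need both fields.
        problems.append(f"username and password are required for {proto}")
    return problems
```

For example, validate(ExportRequest("TFTP", "192.168.1.1", "/ts/showtech")) returns an empty list, while the same request with "SCP" and no credentials returns one problem.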


Using the Cisco CIMC GUI to Display SEL Events


To display the System Event Log (SEL) events, follow these steps:

Procedure

Step 1 In the Navigation pane, click the Server tab.


Step 2 On the Server tab, click System Event Log.
Step 3 To review the information for each event in the log, navigate the log using these options:
• From the Entries Per Page drop-down list, choose the number of system events to display on each page.
• Click <Newer or Older> to move through the pages, or click <<Newest to move to the top of the list.
By default, the newest system events are displayed at the top of the list.

Using Cisco IMC GUI to Display Sensor Readings


On the Cisco IMC GUI, complete these steps to display the sensor readings:

Procedure

Step 1 In the Navigation pane, click the Server tab.


Step 2 On the Server tab, click Sensors.
Step 3 View various sensors by clicking the desired sensor.

Using Cisco IMC GUI to Display CIMC Log


On the Cisco IMC GUI, complete these steps to view the CIMC log:

Procedure

Step 1 In the Navigation pane, click the Admin tab.


Step 2 On the Admin tab, click CIMC Log.
Step 3 From the Entries Per Page drop-down list, choose the number of CIMC events to display on each page.


Using Command Line Interface (CLI) to Collect show-tech Details


At the CLI, enter the following commands:
~ # scope cimc
~ /cimc # scope
firmware
log
network
tech-support
~ /cimc # scope tech-support
~ /cimc/tech-support # set tftp-ip 192.168.1.1
~ /cimc/tech-support *# set path \techsupport\showtech
~ /cimc/tech-support *# commit
~ /cimc/tech-support *# start
These are descriptions of some of the key fields within the show-tech command:
• var/—Contains detailed logs, and status of all monitored services. It also contains services information
files such as the configuration of SOL and IPMI sensor alarms.
• var/log—Contains the rolling volatile log messages.
• obfl/—Contains the rolling non-volatile log messages.
• met/—Non-volatile configuration and SEL.
• mp/—The show tech-support text files, along with BIOS tech-support text files. The text files contain
all process, network, system, mezzanine, and BIOS state information.
• mctool—Gets basic information on the state of the CIMC.
• network—Gets current network configuration and socket information.
• obfl—Gets the live obfl.
• messages—Gets the live /var/log/messages file.
• alarms—Lists sensors in alarm states.
• sensors—Current sensor readings from IPMI.
• power—Current power state of the x86.
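
When the same collection is repeated on many servers, the CLI session at the start of this section can be assembled programmatically. The Python sketch below only builds the command strings shown in that session. tech_support_commands is a hypothetical helper name, and sending the commands to the server (for example, over an SSH session) is deliberately left out because it depends on your environment.

```python
import ipaddress
from typing import List


def tech_support_commands(tftp_ip: str, path: str) -> List[str]:
    """Build the Cisco IMC CLI command sequence shown in this section.

    Hypothetical helper: it validates the inputs and returns the commands
    in the order they are entered; it does not connect to any server.
    """
    # Raises ValueError if tftp_ip is not a valid IP address.
    ipaddress.ip_address(tftp_ip)
    if not path:
        raise ValueError("path must not be empty")
    return [
        "scope cimc",
        "scope tech-support",
        f"set tftp-ip {tftp_ip}",
        f"set path {path}",
        "commit",
        "start",
    ]


if __name__ == "__main__":
    # Same values as the example session above.
    for command in tech_support_commands("192.168.1.1", r"\techsupport\showtech"):
        print(command)
```

Validating the TFTP address before building the list catches typing mistakes early, before anything is committed on the server.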

INDEX

C
Correctable DIMM errors 14
    troubleshooting 14

D
DIMM errors 14, 15
    inoperable 15
    troubleshooting 15
DIMM issues 16
    recommended solutions 16

G
guidelines 1
    troubleshooting 1

I
inoperable DIMM error 15
    troubleshooting 15
