Cisco UCS C-Series Servers Troubleshooting Guide
Americas Headquarters
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
USA
http://www.cisco.com
Tel: 408 526-4000
800 553-NETS (6387)
Fax: 408 527-0883
© 2016 Cisco Systems, Inc. All rights reserved.
CONTENTS
Preface
Audience
Conventions
Related Cisco UCS Documentation
CHAPTER 1 Introduction
Guidelines for Troubleshooting
• Audience
• Conventions
• Related Cisco UCS Documentation
Audience
This guide is intended primarily for data center administrators with responsibilities and expertise in one or
more of the following:
• Server administration
• Storage administration
• Network administration
• Network security
Conventions
Text Type: GUI elements
Indication: GUI elements such as tab titles, area names, and field labels appear in this font. Main titles such as window, dialog box, and wizard titles appear in this font.

Text Type: TUI elements
Indication: In a Text-based User Interface, text the system displays appears in this font.

Text Type: System output
Indication: Terminal sessions and information that the system displays appear in this font.

Text Type: string
Indication: A nonquoted set of characters. Do not use quotation marks around the string or the string will include the quotation marks.

Text Type: !, #
Indication: An exclamation point (!) or a pound sign (#) at the beginning of a line of code indicates a comment line.
Note Means reader take note. Notes contain helpful suggestions or references to material not covered in the
document.
Tip Means the following information will help you solve a problem. The tips information might not be
troubleshooting or even an action, but could be useful information, similar to a Timesaver.
Timesaver Means the described action saves time. You can save time by performing the action described in the
paragraph.
Caution Means reader be careful. In this situation, you might perform an action that could result in equipment
damage or loss of data.
Guideline: Take screenshots of the fault or error message dialog box and other relevant areas.
Description: These screenshots provide visual cues about the state of the C-Series server when the problem occurred. If your computer does not have software to take screenshots, check the documentation for your operating system, as it may include this functionality.

Guideline: Record the steps that you took directly before the issue occurred.
Description: If you have access to screen or keystroke recording software, repeat the steps you took and record what occurs. If you do not have access to this type of software, repeat the steps you took and make detailed notes of the steps and what happens after each step.

Guideline: Enter the show tech-support command.
Description: The information about the current state of the server is very helpful to the Cisco Technical Assistance Center (TAC) and frequently provides the information needed to identify the source of the problem.
Issue:
• Basic server configuration steps
• Steps for CIMC or BMC configuration
• BIOS settings information
• BIOS upgrade steps
• CIMC or BMC firmware upgrade steps
Recommended solution:
• For information on the correct server hardware guides, see http://www.cisco.com/en/US/products/ps10493/prod_installation_guides_list.html
• For information on the correct server GUI and CLI configuration guides, see http://www.cisco.com/en/US/products/ps10739/products_installation_and_configuration_guides_list.html
Issue: Slow performance (slow mouse and keyboard) on C200 or C210 servers when running Windows 2008 R2.
Recommended solution: There is a known issue with the Intel 82576 driver included with Windows 2008 R2. Update to the latest Intel driver for this chipset at the following link: https://downloadcenter.intel.com/product/32261/Intel-82576-Gigabit-Ethernet-Controller

Issue: Installation of the Windows 2008 R2 OS failed with the error message:
The computer restarted unexpectedly or encountered an unexpected error. Windows installation cannot proceed.
Recommended solution: On the C200 server, the Windows 2008 R2 installation fails when the Intel Quad Port NIC is installed. Start the installation without the NIC and install the NIC after the installation is complete. Also, see this forum message: https://supportforums.cisco.com/message/3179297

Issue: Running Windows 2008 R2, Task Manager shows multiple spikes.
Recommended solution: Go to this URL and update the drivers to the latest version: http://www.cisco.com/en/US/docs/unified_computing/ucs/overview/guide/UCS_rack_roadmap.html

Issue: The ESXi update does not recognize the NICs.
Recommended solution: Update the LOM firmware using the Cisco Host Update Utility.
Download the 1.2.x version from this link: http://www.cisco.com/en/US/docs/unified_computing/ucs/c/sw/lomug/install/LOMUG.html
Download the 1.3.x version from this link: http://www.cisco.com/en/US/docs/unified_computing/ucs/c/sw/lomug/1.3.x/install/HUUUG.html

Issue: Unable to install older OS.
Recommended solution: Different C-Series servers support different OS versions. Use the following link to see the matrix of supported operating systems: http://www.cisco.com/en/US/products/ps10477/prod_technical_reference_list.html

Issue: Cannot upgrade BIOS on the system with no OS.
Recommended solution: Follow the BIOS upgrade instructions in the hardware installation and service guide for your server. Go to: http://www.cisco.com/en/US/products/ps10493/prod_installation_guides_list.html

Issue: With ESXi installed on the drives, unable to boot from the partition.
Recommended solution: Review the documentation at the following link: http://www.VMware.com

Issue: CIMC defaults to DHCP and will not retain the IP address.
Recommended solution: Review the documentation at the following link: http://www.cisco.com/en/US/products/ps10739/products_installation_and_configuration_guides_list.html
Issue: System becomes unresponsive during BIOS POST.
Recommended solution: If the system hangs at the LSI prompt during boot, waiting for user input, follow the instructions on the screen. Possible reasons include:
• The battery hardware is missing or disabled. Enter D to disable this warning during the next boot. This bypasses the warning, and the system will not hang for this reason.
• The message could be about importing a foreign configuration. A foreign configuration can be imported by pressing F. An alternative procedure is to enter the config utility (press Ctrl+C) and enter the WebBIOS, which is the LSI config utility. Preview the foreign configuration and decide if it should be imported.

Issue: Windows does not detect hard drives.
Recommended solution: LSI drivers may not be bundled with the Windows OS version being installed. These drivers must be installed during the installation process. If the hard drives are not detected during installation, use the load driver option to point the installer to the correct drivers for the LSI controller in the system. The drivers can be loaded from a USB drive. When the drivers are loaded, the hard drives are displayed and the hard drive for the OS can be selected.

Issue: Installing Windows 2008 64-bit and RAID controller had issues.
Recommended solution: LSI drivers are not bundled in Windows 2008 64-bit. These drivers must be installed during the installation process. If the hard drives are not detected during installation, use the load driver option to point the installer to the correct drivers for the LSI controller in the system. The drivers can be loaded from a USB drive. When the drivers are loaded, the hard drives are displayed and the hard drive for the OS can be selected.

Issue: Unable to install ESX on server with only the onboard controller.
Recommended solution: The LSI hardware RAID controller is required.
Issue:
• Unable to see the LSI RAID controller in the BOOT environment.
• Unable to access the onboard RAID controller.
Recommended solution:
• During the BIOS POST, the LSI option ROM should be displayed. The LSI RAID controller can be configured using Ctrl+H to create virtual drives. When configured, the BIOS should list the RAID controller in the boot device menu. To verify, enter the BIOS POST menu by pressing F2. Confirm that the LSI RAID controller is listed in the boot device menu.
• If, after completing the above process, the LSI RAID card is not detected, power off the system and reseat the LSI card. Make sure that the cables are connected to the backplane and then follow the above procedure to verify that the LSI card is seen in the BIOS Setup menu.
• If reseating the card does not solve the problem, replace the LSI controller (the card could be bad) and verify that the replacement card is seen during BIOS POST.

Issue: VMware does not show the local drive during installation.
Recommended solution: VMware supports a maximum partition size of 2 TB. Resize the partition so that it does not exceed the 2 TB limit.

Issue: The RAID controller card is not working.
Recommended solution: Verify that the installed card is supported for this server. If it is supported, follow the steps listed above under "Unable to see the LSI RAID controller in the BOOT environment."
Procedure
Step 1 Follow the normal installation process of RHEL 5.4 i386 from the ISO or DVD.
Step 2 At the prompt, enter the command:
boot: linux dd noprobe=ata1 noprobe=ata2 noprobe=ata3 noprobe=ata4
Step 3 Mount the megaraid driver and map it from the virtual media. The .img file is emulated as a floppy. The file is also on the driver CD available on CCO, at the path
Drivers\Linux\Storage\Intel\ICH10R\RHEL\RHEL5.4 from the root.
Step 4 At the “before installation starts” step, the system will ask whether you want to add any additional drivers.
Step 5 Provide the drivers (usually the mapped file will be /dev/sdb, because it is a floppy).
Step 6 Continue the installation.
Step 7 When the system looks for storage, it should list the RAID as “LSI MegaSR”.
A problem with the DIMM memory can cause a server to fail to boot or cause the server to run below its
capabilities. If DIMM issues are suspected, consider the following:
• DIMMs tested, qualified, and sold by Cisco are the only DIMMs supported on your system. Third-party
DIMMs are not supported, and if they are present, Cisco technical support will ask you to replace them
with Cisco DIMMs before continuing to troubleshoot a problem.
• Check if the malfunctioning DIMM is supported on that model of server. Refer to the server’s installation
guide and technical specifications to verify whether you are using the correct combination of server,
CPU and DIMMs.
• Check if the malfunctioning DIMM is seated correctly in the slot. Remove and reseat the DIMMs.
• All Cisco servers have either a required or recommended order for installing DIMMs. Refer to the
server’s installation guide and technical specifications to verify that you are adding the DIMMs
appropriately for a given server type.
• If the replacement DIMMs have a maximum speed lower than those previously installed, all DIMMs in
a server run at the slower speed or do not work at all. All of the DIMMs in a server should be of the same
type for optimal performance.
• The number and size of DIMMs should be the same for all CPUs in a server. Mismatching DIMM
configurations can degrade system performance.
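The type, speed, and per-CPU matching guidelines above lend themselves to a quick scripted check. The following Python sketch is illustrative only: the inventory layout (a mapping from CPU name to a list of size, type, and speed tuples) and the function name are assumptions made for this example, not output from or part of any Cisco tool.

```python
from collections import Counter


def check_dimm_population(inventory):
    """Return warnings for a hypothetical DIMM inventory.

    inventory maps a CPU name to a list of (size_mb, dimm_type, speed_mhz)
    tuples. This layout is an assumption made for illustration.
    """
    warnings = []

    # All DIMMs in a server should be of the same type.
    types = {dimm_type for dimms in inventory.values() for _, dimm_type, _ in dimms}
    if len(types) > 1:
        warnings.append("mixed DIMM types: " + ", ".join(sorted(types)))

    # With mixed speeds, the DIMMs may all run at the slower speed.
    speeds = {speed for dimms in inventory.values() for _, _, speed in dimms}
    if len(speeds) > 1:
        warnings.append(f"mixed DIMM speeds; expect at most {min(speeds)} MHz")

    # The number and size of DIMMs should be the same for every CPU.
    profiles = {
        tuple(sorted(Counter(size for size, _, _ in dimms).items()))
        for dimms in inventory.values()
    }
    if len(profiles) > 1:
        warnings.append("DIMM count or size differs between CPUs")

    return warnings


# Hypothetical two-CPU server with one mismatched DIMM on CPU2.
example = {
    "CPU1": [(16384, "DDR3", 1866), (16384, "DDR3", 1866)],
    "CPU2": [(16384, "DDR3", 1866), (8192, "DDR3", 1600)],
}
for warning in check_dimm_population(example):
    print(warning)
```

A check like this does not replace the population rules in the server's installation guide; it only catches the mismatches described in the list above.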
into the notch on the DIMM. In the fourth example, the left side is again fully inserted and seated, and the
right side is partially inserted and incompletely latched.
Procedure
The following example shows how to check memory information using the Cisco IMC CLI:
Name DIMM_A1:
Capacity: Failed
Channel Speed (MHz): NA
Channel Type: NA
Memory Type Detail: NA
Bank Locator: NA
Visibility: NA
Operability: NA
Manufacturer: NA
Part Number: NA
Serial Number: NA
Asset Tag: NA
Data Width: NA
Name DIMM_A2:
Capacity: Not Installed
Channel Speed (MHz): NA
Channel Type: NA
Memory Type Detail: NA
Bank Locator: NA
Visibility: NA
Operability: NA
Manufacturer: NA
Part Number: NA
Serial Number: NA
Asset Tag: NA
Data Width: NA
...
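When many DIMMs are listed, scanning this output by eye is error-prone. As a minimal sketch, assuming the output has been saved as text and keeps the "Name DIMM_xx:" and "Field: value" layout shown above, the following Python code collects the fields per DIMM and lists those whose capacity is reported as Failed. The function name and the sample text are illustrative, not part of the Cisco IMC CLI.

```python
def parse_dimm_detail(text):
    """Parse 'Name DIMM_xx:' blocks into a dict of per-DIMM field dicts."""
    dimms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Name ") and line.endswith(":"):
            # Start of a new DIMM block, for example "Name DIMM_A1:".
            current = line[len("Name "):-1]
            dimms[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only, so values keep any later colons.
            key, _, value = line.partition(":")
            dimms[current][key.strip()] = value.strip()
    return dimms


# Abbreviated sample in the same layout as the output above.
sample = """\
Name DIMM_A1:
    Capacity: Failed
    Channel Speed (MHz): NA
Name DIMM_A2:
    Capacity: Not Installed
    Channel Speed (MHz): NA
"""

dimms = parse_dimm_detail(sample)
failed = [name for name, fields in dimms.items() if fields.get("Capacity") == "Failed"]
print(failed)
```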
Procedure
Step 1 Server # scope chassis
       Enters chassis command mode.
Step 2 Server /chassis # show dimm
       Displays whether any DIMMs have correctable errors. DIMMs with correctable errors display their capacity as Failed. Clear the DIMM error flag by running the error-correcting code (ECC) command.
Step 3 Server /chassis # scope reset-ecc
       Enters error-correcting code (ECC) configuration mode.
The following example shows how to view and reset the DIMM error flag:
Server # scope chassis
Server /chassis # show dimm
Name Capacity Channel Speed (MHz) Channel Type
-------------------- --------------- ------------------- ---------------
DIMM_A1 Failed NA NA
DIMM_A2 Ignored/Disa... NA NA
DIMM_B1 16384 MB 1866 DDR3
DIMM_B2 16384 MB 1866 DDR3
DIMM_C1 16384 MB 1866 DDR3
DIMM_C2 16384 MB 1866 DDR3
DIMM_D1 16384 MB 1866 DDR3
DIMM_D2 16384 MB 1866 DDR3
DIMM_E1 16384 MB 1866 DDR3
DIMM_E2 16384 MB 1866 DDR3
DIMM_F1 16384 MB 1866 DDR3
DIMM_F2 16384 MB 1866 DDR3
DIMM_G1 16384 MB 1866 DDR3
DIMM_G2 16384 MB 1866 DDR3
DIMM_H1 16384 MB 1866 DDR3
DIMM_H2 16384 MB 1866 DDR3
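The summary table above can also be checked with a short script. The following Python sketch assumes the table has been captured as text, that slot names start with "DIMM_", and that a healthy DIMM reports its capacity as a number followed by MB; all three are assumptions drawn from the sample output, and the function name is illustrative. It lists the slots whose capacity is not a size and totals the installed memory.

```python
def parse_show_dimm(text):
    """Return (name, capacity, speed, channel_type) tuples from the table."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, the dashed rule, and anything that is not a DIMM row.
        if len(parts) < 4 or not parts[0].startswith("DIMM_"):
            continue
        # The capacity may contain a space ("16384 MB"), so take the name from
        # the left and the speed and type from the right.
        name, speed, channel_type = parts[0], parts[-2], parts[-1]
        capacity = " ".join(parts[1:-2])
        rows.append((name, capacity, speed, channel_type))
    return rows


# Abbreviated sample in the same layout as the output above.
sample = """\
Name Capacity Channel Speed (MHz) Channel Type
-------------------- --------------- ------------------- ---------------
DIMM_A1 Failed NA NA
DIMM_A2 Ignored/Disa... NA NA
DIMM_B1 16384 MB 1866 DDR3
DIMM_B2 16384 MB 1866 DDR3
"""

rows = parse_show_dimm(sample)
suspect = [name for name, capacity, _, _ in rows if not capacity.endswith("MB")]
installed_mb = sum(
    int(capacity.split()[0]) for _, capacity, _, _ in rows if capacity.endswith("MB")
)
print(suspect)
print(installed_mb)
```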
To view and identify a bad DIMM using the Cisco IMC GUI, see Troubleshooting DIMM errors using Cisco
IMC GUI.
Procedure
Issue: DIMM does not fit in slot.
Recommended solution:
• Verify that the DIMM is supported on that server model.
• Verify that the DIMM is oriented correctly in the slot. DIMMs and their slots are keyed and only seat in one of the two possible orientations.

Issue: The DIMM is reported as bad in the SEL, POST, or LEDs, or the DIMM is reported as inoperable in Cisco IMC.
Recommended solution:
• Verify that the DIMM is supported on that server model.
• Verify that the DIMM is populated in its slot according to the population rules for that server model.
• Verify that the DIMM is seated fully and correctly in its slot. Reseat it to ensure good contact and rerun POST.
• Verify that the DIMM is the problem by trying it in a slot that is known to be functioning correctly.
• Verify that the slot for the DIMM is not damaged by trying a DIMM that is known to be functioning correctly in the slot.
• Reset the BMC.

Issue: The DIMM is reported as overheating.
Recommended solution:
• Verify that the DIMM is seated fully and correctly in its slot. Reseat it to ensure good contact and rerun POST.
• Verify that all empty HDD bays, server slots, and power supply bays use blanking covers to ensure that the air is flowing as designed.
• Verify that the server air baffles are installed to ensure that the air is flowing as designed.
• Verify that any needed CPU air blockers are installed to ensure that the air is flowing as designed.

Issue: Host is unreachable via IP, the CIMC works but KVM shows a blank screen.
Recommended solution: Upgrade the CIMC firmware and BIOS.
Procedure
Step 1 Power off the server and disconnect the power cord.
Step 2 Confirm that all cards are properly seated.
Step 3 Connect the power cord and power on the server.
Issue: While upgrading firmware, the Host Upgrade Utility screen might freeze or black out.
Recommended solution: Press Return to exit the utility.

Issue: Installing Microsoft Windows 2008 fails and the following error message is displayed:
Selected disk has MBR partition table. On EFI systems, Windows can only be installed to GPT disks.
Recommended solution: This problem occurs when the EFI CDROM device for virtual drives is used to boot the Windows 2008 image. Use the CDROM device from BIOS ‘CDROM order’.

Issue: After installing a Microsoft Windows operating system using UCS SCU, the Windows Device Manager shows some devices with a yellow bang.
Recommended solution: This problem occurs for one of the following reasons:
• The devices are not supported by Cisco UCS SCU.
• You have not selected the device drivers in the SCU GUI.

Issue: The Windows setup fails and the following error message is displayed:
Inaccessible boot device
Recommended solution: This error is displayed when you have not selected a device driver for a boot controller in the user interface.

Issue: Installation of RHEL 6 is interrupted and the following error message is displayed:
Recommended solution: This error is displayed when the DHCP option is selected during the installation, and DHCP does not provide an IP address.
When contacting Cisco TAC for issues, you should always capture the tech-support output from the Cisco
CIMC CLI or the Technical Support Data from the Cisco CIMC GUI.
Procedure
Note If you chose SCP or SFTP as the remote server type while performing this action, a pop-up window is displayed with the message Server (RSA) key fingerprint is <server_finger_print_ID> Do you wish to continue?. Click Yes or No depending on the authenticity of the server fingerprint.
The fingerprint is based on the host's public key and helps you to identify or verify the host you are connecting to.

Server IP/Hostname field: The IP address or hostname of the server on which the support data file should be stored. Depending on the setting in the Export Technical Support Data to drop-down list, the name of the field may vary.

Path and Filename field: The path and filename the system should use when exporting the file to the remote server.
Note If the server includes any of the supported network adapter cards, the data file also includes technical support data from the adapter card.

Username: The username the system should use to log in to the remote server. This field does not apply if the protocol is TFTP or HTTP.

Password: The password for the remote server username. This field does not apply if the protocol is TFTP or HTTP.
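The field rules above can be summarized in a short validation sketch. The Python code below is illustrative only and is not part of Cisco IMC: the class and method names are hypothetical, it covers only the protocols named in this section (SCP, SFTP, TFTP, and HTTP), and it assumes that both Username and Password are required whenever they apply.

```python
from dataclasses import dataclass

# Protocols named in this section. Username and Password do not apply to
# TFTP or HTTP; SCP and SFTP trigger the server fingerprint prompt.
NO_CREDENTIAL_PROTOCOLS = {"TFTP", "HTTP"}
FINGERPRINT_PROTOCOLS = {"SCP", "SFTP"}


@dataclass
class ExportTarget:
    """Hypothetical container for the export fields described above."""
    protocol: str
    server: str
    path: str
    username: str = ""
    password: str = ""

    def problems(self):
        """Return a list of human-readable problems with the field values."""
        issues = []
        protocol = self.protocol.upper()
        if not self.server:
            issues.append("Server IP/Hostname is empty")
        if not self.path:
            issues.append("Path and Filename is empty")
        if protocol in NO_CREDENTIAL_PROTOCOLS:
            if self.username or self.password:
                issues.append(f"Username and Password do not apply to {protocol}")
        elif not (self.username and self.password):
            # Assumption: credentials are mandatory when the fields apply.
            issues.append(f"{protocol} needs both Username and Password")
        return issues

    def expects_fingerprint_prompt(self):
        """True when the fingerprint confirmation described above is expected."""
        return self.protocol.upper() in FINGERPRINT_PROTOCOLS


# Placeholder address, path, and credentials, for illustration only.
target = ExportTarget("sftp", "192.0.2.10", "/tmp/techsupport.tar.gz", "admin", "secret")
print(target.problems())
print(target.expects_fingerprint_prompt())
```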