X8 8 Diagnostics

Uploaded by

Faisal K
X86 West Diagnostics

Bhumika Malik
Salomon Chavez
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 1
Agenda
U-Boot Diagnostics
Hwdiag
UEFI Diagnostics

Diagnostics Test Coverage
Server Component     U-Boot   Hwdiag            UEFIdiag
Service Processor    Yes      Yes               No
CPU and memory       No       Yes               Yes
IO                   No       Yes               Yes
Fans                 No       Yes               No
Power Supply         No       Yes               No
Storage devices      No       Yes (NVMe only)   Yes
Network interfaces   No       Yes               Yes
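
The coverage matrix above can also be held as a small lookup, which is handy when deciding which tool to reach for first. The following is a minimal Python sketch; `COVERAGE` and `tools_covering` are illustrative names and are not part of any Oracle tool.

```python
# Coverage matrix transcribed from the table above.
# COVERAGE and tools_covering() are hypothetical helper names.
COVERAGE = {
    "Service Processor":  {"U-Boot": True,  "Hwdiag": True, "UEFIdiag": False},
    "CPU and memory":     {"U-Boot": False, "Hwdiag": True, "UEFIdiag": True},
    "IO":                 {"U-Boot": False, "Hwdiag": True, "UEFIdiag": True},
    "Fans":               {"U-Boot": False, "Hwdiag": True, "UEFIdiag": False},
    "Power Supply":       {"U-Boot": False, "Hwdiag": True, "UEFIdiag": False},
    # Hwdiag covers NVMe storage devices only.
    "Storage devices":    {"U-Boot": False, "Hwdiag": True, "UEFIdiag": True},
    "Network interfaces": {"U-Boot": False, "Hwdiag": True, "UEFIdiag": True},
}


def tools_covering(component):
    """Return the diagnostics tools that cover the given server component."""
    return [tool for tool, covered in COVERAGE[component].items() if covered]


print(tools_covering("Fans"))
print(tools_covering("Service Processor"))
```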

U-Boot Diagnostics
Service Processor Diagnostics – same coverage on X7-8 and X8-8

Hwdiag
SP-based Diagnostics Tool

Agenda:
AEP Support
Sanity Test

Hwdiag – AEP Support 1/2
– AEP (Intel Optane DC persistent memory) support added to:
   – hwdiag mem info
   – hwdiag mem spd
[(flash)root@ORACLESP-P01_SYSTEM_OS_00A:~]# hwdiag mia
HWdiag - Build Number 128711 (Jan 10 2019, 18:34:44)
Current Date/Time: Feb 25 2019, 17:41:40
This is a X8-8.
CPU 0 Memory Devices
Location            Mfg      Size(GB)  Rank    Width  Speed(MT/s)  Chan  Dimm  Enabled-Ranks
/SYS/CMOD0/P0/D0    Samsung     16.00  Dual    x4     2666         5     0     2/2
/SYS/CMOD0/P0/D2    Samsung     16.00  Dual    x4     2666         4     0     2/2
/SYS/CMOD0/P0/D4    Samsung     16.00  Dual    x4     2666         3     0     2/2
/SYS/CMOD0/P0/D5    Intel      128.00  Single  x8     2666         3     1     1/1
/SYS/CMOD0/P0/D6    Intel      128.00  Single  x8     2666         0     1     1/1
/SYS/CMOD0/P0/D7    Samsung     16.00  Dual    x4     2666         0     0     2/2
/SYS/CMOD0/P0/D9    Samsung     16.00  Dual    x4     2666         1     0     2/2
/SYS/CMOD0/P0/D11   Samsung     16.00  Dual    x4     2666         2     0     2/2

Total memory populated on CPU 0: 352.00 GB
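
The per-CPU total in the listing can be cross-checked against the individual DIMM rows. The sketch below is a hypothetical Python helper, not part of hwdiag: it parses the rows copied from the output above and sums the Size(GB) column. The regular expression assumes each row starts with a /SYS path followed by the manufacturer and the size.

```python
import re

# Rows copied from the CPU 0 listing above.
SAMPLE = """\
/SYS/CMOD0/P0/D0 Samsung 16.00 Dual x4 2666 5 0 2/2
/SYS/CMOD0/P0/D2 Samsung 16.00 Dual x4 2666 4 0 2/2
/SYS/CMOD0/P0/D4 Samsung 16.00 Dual x4 2666 3 0 2/2
/SYS/CMOD0/P0/D5 Intel 128.00 Single x8 2666 3 1 1/1
/SYS/CMOD0/P0/D6 Intel 128.00 Single x8 2666 0 1 1/1
/SYS/CMOD0/P0/D7 Samsung 16.00 Dual x4 2666 0 0 2/2
/SYS/CMOD0/P0/D9 Samsung 16.00 Dual x4 2666 1 0 2/2
/SYS/CMOD0/P0/D11 Samsung 16.00 Dual x4 2666 2 0 2/2
"""

# Assumed row layout: location, manufacturer, size in GB, then the rest.
ROW = re.compile(r"^(?P<location>/SYS/\S+)\s+(?P<mfg>\S+)\s+(?P<size_gb>\d+\.\d+)\s")


def parse_dimms(text):
    """Return (location, manufacturer, size_gb) for every DIMM row in text."""
    dimms = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            dimms.append((match["location"], match["mfg"], float(match["size_gb"])))
    return dimms


def total_gb(dimms):
    """Sum the Size(GB) column over all parsed DIMMs."""
    return sum(size for _, _, size in dimms)


dimms = parse_dimms(SAMPLE)
print(f"{len(dimms)} DIMMs, {total_gb(dimms):.2f} GB")  # prints "8 DIMMs, 352.00 GB"
```

Six 16 GB DRAM DIMMs plus two 128 GB Intel modules give 96 + 256 = 352 GB, which matches the reported total.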

Hwdiag – AEP Support 2/2
[(flash)root@ORACLESP-P01_SYSTEM_OS_00A:~]# hwdiag mem spd /SYS/CMOD5/P5/D5
HWdiag - Build Number 128711 (Jan 10 2019, 18:34:44)
Current Date/Time: Feb 25 2019, 17:57:09
This is a X8-8.

/SYS/CMOD5/P5/D5:
Size: 128.00 GB
DIMM Type-Speed: NVDIMM-2666
DIMM Module Voltage Capabilities
Operable: 1.2V 1.05V
Endurant: 1.2V 1.05V
Part Number: NMA1XBD128GQS
Serial Number: 00000417
Manufacturer: Intel (89)
Manufactured date: Week 38 of '18
SPD Base Config CRC check: Passed
SPD Module Specific CRC check: Passed

Hwdiag – Sanity Test 1/2
– Run overall system check by running a subset of hwdiag commands
– Provides exit status for each command
– Provides overall pass/fail result
– Syntax: hwdiag sanity any|off|booted|show
[(flash)root@ORACLESP-1715XC4010A:~]# hwdiag sanity booted
[…]
cpld iof realtime dump all : PASSED
cpld log read 20 : PASSED
cpld mbus : PASSED
cpld reg all : PASSED
cpld vr_check all : PASSED
cpu capid all : PASSED
cpu info all : PASSED
cpu pirom_info all : PASSED
fan get all : PASSED

Hwdiag – Sanity Test 2/2
fan info : PASSED
fan test : PASSED
gpio get all : PASSED
i2c scan all : PASSED
i2c test all : PASSED
io error all : PASSED
io nvme_info : PASSED
io nvme_test : PASSED
led get all : PASSED
led info all : PASSED
mem info all : PASSED
mem spd all : PASSED
pci info all : PASSED
pci scan : PASSED
pci status all : PASSED
power get amps all : PASSED
power get volts all : PASSED
power get watts all : PASSED
power vrdfw_test all : PASSED
system fabric test all : PASSED
system info : PASSED
system port80 : PASSED
system rtc : PASSED
system summary : PASSED
system thermal : PASSED
temp get all : PASSED
temp info all : PASSED

Sanity Test Result : PASSED
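
When the sanity output is captured to a file, the per-command results can be summarized with a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes that a failing command reports some status other than PASSED, which the slides do not show.

```python
# A few lines copied from the sanity output above.
SAMPLE = """\
cpld mbus : PASSED
fan test : PASSED
io nvme_test : PASSED

Sanity Test Result : PASSED
"""


def parse_sanity(text):
    """Split sanity output into per-command results and the overall verdict."""
    results, overall = {}, None
    for line in text.splitlines():
        name, sep, status = line.rpartition(" : ")
        if not sep:
            continue  # banner lines, blank lines and elisions carry no result
        name, status = name.strip(), status.strip()
        if name == "Sanity Test Result":
            overall = status
        else:
            results[name] = status
    return results, overall


def failed_commands(results):
    """List the commands whose status is anything other than PASSED (assumption)."""
    return [name for name, status in results.items() if status != "PASSED"]


results, overall = parse_sanity(SAMPLE)
print(overall, failed_commands(results))
```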

UEFIdiag
Host-based Diagnostics Tool

Agenda
Overview
Differences vs x7 Platforms
New features

UEFIdiag – Overview
• A controlled test environment that runs the Udiag tool on top of the EFI/UEFI shell
• A configuration process that uses several ILOM features to track system status, test progress and results
• An infrastructure that can change the system power state, modify BIOS settings and execute scripts on the Service Processor and on the Host
• Udiag is a lightweight tool that runs on the Host and uses UEFI services to access protocols and variables, collect system information and exercise the devices attached to the sockets
– Targeted at the Host CPUs, memory, storage and I/O subsystems

UEFIdiag – Differences from X7-8
• CLI mode only
• NVDIMM support
– Configuration information only
– Access to ACPI tables
• Better memory test coverage, including the “non-interleave” case for NVDIMMs, which creates multiple memory blocks
• Secure boot – signed version

NVDIMM info
[Screenshots: System Info and System Inventory]

NVDIMM PCAT and NFIT tables

Back-up Slides
Extended version

Agenda
Diagnostics Strategy
U-Boot Diagnostics
Hwdiag
UEFI Diagnostics

Diagnostics Strategy
• Progressive Series Approach
– Use firmware (U-Boot) diagnostics to validate the SP and attached hardware.
– Given a stable SP, expand system hardware testing scope/coverage using standalone (SP-based) diagnostics.
– Use host-based diagnostics for full system-level exercisers.

What does each tool target?
U-Boot Diag targets the Service Processor
– Failures here may lead to bad ILOM behavior
Hwdiag targets the chassis and host subsystems
– Fans, power supplies, LEDs, etc...
– QPI, PCIe and memory interfaces
UEFIdiag targets the overall host functionality
– Coverage similar to SunVTS

U-Boot Diagnostics
Service Processor Diagnostics

U-Boot Diagnostics
– Integrated in the Service Processor U-Boot firmware
– Designed to test the hardware required to enable the Service Processor to successfully boot into Linux.
– Tests SP and associated memory, network devices and I/O
– Tests connectivity to I2C devices required for SP functionality
– Fault status is stored in an environment variable and a fault is created in ILOM
– Service Processor will boot regardless of failure status
– Automatically executed when ILOM boots up (in normal/quick mode depending on whether the SP is AC power-cycled or just rebooted)

U-Boot Diagnostics Sample Output – Extended Mode
Enter Diagnostics Mode ['q'uick/'n'ormal(default)/e'x'tended] ..... 0
Diagnostics Mode - Extended(Manufacturing Mode)
<DIAGS> Memory Data Bus Test ... PASSED
<DIAGS> Memory Address Bus Test ... PASSED
<DIAGS> Testing 0x81000000 to 0x86000000 ... PASSED
<DIAGS> Testing 0x8B000000 to 0x8EF00000 ... PASSED
<DIAGS> Flash Test ... PASSED
<DIAGS> Testing Watchdog ... PASSED
<DIAGS> Testing RTC . PASSED

I2C Probe Test

SMOD0

Bus Device Description                        Addr Result
=== ========================================= ==== ======
  2 FRUID - AT24C64 (TBD )                    0xA0 PASSED

<DIAGS> I2C Probe Test ... PASSED

U-Boot Diagnostics Coverage
U-Boot Component Test             Quick  Normal  Extended  Description
Memory Data Bus Test              X      X       X         Checks for opens and shorts on the SP memory data bus.
Memory Address Bus Test           X      X       X         Checks for opens and shorts on the SP memory address bus.
Memory Data Integrity Test                       X         Checks for data integrity on the SP memory.
Flash Test                                       X         Checks access to flash memory.
WatchDog Test                            X       X         Checks the watchdog functionality on the SP.
I2C Probe Tests                          X       X         Checks the connectivity to I2C devices on standby power.
Ethernet Test                     X      X       X         Verifies ability to read from the specified Ethernet port.
Ethernet Link Test                X      X       X         Verifies link on the specified PHY.
Ethernet Internal Loopback Test          X       X         Verifies Ethernet functionality by sending and receiving packets.
NAND Controller and Chip Test     X      X       X         Tests the NAND flash chip.

U-Boot Diagnostics
For more information:
https://docs.oracle.com/cd/E23161_01/html/E23099/gkxse.html

Hwdiag
SP-based Diagnostics Tool

Agenda:
Overview
Main Commands
User Interface
Test Coverage
Debugging Tips

Hwdiag – Overview
• Part of ILOM package
– Targeted to the devices that are accessible from the Service Processor
• All I2C devices (PSUs, fans, LEDs, VRDs, FPGAs, …)
• Host chipset devices using sideband access (CPU, PCH, GPU, network cards)
• Command line based tool:
– hwdiag [options] main_command sub_command1 sub_command2 …

• Intuitive UI – easy to use
– If a command is entered without all of its parameters, the list of valid next parameters is displayed

Hwdiag – Diag shell
• New feature on X8 platforms that gives customers access to hwdiag and the diag logs
– Hwdiag is no longer available via the restricted shell
• Provides a subset of non-intrusive (root access) hwdiag commands that do not involve set/write access
• Accessible using SP shell:
-> start /SP/diag/shell/
Are you sure you want to start /SP/diag/shell (y/n)? y

Hwdiag – Diag shell
diag> help

Built-in commands:
echo - Display information to user.
Typical use: echo $?
help - Produces this help.
Use 'help <command>' for more information about an external command.
exit - Exit this shell.

External commands:
hwdiag - Run hardware diagnostics
ls - List diagnostics log directories and files
cat - Print content of diagnostics log files

For more detailed information on hwdiag commands, run "hwdiag -h".

Hwdiag – Diag shell
Diag shell also provides access to hwdiag logs:
diag> ls -l hwdiag/
-rw-r--r-- 1 root root 29706 Aug 9 16:17 hwdiag_i2c_test.log
-rw-r--r-- 1 root root 53386 Aug 1 18:08 hwdiag_i2c_test.log.1

diag> cat hwdiag/hwdiag_i2c_test.log


Hwdiag i2c test all -e
HWdiag - Build Number 120002 (Jul 25 2017, 09:37:04)
Current Date/Time: Aug 01 2017, 18:13:20
This is SM1(2U).
I2C DEVICE               CHIP       BUS/MUX/CH/ADDR  RESULT
----------------------------------------------------------------------
/SYS/SM1/CPLD            XC6SLX16   1/FF/FF/4E       OK
/SYS/SM1/DB_CPLD         XC6SLX16   1/FF/FF/56       OK
/SYS/SM1/T_IN            NCT214     1/FF/FF/30       OK
/SYS/SM1/12V_STBY_ECB    LTC4215    1/FF/FF/94       OK
/SYS/SM1/12V_MAIN_ECB    LTC4282    1/FF/FF/96       OK
/SYS/SM1/FRONT_9552      PCA9552    1/FF/FF/C0       OK
/SYS/MIO                 AT24C64    2/FF/FF/AC       OK
/SYS/CCM                 AT24C64    2/FF/FF/A8       OK
/SYS/CCM/CPLD            XC6SLX16   2/FF/FF/4E       OK
/SYS/CCM/FAN_CTRL        ADT7470    2/FF/FF/58       OK
/SYS/SM1                 AT24C64    3/FF/FF/A0       OK
/SYS/SM1/PCA9554_0       PCA9554    3/FF/FF/70       OK
/SYS/SM1/PCA9554_1       PCA9554    3/FF/FF/7E       OK
/SYS/SM1/PCA9555         PCA9555    3/FF/FF/40       OK
...
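
A log in this layout is straightforward to post-process. The following Python sketch uses hypothetical names and is not part of hwdiag. It splits each row into device, chip, the four BUS/MUX/CH/ADDR fields and the result, then lists any device that is not OK. It assumes that all four path fields are hexadecimal and that a failing device reports a result other than OK; neither is stated in the slides.

```python
from typing import NamedTuple

# Header and a few rows copied from the log above.
SAMPLE = """\
I2C DEVICE CHIP BUS/MUX/CH/ADDR RESULT
/SYS/SM1/CPLD XC6SLX16 1/FF/FF/4E OK
/SYS/SM1/T_IN NCT214 1/FF/FF/30 OK
/SYS/CCM/FAN_CTRL ADT7470 2/FF/FF/58 OK
"""


class I2CResult(NamedTuple):
    device: str
    chip: str
    bus: int
    mux: int
    channel: int
    addr: int
    result: str


def parse_i2c_log(text):
    """Parse device rows; header, separator and banner lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].startswith("/SYS"):
            continue
        device, chip, path, result = fields
        parts = path.split("/")
        if len(parts) != 4:
            continue
        # ASSUMPTION: bus, mux, channel and address are all hexadecimal.
        bus, mux, channel, addr = (int(part, 16) for part in parts)
        rows.append(I2CResult(device, chip, bus, mux, channel, addr, result))
    return rows


def not_ok(rows):
    """Return the devices whose result is anything other than OK (assumption)."""
    return [row.device for row in rows if row.result != "OK"]


rows = parse_i2c_log(SAMPLE)
print(f"{len(rows)} devices parsed, {len(not_ok(rows))} not OK")
```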

Hwdiag – List of Main Commands
Command  Description
cpld     Power CPLD Tests/Utilities
cpu      Display CPU information
dda      Direct Device Access
fan      Fan Tests/Utilities
gpio     Misc. Pilot3 GPIO functions
i2c      Test the sideband I2C topology
led      Control the various LEDs
mem      Display Memory Configuration
pci      PCI Tests/Utilities
perf     Performance Tests/Utilities
power    Display Power
system   System Level Tests/Utilities

• Use hwdiag -h to get the list of main commands
• Use hwdiag -h <main command> to get the list of subcommands/parameters, including options

Hwdiag User Interface – Example main command led 1/2
[(flash)root@ORACLESP-DIAG_SYSTEM_06A:~]# hwdiag led
HWdiag - Build Number 112373 (Sep 19 2016, 21:02:16) < - - \ Displayed with all commands (hide with -xh)
Current Date/Time: Sep 22 2016, 23:19:01 < - - /

Syntax: hwdiag led ...


cleanup - Return control for DPCC LEDs back to host
get all|<led>
- Display LED Status
info all|<device>
- Dump LED Controller Registers
set all|<led> <state>
- Set LED Status
test all|<led>
- Test LED functionality by devices

Hwdiag User Interface – Example main command led 2/2
[(flash)root@ORACLESP-DIAG_SYSTEM_06A:~]# hwdiag led set
[…]
Valid Options for LED Names :
ALL /SYS/CMOD1/P1/D3/SERVICE
/SYS/LOCATE /SYS/CMOD1/P1/D4/SERVICE
/SYS/SERVICE /SYS/CMOD1/P1/D5/SERVICE
/SYS/OK /SYS/CMOD1/P1/D6/SERVICE
[…]

[(flash)root@ORACLESP-DIAG_SYSTEM_06A:~]# hwdiag led set /SYS/SERVICE


[…]
Valid Options for LED State :
on blink-PMW0
off blink-PMW1

[(flash)root@ORACLESP-DIAG_SYSTEM_06A:~]# hwdiag led set /SYS/SERVICE on


[…]
/SYS/SERVICE set to on

[(flash)root@ORACLESP-DIAG_SYSTEM_06A:~]# hwdiag led get /SYS/SERVICE


[…]
LED VALUE
------------------------------------------
/SYS/SERVICE : on

Hwdiag – Testing I2C Topology, LEDs
• Ping all I2C devices
– hwdiag i2c test <bus>|all
– Pass/Fail test
• Scan possible I2C addresses behind all buses/muxes
– hwdiag i2c scan <bus>|all
– Get a list of all I2C addresses on the system
• Get LED status; set LEDs (sunservice only)
– hwdiag led get <led_nac>|all
– hwdiag led set <led_nac>|all on|off

Hwdiag – Testing Fans, Temperature and Power
• Get fan status; set fan speed (sunservice only)
– hwdiag fan get
– hwdiag fan set <fan_nac>|all <pwm>

• Get temperatures
– hwdiag temp get <temp_nac>|all (CPU/DIMM/NVMe temperatures via PECI)

• Get voltage, current and power information from PSUs/VRDs/ECBs
– hwdiag power get amps|volts|watts all|<sensor>
– hwdiag power info all|<sensor> – read registers and decode, e.g. STATUS_WORD

• Check whether PSUs are load sharing per specification
– hwdiag power psutest (requires a load to be running, e.g. UEFIdiag)

Hwdiag – Decoding CPLD Registers, Direct Device Access Commands
• Get CPLD related information (diagnose SP/host related problems)
– Read and decode all registers from CPLDs in platform
• hwdiag cpld reg <cpld>
– Check VRD status registers and decode bits to get VRD status
• hwdiag cpld vr_check
– Get CPLD event log (Get important events, e.g. host power on/off, PROCHOT, CATERR)
• hwdiag cpld log read

• Read, write (sunservice only) and dump registers from I2C devices
– hwdiag dda read <device> <register> <# bytes>|<device spec. params>
– hwdiag dda write <device> <register> <data>
– hwdiag dda dump <device>

Hwdiag – Accessing Host Side Information - CPUs
• Access host-side information from CPU registers via PECI
• CPU information – CPUID, #cores (SKU), #threads (SKU), #cores enabled, #threads enabled, QPI link status
hwdiag cpu info all|<cpu>
• Get information from the CPU PIROM
hwdiag cpu pirom_info all|<cpu>

Hwdiag – Checking Host Memory
• Display memory configuration (location, mfg, size (GB), rank, width, speed, channel, DIMM on channel, enabled ranks)
hwdiag mem info <cpu>|<dimm>|all
– Information collected via I2C and PECI
• Read DIMM SPD information
– Read/write I2C (host power off) or SP memory (host power on)
hwdiag mem spd all|<dimm>
hwdiag mem spd_chk all|<dimm>

Hwdiag – Checking PCIe devices
• Get information by reading PCIe configuration space via PECI
• Display PCIe information by slot name (status, root port capability and
negotiation, VID, DID, SVID, SDID, target node capability and negotiation,
description)
hwdiag pci info <cpu>|all

• Scan PCIe devices – probe all possible bus/device/functions and list devices
hwdiag pci scan

• Read, write (sunservice only) PCIe registers or dump entire std/ext space
hwdiag pci read|write|dump …

Hwdiag – NVMe
• Display NVMe information (part of pci info)
– hwdiag pci info all|<cpu>

• Perform NVMe check
– hwdiag io nvme_test
– Checks NVMe drive FRU content and drive serial number (DSN) content
– Checks NVMe drive PCIe link width/speed and compares with capability
– Pass/Fail test

Hwdiag – System commands
• Get system information
– Display ILOM version, SP revision, MB revision, CPLD versions, BIOS version, port80,
CPU information, memory information, fan/PSU/Disk/USB presence, PCI information
in one command
– hwdiag system info
– Shorter form (one pager): hwdiag system summary
• Check that QPI links, memory and PCIe are set up properly
– hwdiag system fabric test all|<cpu> (Pass/Fail test)

• Get thermal-related information (temperatures, PSU/CPU/memory/fan power usage, fan speeds)
– hwdiag system thermal

Hwdiag – List of Important Commands for Debugging
Command                         Description
hwdiag cpld vr_check            Checks voltage regulator status reported in the FPGA; use for host power-on problems
hwdiag cpld log read            Reads from the FPGA event log
hwdiag system fabric test all   Checks QPI links, memory, PCIe (check for any degradation)
hwdiag power info all           Checks status words of voltage regulator devices for faults
hwdiag i2c test all             Pings all I2C devices
hwdiag system port80            Gets the last known port80 code (use when the BIOS hangs)

A subset of Hwdiag commands is executed when collecting an ILOM snapshot.


For more information about Hwdiag
Oracle® x86 Server Diagnostics, Applications, and Utilities Guide:
https://docs.oracle.com/cd/E23161_01/html/E23099/gmcfn.html

UEFIdiag
Host-based Diagnostics Tool

Agenda
Overview
What is UEFIdiag?
What is Udiag
UEFIdiag Components
UEFIdiag Modes and Configuration Commands
Automated Test vs Manual mode
Q & A

UEFI Overview

• UEFI (Unified Extensible Firmware Interface) is a specification that defines a software interface between an OS (or UEFI shell) and platform firmware
• Provides platform information, boot and runtime services that are used by UEFIdiag
• Makes extensive use of “protocols”: software interfaces used for communication between binary modules
• Provides a UEFI shell environment used to execute UEFI applications and drivers
• The UEFI Shell allows launching UEFI applications, including UEFI bootloaders

What is UEFI diagnostics?
• A controlled test environment that runs the Udiag tool on top of the EFI/UEFI shell

• It follows a configuration process that uses several ILOM features to stay in constant communication with the host and track system status, test progress and results.

• UEFIdiag's infrastructure can change the system power state, modify BIOS settings and execute (and stop) scripts on the Service Processor and on the Host.

• Udiag (Mp.efi) is the tool that runs on the Host; it uses UEFI services to access protocols and variables, collect system information and exercise the devices attached to the sockets.

Udiag Tool

• Lightweight tool that runs on top of the EFI/UEFI shell
• Targeted at the Host CPUs, memory, storage and I/O subsystems
• UEFI application that runs in CLI mode
• Because the UEFI shell is single-threaded, a mini-kernel has been developed to provide multitasking/multiprocessing services
– Two or more diagnostics can be run in parallel using one or several cores

UEFIdiag Components
X86 Platforms

Execution Environment (HOST)
– BSP, AP1 … APn
– BIOS
– UEFI shell → Udiag
– Scripts → logs

Configuration/Report Environment (SP: Pilot 4)
– ILOM: x8-rom.pkg
– udiag.img.gz
– Power On/Off
– Log events
– sp_trace
– Snapshots

Host and SP are linked through: Virtual USB drive, IPMI

Notes:
- System boots directly to UEFI shell when BIOS detects the virtual USB drive (Udiags)
- Udiag and test scripts are embedded in ILOM pkg (uefidiag1.img.gz)

UEFIdiag Modes
Enabled: Touches every accessible device. ~5 - 90 minute burn-in script
Extended: Adds more stress to the system (exhaustive memory test). ~10 - 180 minute burn-in script
Manual: User selects the test scripts/commands to run
Disabled: Default mode

UEFIdiag Configuration Commands
set mode: to indicate what mode should be used
start /HOST/diag: to start the UEFIdiag test
stop /HOST/diag: to interrupt or resume the UEFIdiag test
show status: to get the current status

Testing executed by Enabled and Extended mode
Test Group  Test Command                                     Notes
TEST_INV    udiag system inventory                           PRTFRU info is added to system.inv at the end
            udiag system info
            udiag storage info -v
TEST_CPU    udiag cpu simd                                   In Extended mode more stress is added to the fpu command
            udiag cpu cpuid
            udiag cpu model
            udiag fpu -np all -pc 10
            udiag cpu linpack -np all -v
            udiag cpu lapack -pc 10
TEST_MEM    udiag memory test addr0 -np all                  “walk0” and “walk1” are executed only in Extended mode
            udiag memory test block0 -np all
            udiag memory test walk0 -np all
            *udiag memory test walk1 -np all                 * Removed for 8-socket systems
TEST_GFX    udiag graphics bars -time 5                      Visible only if RKVMs is used
            udiag graphics gradient -time 5
            udiag graphics grid -time 5
            udiag graphics motion
            udiag graphics memory
TEST_QPI    udiag system cpusockets                          To keep compatibility, the QPI acronym is used instead of UPI on X8 platforms
TEST_PCI    udiag system pelink test                         Verifies that all PCIe devices are trained correctly
TEST_STR    udiag storage srt all -time 30                   Virtual drives (arrays) should be created for each HDD
TEST_NET    udiag network /SYS/SM0/NET0 /SYS/SM0/NET31       NAC names are specific to each platform

UEFIdiag Configuration Process (Web GUI vs Serial Console)

Monitoring Test Progress

Evaluating Test Results
-> start -script /SP/diag/shell/
diag> cat uefidiag/uefidiag.log

UEFIdiag Manual Mode CLI
• Use “-h”, “-hv” or “-hV” to display help information

System inventory
• Udiag system info
• Udiag system inventory
• Udiag storage info

FS0:\> mp system inventory

System.Manufacturer = Oracle Corporation
System.Product = ORACLE SERVER X8-8
System.Version =
System.Serial = 1715XC4010A

Motherboard.Manufacturer = Oracle Corporation
Motherboard.Product = SMOD TOP LEVEL ASSY
Motherboard.Version = Rev 07
Motherboard.Serial = 465136N+1743PJ001J

Enclosure.Manufacturer = Oracle Corporation
Enclosure.Version = ORACLE SERVER X8-8
Enclosure.Serial = 1715XC4010A

BIOS.Vendor = American Megatrends Inc.
BIOS.Version = 57002600
BIOS.Release = 01/30/2019

Processor 0.Manufacturer = Intel(R) Corporation
Processor 0.Name = Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
Processor 0.Type = Central Processor
Processor 0.Part.Number = QRAT
Processor 0.Socket = /SYS/CMOD0
Processor 0.Clock.Frequency = 2900 MHz
Processor 0.Cores = 24
Processor 0.Cores.Enabled = 24
Processor 0.Threads = 48

Memory 0.Socket = /SYS/CMOD0/P0/D7
Memory 0.Manufacturer = Samsung
Memory 0.Size = 32 GiB
Memory 0.Type = DDR4
Memory 0.MaxCapSpeed = 2666 MT/s
Memory 0.ConfiguredSpeed = 2666 MT/s

Memory Total.Size = 3072 GiB (3.00 TiB)

Disk Drive HDD0.Product_Name = HGST H101212SESUN1.2TA44513
Disk Drive HDD0.Serial_Number = 001329DB9J7D

CPU test commands
FS0:\> udiag cpu
cpuid - Show CPU ID string
cpuid <EAX> [<ECX>] - Execute cpuid instruction
info [-ap <n>['<n>]* | -np <n> | all] - Shows and Tests Logical Processors
model - Show CPU model
speed - Measure cpu speed
simd [-ap <n> | -np <n>] - Tests Multimedia Extensions
lapack - Runs double complex test routines
linpack [<matrix_size>] - Linpack benchmark

FS0:\> udiag fpu -h


fpu - Test FPU on BSP or APs
FS0:\> udiag system cpusockets -h
cpusockets - Show info on CPU sockets and QPI links

Memory test commands
FS0:\> udiag memory
test info - Display available test algorithms
test ALG [RANGE] [MP] [TIME] - Run a memory test <n>
ALG = addr0 | pat0 | pat1 | walk0 | walk1 | rand0 |
rand1 | block0 | refresh0
RANGE = -s <start> [-e <end>] ; <end> = last address + 1
MP = -ap <n>['<n>] [-x] | -np <n> [-@y|-@z] | -np all [-@y|-@z]
TIME = -time <n> ; unit in seconds
<s,e,n> = hexadecimal numbers without "0x"
info freespace - Show total free memory size
info maxblock - Show largest memory blocks
info dimm <address> - Show DIMM info for physical <address>
info addr DRAM_INFO - Show physical address given DRAM_INFO
info map [<address>|RANK] - Show address mapping
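
The RANGE syntax is easy to get wrong in two ways: `<end>` is the last tested address plus one, and the numbers are hexadecimal without a "0x" prefix. The Python sketch below builds the command string for a region given by start address and length. The function names are hypothetical, and the use of upper-case hex digits is an assumption; the slides only say "hexadecimal numbers without 0x".

```python
# Algorithm names as listed in the usage text above.
ALGORITHMS = ("addr0", "pat0", "pat1", "walk0", "walk1",
              "rand0", "rand1", "block0", "refresh0")


def range_args(start, length):
    """Build the RANGE arguments for a region of `length` bytes at `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length  # <end> is the last tested address + 1
    # ASSUMPTION: upper-case hex digits are accepted; no "0x" prefix is used.
    return ["-s", format(start, "X"), "-e", format(end, "X")]


def memory_test_command(alg, start, length):
    """Return a `udiag memory test` command line for the given region."""
    if alg not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {alg!r}")
    return " ".join(["udiag", "memory", "test", alg] + range_args(start, length))


# Test 1 GiB starting at the 4 GiB boundary.
print(memory_test_command("addr0", 0x100000000, 0x40000000))
# prints "udiag memory test addr0 -s 100000000 -e 140000000"
```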

PCIe test command
FS0:\> udiag system pelink test
P# RootCap LinkSts Test Capability/Device Description
-- --------- -------- ---- ------------------------------------------------
0 x4 8G x4 8G PASS 00:00:00 P0 CLX0 x4 8G DMI3
1 x8 8G x8 8G PASS x8 8G 10:00:00 LSI Oracle Storage 12 Gb SAS PCIe HBA, 16 port, RAID, internal
2 x4 8G x4 8G PASS 80:00:00 P0 CLX4 x4 8G DMI3
3 x8 8G x8 8G PASS x8 8G 8C:00:00 LSI Oracle Storage 12 Gb SAS PCIe HBA, 16 port, RAID, internal
4 x8 8G x8 8G PASS x16 8G 2C:00:00 Intel Lewisburg PCIe x16 Uplink (NPX16)
5 x8 8G x8 8G PASS x16 8G AC:00:00 Intel Lewisburg PCIe x16 Uplink (NPX16)
6 x16 8G x4 5G PASS x4 5G 17:00:00 Intel Sun Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP
7 x8 8G x4 5G PASS x4 5G 0C:00:00 Intel Sun Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP
8 x16 8G x4 5G PASS x4 5G 34:00:00 Intel Sun Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP
9 x8 8G x4 5G PASS x4 5G 28:00:00 Intel Sun Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP
A x16 8G x8 8G PASS x8 8G 54:00:00 PLX Oracle Flash Accelerator F640 PCIe Card v2: 6.4 TB, NVMe PCIe 3.0
B x8 8G x8 5G PASS x8 5G 48:00:00 Intel Sun Dual Port 10 GbE PCIe 2.0 Low Profile Adapter, Base-T
C x16 8G x16 8G PASS x16 8G 74:00:00 Oracle Oracle Dual Port EDR InfiniBand Adapter
D x8 8G x4 5G PASS x4 5G 68:00:00 Intel Sun Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP
E x16 8G x16 8G PASS x16 8G 94:00:00 Oracle Oracle Dual Port EDR InfiniBand Adapter
F x8 8G x4 5G PASS x4 5G 88:00:00 Intel Sun Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP
10 x16 8G x4 5G PASS x4 5G B4:00:00 Intel Sun Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP
11 x8 8G x8 5G PASS x8 5G A8:00:00 Intel Sun Dual Port 10 GbE PCIe 2.0 Low Profile Adapter, Base-T
12 x16 8G x4 5G PASS x4 5G D4:00:00 Intel Sun Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP
13 x8 8G x8 8G PASS x8 8G C8:00:00 Intel Oracle Quad Port 10GBase-T Adapter
14 x16 8G x4 5G PASS x4 5G F4:00:00 Intel Sun Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP
15 x8 8G x8 8G PASS x8 8G E8:00:00 PLX Oracle Flash Accelerator F640 PCIe Card: 6.4 TB, NVMe PCIe 3.0
16 x1 2.5G x1 2.5G PASS x1 5G 02:00:00 Emulex Integrated PCI-Express bridge

0: system_pelink_test$1: Pass=1, Fail=0
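
In every row of the output above that lists a device capability, the negotiated link (LinkSts) equals the lower of the root port capability and the device capability, in both width and speed. The Python sketch below encodes that observation as a check. It is an assumption drawn from the sample rows, not the documented pass criterion of `udiag system pelink test`, and the function names are hypothetical.

```python
def parse_link(text):
    """Parse a link descriptor such as 'x8 8G' into (lanes, speed in GT/s)."""
    width, speed = text.split()
    return int(width.lstrip("x")), float(speed.rstrip("G"))


def expected_link(root_cap, device_cap):
    """Best link both ends can support: the minimum width and minimum speed."""
    root_width, root_speed = parse_link(root_cap)
    dev_width, dev_speed = parse_link(device_cap)
    return min(root_width, dev_width), min(root_speed, dev_speed)


def link_ok(root_cap, link_status, device_cap):
    """True when the negotiated link matches what both ends can support."""
    return parse_link(link_status) == expected_link(root_cap, device_cap)


# Row 6: x16 8G root port, x4 5G adapter, trained at x4 5G.
print(link_ok("x16 8G", "x4 5G", "x4 5G"))
# Row 16: x1 2.5G root port, x1 5G bridge, trained at x1 2.5G.
print(link_ok("x1 2.5G", "x1 2.5G", "x1 5G"))
# A hypothetical degraded link: x16-capable on both ends but trained at x8.
print(link_ok("x16 8G", "x8 8G", "x16 8G"))
```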

UEFIdiag Network Test (Manual mode)
• Connect cables between the ports to be tested (back-to-back for static addressing, or to a switch for DHCP), then follow these steps:
• 1. Clear all port configurations using:
ifconfig -r <port_name>
Example: fs1:\> ifconfig -r eth0
• 2. Configure ports using:
Static: ifconfig -s <port_name> static <ip_address> <subnet_mask> <gateway_address>
Example: fs1:\> ifconfig -s eth0 static 192.168.1.20 255.255.255.0 192.168.1.1
DHCP: ifconfig -s <port_name> dhcp
Example: fs1:\> ifconfig -s eth0 dhcp
• 3. Verify that ports have been configured correctly using:
fs1:\> udiag network ifs
This command will indicate the NAC names or port numbers to use
• 4. Run network test using:
fs1:\> udiag network <Tx> <Rx>
For example:
fs1:\> udiag network /SYS/SM1/NET0 /SYS/SM1/NET1
fs1:\> udiag network 4 5
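
The four steps above can be sketched as a small generator of UEFI shell commands, for example to prepare a script for a back-to-back test. This is a minimal Python sketch with a hypothetical function name. It only emits the commands shown on the slide and follows the static example in step 2; the second port name `eth1` and its address are made up for illustration.

```python
def network_test_commands(tx, rx, ports):
    """Return the shell commands for a manual UEFIdiag network test.

    ports maps a port name to None (DHCP) or to an
    (ip_address, subnet_mask, gateway_address) tuple (static).
    tx and rx are the NAC names or port numbers reported by `udiag network ifs`.
    """
    commands = []
    # Step 1: clear any existing configuration on every port.
    for name in ports:
        commands.append(f"ifconfig -r {name}")
    # Step 2: configure each port, static or DHCP.
    for name, config in ports.items():
        if config is None:
            commands.append(f"ifconfig -s {name} dhcp")
        else:
            ip_address, subnet_mask, gateway = config
            commands.append(f"ifconfig -s {name} static {ip_address} {subnet_mask} {gateway}")
    # Step 3: list the interfaces to confirm the configuration.
    commands.append("udiag network ifs")
    # Step 4: run the test from the transmit port to the receive port.
    commands.append(f"udiag network {tx} {rx}")
    return commands


# eth1 and 192.168.1.21 are hypothetical values for illustration.
example = network_test_commands(
    "/SYS/SM1/NET0", "/SYS/SM1/NET1",
    {
        "eth0": ("192.168.1.20", "255.255.255.0", "192.168.1.1"),
        "eth1": ("192.168.1.21", "255.255.255.0", "192.168.1.1"),
    },
)
print("\n".join(example))
```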

UEFIdiag log files

Using /SP/diag/shell

•PASSED.stress_test (FAILED/ABORTED)
•Done
•system.info
•system.inv
•test.log
•uefi_started
•uefidiag.log
