FusionServer V7 Server GPU Card Operation Guide 08
Operation Guide
Issue 08
Date 2024-07-04
Notice
In this document, "xFusion" is used to refer to "xFusion Digital Technologies Co., Ltd." for concise description
and easy understanding, which does not mean that "xFusion" may have any other meaning. Any "xFusion"
mentioned or described hereof may not be understood as any meaning other than "xFusion Digital
Technologies Co., Ltd.", and xFusion Digital Technologies Co., Ltd. shall not bear any liability resulting from
the use of "xFusion".
The purchased products, services and features are stipulated by the contract made between xFusion and
the customer. All or part of the products, services and features described in this document may not be within
the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: https://www.xfusion.com
Purpose
This document describes how to configure a graphics processing unit (GPU) on
servers and provides solutions to frequently asked questions (FAQs).
Intended Audience
This document is intended for:
Server maintenance personnel must have adequate knowledge of the server products
and sufficient service skills to avoid personal injury or device damage during
maintenance.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
Change History
Issue Release Date Description
05 2023-09-27 Added:
● 2.3.3 Mapping Between Riser Cards and V7 Rack Servers.
● 2.4.3 V7 Rack Servers Riser Card Power Cable.
● 3.2.1.3 V7 Rack Server GPU Configuration Rule.
Updated:
● 2.3.2 Mapping Between Riser Cards and V6 Rack Servers.
● 3.2.1.2 V6 Rack Server GPU Configuration Rule.
● 3.2.2 Rack Server GPU Card Cable Connection Method.
Contents
A Appendix
A.1 FAQ
A.1.1 GUI Cannot Be Displayed After the GPU Card Is Inserted After the OS Is Installed
A.1.2 GUI Cannot Be Displayed After the GPU Driver Is Installed
A.1.3 Driver Fails to Be Installed Because the GCC Is Not Installed
A.1.4 How to Disable the Nouveau Driver for Different Linux Systems
A.1.5 Failed to Detect the Kernel Source
A.1.6 Compatibility Between Drivers and GPUs/OSs
A.1.7 Insufficient MMIOH Resources on the OS with Tesla A100 40 GB Configured
A.1.8 The GPU Temperature Exception Alarm Is Displayed on the BMC
A.2 Getting Help
A.2.1 Collecting Fault Information
A.2.2 Preparing for Debugging
A.2.3 Using Product Documentation
A.2.4 Technical Support
1 Overview
This document describes how to install a GPU card on servers, provides precautions
for installing the GPU card, and provides solutions to common problems.
To install a GPU card, perform the following steps:
----End
2 Knowing Devices
2.1 Server
2.2 GPU Card
2.3 Riser Card
2.4 Riser Card Power Cable
2.1 Server
Determine the server model based on the silkscreen on the front panel of the server.
For details about common riser cards and power cables used on servers, see 2.3
Riser Card and 2.4.1 V5 Rack Servers Riser Card Power Cable.
Series | Tesla
Chip Name | TU104 | GP104 | GV100 | GV100 | GV100 | GP100 | GP100 | GV100 | GV100
GPU Memory | 16 GB GDDR6 | 8 GB GDDR5 | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 12 GB HBM2 | 32 GB HBM2 | 16 GB HBM2
GPU Memory Bit Width | 256 bit | 256 bit | 4096 bit | 4096 bit | 4096 bit | 4096 bit | 4096 bit | 4096 bit | 4096 bit
GPU Memory Bandwidth | 320 GBps | 192 GBps | 900 GBps | 900 GBps | 900 GBps | 732 GBps | 549 GBps | 900 GBps | 900 GBps
NVLink | / | / | / | / | / | / | / | 6 NVLinks 50 Gbps | 6 NVLinks 50 Gbps
CUDA Core | 2560 | 2560 | 5120 | 5120 | 5120 | 3584 | 3584 | 5120 | 5120
RT Core | 40 | / | / | / | / | / | / | / | /
Dimensions | HHHL | HHHL | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-width | SXM module | SXM module
GPU Card Port (DP/HDMI) | N | N | N | N | N | N | N | N | N
ECC | Y | Y | Y | Y | Y | Y | Y | Y | Y
OpenGL | / | / | / | / | / | / | / | / | /
Vulkan | / | / | / | / | / | / | / | / | /
Shader Model | / | / | / | / | / | / | / | / | /
DirectX | / | / | / | / | / | / | / | / | /
CUDA | Y | Y | Y | Y | Y | Y | Y | Y | Y
DirectCompute | Y | Y | Y | Y | Y | Y | Y | Y | Y
OpenCL | Y | Y | Y | Y | Y | Y | Y | Y | Y
Feature | P100 16G SXM2 | P100 12G SXM2 | A100 40G SXM4 | A100 40G PCIe | A100 80G PCIe | A800 80G PCIe | A40 | P40 | M60 | M10
Series | Tesla
Chip Name | GP100 | GP100 | GA100 | GA100 | GA100 | GA100 | GA102 | GP102 | GM204*2 | GM107*4
GPU Memory | 16 GB HBM2 | 12 GB HBM2 | 40 GB HBM2 | 40 GB HBM2 | 80 GB HBM2e | 80 GB HBM2e | 48 GB GDDR6 | 24 GB GDDR5 | 16 GB GDDR5 | 32 GB GDDR5
GPU Memory Bit Width | 4096 bit | 4096 bit | 5120 bit | 5120 bit | 5120 bit | 5120 bit | 384 bit | 384 bit | 256 bit | 128 bit
GPU Memory Bandwidth | 732 GBps | 732 GBps | 1555 GBps | 1555 GBps | 1935 GBps | 1.94 TB/s | 696 GBps | 347 GBps | 160 GBps | 83 GBps
NVLink | 4 NVLinks 40 Gbps | 4 NVLinks 40 Gbps | 12 NVLinks 50 Gbps | 12 NVLinks 50 Gbps | 12 NVLinks 50 Gbps | 8 NVLinks 50 Gbps | 112 GB/s | / | / | /
CUDA Core | 3584 | 3584 | 6900 | 6900 | 6912 | 6912 | 10752 | 3840 | 4096 | 2560
RT Core | / | / | / | / | / | / | 84 | / | / | /
Power Consumption | 300 W | 300 W | 400 W | 250 W | 300 W | 300 W | 300 W | 250 W | 300 W | 225 W
Dimensions | SXM module | SXM module | SXM module | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-width
GPU Card Port (DP/HDMI) | N | N | N | N | N | N | N | N | N | N
ECC | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
OpenGL | / | / | / | / | / | / | Y | / | Y | Y
Vulkan | / | / | / | / | / | / | Y | / | Y | Y
Shader Model | / | / | / | / | / | / | Y | / | Y | Y
DirectX | / | / | / | / | / | / | Y | / | Y | Y
CUDA | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
DirectCompute | Y | Y | Y | Y | Y | Y | / | Y | / | /
OpenCL | Y | Y | Y | Y | Y | Y | Y | Y | / | /
Feature | A16 | A2 | A30 | A10 | L40 | L40S | L20 | L2 | H100 PCIe | H800 PCIe | A4000 | A6000
Chip Name | GA107 | GA107 | GA100 | GA102 | / | / | / | / | GH100 | GH100 | GA104 | GA102
GPU Memory Bit Width | 128 bit x4 | 128 bit | 3072 bit | 384 bit | 384 bit | 384 bit | 384 bit | 192 bit | 5120 bit | 5120 bit | 256 bit | 384 bit
GPU Memory Bandwidth | 231.9 GB/s x4 | 200 GB/s | 933.1 GB/s | 600.2 GB/s | 864 GB/s | 864 GB/s | 864 GB/s | 300 GB/s | 2 TB/s | 2 TB/s | 448 GB/s | 768.0 GB/s
CUDA Core | 2560 x4 | 1280 | 3584 | 9216 | 18176 | 18176 | 11776 | 5888 | 14592 | 14592 | 6144 | 10752
Tensor Core | 2560 x4 | 40 | 224 | 288 | 568 | 568 | 368 | 184 | 456 | 456 | 192 | 336
Power Consumption | 250 W | 40-60 W | 165 W | 150 W | 300 W | 350 W | 350 W | 72 W | 350 W | 350 W | 140 W | 300 W
Dimensions | FHFL dual-width | HHHL | FHFL dual-width | FHFL dual-slot | FHFL dual-width | FHFL dual-width | FHFL dual-width | HHHL single-width | FHFL dual-width | FHFL dual-width | FHFL dual-slot | FHFL dual-width
GPU Card Port (DP/HDMI) | N | / | N | N | 4x DP | 4x DP | 4x DP | / | N | N | N | N
ECC | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
OpenGL | Y | / | / | Y | Y | Y | Y | Y | N | N | Y | Y
Vulkan | Y | / | / | Y | Y | Y | Y | Y | N | N | Y | Y
Shader Model | Y | / | / | Y | Y | Y | Y | Y | N | N | Y | Y
DirectX | Y | / | N | Y | Y | Y | Y | Y | N | N | Y | Y
CUDA | Y | / | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y
DirectCompute | / | / | / | / | / | / | / | / | N | N | Y | /
OpenCL | Y | / | Y | Y | Y | Y | Y | Y | Y | N | Y | Y
Feature | RTX6000 Active Cooling | RTX6000 Passive Cooling | RTX5000 | P6000 | P5000 | P4000 | P2000 | P600 | P400
Series | Quadro
Chip Name | TU102 | TU102 | TU104 | GP102 | GP104 | GP104 | GP106 | GP107 | GP107
GPU Memory | 24 GB GDDR6 | 24 GB GDDR6 | 16 GB GDDR6 | 24 GB GDDR5 | 16 GB GDDR5 | 8 GB GDDR5 | 5 GB GDDR5 | 2 GB GDDR5 | 2 GB GDDR5
GPU Memory Bit Width | 384 bit | 384 bit | 256 bit | 384 bit | 256 bit | 256 bit | 160 bit | 128 bit | 64 bit
NVLink | / | / | / | / | / | / | / | / | /
CUDA Core | 4608 | 4608 | 3072 | 3840 | 2560 | 1792 | 1024 | 354 | 256
RT Core | 72 | 72 | 48 | / | / | / | / | / | /
Dimensions | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-width | FHFL dual-slot | HHHL | HHHL
ECC | Y | Y | Y | Y | Y | N | N | N | N
OpenGL | Y | Y | Y | Y | Y | Y | Y | Y | Y
Vulkan | Y | Y | Y | Y | Y | Y | Y | Y | Y
Shader Model | Y | Y | Y | Y | Y | Y | Y | Y | Y
DirectX | Y | Y | Y | Y | Y | Y | Y | Y | Y
CUDA | Y | Y | Y | Y | Y | Y | Y | Y | Y
DirectCompute | Y | Y | Y | Y | Y | Y | Y | Y | Y
OpenCL | Y | Y | Y | Y | Y | Y | Y | Y | Y
NOTE
In the table, Y indicates that the feature is supported, N indicates that the feature is not
supported, and / indicates that the feature is not involved.
NOTICE
● Before installing a GPU card, ensure that the server has been shut down and the
external power cables have been disconnected.
● A GPU card is expensive. Incorrect power cable connection may damage the
server or GPU card.
Step 1 Check the server model and GPU card model. For details, see 2 Knowing Devices.
Step 2 Use the Compatibility List to check the compatibility between the server and the
GPU card, and check the configuration rules of the GPU card and the corresponding
server model.
Pay attention to the following:
● Part number of the required riser card or ejector lever.
● Number and part number of required power cables for GPU cards
● Part number of the required fan
● Maximum number of GPUs supported by a server
● BIOS parameters to be set (For details, see the BIOS parameter reference
document of the corresponding server model.)
----End
GPU Part No. | GPU Model | Server Model | Riser Card Part No. | GPU Power Cable Part No. and Quantity | Fan Part No. | Maximum Number of GPU Cards | Supported Installation Slot | Maximum Memory Capacity | GPU Card Cable Connection Method
the GPU separately. For details about how to connect a GPU power cable, see 3.3.2
GPU Server GPU Card Cable Connection Method.
GP316 | - | - | 16 | No | Method 1: No power cable is used.
Figure 3-7 Power cables for connecting the GP608 to GPU cards
Figure 3-8 Power cables for connecting the GP308 to dual-width GPU cards
Figure 3-9 Power cables for connecting the GP308 to single-slot GPU cards
Figure 3-10 Connecting a GPU power cable to the G5500 V6 or G5500 V7 server
Step 2 Select the corresponding GPU, OS, and latest CUDA Toolkit version (supported by
the driver), and click Search.
----End
NOTE
Step 3 Select the corresponding GPU, OS, and historical CUDA Toolkit versions, and click
Search.
Step 4 Download the found GPU driver.
----End
● If a driver in .run format has been installed, you need to uninstall the driver
in .run format before installing a driver in .rpm format. You can directly install the
driver in .run format after uninstalling the driver in .run format.
● If a driver in the .rpm format has been installed, you need to uninstall the driver
in the .rpm format before installing the driver in the .run format. Then you can
directly install the driver in the .rpm format.
NOTE
Step 1 On the OS CLI, run the lsmod | grep -i nouveau command to check whether the
Nouveau driver is loaded.
NOTE
If the command output contains nouveau, the Nouveau driver has been loaded. Otherwise,
the Nouveau driver is not loaded.
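For scripted checks, the test in Step 1 can be wrapped in a small helper. The following is a minimal sketch and not part of the official procedure: the function name nouveau_loaded is hypothetical, and it matches only the module-name column of the lsmod output, which is slightly stricter than lsmod | grep -i nouveau.

```shell
# nouveau_loaded (hypothetical helper): reads lsmod-style text on stdin.
# Exit status 0 means a module named "nouveau" is listed; 1 means it is not.
nouveau_loaded() {
    # Compare the first column (module name) case-insensitively.
    awk 'tolower($1) == "nouveau" { found = 1 } END { exit !found }'
}
```

On a running system, the helper could be used as lsmod | nouveau_loaded && echo "Nouveau driver is loaded".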
Step 2 Disable the Nouveau driver by referring to A.1.4 How to Disable the Nouveau
Driver for Different Linux Systems.
Step 3 On the OS CLI, run the reboot command to restart the server.
Step 4 Perform Step 1 to check whether the Nouveau driver is disabled.
----End
For details about how to use Directory on the remote virtual console of the iBMC WebUI, see
the FusionServer Server iBMC User Guide.
Step 2 Log in to the OS CLI as the root user and run the init 3 command to access the text
interface.
Step 3 Run the mount /dev/sr0 /mnt/ command to mount the driver file to the /mnt directory.
Step 4 Run the following command to copy the driver file NVIDIA-Linux-x86_64-445.50.run
from the /mnt/ directory to the root directory:
# cp /mnt/NVIDIA-Linux-x86_64-445.50.run /
Step 5 Run the lspci | grep -i nvidia command to check whether the GPU card is properly
installed.
NOTE
If the command output contains the NVIDIA device (as shown in Figure 4-1), the GPU card
can be identified. Otherwise, the GPU card is incorrectly configured or the GPU card hardware
is faulty. In this case, resolve the GPU card identification failure first.
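The identification check above can also be expressed as a count of NVIDIA devices. This is a minimal sketch under stated assumptions: the function name count_nvidia_devices is hypothetical, and it only counts lines of lspci-style text that mention NVIDIA; it does not verify that the card is healthy.

```shell
# count_nvidia_devices (hypothetical helper): reads lspci-style text on stdin
# and prints the number of lines that mention NVIDIA (case-insensitive).
count_nvidia_devices() {
    # grep -c prints the match count; "|| true" keeps a zero count
    # from being reported as a command failure.
    grep -ci 'nvidia' || true
}
```

On a running system, lspci | count_nvidia_devices printing 0 corresponds to the case in which the GPU card is not identified.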
The OpenGL component in the driver conflicts with the OS. Therefore, you need to add
the --no-opengl-files parameter in the command. If this parameter is not added, the
GUI cannot be displayed during the installation and an error is reported, as shown in
Figure 4-2.
Step 9 A dialog box containing the following information is displayed. In the dialog box,
select Yes and press Enter.
Install NVIDIA's 32-bit compatibility libraries?
Step 10 The following dialog box may be displayed for the driver of an earlier version.
NOTE
● If a computing video card is used, select No. By default, the video card on the mainboard is
used. Otherwise, the GUI cannot be displayed.
● If a display video card is used and the output needs to be displayed through the GPU card,
select Yes. After the driver is installed, restart the system and set the GPU card as the
default video card in the BIOS (for details, see the BIOS parameter reference document of
the corresponding server model). In this case, the GUI can be properly displayed.
Step 11 A dialog box containing the following information is displayed. In the dialog box,
select OK and press Enter. The NVIDIA GPU card driver is installed.
Installation of the kernel module for the NVIDIA Accelerated Graphics Driver for Linux x86_64 (version 445.50) is now
complete.
Step 12 After the installation is complete, run the nvidia-smi command to view the video card
information. If the model and related information about the video card are displayed
in the command output, the driver is installed properly.
----End
NOTE
If the xconf file is generated during the GPU card driver installation and the NVIDIA GPU card
is used, select Yes to restore the default configuration and use the video card on the
mainboard.
Step 3 On the screen shown in the following figure, select OK and press Enter. The GPU
card driver is uninstalled.
----End
package_name indicates the driver package name, which does not contain the file
name extension .rpm.
NOTE
Run the command to query the command for uninstalling the driver in RPM format. Use the
queried uninstallation command.
For details about how to use Directory on the remote virtual console of the iBMC WebUI, see
the iBMC User Guide.
Step 2 In the OS, open Device Manager and check whether the NVIDIA video card can be
identified.
Step 3 Double-click the NVIDIA video card driver installation program in the server OS. In
the displayed dialog box, click OK.
Step 8 Run the nvidia-smi.exe command to check the video card information. If the video
card model and related information is displayed in the command output, the driver is
installed properly.
----End
Step 2 Right-click NVIDIA Graphics Driver, choose Uninstall/Change from the shortcut
menu, and uninstall the NVIDIA video driver as prompted.
Step 3 Click RESTART NOW to restart the OS. The NVIDIA video card driver is uninstalled.
----End
A Appendix
A.1 FAQ
Symptom
If the display video card is inserted (or the computing video card is in Graphic mode),
the Nouveau driver is used together with the external video card by default. As the
Nouveau driver is incompatible with NVIDIA GPU cards, the GUI cannot be properly
displayed.
Solution
Access the text terminal and disable the Nouveau driver by referring to A.1.4 How to
Disable the Nouveau Driver for Different Linux Systems.
Symptom
When the driver is installed on CentOS or RHEL, selecting Yes in the following step
generates the xorg.conf file. The file configures the display output to use the GPU card, but
a computing video card has no display interface. As a result, the GUI fails to be
displayed.
Solution
Go to the /etc/X11 directory and delete the xorg.conf file.
Solution
Install the GCC and g++ compilers in advance.
RHEL/CentOS
● Create the /etc/modprobe.d/blacklist-nouveau.conf file and add the following
information to the file.
blacklist nouveau
options nouveau modeset=0
● Re-generate initramfs.
$ sudo dracut --force
OpenSUSE
● Create the /etc/modprobe.d/blacklist-nouveau.conf file and add the following
information to the file.
blacklist nouveau
options nouveau modeset=0
● Re-generate initrd.
$ sudo /sbin/mkinitrd
SLES
The Nouveau driver is not installed in SLES.
Ubuntu
● Create the /etc/modprobe.d/blacklist-nouveau.conf file and add the following
information to the file.
blacklist nouveau
options nouveau modeset=0
● Re-generate initramfs.
$ sudo update-initramfs -u
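The per-distribution steps above share the same blacklist file and differ only in the command that regenerates the boot image. The following is a minimal sketch of that pattern, not an official script. The function names write_nouveau_blacklist and regen_command are hypothetical, and the distribution identifiers are assumed to be the ID values found in /etc/os-release.

```shell
# write_nouveau_blacklist [DIR] (hypothetical helper): creates
# DIR/blacklist-nouveau.conf with the two lines listed in this section.
# DIR defaults to /etc/modprobe.d, which requires root permission.
write_nouveau_blacklist() {
    dir="${1:-/etc/modprobe.d}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
        > "$dir/blacklist-nouveau.conf"
}

# regen_command ID (hypothetical helper): prints the regeneration command
# given in this section for a distribution ID (assumed /etc/os-release value).
regen_command() {
    case "$1" in
        rhel|centos) echo 'dracut --force' ;;
        opensuse*)   echo '/sbin/mkinitrd' ;;
        ubuntu)      echo 'update-initramfs -u' ;;
        sles)        : ;;        # Nouveau is not installed in SLES: nothing to run.
        *)           return 1 ;; # Unknown distribution: let the caller decide.
    esac
}
```

Writing to a scratch directory first, for example write_nouveau_blacklist /tmp, is a simple way to review the generated file before applying it to /etc/modprobe.d. After the file is in place and the printed command has been run with sudo, restart the server for the change to take effect.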
Symptom
The header file of the kernel source is used during the installation of the NVIDIA
driver. Therefore, you need to download the kernel source. Otherwise, an error
message is displayed and the installation fails.
Solution
1. If you select a development package during the OS installation, the package will
be installed in advance.
2. Install the kernel-devel package. When installing the driver, run the following
command to specify the kernel source path:
./NVIDIA-Linux-x86_64-396.26.run --kernel-source-path=/usr/src/kernels/3.10.0-x
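The kernel source path in the command above depends on the installed kernel release. The following minimal sketch builds the path from the running kernel, assuming the kernel-devel package places its files under /usr/src/kernels/<release> as in the example path. The function name kernel_source_path is hypothetical.

```shell
# kernel_source_path [RELEASE] (hypothetical helper): prints the assumed
# kernel-devel directory for RELEASE, defaulting to the running kernel.
kernel_source_path() {
    release="${1:-$(uname -r)}"
    printf '/usr/src/kernels/%s\n' "$release"
}
```

The result could then be passed to the installer, for example --kernel-source-path="$(kernel_source_path)". Check that the printed directory exists before running the installer, because the running kernel and the installed kernel-devel package may differ in version.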
NOTE
b9:00.0 is the bus address of Tesla A100 40 GB on the OS. The bus address may vary
depending on hardware configurations.
Solution
On the BIOS setup screen, set MMIO High Granularity Size to 256G or larger. The
procedure is as follows:
Step 3 Choose MMIO High Granularity Size and press Enter. Select 256G or larger and
press Enter.
----End
degrees C, which is lower than the temperature difference threshold (1.000 degrees
C).
Cause
Due to the NV-GPU feature, the minimum system boot is required during reboot.
Therefore, the temperature value obtained before boot-up is invalid.
Solution
Wait approximately 20 seconds. After the minimum system boots up, the GPU
temperature information returns to normal.
Before contacting technical support, prepare spare parts and tools such as
screwdrivers, screws, serial cables, and network cables.
Refer to the documentation before you contact xFusion for technical support.
Cases
To obtain case studies about servers, visit Knowledge Base.
Contact xFusion
xFusion provides comprehensive technical support and services. To obtain
assistance, contact xFusion technical support as follows:
● Contact xFusion customer service center.
– Email: [email protected]
● Contact technical support personnel at your local xFusion branch office.