NVIDIA AI Enterprise User Guide
The AI and data science frameworks are delivered as container images. Containerized
software can be run directly with a tool such as Docker.
‣ You have a system that meets the requirements in NVIDIA AI Enterprise Release Notes.
‣ One or more supported NVIDIA GPUs are installed in your system.
‣ If you are using an NVIDIA A100 GPU, the following BIOS settings are enabled on your
system:
Note: If NVIDIA card detection does not include all the installed GPUs, set this option to
Enabled.
The process for installing and configuring NVIDIA Virtual GPU Manager depends on the
hypervisor that you are using. After you complete this process, you can install the display
drivers for your guest OS and license any NVIDIA AI Enterprise licensed products that you
are using.
Each NVIDIA vGPU is analogous to a conventional GPU, having a fixed amount of GPU
framebuffer, and one or more virtual display outputs or “heads”. The vGPU’s framebuffer
is allocated out of the physical GPU’s framebuffer at the time the vGPU is created, and
the vGPU retains exclusive use of that framebuffer until it is destroyed.
Depending on the physical GPU, different types of vGPU can be created on the GPU:
‣ On all GPUs that support NVIDIA AI Enterprise, time-sliced vGPUs can be created.
‣ Additionally, on GPUs that support the Multi-Instance GPU (MIG) feature, MIG-backed
vGPUs can be created. The MIG feature is introduced on GPUs that are based on the
NVIDIA Ampere GPU architecture.
1
C-series vGPU types are NVIDIA Virtual Compute Server vGPU types, which are optimized for compute-intensive
workloads. As a result, they support only a single display head and do not provide Quadro graphics acceleration.
2
The maximum number of NVIDIA Virtual Compute Server vGPUs is limited to 12 vGPUs per physical GPU, irrespective of
the available hardware resources of the physical GPU.
The number after the board type in the vGPU type name denotes the amount of frame
buffer that is allocated to a vGPU of that type. For example, a vGPU of type A16-4C is
allocated 4096 Mbytes of frame buffer on an NVIDIA A16 board.
Due to their differing resource requirements, the maximum number of vGPUs that can be
created simultaneously on a physical GPU varies according to the vGPU type. For example,
an NVIDIA A16 board can support up to 4 A16-4C vGPUs on each of its four physical GPUs,
for a total of 16 vGPUs, but only 2 A16-8C vGPUs on each physical GPU, for a total of 8 vGPUs.
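The naming rule and the per-board arithmetic above can be sketched in Python. This is only an illustration, not an NVIDIA tool: the function names are hypothetical, the parser handles only time-sliced type names of the form board-sizeSeries, and the defaults of four physical GPUs with 16 Gbytes of frame buffer each are assumptions chosen to match the A16 example.

```python
import re
from typing import NamedTuple


class VgpuType(NamedTuple):
    board: str            # board type, for example "A16"
    frame_buffer_mb: int  # frame buffer per vGPU in Mbytes
    series: str           # series letter, for example "C"


def parse_vgpu_type(name: str) -> VgpuType:
    """Split a time-sliced vGPU type name such as 'A16-4C' into its parts."""
    match = re.fullmatch(r"([A-Za-z0-9]+)-(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"unrecognized vGPU type name: {name!r}")
    board, gbytes, series = match.groups()
    return VgpuType(board, int(gbytes) * 1024, series)


def max_vgpus_per_board(name: str, gpus_per_board: int = 4,
                        gpu_frame_buffer_mb: int = 16 * 1024) -> int:
    """Frame-buffer-limited vGPU count for a board (assumed A16 defaults)."""
    per_gpu = gpu_frame_buffer_mb // parse_vgpu_type(name).frame_buffer_mb
    return per_gpu * gpus_per_board


print(parse_vgpu_type("A16-4C"))
print(max_vgpus_per_board("A16-4C"), max_vgpus_per_board("A16-8C"))
```

The calculation considers frame buffer only; other limits, such as the per-GPU cap described in the footnotes, are not modeled.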
When enabled, the frame-rate limiter (FRL) limits the maximum frame rate in frames per
second (FPS) for C-series vGPUs to 60 FPS.
By default, the FRL is enabled for all GPUs. The FRL is disabled when the vGPU scheduling
behavior is changed from the default best-effort scheduler on GPUs that support
alternative vGPU schedulers. For details, see Changing Scheduling Behavior for Time-
Sliced vGPUs. On vGPUs that use the best-effort scheduler, the FRL can be disabled
as explained in the release notes for your chosen hypervisor at NVIDIA AI Enterprise
Documentation.
Note: NVIDIA vGPU is a licensed product on all supported GPU boards. An NVIDIA AI
Enterprise software license is required to enable all vGPU features within the guest VM.
For details of the virtual GPU types available from each supported GPU, see Virtual GPU
Types for Supported GPUs.
‣ A configuration with A16-16C vGPUs on GPU 0 and GPU 1, A16-8C vGPUs on GPU 2,
and A16-4C vGPUs on GPU 3 is valid.
‣ A configuration with a mixture of A16-8C vGPUs and A16-4C vGPUs on GPU 0 is
invalid.
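The two cases above appear to illustrate one rule: each physical GPU hosts vGPUs of a single type, while different physical GPUs on the same board may use different types. The sketch below encodes that reading of the examples. The function name and the dictionary layout (physical GPU index mapped to the list of vGPU types placed on it) are hypothetical.

```python
from typing import Dict, List


def is_valid_vgpu_layout(layout: Dict[int, List[str]]) -> bool:
    """Return True if every physical GPU hosts vGPUs of a single type."""
    return all(len(set(vgpu_types)) <= 1 for vgpu_types in layout.values())


# The valid and invalid configurations from the two bullet points above.
valid = {0: ["A16-16C"], 1: ["A16-16C"], 2: ["A16-8C", "A16-8C"], 3: ["A16-4C"] * 4}
invalid = {0: ["A16-8C", "A16-4C", "A16-4C"]}
print(is_valid_vgpu_layout(valid), is_valid_vgpu_layout(invalid))
```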
A GPU that is supplied from the factory in displayless mode, such as the NVIDIA A40 GPU,
might be in a display-enabled mode if its mode has previously been changed.
To change the mode of a GPU that supports multiple display modes, use the
displaymodeselector tool, which you can request from the NVIDIA Display Mode
Selector Tool page on the NVIDIA Developer website.
Note:
Only the following GPUs support the displaymodeselector tool:
‣ NVIDIA A40
‣ NVIDIA RTX A5000
‣ NVIDIA RTX A6000
Other GPUs that support NVIDIA AI Enterprise do not support the displaymodeselector
tool and, unless otherwise stated, do not require display mode switching.
Note:
Some servers, for example, the Dell R740, do not configure SR-IOV capability if the SR-IOV
SBIOS setting is disabled on the server. If you are using the Tesla T4 GPU with VMware
vSphere on such a server, you must ensure that the SR-IOV SBIOS setting is enabled on
the server.
However, with any server hardware, do not enable SR-IOV in VMware vCenter Server
for the Tesla T4 GPU. If SR-IOV is enabled in VMware vCenter Server for T4, VMware
vCenter Server lists the status of the GPU as needing a reboot. You can ignore this status
message.
‣ For any supported VMware vSphere release, set the automation level to Manual.
‣ For VMware vSphere 6.7 Update 1 or later, set the automation level to Partially
Automated or Manual.
For more information about these settings, see Edit Cluster Settings in the VMware
documentation.
‣ The ZIP archive that contains NVIDIA AI Enterprise has been downloaded from the
NVIDIA Licensing Portal.
‣ The NVIDIA Virtual GPU Manager package has been extracted from the downloaded
ZIP archive.
1. Copy the NVIDIA Virtual GPU Manager package file to the ESXi host.
2. Put the ESXi host into maintenance mode.
$ esxcli system maintenanceMode set --enable true
3. Use the esxcli command to install the vGPU Manager package.
For more information about the esxcli command, see esxcli software Commands in
the VMware vSphere documentation.
[root@esxi:~] esxcli software vib install -d /vmfs/volumes/datastore/software-component.zip
datastore
The name of the VMFS datastore to which you copied the software component.
software-component
The name of the file that contains the NVIDIA Virtual GPU Manager package
in the form of a software component. Ensure that you specify the file that
was extracted from the downloaded ZIP archive. For example, for VMware
vSphere 7.0.2, software-component is
NVD.NVIDIA_bootbank_NVIDIA-VMware_470.161.02-1OEM.702.0.0.8169922-offline_bundle-build-number.
4. Exit maintenance mode.
$ esxcli system maintenanceMode set --enable false
5. Reboot the ESXi host.
$ reboot
1. Verify that the NVIDIA AI Enterprise package installed and loaded correctly by
checking for the NVIDIA kernel driver in the list of loaded kernel modules.
[root@esxi:~] vmkload_mod -l | grep nvidia
nvidia 5 8420
2. If the NVIDIA driver is not listed in the output, check dmesg for any load-time errors
reported by the driver.
3. Verify that the NVIDIA kernel driver can successfully communicate with the NVIDIA
physical GPUs in your system by running the nvidia-smi command.
The nvidia-smi command is described in more detail in NVIDIA System Management
Interface nvidia-smi.
Running the nvidia-smi command should produce a listing of the GPUs in your platform.
[root@esxi:~] nvidia-smi
Fri Dec 23 17:56:22 2022
+------------------------------------------------------+
| NVIDIA-SMI 470.161.02 Driver Version: 470.161.02 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 00000000:05:00.0 Off | Off |
| N/A 25C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 00000000:06:00.0 Off | Off |
| N/A 24C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 On | 00000000:86:00.0 Off | Off |
| N/A 25C P8 25W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 On | 00000000:87:00.0 Off | Off |
| N/A 28C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
If nvidia-smi fails to report the expected output for all the NVIDIA GPUs in your system,
see Troubleshooting for troubleshooting steps.
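Comparing the listing against the GPUs you expect can also be scripted. The following is a rough Python sketch, assuming the table layout shown above. The function name, the regular expression, and the expected count are illustrative, and the sample text is a shortened copy of the rows from the example output.

```python
import re
from typing import List, Tuple

BUS_ID = r"[0-9A-Fa-f]{8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f]"
# First row of each GPU entry: index, name, persistence mode, PCI bus ID.
GPU_ROW = re.compile(r"^\|\s*(\d+)\s+(.+?)\s+(?:On|Off)\s*\|\s*(" + BUS_ID + r")\s")


def list_gpus(smi_output: str) -> List[Tuple[int, str, str]]:
    """Extract (index, name, PCI bus ID) for each GPU in nvidia-smi table output."""
    gpus = []
    for line in smi_output.splitlines():
        match = GPU_ROW.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


sample = """\
| 0 Tesla M60 On | 00000000:05:00.0 Off | Off |
| N/A 25C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
| 1 Tesla M60 On | 00000000:06:00.0 Off | Off |
| N/A 24C P8 24W / 150W | 13MiB / 8191MiB | 0% Default |
"""
expected_gpu_count = 2  # hypothetical: the number of GPUs installed in the host
found = list_gpus(sample)
print(found, len(found) == expected_gpu_count)
```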
Note: Ensure that you select the vCenter Server instance, not the vCenter Server VM.
Note:
Change the default graphics type before configuring vGPU. Output from the VM console
in the VMware vSphere Web Client is not available for VMs that are running vGPU.
Before changing the default graphics type, ensure that the ESXi host is running and that
all VMs on the host are powered off.
5. In the Edit Host Graphics Settings dialog box that opens, select Shared Direct and
click OK.
Note: In this dialog box, you can also change the allocation scheme for vGPU-enabled
VMs. For more information, see Modifying GPU Allocation Policy on VMware vSphere.
After you click OK, the default graphics type changes to Shared Direct.
6. Click the Graphics Devices tab to verify the configured type of each physical GPU on
which you want to configure vGPU.
The configured type of each physical GPU must be Shared Direct. For any physical
GPU for which the configured type is Shared, change the configured type as follows:
a). On the Graphics Devices tab, select the physical GPU and click the Edit icon.
b). In the Edit Graphics Device Settings dialog box that opens, select Shared Direct
and click OK.
7. Restart the ESXi host or stop and restart nv-hostengine on the ESXi host.
To stop and restart nv-hostengine, perform these steps:
a). Stop nv-hostengine.
[root@esxi:~] nv-hostengine -t
b). Wait for 1 second to allow nv-hostengine to stop.
c). Start nv-hostengine.
[root@esxi:~] nv-hostengine -d
8. In the Graphics Devices tab of the VMware vCenter Web UI, confirm that the active
type and the configured type of each physical GPU are Shared Direct.
After changing the default graphics type, configure vGPU as explained in Configuring a
vSphere VM with NVIDIA vGPU.
See also the following topics in the VMware vSphere documentation:
1. Open a command shell as the root user on your hypervisor host machine.
On all supported hypervisors, you can use secure shell (SSH) for this purpose.
Individual hypervisors may provide additional means for logging in. For details, refer to
the documentation for your hypervisor.
2. Determine whether MIG mode is enabled.
Use the nvidia-smi command for this purpose. By default, MIG mode is disabled.
This example shows that MIG mode is disabled on GPU 0.
Note: In the output from nvidia-smi, the NVIDIA A100 HGX 40GB GPU is referred to
as A100-SXM4-40GB.
$ nvidia-smi -i 0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.02 Driver Version: 470.161.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB On | 00000000:36:00.0 Off | 0 |
| N/A 29C P0 62W / 400W | 0MiB / 40537MiB | 6% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
3. If MIG mode is disabled, enable it.
$ nvidia-smi -i [gpu-ids] -mig 1
gpu-ids
A comma-separated list of GPU indexes, PCI bus IDs or UUIDs that specifies the
GPUs on which you want to enable MIG mode. If gpu-ids is omitted, MIG mode is
enabled on all GPUs on the system.
This example enables MIG mode on GPU 0.
$ nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:36:00.0
All done.
Note: If the GPU is being used by another process, this command fails and displays a
warning message that MIG mode for the GPU is in the pending enable state. In this
situation, stop all processes that are using the GPU and retry the command.
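The way the gpu-ids argument is assembled can be sketched in Python. This hypothetical helper only builds the argument list from the options described above and does not run anything. Passing the result to a process launcher on the host is left to the reader.

```python
from typing import List, Sequence


def mig_mode_command(enable: bool, gpu_ids: Sequence[str] = ()) -> List[str]:
    """Build the nvidia-smi argument list that sets MIG mode.

    gpu_ids may hold GPU indexes, PCI bus IDs, or UUIDs. When it is empty,
    the -i option is omitted, so the command applies to all GPUs.
    """
    command = ["nvidia-smi"]
    if gpu_ids:
        command += ["-i", ",".join(gpu_ids)]
    command += ["-mig", "1" if enable else "0"]
    return command


print(" ".join(mig_mode_command(True, ["0"])))   # enable MIG mode on GPU 0
print(" ".join(mig_mode_command(False)))         # disable MIG mode on all GPUs
```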
1. Open a command shell as the root user on your hypervisor host machine.
You can use secure shell (SSH) for this purpose.
2. Determine whether MIG mode is disabled.
Use the nvidia-smi command for this purpose. By default, MIG mode is disabled, but
might have previously been enabled.
This example shows that MIG mode is enabled on GPU 0.
Note: In the output from nvidia-smi, the NVIDIA A100 HGX 40GB GPU is
referred to as A100-SXM4-40GB.
$ nvidia-smi -i 0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.02 Driver Version: 470.161.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB Off | 00000000:36:00.0 Off | 0 |
| N/A 29C P0 62W / 400W | 0MiB / 40537MiB | 6% Default |
| | | Enabled |
+-------------------------------+----------------------+----------------------+
3. If MIG mode is enabled, disable it.
$ nvidia-smi -i [gpu-ids] -mig 0
gpu-ids
A comma-separated list of GPU indexes, PCI bus IDs or UUIDs that specifies the
GPUs on which you want to disable MIG mode. If gpu-ids is omitted, MIG mode is
disabled on all GPUs on the system.
This example disables MIG mode on GPU 0.
$ sudo nvidia-smi -i 0 -mig 0
Disabled MIG Mode for GPU 00000000:36:00.0
All done.
4. Confirm that MIG mode was disabled.
Use the nvidia-smi command for this purpose.
This example shows that MIG mode is disabled on GPU 0.
$ nvidia-smi -i 0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.02 Driver Version: 470.161.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB Off | 00000000:36:00.0 Off | 0 |
| N/A 29C P0 62W / 400W | 0MiB / 40537MiB | 6% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
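Reading the MIG M. field from this table can be automated. The sketch below is one possible parser, assuming the three-row layout per GPU shown in the examples. The function name and regular expressions are illustrative and are not part of nvidia-smi.

```python
import re
from typing import Dict, Optional

GPU_ROW = re.compile(r"^\|\s*(\d+)\s+\S")                    # first row of a GPU entry
MIG_ROW = re.compile(r"\|\s*(Enabled|Disabled|N/A)\s*\|$")   # row holding the MIG mode


def mig_modes(smi_output: str) -> Dict[int, str]:
    """Map each GPU index to the MIG mode shown in nvidia-smi table output."""
    modes: Dict[int, str] = {}
    current: Optional[int] = None
    for raw in smi_output.splitlines():
        line = raw.strip()
        if line.startswith("+"):       # a separator closes the current GPU entry
            current = None
            continue
        row = GPU_ROW.match(line)
        if row:
            current = int(row.group(1))
            continue
        mig = MIG_ROW.search(line)
        if mig and current is not None:
            modes[current] = mig.group(1)
    return modes


sample_output = """\
+-------------------------------+----------------------+----------------------+
| 0 A100-SXM4-40GB Off | 00000000:36:00.0 Off | 0 |
| N/A 29C P0 62W / 400W | 0MiB / 40537MiB | 6% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
"""
print(mig_modes(sample_output))
```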
CAUTION: Output from the VM console in the VMware vSphere Web Client is not available
for VMs that are running vGPU. Make sure that you have installed an alternate means of
accessing the VM (such as VMware Horizon or a VNC server) before you configure vGPU.
The VM console in vSphere Web Client becomes active again once the vGPU parameters
are removed from the VM’s configuration.
Note: If you are configuring a VM to use VMware vSGA, omit this task.
5. From the GPU Profile drop-down menu, choose the type of vGPU you want to
configure and click OK.
6. Ensure that VMs running vGPU have all their memory reserved:
a). Select Edit virtual machine settings from the vCenter Web UI.
b). Expand the Memory section and click Reserve all guest memory (All locked).
After you have configured a vSphere VM with a vGPU, start the VM. The VM console in
vSphere Web Client is not supported in this vGPU release. Therefore, use VMware Horizon
or VNC to access the VM’s desktop.
After the VM has booted, install the NVIDIA AI Enterprise graphics driver as explained in
Installing and Licensing NVIDIA AI Enterprise Components Required in a Guest VM.
‣ pciPassthru0.cfg.parameter
‣ pciPassthru1.cfg.parameter
parameter
The name of the vGPU plugin parameter that you want to set. For example, the
name of the vGPU plugin parameter for enabling unified memory is enable_uvm.
To enable unified memory for two vGPUs that are assigned to a VM, set
pciPassthru0.cfg.enable_uvm and pciPassthru1.cfg.enable_uvm to 1.
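The parameter naming pattern can be generated programmatically. In this short Python sketch, the helper name is hypothetical, the vGPUs are assumed to be numbered consecutively from 0 as in the list above, and the "key = value" print format is only for display.

```python
from typing import Dict


def vgpu_plugin_settings(parameter: str, value: str, vgpu_count: int) -> Dict[str, str]:
    """Build pciPassthruN.cfg.<parameter> entries for the first vgpu_count vGPUs."""
    return {f"pciPassthru{i}.cfg.{parameter}": value for i in range(vgpu_count)}


# Enable unified memory for two vGPUs assigned to one VM.
for key, value in vgpu_plugin_settings("enable_uvm", "1", 2).items():
    print(f"{key} = {value}")
```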
handling double-bit errors. However, not all GPUs, vGPU types, and hypervisor software
versions support ECC memory with NVIDIA vGPU.
On GPUs that support ECC memory with NVIDIA vGPU, ECC memory is supported with C-
series and Q-series vGPUs, but not with A-series and B-series vGPUs. Although A-series
and B-series vGPUs start on physical GPUs on which ECC memory is enabled, enabling
ECC with vGPUs that do not support it might incur some costs.
On physical GPUs that do not have HBM2 memory, the amount of frame buffer that is
usable by vGPUs is reduced. All types of vGPU are affected, not just vGPUs that support
ECC memory.
The effects of enabling ECC memory on a physical GPU are as follows:
‣ ECC memory is exposed as a feature on all supported vGPUs on the physical GPU.
‣ In VMs that support ECC memory, ECC memory is enabled, with the option to disable
ECC in the VM.
‣ ECC memory can be enabled or disabled for individual VMs. Enabling or disabling ECC
memory in a VM does not affect the amount of frame buffer that is usable by vGPUs.
GPUs based on the Pascal GPU architecture and later GPU architectures support ECC
memory with NVIDIA vGPU. To determine whether ECC memory is enabled for a GPU, run
nvidia-smi -q for the GPU.
Tesla M60 and M6 GPUs support ECC memory when used without GPU virtualization, but
NVIDIA vGPU does not support ECC memory with these GPUs. In graphics mode, these
GPUs are supplied with ECC memory disabled by default.
Some hypervisor software versions do not support ECC memory with NVIDIA vGPU.
If you are using a hypervisor software version or GPU that does not support ECC memory
with NVIDIA vGPU and ECC memory is enabled, NVIDIA vGPU fails to start. In this
situation, you must ensure that ECC memory is disabled on all GPUs if you are using
NVIDIA vGPU.
‣ For a physical GPU, perform this task from the hypervisor host.
‣ For a vGPU, perform this task from the VM to which the vGPU is assigned.
Note: ECC memory must be enabled on the physical GPU on which the vGPUs reside.
Before you begin, ensure that NVIDIA Virtual GPU Manager is installed on your hypervisor.
If you are changing ECC memory settings for a vGPU, also ensure that the NVIDIA AI
Enterprise graphics driver is installed in the VM to which the vGPU is assigned.
1. Use nvidia-smi to list the status of all physical GPUs or vGPUs, and check for ECC
noted as enabled.
# nvidia-smi -q
==============NVSMI LOG==============
Attached GPUs : 1
GPU 0000:02:00.0
[...]
Ecc Mode
Current : Enabled
Pending : Enabled
[...]
2. Change the ECC status to off for each GPU for which ECC is enabled.
‣ If you want to change the ECC status to off for all GPUs on your host machine or
vGPUs assigned to the VM, run this command:
# nvidia-smi -e 0
‣ If you want to change the ECC status to off for a specific GPU or vGPU, run this
command:
# nvidia-smi -i id -e 0
==============NVSMI LOG==============
Attached GPUs : 1
GPU 0000:02:00.0
[...]
Ecc Mode
Current : Disabled
Pending : Disabled
[...]
If you later need to enable ECC on your GPUs or vGPUs, follow the instructions in Enabling
ECC Memory.
‣ For a physical GPU, perform this task from the hypervisor host.
‣ For a vGPU, perform this task from the VM to which the vGPU is assigned.
Note: ECC memory must be enabled on the physical GPU on which the vGPUs reside.
Before you begin, ensure that NVIDIA Virtual GPU Manager is installed on your hypervisor.
If you are changing ECC memory settings for a vGPU, also ensure that the NVIDIA AI
Enterprise graphics driver is installed in the VM to which the vGPU is assigned.
1. Use nvidia-smi to list the status of all physical GPUs or vGPUs, and check for ECC
noted as disabled.
# nvidia-smi -q
==============NVSMI LOG==============
Attached GPUs : 1
GPU 0000:02:00.0
[...]
Ecc Mode
Current : Disabled
Pending : Disabled
[...]
2. Change the ECC status to on for each GPU or vGPU for which ECC is disabled.
‣ If you want to change the ECC status to on for all GPUs on your host machine or
vGPUs assigned to the VM, run this command:
# nvidia-smi -e 1
‣ If you want to change the ECC status to on for a specific GPU or vGPU, run this
command:
# nvidia-smi -i id -e 1
==============NVSMI LOG==============
Attached GPUs : 1
GPU 0000:02:00.0
[...]
Ecc Mode
Current : Enabled
Pending : Enabled
[...]
If you later need to disable ECC on your GPUs or vGPUs, follow the instructions in
Disabling ECC Memory.
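The ECC status check used in both procedures can be scripted. The Python sketch below extracts the Current and Pending values from text in the format of the nvidia-smi -q examples. The function name is hypothetical, and the parser assumes the two values directly follow the "Ecc Mode" line, as they do in the sample output.

```python
from typing import Dict, List


def ecc_modes(query_output: str) -> List[Dict[str, str]]:
    """Collect the Current and Pending ECC modes from `nvidia-smi -q` output."""
    results: List[Dict[str, str]] = []
    in_ecc_block = False
    for raw in query_output.splitlines():
        line = raw.strip()
        if line.lower().startswith("ecc mode"):
            results.append({})          # one entry per GPU or vGPU listed
            in_ecc_block = True
        elif in_ecc_block and ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            if key in ("Current", "Pending"):
                results[-1][key] = value.strip()
            else:
                in_ecc_block = False
        elif in_ecc_block:
            in_ecc_block = False        # any other line ends the ECC Mode block
    return results


sample_query = """\
Attached GPUs : 1
GPU 0000:02:00.0
[...]
Ecc Mode
Current : Enabled
Pending : Enabled
[...]
"""
print(ecc_modes(sample_query))
```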
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#nvidia-ai-enterprise
1. On the VM, install the vGPU software graphics driver for Linux that you downloaded
from the NVIDIA Licensing Portal.
2. License the NVIDIA vGPU.
3. Install NVIDIA Container Toolkit.
The following table lists the Docker pull commands for downloading other software that
is distributed as NGC container images through the NGC private registry.
1. Copy the NVIDIA AI Enterprise Linux driver package, for example
nvidia-linux-grid-470_470.161.03_amd64.deb, to the guest VM where you are installing the driver.
2. Log in to the guest VM as a user with sudo privileges.
3. Open a command shell and change to the directory that contains the NVIDIA AI
Enterprise Linux driver package.
4. From the command shell, run the command to install the package.
$ sudo apt-get install ./nvidia-linux-grid-470_470.161.03_amd64.deb
5. Verify that the NVIDIA driver is operational.
a). Reboot the system and log in.
b). After the system has rebooted, confirm that you can see your NVIDIA vGPU device
in the output from the nvidia-smi command.
$ nvidia-smi
‣ Ports 443 and 80 in your firewall or proxy must be open to allow HTTPS traffic
between a service instance and its licensed clients. These ports must be open for
both CLS instances and DLS instances.
Note: For DLS releases before DLS 1.1, ports 8081 and 8082 were also required to be
open to allow HTTPS traffic between a DLS instance and its licensed clients. Although
these ports are no longer required, they remain supported for backward compatibility.
The graphics driver creates a default location in which to store the client configuration
token on the client.
The process for configuring a licensed client is the same for CLS and DLS instances but
depends on the OS that is running on the client.
Note: You can create the /etc/nvidia/gridd.conf file by copying the supplied
template file /etc/nvidia/gridd.conf.template.
Note: You can also perform this step from NVIDIA X Server Settings. Before using
NVIDIA X Server Settings to perform this step, ensure that this option has been
enabled as explained in NVIDIA AI Enterprise Client Licensing User Guide.
# Possible values:
# 0 => for unlicensed state
# 1 => for NVIDIA vGPU
# 2 => for NVIDIA RTX Virtual Workstation
# 4 => for NVIDIA Virtual Compute Server
FeatureType=
...
3. Copy the client configuration token to the /etc/nvidia/ClientConfigToken
directory.
4. Ensure that the file access modes of the client configuration token allow the owner to
read, write, and execute the token, and the group and others only to read the token.
a). Determine the current file access modes of the client configuration token.
# ls -l client-configuration-token-directory
b). If necessary, change the mode of the client configuration token to 744.
# chmod 744 client-configuration-token-directory/client_configuration_token_*.tok
client-configuration-token-directory
The directory to which you copied the client configuration token in the previous
step.
5. Save your changes to the /etc/nvidia/gridd.conf file and close the file.
6. Restart the nvidia-gridd service.
The NVIDIA service on the client should now automatically obtain a license from the CLS
or DLS instance.
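The file-mode check in step 4 can also be scripted. The following Python sketch uses only the standard library and runs against a throwaway file rather than a real token. The function name and the demo file name are hypothetical, and a Linux guest is assumed.

```python
import os
import stat
import tempfile


def ensure_token_mode(token_path: str, mode: int = 0o744) -> str:
    """Set the token file to the given mode if needed and return its mode string."""
    current = stat.S_IMODE(os.stat(token_path).st_mode)
    if current != mode:
        os.chmod(token_path, mode)
    return stat.filemode(os.stat(token_path).st_mode)


# Demonstration on a temporary file standing in for the client configuration token.
with tempfile.TemporaryDirectory() as token_dir:
    demo_token = os.path.join(token_dir, "client_configuration_token_demo.tok")
    with open(demo_token, "w") as handle:
        handle.write("placeholder")
    demo_mode = ensure_token_mode(demo_token)
    print(demo_mode)
```

Mode 744 corresponds to read, write, and execute for the owner and read-only access for the group and others.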
Attached GPUs : 1
GPU 00000000:00:08.0
Product Name : Tesla T4
Product Brand : Grid
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : WDDM
Pending : WDDM
Serial Number : 0334018000638
GPU UUID : GPU-ba2310b6-95d1-802b-f96f-5865410fe517
Minor Number : N/A
‣ Docker 20.10 for your Linux distribution. For instructions, refer to Install Docker
Engine on Ubuntu in the Docker product manuals.
‣ The NVIDIA AI Enterprise graphics driver. For instructions, refer to Installing the
NVIDIA AI Enterprise Graphics Driver on Linux from a Debian Package.
Note: You do not need to install NVIDIA CUDA Toolkit on the hypervisor host.
1. Set up the GPG key and configure apt to use NVIDIA Container Toolkit packages in the
file /etc/apt/sources.list.d/nvidia-docker.list.
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
2. Download information from all configured sources about the latest versions of the
packages and install the nvidia-container-toolkit package.
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
3. Restart the Docker service.
$ sudo systemctl restart docker
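The distribution variable in step 1 is the concatenation of the ID and VERSION_ID fields of /etc/os-release. This Python sketch shows how that string, and the repository list URL built from it, are derived. The function names are hypothetical and the sample os-release text is an assumed Ubuntu 20.04 excerpt.

```python
import shlex


def distribution_id(os_release_text: str) -> str:
    """Return ID + VERSION_ID from os-release content, like `echo $ID$VERSION_ID`."""
    values = {}
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips the optional shell-style quotes
        values[key] = parts[0] if parts else ""
    return values.get("ID", "") + values.get("VERSION_ID", "")


def repo_list_url(distribution: str) -> str:
    """Build the repository list URL used in step 1 for a distribution string."""
    return f"https://nvidia.github.io/nvidia-docker/{distribution}/nvidia-docker.list"


sample_os_release = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="20.04"\n'
print(repo_list_url(distribution_id(sample_os_release)))
```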
The following table lists the Docker pull commands for downloading other software that
is distributed as NGC container images through the NVIDIA Enterprise Catalog.
a). To run ResNet-50 with the default FP32 precision, run this command:
# trtexec --duration=90 --workspace=1024 --percentile=99 --avgRuns=100 \
--deploy=ResNet50_N2.prototxt --batch=1 --output=prob
b). To run ResNet-50 with FP16 precision, add the --fp16 option:
# trtexec --duration=90 --workspace=1024 --percentile=99 --avgRuns=100 \
--deploy=ResNet50_N2.prototxt --batch=1 --output=prob --fp16
c). To run ResNet-50 with INT8 precision, add the --int8 option:
# trtexec --duration=90 --workspace=1024 --percentile=99 --avgRuns=100 \
--deploy=ResNet50_N2.prototxt --batch=1 --output=prob --int8
4. Press Ctrl+P, Ctrl+Q to exit the container runtime and return to the Linux command
shell.
1. Set up the GPG key and configure apt to use NVIDIA Container Toolkit packages in the
file /etc/apt/sources.list.d/nvidia-docker.list.
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
| sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
&& sudo apt-get update
2. Install the NVIDIA Container Toolkit packages and the packages on which it depends,
and restart Docker.
$ sudo apt-get install -y nvidia-docker2 \
&& sudo systemctl restart docker
3. Test the installation of the NVIDIA Container Toolkit on the VM.
‣ MIG-gpu-uuid/gpu-instance-id/compute-instance-id
gpu-uuid
The UUID of the physical GPU, for example, GPU-786035d5-1e85-11b2-9fec-
ac9c9a792daf.
gpu-instance-id
The index number of the GPU instance on which the vGPU resides, for example,
0 for the first GPU instance.
compute-instance-id
The index number of the compute instance within the GPU instance, for
example, 0 for the first compute instance.
This example sets NVIDIA_VISIBLE_DEVICES for compute instance 0 on
a MIG-enabled vGPU on GPU instance 0 of the physical GPU with UUID
GPU-786035d5-1e85-11b2-9fec-ac9c9a792daf.
NVIDIA_VISIBLE_DEVICES=MIG-GPU-786035d5-1e85-11b2-9fec-ac9c9a792daf/0/0
‣ gpu-device-index:mig-device-index
gpu-device-index
The index number of the physical GPU.
mig-device-index
The index number of the GPU instance.
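Both device-naming formats can be produced with a pair of small helpers. The Python sketch below uses hypothetical function names and only performs string formatting. It does not check that the named device exists.

```python
def mig_device_by_uuid(gpu_uuid: str, gpu_instance_id: int, compute_instance_id: int) -> str:
    """Format MIG-gpu-uuid/gpu-instance-id/compute-instance-id."""
    return f"MIG-{gpu_uuid}/{gpu_instance_id}/{compute_instance_id}"


def mig_device_by_index(gpu_device_index: int, mig_device_index: int) -> str:
    """Format gpu-device-index:mig-device-index."""
    return f"{gpu_device_index}:{mig_device_index}"


# The UUID from the example above, with GPU instance 0 and compute instance 0.
example_uuid = "GPU-786035d5-1e85-11b2-9fec-ac9c9a792daf"
print("NVIDIA_VISIBLE_DEVICES=" + mig_device_by_uuid(example_uuid, 0, 0))
print("NVIDIA_VISIBLE_DEVICES=" + mig_device_by_index(0, 0))
```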
Note:
Perform the tasks for configuring multinode scaling before performing the tasks in
Getting Started with NVIDIA AI Enterprise.
The procedures for configuring switches and NICs apply to NVIDIA Mellanox NICs and
switches. If you are using other makes of NICs and switches, consult the vendor's
documentation for the products that you are using.
You are free to choose how to run your training jobs in a cluster. For information about
the cluster architecture that can be used to run BERT training jobs, see Multi-node BERT
User Guide.
‣ The hypervisor hosts must be connected to a network switch that supports RoCE. For
best performance, NVIDIA recommends the NVIDIA Mellanox Spectrum switch.
‣ One GPU is required for each VM.
For best performance, NVIDIA recommends the NVIDIA A100 GPU.
‣ Each GPU on each hypervisor host must be paired with a NIC in the same NUMA node.
‣ Whole-server VM with two GPUs and two NICs across both NUMA nodes
‣ Per-socket VM with one GPU and one NIC paired on a single NUMA node
The hardware configuration of the server is as follows:
1. Determine the NUMA node to which the GPUs and NICs are attached.
a). Determine the NUMA node to which the GPUs are attached.
$ esxcli hardware pci list | grep -A 30 -B 10 NVIDIA
b). Determine the NUMA node to which the NICs are attached.
$ esxcli hardware pci list | grep -A 30 -B 10 Mellanox
The following output describes two GPUs. One GPU is attached to NUMA node 0 and
the other GPU is attached to NUMA node 1.
#GPU 1
0000:37:00.0
Address: 0000:37:00.0
Segment: 0x0000
Bus: 0x37
Slot: 0x00
Function: 0x0
VMkernel Name: vmgfx0
Vendor Name: NVIDIA Corporation
Device Name: NVIDIAA100-PCIE-40GB
#GPU 2
0000:86:00.0
Address: 0000:86:00.0
Segment: 0x0000
Bus: 0x86
Slot: 0x00
Function: 0x0
VMkernel Name: vmgfx1
Vendor Name: NVIDIA Corporation
Device Name: NVIDIAA100-PCIE-40GB
Configured Owner: VMkernel
Current Owner: VMkernel
Vendor ID: 0x10de
Device ID: 0x20f1
SubVendor ID: 0x10de
SubDevice ID: 0x145f
Device Class: 0x0302
Device Class Name: 3D controller
Programming Interface: 0x00
Revision ID: 0xa1
Interrupt Line: 0xff
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x3001
Module ID: 50
Module Name: nvidia
Chassis: 0
Physical Slot: 5
Slot Description: PCI-E Slot 5
Device Layer Bus Address: s00000005.00
Passthru Capable: true
Parent Device: PCI 0:133:0:0
Dependent Device: PCI 0:134:0:0
Reset Method: Bridge reset
3. With two GPUs and NICs in the VM across NUMA nodes, set the NUMA affinity in the
VM configuration to include both NUMA nodes 0 and 1.
numa.nodeAffinity = 0,1
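The affinity value follows from the NUMA nodes of the devices assigned to the VM. The Python sketch below derives the setting from a device-to-node mapping. The function name and device labels are hypothetical, and the mapping is assumed to have been read from the esxcli output by hand.

```python
from typing import Dict


def numa_node_affinity(device_numa_nodes: Dict[str, int]) -> str:
    """Build the numa.nodeAffinity line covering every NUMA node in use."""
    nodes = sorted(set(device_numa_nodes.values()))
    return "numa.nodeAffinity = " + ",".join(str(node) for node in nodes)


# Hypothetical device-to-node mappings for the two VM layouts.
whole_server_vm = {"gpu0": 0, "nic0": 0, "gpu1": 1, "nic1": 1}
per_socket_vm = {"gpu0": 0, "nic0": 0}
print(numa_node_affinity(whole_server_vm))
print(numa_node_affinity(per_socket_vm))
```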
1. Determine the NUMA node to which the GPUs and NICs are attached.
a). Determine the NUMA node to which the GPUs are attached.
$ esxcli hardware pci list | grep -A 30 -B 10 NVIDIA
b). Determine the NUMA node to which the NICs are attached.
$ esxcli hardware pci list | grep -A 30 -B 10 Mellanox
Perform this task from a host computer that has an Ethernet LAN connection to the
switch.
This example puts four NVIDIA ConnectX NICs into the vLAN with the identifier 111 as
access ports 1/1 - 1/4.
switch (config) # interface ethernet 1/1-1/4 switchport access vlan 111
6. Set the maximum transmission unit (MTU) frame size to 9216.
a). Disable all the ports related to the interface.
switch (config) # interface ethernet port-range shutdown
b). Set the MTU frame size for the NVIDIA ConnectX NICs in the created vLAN to
9216.
switch (config) # interface ethernet port-range mtu 9216
c). Enable all the ports related to the interface.
switch (config) # interface ethernet port-range no shutdown
7. If your switch is running Cumulus Linux, enable RoCE with Cumulus Linux.
When this option is set, VMware vSphere ESXi locates an ATS-capable pass-
through device, finds its parent switch or root port, and enables the ACS Direct
Translated bit.
During the installation process, OFED detects the ConnectX-6 NICs and updates the
firmware.
6. When the installation is complete, confirm that the versions of OFED and the firmware are correct.
a). Determine the OFED version.
$ dpkg -l | grep mlnx-ofed
b). Determine the firmware version.
$ cat /sys/class/infiniband/mlx5*/fw_ver
If the firmware is not updated, download the latest firmware, update the firmware
manually, and install the Mellanox OFED driver again.
7. Load the installed driver.
$ sudo /etc/init.d/openibd restart
1. Change the ATS configuration to enabled on each guest VM on the hypervisor host.
a). Start Mellanox software tools.
$ sudo mst start
b). Determine whether ATS is enabled.
$ sudo mlxconfig -d /dev/mst/mt4123_pciconf0 query | grep -i ATS
If the installed version of the firmware supports ATS, output similar to the
following example is displayed.
ATS_ENABLED False(0)
If no output is displayed, the installed version of the firmware does not support
ATS. In this situation, update to a version of the firmware that supports ATS.
c). If ATS is disabled, enable it.
$ sudo mlxconfig -d /dev/mst/mt4123_pciconf0 set ATS_ENABLED=true
Device #1:
----------
Device type: ConnectX6
Name: MCX653105A-HDA_Ax
Description: ConnectX-6 VPI adapter card; HDR IB (200Gb/s) and 200GbE;
single-port QSFP56; PCIe4.0 x16; tall bracket; ROHS R6
Device: /dev/mst/mt4123_pciconf0
Note:
To apply the changed ATS configuration setting, you must turn off the power to the
VMware vSphere ESXi host and turn the power back on again. Rebooting the host is
insufficient to apply this change.
If the installed version of the firmware supports ATS, output similar to the
following example is displayed.
ATS_ENABLED True(1)
d). Obtain detailed information about all PCI buses and devices in the VM and confirm
that the ATS capability of Mellanox ConnectX-6 device is shown as Enable+.
$ sudo lspci -vvv
...
Capabilities: [480 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
...
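The check in the preceding step can also be scripted against saved lspci -vvv output. The following is a minimal Python sketch, not part of any NVIDIA or Mellanox tool: the function name ats_enabled is hypothetical, and the sample text is copied from the example output above.

```python
def ats_enabled(lspci_text: str) -> bool:
    """Return True if any ATS capability block reports 'ATSCtl: Enable+'."""
    in_ats_block = False
    for line in lspci_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Capabilities:"):
            # A new capability block starts; remember whether it is the ATS one.
            in_ats_block = "Address Translation Service (ATS)" in stripped
        elif in_ats_block and stripped.startswith("ATSCtl:") and "Enable+" in stripped:
            return True
    return False


# Sample taken from the lspci example in this section.
SAMPLE = """\
Capabilities: [480 v1] Address Translation Service (ATS)
        ATSCap: Invalidate Queue Depth: 00
        ATSCtl: Enable+, Smallest Translation Unit: 00
"""

print(ats_enabled(SAMPLE))
```

The sketch reports only whether at least one device shows Enable+. On a VM with several ATS-capable devices, inspect the full output to confirm which device the line belongs to.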
You can modify a VM's NVIDIA vGPU configuration by removing the NVIDIA vGPU
configuration from the VM or by modifying the GPU allocation policy.
1. Right-click the VM in the vCenter Web UI and select Edit settings.
2. Select the Virtual Hardware tab.
3. Mouse over the PCI Device entry showing NVIDIA GRID vGPU and click on the (X) icon
to mark the device for removal.
4. Click OK to remove the device and update the VM settings.
depth-first
The depth-first allocation policy attempts to maximize the number of vGPUs running
on each physical GPU. Newly created vGPUs are placed on the physical GPU that can
support the new vGPU and that has the most vGPUs already resident on it. This policy
generally leads to higher density of vGPUs, particularly when different types of vGPUs
are being run, but may result in lower performance because it attempts to maximize
sharing of physical GPUs.
By default, VMware vSphere ESXi uses the breadth-first allocation policy.
If the default GPU allocation policy does not meet your requirements for performance or
density of vGPUs, you can change it.
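The difference between the two policies can be illustrated with a small simulation. The following Python sketch is a simplified model, not the hypervisor's implementation. It assumes that all vGPUs are of the same type, that every physical GPU has the same capacity, that ties go to the lowest GPU index, and that breadth-first is the mirror image of depth-first (it chooses the GPU with the fewest resident vGPUs). The function names are hypothetical.

```python
def choose_gpu(resident, capacity, policy):
    """Pick the index of the physical GPU for a new vGPU, or None if all are full.

    resident: number of vGPUs already resident on each physical GPU.
    capacity: maximum vGPUs of this type per physical GPU (assumed equal for all GPUs).
    """
    candidates = [i for i, n in enumerate(resident) if n < capacity]
    if not candidates:
        return None
    if policy == "depth-first":
        # Most vGPUs already resident; ties go to the lowest index (assumption).
        return max(candidates, key=lambda i: resident[i])
    if policy == "breadth-first":
        # Fewest vGPUs already resident (assumed mirror image of depth-first).
        return min(candidates, key=lambda i: resident[i])
    raise ValueError(f"unknown policy: {policy}")


def simulate(num_gpus, capacity, num_vgpus, policy):
    """Place num_vgpus one at a time and return the per-GPU vGPU counts."""
    resident = [0] * num_gpus
    for _ in range(num_vgpus):
        gpu = choose_gpu(resident, capacity, policy)
        if gpu is None:
            break  # no physical GPU can host another vGPU
        resident[gpu] += 1
    return resident


print(simulate(2, 4, 4, "depth-first"))    # [4, 0]
print(simulate(2, 4, 4, "breadth-first"))  # [2, 2]
```

With four vGPUs on two physical GPUs, the depth-first model fills one GPU completely, whereas the breadth-first model shares the load evenly, which matches the density and performance trade-off described above.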
5. In the Edit Host Graphics Settings dialog box that opens, select these options and
click OK.
a). If not already selected, select Shared Direct.
b). Select Group VMs on GPU until full.
After you click OK, the default graphics type changes to Shared Direct and the
allocation scheme for vGPU-enabled VMs is depth-first.
Note: vGPU migration is disabled for a VM for which any of the following NVIDIA CUDA
Toolkit features is enabled:
‣ Unified memory
‣ Debuggers
‣ Profilers
How to migrate a VM configured with vGPU depends on the hypervisor that you are using.
After migration, the vGPU type of the vGPU remains unchanged.
The time required for migration depends on the amount of frame buffer that the vGPU
has. Migration for a vGPU with a large amount of frame buffer is slower than for a vGPU
with a small amount of frame buffer.
‣ Your hosts are correctly configured for VMware vMotion. See Host Configuration for
vMotion in the VMware documentation.
‣ The prerequisites listed for all supported hypervisors in Migrating a VM Configured
with vGPU are met.
‣ NVIDIA vGPU migration is configured. See Configuring VMware vMotion with vGPU for
VMware vSphere.
1. Context-click the VM and from the menu that opens, choose Migrate.
2. For the type of migration, select Change compute resource only and click Next.
If you select Change both compute resource and storage, the time required for the
migration increases.
3. Select the destination host and click Next.
The destination host must have a physical GPU of the same type as the GPU where
the vGPU currently resides. Furthermore, the physical GPU must be capable of hosting
the vGPU. If these requirements are not met, no available hosts are listed.
4. Select the destination network and click Next.
5. Select the migration priority level and click Next.
6. Review your selections and click Finish.
For more information, see the following topics in the VMware documentation:
If you see this error, configure NVIDIA vGPU migration as explained in Configuring
VMware vMotion with vGPU for VMware vSphere.
If your version of VMware vSphere ESXi does not support vMotion for VMs configured
with NVIDIA vGPU, any attempt to migrate a VM with an NVIDIA vGPU fails and a window
containing the following error message is displayed:
Compatibility Issues
...
A required migration feature is not supported on the "Source" host 'host-name'.
For details about which VMware vSphere versions, NVIDIA GPUs, and guest OS releases
support suspend and resume, see NVIDIA AI Enterprise Release Notes.
‣ To suspend a VM, context-click the VM that you want to suspend, and from the
context menu that pops up, choose Power > Suspend.
‣ To resume a VM, context-click the VM that you want to resume, and from the context
menu that pops up, choose Power > Power On.
‣ The GPU instance is not being used by any other processes, such as CUDA
applications, monitoring applications, or the nvidia-smi command.
Perform this task in a guest VM command shell.
Note: If the GPU instance is being used by another process, this command fails. In this
situation, stop all processes that are using the GPU instance and retry the command.
gpu-instance-id
The GPU instance ID that specifies the GPU instance within which you want to
create the compute instance.
Note: If the GPU instance is being used by another process, this command fails. In this
situation, stop all processes that are using the GPU and retry the command.
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 0 0 0 | 1058MiB / 10235MiB | 28 0 | 2 0 1 0 0 |
| | 0MiB / 4096MiB | | |
+------------------+----------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
This example confirms that two MIG 1c.2g.10gb compute instances were created on
GPU instance 0.
$ nvidia-smi
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| 0 0 0 0 | 1058MiB / 10235MiB | 14 0 | 2 0 1 0 0 |
| | 0MiB / 4096MiB | | |
+------------------+ +-----------+-----------------------+
| 0 0 1 1 | | 14 0 | 2 0 1 0 0 |
| | | | |
+------------------+----------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
vgpu-id
A non-negative integer that identifies the vGPU assigned to a VM. For the first vGPU
assigned to a VM, vgpu-id is 0. For example, if two vGPUs are assigned to a VM and you
are enabling unified memory for both vGPUs, set pciPassthru0.cfg.enable_uvm and
pciPassthru1.cfg.enable_uvm to 1.
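The naming pattern described above can be sketched in a few lines of Python. The helper name uvm_settings is hypothetical. It only builds the parameter names and values for vgpu-id 0 onward and does not read or write any VM configuration.

```python
def uvm_settings(num_vgpus: int) -> dict:
    """Map each per-vGPU parameter name to "1" for vgpu-id 0 .. num_vgpus - 1."""
    if num_vgpus < 1:
        raise ValueError("a VM must have at least one vGPU assigned")
    return {
        f"pciPassthru{vgpu_id}.cfg.enable_uvm": "1"
        for vgpu_id in range(num_vgpus)
    }


# Two vGPUs assigned to the VM, unified memory enabled for both.
for name, value in uvm_settings(2).items():
    print(f"{name} = {value}")
```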
Note: Enabling profiling for a VM gives the VM access to the GPU’s global performance
counters, which may include activity from other VMs executing on the same GPU. Enabling
profiling for a VM also allows the VM to lock clocks on the GPU, which impacts all other
VMs executing on the same GPU.
Because NVIDIA CUDA Toolkit profilers can be used on only one VM at a time, you should
enable them for only one VM assigned a vGPU on a GPU. However, NVIDIA AI Enterprise
cannot enforce this requirement. If NVIDIA CUDA Toolkit profilers are enabled on more
than one VM assigned a vGPU on a GPU, profiling data is collected only for the first VM to
start the profiler.
NVIDIA AI Enterprise enables you to monitor the performance of physical GPUs and
virtual GPUs from the hypervisor and from within individual guest VMs.
‣ From a hypervisor command shell, such as the VMware ESXi host shell, nvidia-smi
reports management information for NVIDIA physical GPUs and virtual GPUs present
in the system.
‣ From a guest VM, nvidia-smi retrieves usage statistics for vGPUs or pass-through
GPUs that are assigned to the VM.
Gpu : 3 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats :
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats :
Active Sessions : 1
Average FPS : 227
Average Latency : 4403
[root@vgpu ~]#
Statistic Column
3D/Compute sm
Each reported percentage is the percentage of the physical GPU’s capacity that a vGPU
is using. For example, a vGPU that uses 20% of the GPU’s graphics engine’s capacity will
report 20%.
To modify the reporting frequency, use the -l or --loop option.
To limit monitoring to a subset of the GPUs on the platform, use the -i or --id option to
select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -u
# gpu vgpu sm mem enc dec
# Idx Id % % % %
0 11924 6 3 0 0
1 11903 8 3 0 0
2 11908 10 4 0 0
3 - - - - -
4 - - - - -
5 - - - - -
0 11924 6 3 0 0
1 11903 9 3 0 0
2 11908 10 4 0 0
3 - - - - -
4 - - - - -
5 - - - - -
0 11924 6 3 0 0
1 11903 8 3 0 0
2 11908 10 4 0 0
3 - - - - -
4 - - - - -
5 - - - - -
^C[root@vgpu ~]#
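Output in this format is straightforward to post-process. The following Python sketch is an illustration only and assumes the six-column layout shown in the example above. The function name parse_vgpu_utilization is hypothetical, and the sample rows are copied from that example.

```python
def parse_vgpu_utilization(text):
    """Parse `nvidia-smi vgpu -u` style rows into dictionaries.

    Header lines (starting with '#'), shell prompts, and GPUs that host
    no vGPU ('-' columns) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0].startswith("#") or fields[1] == "-":
            continue
        gpu, vgpu, sm, mem, enc, dec = (int(f) for f in fields)
        rows.append({"gpu": gpu, "vgpu": vgpu, "sm": sm,
                     "mem": mem, "enc": enc, "dec": dec})
    return rows


SAMPLE = """\
# gpu vgpu sm mem enc dec
# Idx Id % % % %
0 11924 6 3 0 0
1 11903 8 3 0 0
2 11908 10 4 0 0
3 - - - - -
"""

rows = parse_vgpu_utilization(SAMPLE)
busiest = max(rows, key=lambda r: r["sm"])
print(busiest["vgpu"], busiest["sm"])
```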
For each application on each vGPU, the usage statistics in the following table are reported
once every second. Each application is identified by its process ID and process name.
The table also shows the name of the column in the command output under which each
statistic is reported.
Statistic Column
3D/Compute sm
Each reported percentage is the percentage of the physical GPU’s capacity used by
an application running on a vGPU that resides on the physical GPU. For example, an
application that uses 20% of the GPU’s graphics engine’s capacity will report 20%.
To modify the reporting frequency, use the -l or --loop option.
To limit monitoring to a subset of the GPUs on the platform, use the -i or --id option to
select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -p
# GPU vGPU process process sm mem enc dec
# Idx Id Id name % % % %
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 32 25 0 0
1 257869 4432 FurMark.exe 16 12 0 0
1 257969 4552 FurMark.exe 48 37 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 16 12 0 0
1 257911 656 DolphinVS.exe 32 24 0 0
1 257969 4552 FurMark.exe 48 37 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 257869 4432 FurMark.exe 38 30 0 0
1 257911 656 DolphinVS.exe 19 14 0 0
1 257969 4552 FurMark.exe 38 30 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 257848 3220 Balls64.exe 16 12 0 0
1 257869 4432 FurMark.exe 16 12 0 0
1 257911 656 DolphinVS.exe 16 12 0 0
1 257969 4552 FurMark.exe 48 37 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 257911 656 DolphinVS.exe 32 25 0 0
1 257969 4552 FurMark.exe 64 50 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 16 12 0 0
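Because the same process appears in successive samples, a short script can summarize the output. The following Python sketch records the highest sm percentage seen for each process. It is an illustration only: it assumes the eight-column layout shown above and process names without spaces, and the function name peak_sm_by_process is hypothetical.

```python
def peak_sm_by_process(text):
    """Return {(vgpu_id, pid, name): highest sm % seen} from `nvidia-smi vgpu -p` style rows."""
    peaks = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 whitespace-separated fields (assumes no spaces in process names).
        if len(fields) != 8 or fields[0].startswith("#"):
            continue
        vgpu_id, pid, name, sm = int(fields[1]), int(fields[2]), fields[3], int(fields[4])
        key = (vgpu_id, pid, name)
        peaks[key] = max(sm, peaks.get(key, 0))
    return peaks


# Rows copied from the example output above.
SAMPLE = """\
# GPU vGPU process process sm mem enc dec
# Idx Id Id name % % % %
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 32 25 0 0
1 257869 4432 FurMark.exe 16 12 0 0
1 257969 4552 FurMark.exe 48 37 0 0
0 38127 1528 dwm.exe 0 0 0 0
1 37408 4232 DolphinVS.exe 16 12 0 0
"""

peaks = peak_sm_by_process(SAMPLE)
for (vgpu_id, pid, name), sm in peaks.items():
    print(vgpu_id, pid, name, sm)
```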
To monitor the encoder sessions for processes running on multiple vGPUs, run nvidia-smi
vgpu with the -es or --encodersessions option.
For each encoder session, the following statistics are reported once every second:
‣ GPU ID
‣ vGPU ID
‣ Encoder session ID
‣ PID of the process in the VM that created the encoder session
‣ Codec type, for example, H.264 or H.265
‣ Encode horizontal resolution
‣ Encode vertical resolution
‣ One-second trailing average encoded FPS
‣ One-second trailing average encode latency in microseconds
To modify the reporting frequency, use the -l or --loop option.
To limit monitoring to a subset of the GPUs on the platform, use the -i or --id option to
select one or more GPUs.
[root@vgpu ~]# nvidia-smi vgpu -es
# GPU vGPU Session Process Codec H V Average Average
# Idx Id Id Id Type Res Res FPS Latency(us)
1 21211 2 2308 H.264 1920 1080 424 1977
1 21206 3 2424 H.264 1920 1080 0 0
1 22011 1 3676 H.264 1920 1080 374 1589
1 21211 2 2308 H.264 1920 1080 360 807
1 21206 3 2424 H.264 1920 1080 325 1474
1 22011 1 3676 H.264 1920 1080 313 1005
1 21211 2 2308 H.264 1920 1080 329 1732
1 21206 3 2424 H.264 1920 1080 352 1415
1 22011 1 3676 H.264 1920 1080 434 1894
1 21211 2 2308 H.264 1920 1080 362 1818
1 21206 3 2424 H.264 1920 1080 296 1072
1 22011 1 3676 H.264 1920 1080 416 1994
1 21211 2 2308 H.264 1920 1080 444 1912
1 21206 3 2424 H.264 1920 1080 330 1261
1 22011 1 3676 H.264 1920 1080 436 1644
1 21211 2 2308 H.264 1920 1080 344 1500
1 21206 3 2424 H.264 1920 1080 393 1727
1 22011 1 3676 H.264 1920 1080 364 1945
1 21211 2 2308 H.264 1920 1080 555 1653
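The per-second samples can be averaged over a longer window with a short script. The following Python sketch computes the mean of the reported FPS values for each vGPU. It is an illustration only, assuming the nine-column layout shown above. The function name mean_fps_by_vgpu is hypothetical, and the sample is the first six rows of the example output.

```python
from collections import defaultdict


def mean_fps_by_vgpu(text):
    """Return {vgpu_id: mean of the 'Average FPS' column} for `nvidia-smi vgpu -es` style rows."""
    samples = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Data rows: GPU, vGPU, session, PID, codec, H res, V res, FPS, latency.
        if len(fields) != 9 or fields[0].startswith("#"):
            continue
        vgpu_id, fps = int(fields[1]), int(fields[7])
        samples[vgpu_id].append(fps)
    return {vgpu_id: sum(values) / len(values) for vgpu_id, values in samples.items()}


SAMPLE = """\
# GPU vGPU Session Process Codec H V Average Average
# Idx Id Id Id Type Res Res FPS Latency(us)
1 21211 2 2308 H.264 1920 1080 424 1977
1 21206 3 2424 H.264 1920 1080 0 0
1 22011 1 3676 H.264 1920 1080 374 1589
1 21211 2 2308 H.264 1920 1080 360 807
1 21206 3 2424 H.264 1920 1080 325 1474
1 22011 1 3676 H.264 1920 1080 313 1005
"""

means = mean_fps_by_vgpu(SAMPLE)
print(means)
```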
To view detailed information about the supported vGPU types, add the -v or --verbose
option:
[root@vgpu ~]# nvidia-smi vgpu -s -i 0 -v | less
GPU 00000000:83:00.0
vGPU Type ID : 0xb
Name : GRID M60-0B
Class : NVS
Max Instances : 16
Device ID : 0x13f210de
Sub System ID : 0x13f21176
FB Memory : 512 MiB
Display Heads : 2
Maximum X Resolution : 2560
Maximum Y Resolution : 1600
Frame Rate Limit : 45 FPS
GRID License : GRID-Virtual-PC,2.0;GRID-Virtual-WS,2.0;GRID-
Virtual-WS-Ext,2.0;Quadro-Virtual-DWS,5.0
vGPU Type ID : 0xc
Name : GRID M60-0Q
Class : Quadro
Max Instances : 16
Device ID : 0x13f210de
Sub System ID : 0x13f2114c
FB Memory : 512 MiB
Display Heads : 2
Maximum X Resolution : 2560
Maximum Y Resolution : 1600
Frame Rate Limit : 60 FPS
This property is a dynamic property that varies for each GPU depending on whether MIG
mode is enabled for the GPU.
‣ If MIG mode is not enabled for the GPU, or if the GPU does not support MIG, this
property reflects the number and type of vGPUs that are already running on the GPU.
‣ If no vGPUs are running on the GPU, all vGPU types that the GPU supports are
listed.
‣ If one or more vGPUs are running on the GPU, but the GPU is not fully loaded, only
the type of the vGPUs that are already running is listed.
‣ If the GPU is fully loaded, no vGPU types are listed.
‣ If MIG mode is enabled for the GPU, the result reflects the number and type of GPU
instances on which no vGPUs are already running.
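The rules for a GPU on which MIG mode is not enabled can be expressed as a small function. The following Python sketch models only that case and is not how nvidia-smi computes the list. The function name creatable_vgpu_types is hypothetical, and the example dictionary uses three of the A100 PCIe 40GB types and their per-GPU maximums from this guide.

```python
def creatable_vgpu_types(supported, running_type=None, running_count=0):
    """Model the non-MIG rules for which vGPU types can currently be created.

    supported: dict mapping vGPU type name -> maximum instances per physical GPU.
    running_type / running_count: the type and number of vGPUs already resident.
    """
    if running_count == 0:
        return sorted(supported)      # nothing running: every supported type is listed
    if running_count < supported[running_type]:
        return [running_type]         # partly loaded: only the resident type is listed
    return []                         # fully loaded: no types are listed


SUPPORTED = {"A100-40C": 1, "A100-20C": 2, "A100-10C": 4}

print(creatable_vgpu_types(SUPPORTED))
print(creatable_vgpu_types(SUPPORTED, "A100-20C", 1))
print(creatable_vgpu_types(SUPPORTED, "A100-20C", 2))
```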
To view detailed information about the vGPU types that can currently be created, add the
-v or --verbose option.
‣ 3D/Compute
‣ Memory controller
‣ Video encoder
‣ Video decoder
‣ Frame buffer usage
Other metrics normally present in a GPU are not applicable to a vGPU and are reported as
zero or N/A, depending on the tool that you are using.
‣ GPU
‣ Video encoder
‣ Video decoder
‣ Frame buffer
To use nvidia-smi to retrieve statistics for the total resource usage by all applications
running in the VM, run the following command:
nvidia-smi dmon
NVIDIA GPUs implement a best effort vGPU scheduler that aims to balance performance
across vGPUs. The best effort scheduler allows a vGPU to use GPU processing cycles that
are not being used by other vGPUs. Under some circumstances, a VM running a graphics-
intensive application may adversely affect the performance of graphics-light applications
running in other VMs.
To address this issue with the best effort vGPU scheduler, NVIDIA GPUs additionally
support equal share and fixed share vGPU schedulers. These schedulers impose a limit on
GPU processing cycles used by a vGPU, which prevents graphics-intensive applications
running in one VM from affecting the performance of graphics-light applications running
in other VMs. On GPUs that support multiple vGPU schedulers, you can select the vGPU
scheduler to use. You can also set the length of the time slice for the equal share and
fixed share vGPU schedulers.
Note: If you use the equal share or fixed share vGPU scheduler, the frame-rate limiter
(FRL) is disabled.
The best effort scheduler is the default scheduler for all supported GPU architectures.
‣ For workloads that require low latency, a shorter time slice is optimal. Typically, these
workloads are applications that must generate output at a fixed interval, such as
graphics applications that generate output at a frame rate of 60 FPS. These workloads
are sensitive to latency and should be allowed to run at least once per interval. A
shorter time slice reduces latency and improves responsiveness by causing the
scheduler to switch more frequently between VMs.
‣ For workloads that require maximum throughput, a longer time slice is optimal.
Typically, these workloads are applications that must complete their work as quickly as
possible and do not require responsiveness, such as CUDA applications. A longer time
slice increases throughput by preventing frequent switching between VMs.
Note: You can change the vGPU scheduling behavior only on GPUs that support multiple
vGPU schedulers, that is, GPUs based on NVIDIA GPU architectures after the Maxwell
architecture.
Type
Dword
Contents
Value            Meaning
0x00 (default)   Best effort scheduler
0x01             Equal share scheduler with the default time slice length
0x00TT0001       Equal share scheduler with a user-defined time slice length TT
0x11             Fixed share scheduler with the default time slice length
0x00TT0011       Fixed share scheduler with a user-defined time slice length TT
The default time slice length depends on the maximum number of vGPUs per physical
GPU allowed for the vGPU type.
TT
Two hexadecimal digits in the range 01 to 1E that set the length of the time slice in
milliseconds (ms) for the equal share and fixed share schedulers. The minimum length
is 1 ms and the maximum length is 30 ms.
If TT is 00, the length is set to the default length for the vGPU type.
If TT is greater than 1E, the length is set to 30 ms.
Examples
This example sets the vGPU scheduler to equal share scheduler with the default time
slice length.
RmPVMRL=0x01
This example sets the vGPU scheduler to equal share scheduler with a time slice that is 3
ms long.
RmPVMRL=0x00030001
This example sets the vGPU scheduler to fixed share scheduler with the default time slice
length.
RmPVMRL=0x11
This example sets the vGPU scheduler to fixed share scheduler with a time slice that is 24
(0x18) ms long.
RmPVMRL=0x00180011
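The values in these examples follow directly from the 0x00TT00PP layout described above, where PP selects the scheduler and TT is the time slice length in hexadecimal. The following Python sketch derives them. The function name rmpvmrl and the policy labels are hypothetical conveniences. The sketch rejects time slices outside 1 to 30 ms instead of relying on the clamping behavior described above.

```python
POLICY_BITS = {"best_effort": 0x00, "equal_share": 0x01, "fixed_share": 0x11}


def rmpvmrl(policy, time_slice_ms=None):
    """Build an RmPVMRL value; time_slice_ms=None selects the default time slice."""
    bits = POLICY_BITS[policy]
    if time_slice_ms is None:
        return bits
    if policy == "best_effort":
        raise ValueError("the time slice applies only to equal share and fixed share")
    if not 1 <= time_slice_ms <= 30:
        raise ValueError("time slice must be between 1 and 30 ms")
    # TT occupies bits 16-23 of the value; the scheduler bits occupy the low byte.
    return (time_slice_ms << 16) | bits


print(f"RmPVMRL=0x{rmpvmrl('equal_share', 3):08x}")   # RmPVMRL=0x00030001
print(f"RmPVMRL=0x{rmpvmrl('fixed_share', 24):08x}")  # RmPVMRL=0x00180011
```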
‣ BEST_EFFORT
‣ EQUAL_SHARE
‣ FIXED_SHARE
If the scheduling behavior is equal share or fixed share, the scheduler time slice in ms
is also displayed.
This example gets the scheduling behavior of the GPUs in a system in which the
behavior of one GPU is set to best effort, one GPU is set to equal share, and one GPU
is set to fixed share.
$ dmesg | grep NVRM | grep scheduler
2020-10-05T02:58:08.928Z cpu79:2100753)NVRM: GPU at 0000:3d:00.0 has software
scheduler DISABLED with policy BEST_EFFORT.
2020-10-05T02:58:09.818Z cpu79:2100753)NVRM: GPU at 0000:5e:00.0 has software
scheduler ENABLED with policy EQUAL_SHARE.
NVRM: Software scheduler timeslice set to 1 ms.
2020-10-05T02:58:12.115Z cpu79:2100753)NVRM: GPU at 0000:88:00.0 has software
scheduler ENABLED with policy FIXED_SHARE.
NVRM: Software scheduler timeslice set to 1 ms.
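To check many hosts, the log lines can be reduced to a per-GPU summary. The following Python sketch is an illustration that assumes the message wording shown in the example above. The function name scheduler_policies is hypothetical, and the sample text is copied from that example, including its line wrapping.

```python
import re

PATTERN = re.compile(
    r"GPU at (\S+) has software scheduler (?:ENABLED|DISABLED) with policy (\w+)\."
    r"(?: NVRM: Software scheduler timeslice set to (\d+) ms\.)?"
)


def scheduler_policies(dmesg_text):
    """Return {pci_address: (policy, time_slice_ms or None)} from NVRM scheduler messages."""
    flat = " ".join(dmesg_text.split())  # undo line wrapping before matching
    result = {}
    for pci, policy, slice_ms in PATTERN.findall(flat):
        result[pci] = (policy, int(slice_ms) if slice_ms else None)
    return result


SAMPLE = """\
2020-10-05T02:58:08.928Z cpu79:2100753)NVRM: GPU at 0000:3d:00.0 has software
scheduler DISABLED with policy BEST_EFFORT.
2020-10-05T02:58:09.818Z cpu79:2100753)NVRM: GPU at 0000:5e:00.0 has software
scheduler ENABLED with policy EQUAL_SHARE.
NVRM: Software scheduler timeslice set to 1 ms.
2020-10-05T02:58:12.115Z cpu79:2100753)NVRM: GPU at 0000:88:00.0 has software
scheduler ENABLED with policy FIXED_SHARE.
NVRM: Software scheduler timeslice set to 1 ms.
"""

policies = scheduler_policies(SAMPLE)
for pci, (policy, slice_ms) in policies.items():
    print(pci, policy, slice_ms)
```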
value
The value that sets the GPU scheduling policy and the length of the time slice that
you want, for example:
0x01
Sets the vGPU scheduling policy to equal share scheduler with the default time
slice length.
0x00030001
Sets the GPU scheduling policy to equal share scheduler with a time slice that is
3 ms long.
0x11
Sets the vGPU scheduling policy to fixed share scheduler with the default time
slice length.
0x00180011
Sets the GPU scheduling policy to fixed share scheduler with a time slice that is
24 (0x18) ms long.
For all supported values, see RmPVMRL Registry Key.
3. Reboot your hypervisor host machine.
Confirm that the scheduling behavior was changed as required as explained in Getting
the Current Time-Sliced vGPU Scheduling Behavior for All GPUs.
‣ On VMware vSphere, pipe the output of lspci to the grep command to display
information only for NVIDIA GPUs.
# lspci | grep NVIDIA
The NVIDIA GPU listed in this example has the PCI domain 0000 and BDF 86:00.0.
0000:86:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
3. Use the module parameter NVreg_RegistryDwordsPerDevice to set the pci and
RmPVMRL registry keys for each GPU.
-p "NVreg_RegistryDwordsPerDevice=pci=pci-domain:pci-bdf;RmPVMRL=value\
[;pci=pci-domain:pci-bdf;RmPVMRL=value...]"
This example changes the scheduling behavior of a single GPU. The command sets
the scheduling policy of the GPU at PCI domain 0000 and BDF 15:00.0 to fixed share
scheduler with a time slice that is 24 (0x18) ms long.
# esxcli system module parameters set -m nvidia -p \
"NVreg_RegistryDwordsPerDevice=pci=0000:15:00.0;RmPVMRL=0x00180011"
4. Reboot your hypervisor host machine.
Confirm that the scheduling behavior was changed as required as explained in Getting
the Current Time-Sliced vGPU Scheduling Behavior for All GPUs.
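When several GPUs need different settings, the string passed to the -p option in step 3 can be assembled programmatically. The following Python sketch only builds that string from the syntax shown in step 3 and does not run esxcli. The function name registry_dwords_per_device is hypothetical, and the second PCI address in the usage example is illustrative.

```python
def registry_dwords_per_device(settings):
    """Build the NVreg_RegistryDwordsPerDevice string.

    settings: list of (pci_address, rmpvmrl_value) string pairs,
              for example ("0000:15:00.0", "0x00180011").
    """
    if not settings:
        raise ValueError("at least one GPU setting is required")
    parts = [f"pci={pci};RmPVMRL={value}" for pci, value in settings]
    return "NVreg_RegistryDwordsPerDevice=" + ";".join(parts)


# One GPU: fixed share scheduler with a 24 ms time slice.
print(registry_dwords_per_device([("0000:15:00.0", "0x00180011")]))

# Two GPUs with different policies (the second address is illustrative).
print(registry_dwords_per_device([("0000:15:00.0", "0x00180011"),
                                  ("0000:86:00.0", "0x01")]))
```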
This chapter describes basic troubleshooting steps for NVIDIA vGPU and how to collect
debug information when filing a bug report.
The following table provides links to additional information about each application or
framework in NVIDIA AI Enterprise.
NVIDIA vGPU is available as a licensed product on supported NVIDIA GPUs. For a list
of recommended server platforms and supported GPUs, consult the release notes for
supported hypervisors at NVIDIA AI Enterprise Documentation.
MIG-Backed C-Series Virtual GPU Types for NVIDIA A100 PCIe 40GB
Required license edition: vCS or vWS
For details of GPU instance profiles, see NVIDIA Multi-Instance GPU User Guide.
Time-Sliced C-Series Virtual GPU Types for NVIDIA A100 PCIe 40GB
Required license edition: vCS or vWS
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
A100-40C          Training Workloads   40960              1                      1                        4096×2160¹                  1
A100-20C          Training Workloads   20480              2                      2                        4096×2160¹                  1
A100-10C          Training Workloads   10240              4                      4                        4096×2160¹                  1
A100-8C           Training Workloads   8192               5                      5                        4096×2160¹                  1
A100-5C           Inference Workloads  5120               8                      8                        4096×2160¹                  1
A100-4C           Inference Workloads  4096               10                     10                       4096×2160¹                  1
MIG-Backed C-Series Virtual GPU Types for NVIDIA A100 HGX 40GB
Required license edition: vCS or vWS
For details of GPU instance profiles, see NVIDIA Multi-Instance GPU User Guide.
Time-Sliced C-Series Virtual GPU Types for NVIDIA A100 HGX 40GB
Required license edition: vCS or vWS
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
A100X-40C         Training Workloads   40960              1                      1                        4096×2160¹                  1
A100X-20C         Training Workloads   20480              2                      2                        4096×2160¹                  1
A100X-10C         Training Workloads   10240              4                      4                        4096×2160¹                  1
A100X-8C          Training Workloads   8192               5                      5                        4096×2160¹                  1
A100X-5C          Inference Workloads  5120               8                      8                        4096×2160¹                  1
A100X-4C          Inference Workloads  4096               10                     10                       4096×2160¹                  1
MIG-Backed C-Series Virtual GPU Types for NVIDIA A100 PCIe 80GB
Required license edition: vCS or vWS
For details of GPU instance profiles, see NVIDIA Multi-Instance GPU User Guide.
Time-Sliced C-Series Virtual GPU Types for NVIDIA A100 PCIe 80GB
Required license edition: vCS or vWS
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
A100D-80C         Training Workloads   81920              1                      1                        4096×2160¹                  1
A100D-40C         Training Workloads   40960              2                      2                        4096×2160¹                  1
A100D-20C         Training Workloads   20480              4                      4                        4096×2160¹                  1
A100D-16C         Inference Workloads  16384              5                      5                        4096×2160¹                  1
A100D-10C         Training Workloads   10240              8                      8                        4096×2160¹                  1
A100D-8C          Training Workloads   8192               10                     10                       4096×2160¹                  1
A100D-4C          Inference Workloads  4096               20                     20                       4096×2160¹                  1
MIG-Backed C-Series Virtual GPU Types for NVIDIA A100 HGX 80GB
Required license edition: vCS or vWS
For details of GPU instance profiles, see NVIDIA Multi-Instance GPU User Guide.
Time-Sliced C-Series Virtual GPU Types for NVIDIA A100 HGX 80GB
Required license edition: vCS or vWS
These vGPU types support a single display with a fixed maximum resolution.
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
A100DX-80C        Training Workloads   81920              1                      1                        4096×2160¹                  1
A100DX-40C        Training Workloads   40960              2                      2                        4096×2160¹                  1
A100DX-20C        Training Workloads   20480              4                      4                        4096×2160¹                  1
A100DX-16C        Inference Workloads  16384              5                      5                        4096×2160¹                  1
A100DX-10C        Training Workloads   10240              8                      8                        4096×2160¹                  1
A100DX-8C         Training Workloads   8192               10                     10                       4096×2160¹                  1
A100DX-4C         Inference Workloads  4096               20                     20                       4096×2160¹                  1
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
A40-48C           Training Workloads   49152              1                      1                        4096×2160¹                  1
A40-24C           Training Workloads   24576              2                      2                        4096×2160¹                  1
A40-16C           Training Workloads   16384              3                      3                        4096×2160¹                  1
A40-12C           Training Workloads   12288              4                      4                        4096×2160¹                  1
A40-8C            Training Workloads   8192               6                      6                        4096×2160¹                  1
A40-6C            Training Workloads   6144               8                      8                        4096×2160¹                  1
A40-4C            Inference Workloads  4096               12²                    12                       4096×2160¹                  1
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
A30-24C           Training Workloads   24576              1                      1                        4096×2160¹                  1
A30-12C           Training Workloads   12288              2                      2                        4096×2160¹                  1
A30-8C            Training Workloads   8192               3                      3                        4096×2160¹                  1
A30-6C            Inference Workloads  6144               4                      4                        4096×2160¹                  1
A30-4C            Inference Workloads  4096               6                      6                        4096×2160¹                  1
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
A16-16C           Training Workloads   16384              1                      4                        4096×2160¹                  1
A16-8C            Training Workloads   8192               2                      8                        4096×2160¹                  1
A16-4C            Inference Workloads  4096               4                      16                       4096×2160¹                  1
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
A10-24C           Training Workloads   24576              1                      1                        4096×2160¹                  1
A10-12C           Training Workloads   12288              2                      2                        4096×2160¹                  1
A10-8C            Training Workloads   8192               3                      3                        4096×2160¹                  1
A10-6C            Training Workloads   6144               4                      4                        4096×2160¹                  1
A10-4C            Inference Workloads  4096               6                      6                        4096×2160¹                  1
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
RTXA6000-48C      Training Workloads   49152              1                      1                        4096×2160¹                  1
RTXA6000-24C      Training Workloads   24576              2                      2                        4096×2160¹                  1
RTXA6000-16C      Training Workloads   16384              3                      3                        4096×2160¹                  1
RTXA6000-12C      Training Workloads   12288              4                      4                        4096×2160¹                  1
RTXA6000-8C       Training Workloads   8192               6                      6                        4096×2160¹                  1
RTXA6000-6C       Training Workloads   6144               8                      8                        4096×2160¹                  1
RTXA6000-4C       Inference Workloads  4096               12²                    12                       4096×2160¹                  1
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
RTXA5000-24C      Training Workloads   24576              1                      1                        4096×2160¹                  1
RTXA5000-12C      Training Workloads   12288              2                      2                        4096×2160¹                  1
RTXA5000-8C       Training Workloads   8192               3                      3                        4096×2160¹                  1
RTXA5000-6C       Training Workloads   6144               4                      4                        4096×2160¹                  1
RTXA5000-4C       Inference Workloads  4096               6                      6                        4096×2160¹                  1
Virtual GPU Type  Intended Use Case    Frame Buffer (MB)  Maximum vGPUs per GPU  Maximum vGPUs per Board  Maximum Display Resolution  Virtual Displays per vGPU
T4-16C            Training Workloads   16384              1                      1                        4096×2160¹                  1
T4-8C             Training Workloads   8192               2                      2                        4096×2160¹                  1
T4-4C             Inference Workloads  4096               4                      4                        4096×2160¹                  1
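In the time-sliced C-series tables in this appendix, the number in the vGPU type name is the frame buffer size in gigabytes, and the maximum number of vGPUs per GPU equals the physical GPU's frame buffer divided by the vGPU's frame buffer, rounded down. The following Python sketch reproduces that arithmetic. It is an observation about the listed values, not a documented rule. The function names are hypothetical, and the physical frame buffer sizes used in the example are taken from the largest vGPU type in each table.

```python
import re


def parse_vgpu_type(name):
    """Split a name such as 'A100D-16C' into (board, frame_buffer_mb, series)."""
    match = re.fullmatch(r"(\w+)-(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"unrecognized vGPU type name: {name}")
    board, gigabytes, series = match.groups()
    return board, int(gigabytes) * 1024, series


def max_vgpus_per_gpu(name, gpu_frame_buffer_mb):
    """Integer division of the physical frame buffer by the vGPU frame buffer."""
    _, vgpu_mb, _ = parse_vgpu_type(name)
    return gpu_frame_buffer_mb // vgpu_mb


print(parse_vgpu_type("A100D-16C"))           # ('A100D', 16384, 'C')
print(max_vgpus_per_gpu("A100-8C", 40960))    # 5
print(max_vgpus_per_gpu("A40-4C", 49152))     # 12
print(max_vgpus_per_gpu("A100D-4C", 81920))   # 20
```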
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without
notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise
agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects
to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual
obligations are formed either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in
applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental
damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use
is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each
product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained
in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in
order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA
product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related
to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this
document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document.
Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or
a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights
of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full
compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS
(TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR
OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND
FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING
WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF
THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the
products described herein shall be limited in accordance with the Terms of Sale for the product.
VESA DisplayPort
DisplayPort and DisplayPort Compliance Logo, DisplayPort Compliance Logo for Dual-mode Sources, and DisplayPort Compliance Logo for Active Cables are
trademarks owned by the Video Electronics Standards Association in the United States and other countries.
HDMI
HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of HDMI Licensing LLC.
OpenCL
OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.
Trademarks
NVIDIA, the NVIDIA logo, NVIDIA Maxwell, NVIDIA Pascal, NVIDIA Turing, NVIDIA Volta, Quadro, and Tesla are trademarks or registered trademarks of NVIDIA
Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are
associated.
Copyright
© 2024 NVIDIA Corporation & affiliates. All rights reserved.