
Release Notes

Release 12.1

NVIDIA

Mar 16, 2023


Contents

1 CUDA Toolkit Major Component Versions

2 New Features
  2.1 General CUDA
  2.2 CUDA Compilers
  2.3 CUDA Developer Tools

3 Deprecated or Dropped Features

4 Known Issues
  4.1 General CUDA Known Issues
  4.2 CUDA Compiler Known Issues

5 CUDA Libraries
  5.1 cuBLAS Library
    5.1.1 cuBLAS: Release 12.0 Update 1
    5.1.2 cuBLAS: Release 12.0
  5.2 cuFFT Library
    5.2.1 cuFFT: Release 12.1
    5.2.2 cuFFT: Release 12.0 Update 1
    5.2.3 cuFFT: Release 12.0
  5.3 cuSPARSE Library
    5.3.1 cuSPARSE: Release 12.0
  5.4 Math Library
    5.4.1 CUDA Math: Release 12.1
    5.4.2 CUDA Math: Release 12.0
  5.5 NVIDIA Performance Primitives (NPP)
    5.5.1 NPP: Release 12.0
  5.6 nvJPEG Library
    5.6.1 nvJPEG: Release 12.0

6 Notices
  6.1 Notice
  6.2 OpenCL
  6.3 Trademarks


NVIDIA CUDA Toolkit Release Notes

The Release Notes for the CUDA Toolkit.


The release notes for the NVIDIA® CUDA® Toolkit can be found online at https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html.

Note: The release notes have been reorganized into two major sections: the general CUDA release
notes, and the CUDA libraries release notes including historical information for 12.x releases.

Chapter 1. CUDA Toolkit Major Component Versions

CUDA Components: Starting with CUDA 11, the various components in the toolkit are versioned independently. For CUDA 12.1, the table below indicates the versions:

Table 1. CUDA 12.1 Component Versions

Component Name | Version Information | Supported Architectures | Supported Platforms
CUDA C++ Core Compute Libraries | Thrust 2.0.1; CUB 2.0.1; libcu++ 1.9.0; Cooperative Groups 12.0.0 | x86_64, POWER, aarch64-jetson | Linux, Windows
CUDA Compatibility | 12.1.32432504 | x86_64, POWER, aarch64-jetson | Linux, Windows
CUDA Runtime (cudart) | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
cuobjdump | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows
CUPTI | 12.1.62 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA cuxxfilt (demangler) | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows
CUDA Demo Suite | 12.1.55 | x86_64 | Linux, Windows
CUDA GDB | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, WSL
CUDA Nsight Eclipse Plugin | 12.1.55 | x86_64, POWER | Linux
CUDA NVCC | 12.1.66 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA nvdisasm | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows
CUDA NVML Headers | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA nvprof | 12.1.55 | x86_64, POWER | Linux, Windows
CUDA nvprune | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA NVRTC | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
NVTX | 12.1.66 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA NVVP | 12.1.55 | x86_64, POWER | Linux, Windows
CUDA OpenCL | 12.1.56 | x86_64 | Linux, Windows
CUDA Profiler API | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA Compute Sanitizer API | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA cuBLAS | 12.1.0.26 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA cuDLA | 12.1.55 | aarch64-jetson | Linux
CUDA cuFFT | 11.0.2.4 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA cuFile | 1.6.0.25 | x86_64 | Linux
CUDA cuRAND | 10.3.2.56 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA cuSOLVER | 11.4.4.55 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA cuSPARSE | 12.0.2.55 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA NPP | 12.0.2.50 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA nvJitLink | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA nvJPEG | 12.1.0.39 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
CUDA NVVM Samples | 12.1.55 | x86_64, POWER, aarch64-jetson | Linux, Windows
Nsight Compute | 2023.1.0.15 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL (Windows 11)
Nsight Systems | 2023.1.2.43 | x86_64, POWER, aarch64-jetson | Linux, Windows, WSL
Nsight Visual Studio Edition (VSE) | 2023.1.0.23041 | x86_64 (Windows) | Windows
nvidia_fs* | 2.15.1 | x86_64, aarch64-jetson | Linux
Visual Studio Integration | 12.1.55 | x86_64 (Windows) | Windows
NVIDIA Linux Driver | 530.30.02 | x86_64, POWER, aarch64-jetson | Linux
NVIDIA Windows Driver | 531.14 | x86_64 (Windows) | Windows, WSL

* nvidia_fs is only available on select Linux distros.

CUDA Driver: Running a CUDA application requires a system with at least one CUDA-capable GPU and a driver that is compatible with the CUDA Toolkit. See Table 3. For more information on the various GPU products that are CUDA capable, visit https://developer.nvidia.com/cuda-gpus.

Each release of the CUDA Toolkit requires a minimum version of the CUDA driver. The CUDA driver is backward compatible, meaning that applications compiled against a particular version of CUDA will continue to work on subsequent (later) driver releases.

More information on compatibility can be found at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#cuda-compatibility-and-upgrades.

Note: Starting with CUDA 11.0, the toolkit components are individually versioned, and the toolkit itself is versioned as shown in the table below.

The minimum required driver version for CUDA minor version compatibility is shown below. CUDA minor version compatibility is described in detail in https://docs.nvidia.com/deploy/cuda-compatibility/index.html.


Table 2. CUDA Toolkit and Minimum Required Driver Version for CUDA Minor Version Compatibility

CUDA Toolkit | Linux x86_64 Minimum Required Driver Version* | Windows x86_64 Minimum Required Driver Version*
CUDA 12.1.x | >=525.60.13 | >=527.41
CUDA 12.0.x | >=525.60.13 | >=527.41
CUDA 11.8.x | >=450.80.02 | >=452.39
CUDA 11.7.x | >=450.80.02 | >=452.39
CUDA 11.6.x | >=450.80.02 | >=452.39
CUDA 11.5.x | >=450.80.02 | >=452.39
CUDA 11.4.x | >=450.80.02 | >=452.39
CUDA 11.3.x | >=450.80.02 | >=452.39
CUDA 11.2.x | >=450.80.02 | >=452.39
CUDA 11.1 (11.1.0) | >=450.80.02 | >=452.39
CUDA 11.0 (11.0.3) | >=450.36.06** | >=451.22**

* Using a Minimum Required Version that is different from Toolkit Driver Version could be allowed in
compatibility mode – please read the CUDA Compatibility Guide for details.
** CUDA 11.0 was released with an earlier driver version, but by upgrading to Tesla Recommended
Drivers 450.80.02 (Linux) / 452.39 (Windows), minor version compatibility is possible across the CUDA
11.x family of toolkits.
The version of the development NVIDIA GPU Driver packaged in each CUDA Toolkit release is shown
below.

Table 3. CUDA Toolkit and Corresponding Driver Versions

CUDA Toolkit | Linux x86_64 Driver Version | Windows x86_64 Driver Version
CUDA 12.1 GA | >=530.30.02 | >=531.14
CUDA 12.0 Update 1 | >=525.85.12 | >=528.33
CUDA 12.0 GA | >=525.60.13 | >=527.41
CUDA 11.8 GA | >=520.61.05 | >=520.06
CUDA 11.7 Update 1 | >=515.48.07 | >=516.31
CUDA 11.7 GA | >=515.43.04 | >=516.01
CUDA 11.6 Update 2 | >=510.47.03 | >=511.65
CUDA 11.6 Update 1 | >=510.47.03 | >=511.65
CUDA 11.6 GA | >=510.39.01 | >=511.23
CUDA 11.5 Update 2 | >=495.29.05 | >=496.13
CUDA 11.5 Update 1 | >=495.29.05 | >=496.13
CUDA 11.5 GA | >=495.29.05 | >=496.04
CUDA 11.4 Update 4 | >=470.82.01 | >=472.50
CUDA 11.4 Update 3 | >=470.82.01 | >=472.50
CUDA 11.4 Update 2 | >=470.57.02 | >=471.41
CUDA 11.4 Update 1 | >=470.57.02 | >=471.41
CUDA 11.4.0 GA | >=470.42.01 | >=471.11
CUDA 11.3.1 Update 1 | >=465.19.01 | >=465.89
CUDA 11.3.0 GA | >=465.19.01 | >=465.89
CUDA 11.2.2 Update 2 | >=460.32.03 | >=461.33
CUDA 11.2.1 Update 1 | >=460.32.03 | >=461.09
CUDA 11.2.0 GA | >=460.27.03 | >=460.82
CUDA 11.1.1 Update 1 | >=455.32 | >=456.81
CUDA 11.1 GA | >=455.23 | >=456.38
CUDA 11.0.3 Update 1 | >=450.51.06 | >=451.82
CUDA 11.0.2 GA | >=450.51.05 | >=451.48
CUDA 11.0.1 RC | >=450.36.06 | >=451.22
CUDA 10.2.89 | >=440.33 | >=441.22
CUDA 10.1 (10.1.105 general release, and updates) | >=418.39 | >=418.96
CUDA 10.0.130 | >=410.48 | >=411.31
CUDA 9.2 (9.2.148 Update 1) | >=396.37 | >=398.26
CUDA 9.2 (9.2.88) | >=396.26 | >=397.44
CUDA 9.1 (9.1.85) | >=390.46 | >=391.29
CUDA 9.0 (9.0.76) | >=384.81 | >=385.54
CUDA 8.0 (8.0.61 GA2) | >=375.26 | >=376.51
CUDA 8.0 (8.0.44) | >=367.48 | >=369.30
CUDA 7.5 (7.5.16) | >=352.31 | >=353.66
CUDA 7.0 (7.0.28) | >=346.46 | >=347.62

For convenience, the NVIDIA driver is installed as part of the CUDA Toolkit installation. Note that this driver is for development purposes and is not recommended for use in production with Tesla GPUs. For running CUDA applications in production with Tesla GPUs, it is recommended to download the latest driver for Tesla GPUs from the NVIDIA driver downloads site at https://www.nvidia.com/drivers.


During the installation of the CUDA Toolkit, the installation of the NVIDIA driver may be skipped on Windows (when using the interactive or silent installation) or on Linux (by using meta packages).

For more information on customizing the install process on Windows, see https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#install-cuda-software.

For meta packages on Linux, see https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-metas.



Chapter 2. New Features

This section lists new general CUDA and CUDA compiler features.

2.1. General CUDA


▶ New meta-packages for Linux installation.
▶ cuda-toolkit
▶ Installs all CUDA Toolkit packages required to develop CUDA applications.
▶ Handles upgrading to the latest version of CUDA when it’s released.
▶ Does not include the driver.
▶ cuda-toolkit-12
▶ Installs all CUDA Toolkit packages required to develop CUDA applications.
▶ Handles upgrading to the next 12.x version of CUDA when it’s released.
▶ Does not include the driver.
▶ A new CUDA API to enable mini core dumps programmatically is now available. Refer to https://docs.nvidia.com/cuda/cuda-gdb/index.html#gpu-core-dump-support and https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__COREDUMP.html#group__CUDA__COREDUMP for more information.
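The following is a minimal sketch of the programmatic core-dump control described in the bullet above. It assumes the driver API entry point cuCoredumpSetAttributeGlobal and the attributes CU_COREDUMP_ENABLE_ON_EXCEPTION and CU_COREDUMP_FILE from the CUDA_COREDUMP module linked above; treat the exact names and value types as assumptions and confirm them against the CUDA 12.1 driver API documentation.

    // coredump_sketch.cu -- hedged sketch, not an authoritative example.
    #include <cuda.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        // The driver API must be initialized before core-dump attributes are set.
        if (cuInit(0) != CUDA_SUCCESS) {
            std::fprintf(stderr, "cuInit failed\n");
            return 1;
        }

        // Assumed attribute: request a GPU core dump whenever a device exception occurs.
        bool enable = true;
        size_t size = sizeof(enable);
        CUresult rc = cuCoredumpSetAttributeGlobal(CU_COREDUMP_ENABLE_ON_EXCEPTION,
                                                   &enable, &size);
        if (rc != CUDA_SUCCESS) {
            std::fprintf(stderr, "enabling GPU core dumps failed (CUresult %d)\n", (int)rc);
            return 1;
        }

        // Optionally name the dump file; "app_gpu.coredump" is a hypothetical path.
        const char *path = "app_gpu.coredump";
        size_t pathLen = std::strlen(path) + 1;
        cuCoredumpSetAttributeGlobal(CU_COREDUMP_FILE, (void *)path, &pathLen);

        // ... create a context / run kernels as usual; a dump is written on exception ...
        return 0;
    }

Compile with nvcc and link against the CUDA driver library (for example, nvcc coredump_sketch.cu -lcuda). The global settings are intended to be applied before CUDA contexts are created.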

2.2. CUDA Compilers


▶ NVCC has added support for the following host compilers: GCC 12.2, NVC++ 22.11, Clang 15.0, and VS2022 17.4.
▶ Breakpoint and single-stepping behavior for a multi-line statement in device code has been improved when code is compiled with nvcc using a gcc/clang host compiler, or when compiled with NVRTC on non-Windows platforms. The debugger will now correctly break and single-step on each source line of the multi-line source code statement.
▶ PTX has exposed a new special register in the public ISA that can be used to query the total size of shared memory, which includes user shared memory and SW reserved shared memory (a hedged sketch follows this list).
▶ NVCC and NVRTC now show the preprocessed source line and column info in diagnostics to help users understand the message and identify the issue causing the diagnostic. The source line and column info can be turned off with --brief-diagnostics=true.
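As a hedged illustration of the new special register mentioned above, the sketch below reads it through inline PTX from device code. The register name used here, %aggr_smem_size, is our reading of the PTX ISA revision shipped with CUDA 12.1, and it may require targeting a recent architecture; confirm the exact name and requirements in the PTX ISA document before relying on it.

    // smem_size_sketch.cu -- hedged sketch; %aggr_smem_size is an assumed register name.
    #include <cstdio>

    __global__ void querySmemSizes(unsigned int *out) {
        unsigned int total, user;
        // Assumed new register: user shared memory + SW reserved shared memory.
        asm volatile("mov.u32 %0, %%aggr_smem_size;" : "=r"(total));
        // Existing register, shown for comparison: shared memory used by the CTA.
        asm volatile("mov.u32 %0, %%total_smem_size;" : "=r"(user));
        if (threadIdx.x == 0) {
            out[0] = total;
            out[1] = user;
        }
    }

    int main() {
        unsigned int *d = nullptr, h[2] = {0, 0};
        cudaMalloc(&d, 2 * sizeof(unsigned int));
        querySmemSizes<<<1, 32, 1024>>>(d);  // launch with 1 KiB of dynamic shared memory
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
        std::printf("aggregate smem: %u bytes, CTA smem: %u bytes\n", h[0], h[1]);
        cudaFree(d);
        return 0;
    }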


2.3. CUDA Developer Tools


▶ For changes to nvprof and Visual Profiler, see the changelog.
▶ For new features, improvements, and bug fixes in CUPTI, see the changelog.
▶ For new features, improvements, and bug fixes in Nsight Compute, see the changelog.
▶ For new features, improvements, and bug fixes in Compute Sanitizer, see the changelog.
▶ For new features, improvements, and bug fixes in CUDA-GDB, see the changelog.



Chapter 3. Deprecated or Dropped Features

Features deprecated in the current release of the CUDA software still work in the current release,
but their documentation may have been removed, and they will become officially unsupported in a
future release. We recommend that developers employ alternative solutions to these features in their
software.
General CUDA
▶ None.
CUDA Tools
▶ None.
CUDA Compiler
▶ None.



Chapter 4. Known Issues

4.1. General CUDA Known Issues


▶ For a cross-compile toolkit (such as linux64 host, aarch64 target), the host-side stub library for libnvJitLink is missing. As a workaround, you can copy libnvJitLink.so from the target install (for example, /usr/local/cuda-12.1/targets/aarch64-linux/lib/libnvJitLink.so) to the host install (/usr/local/cuda-12.1/targets/aarch64-linux/lib/stubs/libnvJitLink.so). Similarly, if you are using the static library version (/usr/local/cuda-12.1/targets/aarch64-linux/lib/libnvJitLink_static.a), you can copy it from the target install (that is, the install on the device) to the same path on the host install. For an sbsa cross-compile, replace aarch64 with sbsa in the paths above.
▶ Due to an issue in the way CUDA processes memory attachment for NVLink multicast allocations,
memory must be aligned to 512MB. Alignments below this will result in a failure to attach and an
error issued by the driver.

4.2. CUDA Compiler Known Issues


▶ There is an issue regarding the handling of -split-compile=0 in nvcc and nvlink. In nvcc, split
compilation will be disabled when given the value of ‘0’, whereas in nvlink, ‘0’ is the default value
for split compilation when invoked for Link Time Optimization (LTO). This issue will be addressed
in a subsequent update.
▶ nvJitLink static and stub library for dynamic linking are not part of the cross-compilation builds
of Aarch64-Jetson and arm64-sbsa. This will be resolved in a future release.



Chapter 5. CUDA Libraries

This section covers CUDA Libraries release notes for 12.x releases.
▶ CUDA Math Libraries toolchain uses C++11 features, and a C++11-compatible standard library
(libstdc++ >= 20150422) is required on the host.
▶ Support for the following compute capabilities is removed for all libraries:
▶ sm_35 (Kepler)
▶ sm_37 (Kepler)

5.1. cuBLAS Library

5.1.1. cuBLAS: Release 12.0 Update 1


▶ New Features
▶ Improved performance on NVIDIA H100 SXM and NVIDIA H100 PCIe GPUs.
▶ Known Issues
▶ For optimal performance on the NVIDIA Hopper architecture, cuBLAS needs to allocate a bigger internal workspace (64 MiB) than on the previous architectures (8 MiB). In the current and previous releases, cuBLAS allocates 256 MiB. This will be addressed in a future release. A possible workaround is to set the CUBLAS_WORKSPACE_CONFIG environment variable to :32768:2 when running cuBLAS on the NVIDIA Hopper architecture (see the sketch at the end of this section).
▶ Resolved Issues
▶ Reduced cuBLAS host-side overheads caused by not using the cublasLt heuristics cache. This began in the CUDA Toolkit 12.0 release.
▶ Added forward compatible single precision complex GEMM that does not require workspace.
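A minimal sketch of the workaround mentioned in the known issue above. It assumes that CUBLAS_WORKSPACE_CONFIG is read when cuBLAS initializes, so the variable is set before the handle is created; exporting it in the shell before launching the application achieves the same effect.

    // workspace_sketch.cu -- hedged sketch of the CUBLAS_WORKSPACE_CONFIG workaround.
    #include <cublas_v2.h>
    #include <cstdlib>
    #include <cstdio>

    int main() {
        // :32768:2 requests two 32768 KiB (32 MiB) buffers, 64 MiB in total.
        // Assumption: the variable must be visible before cuBLAS initializes,
        // hence it is set prior to cublasCreate(). setenv() is POSIX; use the
        // shell environment (or _putenv_s) on Windows.
        setenv("CUBLAS_WORKSPACE_CONFIG", ":32768:2", 1);

        cublasHandle_t handle;
        cublasStatus_t st = cublasCreate(&handle);
        if (st != CUBLAS_STATUS_SUCCESS) {
            std::fprintf(stderr, "cublasCreate failed: %d\n", (int)st);
            return 1;
        }

        // ... GEMMs and other cuBLAS calls run with the reduced workspace ...

        cublasDestroy(handle);
        return 0;
    }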


5.1.2. cuBLAS: Release 12.0


▶ New Features
▶ cublasLtMatmul now supports FP8 with a non-zero beta.
▶ Added int64 APIs to enable larger problem sizes; refer to 64-bit integer interface.
▶ Added more Hopper-specific kernels for cublasLtMatmul with epilogues (a minimal usage sketch appears at the end of this section):
▶ CUBLASLT_EPILOGUE_BGRAD{A,B}
▶ CUBLASLT_EPILOGUE_{RELU,GELU}_AUX
▶ CUBLASLT_EPILOGUE_D{RELU,GELU}
▶ Improved Hopper performance on arm64-sbsa by adding Hopper kernels that were previously supported only on the x86_64 architecture for Windows and Linux.
▶ Known Issues
▶ There are no forward compatible kernels for single precision complex gemms that do not require workspace. Support will be added in a later release.
▶ Resolved Issues
▶ Fixed an issue on NVIDIA Ampere architecture and newer GPUs where cublasLtMatmul with epilogue CUBLASLT_EPILOGUE_BGRAD{A,B} and a nontrivial reduction scheme (that is, not CUBLASLT_REDUCTION_SCHEME_NONE) could return incorrect results for the bias gradient.
▶ cublasLtMatmul for gemv-like cases (that is, m or n equals 1) might ignore bias with the CUBLASLT_EPILOGUE_RELU_BIAS and CUBLASLT_EPILOGUE_BIAS epilogues.
▶ Deprecations
▶ Disallow including cublas.h and cublas_v2.h in the same translation unit.
▶ Removed:
▶ CUBLAS_MATMUL_STAGES_16x80 and CUBLAS_MATMUL_STAGES_64x80 from cublasLtMatmulStages_t. No kernels utilize these stages anymore.
▶ cublasLt3mMode_t, CUBLASLT_MATMUL_PREF_MATH_MODE_MASK, and CUBLASLT_MATMUL_PREF_GAUSSIAN_MODE_MASK from cublasLtMatmulPreferenceAttributes_t. Instead, use the corresponding flags from cublasLtNumericalImplFlags_t.
▶ CUBLASLT_MATMUL_PREF_POINTER_MODE_MASK, CUBLASLT_MATMUL_PREF_EPILOGUE_MASK, and CUBLASLT_MATMUL_PREF_SM_COUNT_TARGET from cublasLtMatmulPreferenceAttributes_t. The corresponding parameters are taken directly from cublasLtMatmulDesc_t.
▶ CUBLASLT_POINTER_MODE_MASK_NO_FILTERING from cublasLtPointerModeMask_t. This mask was only applicable to CUBLASLT_MATMUL_PREF_MATH_MODE_MASK which was removed.
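A minimal sketch of how one of the epilogues listed above is selected on a cuBLASLt matmul descriptor. Only the descriptor setup is shown; matrix layouts, workspace, heuristics, and the cublasLtMatmul() call itself are omitted, and the AUX epilogues additionally need an auxiliary buffer pointer and leading dimension before the matmul is invoked.

    // epilogue_sketch.cu -- descriptor setup only, as a sketch.
    #include <cublasLt.h>
    #include <cstdio>

    int main() {
        cublasLtHandle_t ltHandle;
        cublasLtCreate(&ltHandle);

        // FP32 compute and FP32 scale type for this sketch.
        cublasLtMatmulDesc_t matmulDesc;
        cublasLtMatmulDescCreate(&matmulDesc, CUBLAS_COMPUTE_32F, CUDA_R_32F);

        // Select the GELU epilogue that also stores the auxiliary (pre-GELU) output.
        cublasLtEpilogue_t epilogue = CUBLASLT_EPILOGUE_GELU_AUX;
        cublasLtMatmulDescSetAttribute(matmulDesc, CUBLASLT_MATMUL_DESC_EPILOGUE,
                                       &epilogue, sizeof(epilogue));

        // Next steps (omitted): set CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_POINTER and
        // CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_LD, create cublasLtMatrixLayout_t objects,
        // query cublasLtMatmulAlgoGetHeuristic(), and call cublasLtMatmul().

        cublasLtMatmulDescDestroy(matmulDesc);
        cublasLtDestroy(ltHandle);
        std::printf("epilogue configured\n");
        return 0;
    }

Compile and link against cuBLASLt (for example, nvcc epilogue_sketch.cu -lcublasLt).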


5.2. cuFFT Library

5.2.1. cuFFT: Release 12.1


▶ New Features
▶ Improved performance on Hopper GPUs for hundreds of FFTs of sizes ranging from 14 to
28800. The improved performance spans over 542 cases across single and double precision
for FFTs with contiguous data layout.
▶ Known Issues
▶ Starting from CUDA 11.8, CUDA Graphs are no longer supported for callback routines that load data in out-of-place mode transforms. An upcoming release will update the cuFFT callback implementation, removing this limitation. cuFFT deprecated callback functionality based on separately compiled device code in cuFFT 11.4.
▶ Resolved Issues
▶ cuFFT no longer produces errors with compute-sanitizer at program exit if the CUDA context
used at plan creation was destroyed prior to program exit.

5.2.2. cuFFT: Release 12.0 Update 1


▶ Resolved Issues
▶ Scratch space requirements for multi-GPU, single-batch, 1D FFTs were reduced.

5.2.3. cuFFT: Release 12.0


▶ New Features
▶ PTX JIT kernel compilation allowed the addition of many new accelerated cases for Maxwell,
Pascal, Volta and Turing architectures.
▶ Known Issues
▶ cuFFT plan generation time increases due to PTX JIT compiling. Refer to Plan Initialization Time.
▶ Resolved Issues
▶ cuFFT plans had an unintentional small memory overhead (of a few kB) per plan. This is
resolved.


5.3. cuSPARSE Library

5.3.1. cuSPARSE: Release 12.0

▶ New Features
▶ JIT LTO functionalities (cusparseSpMMOp()) switched from the driver to the nvJitLink library. Starting from CUDA 12.0 the user needs to link to libnvJitLink.so; see the cuSPARSE documentation. JIT LTO performance has also been improved for cusparseSpMMOpPlan().
▶ Introduced const descriptors for the Generic APIs, for example, cusparseConstSpVecGet(). Now the Generic APIs interface clearly declares when a descriptor and its data are modified by the cuSPARSE functions.
▶ Added two new algorithms to cusparseSpGEMM() with lower memory utilization. The first algorithm computes a strict bound on the number of intermediate products, while the second one allows partitioning the computation in chunks.
▶ Added int8_t support to cusparseGather(), cusparseScatter(), and cusparseCsr2cscEx2().
▶ Improved cusparseSpSV() performance for both the analysis and the solving phases.
▶ Improved cusparseSpSM() performance for both the analysis and the solving phases.
▶ Improved cusparseSDDMM() performance and added support for batch computation.
▶ Improved cusparseCsr2cscEx2() performance.
▶ Resolved Issues
▶ cusparseSpSV() and cusparseSpSM() could produce wrong results.
▶ cusparseDnMatGetStridedBatch() did not accept batchStride == 0.
▶ Deprecations
▶ Removed deprecated CUDA 11.x APIs, enumerators, and descriptors.


5.4. Math Library

5.4.1. CUDA Math: Release 12.1


▶ New Features
▶ Performance and accuracy improvements in atanf, acosf, asinf, sinpif, cospif, powf,
erff, and tgammaf.

5.4.2. CUDA Math: Release 12.0


▶ New Features
▶ Introduced new integer/fp16/bf16 CUDA Math APIs to help expose performance benefits of the new DPX instructions. Refer to https://docs.nvidia.com/cuda/cuda-math-api/index.html.
▶ Known Issues
▶ For certain inputs, the double precision division algorithm in the default round-to-nearest-even mode produces a spurious overflow: an infinite result is delivered where DBL_MAX (0x7FEF_FFFF_FFFF_FFFF) is expected. Affected CUDA Math API: __ddiv_rn(). Affected CUDA language operation: the double precision / operation in device code. (A short illustration follows this list.)
▶ Deprecations
▶ All previously deprecated undocumented APIs are removed from CUDA 12.0.
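A short illustration of the operations affected by the known issue above. Both device-side forms map to the same double precision division algorithm; the inputs here are arbitrary placeholders rather than known triggering values, and the printout simply shows where the spurious infinity would appear instead of a finite result.

    // ddiv_sketch.cu -- shows the affected '/' operator and __ddiv_rn() intrinsic.
    #include <cstdio>
    #include <cfloat>

    __global__ void divide(const double *x, const double *y, double *q) {
        q[0] = x[0] / y[0];            // affected CUDA language operation
        q[1] = __ddiv_rn(x[0], y[0]);  // affected CUDA Math API
    }

    int main() {
        double hx = 1.0e300, hy = 3.0;  // placeholder inputs, not a known trigger
        double *dx, *dy, *dq, hq[2];
        cudaMalloc(&dx, sizeof(double));
        cudaMalloc(&dy, sizeof(double));
        cudaMalloc(&dq, 2 * sizeof(double));
        cudaMemcpy(dx, &hx, sizeof(double), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, &hy, sizeof(double), cudaMemcpyHostToDevice);
        divide<<<1, 1>>>(dx, dy, dq);
        cudaMemcpy(hq, dq, sizeof(hq), cudaMemcpyDeviceToHost);
        // For affected inputs, an infinity is delivered where a finite result
        // no larger than DBL_MAX is expected.
        std::printf("'/' = %g, __ddiv_rn = %g (DBL_MAX = %g)\n", hq[0], hq[1], DBL_MAX);
        cudaFree(dx); cudaFree(dy); cudaFree(dq);
        return 0;
    }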

5.5. NVIDIA Performance Primitives (NPP)

5.5.1. NPP: Release 12.0


▶ Deprecations
▶ Non-CTX API support will be deprecated in the next release.
▶ Resolved Issues
▶ A performance issue with the NPP ResizeSqrPixel API is now fixed and shows improved performance.


5.6. nvJPEG Library

5.6.1. nvJPEG: Release 12.0


▶ New Features
▶ Improved GPU memory optimization for the nvJPEG codec.
▶ Resolved Issues
▶ An issue that caused runtime failures when nvJPEGDecMultipleInstances was tested with a large number of threads is resolved.
▶ An issue with CMYK four-component color conversion is now resolved.
▶ Known Issues
▶ The NVJPEG_BACKEND_GPU_HYBRID backend is unable to handle bitstreams with extra scan lengths.
▶ Deprecations
▶ Reuse of Huffman tables in the encoder (nvjpegEncoderParamsCopyHuffmanTables).



Chapter 6. Notices

6.1. Notice
This document is provided for information purposes only and shall not be regarded as a warranty of a
certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no repre-
sentations or warranties, expressed or implied, as to the accuracy or completeness of the information
contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall
have no liability for the consequences or use of such information or for any infringement of patents
or other rights of third parties that may result from its use. This document is not a commitment to
develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any
other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that
such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the
time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by
authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects
to applying any customer general terms and conditions with regards to the purchase of the NVIDIA
product referenced in this document. No contractual obligations are formed either directly or indirectly
by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military,
aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA
product can reasonably be expected to result in personal injury, death, or property or environmental
damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or
applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for
any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA.
It is customer’s sole responsibility to evaluate and determine the applicability of any information con-
tained in this document, ensure the product is suitable and fit for the application planned by customer,
and perform the necessary testing for the application in order to avoid a default of the application or
the product. Weaknesses in customer’s product designs may affect the quality and reliability of the
NVIDIA product and may result in additional or different conditions and/or requirements beyond those
contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or prob-
lem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is
contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other
NVIDIA intellectual property right under this document. Information published by NVIDIA regarding
third-party products or services does not constitute a license from NVIDIA to use such products or

services or a warranty or endorsement thereof. Use of such information may require a license from a
third party under the patents or other intellectual property rights of the third party, or a license from
NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA
in writing, reproduced without alteration and in full compliance with all applicable export laws and
regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS,
DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE
BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR
OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WAR-
RANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CON-
SEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARIS-
ING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY
OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatso-
ever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein
shall be limited in accordance with the Terms of Sale for the product.

6.2. OpenCL
OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.

6.3. Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the
U.S. and other countries. Other company and product names may be trademarks of the respective
companies with which they are associated.

Copyright
©2007-2023, NVIDIA Corporation & Affiliates. All rights reserved.
