Release Notes
TABLE OF CONTENTS
www.nvidia.com
Nsight Compute v2023.1.1 | ii
Chapter 3. Support
3.1. Platform Support
3.2. GPU Support
3.3. System Requirements
Chapter 1.
RELEASE NOTES
‣ Fixed potential memory leak while collecting SW counters for modules with
unpatched kernel functions.
‣ Fixed performance issues on the Summary and Raw pages for large reports.
‣ Improved support for non-ASCII characters in filenames.
‣ Fixed an issue with delayed updates of assembly analysis information on the Source
page's Source and PTX views.
‣ Fixed potential crashes when using the Python report interface.
memory accesses which are uncoalesced and result in inefficient DRAM accesses.
Refer to the README, sample code and document under
extras/samples/uncoalescedGlobalAccesses.
‣ Added Metrics Reference in the documentation that lists metrics not available
through --query-metrics.
‣ Reduced the overhead of collecting SASS-patching based metrics.
‣ On Multi-Instance GPU (MIG) configurations, NVIDIA Nsight Compute can no longer
lock clocks. Users are expected to lock clocks externally using nvidia-smi.
NVIDIA Nsight Compute
‣ Wrapper script nv-nsight-cu is deprecated in favor of ncu-ui and will be
removed in a future release.
‣ Source page supports range replay results.
‣ Added a second chart on the Compute Workload Analysis section to avoid mixing
metrics with different meanings.
‣ NVIDIA Nsight Compute now tracks traversable handles created with
optixAccelRelocate.
‣ NVIDIA Nsight Compute now tracks traversable handles created as updates from
others.
‣ The Acceleration Structure viewer now reports unsupported inputs.
‣ The Acceleration Structure viewer now supports opening multiple traversable
handles.
‣ The Acceleration Structure viewer now uses OptiX naming for displayed elements.
NVIDIA Nsight Compute CLI
‣ Wrapper script nv-nsight-cu-cli is deprecated in favor of ncu and will be
removed in a future release.
‣ Added new option --filter-mode per-gpu to enable filtering of kernel launches
on each GPU separately.
‣ Added new option --app-replay-mode relaxed to produce profiling results for
valid kernels even if the number of kernel launches is inconsistent across application
replay passes.
‣ Added a documentation section on supported environment variables.
‣ Improved the performance when loading existing reports on the command line.
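The two new options above can be combined on one command line. The following is a minimal sketch, assuming a hypothetical application path ./my_app. The --replay-mode application option is not mentioned in these notes; it is taken from the general ncu CLI documentation and is included here on the assumption that --app-replay-mode only affects application replay.

```python
import shlex


def build_ncu_command(app, per_gpu_filter=True, relaxed_replay=True):
    """Assemble an ncu argument list using the options named in these notes."""
    cmd = ["ncu"]
    if per_gpu_filter:
        # Filter kernel launches on each GPU separately.
        cmd += ["--filter-mode", "per-gpu"]
    if relaxed_replay:
        # Assumption: application replay has to be selected for the relaxed
        # mode to have any effect (option taken from the ncu CLI docs).
        cmd += ["--replay-mode", "application", "--app-replay-mode", "relaxed"]
    cmd.append(app)  # hypothetical application path
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without ncu.
    print(shlex.join(build_ncu_command("./my_app")))
```

Returning a list rather than a single string keeps the command safe to pass to subprocess.run without shell quoting issues.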
Resolved Issues
‣ Fixed an issue when resolving files on the Source page.
‣ Fixed an issue when profiling OptiX applications.
‣ Fixed an issue in the OptiX traversable handle management caused by clashing
handle values.
‣ Fixed an issue in the Acceleration Structure viewer causing the display of invalid
memory when viewing AABB buffers.
‣ Added a new tool window showing the CPU call stack at the location where the
current thread was suspended during interactive profiling activities.
‣ If enabled, the Call Stack / NVTX page of the profile report shows the captured CPU
call stack for the selected kernel launch.
NVIDIA Nsight Compute CLI
‣ Added support for printing source/metric content with the new --page source
and --print-source command line options.
‣ Added new option --call-stack to enable collecting the CPU call stack for every
profiled kernel launch.
Resolved Issues
‣ Fixed that memory_* metrics could not be collected with the --metrics option.
‣ Fixed that selection and copy/paste was not supported for section header tables on
the Details page.
‣ Fixed issues with the Source page when collapsing the content.
‣ Fixed that the UI could crash when applying rules to a new profile result.
‣ Fixed that PC Sampling metrics were not available for Profile Series.
‣ Fixed that local profiling did not work if no non-loopback address was configured
for the system.
‣ Fixed termination of remote-launched applications. On QNX, terminating an
application profiled via Remote Launch is now supported. Canceling
remote-launched Profile activities is now supported.
‣ Fixed behavior of horizontal scroll bars when clicking in the source views on the
Source page.
‣ Fixed appearance of multi-line entries in column chooser on the Source page.
‣ Fixed enablement state of the reset button on the Connection dialog.
‣ Fixed potential crash of NVIDIA Nsight Compute when the window size becomes
small while on the Source page.
‣ Fixed potential crash of NVIDIA Nsight Compute when relative paths for section/
rules files could not be found.
‣ Fixed potential crash of NVIDIA Nsight Compute after removing baselines.
‣ A warning is shown if kernel replay starts staging GPU memory to CPU memory or
the file system.
‣ Section and rule files are deployed to a versioned directory in the user's home
directory to allow easier editing of those files, and to prevent modifying the base
installation.
‣ Removed support for NVLINK (nvl*) metrics due to a potential application hang
during data collection. The metrics will be added back in a future version of the
driver/tool.
NVIDIA Nsight Compute
‣ Added support for Profile Series. Series allow you to profile a kernel with a range of
configurable parameters to analyze the performance of each combination.
‣ Added a new Allocations view to the Resources tool window which shows the state of
all current memory allocations.
‣ Added a new Memory Pools view to the Resources tool window which shows the state
of all current memory pools.
‣ Added coverage of peer memory to the Memory Chart.
‣ The Source page now shows the number of excessive sectors requested from L1 or
L2, e.g. due to uncoalesced memory accesses.
‣ The Source column on the Source page can now be scrolled horizontally.
‣ The kernel duration gpu__time_duration.sum was added as a column on the
Summary page.
‣ Improved the performance of application replay when not all kernels in the
application are profiled.
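To illustrate what "excessive sectors" due to uncoalesced accesses means in the list above, here is a small worked model. It is a simplified sketch, not the tool's exact metric definition: it assumes one warp of 32 threads, 32-byte sectors, and that thread i loads one element at index i * stride from an aligned base address. The function names are hypothetical.

```python
SECTOR_BYTES = 32  # size of one memory sector (assumed model parameter)
WARP_SIZE = 32     # threads per warp


def sectors_touched(stride_elems, elem_bytes=4, base=0):
    """Count the distinct 32-byte sectors touched by one warp-wide load."""
    sectors = set()
    for tid in range(WARP_SIZE):
        first = base + tid * stride_elems * elem_bytes
        last = first + elem_bytes - 1
        sectors.update(range(first // SECTOR_BYTES, last // SECTOR_BYTES + 1))
    return len(sectors)


def excessive_sectors(stride_elems, elem_bytes=4):
    """Sectors requested beyond the minimum needed for the same amount of data."""
    ideal = -(-WARP_SIZE * elem_bytes // SECTOR_BYTES)  # ceiling division
    return sectors_touched(stride_elems, elem_bytes) - ideal


if __name__ == "__main__":
    for stride in (1, 2, 8):
        print(stride, sectors_touched(stride), excessive_sectors(stride))
```

In this model a unit-stride load of 4-byte elements needs 4 sectors and has no excess, while a stride of 8 elements places every thread in its own sector.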
NVIDIA Nsight Compute CLI
‣ Added a new --app-replay-match option to select the mechanism used for
matching kernel instances across application replay passes.
‣ An error is shown if --nvtx-include/exclude are used without --nvtx.
Resolved Issues
‣ The Grid Size column on the Raw page now shows the CUDA grid size like the
Launch Statistics section, rather than the combined grid and block sizes.
‣ The Branch Resolving warp stall reason was added to the PC sampling metric groups
and the Warp State Statistics section.
‣ The API Stream tool window shows kernel names according to the selected Function
Name Mode.
‣ Fixed that an incorrect line could be shown after a heatmap selection on the Source
page.
‣ Fixed incorrect metric usage for system memory in the Memory Chart. Previously,
all requested memory of L2 from system memory was reported instead of only the
portion that missed in L2.
‣ Added a new --log-file option to decide the output stream for printing tool
output.
‣ Added a new --check-exit-code option to decide if the child application exit
code should be checked.
Resolved Issues
‣ The profiling progress dialog is no longer dismissed automatically after an error.
‣ The inter-process lock is now automatically given write permissions for all users.
‣ All project extensions are enabled in the default dialog filter.
‣ Fixed handling of targets using tcsh during remote profiling.
‣ Fixed handling of quoted application arguments on Windows.
‣ Added a snap-select feature to the Source page heatmap to help navigate large files
‣ Added support for loading remote CUDA-C source files via SSH on demand for
Linux x86_64 targets
‣ Charts on the Details page provide better help in tool tips when hovering metric
names
‣ Improved the performance of the Source page when scrolling or collapsing
‣ The charts for Warp States and Compute pipelines are now sorted by value
NVIDIA Nsight Compute CLI
‣ Added support for GPU cache control, see --cache-control
‣ Added support for setting the kernel name base in command line output, see
--kernel-base
‣ Added support for listing the available names for --chips, see --list-chips
‣ Improved the stability on Windows when using --target-processes all
‣ Reduced the profiling overhead for small metric sets in applications with many
kernels
Resolved Issues
‣ Reduced the overhead caused by demangling kernel names multiple times
‣ Fixed an issue where kernel names were not demangled in the CUDA Graph Nodes
resources window
‣ The connection dialog better disables unsupported combinations or warns of invalid
entries
‣ Fixed metric thread_inst_executed_true to derive from
smsp_not_predicated_off_thread_inst_executed on Volta+ GPUs
‣ Fixed an issue with computing the theoretical occupancy on GV100
‣ Selecting an entry on the Source page heatmap no longer selects the respective
source line, to avoid losing the current selection
‣ Fixed the current view indicator of the Source page heatmap to be line-accurate
‣ Fixed an issue when comparing metrics from Pascal and later architectures on the
Summary page
‣ Fixed an issue that metrics representing constant values on Volta+ couldn't be
collected without non-constant metrics
Chapter 2.
KNOWN ISSUES
Installation
‣ The installer might not show all patch-level version numbers during installation.
‣ Some command line options listed in the help of a .run installer of NVIDIA Nsight
Compute are affecting only the archive extraction, but not the installation stage. To
pass command line options to the embedded installer script, specify those options
after -- in the form of -- -<option>. The available options for the installer script
are:
For example, specifying only the option --quiet extracts the installer archive
without any output to the console, but still prompts for user interaction during the
installation. To install NVIDIA Nsight Compute without any console output or
user interaction, specify --quiet -- -noprompt.
‣ After using the SDK Manager to install the NVIDIA Nsight Compute tools, their
binary path needs to be manually added to your PATH environment variable.
‣ See also the System Requirements for more installation instructions.
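The split between extraction options and installer-script options described above can be sketched as follows. This is only an illustration: the installer file name is hypothetical, and invoking the .run file through sh is an assumption; only --quiet and -- -noprompt come from these notes.

```python
def installer_args(run_file, quiet_extract=True, no_prompt=True):
    """Build the argument list for a .run installer (hypothetical file name)."""
    args = ["sh", run_file]  # assumption: the .run file is started via sh
    if quiet_extract:
        # Affects only the archive extraction stage.
        args.append("--quiet")
    if no_prompt:
        # Everything after "--" is forwarded to the embedded installer script.
        args += ["--", "-noprompt"]
    return args


if __name__ == "__main__":
    # Hypothetical installer name; printed rather than executed.
    print(" ".join(installer_args("nsight-compute-installer.run")))
```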
Launch and Connection
‣ Launching applications on remote targets/platforms is not supported for several
combinations. See Platform Support for details. Manually launch the application
using the command line ncu --mode=launch on the remote system and connect using
the UI or CLI afterwards.
‣ In the NVIDIA Nsight Compute connection dialog, a remote system can only be
specified for one target platform. Remove a connection from its current target
platform in order to be able to add it to another.
‣ Loading of CUDA sources via SSH requires that the remote connection is
configured, and that the hostname/IP address of the connection matches the
target (as seen in the report session details). For example, prefer
my-machine.my-domain.com instead of my-machine, even though the latter resolves
to the same address.
‣ Other issues concerning remote connections are discussed in the documentation for
remote connections.
‣ Local connections between NVIDIA Nsight Compute and the launched
target application might not work on some ppc64le or aarch64 (sbsa)
systems configured to only support IPv6. On these platforms, the
NV_COMPUTE_PROFILER_LOCAL_CONNECTION_OVERRIDE=uds
environment variable can be set to use Unix Domain Sockets instead of TCP for
local connections to work around the problem. On x86_64 Linux, Unix Domain
Sockets are used by default, but local TCP connections can be forced using
NV_COMPUTE_PROFILER_LOCAL_CONNECTION_OVERRIDE=tcp.
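As a sketch of how the local-connection override described above might be applied from a launcher script, the helper below copies an environment and sets the variable to one of the two values named in these notes. The helper name and the idea of passing the result to subprocess.run are assumptions for illustration.

```python
import os

OVERRIDE_VAR = "NV_COMPUTE_PROFILER_LOCAL_CONNECTION_OVERRIDE"


def profiler_env(transport, base_env=None):
    """Return a copy of the environment with the local-connection override set."""
    if transport not in ("uds", "tcp"):
        raise ValueError(f"unsupported transport: {transport!r}")
    env = dict(os.environ if base_env is None else base_env)
    env[OVERRIDE_VAR] = transport
    return env


if __name__ == "__main__":
    env = profiler_env("uds")
    print(f"{OVERRIDE_VAR}={env[OVERRIDE_VAR]}")
    # The dictionary could then be handed to a child process, for example
    # subprocess.run(["ncu", "./my_app"], env=env) with a hypothetical ./my_app.
```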
Profiling and Metrics
‣ Profiling of 32-bit processes is not supported.
‣ Profiling kernels executed on a device that is part of an SLI group is not supported.
An "Unsupported GPU" error is shown in this case.
‣ Profiling a kernel while other contexts are active on the same device (e.g. X server,
or secondary CUDA or graphics application) can result in varying metric values for
L2/FB (Device Memory) related metrics. Specifically, L2/FB traffic from non-profiled
contexts cannot be excluded from the metric results. To completely avoid this issue,
profile the application on a GPU without secondary contexts accessing the same
device (e.g. no X server on Linux).
‣ In the current release, profiling a kernel while any other GPU work is executing on
the same MIG compute instance can result in varying metric values for all units.
NVIDIA Nsight Compute enforces serialization of the CUDA launches within
the target application to ensure those kernels do not influence each other. See
Serialization for more details. However, GPU work issued through other APIs in the
target process or workloads created by non-target processes running simultaneously
in the same MIG compute instance will influence the collected metrics. Note that it is
acceptable to run CUDA processes in other MIG compute instances as they will not
influence the profiled MIG compute instance.
‣ On Linux kernels with the setting fs.protected_regular=1 (e.g. some Ubuntu 20.04
cloud service provider instances), root users may not be able to access the
inter-process lock file. See the FAQ for workarounds.
‣ Profiling only supports up to 32 device instances, including instances of MIG
partitions. Profiling the 33rd or higher device instance will result in indeterminate
data.
‣ Enabling certain metrics can cause GPU kernels to run longer than the driver's
watchdog time-out limit. In these cases, the driver will terminate the GPU kernel,
resulting in an application error, and profiling data will not be available. Please
disable the driver watchdog time-out before profiling such long-running CUDA
kernels.
‣ On Linux, setting the X Config option Interactive to false is recommended.
‣ For Windows, detailed information on disabling the Windows TDR is available
at https://docs.microsoft.com/en-us/windows-hardware/drivers/display/timeout-detection-and-recovery
‣ Collecting device-level metrics, such as the NVLINK metrics (nvl*), is not
supported on NVIDIA virtual GPUs (vGPUs).
‣ As of CUDA 11.4 and R470 TRD1 driver release, NVIDIA Nsight Compute is
supported in a vGPU environment which requires a vGPU license. If the license is
not obtained after 20 minutes, the reported performance metrics data from the GPU
will be inaccurate. This is because of a feature in vGPU environment which reduces
performance but retains functionality as specified here.
‣ Profiling on NVIDIA live-migrated virtual machines is not supported and can result
in undefined behavior.
‣ Profiling with enabled multi-process service (MPS) can result in undefined behavior.
‣ The NVLink Topology section is not supported for a configuration using NVSwitch.
‣ NVIDIA Nsight Compute does not support per-NVLink metrics.
‣ NVIDIA Nsight Compute does not support the Logical NVLink Throughput table.
‣ Profiling CUDA graph kernel nodes that can launch device graphs or are part of
device-launchable graphs is not supported. Use Graph Profiling mode instead.
‣ On CUDA drivers older than 530.x, profiling on Windows Subsystem for Linux
(WSL) is not supported if the system has multiple physical NVIDIA GPUs. This is
not affected by setting CUDA_VISIBLE_DEVICES.
Compatibility
‣ Applications calling blocking functions on standard input/output streams can cause
the profiler to stop until the blocking function call is resolved.
‣ NVIDIA Nsight Compute can hang on applications using RAPIDS in versions 0.6
and 0.7, due to an issue in cuDF.
‣ Profiling child processes launched via clone() is not supported.
‣ Profiling child processes launched from Python using os.system() is not
supported.
‣ Profiling of Cooperative Groups kernels launched with
cuLaunchCooperativeKernelMultiDevice is not yet supported.
‣ On Linux systems, when profiling bsd-csh scripts, the original application output
will not be printed. As a workaround, use a different C-shell, e.g. tcsh.
‣ Attempting to use the --clock-control option to set the GPU clocks will
fail when profiling on a GPU partition. Please use nvidia-smi (installed with
NVIDIA display driver) to control the clocks for the entire GPU. This will require
administrative privileges when the GPU is partitioned.
‣ On Linux aarch64, NVIDIA Nsight Compute does not work if the HOME
environment variable is not set.
‣ NVIDIA Nsight Compute versions 2020.1.0 to 2020.2.1 are not compatible with
CUDA driver version 460+ if the application launches Cooperative Groups kernels.
Profiling will fail with error "UnknownError".
‣ Collecting CPU call stack information on Windows Server 2016 can hang NVIDIA
Nsight Compute in some cases. Currently, the only workaround is to skip CPU call
stack collection on such systems by not specifying the option --call-stack.
‣ When profiling a script, --target-processes all may target utility executables
such as xargs, uname or ls. To avoid profiling these, use the
--target-processes-filter option accordingly.
‣ On mobile platforms, the --kill option is not supported with application replay mode.
User Interface
‣ The API Statistics filter in NVIDIA Nsight Compute does not support units.
‣ File size is the only property considered when resolving source files. Timestamps
are currently ignored.
‣ Terminating or disconnecting an application in the Interactive Profiling activity while
the API Stream View is updated can lead to a crash.
‣ See the OptiX library support section for limitations concerning the Acceleration
Structure Viewer.
‣ After updating from a previous version of NVIDIA Nsight Compute on Linux, the
file load dialog may not allow column resizing and sorting. As a workaround, the
~/.config/QtProject.conf file can be edited to remove the treeViewHeader entry from the
[FileDialog] section.
Chapter 3.
SUPPORT
Platform                             Host  Targets
Windows                              Yes   Windows*, Linux (x86_64)
Windows Subsystem for Linux (WSL2)   Yes   Windows Subsystem for Linux (WSL2) as part of the Linux (x86_64) package.
Linux (x86_64)                       Yes   Windows*, Linux (x86_64), Linux (ppc64le), Linux (aarch64 sbsa)
Linux (ppc64le)                      No    Linux (ppc64le)
Linux (aarch64 sbsa)                 Yes   Linux (aarch64 sbsa)
Linux (x86_64) (Drive SDK)           Yes   Windows*, Linux (x86_64), Linux (aarch64), QNX
MacOSX 10.15+                        Yes   Windows*, Linux (x86_64), Linux (ppc64le)
Linux (aarch64)                      No    Linux (aarch64)
QNX                                  No    QNX
Target platforms marked with * do not support remote launch from the respective host.
Remote launch means that the application can be launched on the target system from the
host UI. Instead, the application must be launched from the target system.
Profiling of 32-bit processes is not supported.
Architecture    Support
Kepler          No
Maxwell         No
Pascal          No
Volta GV100     Yes
Volta GV11b     Yes
Turing TU1xx    Yes
NVIDIA GA100    Yes
NVIDIA GA10x    Yes
NVIDIA GA10b    Yes
NVIDIA AD10x    Yes
NVIDIA GH100    Yes
Most metrics used in NVIDIA Nsight Compute are identical to those of the PerfWorks
Metrics API and follow the documented Metrics Structure. A comparison between the
metrics used in nvprof and their equivalent in NVIDIA Nsight Compute can be found in
the NVIDIA Nsight Compute CLI User Manual.
‣ Ubuntu 18.04
Profiling on Windows Subsystem for Linux (WSL) is only supported with WSL version
2, NVIDIA display driver version 525 or higher and Windows 11. The Linux (x86_64)
NVIDIA Nsight Compute package can be used and should be installed directly within
WSL2. Remote profiling to and from WSL2 works equivalently to regular Linux (x86_64)
hosts and targets, as long as it's accessible via SSH. Access to NVIDIA GPU Performance
Counters must be enabled in the NVIDIA Control Panel of the Windows host. See also
the CUDA on WSL User Guide.
Windows
Only Windows 10 and 11 are supported as host and target.
The Visual Studio 2017 redistributable is not automatically installed by the NVIDIA
Nsight Compute installer. The workaround is to install the x64 version of the 'Microsoft
Visual C++ Redistributable for Visual Studio 2017' manually. The installer is linked on
the main download page for Visual Studio at https://www.visualstudio.com/downloads/
or can be downloaded directly from https://go.microsoft.com/fwlink/?LinkId=746572.
Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS,
DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY,
"MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES,
EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE
MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR
PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA
Corporation assumes no responsibility for the consequences of use of such
information or for any infringement of patents or other rights of third parties
that may result from its use. No license is granted by implication or otherwise
under any patent rights of NVIDIA Corporation. Specifications mentioned in this
publication are subject to change without notice. This publication supersedes and
replaces all other information previously supplied. NVIDIA Corporation products
are not authorized as critical components in life support devices or systems
without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA
Corporation in the U.S. and other countries. Other company and product names
may be trademarks of the respective companies with which they are associated.
Copyright
© 2018-2023 NVIDIA Corporation and affiliates. All rights reserved.
This product includes software developed by the Syncro Soft SRL (http://
www.sync.ro/).