S62256 - Demystify CUDA Debugging and Performance with Powerful Developer Tools
https://developer.nvidia.com/tools-overview
Developer Tools Ecosystem
Debuggers: cuda-gdb, Nsight Visual Studio Edition, Nsight Visual Studio Code Edition
Profilers: Nsight Systems, Nsight Compute, CUPTI, NVIDIA Tools eXtension (NVTX)
Correctness Checker: Compute Sanitizer
IDE integrations: Nsight Visual Studio Code Edition, Nsight Visual Studio Edition, Nsight Eclipse Edition
Compute Debuggers and IDEs
Compute Debuggers
Debug GPU Kernels Running on Device
• CUDA GDB
• CPU + GPU CUDA kernel debugger
• Supports stepping, breakpoints, inline functions, variable inspection, etc.
• Built on GDB and uses many of the same CLI commands
• Local/Remote connection support
• Nsight Visual Studio Edition
• IDE integration for Visual Studio
• Build and Debug CPU+GPU code from Visual Studio
• Nsight Visual Studio Code Edition
• New IDE integration for VS Code
• Build and Debug CPU+GPU code from Visual Studio Code
• Debug remote Linux targets from a Windows or Linux host
• Nsight Eclipse Edition
• IDE integration for Eclipse
• Build and Debug CPU+GPU code from Eclipse
Compute Sanitizer
Automatically Scan for Bugs and Memory Issues
https://github.com/NVIDIA/compute-sanitizer-samples
Compute Sanitizer
Reading a Memcheck Example Report
Device and host backtraces
========= and is 140,276,689,190,112 bytes before the nearest allocation at 0x7f953da00000 of size 1,024 bytes
========= Saved host backtrace up to driver entry point at kernel launch time
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= in /home/cuda/github/compute-sanitizer-samples/Memcheck/memcheck_demo
========= in /home/cuda/github/compute-sanitizer-samples/Memcheck/memcheck_demo
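The distance in the report can be converted back into the address the kernel tried to touch. The sketch below does that arithmetic with the two values from the report (allocation base 0x7f953da00000 and a distance of 140,276,689,190,112 bytes). `address_before` is a hypothetical helper name, not part of Compute Sanitizer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the address that lies `bytes_before` bytes before an
// allocation starting at `alloc_base`.
inline std::uint64_t address_before(std::uint64_t alloc_base,
                                    std::uint64_t bytes_before) {
    assert(bytes_before <= alloc_base);  // guard against unsigned wrap-around
    return alloc_base - bytes_before;
}

// Values copied from the memcheck report.
const std::uint64_t kAllocBase = 0x7f953da00000ULL;
const std::uint64_t kBytesBefore = 140276689190112ULL;

// Address the report is complaining about.
inline std::uint64_t reported_fault_address() {
    return address_before(kAllocBase, kBytesBefore);
}
```

With the report's values this gives 0x87654320, which is nowhere near the 1,024-byte allocation. An address like that usually points to a corrupted or hard-coded pointer rather than an off-by-one index.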
Compute Debuggers and IDEs
New Features
Debuggers/IDE
IDEs
• VS Code Autostart tasks for remote debugging.
• VS Code remote debug (QNX/L4T)
• VS Code Docker support
Compute Sanitizer
• Racecheck support for device-launched graphs
• Memcheck support for Address Translation Service (ATS)
• Memcheck support for Heterogeneous Memory Management (HMM)
NVTX Tools Extension API
NVIDIA Tools eXtension (NVTX)
• Decorate application source code with annotations (markers, ranges, nested ranges, …) to help visualize execution with debugging, tracing and profiling tools
• Marker:
nvtxMark("This is a marker");
• Push-Pop range
nvtxRangePush("This is a push/pop range");
// Do something interesting in the range
nvtxRangePop(); // Pop must be on same thread as corresponding Push
• Start-End range
nvtxRangeHandle_t handle = nvtxRangeStart("This is a start/end range");
// Somewhere else in the code, not necessarily same thread as Start call:
nvtxRangeEnd(handle);
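Because every push needs a matching pop on the same thread, one common C++ pattern is to tie the pair to a scope. The sketch below is an illustration under stated assumptions, not NVIDIA's API: `ScopedRange` and `nested_depth_demo` are hypothetical names, the depth counter is bookkeeping for this example only, and the no-op stand-ins are used only when the NVTX header `nvtx3/nvToolsExt.h` is not found at compile time.

```cpp
#include <cassert>

// Prefer the real NVTX C API when its header is available. Otherwise fall
// back to no-op stand-ins (NOT part of NVTX) so the sketch still compiles.
#if defined(__has_include)
#  if __has_include(<nvtx3/nvToolsExt.h>)
#    include <nvtx3/nvToolsExt.h>
#    define SKETCH_HAVE_NVTX 1
#  endif
#endif
#ifndef SKETCH_HAVE_NVTX
inline int nvtxRangePushA(const char*) { return 0; }
inline int nvtxRangePop() { return 0; }
#endif

// Hypothetical RAII wrapper: push in the constructor, pop in the destructor,
// so the pop always runs on the thread that pushed, even on early return.
class ScopedRange {
public:
    explicit ScopedRange(const char* name) {
        nvtxRangePushA(name);
        ++depth_;
    }
    ~ScopedRange() {
        assert(depth_ > 0);
        nvtxRangePop();
        --depth_;
    }
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;

    // Number of ScopedRange objects currently open on the calling thread.
    static int depth() { return depth_; }

private:
    static thread_local int depth_;
};
thread_local int ScopedRange::depth_ = 0;

// Opens two nested ranges and reports how many are open at the innermost point.
int nested_depth_demo() {
    ScopedRange outer("outer");
    ScopedRange inner("inner");
    return ScopedRange::depth();  // both ranges are still open here
}
```

Calling `nested_depth_demo()` opens "outer", then "inner", and closes them in reverse order when the function returns, which is the nesting a timeline tool would display.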
[Figure: diagram labels include Accel. GStreamer Plugins, GXF, cuBLAS, cuML]
Python and NVTX
Nsight Systems
Key Features:
• System-wide application algorithm tuning
• Multi-process tree support
• Locate optimization opportunities
• Visualize millions of events on a very fast GUI timeline
• Identify gaps of unused CPU and GPU time
• Balance your workload across multiple CPUs and GPUs
• CPU algorithms, utilization, and thread state; GPU streams, kernels, memory transfers, etc.
• Command Line, Standalone, IDE Integration
• OS: Linux (x86, ARM Server, Tegra), Windows, macOS (host only)
• GPUs: Pascal+
• Docs/product: https://developer.nvidia.com/nsight-systems
[Figure: Nsight Systems timeline with callouts for processes and threads, thread state, cuDNN and cuBLAS trace, kernel and memory transfer activities, and multi-GPU]
Zoom/Filter to Exact Areas of Interest
Nsight Systems
New Features
Grace Host Profiling
Hardware Counters and Metrics
Single-threaded CPU matrix multiplication with poor memory access patterns
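The slide does not include the benchmark source. As an illustration of what a poor access pattern can look like in a row-major matrix multiply, the sketch below contrasts two loop orders. `Matrix`, `matmul_ijk`, and `matmul_ikj` are hypothetical names chosen for this example, and the flat row-major layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // n x n values stored row-major

// i-j-k order: the inner loop reads b[k * n + j] with stride n, so each
// step lands on a different row of B and tends to miss in cache.
Matrix matmul_ijk(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// i-k-j order: same arithmetic, but the inner loop walks B and C along a
// row (stride 1), which is friendlier to caches and prefetchers.
Matrix matmul_ikj(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}
```

Both functions return the same product. On large matrices the second ordering typically shows fewer cache misses in the CPU hardware counters, though the size of the gap depends on the machine.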
Python Profiling Updates
• Extension to JupyterLab
• Profile individual Jupyter cells
• Text-based results can be viewed directly in Jupyter
• Launch new remote GUI streaming container directly in JupyterLab
• Servers without X, a window manager, …
• Container with X, WM, and WebRTC server
• Dockerfile inside Nsight Systems Installer
• See it in action:
• DLIT61667: Profilers, Python, and Performance: Nsight Tools for Optimizing Modern CUDA Workloads
Nsight Compute
Key Features:
• Interactive CUDA API debugging and kernel profiling
• Built-in rules expertise
• Fully customizable data collection and display
• Command Line, Standalone, IDE Integration, Remote Targets
• OS: Linux (x86, Power, Tegra, Arm SBSA), Windows, macOS (host only)
• GPUs: Volta+
• Docs/product: https://developer.nvidia.com/nsight-compute
Nsight Compute GUI
Customizable data collection and presentation