Performance - DeepStream Documentation 6.4 Documentation
Performance
The DeepStream application is benchmarked across various NVIDIA TAO Toolkit and open-source models. The measured
performance represents the end-to-end performance of the entire video analytics application, including video capture
and decode, pre-processing, batching, inference, and post-processing to generate metadata. Output rendering is
turned off to achieve peak inference performance. For information on disabling output rendering, see the DeepStream
Reference Application - deepstream-app chapter.
To run a higher number of streams (200+) on Hopper, Ampere, and Ada, follow the instructions below:
Orin NX Nano
All the models in the table above can run solely on DLA. This saves valuable GPU resources to run more complex
models.
Note
Running inference simultaneously on multiple models is not supported on the DLA. You can only run one model
at a time on the DLA.
NA : Not available for Jetson
NA* : For these models DLA falls back to GPU
A100
Model Arch | Inference resolution | Precision | Inference Engine | GPU (FPS) | GPU (FPS) | GPU (FPS) | GPU (FPS) | GPU (FPS)
Model Arch | Inference resolution | Precision | Inference Engine | GPU (FPS) | GPU (FPS) | GPU (FPS) | GPU (FPS)
Note
NA : Not available
To disable OSD and tiled display and to route the output to a fake sink, make the following changes in the
DeepStream config file:
[osd]
enable=0
[tiled-display]
enable=0
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
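The changes above can also be applied programmatically. The following is a minimal sketch, assuming the config file can be parsed with Python's standard configparser module; the function name disable_rendering is hypothetical and not part of DeepStream. Note that configparser drops comment lines when it writes the file back, so keep a copy of the original.

```python
import configparser
import io


def disable_rendering(config_text: str) -> str:
    """Return config text with OSD and tiled display off and sink0 set to FakeSink."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # preserve key case exactly as written
    parser.read_string(config_text)

    # Turn off on-screen display and tiling.
    for section in ("osd", "tiled-display"):
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, "enable", "0")

    # Keep the sink enabled, but make it a FakeSink that does not sync to the clock.
    if not parser.has_section("sink0"):
        parser.add_section("sink0")
    parser.set("sink0", "enable", "1")
    parser.set("sink0", "type", "1")  # 1=FakeSink
    parser.set("sink0", "sync", "0")

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # write key=value, no spaces
    return out.getvalue()
```

Read the config file into a string, pass it to disable_rendering, and write the result to a new file to use with deepstream-app.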
DeepStream SDK 6.2 introduces a new reference low-level tracker library, NvMultiObjectTracker, along with a set of
configuration files:
config_tracker_IOU.yml
config_tracker_NvDCF_max_perf.yml
config_tracker_NvDCF_perf.yml
config_tracker_NvDCF_accuracy.yml
To achieve the peak performance shown in the table above when using the NvDCF tracker, make sure the max_perf
configuration is used with the video frame resolution matched to that of the inference module. For example, if the
inference module uses 480x272 resolution, use a similarly reduced resolution (e.g., 480x288) for the tracker
module, as in the following:
[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_max_perf.yml
gpu-id=0
enable-batch-process=1
display-tracking-id=1
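One way to derive a tracker resolution from the inference resolution is to round each dimension up to a multiple of 32, which reproduces the 480x272 to 480x288 example above. The multiple-of-32 alignment is an assumption in this sketch and is not stated on this page; check the tracker documentation for the constraints that apply to your configuration. The function names are hypothetical.

```python
def round_up_to_multiple(value: int, multiple: int = 32) -> int:
    """Round value up to the nearest multiple (assumed alignment: 32)."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


def tracker_resolution(infer_width: int, infer_height: int, multiple: int = 32):
    """Return (tracker-width, tracker-height) aligned to the assumed multiple."""
    return (
        round_up_to_multiple(infer_width, multiple),
        round_up_to_multiple(infer_height, multiple),
    )


if __name__ == "__main__":
    width, height = tracker_resolution(480, 272)
    print(f"tracker-width={width}")
    print(f"tracker-height={height}")
```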
When the IOU tracker is used, the video frame resolution doesn’t matter, and the default config_tracker_IOU.yml can
be used.
To use DLA on Jetson AGX Orin and Orin NX for performance measurement, refer to the Using DLA for inference
section in the Quickstart Guide.
cudaDeviceScheduleBlockingSync flag is set by default on dGPU
On dGPU only, the cudaDeviceScheduleBlockingSync flag is set by default on the GPU where the DeepStream
pipeline runs. In general, for pipelines with multiple streams, this reduces CPU utilization without
significantly affecting performance.
Setting the cudaDeviceScheduleBlockingSync flag when sub-batches are enabled in the tracker results in a
significant reduction in CPU utilization, with similar performance or a negligible dip.
When the environment variable NVDS_DISABLE_CUDADEV_BLOCKINGSYNC is set to 1, the
cudaDeviceScheduleBlockingSync flag is not set.
There is a remote possibility that setting the cudaDeviceScheduleBlockingSync flag negatively affects pipeline
performance when the pipeline already runs with GPU utilization close to 100%. Hence, if a DeepStream pipeline
is GPU bound but GPU utilization does not reach close to 100%, you can experiment with setting
NVDS_DISABLE_CUDADEV_BLOCKINGSYNC to 1 and check whether it improves the performance of the pipeline.
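To compare the two settings, the pipeline can be launched once with the variable set and once without it. The following is a minimal sketch that assumes deepstream-app is on the PATH; the helper names build_env and run_pipeline are hypothetical.

```python
import os
import subprocess


def build_env(disable_blocking_sync: bool, base_env=None) -> dict:
    """Return a copy of the environment with the DeepStream switch applied."""
    env = dict(os.environ if base_env is None else base_env)
    if disable_blocking_sync:
        # Ask DeepStream not to set cudaDeviceScheduleBlockingSync.
        env["NVDS_DISABLE_CUDADEV_BLOCKINGSYNC"] = "1"
    else:
        # Fall back to the default behavior described above.
        env.pop("NVDS_DISABLE_CUDADEV_BLOCKINGSYNC", None)
    return env


def run_pipeline(config_path: str, disable_blocking_sync: bool) -> int:
    """Run deepstream-app with the given config and return its exit code."""
    cmd = ["deepstream-app", "-c", config_path]
    completed = subprocess.run(cmd, env=build_env(disable_blocking_sync), check=False)
    return completed.returncode
```

Run the same config with both values and compare the reported FPS and the CPU and GPU utilization before settling on a setting.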
System Configuration
The system configuration for the DeepStream SDK is listed below:
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
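The batch-size change described above can be scripted when sweeping over stream counts. The following is a minimal sketch, assuming the config file uses the batch-size key under the [streammux] and [primary-gie] groups and can be parsed with Python's configparser; the function name set_batch_size is hypothetical. Comment lines are not preserved when the file is written back.

```python
import configparser
import io


def set_batch_size(config_text: str, num_streams: int) -> str:
    """Return config text with batch-size set to num_streams in both groups."""
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")

    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # preserve key case exactly as written
    parser.read_string(config_text)

    for section in ("streammux", "primary-gie"):
        if not parser.has_section(section):
            raise KeyError(f"missing [{section}] section")
        parser.set(section, "batch-size", str(num_streams))

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # write key=value, no spaces
    return out.getvalue()
```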
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
T4 System configuration
CPU Dual Intel® Xeon® CPU E5-2650 v4 @ 2.20GHz (48 threads total)
CUDA 12.2
TensorRT 8.6.1.6
T4 application configuration
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
A30 System configuration
GPU A30
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
A2 System configuration
GPU A2
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
GPU A10
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
H100 System configuration
GPU H100
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
GPU L40
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
L4 System configuration
GPU L4
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
A4000 System configuration
GPU A4000
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
System Configuration
The system configuration for the DeepStream SDK is listed below:
GPU L4
CUDA 12.2
TensorRT 8.6.1.6
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Rendering Disabled
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:
Jetson
This section describes configuration and settings for the DeepStream SDK on NVIDIA Jetson™ platforms. JetPack 6.0
GA is used for software installation.
System Configuration
For the performance test:
For information about supported power modes, see the “Supported Modes and Power Efficiency” section in the power
management topics of NVIDIA Tegra Linux Driver Package Development Guide, e.g., “Power Management for Jetson
AGX Orin Devices.”
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.
The following tables describe performance results for the NVIDIA Jetson Orin™.
2× secondary GIEs All batches are size 32. Asynchronous mode enabled.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Renderer Disabled
Achieved Performance
Jetson Orin NX
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.
The following tables describe performance results for the NVIDIA Jetson Orin NX™.
2× secondary GIEs All batches are size 32. Asynchronous mode enabled.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Renderer Disabled
Achieved Performance
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.
The following tables describe performance results for the NVIDIA Jetson Orin Nano™.
2× secondary GIEs All batches are size 32. Asynchronous mode enabled.
Secondary_VehicleTypes (224×224—Resnet18)
Secondary_VehicleMake (224×224—Resnet18)
Renderer Disabled
Achieved Performance