
Performance - DeepStream Documentation 6.4 Documentation

Performance
The DeepStream application is benchmarked across various NVIDIA TAO Toolkit and open-source models. The measured
performance is the end-to-end performance of the entire video analytics application, including video capture
and decode, pre-processing, batching, inference, and post-processing to generate metadata. Output rendering is
turned off to achieve peak inference performance. For information on disabling output rendering, see the DeepStream
Reference Application - deepstream-app chapter.

To run a higher number of streams (200+) on Hopper, Ampere, and Ada GPUs, follow the instructions below:

$ sudo service display-manager stop

# Make sure no process is running on the GPU, e.g., Xorg or Triton server
$ sudo pkill -9 Xorg
# Remove kernel modules
$ sudo rmmod nvidia_drm nvidia_modeset nvidia
# Load modules with regkeys
$ sudo modprobe nvidia NVreg_RegistryDwords="RMDebugOverridePerRunlistChannelRam = 1;RMIncreaseRsvdMemorySizeMB = 1024;RMDisableChIdIsolation = 0x1;RmGspFirmwareHeapSizeMB = 256"
$ sudo service display-manager start
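
Because the regkey string is long and easy to mistype, it may help to assemble it from a table of keys. The following is a minimal Python sketch, not an NVIDIA tool: the keys and values are copied from the command above, while the REGKEYS table and the build_modprobe_command helper are hypothetical names introduced for illustration. It only prints the command; running it still requires root and an idle GPU.

```python
import shlex

# Registry keys and values copied from the modprobe command above.
REGKEYS = {
    "RMDebugOverridePerRunlistChannelRam": "1",
    "RMIncreaseRsvdMemorySizeMB": "1024",
    "RMDisableChIdIsolation": "0x1",
    "RmGspFirmwareHeapSizeMB": "256",
}


def build_modprobe_command(regkeys):
    """Return the modprobe argument list with NVreg_RegistryDwords assembled from regkeys."""
    dwords = ";".join(f"{key} = {value}" for key, value in regkeys.items())
    return ["sudo", "modprobe", "nvidia", f"NVreg_RegistryDwords={dwords}"]


if __name__ == "__main__":
    # shlex.join adds the shell quoting needed for the spaces and semicolons.
    print(shlex.join(build_modprobe_command(REGKEYS)))
```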

TAO Pre-trained models

TAO Toolkit has a set of pretrained models, listed in the table below. If these models satisfy your requirements,
start with one of them. They can be used for various smart city or smart places applications. If your
application is beyond the scope of these models, you can retrain one of the popular model architectures using TAO
Toolkit. The table below shows the end-to-end performance of highly accurate pretrained models from TAO Toolkit.
All models are available on NGC. These models are natively integrated with DeepStream, and the instructions to run
them are in /opt/nvidia/deepstream/deepstream-6.4/samples/configs/tao_pretrained_models/. The following
numbers were obtained with sample_1080p_h265.mp4.
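
The tables in this section report throughput in FPS. As a rough way to read them, the sketch below converts a throughput figure into a number of 30 fps streams. It assumes the reported FPS is the aggregate across all streams in the pipeline, which this page does not state explicitly; the streams_at_fps helper is a hypothetical name used only for illustration.

```python
import math


def streams_at_fps(aggregate_fps, per_stream_fps=30.0):
    """Estimate how many streams an aggregate throughput can sustain.

    Assumes aggregate_fps is the total across all streams (an assumption,
    not something the tables state).
    """
    if aggregate_fps < 0 or per_stream_fps <= 0:
        raise ValueError("aggregate_fps must be >= 0 and per_stream_fps > 0")
    return math.floor(aggregate_fps / per_stream_fps)


if __name__ == "__main__":
    # PeopleNet-ResNet34 (INT8) GPU figures from the Jetson table.
    print(streams_at_fps(970))  # Jetson AGX Orin
    print(streams_at_fps(256))  # Jetson Orin Nano
```

The estimate ignores decoder limits and any other pipeline stages, so treat it as an upper bound rather than a guaranteed stream count.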
Jetson performance - pretrained models

| Model Arch | Inference resolution | Precision | Jetson AGX Orin GPU (FPS) | Jetson AGX Orin DLA1/DLA2 (FPS) | Jetson Orin NX GPU (FPS) | Jetson Orin NX DLA1/DLA2 (FPS) | Jetson Orin Nano GPU (FPS) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PeopleNet-ResNet34 | 960x544 | INT8 | 970 | 329 | 372 | 175 | 256 |
| TrafficCamNet-ResNet18 + License Plate Detection + License Plate Recognition | 960x544 / 640x480 / 96x48 | INT8 | 370 | NA | 180 | NA | 120 |
| TrafficCamNet-ResNet18 | 960x544 | INT8 | 1105 | 512 | 590 | 283 | 419 |
| DashCamNet-ResNet18 | 960x544 | INT8 | 1107 | 516 | 574 | 271 | 406 |
| FaceDetectIR-ResNet18 | 384x240 | INT8 | 1112 | 554 | 963 | 481 | 591 |
| Action Recognition (3D Conv) | 224x224x32 | FP16 | 147 | NA | 51 | NA | 34 |

All the models in the table above can run solely on DLA. This saves valuable GPU resources to run more complex
models.

 Note

 Running inference simultaneously on multiple models is not supported on the DLA. You can only run one model
at a time on the DLA.
 NA : Not available for Jetson
 NA* : For these models DLA falls back to GPU

dGPU performance - pretrained models

| Model Arch | Inference resolution | Precision | Inference Engine | T4 GPU (FPS) | A100 PCIe GPU (FPS) | A30 GPU (FPS) | A2 GPU (FPS) | A10 GPU (FPS) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PeopleNet-ResNet34 | 960x544 | INT8 | TRT | 912 | 4952 | 3273 | 610 | 2059 |
| PeopleNet-ResNet34 | 960x544 | INT8 | Triton | 797 | 4214 | 2730 | 522 | 2081 |
| PeopleNet-ResNet34 | 960x544 | INT8 | Triton gRPC | 826 | 3161 | 2281 | 517 | 1929 |
| TrafficCamNet-ResNet18 + License Plate Detection + License Plate Recognition | 960x544 / 640x480 / 96x48 | INT8 | TRT | 382 | 2150 | 1327 | 253 | 1071 |
| TrafficCamNet-ResNet18 | 960x544 | INT8 | TRT | 1296 | 5292 | 4483 | 968 | 2388 |
| DashCamNet-ResNet18 | 960x544 | INT8 | TRT | 1358 | 5322 | 4391 | 903 | 2359 |
| FaceDetectIR-ResNet18 | 384x240 | INT8 | TRT | 2458 | 5637 | 5656 | 3141 | 3112 |
| Action Recognition (3D Conv) | 224x224x32 | FP16 | TRT | 173 | 996 | 552 | 74 | 450 |

dGPU performance - pretrained models (continued)

| Model Arch | Inference resolution | Precision | Inference Engine | H100 GPU (FPS) | L40 GPU (FPS) | L4 GPU (FPS) | Quadro (A6000) GPU (FPS) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PeopleNet-ResNet34 | 960x544 | INT8 | TRT | 6920 | 4443 | 1674 | 2787 |
| PeopleNet-ResNet34 | 960x544 | INT8 | Triton | 6150 | 4080 | 1506 | 2833 |
| PeopleNet-ResNet34 | 960x544 | INT8 | Triton gRPC | 4822 | 3560 | 1451 | 2466 |
| TrafficCamNet-ResNet18 + License Plate Detection + License Plate Recognition | 960x544 / 640x480 / 96x48 | INT8 | TRT | 2801 | 2280 | 741 | 1404 |
| TrafficCamNet-ResNet18 | 960x544 | INT8 | TRT | 8259 | 5176 | 2485 | 3092 |
| DashCamNet-ResNet18 | 960x544 | INT8 | TRT | 8311 | 5235 | 2527 | 3071 |
| FaceDetectIR-ResNet18 | 384x240 | INT8 | TRT | 8372 | 5821 | 5775 | 3464 |
| Action Recognition (3D Conv) | 224x224x32 | FP16 | TRT | 1270 | 870 | 313 | 638 |

 Note

 NA : Not available

DeepStream reference model and tracker

DeepStream SDK ships with a reference DetectNet_v2-ResNet10 model and three ResNet18 classifier models. Detailed
instructions for running these models with DeepStream are provided in the next section. DeepStream provides
four reference trackers: IOU, NvSORT, NvDeepSORT, and NvDCF. For more information about trackers, see the
Gst-nvtracker section.

Configuration File Settings for Performance Measurement

To achieve peak performance, make sure the devices are properly cooled. For Turing and Ampere GPUs, make sure you
use a server that meets the thermal and airflow requirements. Along with the hardware setup, a few other options in
the config file need to be set to achieve the published performance. Make the required changes to one of the config
files from DeepStream SDK to replicate the peak performance.

Turn off output rendering, OSD, and tiler

OSD (on-screen display) is used to display bounding boxes, masks, and labels on the screen. If output rendering is
disabled, creating bounding boxes is not required unless the output needs to be streamed over RTSP or saved to
disk. Tiler is used to display the output in an NxM tiled grid. It is not needed if rendering is disabled. Output
rendering, OSD, and tiler each use some percentage of compute resources, so they can reduce inference performance.

To disable OSD, tiled display and output sink, make the following changes in the DeepStream config file.

 To disable OSD, change enable to 0

[osd]
enable=0

 To disable tiling, change enable to 0

[tiled-display]
enable=0

 To turn-off output rendering, change the sink to fakesink.

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
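
The three changes above can also be applied programmatically. The following is a minimal Python sketch, assuming the config text parses as INI with the standard configparser module; the sample text and the disable_rendering helper are hypothetical and not part of the SDK. Note that configparser discards comment lines when it rewrites the file, so keep a backup of the original config.

```python
import configparser
import io

# Illustrative fragment only; a real deepstream-app config has more groups.
SAMPLE_CONFIG = """\
[osd]
enable=1

[tiled-display]
enable=1

[sink0]
enable=1
type=2
sync=1
"""


def disable_rendering(config_text):
    """Return config_text with OSD and tiler disabled and sink0 set to fakesink."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(config_text)

    for section in ("osd", "tiled-display"):
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, "enable", "0")

    if not parser.has_section("sink0"):
        parser.add_section("sink0")
    parser.set("sink0", "enable", "1")
    parser.set("sink0", "type", "1")  # 1 = FakeSink
    parser.set("sink0", "sync", "0")

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(disable_rendering(SAMPLE_CONFIG))
```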

Use the max_perf setting for tracker

DeepStream SDK 6.2 introduces a new reference low-level tracker library, NvMultiObjectTracker, along with a set of
configuration files:

 config_tracker_IOU.yml
 config_tracker_NvDCF_max_perf.yml
 config_tracker_NvDCF_perf.yml
 config_tracker_NvDCF_accuracy.yml

To achieve the peak performance shown in the table above when using the NvDCF tracker, make sure the max_perf
configuration is used with the video frame resolution matched to that of the inference module. For example, if the
inference module uses 480x272 resolution, use a similarly reduced resolution (e.g., 480x288) for the tracker
module, as in the following:

[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_max_perf.yml
gpu-id=0
enable-batch-process=1
display-tracking-id=1
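
The example above pairs a 480x272 inference resolution with a 480x288 tracker resolution. One way to derive such a value is to round each dimension up to a multiple of 32. The sketch below assumes that alignment requirement (this page does not state it; check the Gst-nvtracker section for your tracker type), and the align_up and tracker_resolution helpers are hypothetical names.

```python
def align_up(value, multiple=32):
    """Round value up to the nearest multiple of `multiple`."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


def tracker_resolution(infer_width, infer_height, multiple=32):
    """Return (tracker-width, tracker-height) matched to the inference size.

    The multiple-of-32 alignment is an assumption, not taken from this page.
    """
    return align_up(infer_width, multiple), align_up(infer_height, multiple)


if __name__ == "__main__":
    width, height = tracker_resolution(480, 272)
    print(f"tracker-width={width}")
    print(f"tracker-height={height}")
```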

When the IOU tracker is used, the video frame resolution doesn’t matter, and the default config_tracker_IOU.yml can
be used.

To use DLA on Jetson AGX Orin and Orin NX for performance measurement, refer to the Using DLA for inference
section in the Quickstart Guide.
CudaDeviceScheduleBlockingSync flag is set by default on dGPU

 On dGPU only, cudaDeviceScheduleBlockingSync flag is set by default on the GPU where the Deepstream
pipeline runs. In general, for pipelines with multiple streams, this helps in reducing the CPU utilization without
affecting the performance much.
 Setting cudaDeviceScheduleBlockingSync flag when sub batches are enabled in the tracker, results in
significant reduction in CPU utilization with similar or negligible dip in performance.
 When the environment variable NVDS_DISABLE_CUDADEV_BLOCKINGSYNC is set to 1,
cudaDeviceScheduleBlockingSync flag is not set by default.
 There is a remote possibility that setting cudaDeviceScheduleBlockingSync flag might affect the pipeline
performance negatively when the pipeline already runs with GPU utilization close to 100%. If you
encounter a situation where a DeepStream pipeline is GPU bound but the GPU utilization does not reach close
to 100%, try setting NVDS_DISABLE_CUDADEV_BLOCKINGSYNC to 1 and
check whether it improves the performance of the pipeline.
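
To compare the two settings, the pipeline can be launched once with the variable unset and once with it set to 1. The following is a minimal Python sketch: the environment variable name comes from the text above and deepstream-app -c is the reference application's config option, while the deepstream_env and launch_command helpers and the config path are hypothetical.

```python
import os

ENV_VAR = "NVDS_DISABLE_CUDADEV_BLOCKINGSYNC"


def deepstream_env(disable_blocking_sync, base_env=None):
    """Return a copy of the environment with the blocking-sync override applied."""
    env = dict(os.environ if base_env is None else base_env)
    if disable_blocking_sync:
        env[ENV_VAR] = "1"  # cudaDeviceScheduleBlockingSync is then not set by default
    else:
        env.pop(ENV_VAR, None)  # leave the default behavior in place
    return env


def launch_command(config_path):
    """Return the argument list for running deepstream-app with a config file."""
    return ["deepstream-app", "-c", config_path]


if __name__ == "__main__":
    # To actually run the pipeline (requires DeepStream to be installed):
    #   import subprocess
    #   subprocess.run(launch_command("my_config.txt"), env=deepstream_env(True), check=True)
    print(launch_command("my_config.txt"), deepstream_env(True)[ENV_VAR])
```

Comparing the throughput and CPU utilization of the two runs shows whether the override helps for a given pipeline.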

DeepStream reference model

Data center GPU - GA100
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - GA100.

System Configuration
The system configuration for the DeepStream SDK is listed below:

GA100 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7742 @ 2.25GHz 3.4GHz Turbo (Rome) HT Off |
| GPU | A100-PCIE-40GB(GA100) 140537 MiB 1108 SM |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |
| GPU clock frequency | 1410 MHz |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

GA100 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=180; sample_1080p_h264.mp4 (provided with the SDK), N=93 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 180 | 11% | 74.17% |
| H.264 | 93 | 2.57% | 41.63% |
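
The batch-size change listed for this benchmark (matching streammux and primary-gie to the number of streams) could be scripted along these lines. This is a minimal Python sketch, assuming the config text parses as INI with the standard configparser module; the sample text and the set_batch_size helper are hypothetical and not part of the SDK. The same helper applies to the other platform sections by passing their stream counts.

```python
import configparser
import io

# Illustrative fragment only; the shipped config file contains more groups and keys.
SAMPLE_CONFIG = """\
[streammux]
batch-size=4

[primary-gie]
enable=1
batch-size=4
interval=0
"""


def set_batch_size(config_text, num_streams):
    """Return config_text with streammux and primary-gie batch-size set to num_streams."""
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(config_text)
    for section in ("streammux", "primary-gie"):
        if not parser.has_section(section):
            raise KeyError(f"missing [{section}] group")
        parser.set(section, "batch-size", str(num_streams))
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # N=180 is the H.265 stream count reported for GA100 above.
    print(set_batch_size(SAMPLE_CONFIG, 180))
```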

Data center GPU - T4

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - T4.

System Configuration
The system configuration for the DeepStream SDK is listed below:

T4 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | Dual Intel® Xeon® CPU E5-2650 v4 @ 2.20GHz (48 threads total) |
| GPU | Tesla T4* |
| System Memory | 360448Mb (22x16384) DDR42666, 2400MHz |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |
| GPU clock frequency | 1513 MHz |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

T4 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=45; sample_1080p_h264.mp4 (provided with the SDK), N=31 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 45 | 51.81% | 100% |
| H.264 | 31 | 2.72% | 61.23% |

Data center GPU - A30

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A30.

System Configuration
The system configuration for the DeepStream SDK is listed below:
A30 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7763 @2430 MHz |
| GPU | A30 |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |
| GPU clock frequency | 1440 MHz |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A30 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=150; sample_1080p_h264.mp4 (provided with the SDK), N=98 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 150 | 41.87% | 96.9% |
| H.264 | 98 | 5.62% | 61.33% |

Data center GPU - A2

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A2.

System Configuration
The system configuration for the DeepStream SDK is listed below:

A2 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7763 @2430 MHz |
| GPU | A2 |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |
| GPU clock frequency | 1770 MHz |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A2 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=31; sample_1080p_h264.mp4 (provided with the SDK), N=31 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 31 | 21.91% | 100% |
| H.264 | 31 | 21.99% | 100% |

Data center GPU - A10

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A10.

System Configuration
The system configuration for the DeepStream SDK is listed below:

A10 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7763 @2430 MHz |
| GPU | A10 |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |
| GPU clock frequency | 1695 MHz |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A10 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=79; sample_1080p_h264.mp4 (provided with the SDK), N=43 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 79 | 3.26% | 65.59% |
| H.264 | 43 | 1.4% | 31.18% |

Data center GPU - H100

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - H100.

System Configuration
The system configuration for the DeepStream SDK is listed below:
H100 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7763 @2430 MHz |
| GPU | H100 |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

H100 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=229; sample_1080p_h264.mp4 (provided with the SDK), N=148 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 229 | 2.76% | 90.1% |
| H.264 | 148 | 2.6% | 42.32% |

Data center GPU - L40
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L40.

System Configuration
The system configuration for the DeepStream SDK is listed below:

L40 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7763 @2430 MHz |
| GPU | L40 |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

L40 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=166; sample_1080p_h264.mp4 (provided with the SDK), N=75 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 166 | 12.65% | 71.63% |
| H.264 | 75 | 1.89% | 34.57% |

Data center GPU - L4

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L4.

System Configuration
The system configuration for the DeepStream SDK is listed below:

L4 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7763 @2430 MHz |
| GPU | L4 |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

L4 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=81; sample_1080p_h264.mp4 (provided with the SDK), N=68 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 81 | 46.1% | 100% |
| H.264 | 68 | 8.06% | 75.74% |

Data center GPU - Quadro (A6000)

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - Quadro
(A6000).

System Configuration
The system configuration for the DeepStream SDK is listed below:

Quadro (A6000) System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7763 @2430 MHz |
| GPU | Quadro (A6000) |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

Quadro (A6000) application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=101; sample_1080p_h264.mp4 (provided with the SDK), N=49 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 101 | 7.05% | 60.17% |
| H.264 | 49 | 2.68% | 28.57% |

Data center GPU - A4000

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A4000.

System Configuration
The system configuration for the DeepStream SDK is listed below:
A4000 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7763 @2430 MHz |
| GPU | A4000 |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A4000 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=49; sample_1080p_h264.mp4 (provided with the SDK), N=24 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 49 | 0.97% | 49.87% |
| H.264 | 24 | 0.48% | 24.56% |

Data center GPU - L4000
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L4000.

System Configuration
The system configuration for the DeepStream SDK is listed below:

L4000 System configuration

| System Configuration | Specification |
| --- | --- |
| CPU | AMD EPYC 7763 @2430 MHz |
| GPU | L4000 |
| Ubuntu | Ubuntu 22.04 |
| GPU Driver | 535.161.08 |
| CUDA | 12.2 |
| TensorRT | 8.6.1.6 |

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

L4000 application configuration

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK), N=76; sample_1080p_h264.mp4 (provided with the SDK), N=45 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
| 2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| Tiled Display | Disabled |
| Rendering | Disabled |

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified
system and application configuration:

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 76 | 20% | 99.25% |
| H.264 | 45 | 0.96% | 53.02% |

Jetson
This section describes configuration and settings for the DeepStream SDK on NVIDIA Jetson™ platforms. JetPack 6.0
GA is used for software installation.

System Configuration
For the performance test:

1. Max power mode is enabled: $ sudo nvpmodel -m 0

2. The GPU clocks are stepped to maximum: $ sudo jetson_clocks

For information about supported power modes, see the “Supported Modes and Power Efficiency” section in the power
management topics of NVIDIA Tegra Linux Driver Package Development Guide, e.g., “Power Management for Jetson
AGX Orin Devices.”

Jetson AGX Orin

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson AGX Orin™.

Jetson AGX Orin Pipeline Configuration ( deepstream-app )

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps streams | sample_1080p_h265.mp4 (provided with the SDK), N=37; sample_1080p_h264.mp4 (provided with the SDK), N=15 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled; processing at 960x544 resolution, IOU tracker enabled. |
| 2× secondary GIEs | All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| OSD/tiled display | Disabled |
| Renderer | Disabled |

Achieved Performance

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 37 | 21.25% | 82.30% |
| H.264 | 15 | 9.49% | 36.42% |

Jetson Orin NX
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson Orin NX™.

Jetson Orin NX Pipeline Configuration ( deepstream-app )

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps streams | sample_1080p_h265.mp4 (provided with the SDK), N=16; sample_1080p_h264.mp4 (provided with the SDK), N=13 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled; processing at 960x544 resolution, IOU tracker enabled. |
| 2× secondary GIEs | All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| OSD/tiled display | Disabled |
| Renderer | Disabled |

Achieved Performance

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 16 | 19.26% | 99% |
| H.264 | 13 | 15.22% | 78.52% |

Jetson Orin Nano

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

 Change batch size under streammux and primary-gie to match the number of streams.
 Disable tiled display and rendering using instructions above.
 Enable IOU tracker.
The following tables describe performance results for the NVIDIA Jetson Orin Nano™.

Jetson Orin Nano Pipeline Configuration ( deepstream-app )

| Application Configuration | Specification |
| --- | --- |
| N×1080p 30 fps streams | sample_1080p_h265.mp4 (provided with the SDK), N=13; sample_1080p_h264.mp4 (provided with the SDK), N=8 |
| Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval = 0 |
| Tracker | Enabled; processing at 960x544 resolution, IOU tracker enabled. |
| 2× secondary GIEs | All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_VehicleMake (224×224, Resnet18) |
| OSD/tiled display | Disabled |
| Renderer | Disabled |

Achieved Performance

| Stream type | No. of Streams @ 30 FPS | CPU Utilization | GPU Utilization |
| --- | --- | --- | --- |
| H.265 | 13 | 20.65% | 99% |
| H.264 | 8 | 12.49% | 60.15% |

