
Performance - DeepStream Documentation 6.4 Documentation

Uploaded by

Huynh Tranvan
Copyright © All Rights Reserved

Performance
The DeepStream application is benchmarked across various NVIDIA TAO Toolkit and open-source models. The measured
performance is the end-to-end performance of the entire video analytics application, covering video capture and
decode, pre-processing, batching, inference, and post-processing to generate metadata. Output rendering is turned off
to achieve peak inference performance. For information on disabling output rendering, see the DeepStream Reference
Application - deepstream-app chapter.

To run a higher number of streams (200+) on Hopper, Ampere, and Ada GPUs, follow the instructions below:

$ sudo service display-manager stop

# Make sure no process is running on the GPU, e.g. Xorg or Triton Inference Server
$ sudo pkill -9 Xorg
# Remove kernel modules
$ sudo rmmod nvidia_drm nvidia_modeset nvidia
# Load modules with regkeys
$ sudo modprobe nvidia NVreg_RegistryDwords="RMDebugOverridePerRunlistChannelRam = 1;RMIncreaseRsvdMemorySizeMB = 1024;RMDisableChIdIsolation = 0x1;RmGspFirmwareHeapSizeMB = 256"
$ sudo service display-manager start
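As a sketch of how the NVreg_RegistryDwords value above is put together, the following Python snippet joins the four regkeys into one semicolon-separated string and builds the modprobe argument list. The helper names (registry_dwords, modprobe_command) are hypothetical; the spacing around = simply mirrors the documented command, and the snippet only prints the command rather than running it.

```python
import shlex

# Regkeys copied from the modprobe command above.
REGKEYS = {
    "RMDebugOverridePerRunlistChannelRam": "1",
    "RMIncreaseRsvdMemorySizeMB": "1024",
    "RMDisableChIdIsolation": "0x1",
    "RmGspFirmwareHeapSizeMB": "256",
}


def registry_dwords(regkeys: dict[str, str]) -> str:
    """Join key/value pairs as 'Key = value' entries separated by semicolons."""
    return ";".join(f"{key} = {value}" for key, value in regkeys.items())


def modprobe_command(regkeys: dict[str, str]) -> list[str]:
    """Build the argument list for reloading the nvidia module with the regkeys.

    The whole NVreg_RegistryDwords=... assignment is one argument, which is
    why the shell version wraps the value in double quotes.
    """
    return ["sudo", "modprobe", "nvidia", f"NVreg_RegistryDwords={registry_dwords(regkeys)}"]


if __name__ == "__main__":
    # Print a shell-quoted command for review; this sketch does not execute it.
    print(shlex.join(modprobe_command(REGKEYS)))
```

Building the value from a dictionary keeps each regkey on its own line, which makes it harder to accidentally split the quoted string across lines when editing.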

TAO Pre-trained models


The TAO Toolkit has a set of pretrained models, listed in the table below. If one of these models satisfies your
requirements, start with it. These models can be used for various applications in smart cities or smart places. If your
application is beyond the scope of these models, you can re-train one of the popular model architectures using the TAO
Toolkit. The table below shows the end-to-end performance of highly accurate pretrained models from the TAO Toolkit.
All models are available on NGC. These models are natively integrated with DeepStream, and the instructions to run
them are in /opt/nvidia/deepstream/deepstream-6.4/samples/configs/tao_pretrained_models/ . The following
numbers are obtained with sample_1080p_h265.mp4 .
Performance on Jetson: pretrained models (throughput in FPS)

Model Arch                    Inference    Precision  AGX Orin  AGX Orin   Orin NX  Orin NX    Orin Nano
                              resolution              GPU       DLA1/DLA2  GPU      DLA1/DLA2  GPU
PeopleNet-ResNet34            960x544      INT8       970       329        372      175        256
TrafficCamNet-ResNet18        960x544      INT8       370       NA         180      NA         120
  + License Plate Detection   640x480
  + License Plate Recognition 96x48
TrafficCamNet-ResNet18        960x544      INT8       1105      512        590      283        419
DashCamNet-ResNet18           960x544      INT8       1107      516        574      271        406
FaceDetectIR-ResNet18         384x240      INT8       1112      554        963      481        591
Action Recognition (3D Conv)  224x224x32   FP16       147       NA         51       NA         34

The models with DLA results in the table above can run solely on the DLA. This saves valuable GPU resources for
running more complex models.

Note

- Running inference simultaneously on multiple models is not supported on the DLA. You can run only one model
  at a time on the DLA.
- NA: Not available for Jetson
- NA*: For these models, DLA falls back to GPU
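To turn the FPS figures above into a rough stream budget, divide by the per-stream frame rate. The sketch below assumes that each FPS value is the aggregate throughput across all streams and that it divides evenly among 30 FPS streams; it is a back-of-the-envelope estimate, not a measured stream count. The dictionary and the max_streams helper are hypothetical names.

```python
# Aggregate end-to-end FPS on the Jetson AGX Orin GPU and DLA, copied from the table above.
AGX_ORIN_FPS = {
    "PeopleNet-ResNet34": {"GPU": 970, "DLA1/DLA2": 329},
    "TrafficCamNet-ResNet18": {"GPU": 1105, "DLA1/DLA2": 512},
    "DashCamNet-ResNet18": {"GPU": 1107, "DLA1/DLA2": 516},
    "FaceDetectIR-ResNet18": {"GPU": 1112, "DLA1/DLA2": 554},
}


def max_streams(total_fps: int, stream_fps: int = 30) -> int:
    """Whole number of stream_fps streams that fit within total_fps (rounded down)."""
    return total_fps // stream_fps


if __name__ == "__main__":
    for model, by_device in AGX_ORIN_FPS.items():
        budget = ", ".join(f"{dev}: ~{max_streams(fps)} streams" for dev, fps in by_device.items())
        print(f"{model}: {budget}")
```

For example, PeopleNet-ResNet34 at 970 FPS on the AGX Orin GPU corresponds to about 970 / 30, or 32 streams at 30 FPS, while its 329 FPS DLA result corresponds to about 10.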

Performance on dGPU: pretrained models (throughput in FPS)

Model Arch                    Inference    Precision  Inference    T4     A100 PCIe  A30    A2     A10
                              resolution              engine
PeopleNet-ResNet34            960x544      INT8       TRT          912    4952       3273   610    2059
PeopleNet-ResNet34            960x544      INT8       Triton       797    4214       2730   522    2081
PeopleNet-ResNet34            960x544      INT8       Triton gRPC  826    3161       2281   517    1929
TrafficCamNet-ResNet18        960x544      INT8       TRT          382    2150       1327   253    1071
  + License Plate Detection   640x480
  + License Plate Recognition 96x48
TrafficCamNet-ResNet18        960x544      INT8       TRT          1296   5292       4483   968    2388
DashCamNet-ResNet18           960x544      INT8       TRT          1358   5322       4391   903    2359
FaceDetectIR-ResNet18         384x240      INT8       TRT          2458   5637       5656   3141   3112
Action Recognition (3D Conv)  224x224x32   FP16       TRT          173    996        552    74     450
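To compare inference engines, the PeopleNet-ResNet34 rows can be expressed relative to the TRT result on the same GPU. The following sketch does this for the T4 and A100 PCIe columns of the table above; the dictionary and function names are hypothetical, and the ratios describe only these particular measurements.

```python
# PeopleNet-ResNet34 FPS per inference engine, copied from the table above.
PEOPLENET_FPS = {
    "T4": {"TRT": 912, "Triton": 797, "Triton gRPC": 826},
    "A100 PCIe": {"TRT": 4952, "Triton": 4214, "Triton gRPC": 3161},
}


def relative_to_trt(fps_by_engine: dict[str, int]) -> dict[str, float]:
    """Express each engine's FPS as a fraction of the TRT FPS on the same GPU."""
    baseline = fps_by_engine["TRT"]
    return {engine: round(fps / baseline, 3) for engine, fps in fps_by_engine.items()}


if __name__ == "__main__":
    for gpu, fps_by_engine in PEOPLENET_FPS.items():
        print(gpu, relative_to_trt(fps_by_engine))
```

On the T4, for example, the Triton row reaches about 87% of the TRT throughput (797 / 912).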

Performance on dGPU: pretrained models, continued (throughput in FPS)

Model Arch                    Inference    Precision  Inference    H100   L40    L4     Quadro (A6000)
                              resolution              engine
PeopleNet-ResNet34            960x544      INT8       TRT          6920   4443   1674   2787
PeopleNet-ResNet34            960x544      INT8       Triton       6150   4080   1506   2833
PeopleNet-ResNet34            960x544      INT8       Triton gRPC  4822   3560   1451   2466
TrafficCamNet-ResNet18        960x544      INT8       TRT          2801   2280   741    1404
  + License Plate Detection   640x480
  + License Plate Recognition 96x48
TrafficCamNet-ResNet18        960x544      INT8       TRT          8259   5176   2485   3092
DashCamNet-ResNet18           960x544      INT8       TRT          8311   5235   2527   3071
FaceDetectIR-ResNet18         384x240      INT8       TRT          8372   5821   5775   3464
Action Recognition (3D Conv)  224x224x32   FP16       TRT          1270   870    313    638

Note

- NA: Not available

DeepStream reference model and tracker


The DeepStream SDK ships with a reference DetectNet_v2-ResNet10 model and three ResNet18 classifier models. Detailed
instructions to run these models with DeepStream are provided in the next section. DeepStream provides four
reference trackers: IOU, NvSORT, NvDeepSORT, and NvDCF. For more information about trackers, see the
Gst-nvtracker section.

Configuration File Settings for Performance Measurement


To achieve peak performance, make sure the devices are properly cooled. For Turing and Ampere GPUs, use a server
that meets the thermal and airflow requirements. In addition to the hardware setup, a few options in the config file
must be set to achieve the published performance. Make the required changes to one of the config files from the
DeepStream SDK to replicate the peak performance.

Turn off output rendering, OSD, and tiler


OSD (on-screen display) is used to display bounding boxes, masks, and labels on the screen. If output rendering is
disabled, drawing bounding boxes is not required unless the output needs to be streamed over RTSP or saved to
disk. The tiler is used to display the output in an NxM tiled grid; it is not needed if rendering is disabled. Output
rendering, OSD, and the tiler consume some compute resources, so they can reduce inference performance.

To disable OSD, tiled display and output sink, make the following changes in the DeepStream config file.

- To disable OSD, change enable to 0:

[osd]
enable=0

- To disable tiling, change enable to 0:

[tiled-display]
enable=0

- To turn off output rendering, change the sink to fakesink:

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0

Use the max_perf setting for tracker

DeepStream SDK 6.2 introduces a new reference low-level tracker library, NvMultiObjectTracker, along with a set of
configuration files:

- config_tracker_IOU.yml
- config_tracker_NvDCF_max_perf.yml
- config_tracker_NvDCF_perf.yml
- config_tracker_NvDCF_accuracy.yml

To achieve the peak performance shown in the table above when using the NvDCF tracker, use the max_perf
configuration and match the tracker's video frame resolution to that of the inference module. For example, if the
inference module uses 480x272 resolution, we recommend a similarly reduced resolution (e.g., 480x288) for the
tracker module, as follows:

[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_max_perf.yml
gpu-id=0
enable-batch-process=1
display-tracking-id=1

When the IOU tracker is used, the video frame resolution doesn’t matter, and the default config_tracker_IOU.yml can
be used.
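The example above rounds the 272-pixel inference height up to 288 for the tracker. Assuming the rule is to round each dimension up to a multiple of 32 (an assumption inferred from that example; check the Gst-nvtracker documentation for the exact constraints), the tracker resolution can be derived as in this sketch. The function names are hypothetical.

```python
def round_up_to_multiple(value: int, multiple: int = 32) -> int:
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def tracker_resolution(infer_width: int, infer_height: int, multiple: int = 32) -> tuple[int, int]:
    """Tracker width/height derived from the inference resolution (assumed 32-pixel alignment)."""
    return round_up_to_multiple(infer_width, multiple), round_up_to_multiple(infer_height, multiple)


def tracker_lines(infer_width: int, infer_height: int) -> list[str]:
    """The [tracker] keys to paste into a deepstream-app config."""
    width, height = tracker_resolution(infer_width, infer_height)
    return [f"tracker-width={width}", f"tracker-height={height}"]


if __name__ == "__main__":
    print("\n".join(tracker_lines(480, 272)))
```

The same function leaves 960x544, the tracker resolution used in the benchmarks in this document, unchanged, because both dimensions are already multiples of 32.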

To use DLA on Jetson AGX Orin and Orin NX for performance measurement, refer to the Using DLA for inference
section in the Quickstart Guide.
cudaDeviceScheduleBlockingSync flag is set by default on dGPU

- On dGPU only, the cudaDeviceScheduleBlockingSync flag is set by default on the GPU where the DeepStream
  pipeline runs. In general, for pipelines with multiple streams, this reduces CPU utilization without significantly
  affecting performance.
- When sub-batches are enabled in the tracker, setting the cudaDeviceScheduleBlockingSync flag significantly
  reduces CPU utilization, with similar performance or only a negligible drop.
- When the environment variable NVDS_DISABLE_CUDADEV_BLOCKINGSYNC is set to 1, the
  cudaDeviceScheduleBlockingSync flag is not set by default.
- In rare cases, setting the cudaDeviceScheduleBlockingSync flag might reduce the performance of a pipeline that
  would otherwise run with GPU utilization close to 100%. Therefore, if a DeepStream pipeline is GPU-bound but its
  GPU utilization does not reach close to 100%, try setting NVDS_DISABLE_CUDADEV_BLOCKINGSYNC to 1 and
  check whether it improves the performance of the pipeline.
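To compare a pipeline with and without the flag, you can launch deepstream-app twice with different environments. The sketch below sets NVDS_DISABLE_CUDADEV_BLOCKINGSYNC=1 only for the child process, leaving the parent environment untouched; the helper names are hypothetical, and the config path is whatever file you are benchmarking.

```python
import os
import subprocess

ENV_VAR = "NVDS_DISABLE_CUDADEV_BLOCKINGSYNC"


def deepstream_env(disable_blocking_sync: bool) -> dict[str, str]:
    """Copy of the current environment with the blocking-sync override set or cleared."""
    env = dict(os.environ)
    if disable_blocking_sync:
        env[ENV_VAR] = "1"
    else:
        env.pop(ENV_VAR, None)  # default behaviour: the flag is set on dGPU
    return env


def run_deepstream_app(config_path: str, disable_blocking_sync: bool) -> int:
    """Run deepstream-app with the given config and return its exit code."""
    result = subprocess.run(
        ["deepstream-app", "-c", config_path],
        env=deepstream_env(disable_blocking_sync),
        check=False,
    )
    return result.returncode
```

Run it once with disable_blocking_sync=False and once with True on the same config, then compare the throughput and GPU utilization you measure in each run.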

DeepStream reference model


Data center GPU - GA100
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - GA100.

System Configuration
The system configuration for the DeepStream SDK is listed below:

GA100 System configuration

System Configuration Specification

CPU AMD EPYC 7742 @ 2.25GHz 3.4GHz Turbo (Rome) HT Off

GPU A100-PCIE-40GB(GA100) 1*40537 MiB 1*108 SM

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

GPU clock frequency 1410 MHz

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.
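These config changes can be scripted. The following sketch uses Python's configparser to set batch-size in the streammux and primary-gie groups to the stream count, and enables the tracker with the IOU low-level config. It reuses the library and YAML paths from the [tracker] example earlier in this document and the 960x544 tracker resolution listed in the application configuration below. It assumes the config parses as plain INI and does not touch the source groups, so you still need to configure the sources to supply N streams. The function name configure_benchmark is hypothetical.

```python
import configparser
import io

# Paths taken from the [tracker] example earlier in this document.
TRACKER_LIB = "/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so"
IOU_CONFIG = "/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml"


def configure_benchmark(config_text: str, num_streams: int) -> str:
    """Set streammux/primary-gie batch sizes to num_streams and enable the IOU tracker."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(config_text)
    for section in ("streammux", "primary-gie"):
        if not parser.has_section(section):
            parser.add_section(section)
        parser[section]["batch-size"] = str(num_streams)
    if not parser.has_section("tracker"):
        parser.add_section("tracker")
    tracker = parser["tracker"]
    tracker["enable"] = "1"
    tracker["tracker-width"] = "960"  # tracker processes at 960x544 in these benchmarks
    tracker["tracker-height"] = "544"
    tracker["ll-lib-file"] = TRACKER_LIB
    tracker["ll-config-file"] = IOU_CONFIG
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # note: comments are dropped
    return out.getvalue()
```

For the H.265 run on this GPU, you would call configure_benchmark(text, 180) on the contents of the sample config and save the result under a new name.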

The application configuration for the DeepStream SDK is listed below:


GA100 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=180


sample_1080p_h264.mp4 (provided with the SDK) N=93

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 180 11% 74.17%

H.264 93 2.57% 41.63%

Data center GPU - T4


This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - T4.

System Configuration
The system configuration for the DeepStream SDK is listed below:

T4 System configuration

System Configuration Specification

CPU Dual Intel® Xeon® CPU E5-2650 v4 @ 2.20GHz (48 threads total)

GPU Tesla T4*

System Memory 360448 MB (22 × 16384 MB) DDR4-2666, 2400 MHz

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

GPU clock frequency 1513 MHz


Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:

T4 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=45


sample_1080p_h264.mp4 (provided with the SDK) N=31

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 45 51.81% 100%

H.264 31 2.72% 61.23%

Data center GPU - A30


This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A30.

System Configuration
The system configuration for the DeepStream SDK is listed below:
A30 System configuration

System Configuration Specification

CPU AMD EPYC 7763 @2430 MHz

GPU A30

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

GPU clock frequency 1440 MHz

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:

A30 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=150


sample_1080p_h264.mp4 (provided with the SDK) N=98

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 150 41.87% 96.9%


H.264 98 5.62% 61.33%

Data center GPU - A2


This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A2.

System Configuration
The system configuration for the DeepStream SDK is listed below:

A2 System configuration

System Configuration Specification

CPU AMD EPYC 7763 @2430 MHz

GPU A2

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

GPU clock frequency 1770 MHz

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:


A2 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=31


sample_1080p_h264.mp4 (provided with the SDK) N=31

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 31 21.91% 100%

H.264 31 21.99% 100%

Data center GPU - A10


This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A10.

System Configuration
The system configuration for the DeepStream SDK is listed below:

A10 System configuration

System Configuration Specification

CPU AMD EPYC 7763 @2430 MHz

GPU A10

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

GPU clock frequency 1695 MHz

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:

A10 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=79


sample_1080p_h264.mp4 (provided with the SDK) N=43

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 79 3.26% 65.59%

H.264 43 1.4% 31.18%

Data center GPU - H100


This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - H100.

System Configuration
The system configuration for the DeepStream SDK is listed below:
H100 System configuration

System Configuration Specification

CPU AMD EPYC 7763 @2430 MHz

GPU H100

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:

H100 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=229


sample_1080p_h264.mp4 (provided with the SDK) N=148

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 229 2.76% 90.1%

H.264 148 2.6% 42.32%


Data center GPU - L40
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L40.

System Configuration
The system configuration for the DeepStream SDK is listed below:

L40 System configuration

System Configuration Specification

CPU AMD EPYC 7763 @2430 MHz

GPU L40

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:

L40 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=166


sample_1080p_h264.mp4 (provided with the SDK) N=75

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 166 12.65% 71.63%

H.264 75 1.89% 34.57%

Data center GPU - L4


This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L4.

System Configuration
The system configuration for the DeepStream SDK is listed below:

L4 System configuration

System Configuration Specification

CPU AMD EPYC 7763 @2430 MHz

GPU L4

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:


L4 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=81


sample_1080p_h264.mp4 (provided with the SDK) N=68

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 81 46.1% 100%

H.264 68 8.06% 75.74%

Data center GPU - Quadro (A6000)


This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - Quadro
(A6000).

System Configuration
The system configuration for the DeepStream SDK is listed below:

Quadro (A6000) System configuration

System Configuration Specification

CPU AMD EPYC 7763 @2430 MHz

GPU Quadro (A6000)

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:

Quadro (A6000) application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=101


sample_1080p_h264.mp4 (provided with the SDK) N=49

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 101 7.05% 60.17%

H.264 49 2.68% 28.57%

Data center GPU - A4000


This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A4000.

System Configuration
The system configuration for the DeepStream SDK is listed below:
A4000 System configuration

System Configuration Specification

CPU AMD EPYC 7763 @2430 MHz

GPU A4000

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:

A4000 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=49


sample_1080p_h264.mp4 (provided with the SDK) N=24

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 49 0.97% 49.87%

H.264 24 0.48% 24.56%


Data center GPU - L4000
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L4000.

System Configuration
The system configuration for the DeepStream SDK is listed below:

L4000 System configuration

System Configuration Specification

CPU AMD EPYC 7763 @2430 MHz

GPU L4000

Ubuntu Ubuntu 22.04

GPU Driver 535.161.08

CUDA 12.2

TensorRT 8.6.1.6

Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The application configuration for the DeepStream SDK is listed below:

L4000 application configuration

Application Configuration Specification

N×1080p 30 fps stream sample_1080p_h265.mp4 (provided with the SDK) N=76


sample_1080p_h264.mp4 (provided with the SDK) N=45

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled. Processing at 960x544 resolution, IOU tracker enabled.

2 × Secondary GIEs Batch size 32 for each. Asynchronous mode enabled.

- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

Tiled Display Disabled

Rendering Disabled

Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application
configuration:

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 76 20% 99.25%

H.264 45 0.96% 53.02%
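To compare the dGPU results side by side, the following sketch collects the stream counts from each Achieved Performance table above and converts them to aggregate frames per second (streams × 30 FPS). The data is copied from those tables, with the GA100 section listed as A100; the systems used different CPUs and clock settings, so the ranking only summarizes these measurements. The names in the sketch are hypothetical.

```python
STREAM_FPS = 30  # every stream in these tests is a 1080p 30 FPS stream

# (H.265 streams, H.264 streams) sustained at 30 FPS, copied from the tables above.
DGPU_STREAMS = {
    "A100": (180, 93),
    "T4": (45, 31),
    "A30": (150, 98),
    "A2": (31, 31),
    "A10": (79, 43),
    "H100": (229, 148),
    "L40": (166, 75),
    "L4": (81, 68),
    "Quadro (A6000)": (101, 49),
    "A4000": (49, 24),
    "L4000": (76, 45),
}


def aggregate_fps(streams: int, stream_fps: int = STREAM_FPS) -> int:
    """Total frames per second across all streams."""
    return streams * stream_fps


def ranked_by_h265(results: dict[str, tuple[int, int]]) -> list[tuple[str, tuple[int, int]]]:
    """GPUs sorted by the number of H.265 streams, highest first."""
    return sorted(results.items(), key=lambda item: item[1][0], reverse=True)


if __name__ == "__main__":
    for gpu, (h265, h264) in ranked_by_h265(DGPU_STREAMS):
        print(f"{gpu:<15} H.265: {h265:>3} streams = {aggregate_fps(h265):>4} FPS   "
              f"H.264: {h264:>3} streams = {aggregate_fps(h264):>4} FPS")
```

For example, the 180 H.265 streams on the A100 correspond to 180 × 30 = 5400 frames per second.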

Jetson
This section describes configuration and settings for the DeepStream SDK on NVIDIA Jetson™ platforms. JetPack 6.0
GA is used for software installation.

System Configuration
For the performance test:

1. Max power mode is enabled:
   $ sudo nvpmodel -m 0

2. The GPU clocks are stepped to maximum:
   $ sudo jetson_clocks

For information about supported power modes, see the “Supported Modes and Power Efficiency” section in the power
management topics of NVIDIA Tegra Linux Driver Package Development Guide, e.g., “Power Management for Jetson
AGX Orin Devices.”

Jetson AGX Orin


Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The following tables describe performance results for the NVIDIA Jetson Orin™.

Jetson AGX Orin Pipeline Configuration ( deepstream-app )

Application Configuration Specification

N×1080p 30 fps streams sample_1080p_h265.mp4 (provided with the SDK) N=37


sample_1080p_h264.mp4 (provided with the SDK) N=15

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled; processing at 960x544 resolution, IOU tracker enabled.

2× secondary GIEs Batch size 32 for each. Asynchronous mode enabled.
- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

OSD/tiled display Disabled

Renderer Disabled
Achieved Performance

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 37 21.25% 82.30%

H.264 15 9.49% 36.42%

Jetson Orin NX
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The following tables describe performance results for the NVIDIA Jetson Orin NX™.

Jetson Orin NX Pipeline Configuration ( deepstream-app )

Application Configuration Specification

N×1080p 30 fps streams sample_1080p_h265.mp4 (provided with the SDK) N=16


sample_1080p_h264.mp4 (provided with the SDK) N=13

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled; processing at 960x544 resolution, IOU tracker enabled.

2× secondary GIEs Batch size 32 for each. Asynchronous mode enabled.
- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

OSD/tiled display Disabled

Renderer Disabled

Achieved Performance

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 16 19.26% 99%

H.264 13 15.22% 78.52%

Jetson Orin Nano


Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

- Change the batch size under streammux and primary-gie to match the number of streams.
- Disable tiled display and rendering using the instructions above.
- Enable the IOU tracker.

The following tables describe performance results for the NVIDIA Jetson Orin Nano™.

Jetson Orin Nano Pipeline Configuration ( deepstream-app )

Application Configuration Specification

N×1080p 30 fps streams sample_1080p_h265.mp4 (provided with the SDK) N=13


sample_1080p_h264.mp4 (provided with the SDK) N=8

Primary GIE resnet18_trafficcamnet.etlt

- Batch Size = N
- Interval = 0

Tracker Enabled; processing at 960x544 resolution, IOU tracker enabled.

2× secondary GIEs Batch size 32 for each. Asynchronous mode enabled.
- Secondary_VehicleTypes (224×224, ResNet18)
- Secondary_VehicleMake (224×224, ResNet18)

OSD/tiled display Disabled

Renderer Disabled

Achieved Performance

Stream type No. of streams @ 30 FPS CPU Utilization GPU Utilization

H.265 13 20.65% 99%

H.264 8 12.49% 60.15%
