Transfer Learning Toolkit Getting Started Guide IVA
8.2. Running inference on a DetectNet_v2 model
8.3. Running inference on a FasterRCNN model
8.4. Running inference on an SSD model
Chapter 9. Pruning the model
Chapter 10. Exporting the model
Chapter 11. Deploying to DeepStream
Chapter 1.
OVERVIEW
NVIDIA Transfer Learning Toolkit is a Python package that enables NVIDIA customers
to fine-tune pre-trained models with their own data. Customers can then export these
models for TensorRT-based inference on an edge device.
This software is used to train computer vision and deep learning models for streaming
analytics use cases. In this release the following applications are supported:
‣ Classification
‣ Object Detection
Under object detection the following meta-architectures are supported:
‣ DetectNet_v2
‣ SSD
‣ FasterRCNN
Use the Transfer Learning Toolkit to perform these tasks:
‣ Download the model - Download pre-trained models.
‣ Evaluate the model - Evaluate models for target predictions.
‣ Train the model - Train or re-train data to create and refine models.
‣ Prune the model - Prune models to reduce size.
‣ Export the model - Export models for TensorRT inference.
Chapter 2.
TRANSFER LEARNING TOOLKIT REQUIREMENTS
Hardware Requirements
Minimum
‣ 4 GB system RAM
‣ 4 GB of GPU RAM
‣ Single core CPU
‣ 1 GPU
‣ 50 GB of HDD space
Recommended
‣ 32 GB system RAM
‣ 32 GB of GPU RAM
‣ 8 core CPU
‣ 4 GPUs
‣ 100 GB of SSD space
Software Requirements
‣ Ubuntu 18.04 LTS
‣ NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
‣ docker-ce installed, https://docs.docker.com/install/linux/docker-ce/ubuntu/
‣ nvidia-docker2 installed, instructions: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
Model Requirements
Classification
Classification input images do not need to be manually resized. The input dataloader
resizes images as needed.
DetectNet_v2
The tlt-train tool does not support training on images of multiple resolutions, or
resizing images during training. All of the images must be resized offline to the final
training size and the corresponding bounding boxes must be scaled accordingly.
SSD
The tlt-train tool does not support training on images of multiple resolutions, or
resizing images during training. All of the images must be resized offline to the final
training size and the corresponding bounding boxes must be scaled accordingly.
FasterRCNN
The FasterRCNN app resizes input images on the fly during training, evaluation, and
inference whenever the image sizes differ from the size specified in the experiment spec.
Therefore, you do not need to manually resize the images before using the FasterRCNN
app. Offline resizing will, however, save time during training, evaluation, and inference.
Installation Prerequisites
‣ Execute docker login nvcr.io from the command line and enter your
username and password.
‣ Username: $oauthtoken
‣ Password: API_KEY
‣ Execute docker pull nvcr.io/nvidia/tlt-streamanalytics:<version>
Chapter 3.
INSTALLATION
The Transfer Learning Toolkit (TLT) is available to download from the NGC. You must
have an NGC account and an API key associated with your account. See the Installation
Prerequisites section in Chapter 2 for details on creating an NGC account and obtaining
an API key.
It is useful to mount separate volumes for the dataset and the experiment results
so that they persist outside of the docker. This way the data is preserved after the
docker is closed. Any data that is generated in, or referenced from, a directory inside
the docker will be lost if it is not copied out of the docker, or written to or read from
volumes outside of the docker.
‣ Use the examples: Examples using a ResNet18 backbone for detecting objects with
either the DetectNet_v2, SSD, or FasterRCNN architecture are available as Jupyter
Notebooks. To run the available examples, enable the Jupyter notebook included in
the docker to run in your browser:
docker run --runtime=nvidia -it -v /home/<username>/tlt-experiments:/workspace/tlt-experiments -p 8888:8888 tlt-streamanalytics:<version>
Copy and paste the link produced from this command into your browser to access
the notebook. The /workspace/examples folder will contain a demo notebook.
For DetectNet_v2 and SSD notebooks, the tlt-train tool does not support training
on images of multiple resolutions, or resizing images during training. All of the images
must be resized offline to the final training size and the corresponding bounding boxes
must be scaled accordingly.
All our classification models have names based on this template nvidia/iva/
tlt_*_classification.
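If you have the NGC CLI installed and configured, you can list the available models with a command along the following lines (the glob pattern here is illustrative):
ngc registry model list nvidia/iva/tlt_*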
Downloading a model
Use this command to download the model you have chosen from the NGC model
registry:
ngc registry model download-version <ORG/model_name:version> -d
<path_to_download_dir>
For example, use this command to download the ResNet18 classification model to the
$USER_EXPERIMENT_DIR directory.
ngc registry model download-version nvidia/iva/tlt_resnet18_classification:1 -d
$USER_EXPERIMENT_DIR/pretrained_resnet18
Chapter 4.
PREPARING INPUT DATA STRUCTURE
This chapter provides instructions on preparing your data for use by the Transfer
Learning Toolkit (TLT).
|--dataset_root:
|--train
|--audi:
|--1.jpg
|--2.jpg
|--bmw:
|--01.jpg
|--02.jpg
|--val
|--audi:
|--3.jpg
|--4.jpg
|--bmw:
|--03.jpg
|--04.jpg
|--test
|--audi:
|--5.jpg
|--6.jpg
|--bmw:
|--05.jpg
|--06.jpg
The steps to convert the data to TFRecords are covered in Conversion to TFRecords.
For FasterRCNN, KITTI format data may be ingested directly; more on this is covered
in Specification file for FasterRCNN.
.
|--dataset root
|-- images
|-- 000000.jpg
|-- 000001.jpg
.
.
|-- xxxxxx.jpg
|-- labels
|-- 000000.txt
|-- 000001.txt
.
.
|-- xxxxxx.txt
|-- kitti_seq_to_map.json
The images and labels have the same file IDs before the extension; the image-to-label
correspondence is maintained using these file names.
‣ kitti_seq_to_map.json: This file contains a sequence-to-frame-ID mapping for
the frames in the images directory. This is an optional file, and is useful if the data
needs to be split into N folds sequence-wise. If the data is to be split into a random
80:20 train:val split, this file may be ignored.
All the images and labels in the training dataset should be of the same resolution. For
DetectNet_v2 and SSD notebooks, the tlt-train tool does not support training on
images of multiple resolutions, or resizing images during training. All of the images
must be resized offline to the final training size and the corresponding bounding boxes
must be scaled accordingly.
4.2.2. Label files
A KITTI format label file is a simple text file containing one line per object. Each line has
multiple fields. Here is a description of these fields:
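The fields follow the standard KITTI annotation layout; as a quick reference (the authoritative field list is in the KITTI documentation), each line contains:
‣ Class name (1 element)
‣ Truncation (1 element)
‣ Occlusion (1 element)
‣ Observation angle alpha (1 element)
‣ 2D bounding box coordinates xmin, ymin, xmax, ymax in pixels (4 elements)
‣ 3D object dimensions height, width, length in meters (3 elements)
‣ 3D object location x, y, z in camera coordinates (3 elements)
‣ Rotation rotation_y around the Y-axis (1 element)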
The sum of the total number of elements per object is 15. Here is a sample text file:
car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59
cyclist 0.00 0 -2.46 665.45 160.00 717.93 217.99 1.72 0.47 1.65 2.45 1.35 22.10 -2.35
pedestrian 0.00 2 0.21 423.17 173.67 433.17 224.03 1.60 0.38 0.30 -5.87 1.63 23.11 -0.03
This indicates that the image contains 3 objects with the parameters listed above.
Currently, for detection the toolkit only requires the class name and bbox coordinates
fields to be populated. This is because the TLT training pipeline supports training only for
class and bbox coordinates. The remaining fields may be set to 0. Here is a sample file for
a custom annotated dataset:
car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cyclist 0.00 0 0.00 665.45 160.00 717.93 217.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00
pedestrian 0.00 0 0.00 423.17 173.67 433.17 224.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cyclist 0.00 0 0.00 665.45 160.00 717.93 217.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00
pedestrian 0.00 0 0.00 423.17 173.67 433.17 224.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
{
"video_sequence_name": [list of strings(frame idx)]
}
{
"2011_09_28_drive_0165_sync": ["003193", "003185", "002857", "001864",
"003838",
"007320", "003476", "007308", "000337", "004165", "006573"],
"2011_09_28_drive_0191_sync": ["005724", "002529", "004136", "005746"],
"2011_09_28_drive_0179_sync": ["005107", "002485", "006089", "000695"],
"2011_09_26_drive_0079_sync": ["005421", "000673", "002064", "000783",
"003068"],
"2011_09_28_drive_0035_sync": ["005540", "002424", "004949", "004996",
"003969"],
"2011_09_28_drive_0117_sync": ["007150", "003797", "002554", "001509"]
}
4.3. Conversion to TFRecords
The SSD and DetectNet_v2 apps, as mentioned in Data input for object detection,
require KITTI format data to be converted to TFRecords. To do so, the Transfer Learning
Toolkit includes the tlt-dataset-convert tool. This tool requires a configuration
file as input. Configuration file details and sample usage examples are included in the
following sections.
A sample configuration file to convert the PASCAL VOC dataset with 80% training data
and 20% validation data is shown below. This assumes that the data has been converted
to KITTI format and is available for ingestion in the root directory path.
kitti_config {
root_directory_path: "/workspace/tlt-experiments/data/VOCtrainval_11-May-2012/
VOCdevkit/VOC2012"
image_dir_name: "JPEGImages_kitti/test"
label_dir_name: "Annotations_kitti/test"
image_extension: ".jpg"
partition_mode: "random"
num_partitions: 2
val_split: 20
num_shards: 10
}
image_directory_path: "/workspace/tlt-experiments/data/VOCtrainval_11-May-2012/
VOCdevkit/VOC2012"
tlt-dataset-convert -d <path_to_tfrecords_conversion_spec> -o
<path_to_output_tfrecords>
..
2019-07-16 01:32:40,338 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Writing partition 1, shard 9
2019-07-16 01:32:49,063 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
sheep: 695
..
car: 1770
boat: boat
When writing the class map for the dataset_config in the experiment spec, use the
class labels as they appear in the tfrecords file.
Chapter 5.
CREATING AN EXPERIMENT SPEC FILE
This chapter describes how to create a specification file for model training, inference and
evaluation.
model_config {
arch: "resnet"
n_layers: 18
use_bias: True
use_batch_norm: True
all_projections: True
use_pooling: False
freeze_bn: False
freeze_blocks: 0
freeze_blocks: 1
eval_config {
eval_dataset_path: "/path/to/your/eval/data"
model_path: "/path/to/your/model"
top_k: 3
conf_threshold: 0.5
batch_size: 256
n_workers: 8
train_config {
train_dataset_path: "/path/to/your/train/data"
val_dataset_path: "/path/to/your/val/data"
optimizer: "sgd"
batch_size_per_gpu: 256
n_epochs: 80
n_workers: 16
# regularizer
reg_config {
type: "L2"
scope: "Conv2D,Dense"
weight_decay: 0.00005
# learning_rate
lr_config {
scheduler: "soft_anneal"
5.2.1. Model config
Core object detection can be configured using the model_config option in the spec file.
Here are the parameters:
Here's a sample model config that instantiates a ResNet18 model with pretrained weights,
freezes blocks 0 and 1, and sets all shortcuts to projection layers.
use_pooling: False
use_batch_norm: True
dropout_rate: 0.0
training_precision: {
backend_floatx: FLOAT32
}
objective_set: {
cov {}
bbox {
scale: 35.0
offset: 0.5
}
}
}
5.2.3. Post processor
The post processor module generates renderable bboxes from the raw detection output.
The process includes:
‣ Filtering out valid detections by thresholding objects using the confidence value in
the coverage tensor
‣ value: a clustering_config element defining the parameters for the DBSCAN
clustering algorithm. DBSCAN clusters the valid predictions into one box per object.
This section defines parameters that configure the post processor. For each class you
train for, the postprocessing_config has a target_class_config element, which defines the
clustering parameters for that class. The parameters for each target class include:
The clustering_config element configures the clustering block for this class. Here are
the parameters for this element.
Here is an example definition of the post processor for a 3-class network trained for
car, cyclist, and pedestrian:
postprocessing_config {
target_class_config {
key: "car"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.15
dbscan_min_samples: 0.05
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: "cyclist"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.15
dbscan_min_samples: 0.05
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: "pedestrian"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.15
dbscan_min_samples: 0.05
minimum_bounding_box_height: 20
}
}
}
}
5.2.4. Cost function
This section helps you configure the cost function to include the classes that you are
training for. For each class you want to train, add a new target_classes entry to the
spec file. For best performance with these classes, NVIDIA recommends leaving the
other parameters in this section unchanged.
cost_function_config {
target_classes {
name: "car"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: "cyclist"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: "pedestrian"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: True
max_objective_weight: 0.9999
min_objective_weight: 0.0001
}
5.2.5. Trainer
Here are the parameters used to configure the trainer:
training_config {
batch_size_per_gpu: 16
num_epochs: 80
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-6
max_learning_rate: 5e-4
soft_start: 0.1
annealing: 0.7
}
}
regularizer {
type: L1
weight: 3e-9
}
optimizer {
adam {
epsilon: 1e-08
beta1: 0.9
beta2: 0.999
}
}
cost_scaling {
enabled: False
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
}
5.2.6. Augmentation module
The augmentation module provides some basic pre-processing and augmentation when
training. The augmentation_config contains three elements:
‣ preprocessing: This nested field configures the input image and ground truth
label pre-processing module. It sets the shape of the input tensor to the network. The
ground truth labels are pre-processed to meet the dimensions of the input image
tensors. If the output image height and output image width of the pre-processing
block don't match the dimensions of the input images in the tfrecords, the dataloader
either pads with zeros or takes random crops to fit the input dimensions. If the
images are cropped, then the labels are altered accordingly to consider only objects
in the crop. Currently, the entire input image and labels are not resized to fit the
input resolution. The parameters that configure the preprocessing block include:
min_bbox_height: 1.0
}
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 25.0
saturation_shift_max: 0.2
contrast_scale_max: 0.1
contrast_center: 0.5
}
}
If the output image height and the output image width of the preprocessing block
don't match the dimensions of the input image, the dataloader either pads with zeros
or crops to fit the output resolution. It does not resize the input images and labels
to fit.
# Sample evaluation config to run evaluation in integrate mode for the given 3 class model,
# at every 10th epoch starting from epoch 1.
evaluation_config {
average_precision_mode: INTEGRATE
validation_period_during_training: 10
first_validation_epoch: 1
minimum_detection_ground_truth_overlap {
key: "car"
value: 0.7
}
minimum_detection_ground_truth_overlap {
key: "person"
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: "bicycle"
value: 0.5
}
evaluation_box_config {
key: "car"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
evaluation_box_config {
key: "person"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
evaluation_box_config {
key: "bicycle"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
}
5.2.8. Dataloader
This section defines the parameters to configure the dataloader. Here, you define the
path to the data you want to train on and the class mapping for classes in the dataset
that the network is to be trained for. The parameters in the dataset config are:
‣ data_sources: Captures the path to the tfrecords to train on. This field contains 2
parameters:
‣ tfrecords_path: Path to the individual tfrecords files. This path follows the UNIX
style pathname pattern extension, so you can provide a common pathname
pattern that captures all the tfrecords files in that directory.
‣ image_directory_path: Path to the training data root from which the tfrecords
were generated.
‣ image_extension: Extension of the images to be used.
‣ target_class_mapping: This parameter maps the class names in the tfrecords to the
target classes to be trained in the network. Instantiate one such element for each
source-to-target class mapping.
‣ validation_fold: In the case of n-fold tfrecords, define the index of the fold to use
for validation. For sequence-wise validation choose the validation fold in the range
[0, N-1]. However, for a random-split tfrecords, force the validation fold index to 0,
as the tfrecords are just 2-fold.
The class name keys in the target_class_mapping must be identical to those shown
in the dataset converter log, so that the correct classes are picked up for training.
dataset_config {
data_sources: {
tfrecords_path: "<path to the training tfrecords root/tfrecords train
pattern>"
image_directory_path: "<path to the training data source>"
}
image_extension: "jpg"
target_class_mapping {
key: "car"
value: "car"
}
target_class_mapping {
key: "automobile"
value: "car"
}
target_class_mapping {
key: "heavy_truck"
value: "car"
}
target_class_mapping {
key: "person"
value: "pedestrian"
}
target_class_mapping {
key: "rider"
value: "cyclist"
}
validation_fold: 0
}
In this example the tfrecords are assumed to be multi-fold, and the fold number to
validate on is defined. If you want to validate on different tfrecords than those defined
in the training set, use the validation_data_source field to define them. In this
case, remove the validation_fold field from the spec.
validation_data_source: {
  tfrecords_path: "<path to tfrecords to validate on>/<tfrecords validation pattern>"
  image_directory_path: "<path to validation data source>"
}
‣ color: The color of the bboxes for each class. This is important when visualizing the
boxes.
‣ postproc_classes: This parameter is used in case you would like to filter out and
visualize only a subset of classes.
‣ image_height: The height of the image at inference.
‣ image_width: The width of the image at inference.
‣ stride: This defines the ratio of the input_height to the output_height of the feature
map, or the input_width to the output_width of the feature map. Only a stride of 16
is currently supported for DetectNet_v2 models. Therefore, the stride is 16 for all
inferences.
If the input image sizes are different from the specified size, the inference tool
resizes the image to the size mentioned in the spec file, runs inference and resizes
the inference coordinates back to the original input image resolution.
{
"dbscan_criterion": "IOU",
"dbscan_eps": {
"bicycle": 0.4,
"car": 0.25,
"default": 0.15,
"person": 0.4
},
"dbscan_min_samples": {
"bicycle": 0.05,
"car": 0.05,
"default": 0.0,
"person": 0.05
},
"min_cov_to_cluster": {
"bicycle": 0.075,
"car": 0.075,
"default": 0.005,
"person": 0.005
},
"min_obj_height": {
"bicycle": 4,
"car": 4,
"person": 4,
"default": 2
},
"target_classes": ["car", "bicycle", "person"],
"confidence_th": {
"car": 0.3,
"bicycle": 0.3,
"person": 0.2
},
"confidence_model": {
"car": { "kind": "aggregate_cov"},
"bicycle": { "kind": "aggregate_cov"},
"person": { "kind": "aggregate_cov"},
"default": { "kind": "aggregate_cov"}
},
"output_map": {
"person" : "person",
"car" : "car",
"bicycle" : "bicycle"
},
"color": {
"car": "green",
"person": "magenta",
"bicycle": "cyan"
},
"postproc_classes": ["car", "bicycle", "person"],
"image_height": 384,
"image_width": 1248,
"stride": 16
}
random_seed: 42
enc_key: "<your_enc_key>"
verbose: True
network_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
size_min {
min:600
}
image_channel_mean {
key: 'b'
value: 103.939
}
image_channel_mean {
key: 'g'
value: 116.779
}
image_channel_mean {
key: 'r'
value: 123.68
}
image_scaling_factor: 1.0
}
feature_extractor: "vgg"
anchor_box_config {
scale: 128.0
scale: 256.0
scale: 512.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: True
freeze_blocks: 1
freeze_blocks: 2
roi_mini_batch: 256
rpn_stride: 16
conv_bn_share_bias: True
roi_pooling_config {
pool_size: 7
pool_size_2x: True
}
}
training_config {
kitti_data_config {
images_dir: '/workspace/tlt-experiments/data/voc0712trainval/images'
labels_dir: '/workspace/tlt-experiments/data/voc0712trainval/labels_kitti'
}
training_data_parser: 'raw_kitti'
data_augmentation {
use_augmentation: True
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
num_epochs: 12
class_mapping {
key: 'horse'
value: 0
}
class_mapping {
key: "pottedplant"
value: 1
}
class_mapping {
key: "train"
value: 2
}
class_mapping {
key: "person"
value: 3
}
class_mapping {
key: "bird"
value: 4
}
class_mapping {
key: "car"
value: 5
}
class_mapping {
key: "chair"
value: 6
}
class_mapping {
key: "tvmonitor"
value: 7
}
class_mapping {
key: "bus"
value: 8
}
class_mapping {
key: "sofa"
value: 9
}
class_mapping {
key: "dog"
value: 10
}
class_mapping {
key: "motorbike"
value: 11
}
class_mapping {
key: "bicycle"
value: 12
}
class_mapping {
key: "sheep"
value: 13
}
class_mapping {
key: "boat"
value: 14
}
class_mapping {
key: "cat"
value: 15
}
class_mapping {
key: "bottle"
value: 16
}
class_mapping {
key: "diningtable"
value: 17
}
class_mapping {
key: "cow"
value: 18
}
class_mapping {
key: "aeroplane"
value: 19
}
class_mapping {
key: "background"
value: 20
}
pretrained_model: ""
pretrained_weights: "/workspace/tlt-experiments/data/
vgg16_weights_tf_dim_ordering_tf_kernels.h5"
output_weights: "/workspace/tlt-experiments/faster_rcnn_exp/
faster_rcnn_pascal_voc.tltw"
output_model: "/workspace/tlt-experiments/faster_rcnn_exp/
faster_rcnn_pascal_voc.tlt"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}
rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_max_boxes: 2000
rpn_nms_overlap_threshold: 0.7
reg_config {
reg_type: 'L2'
weight_decay: 1e-4
}
optimizer {
adam {
lr: 0.00001
beta_1: 0.9
beta_2: 0.999
decay: 0.0
}
}
lr_scheduler {
step {
base_lr: 0.00001
gamma: 1.0
step_size: 30
}
}
lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0
inference_config {
images_dir: '/workspace/tlt-experiments/data/voc07test/images'
model: '/workspace/tlt-experiments/faster_rcnn_exp/
faster_rcnn_pascal_voc.epoch12.tlt'
detection_image_output_dir: '/workspace/tlt-experiments/faster_rcnn_exp/
infer_results_imgs'
labels_dump_dir: '/workspace/tlt-experiments/faster_rcnn_exp/infer_dump_labels'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
}
evaluation_config {
dataset {
images_dir : '/workspace/tlt-experiments/data/voc07test/images'
labels_dir: '/workspace/tlt-experiments/data/voc07test/labels_kitti'
}
data_parser: 'raw_kitti'
model: '/workspace/tlt-experiments/faster_rcnn_exp/
faster_rcnn_pascal_voc.epoch12.tlt'
labels_dump_dir: '/workspace/tlt-experiments/faster_rcnn_exp/eval_dump_labels'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:True
}
}
network config
The network config (network_config) defines the model structure and its input format.
This model is used for training, evaluation, and inference.
network_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
size_min {
min:600
}
image_channel_mean {
key: 'b'
value: 103.939
}
image_channel_mean {
key: 'g'
value: 116.779
}
image_channel_mean {
key: 'r'
value: 123.68
}
image_scaling_factor: 1.0
}
feature_extractor: "vgg"
anchor_box_config {
scale: 128.0
scale: 256.0
scale: 512.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: True
freeze_blocks: 1
freeze_blocks: 2
roi_mini_batch: 256
rpn_stride: 16
conv_bn_share_bias: True
roi_pooling_config {
pool_size: 7
pool_size_2x: True
}
}
feature extractor
FasterRCNN supports 11 backbones.
freeze BN
You can choose to freeze the BatchNormalization layers in the model during training.
This is a common trick when training a FasterRCNN model.
freeze blocks
You can choose to freeze some of the CNN blocks in the model to make the training
more stable and/or easier to converge.
You can divide the whole model into several blocks and optionally freeze a subset of them.
For FasterRCNN, you can only freeze the blocks that come before the ROI pooling layer;
any layer after the ROI pooling layer will not be frozen. The number of blocks and the
block IDs differ between backbones, so here is how to specify the block IDs for each
backbone:
‣ ResNet series: For the ResNet series, the block IDs valid for freezing are any subset of
[0, 1, 2, 3] (inclusive)
‣ VGG series: For the VGG series, the block IDs valid for freezing are any subset of [1,
2, 3, 4, 5] (inclusive)
‣ GoogLeNet: For GoogLeNet, the block IDs valid for freezing are any subset of [0,
1, 2, 3, 4, 5, 6, 7] (inclusive)
‣ MobileNet V1: For MobileNet V1, the block IDs valid for freezing are any subset
of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] (inclusive)
‣ MobileNet V2: For MobileNet V2, the block IDs valid for freezing are any subset
of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] (inclusive)
RPN stride
The cumulative stride from the model input to the RPN. This value is fixed (16) in the
current implementation.
conv_bn_share_bias
conv_bn_share_bias is a Boolean value that indicates whether or not to share the bias of
a convolution layer with the BatchNormalization (BN) layer immediately after it. The bias
is usually shared, but for FasterRCNN there is a caveat. During training, you may
want to freeze the BN layers to make the training process more stable. But once a BN
layer is frozen and the bias is shared, the convolution layer before it will have no bias
during training. This loss of a degree of freedom can lead to some degradation of the
model accuracy. To overcome this, you can force the convolution layer to have its own
bias. If conv_bn_share_bias is set to False, the convolution layer itself will have a
bias; otherwise it won't.
For MobileNet V1 or MobileNet V2, if you want to load the pretrained weights in NGC for
training or retraining, set the conv_bn_share_bias field in the experiment_spec
file to True. For all other backbones, if you want to load the pretrained weights in
NGC for training or retraining, set it to False. For any backbone, if you do not use
the pretrained weights in NGC, either setting of conv_bn_share_bias is acceptable.
all_projections
The all_projections field is only useful for models that have shortcuts in them. These
models include the ResNet series and MobileNet V2. If all_projections=True, all the
pass-through shortcuts will be replaced by a projection layer that has the same number
of output channels.
use_pooling
The use_pooling option is only useful for the VGG series and the ResNet series. When
use_pooling=True, the model uses pooling as in the original implementation;
otherwise strided convolutions replace the pooling operations in the model. If you want
to improve the inference FPS performance, you can try setting use_pooling=False.
training config
The training config defines the parameters needed for training, evaluation and inference.
training_config {
kitti_data_config {
images_dir : '<path_to_the_training_images_directory>'
labels_dir: '<path_to_the_training_KITTI_labels_directory>'
}
training_data_parser: 'raw_kitti'
data_augmentation {
use_augmentation: True
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
num_epochs: 12
class_mapping {
key: 'Car'
value: 0
}
class_mapping {
key: 'Van'
value: 0
}
class_mapping {
key: "Pedestrian"
value: 1
}
class_mapping {
key: "Person_sitting"
value: 1
}
class_mapping {
key: 'Cyclist'
value: 2
}
class_mapping {
key: "background"
value: 3
}
class_mapping {
key: "DontCare"
value: -1
}
class_mapping {
key: "Truck"
value: -1
}
class_mapping {
key: "Misc"
value: -1
}
class_mapping {
key: "Tram"
value: -1
}
pretrained_model: "<path_to_the_pretrained_model>"
pretrained_weights: "<path_to_the_pretrained_weights>"
output_weights: "<path_to_the_output_weights_during_training>"
output_model: "<path_to_the_output_model_during_training>"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}
rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_max_boxes: 2000
rpn_nms_overlap_threshold: 0.7
reg_config {
reg_type: 'L2'
weight_decay: 1e-4
}
optimizer {
adam {
lr: 0.00001
beta_1: 0.9
beta_2: 0.999
decay: 0.0
}
}
lr_scheduler {
step {
base_lr: 0.00001
gamma: 1.0
step_size: 30
}
}
lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0
inference_config {
images_dir: '<path_to_the_inference_images_directory>'
model: '<path_to_the_model_to_do_inference_on>'
detection_image_output_dir: '<path_to_the_dumped_images_directory>'
labels_dump_dir: '<path_to_the_dumped_labels_directory>'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
}
evaluation_config {
dataset {
images_dir : '<path_to_the_evaluation_images_directory>'
labels_dir: '<path_to_the_evaluation_KITTI_labels_directory>'
}
data_parser: 'raw_kitti'
model: '<path_to_the_model_to_do_evaluation_on>'
labels_dump_dir: '<path_to_the_dumped_labels_directory>'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:False
}
kitti_data_config
kitti_data_config defines the dataset for training. It includes the images directory
and the KITTI labels directory.
training_data_parser
The parser type for the training dataset. In this release, only raw_kitti is supported.
data_augmentation
Data augmentation for training. The data augmentation configuration has two parts:
spatial augmentation and color augmentation. Spatial augmentation applies spatial
transforms to the input image and its labels, while color augmentation applies only hue,
saturation, and contrast changes to the input image; the labels are untouched. The
use_augmentation field is a Boolean value that controls whether or not to activate data
augmentation during training. Normalization is applied before augmentation, because
augmentation only applies to the normalized image in the range [0, 1]. Also, data
augmentation happens before image preprocessing (subtracting the mean value and
scaling). Details of these sub-fields are given in this table:
num_epochs
This field defines the number of epochs for training.
class_mapping
In some cases, the number of classes in the dataset labels is not exactly the number
of classes you want to use to train the model. For example, you may want to group
two different classes, 'Car' and 'Van', into a single class in the training. Or you may
want to filter out some specific classes in the dataset: for example, you have 'Car',
'Person', 'Cyclist', and 'Truck' in the training dataset, but you want to ignore the 'Truck'
class when you train the model. This is the rationale for the class_mapping field. The
class_mapping maps each class name in the original dataset to an integer. If several
classes are mapped to the same integer, they are grouped into a single class. For
FasterRCNN, the class that is mapped to the largest number is always the 'background'
class, due to the implementation. Also, if you want to ignore some classes in the dataset,
simply map them to -1. In the previous example, there are 5 classes in the dataset: 'Car',
'Van', 'Person', 'Cyclist', and 'Truck'. You want to group 'Car' and 'Van', so map both to 0.
You also want to exclude 'Truck', so map it to -1. Finally, add a dummy 'background'
class that is mapped to the largest number (3), as in the sketch below.
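A minimal sketch of the mapping described above (the class names follow the example; the integer assignments are illustrative):
class_mapping {
  key: 'Car'
  value: 0
}
class_mapping {
  key: 'Van'
  value: 0
}
class_mapping {
  key: 'Person'
  value: 1
}
class_mapping {
  key: 'Cyclist'
  value: 2
}
class_mapping {
  key: 'Truck'
  value: -1
}
class_mapping {
  key: 'background'
  value: 3
}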
pretrained_model
The path to the pretrained model used to initialize the training model. The pretrained
model can be either a Keras model or a TLT model; the suffix is used to identify the
model type. If the model path ends with '.hdf5' it is treated as a Keras model; if it ends
with '.tlt', it is treated as a TLT model. If the model path ends with neither '.hdf5' nor
'.tlt', an error is raised.
pretrained_weights
The path to the pretrained weights used to initialize the training model. This is similar to
the pretrained model but more flexible in terms of the input dimension and the number
of classes in the model head. When you use a pretrained model, the training model must
have the same input dimension and number of classes as the pretrained model. With
pretrained weights, these limitations do not apply. Pretrained weights can be either
Keras weights (.h5) or TLT weights (.tltw). If the pretrained weights file ends with
neither suffix, an error is raised.
output_weights
Path to the output weights (TLT weights) saved as checkpoints during training.
output_model
Path to the output model (TLT model) saved as checkpoints during training.
rpn_min_overlap
The lower IoU threshold used to map anchor boxes to ground truth boxes. If the IoU of
an anchor box with every ground truth box is below this threshold, the anchor box is
treated as a negative anchor box.
rpn_max_overlap
The upper IoU threshold used to map anchor boxes to ground truth boxes. If the IoU of
an anchor box with at least one ground truth box is above this threshold, the anchor box
is treated as a positive anchor box.
classifier_min_overlap
The lower IoU threshold used to generate proposal targets. If the IoU of an ROI with a
ground truth box is above this threshold and below classifier_max_overlap, the ROI is
regarded as a negative ROI (background) when training the classifier.
classifier_max_overlap
If the IoU of an ROI with a ground truth box is above this threshold, the ROI is regarded
as a positive ROI, and this ground truth box is treated as the target (ground truth) of the
ROI when training the classifier.
gt_as_roi
A Boolean value to specify whether or not to include the ground truth boxes into the
positive ROI to train the classifier.
std_scaling
The scaling factor to multiply by for the RPN regressor loss when training the RPN.
classifier_regr_std
The scaling factor to divide by for the classifier regressor loss when training the
classifier.
rpn_mini_batch
optimizer
SSD config
ssd_config {
aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 0.33]"
scales: "[0.1, 0.24166667, 0.38333333, 0.525, 0.66666667, 0.80833333, 0.95]"
two_boxes_for_ar1: true
clip_boxes: false
loss_loc_weight: 1.0
focal_loss_alpha: 0.25
focal_loss_gamma: 2.0
variances: "[0.1, 0.1, 0.2, 0.2]"
arch: "resnet18"
freeze_bn: True
freeze_blocks: 0
freeze_blocks: 1
}
aspect_ratios_global or aspect_ratios
Only one of scales and the combination of min_scale and max_scale is required.
Scales should be a 1-d array inside quotation marks. It is a list of positive floats
containing scaling factors per convolutional predictor layer. This list must be one
element longer than the number of predictor layers, so if two_boxes_for_ar1 is true,
the second aspect ratio 1.0 box for the last layer can have a proper scale. Except for the
last element in this list, each positive float is the scaling factor for boxes in that layer.
For example, if for one layer the scale is 0.1, then the generated anchor box with aspect
ratio 1 for that layer (the first aspect ratio 1 box if two_boxes_for_ar1 is true) will have its
height and width as 0.1*min(img_h, img_w).
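As a worked example, with the 1024 x 256 input resolution used in the augmentation config later in this chapter, a scale of 0.1 for a layer yields an aspect-ratio-1 anchor of roughly 0.1 * min(1024, 256) ≈ 26 x 26 pixels.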
min_scale and max_scale are two positive floats. If both of them appear in the config,
the program can automatically generate the scales by evenly splitting the space between
min_scale and max_scale.
clip_boxes
If true, all corner anchor boxes will be truncated so they are fully inside the feature
images.
loss_loc_weight
This is a positive float controlling how much location regression loss should contribute
to the final loss. The final loss is calculated as classification_loss + loss_loc_weight * loc_loss
focal_loss_alpha and focal_loss_gamma
Focal loss follows the standard formulation, FL(p_t) = -focal_loss_alpha * (1 - p_t)^focal_loss_gamma * log(p_t), where p_t is the predicted probability of the ground truth class.
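As a rough numeric illustration (assuming the natural-log form above), with the sample values focal_loss_alpha = 0.25 and focal_loss_gamma = 2.0, a well-classified example with p_t = 0.9 contributes about 0.25 * (0.1)^2 * -ln(0.9) ≈ 0.00026 to the loss, while a hard example with p_t = 0.1 contributes about 0.25 * (0.9)^2 * -ln(0.1) ≈ 0.47, so the loss concentrates on hard examples.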
steps
An optional list inside quotation marks whose length is the number of feature layers
used for prediction. The elements should be floats or tuples/lists of two floats. Steps
define how many pixels apart the anchor box center points are. If an element is a float,
the vertical and horizontal margins are the same; otherwise, the first value is step_vertical
and the second value is step_horizontal. If steps are not provided, anchor boxes are
distributed uniformly inside the image.
offsets
An optional list of floats inside quotation marks whose length is the number of feature
layers used for prediction. The first anchor box will have a margin of offsets[i]*steps[i]
pixels from the left and top borders. If offsets are not provided, 0.5 is used as the
default value.
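As a worked example, if a prediction layer has a step of 16 and the default offset of 0.5, the anchor box centers for that layer are spaced 16 pixels apart and the first center sits 0.5 * 16 = 8 pixels from the left and top borders.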
arch
A string indicating which feature extraction architecture you want to use. Currently,
"resnet10" and "resnet18" are supported.
freeze_bn
Whether to freeze all batch normalization layers during training.
freeze_blocks
Optionally, you can have more than 1 freeze_blocks field. Weights of layers in those
blocks will be frozen during training. See Model config for more information.
batch_size_per_gpu
Batch size per GPU.
num_epochs
Number of epochs to use for training.
learning rate
Only soft_start_annealing_schedule with these nested parameters is supported.
regularizer
This parameter configures the regularizer to be used while training and contains the
following nested parameters.
1. type: The type of regularizer to use. NVIDIA supports NO_REG, L1 or L2
2. weight: The floating point value for regularizer weight
eval_config {
validation_period_during_training: 10
average_precision_mode: SAMPLE
matching_iou_threshold: 0.5
}
validation_period_during_training
The interval, in training epochs, at which validation is run during training.
average_precision_mode
Average Precision (AP) calculation mode can be either SAMPLE or INTEGRATE.
SAMPLE is used as VOC metrics for VOC 2009 or before. INTEGRATE is used for VOC
2010 or after that.
matching_iou_threshold
The lowest IoU between a predicted box and a ground truth box that can be considered a match.
NMS config
nms_config {
confidence_threshold: 0.05
clustering_iou_threshold: 0.5
top_k: 200
}
NMS config applies to NMS layer in training, validation, evaluation, inference and
export.
confidence_threshold
Boxes with a confidence score less than confidence_threshold are discarded before
applying NMS.
clustering_iou_threshold
IoU threshold below which boxes will go through the NMS process.
top_k
top_k boxes will be output after the NMS Keras layer. If the number of valid boxes is less
than k, the returned array will be padded with boxes whose confidence score is 0.
augmentation config
augmentation_config {
preprocessing {
output_image_width: 1024
output_image_height: 256
crop_right: 1024
crop_bottom: 256
min_bbox_width: 1.0
min_bbox_height: 1.0
}
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 0.7
zoom_max: 1.8
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
dataset config
dataset_config {
data_sources: {
tfrecords_path: "/path/to/tfrecords/root/*"
image_directory_path: "/path/to/dataset/root"
}
image_extension: "png"
target_class_mapping {
key: "car"
value: "car"
}
target_class_mapping {
key: "pedestrian"
value: "pedestrian"
}
target_class_mapping {
key: "cyclist"
value: "cyclist"
}
target_class_mapping {
key: "van"
value: "car"
}
target_class_mapping {
key: "person_sitting"
value: "pedestrian"
}
validation_fold: 0
}
Chapter 6.
TRAINING THE MODEL
You can use the tlt-train command to train models with single and multiple GPUs.
The NVIDIA Transfer Learning Toolkit provides a simple command line interface to
train a deep learning model for classification and object detection; the tlt-train
command does this. To speed up the training process, the tlt-train command
supports multi-GPU training. You can invoke a multi-GPU training session by using the
--gpus N option, where N is the number of GPUs you want to use. N must not exceed
the number of GPUs available in the given node for training.
Required arguments:
‣ -r, --results_dir : Path to a folder where the experiment outputs should be
written.
‣ -k, --key : User specific encoding key to save or load a .tlt model.
‣ -e, --experiment_spec_file: Path to the experiment spec file.
Optional arguments:
‣ --gpus : Number of GPUs to use and processes to launch for training. The default
value is 1.
See the Specification file for classification section for more details.
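Putting these arguments together, a typical classification training invocation looks like the following (the paths and key are placeholders):
tlt-train classification -e <path_to_spec_file> -r <path_to_results_dir> -k <key> --gpus 2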
Output Log
Here's the output log from the successful use of this command:
=============================================================================
input_1 (InputLayer) (None, 3, 224, 224) 0
..
..
..
________________________________________________________________________________
predictions (Dense) (None, 20) 10260 flatten_1[0][0]
================================================================================
Total params: 11,558,548
Trainable params: 11,546,900
Non-trainable params: 11,648
________________________________________________________________________________
Epoch 1/80
124/311 [==========>...................] - ETA: 49s - loss: 4.1188 - acc: 0.0659
2018-10-11 22:09:13.292358: W tensorflow/core/framework/allocator.cc:101] Allocation of 38535168 exceeds 10% of system memory.
Required arguments
‣ -r, --results_dir : Path to a folder where experiment outputs should be
written.
‣ -k, --key : User specific encoding key to save or load a .tlt model.
‣ -e, --experiment_spec_file : Path to spec file. Absolute path or relative to
working directory. (default: spec from spec_loader.py is used).
Optional arguments
‣ --gpus : Number of GPUs to use and processes to launch for training. The default
value is 1.
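For example, a DetectNet_v2 training run on two GPUs might be launched as follows (the paths and key are placeholders):
tlt-train detectnet_v2 -e <path_to_spec_file> -r <path_to_results_dir> -k <key> --gpus 2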
The tlt-train tool does not support training on images of multiple resolutions, or
resizing images during training. All of the images must be resized offline to the final
training size and the corresponding bounding boxes must be scaled accordingly.
Output log
Here's an example of the output log:
===============================================================================
input_1 (InputLayer) (None, 3, 544, 960) 0
..
===============================================================================
Total params: 11,555,983
Trainable params: 11,544,335
Non-trainable params: 11,648
..
..
2018-11-06 01:04:06,173 [INFO] tensorflow: Running local_init_op.
..
INFO:tensorflow:loss = 0.07203477, epoch = 0.0, step = 0
2018-11-06 01:05:14,270 [INFO] tensorflow: loss = 0.07203477, epoch = 0.0, step
= 0
INFO:tensorflow:Saving checkpoints for step-1.
..
2018-11-06 01:05:44,920 [INFO] tensorflow: loss = 0.05362146, epoch =
0.0663716814159292, step = 15 (5.978 sec)
INFO:tensorflow:global_step/sec: 0.555544
..
Validation cost: 0.000268
Mean average_precision (in %): 73.9490
Required arguments:
‣ -e, --experiment_spec_file : Experiment specification file to set up the
evaluation experiment. This should be the same as the training specification file.
Optional arguments:
‣ -h, --help : Show this help message and exit.
Sample usage
Here's an example of using the FasterRCNN training command:
tlt-train faster_rcnn -e <experiment_spec>
================================================================================
input_1 (InputLayer) (None, 3, 384, 1280) 0
________________________________________________________________________________
..
________________________________________________________________________________
add_7 (Add) (256, 512, 7, 7) 0
block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
________________________________________________________________________________
activation_15 (Activation) (256, 512, 7, 7) 0 add_7[0][0]
________________________________________________________________________________
block_4b_conv_1 (Conv2D) (256, 512, 7, 7) 2359808
activation_15[0][0]
________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (256, 512, 7, 7) 2048
block_4b_conv_1[0][0]
________________________________________________________________________________
activation_16 (Activation) (256, 512, 7, 7) 0
block_4b_bn_1[0][0]
________________________________________________________________________________
block_4b_conv_2 (Conv2D) (256, 512, 7, 7) 2359808
activation_16[0][0]
________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (256, 512, 7, 7) 262656
activation_15[0][0]
________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (256, 512, 7, 7) 2048
block_4b_conv_2[0][0]
________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (256, 512, 7, 7) 2048
block_4b_conv_shortcut[0][0]
________________________________________________________________________________
add_8 (Add) (256, 512, 7, 7) 0
block_4b_bn_2[0][0]
block_4b_bn_shortcut[0][0]
________________________________________________________________________________
2019-07-04 08:43:14,937 [INFO] /app/iva/common/py_image.binary.runfiles/
ai_infra/iva/faster_rcnn/scripts/train.py: training example num: 6481
2019-07-04 08:43:15,579 [INFO] /app/iva/common/py_image.binary.runfiles/
ai_infra/iva/faster_rcnn/scripts/train.py: Starting training
2019-07-04 08:43:15,579 [INFO] /app/iva/common/py_image.binary.runfiles/
ai_infra/iva/faster_rcnn/scripts/train.py: Epoch 1/12
tlt-train ssd [-h] -e <experiment_spec_file>
                   -r <results_dir>
                   -k <key>
                   [-m <resume_model_weights>]
                   [--initial_epoch <initial_epoch>]
                   --gpus <num_gpus>
Required arguments:
‣ -r, --results_dir: Path to the folder where the experiment output is written.
‣ -k, --key: Provide the encryption key to decrypt the model.
‣ -e, --experiment_spec_file: Experiment specification file to set up the
training experiment.
Optional arguments:
‣ --gpus num_gpus: Number of GPUs to use and processes to launch for training.
The default value is 1.
‣ -m, --resume_model_weights: Path to a pre-trained model or model to continue
training.
‣ --initial_epoch: Epoch number to resume from.
‣ -h, --help: Show this help message and exit.
Here's an example of using the train command on an SSD model:
tlt-train ssd --gpus 2 -e /path/to/spec.txt -r /path/to/result -k $KEY
block_1a_bn_shortcut[0][0]
...
...
_______________________________________________________________________________
conf_reshape_0 (Reshape) (18, 24576, 1, 3) 0 permute_1[0][0]
_______________________________________________________________________________
conf_reshape_1 (Reshape) (18, 6144, 1, 3) 0 permute_3[0][0]
_______________________________________________________________________________
conf_reshape_2 (Reshape) (18, 1536, 1, 3) 0 permute_5[0][0]
_______________________________________________________________________________
conf_reshape_3 (Reshape) (18, 384, 1, 3) 0 permute_7[0][0]
_______________________________________________________________________________
conf_reshape_4 (Reshape) (18, 96, 1, 3) 0 permute_9[0][0]
_______________________________________________________________________________
conf_reshape_5 (Reshape) (18, 24, 1, 3) 0 permute_11[0][0]
_______________________________________________________________________________
..
Epoch 1/120
171/171 [======================================================] - 94s 547ms/
step - loss: 2.3210
...
Number of images in the evaluation dataset: 1339
()
Producing predictions batch-wise: 100% 75/75 [00:36<00:00, 2.57it/s]
Matching predictions to ground truth, class 1/3.: 100% 131693/131693
[00:10<00:00, 12953.23it/s]
Matching predictions to ground truth, class 2/3.: 100% 15162/15162 [00:00<00:00,
26290.28it/s]
Matching predictions to ground truth, class 3/3.: 100% 36838/36838 [00:01<00:00,
19611.29it/s]
Chapter 7.
EVALUATING THE MODEL
Once the model has been trained using the experiment config file, the next step is to
evaluate the model on a test set to measure its accuracy. The TLT toolkit includes the
tlt-evaluate command to do this. Each of the four apps, namely Classification,
DetectNet_v2, SSD, and FasterRCNN, supports evaluation. The sample usage for this
command, along with some example command-line invocations, is shown below.
The classification app computes evaluation loss, Top-k accuracy, precision, and recall
as metrics. Evaluation for DetectNet_v2, FasterRCNN, and SSD computes the per-class
Average Precision and the mean Average Precision metrics as defined in the Pascal VOC
challenge. Both sample and integrate modes are supported for calculating average
precision; the former was used in VOC challenges before 2010, while the latter has been
used from 2010 onwards.
When training is complete, the model is stored in the output directory of your choice,
$OUTPUT_DIR. Evaluate a model using the tlt-evaluate command:
Required arguments:
‣ {classification, detectnet_v2, faster_rcnn, ssd}
Choose whether you are evaluating a classification, detectnet_v2, ssd, or
faster_rcnn model.
Optional arguments: These arguments vary depending on whether you are evaluating a
Classification, DetectNet_v2, SSD, or FasterRCNN model.
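For example, a classification model can be evaluated with a command of the following
form (a sketch with placeholder paths; the classification app takes the spec file and the
encryption key, following the argument pattern used by the other apps in this chapter):
tlt-evaluate classification -e <path to training spec file> \
                            -k $KEY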
Here's a sample output log for evaluating a classification model:
==============================================================================
input_1 (InputLayer) (None, 3, 224, 224) 0
______________________________________________________________________________
conv1 (Conv2D) (None, 64, 112, 112) 9472 input_1[0][0]
______________________________________________________________________________
..
..
..
predictions (Dense) (None, 20) 10260 flatten[0][0]
===============================================================================
Total params: 11,558,548
Trainable params: 11,546,900
Non-trainable params: 11,648
_______________________________________________________________________________
tlt-evaluate detectnet_v2 [-h] -e <experiment_spec_file>
                               -m <model_file>
                               -k <key>
                               [--use_training_set]
Required arguments:
‣ -e, --experiment_spec_file: Experiment spec file to set up the evaluation
experiment. This should be the same as the training spec file.
‣ -m, --model: Path to the model file to use for evaluation.
‣ -k, --key : Provide the encryption key to decrypt the model.
Optional arguments
‣ -h, --help : show this help message and exit.
‣ --use_training_set: Set this flag to run evaluation on training + validation
dataset.
If you have followed the example in Training a detection model, you may now evaluate
the model using the following command.
This command runs evaluation on the same validation set that was used during
training.
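For example (a sketch; the spec file and model paths are placeholders for the files
produced by your training run):
tlt-evaluate detectnet_v2 -e <path to training spec file> \
                          -m <path to the trained .tlt model> \
                          -k $KEY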
Use these steps to evaluate on a test set with ground truth labels:
1. Create tfrecords for this test set by following the steps listed in the data input
section.
2. Update the dataloader configuration part of the training spec file to include the
newly generated tfrecords. For more information on the dataset config, please refer
to Create an experiment spec file.
dataset_config {
data_sources: {
tfrecords_path: "<path to training tfrecords root>/<tfrecords_name*>"
image_directory_path: "<path to training data root>"
}
image_extension: "jpg"
target_class_mapping {
key: "car"
value: "car"
}
target_class_mapping {
key: "automobile"
value: "car"
}
..
..
..
target_class_mapping {
key: "person"
value: "pedestrian"
}
target_class_mapping {
key: "rider"
value: "cyclist"
}
validation_data_source: {
tfrecords_path: "<path to testing tfrecords root>/<tfrecords_name*>"
image_directory_path: "<path to testing data root>"
}
}
The rest of the experiment spec file remains the same as the training spec file.
Sample output log
Here's an example of the output:
===============================================================================
input_1 (InputLayer) (None, 3, 544, 960) 0
_______________________________________________________________________________
conv1 (Conv2D) (None, 64, 272, 480) 9472 input_1[0][0]
_______________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 272, 480) 256 conv1[0][0]
_______________________________________________________________________________
activation_1 (Activation) (None, 64, 272, 480) 0 bn_conv1[0][0]
_______________________________________________________________________________
..
..
________________________________________________________________________________
activation_17 (Activation) (None, 512, 34, 60) 0 add_8[0][0]
________________________________________________________________________________
dropout_1 (Dropout) (None, 512, 34, 60) 0 activation_17[0][0]
________________________________________________________________________________
output_bbox (Conv2D) (None, 12, 34, 60) 6156 dropout_1[0][0]
________________________________________________________________________________
output_cov (Conv2D) (None, 3, 34, 60) 1539 dropout_1[0][0]
================================================================================
Total params: 11,555,983
Trainable params: 11,544,335
Non-trainable params: 11,648
________________________________________________________________________________
Required arguments:
‣ -e, --experiment_spec_file : Experiment spec file to set up the evaluation
experiment. This should be the same as the training spec file.
Optional arguments:
‣ -h, --help : show this help message and exit.
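A sample invocation (the FasterRCNN evaluation is configured entirely through the
spec file, which is why only -e is required):
tlt-evaluate faster_rcnn -e <experiment_spec_file>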
Here's a sample output log:
WARNING:tensorflow:From /app/iva/faster_rcnn/launcher/py_image.binary.runfiles/
pip_deps2__tensorflow_gpu_1_13_1/extracted/tensorflow/python/framework/
op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is
deprecated and will be removed in a future version.
Instructions for updating:
Required arguments:
‣ -e, --experiment_spec_file : Experiment spec file to set up the evaluation
experiment. This should be the same as the training spec file.
‣ -m, --model : Path to the model file to use for evaluation.
‣ -k, --key : Provide the key to load the model.
Optional arguments:
‣ -h, --help : show this help message and exit.
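A sample SSD invocation (a sketch with placeholder paths, using the arguments listed
above):
tlt-evaluate ssd -e <path to training spec file> \
                 -m <path to the trained .tlt model> \
                 -k $KEY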
Chapter 8.
USING INFERENCE ON A MODEL
The tlt-infer command runs inference on a specified set of input images. In
classification mode, tlt-infer prints the class label to the command line for a single
image, or writes a csv file containing the image paths and the corresponding labels when
run on multiple images. In DetectNet_v2, SSD, or FasterRCNN mode, tlt-infer produces
output images with bounding boxes rendered on them after inference. Optionally, you
can also serialize the output meta-data in KITTI format.
=============================================================================
input_1 (InputLayer) (None, 3, 224, 224) 0
_____________________________________________________________________________
conv1 (Conv2D) (None, 16, 112, 112) 2368 input_1[0][0]
_____________________________________________________________________________
...
...
_____________________________________________________________________________
2018-11-05 18:46:16,248 [INFO] root: Current predictions: [[2.0956191e-04
4.7424308e-08 6.0529976e-07 1.5379728e-05 4.9668059e-05
2.3047665e-05 8.3990363e-07 2.1063986e-06 3.9042366e-06 9.8465785e-07
7.9830796e-05 8.4068454e-08 1.3434786e-06 1.6271177e-05 1.1729119e-06
9.9955863e-01 2.9604094e-05 2.6558594e-06 3.4933796e-06 7.3329272e-07]]
2018-11-05 18:46:16,248 [INFO] root: Class label = 15
2018-11-05 18:46:16,248 [INFO] root: Class name = mercedes
0,/home/tmp/1.jpg,A
0,/home/tmp/2.jpg,B
0,/home/tmp/3.jpg,C
In both single image and directory modes, a classmap (-cm) is required, which should
be a byproduct (classmap.json) of your training process.
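A sample single-image invocation might look like the following (a sketch: the -m, -i,
and -k flags are assumed to behave like the equivalent parameters documented for the
detection modes below, -cm points to the classmap.json noted above, and all paths are
placeholders):
tlt-infer classification -m <path to the trained .tlt model> \
                         -i <path to an input image> \
                         -k $KEY \
                         -cm <path to classmap.json>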
Required parameters
‣ -m, --model: TLT model file path
‣ -i, --inference_input: The directory of input images or a single image for inference.
‣ -o, --inference_output: The directory for the output images and labels. The
annotated images are written to inference_output/images_annotated and the labels to
inference_output/labels.
‣ -bs, --batch_size: Inference batch size.
‣ -cp, --cluster_params_file: Path to the bbox post-processing (clustering) json file.
‣ -lw, --line_width: Width of the bbox overlay lines.
‣ -k, --enc_key: Key to load the model.
Optional parameters
‣ -g, --gpu_set: Index of the GPU to run on. The default is 0.
Inference is not a multi-GPU process. This option only allows the user to choose
which GPU to run inference on, in case there are multiple GPUs in the machine.
‣ --output_nodes: Comma separated list of output nodes,
default=output_cov,output_bbox
‣ --kitti_dump: Flag to enable KITTI dump
‣ --disable_overlay : Flag to disable image overlay
This clusterfile is suitable for use with our uploaded pretrained models in NGC.
The tool automatically generates bbox-rendered images in output_path/
images_annotated. In order to get the bbox labels in KITTI format, set the
--kitti_dump flag. This generates the output in output_path/labels.
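Here's a sketch of a full DetectNet_v2 inference invocation using the parameters above
(the detectnet_v2 subcommand follows the naming used by the other tlt commands in
this guide; paths and the batch size are placeholders):
tlt-infer detectnet_v2 -m <path to the trained .tlt model> \
                       -i <directory of input images> \
                       -o <output directory> \
                       -bs 16 \
                       -cp <path to cluster params json file> \
                       -k $KEY \
                       --kitti_dump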
Here's a sample output log:
pciBusID: 0000:02:00.0
..
..
=================================================================
..
..
..
..
..
..
Required arguments:
‣ -e, --experiment_spec_file: Path to the experiment specification file for
FasterRCNN training.
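A sample invocation (FasterRCNN inference is configured entirely through the
experiment spec file):
tlt-infer faster_rcnn -e <path to experiment spec file>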
Here's a sample output log:
Required arguments
‣ -m, --model : Path to the pretrained model (TLT model).
‣ -i, --in_image_dir : The directory of input images for inference.
‣ -o, --out_image_dir : The directory path to output annotated images.
‣ -k, --key : Key to load model.
‣ -e, --config_path : Path to an experiment spec file for training.
Optional arguments
‣ -t, --draw_conf_thres : Confidence threshold for drawing a bbox. The default value is 0.3.
‣ -h, --help : Show this help message and exit
‣ -l, --out_label_dir : The directory to output KITTI labels.
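A sample SSD inference invocation (a sketch with placeholder paths, using the
arguments listed above):
tlt-infer ssd -m <path to the trained .tlt model> \
              -i <directory of input images> \
              -o <directory for annotated output images> \
              -k $KEY \
              -e <path to training spec file> \
              -l <directory for KITTI labels>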
Here's a sample output log:
________________________________________________________________________________
mbox_loc (Concatenate) (None, 32760, 1, 4) 0
loc_reshape_0[0][0]
loc_reshape_1[0][0]
loc_reshape_2[0][0]
loc_reshape_3[0][0]
loc_reshape_4[0][0]
loc_reshape_5[0][0]
________________________________________________________________________________
mbox_priorbox (Concatenate) (None, 32760, 1, 8) 0
anchor_reshape_0[0][0]
anchor_reshape_1[0][0]
anchor_reshape_2[0][0]
anchor_reshape_3[0][0]
anchor_reshape_4[0][0]
anchor_reshape_5[0][0]
________________________________________________________________________________
concatenate_3 (Concatenate) (None, 32760, 1, 32) 0
mbox_conf_sigmoid[0][0]
mbox_loc[0][0]
mbox_priorbox[0][0]
________________________________________________________________________________
ssd_predictions (Reshape) (None, 32760, 32) 0
concatenate_3[0][0]
================================================================================
Total params: 7,961,848
Trainable params: 7,958,376
Non-trainable params: 3,472
________________________________________________________________________________
WARNING:tensorflow:From ./ssd/box_coder/output_decoder_layer.py:83: to_float
(from tensorflow.python.ops.math_ops) is deprecated and will be removed in a
future version
Instructions for updating:
Use tf.cast instead.
2019-08-04 00:01:14,444 [WARNING] tensorflow: From ./ssd/box_coder/
output_decoder_layer.py:83: to_float (from tensorflow.python.ops.math_ops) is
deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
100%|##########| 4952/4952 [03:35<00:00, 22.99it/s]
Chapter 9.
PRUNING THE MODEL
The tlt-prune command removes parameters from the model to reduce the model size
without compromising the integrity of the model itself.
The tlt-prune command includes these parameters:
Required arguments:
‣ -pm, --pretrained_model : Path to pretrained model.
‣ -o, --output_dir : Path to output checkpoints.
‣ -k, --key : Key to load a .tlt model
Optional arguments
‣ -h, --help: Show this help message and exit.
‣ -n, --normalizer : `max` to normalize by dividing each norm by the maximum
norm within a layer; `L2` to normalize by dividing by the L2 norm of the vector
comprising all kernel norms. (default: `max`)
‣ -eq, --equalization_criterion : Criterion to equalize the statistics of inputs to an
element-wise op layer or a depth-wise convolutional layer. This parameter is useful
for ResNets and MobileNets. Options are [arithmetic_mean, geometric_mean, union,
intersection]. (default: `union`)
‣ -pg, --pruning_granularity: Number of filters to remove at a time. (default: 8)
‣ -pth : Threshold to compare the normalized norm against. (default: 0.1)
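A sample invocation (a sketch; paths are placeholders and the threshold shown is only
illustrative):
tlt-prune -pm <path to the trained .tlt model> \
          -o <output directory> \
          -eq union \
          -pth 0.1 \
          -k $KEY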
Chapter 10.
EXPORTING THE MODEL
The Transfer Learning Toolkit includes the tlt-export command to export and
prepare TLT models for Deploying to DeepStream. The tlt-export command
optionally generates the calibration cache for TensorRT INT8 engine calibration.
Exporting the model decouples the training process from inference and allows
conversion to TensorRT engines outside the TLT environment. TensorRT engines are
specific to each hardware configuration and should be generated for each unique
inference environment, but the same exported TLT model may be used universally.
INT8 mode overview
TensorRT engines can be generated in INT8 mode to improve performance, but they
require a calibration cache at engine creation time. The calibration cache is generated
using a calibration tensorfile, if tlt-export is run with the --data_type flag set to int8.
Pre-generating the calibration information and caching it removes the need for
calibrating the model on the inference machine. Moving the calibration cache is usually
much more convenient than moving the calibration tensorfile, since it is a much smaller
file and can be moved with the exported model. Using the calibration cache also speeds
up engine creation, as building the cache can take several minutes depending on the
size of the tensorfile and the model itself.
The export tool can ingest training data using either of these two options:
‣ Providing a calibration tensorfile generated using the tlt-int8-tensorfile
command
‣ Pointing the tool to a directory of images that you want to use to calibrate the model
NVIDIA recommends using the first option, because the tlt-int8-tensorfile
command uses the data generators to produce the training data. This ensures that
all the preprocessing steps have been done, and you get the best representation of
the inputs to the network. If you decide to use the second option, you must run the
preprocessing offline before feeding these images to the calibration tool for optimum
performance.
Generating an INT8 tensorfile using the tlt-int8-tensorfile command
The INT8 tensorfile is a binary file that contains the preprocessed training samples,
which may be used to calibrate the model. In this release, TLT only supports calibration
tensorfile generation for DetectNet_v2 and classification models.
Here's an example of using the tlt-int8-tensorfile command to generate a
calibration tensorfile for a DetectNet_v2 model.
tlt-int8-tensorfile {classification, detectnet_v2} [-h]
-e <path to training experiment spec file>
-o <path to output tensorfile>
-m <maximum number of batches to serialize>
[--use_validation_set]
Positional arguments:
classification or detectnet_v2
Required arguments:
‣ -e, --experiment_spec_file: Path to the experiment spec file.
‣ -o, --output_path: Path to the output tensorfile that will be created.
‣ -m, --max_batches: Number of batches of input data to be serialized.
Optional argument
‣ --use_validation_set: Flag to use validation dataset instead of training set.
Here's a sample command to invoke the tlt-int8-tensorfile command for a
classification model.
tlt-int8-tensorfile classification -e <path to the training spec file>
                                   -m 10
                                   -o $USER_EXPERIMENT_DIR/export/calibration.tensor
Required arguments:
‣ -i: Path to the model exported using tlt-export.
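Here's an example of exporting a pruned, retrained DetectNet_v2 model with tlt-export
while also generating the INT8 calibration cache: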
tlt-export $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
           -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
           --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
           -k $KEY \
           --input_dims 3,512,512 \
           --max_workspace_size 1100000 \
           --export_module detectnet_v2 \
           --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
           --data_type int8 \
           --batches 10 \
           --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin
Chapter 11.
DEPLOYING TO DEEPSTREAM
The deep learning and computer vision models that you train are meant for deployment
on edge devices, such as a Jetson Xavier, Jetson Nano, or a Tesla T4. Some of these
devices may not be as rich in compute resources or power as the larger servers where
the Transfer Learning Toolkit (TLT) docker may be hosted. To accommodate this diversity
of computational platforms, TLT has been designed to integrate with DeepStream video
analytics. To deploy a model trained by TLT to DeepStream you can:
1. Generate a device-specific optimized TensorRT engine using tlt-converter, which
may then be ingested by DeepStream.
2. Integrate the model directly in the DeepStream environment using the exported
model file generated by tlt-export.
Machine-specific optimizations are done as part of the engine creation process, so a
distinct engine should be generated for each environment and hardware configuration.
If the inference environment's TensorRT or CUDA libraries are updated (including
minor version updates), new engines should be generated. Running an engine that was
generated with a different version of TensorRT and CUDA is not supported; it can cause
unknown behavior that affects inference speed, accuracy, and stability, or the engine may
fail to run altogether.
Generating an engine using tlt-converter
Setup and Execution
The tlt-converter is a tool that is provided with the Transfer Learning Toolkit to
facilitate the deployment of TLT trained models on TensorRT and/or DeepStream. For
deployment platforms with an x86-based CPU and discrete GPUs, the tlt-converter
is distributed within the TLT docker. Therefore, it is suggested to use the docker to
generate the engine. However, this requires that the user adhere to the same minor
version of TensorRT as distributed with the docker. The TLT docker includes TensorRT
version 5.1.5. In order to use the engine with a different minor version of TensorRT, copy
the converter from /opt/nvidia/tools/tlt-converter to the target machine and
follow the instructions below to run it and generate a TensorRT engine.
For the Jetson platform, the tlt-converter is available to download from the dev zone
here. Once the tlt-converter is downloaded, follow the instructions mentioned below to
generate a TensorRT engine.
1. Install the open ssl package using the command: sudo apt-get install
libssl-dev
2. Install TensorRT 5.1 for the respective target machine from here.
a. Deploying SSD and FasterRCNN requires custom plugins that are currently
not available with TensorRT 5.1 GA. Therefore, in order to deploy these
models, please follow the instructions on how to build the TensorRT Open
Source Software repo and replace the system lib /usr/lib/aarch64-
linux-gnu/libnvinfer_plugin.so.5.x.x with the newly built lib
libnvinfer_plugin.so.5.x.x.
b. For Jetson devices, TensorRT 5.1 should come pre-installed with the JetPack.
3. Locate the tlt-converter inside the inference environment and add its parent
directory to the system path.
4. Run the tlt-converter using the sample command below and generate the
engine.
Make sure to use the output node names as mentioned in the CLI arguments below or in
Exporting the model.
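The converter is invoked as shown below (a sketch assembled from the arguments
listed after it; the model exported by tlt-export is passed as the positional input_file):
tlt-converter [-h] -k <encryption key>
                   -d <input dimensions>
                   -o <comma separated output nodes>
                   [-e <path to output engine>]
                   [-t <engine data type>]
                   [-w <maximum workspace size>]
                   [-i <input dimension ordering>]
                   [-c <calibration cache file>]
                   [-b <calibration batch size>]
                   [-m <maximum batch size>]
                   input_file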
Required arguments:
‣ input_file: Path to the model exported using tlt-export.
‣ -k: The API key used to configure the ngc cli to download the models.
‣ -d: Comma-separated list of input dimensions that should match the dimensions
used for tlt-export. Unlike tlt-export, this cannot be inferred from calibration data.
‣ -o: Comma-separated list of output blob names that should match the output
configuration used for tlt-export. For classification use: predictions/Softmax.
‣ For DetectNet_v2 detection: output_bbox/BiasAdd,output_cov/Sigmoid
‣ For FasterRCNN: dense_class/Softmax,dense_regress/BiasAdd,proposal
‣ For SSD: NMS
Optional arguments:
‣ -e: Path to save the engine to. (default: ./saved.engine)
‣ -t: Desired engine data type; generates a calibration cache if in INT8 mode. The
default value is fp32. The options are {fp32, fp16, int8}.
‣ -w: Maximum workspace size for the TensorRT engine. The default value is 1<<30.
‣ -i: Input dimension ordering; all other TLT commands use NCHW. The default value
is nchw. The options are {nchw, nhwc, nc}.
INT8 Mode Arguments:
‣ -c: Path to the calibration cache file, only used in INT8 mode. The default value is
./cal.bin.
‣ -b: Batch size used during the tlt-export step for INT8 calibration cache generation.
(default: 8)
‣ -m: Maximum batch size of the TensorRT engine. The default value is 16.
Sample usage
Here's a sample tlt-converter invocation for a resnet10 detectnet_v2 model exported
with tlt-export:
tlt-converter -k $API_KEY \
-o $OUTPUT_NODES \
-d $INPUT_DIMS \
-e $ENGINE_PATH \
$MODEL_PATH
Label file
The label file is a text file containing the names of the classes that the TLT model is
trained to classify against. The order in which the classes are listed must match the
order in which the model predicts the output. This order may be deduced from the
classmap.json file that is generated by TLT. This file is a simple dictionary containing the
class_name to index map. For example, in the sample classification notebook included
with the tlt-docker, the classmap.json file generated for Pascal VOC looks like this:
{"sheep": 16,"horse": 12,"bicycle": 1, "aeroplane": 0, "cow": 9,
"sofa": 17, "bus": 5, "dog": 11, "cat": 7, "person": 14, "train": 18,
"diningtable": 10, "bottle": 4, "car": 6, "pottedplant": 15,
"tvmonitor": 19, "chair": 8, "bird": 2, "boat": 3, "motorbike": 13}
The 0th index corresponds to aeroplane, the 1st index corresponds to bicycle, and so
on up to index 19, which corresponds to tvmonitor. Here is a sample label.txt file,
classification_labels.txt:
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
..
..
tvmonitor
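Once the label file is in place, the exported classification model is referenced from the
DeepStream inference configuration. Here is a sample configuration snippet (paths and
the model key are placeholders for your own files):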
[property]
gpu-id=0
# Preprocessing parameters: these are the same for all classification models generated by TLT.
net-scale-factor=1.0
offsets=123.67;116.28;103.53
model-color-format=1
batch-size=30
# Model specific paths. These need to be updated for every classification model.
int8-calib-file=/path/to/int8/cache.bin
labelfile-path=/path/to/label/file.txt
tlt-encoded-model=/path/to/exported/file.etlt
tlt-model-key=<ngc_api_key>
input-dims=c;h;w;0 # where c = number of channels, h = height of the model input, w = width of model input, 0: implies CHW format.
uff-input-blob-name=input_1
output-blob-names=predictions/Softmax # output node name for classification
Label file
The label file is a text file containing the names of the classes that the DetectNet_v2
model is trained to detect. The order in which the classes are listed here must match
the order in which the model predicts the output. This order is derived from the order in
which the objects are instantiated in the cost_function_config field of the DetectNet_v2
experiment config file. For example, in the DetectNet_v2 sample notebook file included
with the tlt-docker, the cost_function_config parameter looks like this:
cost_function_config {
target_classes {
name: "sheep"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: "bottle"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: "horse"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 1.0
}
}
..
..
target_classes {
name: "boat"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: "car"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 1.0
}
}
enable_autoweighting: False
max_objective_weight: 0.9999
min_objective_weight: 0.0001
}
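The corresponding label file lists the classes in the same order: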
sheep
bottle
horse
..
..
boat
car
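Here is a sample DeepStream inference configuration for a DetectNet_v2 model exported
from TLT (paths and the model key are placeholders for your own files):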
[property]
gpu-id=0
# preprocessing parameters.
net-scale-factor=0.0039215697906911373
model-color-format=0
# model paths.
int8-calib-file=/path/to/int8/cache.bin
labelfile-path=/path/to/labels.txt
tlt-encoded-model=/path/to/detectnet_v2/exported/file.etlt
tlt-model-key=<ngc api key to decode the model>
input-dims=c;h;w;0 # where c = number of channels, h = height of the model input, w = width of model input, 0: implies CHW format.
uff-input-blob-name=input_1
batch-size=4
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
num-detected-classes=3
interval=0
gie-unique-id=1
is-classifier=0
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
#enable_dbscan=0
[class-attrs-all]
threshold=0.2
group-threshold=1
## Set eps=0.7 and minBoxes for enable-dbscan=1
eps=0.2
#minBoxes=3
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
Label file
The label file is a text file, containing the names of the classes that the SSD model is
trained to detect. The order in which the classes are listed here must match the order in
which the model predicts the output. This order is derived from the order the objects
are instantiated in the dataset_config field of the SSD experiment config file. For
example, if the dataset_config is:
dataset_config {
data_sources: {
tfrecords_path: "/workspace/tlt-experiments/tfrecords/pascal_voc/
pascal_voc*"
image_directory_path: "/workspace/tlt-experiments/data/VOCdevkit/VOC2012"
}
image_extension: "jpg"
target_class_mapping {
key: "car"
value: "car"
}
target_class_mapping {
key: "person"
value: "person"
}
target_class_mapping {
key: "bicycle"
value: "bicycle"
}
validation_fold: 0
}
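Here is a sample DeepStream inference configuration for an SSD model exported from
TLT; it references the custom bounding box parser library built for SSD (paths and the
model key are placeholders for your own files):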
[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
labelfile-path=/path/to/labels.txt
tlt-encoded-model=/path/to/ssd/exported/file.etlt
tlt-model-key=<key to decode the model>
input-dims=c;h;w;0 # where c = number of channels, h = height of the model input, w = width of model input, 0: implies CHW format.
uff-input-blob-name=Input
batch-size=1
output-blob-names=NMS
parse-bbox-func-name=NvDsInferParseCustomSSDUff
custom-lib-path=./nvdsinfer_customparser_ssd_uff/libnvds_infercustomparser_ssd_uff.so
[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
1. The FasterRCNN model requires custom TensorRT plugins that must be obtained
from the TensorRT Open Source Software (OSS) repository in GitHub; check out
the branch release/5.1. Please follow the installation guide here, compile the
open sourced plugins, and replace the libnvinfer_plugin.* in the installation
directory with the one built from TensorRT OSS.
2. To integrate a FasterRCNN model into DeepStream, an additional DeepStream
plugin is required. It is available here: https://github.com/NVIDIA-AI-IOT/
deepstream_4.x_apps.
3. Replace /Your_deepstream_SDK_v4.0_xxxxx_path with your actual DeepStream
SDK 4.0 path in deepstream_4.x_apps/nvdsinfer_customparser_frcnn_uff/
Makefile and deepstream_4.x_apps/Makefile.
4. Compile the plugin and sample app.
Label file
The label file is a text file containing the names of the classes that the FasterRCNN
model is trained to detect. The order in which the classes are listed here must match
the order in which the model predicts the output. This order is derived from the order in
which the objects are instantiated in the class_mapping field of the FasterRCNN
experiment specification file. For example, if the class_mapping field is:
class_mapping {
key: 'Car'
value: 0
}
class_mapping {
key: 'Van'
value: 0
}
class_mapping {
key: "Pedestrian"
value: 1
}
class_mapping {
key: "Person_sitting"
value: 1
}
class_mapping {
key: 'Cyclist'
value: 2
}
class_mapping {
key: "background"
value: 3
}
class_mapping {
key: "DontCare"
value: -1
}
class_mapping {
key: "Truck"
value: -1
}
class_mapping {
key: "Misc"
value: -1
}
class_mapping {
key: "Tram"
value: -1
}
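Here is a sample DeepStream inference configuration for a FasterRCNN model exported
from TLT (the values shown in angle brackets are placeholders that must be filled in for
your model):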
[property]
gpu-id=0
net-scale-factor=1.0
offsets=<image mean values as in the training spec file> # e.g.: 103.939;116.779;123.68
model-color-format=1
labelfile-path=</path/to/labels.txt>
tlt-encoded-model=</path/to/etlt/model>
tlt-model-key=<key to decode the model>
uff-input-dims=<c;h;w;0> # 3;272;480;0. Where c = number of channels, h = height of the model input, w = width of model input, 0: implies CHW format
uff-input-blob-name=<input_blob_name> # e.g.: input_1
batch-size=<batch size> e.g.: 1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
num-detected-classes=<number of classes to detect (including background)> # e.g.: 5
interval=0
gie-unique-id=1
is-classifier=0
#network-type=0
output-blob-names=<output_blob_names> e.g.: dense_regress/BiasAdd;dense_class/Softmax;proposal
parse-bbox-func-name=NvDsInferParseCustomFrcnnUff
custom-lib-path=./nvdsinfer_customparser_frcnn_uff/libnvds_infercustomparser_frcnn_uff.so
[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
Notice
THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION
REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED,
STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY
DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A
PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever,
NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall
be limited in accordance with the NVIDIA terms and conditions of sale for the product.
THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED,
MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE,
AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A
SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE
(INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER
LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS
FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR
IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.
NVIDIA makes no representation or warranty that the product described in this guide will be suitable for
any specified use without further testing or modification. Testing of all parameters of each product is not
necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and
fit for the application planned by customer and to do the necessary testing for the application in order
to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect
the quality and reliability of the NVIDIA product and may result in additional or different conditions and/
or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any
default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA
product in any manner that is contrary to this guide, or (ii) customer product designs.
Other than the right for customer to use the information in this guide with the product, no other license,
either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information
in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without
alteration, and is accompanied by all associated conditions, limitations, and notices.
Trademarks
NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, cuDNN, cuFFT, cuSPARSE, DIGITS, DGX, DGX-1, DGX Station,
GRID, Jetson, Kepler, NVIDIA GPU Cloud, Maxwell, NCCL, NVLink, Pascal, Tegra, TensorRT, Tesla and Volta are
trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries.
Other company and product names may be trademarks of the respective companies with which they are
associated.
Copyright
© 2019 NVIDIA Corporation. All rights reserved.