Module 2: Optimization & Quantization of AI Models for Improved Performance
Performance varies by use, configuration, and other factors. Learn more at intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available
updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel® technologies may require enabled hardware, software, or service activation.
Intel® optimizations, for Intel® compilers or other products, may not optimize to the same degree for non-Intel products.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Results have been estimated or simulated.
Intel is committed to respecting human rights and avoiding complicity in human rights abuses.
See Intel’s Global Human Rights Principles. Intel® products and software are intended only to be used in
applications that do not cause or contribute to a violation of an internationally recognized human right.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
Other names and brands may be claimed as the property of others.
Module 2
Optimization & Quantization of AI Models for Improved Performance
Table of Contents
• Model Optimizer
• Setting Inputs Shape of a Model
• Cutting Off Parts of a Model
• Model Optimizer Optimization Techniques
• Generic Optimization
• Framework- or topology-specific optimization
• Model Quantization
• Compression of a Model
• Post-Training Optimization Tool (POT)
• Benchmark Tool
• Hands-on Labs
• Exercise 1: Download a model from OMZ using OpenVINO™ Notebooks - 104-model-tools
• Exercise 2: Tiny YOLO* V3 to IR conversion using OpenVINO™ toolkit
Module 2: Learning Objective
• Recognize the importance of optimizing and tuning pre-trained models for AI Inference.
• Understand the Model Optimizer, Post-training Optimization Tool, and their functions.
• Learn about the OpenVINO™ Intermediate Representation (IR).
• Implement the model optimization strategies: quantization and topology optimization.
• Understand the workflow and the factors to consider when quantizing Deep Learning models.
• Work on practical projects to understand the difference between pre- and post-optimization model performance.
Module 2: Learning Outcomes
• Explain why optimizing and tuning Deep Learning models for inference is necessary.
• Use the Model Optimizer and POT tools from the OpenVINO™ toolkit.
• Make informed technical decisions to select the best optimization strategy.
• Describe the advantages and disadvantages of various model optimization strategies.
Module 2: Key Questions Addressed
• Why do pre-trained Deep Learning models need further optimization?
• What exactly is a Model Optimizer? What roles does it play?
• What is the Intermediate Representation (IR) used by the OpenVINO™ toolkit?
• What is quantization, and what factors need to be kept in mind while using this optimization method?
• What are the different optimization strategies available with the OpenVINO™ toolkit?
• What is the Post-training Optimization tool? How is it useful?
• How can you benchmark model performance with the OpenVINO™ toolkit?
Model Optimizer
Convert model with Model Optimizer
▪ A Python*-based tool that reads trained models and converts them to the Intermediate Representation (IR) format
▪ Optimizes for performance or space with conservative topology transformations
▪ Hardware-agnostic optimizations

The simplest way to convert a model is:
> mo --input_model <INPUT_MODEL>
To get the full list of conversion parameters available in Model Optimizer, run the following command:
> mo --help

Workflow: Run Model Optimizer → Intermediate Representation (IR), stored as .xml and .bin files
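For instance, converting a local ONNX model would look like the following (the file name and output directory are illustrative):
> mo --input_model model.onnx --output_dir ir_out
This writes ir_out/model.xml (the topology) and ir_out/model.bin (the weights).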
Discussion Points
• What are some other reasons for pre-trained models requiring optimization?
• What are some tradeoffs that need to be kept in mind while optimizing a pre-trained
model?
Model Optimizer: Generic Optimization
• Operations pruning: drop unused operations that only matter for training
• Linear operations fusion (an example is shown in the slide's figure); see the sketch below
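The NumPy sketch below illustrates the algebra behind linear-operations fusion by folding a BatchNorm into a preceding fully connected layer. The tensors and values are made up for illustration, and this is not the Model Optimizer's implementation.

import numpy as np

# Random "trained" parameters for a fully connected layer followed by BatchNorm (illustrative only)
x = np.random.randn(1, 8).astype(np.float32)
W = np.random.randn(8, 4).astype(np.float32)
b = np.random.randn(4).astype(np.float32)
gamma, beta = np.random.randn(4), np.random.randn(4)
mean, var, eps = np.random.randn(4), np.abs(np.random.randn(4)), 1e-5

# Unfused: FC followed by BatchNorm (two operations at inference time)
y_ref = gamma * ((x @ W + b) - mean) / np.sqrt(var + eps) + beta

# Fused: the BatchNorm scale and shift are folded into the FC weights and bias
s = gamma / np.sqrt(var + eps)
W_fused = W * s                      # per-output-channel scaling
b_fused = (b - mean) * s + beta
y_fused = x @ W_fused + b_fused      # a single operation with identical output

print(np.allclose(y_ref, y_fused, atol=1e-5))  # True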
Setting Input Shapes
Use the CLI option --input_shape. Model Optimizer supports conversion of models with dynamic input shapes that contain undefined dimensions.
However, if the shape of the data is fixed, it is recommended to set a fully defined shape for the inputs: this can benefit both performance and memory consumption.
• Example 1: Run the Model Optimizer for the TensorFlow* MobileNet model with a single input and specify the input shape [2,300,300,3].
mo --input_model MobileNet.pb --input_shape [2,300,300,3]
• Example 2: Run the Model Optimizer for the ONNX* OCR model with a pair of inputs, data and seq_len, and specify the shapes [3,150,200,1] and [3] respectively.
mo --input_model ocr.onnx --input data,seq_len --input_shape [3,150,200,1],[3]
You can read about other strategies in depth in the Model Optimizer documentation.
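As a quick sanity check, you can read the converted IR back with the OpenVINO Runtime API and confirm the shape that was baked in. This is a hypothetical snippet; the file name assumes the output of Example 1 above.

from openvino.runtime import Core

core = Core()
model = core.read_model("MobileNet.xml")
print(model.input(0).partial_shape)  # expected: [2,300,300,3] after the conversion in Example 1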
Discussion Points
• What are some changes you noticed between the original model's IR and the optimized
OpenVINO™ IR?
• Why is it essential for MO's model optimizations to be hardware agnostic?
Model Quantization
Model Quantization is a method of representing Deep Learning models using less memory.
Most Deep Learning models are trained using full precision (FP32) representation, but research has shown that inference can be performed at lower numerical precision with minimal change in model accuracy.
At lower numerical precision, INT8 or a similar data format is used to store the weights and biases of the deep learning model.
In addition to accuracy considerations, you need to ensure that the target hardware platform supports the data format used for quantization (the slide's figure summarizes this compatibility).
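To make the idea concrete, here is a generic NumPy sketch of affine INT8 quantization of an FP32 weight tensor; it only illustrates the principle and is not the exact scheme used by the OpenVINO™ tools.

import numpy as np

w = np.random.randn(4, 4).astype(np.float32)           # FP32 weights (illustrative)
scale = (w.max() - w.min()) / 255.0                     # map the FP32 range onto 256 INT8 levels
zero_point = np.round(-w.min() / scale) - 128           # INT8 value that represents 0.0
w_int8 = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
w_restored = (w_int8.astype(np.float32) - zero_point) * scale   # dequantized approximation
print(np.abs(w - w_restored).max())                     # small reconstruction error (roughly scale/2)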
Compression of a Model to FP16
Use the CLI option --data_type.
• Model Optimizer can convert all floating-point weights to 16-bit precision.
• The resulting model occupies about half the disk space and runtime memory.
• FP16 is the recommended data type for GPU optimizations and is the only supported data type for MYRIAD VPUs.
Note: FP16 compression may cause a small accuracy drop, although for the majority of models the degradation is negligible.
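For example, a conversion with FP16 compression would look like the following (the model file name is illustrative):
mo --input_model model.onnx --data_type FP16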
Post-Training Optimization Tool (POT)
Overview of Post-Training Optimization Tool
The POT uses a conversion technique that reduces the model size to low precision without retraining.
• Improves latency with little degradation in model accuracy
• Different optimization approaches are supported: quantization algorithms, etc.
• The Default Quantization algorithm is designed to perform a fast and, in many cases, accurate quantization. It does not directly control the accuracy metric, but it provides a lot of knobs that can be used to improve it.

Workflow: an OpenVINO™ IR model (FP32 or FP16, .xml & .bin), a representative dataset, and a POT configuration (supplied via the CLI or the API) are fed to the Post-Training Optimization Tool, which produces an optimized INT8 OpenVINO™ IR model.
Sample Accuracy Checker Configuration (.yaml) Used by POT

models:
  - name: mobilenet-ssd
    launchers:
      - framework: openvino          # backend framework for Accuracy Checker
        adapter: ssd                 # converts raw framework output to a high-level, problem-specific representation (e.g., ClassificationPrediction, DetectionPrediction)
    datasets:
      - name: VOC2007_detection
        data_source: <DATASET_PATH>
        preprocessing:               # list of preprocessing steps applied to input data
          - type: resize
            size: 300
        postprocessing:              # list of postprocessing steps
          - type: resize_prediction_boxes
        metrics:                     # list of metrics that should be computed
          - type: map
            integral: 11point
            ignore_difficult: True
            presenter: print_scalar
            reference: 0.67

Command:
pot -c mobilenet-ssd.json

Workflow: the POT configuration (.json, read by the CLI), the Accuracy Checker configuration (.yaml), and the dataset are supplied to the Post-Training Optimization Tool.
Sample Configuration File of POT
Logically, all parameters are divided into three groups:
• Model parameters are related to the model definition
• Engine parameters define parameters of the engine that are responsible for model inference and for the data preparation used for optimization and evaluation
• Compression parameters are related to the optimization algorithm

{
    "model": {
        "model_name": "mobilenet-ssd",
        "model": "./public/mobilenet-ssd/FP32/mobilenet-ssd.xml",
        "weights": "./public/mobilenet-ssd/FP32/mobilenet-ssd.bin"
    },
    "engine": {
        "config": "./mobilenet-ssd.yaml"
    },
    "compression": {
        "algorithms": [
            {
                "name": "AccuracyAwareQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300,
                    "maximal_drop": 0.01
                }
            }
        ]
    }
}
POT Python* API
Default Quantization algorithm using an unannotated dataset
To use this method, you need to create a Python* script that implements a data loader and the quantization pipeline:
1. Prepare the data and dataset interface using openvino.tools.pot.DataLoader
2. Select quantization parameters (the same parameters as in the configuration .json file, but defined in your Python code)
3. Define and run the quantization process using: from openvino.tools.pot import IEEngine, load_model, save_model, compress_model_weights, create_pipeline
The slide's diagram shows the user's implementation (configuration and DataLoader) feeding the existing API helpers, with save_model() writing the INT8 OpenVINO IR model.
Learn more about the POT Python API for the Default Quantization algorithm: https://intel.ly/DE4bwYp
Code Example of Defining DataLoader for an Image Dataset
In most cases, it is only required to implement the openvino.tools.pot.DataLoader interface, which allows acquiring data from a dataset and applying model-specific pre-processing, providing access by index. Any implementation should override the __len__() and __getitem__() methods, as sketched below.
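A minimal sketch of such a DataLoader is shown below, assuming a folder of images resized to a 300x300 model input. The class name, folder handling, use of OpenCV, and the (data, annotation) return format are illustrative assumptions; check the POT documentation for your release.

import os
import cv2
import numpy as np
from openvino.tools.pot import DataLoader

class ImageFolderLoader(DataLoader):
    # Loads images from a folder and applies simple model-specific pre-processing.
    def __init__(self, data_dir, input_size=(300, 300)):
        self._files = sorted(os.path.join(data_dir, f) for f in os.listdir(data_dir)
                             if f.lower().endswith((".jpg", ".jpeg", ".png")))
        self._input_size = input_size

    def __len__(self):
        return len(self._files)

    def __getitem__(self, index):
        image = cv2.imread(self._files[index])
        image = cv2.resize(image, self._input_size)
        image = image.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)  # HWC -> NCHW
        return image, None  # (data, annotation); the annotation is unused by Default Quantization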
Building the Pipeline
The quantization parameters are the same as in the configuration .json file, but are defined directly in Python; a minimal end-to-end sketch follows.
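A minimal Default Quantization sketch using the POT Python API is shown below. The model paths, the DataLoader class from the previous slide, and the parameter values are illustrative assumptions rather than a definitive implementation.

from openvino.tools.pot import IEEngine, load_model, save_model, compress_model_weights, create_pipeline

model_config = {
    "model_name": "mobilenet-ssd",
    "model": "mobilenet-ssd.xml",
    "weights": "mobilenet-ssd.bin",
}
engine_config = {"device": "CPU"}
algorithms = [{
    "name": "DefaultQuantization",
    "params": {"target_device": "CPU", "preset": "performance", "stat_subset_size": 300},
}]

data_loader = ImageFolderLoader("./calibration_images")   # DataLoader subclass from the previous slide
model = load_model(model_config)                          # load the FP32/FP16 OpenVINO IR model
engine = IEEngine(config=engine_config, data_loader=data_loader)
pipeline = create_pipeline(algorithms, engine)            # build the quantization pipeline
compressed_model = pipeline.run(model)                    # run Default Quantization
compress_model_weights(compressed_model)                  # optional: reduce the size of the .bin file
save_model(compressed_model, save_path="optimized_model", model_name="mobilenet-ssd_int8")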
Sample Application
Quantizing Object Detection Model with Accuracy Control
OpenVINO™ Notebooks - 105-language-quantize-bert
This tutorial demonstrates how to apply INT8 quantization to the BERT Natural Language Processing model using the Post-Training Optimization Tool API (part of OpenVINO). We will use a HuggingFace BERT PyTorch* model fine-tuned for the Microsoft Research Paraphrase Corpus (MRPC) task. The code of the tutorial is designed to be extendable to custom models and datasets.
https://intel.ly/DE3HQXJ
Benchmark Tool
The benchmark app allows you to benchmark your model's throughput and latency. Performance for
a particular application can also be evaluated virtually using Intel® DevCloud for the Edge Workloads,
a remote development environment with access to Intel® hardware and the latest versions of the
Intel® Distribution of the OpenVINO™ Toolkit.
Basic Usage
The Python benchmark_app is automatically installed when you install OpenVINO Developer Tools
using PyPI. Before running benchmark_app, make sure the openvino_env virtual environment is
activated, and navigate to the directory where your model is located.
The benchmarking application works with models in the OpenVINO IR (model.xml and model.bin)
and ONNX* (model.onnx) formats. Make sure to convert your models if necessary.
To run benchmarking with default options on a model, use the following command:
benchmark_app -m model.xml
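Useful optional flags include the target device, run duration, and performance hint; for example (device names and values are illustrative, and the exact set of flags can vary between releases):
benchmark_app -m model.xml -d CPU -t 15
benchmark_app -m model.xml -d GPU -hint latency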
Summary
• We learned about optimizing Deep Learning Models in this module.
• To begin, we learned why pre-trained deep learning models must be optimized for
inference.
• Then we delved deep into the OpenVINO™ toolkit's optimization tools, including the
Model Optimizer, Post-Training Optimization Tool, and other supporting software.
Hands-on Lab
Hands-on Labs
Exercise 1: Download a model from OMZ using OpenVINO™ Notebooks - 104-model-tools
https://intel.ly/104-model-tools
Hands-on Labs
Exercise 2: Tiny YOLO* V3 to IR conversion using OpenVINO™ toolkit
https://intel.ly/tinyyolov3-IR
System configuration (two test systems)
System 1:
• System board: Intel prototype, TGL U DDR4 SODIMM RVP
• CPU: 11th Gen Intel® Core™ i5-1145G7 @ 2.6 GHz
• Software: Intel® Distribution of OpenVINO™ toolkit 2021.1.075
• BIOS setting: Load default settings
• Precision and batch size: CPU: int8, GPU: FP16-int8, batch size: 1
System 2:
• System board: ASUSTeK COMPUTER INC. / Prime Z370-A
• CPU: 8th Gen Intel® Core™ i5-8500T @ 3.0 GHz
• Software: Intel® Distribution of OpenVINO™ toolkit 2021.1.075
• BIOS setting: Load default settings, set XMP to 2667
• Precision and batch size: CPU: int8, GPU: FP16-int8, batch size: 1
Notes:
1) Memory is installed such that all primary memory slots are populated.
2) Testing by Intel as of September 9, 2020.
Compounding effect of hardware and software configuration
System 1:
• System board: Purley E63448-400, Intel® Internal Reference System
• CPU: Intel® Xeon® Silver 4116 @ 2.1 GHz
• Memory: 12x 16 GB DDR4 2400 MHz
System 2:
• System board: Intel® Server Board S2600STB
• CPU: Intel® Xeon® Silver 4216 @ 2.10 GHz
• Memory: 12x 64 GB DDR4 2400 MHz
System 3:
• System board: Intel® Server Board S2600STB
• CPU: Intel® Xeon® Silver 4216R @ 2.20 GHz
• Memory: 12x 32 GB DDR4 2666 MHz
System 4:
• System board: Intel® Internal Reference System
• CPU: Intel® Xeon® Silver 4316 @ 2.30 GHz
• Memory: 16x 32 GB DDR4 2666 MHz