
Deep Learning

Lecture 6: Computer vision

Prof. Stéphane Gaïffas


https://stephanegaiffas.github.io

Agenda
Computer vision with deep learning:

1. Classification
2. Image augmentation
3. Transfer learning / fine-tuning
4. Object detection
5. Semantic segmentation

Some of the main computer vision tasks.
Each of them requires a different neural network architecture.

Classification
Convolutional neural networks

Convolutional neural networks combine convolution, pooling and fully
connected layers.
They achieve state-of-the-art results for spatially structured data, such as
images, sound or text.

For classification:

the activation of the output layer is a softmax producing a vector p
in the simplex of probability estimates P[Y = c ∣ x] for c = 1, …, C, where
C is the number of classes and x is the input image;

the loss function is the cross-entropy loss.
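A minimal PyTorch sketch of this setup (the architecture and the class count C = 10 are placeholders, not the lecture's exact model):

```python
import torch
import torch.nn as nn

C = 10  # hypothetical number of classes
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, C),                  # output layer: C logits
)

x = torch.randn(8, 3, 32, 32)          # a batch of input images
logits = model(x)                      # shape (8, C)
probs = torch.softmax(logits, dim=1)   # points in the probability simplex
# nn.CrossEntropyLoss applies log-softmax + negative log-likelihood internally
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, C, (8,)))
```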

Image augmentation
The lack of data is the biggest limit for the performance of deep learning models.

Image augmentation is a form of data augmentation for images:

Collecting more data is usually expensive and laborious.
Synthesizing data is complicated and may not represent the true distribution.
Augmenting the data with base transformations is simple and efficient (e.g., as
demonstrated with AlexNet).
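A sketch of such base transformations with torchvision (the exact pipeline below is an assumption, in the spirit of AlexNet-style crops and flips):

```python
import torchvision.transforms as T

# Random crops, flips and color jitter, applied on the fly during training,
# so each epoch sees a different random variant of every image.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    T.ToTensor(),
])
```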

―――
Credits: DeepAugment, 2020.
Image augmentation

―――
Credits: DeepAugment, 2020.
Pre-trained models
Training a model on natural images, from scratch, takes days or weeks.
Many models trained on ImageNet are publicly available for download. These
models can be used as feature extractors or for smart initialization.
The models themselves should be considered as generic and re-usable assets.

Transfer learning
Take a pre-trained network, remove the last layer(s) and then treat the rest of
the network as a fixed feature extractor.
Train a model from these features on a new task.
Often better than handcrafted feature extraction for natural images, or better
than training from the data of the new task only.
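A minimal sketch with an ImageNet pre-trained ResNet-18 as the frozen extractor (the backbone choice and the 5-class head are placeholders):

```python
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False           # freeze: the features are not updated
# replace the last layer with a head for the new task (here 5 classes)
backbone.fc = nn.Linear(backbone.fc.in_features, 5)
# Only backbone.fc has trainable parameters now; train it on the new task.
```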

―――
Credits: Mormont et al, Comparison of deep transfer learning strategies for digital pathology, 2018.
Fine-tuning

Same as for transfer learning, but also fine-tune the weights of the pre-trained
network by continuing backpropagation.
All or only some of the layers can be tuned.
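A hedged sketch of fine-tuning (the learning rates and the 5-class head are placeholders; a common choice is smaller steps on the pre-trained layers than on the freshly initialized head):

```python
import torch
import torchvision.models as models

net = models.resnet18(weights="IMAGENET1K_V1")
net.fc = torch.nn.Linear(net.fc.in_features, 5)  # hypothetical 5-class task
# all layers stay trainable; backpropagation continues through the whole network
optimizer = torch.optim.SGD([
    {"params": [p for n, p in net.named_parameters() if not n.startswith("fc")],
     "lr": 1e-4},                                 # pre-trained layers: small steps
    {"params": net.fc.parameters(), "lr": 1e-2},  # new head: larger steps
], momentum=0.9)
```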

―――
Credits: Dive Into Deep Learning, 2020.
In the case of models pre-trained on ImageNet, transferred/fine-tuned networks
usually work even when the input images for the new task are not photographs of
objects or animals, such as biomedical images, satellite images or paintings.

―――
Credits: Matthia Sabatelli et al, Deep Transfer Learning for Art Classification Problems, 2018.
Object detection

The simplest strategy to move from image classification to object detection is to
classify local regions, at multiple scales and locations.

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
Intersection over Union (IoU)
A standard performance indicator for object detection is to evaluate the
intersection over union (IoU) between a predicted bounding box $\hat{B}$ and an
annotated bounding box $B$,

$$\mathrm{IoU}(B, \hat{B}) = \frac{\mathrm{area}(B \cap \hat{B})}{\mathrm{area}(B \cup \hat{B})}.$$

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
Mean Average Precision (mAP)
If $\mathrm{IoU}(B, \hat{B})$ is larger than a fixed threshold (usually 1/2), then the predicted
bounding box is valid (true positive) and wrong otherwise (false positive).

TP and FP values are accumulated for all thresholds on the predicted confidence.
The area under the resulting precision-recall curve is the average precision for the
considered class.

The mean over the classes is the mean average precision.

Recall that Precision = TP / all detections and that Recall = TP / all ground truths.
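A sketch of the computation for one class, assuming each detection has already been matched and labeled TP or FP (real toolkits additionally interpolate the precision envelope):

```python
import numpy as np

def average_precision(scores, is_tp, n_ground_truths):
    # sort detections by decreasing confidence, then accumulate TP/FP counts
    order = np.argsort(-np.asarray(scores))
    tp = np.cumsum(np.asarray(is_tp, dtype=float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, dtype=float)[order])
    recall = tp / n_ground_truths
    precision = tp / (tp + fp)
    # area under the precision-recall curve (rectangular rule)
    return float(np.sum(np.diff(np.concatenate([[0.0], recall])) * precision))
```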
The sliding window approach evaluates a classifier at a large number of locations
and scales.

This approach is usually very computationally expensive, as performance directly
depends on the resolution and number of the windows fed to the classifier (the
more the better, but also the more costly).

OverFeat
The complexity of the sliding window approach was mitigated in the pioneering
OverFeat network (Sermanet et al, 2013) by adding a regression head to predict
the object bounding box (x, y, w, h).
For training, the convolutional layers are fixed and the regression network is
trained using an ℓ2 loss between the predicted and the true bounding box for
each example.

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
The classifier head outputs a class and a confidence for each location and scale
pre-defined from a coarse grid. Each window is resized to fit the input dimensions
of the classifier.

―――
Credits: Sermanet et al, 2013.
The regression head then predicts the location of the object with respect to each
window.

―――
Credits: Sermanet et al, 2013.
These bounding boxes are finally merged with an ad hoc greedy procedure to
produce the final predictions over a small number of objects.

―――
Credits: Sermanet et al, 2013.
The OverFeat architecture can be adapted to object detection by adding a
"background" class to the object classes.

Negative samples are taken in each scene either at random or by selecting the ones
with the worst misclassification.

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
Although OverFeat is one of the earliest successful networks for object detection,
its architecture comes with several drawbacks:

it is a disjoint system (2 disjoint heads with their respective losses, ad hoc
merging procedure);
it optimizes for localization rather than detection;
it cannot reason about global context and thus requires significant
post-processing to produce coherent detections.

YOLO

YOLO (You Only Look Once; Redmon et al, 2015) models detection as a regression
problem.

It divides the image into an S × S grid and for each grid cell predicts B bounding
boxes, confidence scores for those boxes, and C class probabilities. These
predictions are encoded as an S × S × (5B + C) tensor.

―――
Credits: Redmon et al, 2015.
For S = 7, B = 2, C = 20, the network predicts a vector of size 30 for each cell.
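A sketch of how this tensor can be indexed (the cell indices are arbitrary):

```python
import torch

S, B, C = 7, 2, 20
out = torch.zeros(S, S, 5 * B + C)     # the (7, 7, 30) prediction tensor
cell = out[3, 4]                       # the 30-vector of one grid cell:
boxes = cell[:5 * B].reshape(B, 5)     # B boxes, each (x, y, w, h, confidence)
class_probs = cell[5 * B:]             # C conditional class probabilities
```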

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
The network predicts class scores and bounding-box regressions, and although the
output comes from fully connected layers, it has a 2D structure.

Unlike sliding window techniques, YOLO is therefore capable of reasoning
globally about the image when making predictions.
It sees the entire image during training and test time, so it implicitly encodes
contextual information about classes as well as their appearance.

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
During training, YOLO makes the assumption that each of the S × S cells contains
at most (the center of) a single object. We define, for every image, cell index
i = 1, ..., S × S, predicted box j = 1, ..., B and class index c = 1, ..., C:

$\mathbf{1}^{\text{obj}}_{i}$ is 1 if there is an object in cell i, and 0 otherwise;

$\mathbf{1}^{\text{obj}}_{i,j}$ is 1 if there is an object in cell i and predicted box j is the most fitting one,
and 0 otherwise;

$p_{i,c}$ is 1 if there is an object of class c in cell i, and 0 otherwise;

$x_i, y_i, w_i, h_i$ is the annotated bounding box (defined only if $\mathbf{1}^{\text{obj}}_{i} = 1$, and relative
in location and scale to the cell);

$c_{i,j}$ is the IoU between the predicted box and the ground-truth target.

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
The training procedure first computes on each image the values of the $\mathbf{1}^{\text{obj}}_{i,j}$'s
and $c_{i,j}$'s, and then does one step to minimize the multi-part loss function

$$
\begin{aligned}
&\lambda_{\text{coord}} \sum_{i=1}^{S\times S}\sum_{j=1}^{B} \mathbf{1}^{\text{obj}}_{i,j}
\left[(x_i-\hat{x}_{i,j})^2+(y_i-\hat{y}_{i,j})^2
+\left(\sqrt{w_i}-\sqrt{\hat{w}_{i,j}}\right)^2
+\left(\sqrt{h_i}-\sqrt{\hat{h}_{i,j}}\right)^2\right]\\
&\quad+\lambda_{\text{obj}} \sum_{i=1}^{S\times S}\sum_{j=1}^{B} \mathbf{1}^{\text{obj}}_{i,j}\,(c_{i,j}-\hat{c}_{i,j})^2
+\lambda_{\text{noobj}} \sum_{i=1}^{S\times S}\sum_{j=1}^{B} \left(1-\mathbf{1}^{\text{obj}}_{i,j}\right)\hat{c}_{i,j}^{\,2}\\
&\quad+\lambda_{\text{classes}} \sum_{i=1}^{S\times S} \mathbf{1}^{\text{obj}}_{i} \sum_{c=1}^{C}(p_{i,c}-\hat{p}_{i,c})^2
\end{aligned}
$$

where $\hat{p}_{i,c}$, $\hat{x}_{i,j}$, $\hat{y}_{i,j}$, $\hat{w}_{i,j}$, $\hat{h}_{i,j}$ and $\hat{c}_{i,j}$ are the network outputs.

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
Training YOLO relies on many engineering choices that illustrate well how involved
deep learning is in practice:

pre-train the first 20 convolutional layers on ImageNet classification;

use 448 × 448 inputs for detection, instead of 224 × 224;

use Leaky ReLUs for all layers;

add dropout after the first fully connected layer;

normalize the bounding box parameters in [0, 1];

use a quadratic loss not only for the bounding box coordinates, but also for the
confidence and the class scores;

reduce the weight of large bounding boxes by using the square roots of the sizes in
the loss;

reduce the importance of empty cells by weighting the confidence-related
loss on them less;

data augmentation with scaling, translation and HSV transformation.

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
YOLO in New York

Redmon, 2017.

SSD
The Single Shot MultiBox Detector (SSD; Liu et al, 2015) improves upon YOLO by
using a fully-convolutional architecture and multi-scale feature maps.

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
Region-based CNNs
An alternative strategy to having a huge predefined set of box proposals, as in
OverFeat or YOLO, is to rely on region proposals first extracted from the image.

The main family of architectures following this principle are region-based
convolutional neural networks:

(Slow) R-CNN (Girshick et al, 2014)
Fast R-CNN (Girshick et al, 2015)
Faster R-CNN (Ren et al, 2015)
Mask R-CNN (He et al, 2017)

R-CNN
This architecture is made of four parts:

1. Selective search is performed on the input image to select multiple high-quality
region proposals.
2. A pre-trained CNN, truncated before its output layer, is used as a feature
extractor. Each proposed region is resized to the input dimensions required by the
network, and a forward pass outputs features for the proposal.
3. The features are fed to an SVM for predicting the class.
4. The features are fed to a linear regression model for predicting the bounding
box.

―――
Credits: Dive Into Deep Learning, 2020.
Selective search (Uijlings et al, 2013) looks at the image through windows of
different sizes, and for each size tries to group together adjacent pixels that are
similar by texture, color or intensity.

Fast R-CNN
The main performance bottleneck of an R-CNN model is the need to independently
extract features for each proposed region.
Fast R-CNN uses the entire image as input to the CNN for feature extraction,
rather than each proposed region.
Fast R-CNN introduces RoI pooling for producing feature vectors of fixed size
from region proposals of different sizes.
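torchvision ships this operator; a sketch with placeholder feature-map and box values:

```python
import torch
from torchvision.ops import roi_pool

# One shared feature map for the whole image, many variable-size regions,
# each pooled to a fixed 7×7 grid of features.
features = torch.randn(1, 256, 32, 32)           # CNN output for the entire image
rois = torch.tensor([[0, 4.0, 4.0, 20.0, 16.0],  # (batch_index, x1, y1, x2, y2)
                     [0, 0.0, 0.0, 31.0, 31.0]])
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=1.0)
# pooled.shape == (2, 256, 7, 7): a fixed-size feature vector per proposal
```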

―――
Credits: Dive Into Deep Learning, 2020.
Faster R-CNN

The performance of both R-CNN and Fast R-CNN is tied to the quality of the
region proposals from selective search.
Faster R-CNN replaces selective search with a region proposal network.
This network reduces the number of proposed regions generated, while
ensuring precise object detection.
―――
Credits: Dive Into Deep Learning, 2020.
YOLO (v2) vs YOLO 9000 vs SSD Mobilenet vs Faster RCNN NasNet (video comparison)

Take-home messages
One-stage detectors (YOLO, SSD, RetinaNet, etc) are fast for inference but are
usually not the most accurate object detectors.
Two-stage detectors (Fast R-CNN, Faster R-CNN, R-FCN, Light head R-CNN,
etc) are usually slower but are often more accurate.
All networks depend on lots of engineering decisions.

Segmentation

Semantic segmentation is the task of partitioning an image into regions of different
semantic categories.

These semantic regions label and predict objects at the pixel level.

―――
Credits: Dive Into Deep Learning, 2020.
Fully convolutional networks
The historical approach to image segmentation was to define a measure of
similarity between pixels, and to cluster groups of similar pixels. Such approaches
account poorly for semantic content.

The deep-learning approach re-casts semantic segmentation as pixel classification,
and re-uses networks trained for image classification by making them fully
convolutional (FCNs).

―――
Credits: Francois Fleuret, EE559 Deep Learning, EPFL.
―――
Credits: CS231n, Lecture 11, 2018.
―――
Credits: CS231n, Lecture 11, 2018.
Transposed convolution
The convolution and pooling layers introduced so far often reduce the input width
and height, or keep them unchanged.

Semantic segmentation requires predicting values for each pixel, and therefore
needs to increase the input width and height.
Fully connected layers could be used for that purpose but would face the same
limitations as before (spatial specialization, too many parameters).
Ideally, we would like layers that implement the inverse of convolutional and
pooling layers.

Transposed convolution
A transposed convolution is a convolution where the implementations of the
forward and backward passes are swapped.

Given a convolutional kernel u, with U the matrix such that the associated
convolution computes $v(y) = U\, v(x)$ on flattened inputs:

the forward pass is implemented as $v(h) = U^T v(x)$ with appropriate
reshaping, thereby effectively up-sampling an input $v(x)$ into a larger one;

the backward pass is computed by multiplying the loss by $U$ instead of $U^T$.

Transposed convolutions are also referred to as deconvolutions (but this is
misleading...).

[Diagram: the input x is flattened, multiplied by $U^T$, and reshaped into the
up-sampled output h.]
Transposed convolution

With the flattened 2 × 2 input $v(x) = (2, 1, 4, 4)^T$, the product $U^T v(x) = v(h)$
gives the flattened 4 × 4 output (here U is the 4 × 16 matrix of the 3 × 3 kernel
u = [[1, 4, 1], [1, 4, 3], [3, 3, 1]] convolved over a 4 × 4 input):

$$
\begin{pmatrix}
1&0&0&0\\
4&1&0&0\\
1&4&0&0\\
0&1&0&0\\
1&0&1&0\\
4&1&4&1\\
3&4&1&4\\
0&3&0&1\\
3&0&1&0\\
3&3&4&1\\
1&3&3&4\\
0&1&0&3\\
0&0&3&0\\
0&0&3&3\\
0&0&1&3\\
0&0&0&1
\end{pmatrix}
\begin{pmatrix}2\\1\\4\\4\end{pmatrix}
=
\begin{pmatrix}2\\9\\6\\1\\6\\29\\30\\7\\10\\29\\33\\13\\12\\24\\16\\4\end{pmatrix}
$$
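The same computation can be checked with PyTorch's conv_transpose2d (the kernel and input below are the ones reconstructed from the example above):

```python
import torch
import torch.nn.functional as F

# the 2×2 input is up-sampled to 4×4 by the 3×3 kernel u
x = torch.tensor([[2., 1.], [4., 4.]]).reshape(1, 1, 2, 2)
u = torch.tensor([[1., 4., 1.],
                  [1., 4., 3.],
                  [3., 3., 1.]]).reshape(1, 1, 3, 3)
h = F.conv_transpose2d(x, u)
# h[0, 0]:
# [[ 2.,  9.,  6.,  1.],
#  [ 6., 29., 30.,  7.],
#  [10., 29., 33., 13.],
#  [12., 24., 16.,  4.]]
```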
FCNs for segmentation
The simplest design of a fully convolutional network for
semantic segmentation consists in:

using a (pre-trained) convolutional network for downsampling and extracting
image features;

replacing the dense layers with a 1 × 1 convolution layer to transform the
number of channels into the number of categories;

upsampling the feature map to the size of the input image by using one (or
several) transposed convolution layer(s).

Contrary to fully connected networks, the dimensions of the output of a fully
convolutional network are not fixed: they directly depend on the dimensions of
the input, which can be images of arbitrary sizes.
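A minimal sketch of such a design (the ResNet-18 backbone and the layer sizes are assumptions, not the lecture's exact model):

```python
import torch.nn as nn
import torchvision.models as models

n_classes = 21
# pre-trained network, truncated before global pooling: (N,3,H,W) -> (N,512,H/32,W/32)
backbone = nn.Sequential(*list(models.resnet18(weights="IMAGENET1K_V1").children())[:-2])
head = nn.Sequential(
    nn.Conv2d(512, n_classes, kernel_size=1),        # channels -> categories
    nn.ConvTranspose2d(n_classes, n_classes,         # ×32 up-sampling back to
                       kernel_size=64, stride=32,    # the input resolution
                       padding=16),
)
# x: (N, 3, H, W) -> backbone -> head: (N, n_classes, H, W)
```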

―――
Credits: Noh et al, 2015.
UNet
The UNet architecture builds upon the previous FCN architecture.

It consists in symmetric contraction and expansion paths, along with a
concatenation of high-resolution features from the contracting path to the
upsampled features from the expanding path. These connections allow for
localization.
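A sketch of one expansion step with its concatenation (the channel sizes are placeholders):

```python
import torch
import torch.nn as nn

# up-sample, then concatenate the matching high-resolution features from
# the contracting path before convolving again
up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
conv = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())

x = torch.randn(1, 128, 16, 16)        # features from the expanding path
skip = torch.randn(1, 64, 32, 32)      # features from the contracting path
x = up(x)                              # (1, 64, 32, 32)
x = conv(torch.cat([skip, x], dim=1))  # concatenation enables localization
```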

―――
Credits: Ronneberger et al, 2015.
Mask R-CNN

Mask R-CNN extends the Faster R-CNN model for semantic segmentation.

The RoI pooling layer is replaced with an RoI alignment layer.
It branches off to an FCN for predicting a segmentation mask.
Object detection combined with mask prediction enables instance
segmentation.
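torchvision also ships RoI alignment; a sketch with placeholder values:

```python
import torch
from torchvision.ops import roi_align

# Like RoI pooling, but samples the feature map with bilinear interpolation
# instead of snapping to the grid, preserving the pixel-level correspondence
# a mask head needs.
features = torch.randn(1, 256, 32, 32)
rois = torch.tensor([[0, 4.7, 5.3, 20.1, 16.9]])  # fractional coordinates are fine
aligned = roi_align(features, rois, output_size=(14, 14),
                    spatial_scale=1.0, sampling_ratio=2)
# aligned.shape == (1, 256, 14, 14)
```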

―――
Credits: He et al, 2017.
Mask RCNN - COCO - instance segmentation

Some final comments
For detection and semantic segmentation, there is heavy use of transfer
learning and fine-tuning: re-use of large networks trained on classification
problems.

Tons of engineering, many crucial details.

Take-home message

The models themselves, as much as the source code of the algorithm that
produced them, or the training data, are generic and re-usable assets.

Transfer learning is crucial, but somewhat under-studied.

There is no comparably successful transfer learning outside of deep learning.

Thank you!

