05 CNN 2
Networks
Common Augmentation Methods
Andrew Ng
Color Shifting
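Color shifting adds a random offset to each RGB channel of a training image. This makes the network less sensitive to lighting and color casts. The sketch below is minimal plain Python and rests on these assumptions:

- The image is stored as rows of (r, g, b) integer tuples.
- The offsets are drawn uniformly from an illustrative range, `max_shift`.
- The PCA-based color augmentation used in AlexNet is not shown.

```python
import random


def color_shift(image, max_shift=20, rng=None):
    """Add one random offset per RGB channel to every pixel, clipping to [0, 255].

    image: H x W nested lists of (r, g, b) integer tuples (assumed layout).
    max_shift: largest absolute offset per channel (illustrative default).
    rng: optional random.Random instance, for reproducible augmentation.
    """
    if rng is None:
        rng = random.Random()
    # One offset per channel, shared by all pixels, so the whole image's
    # color cast changes while its spatial content stays the same.
    offsets = [rng.randint(-max_shift, max_shift) for _ in range(3)]
    return [
        [tuple(min(255, max(0, c + d)) for c, d in zip(pixel, offsets)) for pixel in row]
        for row in image
    ]
```

In a training loop, this would be applied on the fly to each image, so the network sees a differently tinted copy every epoch.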
Image Classification
Semantic Segmentation
• Label each pixel in the image with a category label
• Don’t differentiate instances, only care about pixels
[Figure: example images labeled pixel by pixel with Sky, Cat, Cow, Grass]
Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013
Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014
Problem: Very inefficient! Not reusing shared features between overlapping patches
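The patch-based approach classifies the patch around each pixel independently. The sketch below is minimal plain Python and rests on these assumptions:

- The image is single-channel and stored as nested lists.
- Borders are zero-padded.
- `classify_patch` is a hypothetical callable standing in for the CNN.

The returned call count makes the inefficiency visible. Horizontally adjacent patches share `patch_size - 1` of their columns, yet each one is processed from scratch.

```python
def sliding_window_segment(image, patch_size, classify_patch):
    """Label every pixel by classifying the patch centered on it.

    image: H x W nested lists of numbers (single channel, for simplicity).
    patch_size: odd side length of the square patch.
    classify_patch: hypothetical callable mapping a patch to a class label
        (in practice this would be a CNN).
    Returns (labels, calls): the H x W label map and the number of classifier calls.
    """
    h, w = len(image), len(image[0])
    r = patch_size // 2
    labels = [[None] * w for _ in range(h)]
    calls = 0
    for i in range(h):
        for j in range(w):
            # Zero-pad outside the image so every pixel gets a full patch.
            patch = [
                [image[y][x] if 0 <= y < h and 0 <= x < w else 0
                 for x in range(j - r, j + r + 1)]
                for y in range(i - r, i + r + 1)
            ]
            labels[i][j] = classify_patch(patch)  # no computation is shared between patches
            calls += 1
    return labels, calls
```

For an H x W image this makes H * W full classifier calls, one per pixel.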
Input: 3 x H x W → Convolutions: D x H x W → Scores: C x H x W → Predictions: H x W
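The last step, from "Scores: C x H x W" to "Predictions: H x W", picks the highest-scoring class at each pixel. The sketch below is minimal and assumes the scores are nested lists indexed as `scores[c][i][j]`. During training, a cross-entropy loss would typically be applied at every pixel instead of the argmax.

```python
def scores_to_predictions(scores):
    """Collapse a C x H x W score volume into an H x W map of class indices.

    scores: nested lists indexed as scores[c][i][j] (assumed layout).
    """
    num_classes = len(scores)
    h, w = len(scores[0]), len(scores[0][0])
    # Per-pixel argmax over the class dimension.
    return [
        [max(range(num_classes), key=lambda c: scores[c][i][j]) for j in range(w)]
        for i in range(h)
    ]
```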
Problem: convolutions at original image resolution will be very expensive ...
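Counting multiply-accumulates (MACs) for one layer shows why full resolution is expensive. The sketch below assumes a stride-1, "same"-padded k x k convolution with no bias. Each of the H * W * D_out outputs is a dot product over k * k * D_in inputs.

```python
def conv_macs(height, width, in_channels, out_channels, kernel_size=3):
    """Multiply-accumulates for one stride-1, 'same'-padded convolution layer (bias ignored)."""
    outputs = height * width * out_channels
    per_output = kernel_size * kernel_size * in_channels  # length of each dot product
    return outputs * per_output
```

Take an assumed 512 x 512 input with D = 64 channels. `conv_macs(512, 512, 64, 64)` gives 9,663,676,416 MACs for a single 3 x 3 layer. At H/4 x W/4 the same layer costs 16 times less, because the cost scales with H * W.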
May 10, 2018
Semantic Segmentation Idea:
Fully Convolutional
Design the network as a bunch of convolutional layers, with downsampling and upsampling inside the network!
Input: 3 x H x W → High-res: D1 x H/2 x W/2 → Med-res: D2 x H/4 x W/4 → Low-res: D3 x H/4 x W/4 → Med-res: D2 x H/4 x W/4 → High-res: D1 x H/2 x W/2 → Predictions: H x W
Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
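The downsample-then-upsample design can be checked by tracing tensor shapes stage by stage. The sketch below is minimal and rests on these assumptions:

- "down" halves H and W (for example, by pooling or strided convolution).
- "up" doubles them.
- Channel counts are free parameters.
- The stage list mirrors the resolutions printed in the figure, including Low-res at H/4 x W/4.

```python
def trace_shapes(height, width, stages):
    """Follow (name, channels, height, width) through a list of stages.

    stages: (name, out_channels, op) tuples, where op is 'down' (halve H and W),
    'up' (double H and W) or 'same' (keep the resolution).
    """
    shapes = []
    for name, channels, op in stages:
        if op == "down":
            height, width = height // 2, width // 2
        elif op == "up":
            height, width = height * 2, width * 2
        elif op != "same":
            raise ValueError(f"unknown op: {op}")
        shapes.append((name, channels, height, width))
    return shapes


# Hypothetical sizes: D1=64, D2=128, D3=256, C=21 classes, 224 x 224 input.
STAGES = [
    ("High-res", 64, "down"),
    ("Med-res", 128, "down"),
    ("Low-res", 256, "same"),
    ("Med-res", 128, "same"),
    ("High-res", 64, "up"),
    ("Scores", 21, "up"),
]
```

Running `trace_shapes(224, 224, STAGES)` ends at C x H x W scores. A per-pixel argmax over those scores then gives the H x W predictions.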
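[Figure: upsampling a 2 x 2 input (1 2 / 3 4) to 4 x 4 by repeating each value over a 2 x 2 block, or by placing it in one corner with zeros elsewhere; max pooling a 4 x 4 input to 2 x 2 and, after the rest of the network, max unpooling each value back to the position of the corresponding max]
Corresponding pairs of downsampling and upsampling layers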
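The three upsampling schemes in the figure can be sketched in plain Python:

- Repeating each value over a block, commonly called nearest-neighbor unpooling.
- Placing each value in one corner with zeros elsewhere, often called "bed of nails".
- Max unpooling, which reuses the positions remembered by the paired max-pooling layer.

The sketch assumes single-channel inputs stored as nested lists, a factor of 2, and input sides divisible by the pooling size. The function names are illustrative.

```python
def nearest_neighbor_unpool(x, factor=2):
    """Repeat each value over a factor x factor block."""
    return [[x[i // factor][j // factor] for j in range(len(x[0]) * factor)]
            for i in range(len(x) * factor)]


def bed_of_nails_unpool(x, factor=2):
    """Put each value in the top-left corner of its block, zeros elsewhere."""
    return [[x[i // factor][j // factor] if i % factor == 0 and j % factor == 0 else 0
             for j in range(len(x[0]) * factor)]
            for i in range(len(x) * factor)]


def max_pool_with_indices(x, size=2):
    """Non-overlapping max pooling that also records where each max came from."""
    pooled, indices = [], []
    for i in range(0, len(x), size):
        prow, irow = [], []
        for j in range(0, len(x[0]), size):
            window = [(x[a][b], (a, b)) for a in range(i, i + size) for b in range(j, j + size)]
            value, pos = max(window, key=lambda t: t[0])  # first max wins on ties
            prow.append(value)
            irow.append(pos)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(y, indices, height, width):
    """Place each value of y at the position its paired max came from; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for i, row in enumerate(y):
        for j, value in enumerate(row):
            a, b = indices[i][j]
            out[a][b] = value
    return out
```

The pooling layer's `indices` must be kept until the matching unpooling layer runs. This is why the two layers come in corresponding pairs.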
• Synonyms: often (incorrectly) called “deconvolution” (mathematically, deconvolution is defined as the inverse of convolution, which is different from transposed convolution)
• The term “unconv” is sometimes also used
• Fractionally strided convolution is another term
Transpose Convolution
• Regular convolution (Input: 4 x 4, Output: 4 x 4): each output is a dot product between the filter and the input
• Transpose convolution (Input: 2 x 2, Output: 4 x 4): each input value gives the weight for a copy of the filter
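The two operations can be contrasted in a short sketch. In a regular convolution, each output is a dot product over an input window. In a transposed convolution, each input value scales a copy of the filter, the copy is added into the output, and overlapping copies are summed. The sketch rests on these assumptions:

- Single channel, square filter, no padding.
- Output size of the transposed convolution is (n - 1) * stride + k.

The name "transpose" comes from linear algebra. If convolution is the linear map x ↦ Ax, the transposed convolution computes Aᵀy. The test checks this through the identity ⟨Ax, y⟩ = ⟨x, Aᵀy⟩.

```python
def conv2d(x, w, stride=1):
    """Valid (unpadded) 2D cross-correlation: each output is a dot product
    between the filter w and one input window."""
    k = len(w)
    out_h = (len(x) - k) // stride + 1
    out_w = (len(x[0]) - k) // stride + 1
    return [
        [sum(x[i * stride + a][j * stride + b] * w[a][b] for a in range(k) for b in range(k))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def conv_transpose2d(y, w, stride=1):
    """Transposed convolution: each input value weights a copy of the filter,
    written into the output with the given stride; overlaps are summed."""
    k = len(w)
    out_h = (len(y) - 1) * stride + k
    out_w = (len(y[0]) - 1) * stride + k
    out = [[0] * out_w for _ in range(out_h)]
    for i, row in enumerate(y):
        for j, value in enumerate(row):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += value * w[a][b]
    return out
```

With a 2 x 2 all-ones filter and stride 2, `conv_transpose2d` maps a 2 x 2 input to 4 x 4 by repeating each value over a 2 x 2 block. In a network, the filter weights would be learned instead.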
Class Scores (Fully Connected: 4096 to 1000): Cat: 0.9, Dog: 0.05, Car: 0.01, ... → Softmax Loss
Correct box: (x’, y’, w’, h’) → box regression loss
Total Loss = Softmax Loss + box regression loss
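The combined objective adds a classification term and a box-regression term. The sketch below is minimal and rests on these assumptions:

- The box loss is a squared L2 distance.
- `box_weight` is a hypothetical hyperparameter balancing the two terms.
- Inputs are plain lists of numbers.

```python
import math


def softmax_cross_entropy(scores, target):
    """Negative log-probability of the target class under a softmax over scores."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[target]


def l2_box_loss(pred_box, true_box):
    """Squared L2 distance between (x, y, w, h) and the correct box (x', y', w', h')."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))


def multitask_loss(scores, target, pred_box, true_box, box_weight=1.0):
    # box_weight is a hypothetical weighting between the two losses.
    return softmax_cross_entropy(scores, target) + box_weight * l2_box_loss(pred_box, true_box)
```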
Vector: 4096 → Head top: (x, y), … (one (x, y) per joint)
L2 loss against the correct head top: (x’, y’); the losses for all joints are added (+) to form the total Loss
Toshev and Szegedy, “DeepPose: Human Pose Estimation via Deep Neural Networks”, CVPR 2014
[Figure: No objects, just pixels | Single Object | Multiple Object. This image is CC0 public domain]
DOG: (x, y, w, h), DOG: (x, y, w, h), CAT: (x, y, w, h), DUCK: (x, y, w, h) → 16 numbers
DUCK: (x, y, w, h), DUCK: (x, y, w, h), …. → Many numbers!
May 10, 2017
Object Detection as Classification: Sliding Window
Apply a CNN to many different crops of the image; the CNN classifies each crop as object or background
• Crop 1: Dog? NO, Cat? NO, Background? YES
• Crop 2: Dog? YES, Cat? NO, Background? NO
• Crop 3: Dog? NO, Cat? YES, Background? NO
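Sliding-window detection has to choose which crops to classify, and the number of candidate crops grows very quickly. The sketch below enumerates crops of one fixed size and stride. It also counts every axis-aligned box with integer corners: that means choosing 2 of the W + 1 vertical grid lines and 2 of the H + 1 horizontal ones. The pixel-grid model of a box is an assumption made for counting.

```python
def sliding_window_crops(height, width, box_h, box_w, stride):
    """Yield (top, left, box_h, box_w) for every position of one box size."""
    for top in range(0, height - box_h + 1, stride):
        for left in range(0, width - box_w + 1, stride):
            yield top, left, box_h, box_w


def count_all_boxes(height, width):
    """Number of axis-aligned boxes with integer corners in an H x W image."""
    return (width * (width + 1) // 2) * (height * (height + 1) // 2)
```

For an 800 x 600 image, `count_all_boxes(600, 800)` is 57,768,120,000. Running a CNN on every location, scale, and aspect ratio is therefore impractical.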
Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014
He et al, “Spatial pyramid pooling in deep convolutional networks for visual recognition”, ECCV 2014
Girshick, “Fast R-CNN”, ICCV 2015
Problem: Runtime dominated by region proposals!
Johnson, Karpathy, and Fei-Fei, “DenseCap: Fully Convolutional Localization Networks for Dense Captioning”, CVPR 2016. Figure copyright IEEE, 2016. Reproduced for educational purposes.
Classification Scores: C
Box coordinates (per class): 4 * C
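With C class scores and 4 * C box numbers, the detector keeps a separate box for every class. The sketch below reads out the box belonging to the winning class and rests on these assumptions:

- The 4 * C numbers are laid out as [x, y, w, h] for class 0, then class 1, and so on.
- They are treated as final coordinates. In R-CNN-family detectors they are usually offsets relative to a region proposal.

```python
def select_detection(class_scores, box_coords):
    """Return (class index, (x, y, w, h)) for the top-scoring class.

    class_scores: C scores.
    box_coords: 4 * C numbers, assumed grouped per class as [x, y, w, h].
    """
    if len(box_coords) != 4 * len(class_scores):
        raise ValueError("expected 4 box numbers per class")
    c = max(range(len(class_scores)), key=lambda k: class_scores[k])
    return c, tuple(box_coords[4 * c: 4 * c + 4])
```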
COCO dataset: https://cocodataset.org/#home
Detectron 2
https://github.com/facebookresearch/detectron2
• https://github.com/tensorflow/models/tree/master/research/object_detection
Explaining and Harnessing Adversarial Examples
https://arxiv.org/pdf/1412.6572.pdf
Intriguing properties of neural networks
https://arxiv.org/pdf/1312.6199.pdf
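arXiv 1412.6572, linked above, introduced the fast gradient sign method (FGSM). It perturbs an input x by ε · sign(∇ₓJ), where J is the training loss. The sketch below rests on these assumptions:

- The target is a logistic-regression classifier, chosen because its input gradient has a closed form. For a CNN, the gradient would come from backpropagation.
- The weights, input, and ε are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _sign(v):
    return (v > 0) - (v < 0)


def fgsm_logistic(x, y, w, b, epsilon):
    """One fast-gradient-sign step against a logistic-regression classifier.

    With binary cross-entropy loss J, dJ/dx = (sigmoid(w . x + b) - y) * w,
    so every input coordinate moves by epsilon in the sign of that gradient.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * _sign(gi) for xi, gi in zip(x, grad)]
```

The per-coordinate change is only ±ε. However, all coordinates push the score in the same harmful direction, so the total effect on w · x grows with the input dimension.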
How to Hack Artificial Intelligence
https://medium.com/xix-ai/how-adversarial-attacks-work-87495b81da2d
http://databasecultures.irmielin.org/how-to-hack-artificial-intelligence/
Fooling Neural Networks in the Physical World with 3D Adversarial Objects
https://www.labsix.org/physical-objects-that-fool-neural-nets/
Countermeasures for adversarial examples
• Reactive strategy: detect adversarial examples after the deep neural network has been built. Example: MagNet: a Two-Pronged Defense against Adversarial Examples
• https://arxiv.org/pdf/1705.09064.pdf
• https://www.youtube.com/watch?v=XL07WEc2TRI
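One of MagNet's detectors flags inputs that an autoencoder, trained only on clean data, reconstructs poorly. MagNet also uses a "reformer" and a probability-divergence detector, which are not shown here. The sketch below rests on these assumptions:

- `reconstruct` is a hypothetical stand-in for the trained autoencoder.
- Inputs are flat lists of numbers.
- The error is the mean absolute difference.
- The threshold is picked from clean validation data at a target false-positive rate.

```python
def reconstruction_error(x, reconstruct):
    """Mean absolute difference between an input and its reconstruction."""
    x_hat = reconstruct(x)
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(clean_inputs, reconstruct, false_positive_rate=0.01):
    """Pick a threshold that flags at most about this fraction of clean inputs."""
    errors = sorted(reconstruction_error(x, reconstruct) for x in clean_inputs)
    index = min(len(errors) - 1, int(len(errors) * (1 - false_positive_rate)))
    return errors[index]


def is_adversarial(x, reconstruct, threshold):
    """Reactive check run in front of an already-trained classifier."""
    return reconstruction_error(x, reconstruct) > threshold
```

The classifier itself is left unchanged. Flagged inputs are simply rejected before they reach it, which is what makes the strategy reactive.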