Dlai DL CNN
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
Computer vision
deeplearning.ai
Computer Vision Problems
Image Classification: Cat? (0/1), 64x64 image
Object detection
Neural Style Transfer
Andrew Ng
Deep Learning on large images
Cat? (0/1) from a 64x64 image: inputs $x_1, x_2, \dots, x_n$ feed a fully connected network that outputs $\hat{y}$
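A 64x64x3 image already gives 12,288 input features. The sketch below shows why fully connected layers do not scale to larger images. It counts the weights in a first fully connected layer, assuming a hypothetical 1,000 hidden units and a hypothetical 1000x1000 RGB image for comparison:

```python
def fc_first_layer_size(height, width, channels, hidden_units):
    """Return (n_x, number of weights in W[1]) for a fully connected first layer."""
    n_x = height * width * channels   # flattened input features
    return n_x, hidden_units * n_x    # W[1] has shape (hidden_units, n_x)


small = fc_first_layer_size(64, 64, 3, 1000)      # the 64x64 cat image
large = fc_first_layer_size(1000, 1000, 3, 1000)  # hypothetical 1000x1000 image
print(small)  # (12288, 12288000)
print(large)  # (3000000, 3000000000)
```

With 3 billion weights in a single layer, both overfitting and memory become serious problems. This is the motivation for convolutions.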
Convolutional
Neural Networks
Edge detection example
Computer Vision Problem
vertical edges
Vertical edge detection
$$
\begin{bmatrix}10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\end{bmatrix}
\ast
\begin{bmatrix}1&0&-1\\1&0&-1\\1&0&-1\end{bmatrix}
=
\begin{bmatrix}0&30&30&0\\0&30&30&0\\0&30&30&0\\0&30&30&0\end{bmatrix}
$$
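Here is a minimal pure-Python sketch of the operation above. It follows the deep-learning convention: no kernel flip, no padding, stride 1. `conv2d_valid` is a hypothetical helper name, not a library API:

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution as used in CNNs (no kernel flip, no padding, stride 1)."""
    n_h, n_w = len(image), len(image[0])
    f_h, f_w = len(kernel), len(kernel[0])
    out = []
    for i in range(n_h - f_h + 1):
        row = []
        for j in range(n_w - f_w + 1):
            # Multiply the filter element-wise with the patch under it and sum
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(f_h) for v in range(f_w)))
        out.append(row)
    return out


image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]   # bright left, dark right
vertical = [[1, 0, -1] for _ in range(3)]          # vertical edge detector
for row in conv2d_valid(image, vertical):
    print(row)  # [0, 30, 30, 0] on every row
```

The bright band of 30s in the 4x4 output marks where the vertical edge sits in the 6x6 input.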
More edge detection
Vertical edge detection examples
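Light to dark (positive output):
$$
\begin{bmatrix}10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\end{bmatrix}
\ast
\begin{bmatrix}1&0&-1\\1&0&-1\\1&0&-1\end{bmatrix}
=
\begin{bmatrix}0&30&30&0\\0&30&30&0\\0&30&30&0\\0&30&30&0\end{bmatrix}
$$
Dark to light (same filter, negative output):
$$
\begin{bmatrix}0&0&0&10&10&10\\0&0&0&10&10&10\\0&0&0&10&10&10\\0&0&0&10&10&10\\0&0&0&10&10&10\\0&0&0&10&10&10\end{bmatrix}
\ast
\begin{bmatrix}1&0&-1\\1&0&-1\\1&0&-1\end{bmatrix}
=
\begin{bmatrix}0&-30&-30&0\\0&-30&-30&0\\0&-30&-30&0\\0&-30&-30&0\end{bmatrix}
$$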
Vertical and Horizontal Edge Detection
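$$
\text{Vertical: }\begin{bmatrix}1&0&-1\\1&0&-1\\1&0&-1\end{bmatrix}
\qquad
\text{Horizontal: }\begin{bmatrix}1&1&1\\0&0&0\\-1&-1&-1\end{bmatrix}
$$
$$
\begin{bmatrix}10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\0&0&0&10&10&10\\0&0&0&10&10&10\\0&0&0&10&10&10\end{bmatrix}
\ast
\begin{bmatrix}1&1&1\\0&0&0\\-1&-1&-1\end{bmatrix}
=
\begin{bmatrix}0&0&0&0\\30&10&-10&-30\\30&10&-10&-30\\0&0&0&0\end{bmatrix}
$$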
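The horizontal-edge result above can be reproduced with the same hypothetical `conv2d_valid` helper, redefined here in compact form so the snippet runs on its own. The input is bright in the top-left and bottom-right regions:

```python
def conv2d_valid(image, kernel):
    """'Valid' cross-correlation, stride 1, no padding."""
    f_h, f_w = len(kernel), len(kernel[0])
    return [[sum(image[i + u][j + v] * kernel[u][v] for u in range(f_h) for v in range(f_w))
             for j in range(len(image[0]) - f_w + 1)]
            for i in range(len(image) - f_h + 1)]


image = [[10, 10, 10, 0, 0, 0]] * 3 + [[0, 0, 0, 10, 10, 10]] * 3
horizontal = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]
for row in conv2d_valid(image, horizontal):
    print(row)
# [0, 0, 0, 0]
# [30, 10, -10, -30]
# [30, 10, -10, -30]
# [0, 0, 0, 0]
```

The intermediate values 10 and -10 appear because those 3x3 windows straddle both a positive and a negative edge.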
Learning to detect edges
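$$
\begin{bmatrix}1&0&-1\\1&0&-1\\1&0&-1\end{bmatrix}
\quad\longrightarrow\quad
\begin{bmatrix}w_1&w_2&w_3\\w_4&w_5&w_6\\w_7&w_8&w_9\end{bmatrix}
$$
Rather than hand-picking the filter values, treat the nine entries $w_1, \dots, w_9$ as parameters and learn them with backpropagation. The filter is then convolved with the image as before:
$$
\begin{bmatrix}3&0&1&2&7&4\\1&5&8&9&3&1\\2&7&2&5&1&3\\0&1&3&1&7&8\\4&2&1&6&2&8\\2&4&5&2&3&9\end{bmatrix}
\ast
\begin{bmatrix}w_1&w_2&w_3\\w_4&w_5&w_6\\w_7&w_8&w_9\end{bmatrix}
$$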
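This sketch illustrates that the filter entries are just numbers to be learned. `filter_from_params` is a hypothetical helper that arranges $w_1, \dots, w_9$ row by row. Here the parameters are set to the hand-designed vertical filter and applied to the 6x6 image above; in a trained network they would come from gradient descent instead:

```python
def conv2d_valid(image, kernel):
    """'Valid' cross-correlation, stride 1, no padding."""
    f = len(kernel)
    return [[sum(image[i + u][j + v] * kernel[u][v] for u in range(f) for v in range(f))
             for j in range(len(image[0]) - f + 1)]
            for i in range(len(image) - f + 1)]


def filter_from_params(w):
    """Arrange nine parameters w1..w9 (row by row) into a 3x3 filter."""
    if len(w) != 9:
        raise ValueError("expected 9 parameters")
    return [list(w[0:3]), list(w[3:6]), list(w[6:9])]


image = [
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
]
w = [1, 0, -1, 1, 0, -1, 1, 0, -1]   # one possible setting: the vertical edge filter
for row in conv2d_valid(image, filter_from_params(w)):
    print(row)
# [-5, -4, 0, 8]
# [-10, -2, 2, 3]
# [0, -2, -4, -7]
# [-3, -2, -3, -16]
```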
Padding
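Valid and Same convolutions
“Valid”: no padding. An n × n image convolved with an f × f filter gives an (n - f + 1) × (n - f + 1) output.
“Same”: pad so that the output size is the same as the input size. With stride 1 this requires p = (f - 1)/2.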
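Here is a sketch of zero padding and the resulting output size for stride 1. The helper names are illustrative assumptions, and `same_padding` assumes an odd filter size f:

```python
def zero_pad(image, p):
    """Surround a 2D image with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded


def same_padding(f):
    """Padding that keeps output size equal to input size (stride 1, odd f)."""
    return (f - 1) // 2


def output_size(n, f, p):
    """Output size for an n x n input, f x f filter, padding p, stride 1."""
    return n + 2 * p - f + 1


print(output_size(6, 3, 0))                # valid: 4
print(output_size(6, 3, same_padding(3)))  # same: 6
print(zero_pad([[1, 2], [3, 4]], 1))       # [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
```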
Strided convolutions
Strided convolution
$$
\begin{bmatrix}2&3&7&4&6&2&9\\6&6&9&8&7&4&3\\3&4&8&3&8&9&7\\7&8&3&6&6&3&4\\4&2&1&8&3&4&6\\3&2&4&1&9&8&3\\0&1&3&9&2&1&4\end{bmatrix}
\ast
\begin{bmatrix}3&4&4\\1&0&2\\-1&0&3\end{bmatrix}
= (3 \times 3 \text{ output}), \qquad \text{stride } s = 2
$$
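Summary of convolutions
n × n image, f × f filter, padding p, stride s:
$$
\left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor
$$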
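The summary formula and the strided example can both be checked with a small sketch. It assumes no padding in the strided convolution and uses integer floor division for ⌊·⌋, which is valid here because n + 2p - f is non-negative. The helper names are hypothetical:

```python
def conv_output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1 for one spatial dimension."""
    return (n + 2 * p - f) // s + 1


def conv2d_strided(image, kernel, s):
    """Cross-correlation with stride s and no padding (square image and kernel)."""
    n, f = len(image), len(kernel)
    m = conv_output_size(n, f, 0, s)
    return [[sum(image[i * s + u][j * s + v] * kernel[u][v]
                 for u in range(f) for v in range(f))
             for j in range(m)]
            for i in range(m)]


image = [
    [2, 3, 7, 4, 6, 2, 9],
    [6, 6, 9, 8, 7, 4, 3],
    [3, 4, 8, 3, 8, 9, 7],
    [7, 8, 3, 6, 6, 3, 4],
    [4, 2, 1, 8, 3, 4, 6],
    [3, 2, 4, 1, 9, 8, 3],
    [0, 1, 3, 9, 2, 1, 4],
]
kernel = [[3, 4, 4], [1, 0, 2], [-1, 0, 3]]
print(conv_output_size(7, 3, p=0, s=2))  # 3
for row in conv2d_strided(image, kernel, s=2):
    print(row)
# [91, 100, 88]
# [69, 91, 117]
# [44, 72, 74]
```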
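Technical note on cross-correlation vs. convolution
Convolution in math textbook: first flip the filter horizontally and vertically, then multiply element-wise and sum:
$$
\begin{bmatrix}2&3&7&4&6&2\\6&6&9&8&7&4\\3&4&8&3&8&9\\7&8&3&6&6&3\\4&2&1&8&3&4\\3&2&4&1&9&8\end{bmatrix}
\ast
\begin{bmatrix}3&4&5\\1&0&2\\-1&9&7\end{bmatrix}
$$
The operation deep learning calls "convolution" skips the flip, so it is technically cross-correlation.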
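This sketch shows the difference concretely. Textbook convolution flips the kernel both ways and then performs the same multiply-and-sum that deep-learning libraries call convolution. The helper names are hypothetical. The example checks the flipped kernel and the top-left output value for the 6x6 image above:

```python
def flip(kernel):
    """Mirror a kernel horizontally and vertically (a 180-degree rotation)."""
    return [row[::-1] for row in kernel[::-1]]


def cross_correlate(image, kernel):
    """What deep-learning libraries call convolution: no flip, stride 1, no padding."""
    f = len(kernel)
    m = len(image) - f + 1
    return [[sum(image[i + u][j + v] * kernel[u][v] for u in range(f) for v in range(f))
             for j in range(m)] for i in range(m)]


def convolve_textbook(image, kernel):
    """Mathematical convolution: flip the kernel, then cross-correlate."""
    return cross_correlate(image, flip(kernel))


image = [
    [2, 3, 7, 4, 6, 2],
    [6, 6, 9, 8, 7, 4],
    [3, 4, 8, 3, 8, 9],
    [7, 8, 3, 6, 6, 3],
    [4, 2, 1, 8, 3, 4],
    [3, 2, 4, 1, 9, 8],
]
kernel = [[3, 4, 5], [1, 0, 2], [-1, 9, 7]]
print(flip(kernel))                            # [[7, 9, -1], [2, 0, 1], [5, 4, 3]]
print(convolve_textbook(image, kernel)[0][0])  # 110
```

Because the filter values are learned anyway, skipping the flip makes no practical difference in a CNN.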
Convolutions over volumes
Convolutions on RGB images
6x6x3 image ∗ 3x3x3 filter = 4x4 output
Multiple filters
6x6x3 ∗ 3x3x3 (filter 1) = 4x4
6x6x3 ∗ 3x3x3 (filter 2) = 4x4
Stacking the two 4x4 outputs gives a 4x4x2 volume.
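Here is a sketch of convolving a volume with several filters. Each f x f x n_c filter sums over all channels to produce one 2D map, and the maps are stacked as output channels. The all-ones image and the red-channel-only filter are hypothetical inputs chosen so the numbers are easy to check:

```python
def conv_volume(volume, filt):
    """Valid convolution of an H x W x C volume with an f x f x C filter -> 2D map."""
    n_h, n_w, n_c = len(volume), len(volume[0]), len(volume[0][0])
    f = len(filt)
    return [[sum(volume[i + u][j + v][c] * filt[u][v][c]
                 for u in range(f) for v in range(f) for c in range(n_c))
             for j in range(n_w - f + 1)]
            for i in range(n_h - f + 1)]


def conv_multi(volume, filters):
    """Apply each filter and stack the 2D maps along a new last (channel) axis."""
    maps = [conv_volume(volume, filt) for filt in filters]
    h, w = len(maps[0]), len(maps[0][0])
    return [[[m[i][j] for m in maps] for j in range(w)] for i in range(h)]


image = [[[1, 1, 1] for _ in range(6)] for _ in range(6)]        # 6x6x3 of ones
ones_filter = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]  # 3x3x3
red_only = [[[1, 0, 0] for _ in range(3)] for _ in range(3)]     # looks at channel 0 only
out = conv_multi(image, [ones_filter, red_only])
print(len(out), len(out[0]), len(out[0][0]))  # 4 4 2
print(out[0][0])                              # [27, 9]
```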
One layer of a convolutional network
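Example of a layer
6x6x3 ∗ 3x3x3 (filter 1) → 4x4, then ReLU(4x4 + b_1)
6x6x3 ∗ 3x3x3 (filter 2) → 4x4, then ReLU(4x4 + b_2)
Stacking both results gives the layer's 4x4x2 output.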
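This is a minimal sketch of the forward pass of one convolutional layer. It assumes ReLU as the nonlinearity and one scalar bias per filter, and the inputs are hypothetical:

```python
def relu(x):
    return max(0, x)


def conv_layer_forward(volume, filters, biases):
    """For each filter: a = ReLU(conv(volume, W) + b); results stacked along channels."""
    n_h, n_w, n_c = len(volume), len(volume[0]), len(volume[0][0])
    f = len(filters[0])
    out = [[[] for _ in range(n_w - f + 1)] for _ in range(n_h - f + 1)]
    for filt, b in zip(filters, biases):
        for i in range(n_h - f + 1):
            for j in range(n_w - f + 1):
                z = sum(volume[i + u][j + v][c] * filt[u][v][c]
                        for u in range(f) for v in range(f) for c in range(n_c)) + b
                out[i][j].append(relu(z))
    return out


volume = [[[1, 1, 1] for _ in range(6)] for _ in range(6)]   # 6x6x3 input
ones = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]     # 3x3x3 filter
a = conv_layer_forward(volume, [ones, ones], biases=[-30, 1])
print(a[0][0])  # [0, 28]: ReLU(27 - 30) = 0 and ReLU(27 + 1) = 28
```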
Number of parameters in one layer
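Here is a quick parameter count for a hypothetical layer with 10 filters of size 3x3x3: each filter has 27 weights plus one bias. The count does not depend on the size of the input image:

```python
def conv_layer_params(f, n_c_prev, n_filters):
    """Each filter has f*f*n_c_prev weights plus one bias."""
    return (f * f * n_c_prev + 1) * n_filters


print(conv_layer_params(3, 3, 10))  # 280
```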
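Summary of notation
If layer l is a convolution layer:
$f^{[l]}$ = filter size
$p^{[l]}$ = padding
$s^{[l]}$ = stride
$n_c^{[l]}$ = number of filters
Input: $n_H^{[l-1]} \times n_W^{[l-1]} \times n_c^{[l-1]}$
Output: $n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$, where $n_H^{[l]} = \left\lfloor \frac{n_H^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \right\rfloor$ (and likewise for $n_W^{[l]}$)
Each filter is: $f^{[l]} \times f^{[l]} \times n_c^{[l-1]}$
Activations: $a^{[l]} \to n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$; for m examples, $A^{[l]} \to m \times n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$
Weights: $f^{[l]} \times f^{[l]} \times n_c^{[l-1]} \times n_c^{[l]}$
bias: $n_c^{[l]}$ (one per filter)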
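The shape bookkeeping in the notation summary can be sketched as a small function. The dictionary keys and the (1, 1, 1, n_c) bias shape are illustrative conventions, not a fixed API:

```python
def conv_layer_shapes(n_h_prev, n_w_prev, n_c_prev, f, p, s, n_c):
    """Shapes for one conv layer, using floor((n + 2p - f) / s) + 1 per spatial axis."""
    n_h = (n_h_prev + 2 * p - f) // s + 1
    n_w = (n_w_prev + 2 * p - f) // s + 1
    return {
        "input": (n_h_prev, n_w_prev, n_c_prev),
        "filter": (f, f, n_c_prev),
        "weights": (f, f, n_c_prev, n_c),
        "bias": (1, 1, 1, n_c),
        "output": (n_h, n_w, n_c),
    }


shapes = conv_layer_shapes(6, 6, 3, f=3, p=0, s=1, n_c=2)
print(shapes["output"], shapes["weights"])  # (4, 4, 2) (3, 3, 3, 2)
```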
A simple convolution network example
Example ConvNet
Types of layer in a convolutional network:
- Convolution
- Pooling
- Fully connected
Pooling layers
Pooling layer: Max pooling
1 3 2 1
2 9 1 1
1 3 2 3
5 6 1 2
Pooling layer: Max pooling
1 3 2 1 3
2 9 1 1 5
1 3 2 3 2
8 3 5 1 0
5 6 1 2 9
Pooling layer: Average pooling
1 3 2 1
2 9 1 1
1 4 2 3
5 6 1 2
Summary of pooling
Hyperparameters:
f : filter size
s : stride
Max or average pooling
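Here is a sketch of max and average pooling on the three examples above. Pooling has hyperparameters (f, s, and the choice of max or average) but no parameters to learn. This version assumes no padding:

```python
def pool2d(x, f, s, mode="max"):
    """Slide an f x f window with stride s; take the max or the average of each window."""
    out_h = (len(x) - f) // s + 1
    out_w = (len(x[0]) - f) // s + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [x[i * s + u][j * s + v] for u in range(f) for v in range(f)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


x4 = [[1, 3, 2, 1], [2, 9, 1, 1], [1, 3, 2, 3], [5, 6, 1, 2]]
x5 = [[1, 3, 2, 1, 3], [2, 9, 1, 1, 5], [1, 3, 2, 3, 2], [8, 3, 5, 1, 0], [5, 6, 1, 2, 9]]
xa = [[1, 3, 2, 1], [2, 9, 1, 1], [1, 4, 2, 3], [5, 6, 1, 2]]
print(pool2d(x4, f=2, s=2))                  # [[9, 2], [6, 3]]
print(pool2d(x5, f=3, s=1))                  # [[9, 9, 5], [9, 9, 5], [8, 6, 9]]
print(pool2d(xa, f=2, s=2, mode="average"))  # [[3.75, 1.25], [4.0, 2.0]]
```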
Convolutional neural network example
Neural network example
# parameters by layer: 608, 3216, 48120, 10164, 850
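The parameter counts listed above can be reproduced under an assumed set of layer sizes: a LeNet-5-like network on a 32x32x3 input. These sizes are an assumption and do not appear in the text. Pooling layers contribute no parameters:

```python
def conv_params(f, n_c_prev, n_filters):
    """(f*f*n_c_prev weights + 1 bias) per filter."""
    return (f * f * n_c_prev + 1) * n_filters


def fc_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


# Assumed layer sizes (LeNet-5-like, 32x32x3 input)
params = [
    conv_params(5, 3, 8),        # CONV1: eight 5x5 filters -> 28x28x8
    conv_params(5, 8, 16),       # CONV2: sixteen 5x5 filters -> 10x10x16
    fc_params(5 * 5 * 16, 120),  # FC3: flattened POOL2 output (400) -> 120
    fc_params(120, 84),          # FC4: 120 -> 84
    fc_params(84, 10),           # softmax layer: 84 -> 10
]
print(params)  # [608, 3216, 48120, 10164, 850]
```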
Why convolutions?
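Why convolutions
$$
\begin{bmatrix}10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\\10&10&10&0&0&0\end{bmatrix}
\ast
\begin{bmatrix}1&0&-1\\1&0&-1\\1&0&-1\end{bmatrix}
=
\begin{bmatrix}0&30&30&0\\0&30&30&0\\0&30&30&0\\0&30&30&0\end{bmatrix}
$$
$$
\text{Cost } J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right)
$$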
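This sketch makes the parameter-sharing argument concrete. It compares a fully connected mapping with a convolutional layer for a hypothetical 32x32x3 input and a 28x28x6 output produced by six 5x5x3 filters. Bias terms are counted only for the convolution:

```python
def fc_weights(in_shape, out_shape):
    """Weights needed to connect every input unit to every output unit."""
    n_in = in_shape[0] * in_shape[1] * in_shape[2]
    n_out = out_shape[0] * out_shape[1] * out_shape[2]
    return n_in * n_out


def conv_params(f, n_c_prev, n_filters):
    """Shared filter weights plus one bias per filter."""
    return (f * f * n_c_prev + 1) * n_filters


fc = fc_weights((32, 32, 3), (28, 28, 6))
conv = conv_params(5, 3, 6)
print(fc, conv)  # 14450688 456
```

The convolution needs far fewer parameters for two reasons. Each filter is reused at every position (parameter sharing), and each output depends on only a 5x5x3 patch (sparsity of connections).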
Copyright Notice
These slides are distributed under the Creative Commons License.
Why look at case studies?
Outline
Classic networks:
• LeNet-5
• AlexNet
• VGG
ResNet
Inception
Case Studies
Classic networks
LeNet-5
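LeNet-5: CONV 5×5, s=1 → avg pool f=2, s=2 → CONV 5×5, s=1 → avg pool f=2, s=2 → fully connected layers → ŷ
AlexNet (later layers): MAX-POOL 3×3, s=2 → CONV 3×3, same → 13×13×384 → CONV 3×3, same → 13×13×384 → CONV 3×3, same → 13×13×256 → MAX-POOL 3×3, s=2 → 6×6×256 → flatten to 9216 → FC 4096 → FC 4096 → Softmax 1000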
[Krizhevsky et al., 2012. ImageNet classification with deep convolutional neural networks]
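Here is a quick check of the AlexNet shapes above using the output-size formula. It only verifies the arithmetic of the listed sizes:

```python
def out_size(n, f, s=1, p=0):
    """floor((n + 2p - f) / s) + 1"""
    return (n + 2 * p - f) // s + 1


same_3x3 = out_size(13, 3, s=1, p=1)  # a 3x3 "same" conv keeps 13
pooled = out_size(13, 3, s=2)         # MAX-POOL 3x3, s=2
flat = pooled * pooled * 256          # 6x6x256 flattened
print(same_3x3, pooled, flat)         # 13 6 9216
```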
VGG-16
CONV = 3×3 filter, s = 1, same; MAX-POOL = 2×2, s = 2
Input: 224×224×3
[Simonyan & Zisserman 2015. Very deep convolutional networks for large-scale image recognition]
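Here is a sketch of how VGG-16's spatial size evolves. It assumes the usual arrangement of five blocks of 3×3 same convolutions, each followed by a 2×2, s = 2 max pool. The block count is an assumption; the text gives only the layer settings:

```python
def same_conv(n, f=3, s=1):
    """A 3x3, s=1, 'same' convolution leaves the spatial size unchanged."""
    p = (f - 1) // 2
    return (n + 2 * p - f) // s + 1


def max_pool(n, f=2, s=2):
    return (n - f) // s + 1


n = 224
sizes = [n]
for _ in range(5):    # five conv blocks (assumed), each ending in a 2x2, s=2 max pool
    n = same_conv(n)  # the block's 3x3 same convs keep the size
    n = max_pool(n)   # halve height and width
    sizes.append(n)
print(sizes)  # [224, 112, 56, 28, 14, 7]
```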
Case Studies
Residual Networks (ResNets)
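Residual block
Main path $a^{[l]} \to a^{[l+1]} \to a^{[l+2]}$:
$$
z^{[l+1]} = W^{[l+1]} a^{[l]} + b^{[l+1]}, \quad a^{[l+1]} = g\left(z^{[l+1]}\right), \quad z^{[l+2]} = W^{[l+2]} a^{[l+1]} + b^{[l+2]}, \quad a^{[l+2]} = g\left(z^{[l+2]}\right)
$$
With the shortcut (skip connection), $a^{[l]}$ is added before the second nonlinearity:
$$
a^{[l+2]} = g\left(z^{[l+2]} + a^{[l]}\right)
$$
[He et al., 2015. Deep residual learning for image recognition]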
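Here is a minimal sketch of the residual block on plain vectors. Fully connected weights stand in for convolutions and g is ReLU; both are simplifying assumptions. The example shows why the block can easily learn the identity: with $W^{[l+2]} = 0$ and $b^{[l+2]} = 0$, $a^{[l+2]} = g(a^{[l]}) = a^{[l]}$ for non-negative $a^{[l]}$:

```python
def relu_vec(z):
    return [max(0.0, v) for v in z]


def linear(W, a, b):
    """z = W a + b for nested-list W and vectors a, b."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i for row, b_i in zip(W, b)]


def residual_block(a_l, W1, b1, W2, b2):
    a_l1 = relu_vec(linear(W1, a_l, b1))                 # a[l+1] = g(z[l+1])
    z_l2 = linear(W2, a_l1, b2)                          # z[l+2]
    return relu_vec([z + a for z, a in zip(z_l2, a_l)])  # a[l+2] = g(z[l+2] + a[l])


a_l = [1.0, 2.0]
I = [[1, 0], [0, 1]]
Z = [[0, 0], [0, 0]]
print(residual_block(a_l, I, [0, 0], Z, [0, 0]))  # [1.0, 2.0]  (identity)
print(residual_block(a_l, I, [0, 0], I, [0, 0]))  # [2.0, 4.0]
```

The shortcut requires $a^{[l]}$ and $z^{[l+2]}$ to have the same dimension, which is why ResNets commonly use "same" convolutions.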
Residual Network
[Figure: a network from input x through activations $a^{[l]}$ built from residual blocks, with plots of training error vs. # layers for a plain network and for a ResNet]
[He et al., 2015. Deep residual learning for image recognition]
Case Studies
Why ResNets work
Why do residual networks work?
[Figure: plain network vs. ResNet]
[He et al., 2015. Deep residual learning for image recognition]
Case Studies
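Network in Network and 1×1 convolutions
What does a 1 × 1 convolution do?
$$
\underbrace{\begin{bmatrix}1&2&3&6&5&8\\3&5&5&1&3&4\\2&1&3&4&9&3\\4&7&8&5&7&9\\1&5&3&7&4&8\\5&4&9&8&3&5\end{bmatrix}}_{6 \times 6}
\ast
\begin{bmatrix}2\end{bmatrix}
= 2 \times (\text{input})
$$
On a single channel it only rescales the input. On a volume it takes a weighted sum across channels at each position:
$$
6 \times 6 \times 32 \ast 1 \times 1 \times 32 = 6 \times 6 \times \#\text{filters}
$$
[Lin et al., 2013. Network in network]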
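Here is a sketch of a 1 × 1 convolution. At each position it takes a weighted sum across the channels for each filter, followed here by ReLU (an assumption). In the single-channel case it just scales the input by 2. The 32-channel case shows that the output depth equals the number of filters:

```python
def conv1x1(volume, filters):
    """At each (i, j): for every filter, ReLU(dot(pixel channels, filter weights))."""
    return [[[max(0, sum(px[c] * w[c] for c in range(len(px)))) for w in filters]
             for px in row]
            for row in volume]


img = [[1, 2, 3, 6, 5, 8], [3, 5, 5, 1, 3, 4], [2, 1, 3, 4, 9, 3],
       [4, 7, 8, 5, 7, 9], [1, 5, 3, 7, 4, 8], [5, 4, 9, 8, 3, 5]]
single = [[[v] for v in row] for row in img]  # 6x6x1
out1 = conv1x1(single, [[2]])
print([px[0] for px in out1[0]])              # [2, 4, 6, 12, 10, 16]

vol32 = [[[1] * 32 for _ in range(6)] for _ in range(6)]  # hypothetical 6x6x32 input
out4 = conv1x1(vol32, [[0.5] * 32 for _ in range(4)])      # 4 filters of 1x1x32
print(len(out4), len(out4[0]), len(out4[0][0]))            # 6 6 4
```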
Using 1×1 convolutions
28 × 28 × 192 → CONV 1 × 1, 32 filters, ReLU → 28 × 28 × 32
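Inception network motivation
Motivation for inception network
Input 28 × 28 × 192. Several operations are applied in parallel ("same" padding) and their outputs are stacked along the channels:
• 1×1 → 28 × 28 × 64
• 3×3 → 28 × 28 × 128
• 5×5 → 28 × 28 × 32
• MAX-POOL → 28 × 28 × 32
Computational cost example: 28 × 28 × 192 → CONV 5 × 5, same, 32 → 28 × 28 × 32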
Using 1×1 convolution
28 × 28 × 192 → CONV 1 × 1, 16 (filters of size 1 × 1 × 192) → 28 × 28 × 16 → CONV 5 × 5, 32 (filters of size 5 × 5 × 16) → 28 × 28 × 32
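The case for the 1 × 1 bottleneck rests on the number of multiplications. This sketch counts them as output size times filter size, ignoring additions and biases:

```python
def conv_multiplies(out_h, out_w, out_c, f, in_c):
    """Each of the out_h*out_w*out_c outputs needs f*f*in_c multiplications."""
    return out_h * out_w * out_c * f * f * in_c


direct = conv_multiplies(28, 28, 32, 5, 192)         # 5x5 conv straight on 28x28x192
bottleneck = (conv_multiplies(28, 28, 16, 1, 192)    # 1x1 conv down to 16 channels
              + conv_multiplies(28, 28, 32, 5, 16))  # then the 5x5 conv
print(direct, bottleneck)  # 120422400 12443648
```

The cost drops from roughly 120 million to roughly 12.4 million multiplications while producing the same 28 × 28 × 32 output shape.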
Case Studies
Inception network
Inception module
Previous activation, processed by parallel branches:
• 1×1 CONV
• 1×1 CONV → 3×3 CONV
• 1×1 CONV → 5×5 CONV
• MAXPOOL 3 × 3, s = 1, same → 1×1 CONV
Channel concat of all branch outputs
Inception network
MobileNet
Motivation for MobileNets
• Low computational cost at deployment
• Useful for mobile and embedded vision applications
• Key idea: Normal vs. depthwise-separable convolutions
[Howard et al. 2017, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications]
Normal Convolution
6x6x3 * 3x3x3 (5 filters) = 4x4x5
Depthwise Separable Convolution
Normal Convolution: a single convolution step (*)
Depthwise Separable Convolution: a Depthwise step followed by a Pointwise step (* then *)
Depthwise Convolution
6x6x3 * 3x3 (one 3x3 filter per input channel) = 4x4x3
Depthwise Separable Convolution
Depthwise Convolution, followed by Pointwise Convolution
Pointwise Convolution
4x4x3 * 1x1x3 (5 filters) = 4x4x5
Cost Summary
Cost of normal convolution
[Howard et al. 2017, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications]
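This sketch of the cost comparison counts multiplications as (filter parameters) × (filter positions) × (number of filters). It uses the 6x6x3 input, 3x3 filters, 4x4 output and 5 output channels from the slides above. The ratio works out to 1/n_c' + 1/f²:

```python
from fractions import Fraction


def normal_cost(n_out, f, n_c, n_c_out):
    """(f*f*n_c params per filter) x (n_out^2 positions) x (n_c_out filters)."""
    return (f * f * n_c) * (n_out * n_out) * n_c_out


def depthwise_separable_cost(n_out, f, n_c, n_c_out):
    depthwise = (f * f) * (n_out * n_out) * n_c   # one f x f filter per input channel
    pointwise = n_c * (n_out * n_out) * n_c_out   # n_c_out filters of size 1 x 1 x n_c
    return depthwise, pointwise


normal = normal_cost(4, 3, 3, 5)
dw, pw = depthwise_separable_cost(4, 3, 3, 5)
ratio = Fraction(dw + pw, normal)
print(normal, dw, pw, dw + pw, ratio)  # 2160 432 240 672 14/45
```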
MobileNet Architecture
MobileNet
MobileNet v1
MobileNet v2
Residual Connection
[Sandler et al. 2019, MobileNetV2: Inverted Residuals and Linear Bottlenecks]
MobileNet v2 Bottleneck
Residual Connection
[Sandler et al. 2019, MobileNetV2: Inverted Residuals and Linear Bottlenecks]
MobileNet v2 Full Architecture
[Sandler et al. 2019, MobileNetV2: Inverted Residuals and Linear Bottlenecks]
EfficientNet
Baseline network → ŷ; ways to scale it up: Wider, Deeper, Higher Resolution, or Compound scaling (all three together)
[Tan and Le, 2019, EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks]
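Here is a sketch of compound scaling. Depth, width and resolution are each multiplied by a constant raised to a single coefficient φ. The constants below are the ones reported in the EfficientNet paper, and the base values are hypothetical; treat both as assumptions:

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases (assumed, from the paper)


def compound_scale(phi, base_depth, base_width, base_resolution):
    """Scale all three dimensions together with one coefficient phi."""
    return (base_depth * ALPHA ** phi,
            base_width * BETA ** phi,
            base_resolution * GAMMA ** phi)


# FLOPs grow roughly with depth * width^2 * resolution^2, i.e. by about this factor per unit of phi
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2
print(round(flops_factor, 2))            # 1.92
print(compound_scale(0, 10, 32, 224))    # (10.0, 32.0, 224.0): phi = 0 is the baseline
```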
Practical advice for using ConvNets
Transfer Learning
Practical advice for using ConvNets
Data augmentation
Common augmentation methods
Mirroring
Color shifting: add per-channel (R, G, B) offsets, e.g. (+20, -20, +20), (-20, +20, +20), (+5, 0, +50)
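Here is a sketch of the two augmentations on a tiny RGB image stored as nested lists. Clipping to the range 0..255 is an assumption about how out-of-range values are handled:

```python
def mirror(image):
    """Flip an H x W x 3 image left-to-right."""
    return [row[::-1] for row in image]


def color_shift(image, offsets):
    """Add (dR, dG, dB) to every pixel, clipping each value to 0..255."""
    return [[[min(255, max(0, c + d)) for c, d in zip(px, offsets)] for px in row]
            for row in image]


image = [[[100, 150, 200], [10, 20, 250]]]  # a hypothetical 1x2 RGB image
print(mirror(image))                        # [[[10, 20, 250], [100, 150, 200]]]
print(color_shift(image, (20, -20, 20)))    # [[[120, 130, 220], [30, 0, 255]]]
```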
Implementing distortions during training
Practical advice for using ConvNets
The state of computer vision
Data vs. hand-engineering
Use open source code
Object localization
What are localization and detection?
Image classification | Classification with localization | Detection
Classification with localization
1- pedestrian
2- car
3- motorcycle
4- background
Defining the target label y
Classes: 1- pedestrian, 2- car, 3- motorcycle, 4- background
Need to output $b_x, b_y, b_h, b_w$, class label (1-4)
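Here is a sketch of building the target label $y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$, together with a squared-error loss that ignores the box and class entries when there is no object. The `None` placeholders for "don't care" entries and the use of squared error for every term are illustrative assumptions:

```python
def make_target(label, box=None):
    """label in 1..4 (4 = background); box = (b_x, b_y, b_h, b_w). None = don't care."""
    if label == 4:  # background: no object, p_c = 0
        return [0] + [None] * 7
    b_x, b_y, b_h, b_w = box
    one_hot = [1 if label == k else 0 for k in (1, 2, 3)]
    return [1, b_x, b_y, b_h, b_w] + one_hot


def squared_error_loss(y_hat, y):
    """All 8 terms if an object is present (p_c = 1), otherwise only the p_c term."""
    if y[0] == 1:
        return sum((a - b) ** 2 for a, b in zip(y_hat, y))
    return (y_hat[0] - y[0]) ** 2


y_car = make_target(2, (0.5, 0.7, 0.3, 0.4))
print(y_car)           # [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
print(make_target(4))  # [0, None, None, None, None, None, None, None]
```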
Object Detection
Landmark detection
Landmark detection ConvNet: outputs $b_x, b_y, b_h, b_w$
Object Detection
Object detection
Car detection example
Training set: images x with labels y (e.g. y = 1 for an image of a car)
Sliding windows detection
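Here is a sketch of the window positions that sliding-windows detection would crop and classify one at a time. The 16x16 image, 14x14 window and stride of 2 are hypothetical values:

```python
def sliding_windows(img_h, img_w, win, stride):
    """Top-left corners of every win x win crop, moving by `stride` pixels."""
    return [(r, c)
            for r in range(0, img_h - win + 1, stride)
            for c in range(0, img_w - win + 1, stride)]


windows = sliding_windows(16, 16, 14, 2)
print(windows)  # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

Running a ConvNet separately on every crop is what makes this approach expensive, and it motivates the convolutional implementation below.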
Object Detection
Convolutional implementation of sliding windows
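Turning FC layer into convolutional layers
14 × 14 × 3 → CONV 5×5 → 10 × 10 × 16 → MAX POOL 2×2 → 5 × 5 × 16 → FC 400 → FC 400 → softmax (4) → y
Convolutional version: the same layers up to 5 × 5 × 16, then the FC layers implemented as convolutions. First, 400 filters of size 5 × 5 × 16 give 1 × 1 × 400; then 1 × 1 convolutions give 1 × 1 × 400 and 1 × 1 × 4.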
Convolution implementation of sliding windows
[Figure: the same network (CONV, MAX POOL, then FC layers as convolutions) applied to larger inputs, computing all window predictions in one pass]
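This sketch shows the shape arithmetic behind the convolutional implementation. Once the FC layers are written as convolutions, a larger input produces a grid of outputs in a single forward pass. The 16x16 and 28x28 inputs are hypothetical:

```python
def conv(n, f, s=1):
    """Valid convolution / pooling output size."""
    return (n - f) // s + 1


def output_grid(n):
    n = conv(n, 5)     # CONV 5x5, 16 filters
    n = conv(n, 2, 2)  # MAX POOL 2x2, s=2
    n = conv(n, 5)     # "FC 400" as 400 filters of 5x5x16
    n = conv(n, 1)     # "FC 400" as a 1x1 convolution
    n = conv(n, 1)     # softmax (4) as a 1x1 convolution with 4 filters
    return n


print(output_grid(14), output_grid(16), output_grid(28))  # 1 2 8
```

Each cell of the 2x2 (or 8x8) output corresponds to one 14x14 window, with the windows spaced by the max-pool stride of 2.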
Object Detection
Bounding box predictions
Output accurate bounding boxes
YOLO algorithm
Labels for training
A 100 × 100 input image is divided into a grid of cells.
For each grid cell: $y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]^T$
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection]
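Specify the bounding boxes
Box coordinates are given relative to the grid cell that contains the object's midpoint (figure on a 100 × 100 image).
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection]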
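Here is a sketch of converting an object's midpoint and size (in pixels) into grid-cell coordinates on the 100 × 100 image. $b_x, b_y$ are offsets inside the cell in [0, 1), while $b_h, b_w$ are in units of the cell size and can exceed 1. The 3 × 3 grid and the pixel inputs are assumptions:

```python
def to_grid_label(x, y, h, w, img_size=100, grid=3):
    """Return (row, col, b_x, b_y, b_h, b_w) for an object centered at pixel (x, y)."""
    gx = x * grid / img_size  # midpoint in grid-cell units
    gy = y * grid / img_size
    col = min(int(gx), grid - 1)
    row = min(int(gy), grid - 1)
    b_x, b_y = gx - col, gy - row  # position inside the cell
    b_h, b_w = h * grid / img_size, w * grid / img_size  # size relative to a cell
    return row, col, b_x, b_y, b_h, b_w


print(to_grid_label(50, 80, 20, 40))  # row 2, col 1, b_x 0.5, b_y ~0.4, b_h 0.6, b_w 1.2
```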
Object Detection
Intersection over union
Evaluating object localization
More generally, IoU is a measure of the overlap between two bounding boxes.
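Here is a sketch of IoU for axis-aligned boxes given as corner coordinates (x1, y1, x2, y2). The corner format is an assumption, since the slides describe boxes by $b_x, b_y, b_h, b_w$:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.142857
```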
Object Detection
Non-max suppression
Non-max suppression example
Non-max suppression example
[Figure: overlapping detections with $p_c$ scores 0.6, 0.8, 0.9, 0.3, 0.5]
Non-max suppression example
[Figure: overlapping detections with $p_c$ scores 0.6, 0.8, 0.9, 0.7, 0.7]
Non-max suppression algorithm
19×19 grid. Each output prediction is: $[p_c, b_x, b_y, b_h, b_w]^T$
Discard all boxes with $p_c \le 0.6$
While there are any remaining boxes:
• Pick the box with the largest $p_c$. Output that as a prediction.
• Discard any remaining box with IoU ≥ 0.5 with the box output in the previous step
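Here is a sketch of the algorithm above for a single class. It uses the slide's 0.6 and 0.5 thresholds, with boxes given as corner tuples (an assumption):

```python
def iou(a, b):
    """IoU of corner boxes (x1, y1, x2, y2)."""
    inter = max(0, min(a[2], b[2]) - max(a[0], b[0])) * max(0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def non_max_suppression(predictions, score_threshold=0.6, iou_threshold=0.5):
    """predictions: list of (p_c, box). Returns the kept predictions, best first."""
    remaining = [p for p in predictions if p[0] > score_threshold]  # discard p_c <= 0.6
    remaining.sort(key=lambda p: p[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # largest remaining p_c
        kept.append(best)
        # Discard boxes that overlap the chosen one too much
        remaining = [p for p in remaining if iou(p[1], best[1]) < iou_threshold]
    return kept


preds = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)),
         (0.7, (20, 20, 30, 30)), (0.5, (21, 21, 31, 31))]
print(non_max_suppression(preds))  # [(0.9, (0, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
```

With several classes, the same procedure would be run independently for each class.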
Object Detection
Anchor boxes
Overlapping objects:
Anchor box 1, Anchor box 2
$$
y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]^T
$$
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection]
Anchor box algorithm
Previously: each object in a training image is assigned to the grid cell that contains that object's midpoint.
With two anchor boxes: each object in a training image is assigned to the grid cell that contains the object's midpoint, and to the anchor box for that grid cell with the highest IoU.
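Here is a sketch of the anchor-box assignment step. The object's shape is compared with each anchor's shape as if the two boxes shared a center, and the anchor with the highest IoU is chosen. Both the anchor shapes and the shared-center comparison are assumptions:

```python
def shape_iou(hw_a, hw_b):
    """IoU of two boxes that share the same center, given as (height, width)."""
    inter = min(hw_a[0], hw_b[0]) * min(hw_a[1], hw_b[1])
    union = hw_a[0] * hw_a[1] + hw_b[0] * hw_b[1] - inter
    return inter / union


def assign_anchor(obj_hw, anchors):
    """Index of the anchor box whose shape has the highest IoU with the object."""
    return max(range(len(anchors)), key=lambda k: shape_iou(obj_hw, anchors[k]))


anchors = [(2.0, 1.0), (1.0, 2.0)]     # hypothetical: tall (pedestrian-like), wide (car-like)
print(assign_anchor((1.8, 0.7), anchors))  # 0: a tall object matches anchor box 1
print(assign_anchor((0.8, 2.5), anchors))  # 1: a wide object matches anchor box 2
```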
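Anchor box example
$$
y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3, \; p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]^T
$$
The first eight entries describe the object assigned to anchor box 1; the last eight describe the object assigned to anchor box 2.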
Object Detection
Putting it together: YOLO algorithm
Training (classes: 1 - pedestrian, 2 - car, 3 - motorcycle)
Grid cell with no object: $y = [0, ?, ?, ?, ?, ?, ?, ?, \; 0, ?, ?, ?, ?, ?, ?, ?]^T$
Grid cell with a car assigned to anchor box 2: $y = [0, ?, ?, ?, ?, ?, ?, ?, \; 1, b_x, b_y, b_h, b_w, 0, 1, 0]^T$
("?" means "don't care".) y is 3×3×2×8
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection]
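Here is a sketch of building the full 3×3×2×8 training target by combining the midpoint rule with the anchor assignment. The coordinate conventions and the anchor shapes are assumptions: midpoints and sizes are given as fractions of the image, and sizes are converted to grid-cell units for matching:

```python
def shape_iou(hw_a, hw_b):
    """IoU of two same-center boxes given as (height, width)."""
    inter = min(hw_a[0], hw_b[0]) * min(hw_a[1], hw_b[1])
    return inter / (hw_a[0] * hw_a[1] + hw_b[0] * hw_b[1] - inter)


def build_yolo_target(objects, anchors, grid=3, n_classes=3):
    """objects: list of (b_x, b_y, b_h, b_w, class_id), all fractions of the image.
    Returns nested lists of shape grid x grid x len(anchors) x (5 + n_classes)."""
    depth = 5 + n_classes
    y = [[[[0] + [None] * (depth - 1) for _ in anchors] for _ in range(grid)]
         for _ in range(grid)]
    for bx, by, bh, bw, cls in objects:
        col = min(int(bx * grid), grid - 1)  # cell containing the midpoint
        row = min(int(by * grid), grid - 1)
        cell_h, cell_w = bh * grid, bw * grid  # size in grid-cell units
        k = max(range(len(anchors)), key=lambda a: shape_iou((cell_h, cell_w), anchors[a]))
        one_hot = [1 if c == cls else 0 for c in range(1, n_classes + 1)]
        y[row][col][k] = [1, bx * grid - col, by * grid - row, cell_h, cell_w] + one_hot
    return y


anchors = [(2.0, 1.0), (1.0, 2.0)]        # hypothetical tall and wide anchors
car = (0.5, 0.8, 0.2, 0.5, 2)             # a wide car near the bottom middle
y = build_yolo_target([car], anchors)
print(y[2][1][1][0], y[2][1][1][5:])      # 1 [0, 1, 0]
```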
Making predictions
The network maps the input image to ŷ of shape 3×3×2×8. For each grid cell and each of the two anchor boxes, it outputs $[p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]^T$.
Outputting the non-max suppressed outputs
Object Detection
Region proposals (Optional)
Region proposal: R-CNN
[Girshick et al., 2013, Rich feature hierarchies for accurate object detection and semantic segmentation]
Faster algorithms
[Girshick et al., 2013. Rich feature hierarchies for accurate object detection and semantic segmentation]
[Girshick, 2015. Fast R-CNN]
[Ren et al., 2016. Faster R-CNN: Towards real-time object detection with region proposal networks]
Semantic segmentation with U-Net
Object Detection vs. Semantic Segmentation
Motivation for U-Net
[Novikov et al., 2017, Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs]
[Dong et al., 2017, Automatic Brain Tumor Detection and Segmentation Using U-Net Based Fully Convolutional Networks]
Per-pixel class labels
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000011111100000000000
001111111111111100000000
001111111111111111111110
001111111111111111111110
000011100000000000111000
000000000000000000000000
000000000000000000000000
000000000000000000000000
000000000000000000000000
Legend: 1. Car, 0. Not Car
Per-pixel class labels
222222222222222222222222
222222222222222222222222
222222222222222222222222
222222222222222222222222
222222222222222222222222
222222222222222222222222
222222222222222222222222
222222222222222222222222
222222211111122222222222
221111111111111122222222
221111111111111111111112
221111111111111111111112
333311133333333333111333
333333333333333333333333
333333333333333333333333
333333333333333333333333
333333333333333333333333
Legend: 1. Car, 2. Building, 3. Road
Segmentation Map
Deep Learning for Semantic Segmentation
[Figure: a network that maps an input image to a per-pixel output ŷ]
Transpose Convolution
Normal Convolution: input * filter = output (typically smaller than the input)
Transpose Convolution: input * filter = output (larger than the input)
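Transpose Convolution
$$
\underbrace{\begin{bmatrix}2&1\\3&2\end{bmatrix}}_{2 \times 2 \text{ input}}
\ast
\underbrace{\begin{bmatrix}1&2&1\\2&0&1\\0&2&1\end{bmatrix}}_{\text{weight filter}}
\longrightarrow 4 \times 4 \text{ output}
$$
Each input value multiplies the whole filter, and the scaled copies are added together in the larger output wherever they overlap.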
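Here is a sketch of the transpose convolution above as a scatter-and-add: each input value multiplies the whole filter, and the copies are summed into an output canvas. To obtain the 4x4 result, the sketch assumes padding p = 1, stride s = 2, and one extra output row and column (often called output padding). These settings are an assumption:

```python
def conv_transpose2d(x, w, stride=2, padding=1, output_padding=1):
    """Scatter x[i][j] * w onto a canvas at (stride*i, stride*j), then crop the padding."""
    n, f = len(x), len(w)
    size = (n - 1) * stride + f + output_padding  # canvas before cropping
    canvas = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            for u in range(f):
                for v in range(f):
                    canvas[i * stride + u][j * stride + v] += x[i][j] * w[u][v]
    out = size - 2 * padding  # = (n-1)*s - 2p + f + output_padding
    return [row[padding:padding + out] for row in canvas[padding:padding + out]]


x = [[2, 1], [3, 2]]
w = [[1, 2, 1], [2, 0, 1], [0, 2, 1]]
for row in conv_transpose2d(x, w):
    print(row)
# [0, 4, 0, 1]
# [10, 7, 6, 3]
# [0, 7, 0, 2]
# [6, 3, 4, 2]
```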
U-Net
Input: h × w × 3; output: h × w × #classes
Legend: Conv, RELU; Max Pool; Trans Conv; Skip Connection; Conv (1x1)
[Ronneberger et al., 2015, U-Net: Convolutional Networks for Biomedical Image Segmentation]
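Here is a sketch of the final step that turns the h × w × #classes output into a segmentation map: for each pixel, take the class with the highest score. The scores are hypothetical:

```python
def segmentation_map(scores):
    """scores: h x w x n_classes nested lists -> h x w map of argmax class indices."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in scores]


scores = [[[0.1, 0.7, 0.1, 0.1], [0.2, 0.1, 0.3, 0.4]],
          [[0.9, 0.05, 0.03, 0.02], [0.1, 0.2, 0.6, 0.1]]]  # 2 x 2 x 4
print(segmentation_map(scores))  # [[1, 3], [0, 2]]
```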
What is face recognition?
Face recognition
Recognition
• Has a database of K persons
• Get an input image
• Output ID if the image is any of the K persons (or
“not recognized”)
Face recognition
One-shot learning
Learning from one example to recognize the person again
Learning a “similarity” function
d(img1, img2) = degree of difference between images
If d(img1, img2) ≤ τ: “same”
If d(img1, img2) > τ: “different”
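Here is a sketch of verification and recognition with the similarity function. It assumes that d is the squared Euclidean distance between encodings, as in the Siamese network, and that the encodings have already been computed. The database entries and τ are hypothetical:

```python
def d(enc1, enc2):
    """Degree of difference: squared Euclidean distance between two encodings."""
    return sum((a - b) ** 2 for a, b in zip(enc1, enc2))


def verify(enc1, enc2, tau):
    return "same" if d(enc1, enc2) <= tau else "different"


def recognize(enc, database, tau):
    """database: dict name -> encoding. Closest name if within tau, else 'not recognized'."""
    name, best = min(((n, d(enc, e)) for n, e in database.items()), key=lambda t: t[1])
    return name if best <= tau else "not recognized"


db = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
print(recognize([0.1, 0.9], db, tau=0.5))  # alice
print(recognize([5.0, 5.0], db, tau=0.5))  # not recognized
```

Because new people only need a new database entry, this approach sidesteps the one-shot problem of retraining a classifier for every person.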
Face recognition
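Siamese network
Each input image $x^{(1)}$, $x^{(2)}$ is passed through the same ConvNet, producing encodings $f(x^{(1)})$ and $f(x^{(2)})$. Then $d(x^{(1)}, x^{(2)}) = \lVert f(x^{(1)}) - f(x^{(2)}) \rVert_2^2$.
[Taigman et al., 2014. DeepFace: Closing the gap to human-level performance in face verification]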
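Goal of learning
The parameters of the network define an encoding $f(x^{(i)})$. Learn the parameters so that:
If $x^{(i)}, x^{(j)}$ are the same person, $\lVert f(x^{(i)}) - f(x^{(j)}) \rVert^2$ is small.
If $x^{(i)}, x^{(j)}$ are different persons, $\lVert f(x^{(i)}) - f(x^{(j)}) \rVert^2$ is large.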
Face recognition
Triplet loss
Learning Objective
[Schroff et al., 2015, FaceNet: A unified embedding for face recognition and clustering]
Loss function
[Schroff et al., 2015, FaceNet: A unified embedding for face recognition and clustering]
Choosing the triplets A, P, N
[Schroff et al., 2015, FaceNet: A unified embedding for face recognition and clustering]
Training set using triplet loss
Columns: Anchor, Positive, Negative
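The slides do not spell out the loss. This sketch uses the triplet loss as defined in the FaceNet paper, $\max(\lVert f(A) - f(P) \rVert^2 - \lVert f(A) - f(N) \rVert^2 + \alpha, 0)$. The margin α = 0.2 is an assumed default, and the 2D encodings are hypothetical:

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + alpha, 0)"""
    return max(sq_dist(f_a, f_p) - sq_dist(f_a, f_n) + alpha, 0.0)


A, P = [0.0, 0.0], [0.0, 0.1]
print(triplet_loss(A, P, [1.0, 0.0]))  # 0.0: the negative is already far enough away
print(triplet_loss(A, P, [0.0, 0.2]))  # about 0.17: a "hard" negative still contributes
```

The second case shows why choosing hard triplets (where d(A, N) is close to d(A, P)) gives the training signal that easy triplets lack.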
Face recognition
[Taigman et al., 2014. DeepFace: Closing the gap to human-level performance in face verification]
Face verification supervised learning
[Taigman et al., 2014. DeepFace: Closing the gap to human-level performance in face verification]
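Here is a sketch of face verification as binary classification on a pair of encodings. It assumes a logistic unit over the element-wise absolute differences. The weights and bias are hypothetical stand-ins for learned values:

```python
import math


def verification_prob(f_i, f_j, weights, bias):
    """y_hat = sigmoid(sum_k w_k * |f(x_i)_k - f(x_j)_k| + b)"""
    z = sum(w * abs(a - b) for w, a, b in zip(weights, f_i, f_j)) + bias
    return 1.0 / (1.0 + math.exp(-z))


print(verification_prob([1, 2], [1, 2], [0.5, 0.5], 0.0))  # 0.5 with zero bias
print(verification_prob([0, 0], [3, 3], [-1, -1], 0.0))    # about 0.0025: likely different
```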
Neural Style Transfer
Network: 224×224×3 → 110×110×96 → 55×55×96 → 26×26×256 → 13×13×256 → 13×13×384 → 13×13×384 → 6×6×256 → FC 4096 → FC 4096 → ŷ
[Zeiler and Fergus, 2013, Visualizing and understanding convolutional networks]
Visualizing deep layers
Visualizing deep layers: Layer 1
Visualizing deep layers: Layer 2
Visualizing deep layers: Layer 3
Visualizing deep layers: Layer 4
Visualizing deep layers: Layer 5
Neural Style Transfer
Cost function
Neural style transfer cost function
Content C, Style S → Generated image G
[Gatys et al., 2015. A neural algorithm of artistic style. Images on slide generated by Justin Johnson]
Find the generated image G
1. Initialize G randomly
G: 100×100×3
Content cost function
$$
J(G) = \alpha \, J_{\text{content}}(C, G) + \beta \, J_{\text{style}}(S, G)
$$
• Say you use hidden layer $l$ to compute content cost.
• Use pre-trained ConvNet. (E.g., VGG network)
• Let $a^{[l](C)}$ and $a^{[l](G)}$ be the activations of layer $l$ on the images
• If $a^{[l](C)}$ and $a^{[l](G)}$ are similar, both images have similar content
[Figure: example activation values]
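Here is a sketch of the content cost and the overall cost on flattened activations. It assumes $J_{\text{content}} = \tfrac{1}{2} \lVert a^{[l](C)} - a^{[l](G)} \rVert^2$, where the ½ is a normalization choice. The values of α and β are hypothetical:

```python
def content_cost(a_C, a_G):
    """1/2 * sum of squared differences between flattened layer-l activations."""
    return 0.5 * sum((c - g) ** 2 for c, g in zip(a_C, a_G))


def total_cost(J_content, J_style, alpha=10.0, beta=40.0):
    """J(G) = alpha * J_content(C, G) + beta * J_style(S, G)"""
    return alpha * J_content + beta * J_style


print(content_cost([1, 2], [3, 2]))  # 2.0
print(total_cost(2.0, 0.5))          # 40.0
```

In the full algorithm, gradient descent would update the pixels of G to reduce J(G); that step is not shown here.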
1D and 3D generalizations of models
Convolutions in 2D and 1D
2D: 14×14 input image ∗ 5×5 2D filter
1D: input signal 1 20 15 3 18 12 4 17 1 3 10 3 1 ∗ 1D filter
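Here is a sketch of 1D convolution on the signal above, using a hypothetical difference filter [1, 0, -1]. The size rule is the same as in 2D: a length-5 filter gives 13 - 5 + 1 = 9 outputs, just as a 5×5 filter on the 14×14 image gives 10×10:

```python
def conv1d_valid(signal, kernel):
    """'Valid' 1D cross-correlation, stride 1."""
    f = len(kernel)
    return [sum(signal[i + k] * kernel[k] for k in range(f))
            for i in range(len(signal) - f + 1)]


signal = [1, 20, 15, 3, 18, 12, 4, 17, 1, 3, 10, 3, 1]
print(conv1d_valid(signal, [1, 0, -1]))
# [-14, 17, -3, -9, 14, -5, 3, 14, -9, 0, 9]
```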
3D data
3D convolution
3D volume ∗ 3D filter
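Here is a sketch of a valid 3D convolution, assuming a cubic filter and a single-channel volume (both simplifications). The output size follows the same rule along each of the three axes:

```python
def conv3d_valid(volume, kernel):
    """'Valid' 3D cross-correlation of a D x H x W volume with an f x f x f kernel."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    f = len(kernel)
    return [[[sum(volume[d + a][h + b][w + c] * kernel[a][b][c]
                  for a in range(f) for b in range(f) for c in range(f))
              for w in range(W - f + 1)]
             for h in range(H - f + 1)]
            for d in range(D - f + 1)]


cube = [[[1] * 4 for _ in range(4)] for _ in range(4)]  # 4x4x4 volume of ones
k = [[[1] * 2 for _ in range(2)] for _ in range(2)]     # 2x2x2 kernel of ones
out = conv3d_valid(cube, k)
print(len(out), len(out[0]), len(out[0][0]), out[0][0][0])  # 3 3 3 8
```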