MLT: CNN Architectures
Plan A
CONTENT / PROGRAM FLOW (10:00 - 16:00):
3x3 vs 11x11 · “1x1 conv” vs “FC” · BREAK · Depthwise (separable) conv · Channel shuffling · SqueezeNet · ...
Revised CONTENT / PROGRAM FLOW (10:00 - 15:00):
Part 1: A Historical Review on Deep CNNs · 3x3 vs 11x11 · “1x1 conv” vs “FC” · BREAK · Depthwise (separable) conv · SqueezeNet
Example input: 64 x 64 x 3 (3-channel RGB image)
Goal:
Extracting meaningful features from an image
Tool:
Algorithms
- gray-scaling, thresholding
- complex descriptors (HOG, SIFT, SURF, etc.)
→ full of hand-engineering
...well, then
* Blurring
* Sharpening
* Edge detection
Let’s try together!
Extracting Image Features via ConvOps
(example image: panda)
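A minimal sketch of the idea above (not from the slides): applying classic hand-crafted kernels with a plain 2D convolution. The grayscale "image" here is random data standing in for the panda example; any HxW array works.

import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(64, 64)  # stand-in for a 64x64 grayscale image

kernels = {
    "blur": np.full((3, 3), 1.0 / 9.0),                                      # box blur
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], float),       # sharpening
    "edge": np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], float),      # Laplacian-style edge detector
}

for name, k in kernels.items():
    out = convolve2d(image, k, mode="same", boundary="symm")
    print(name, out.shape)  # each filter response keeps the 64x64 spatial size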
Feature Learning
Example input (4x4):
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
kernel size: 3x3 · stride: 1 · padding: same · pooling: Max 2x2, stride 2
Convolution Visualizer
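A small sketch of the visualizer settings above, assuming PyTorch: a single 3x3 convolution (stride 1, "same" padding) followed by 2x2 max pooling with stride 2, applied to the 4x4 example input.

import torch
import torch.nn as nn

x = torch.tensor([[1., 1., 2., 4.],
                  [5., 6., 7., 8.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]]).reshape(1, 1, 4, 4)  # (N, C, H, W)

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=1, bias=False)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

y = conv(x)   # "same" padding keeps the 4x4 spatial size
z = pool(y)   # 2x2 / stride-2 pooling halves it to 2x2
print(y.shape, z.shape)  # torch.Size([1, 1, 4, 4]) torch.Size([1, 1, 2, 2])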
A brief history...
Convolution operations were first introduced into machine learning by Yann LeCun at AT&T Bell Laboratories (Y. LeCun et al. 1989; Fukushima 1980; A. Waibel 1987).
(figure: Input image → Layer 1 → Layer 3 → Layer 5)
REFERENCES
[1] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Backpropagation Applied to Handwritten Zip Code
Recognition; AT&T Bell Laboratories
[2] LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). "Gradient-based learning applied to document recognition" (PDF).
Proceedings of the IEEE. 86 (11): 2278–2324. doi:10.1109/5.726791. Retrieved October 7, 2016.
[3] The History of Neural Networks
by Eugenio Culurciello
https://dataconomy.com/2017/04/history-neural-networks/
[4] Convolutions by AI Shack
Utkarsh Sinha
http://aishack.in/tutorials/image-convolution-examples/
[5] The History of Neural Networks
Andrew Fogg
https://www.import.io/post/history-of-deep-learning/
[6] Overview of Convolutional Neural Networks for Image Classification
Intel Academy
https://software.intel.com/en-us/articles/hands-on-ai-part-15-overview-of-convolutional-neural-networks-for-image-classification
[7] Convolution Arithmetic
https://github.com/vdumoulin/conv_arithmetic
Snippet Implementation
Part 2: Convolutions in Deep Architectures
3x3 vs 11x11
(figure: B, G, R input channels)
                          AlexNet   VGG-16
# of Convolution Layers   5         13
Convolution Parameters    3.8M      15M
# of FC Layers            3         3
FC Layer Parameters       59M       59M
Total Params              62M       138M
ImageNet Error            17%       7.3%
“1x1 conv” vs “FC”:
❏ Decreases computation (NxN conv → 1x1 conv)
❏ Decreases the number of parameters (FC → 1x1 conv)
❏ Gets a more valuable combination of filters: represents “M” features with “N” features
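A rough sketch of the “M features → N features” point (PyTorch assumed; sizes follow the 6x6x32 → 6x6x16 example from the cheat sheet at the end of the deck): a 1x1 convolution mixes 32 input channels into 16 output channels at every spatial position, with far fewer parameters than an FC layer over the flattened map.

import torch
import torch.nn as nn

x = torch.randn(1, 32, 6, 6)                 # 6x6 feature map with 32 channels
conv1x1 = nn.Conv2d(32, 16, kernel_size=1)   # 1x1x32x16 filters
y = conv1x1(x)
print(y.shape)                               # torch.Size([1, 16, 6, 6])

# Parameter comparison against a fully connected layer over the flattened map:
fc = nn.Linear(32 * 6 * 6, 16 * 6 * 6)
print(sum(p.numel() for p in conv1x1.parameters()))  # 32*16 + 16 = 528
print(sum(p.numel() for p in fc.parameters()))       # ~664k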
Inception Module
● CNN design has a lot of parameters:
  ○ Conv: 3x3? 5x5? 1x1?
  ○ Pooling: 3x3?
● Do them all (at once)!
● Let the network learn:
  ○ whatever parameter,
  ○ whatever combination of these filter sizes it wants to learn
● → Inception Layer
Inception → GoogLeNet
“Bottleneck layer”: a 1x1 conv placed before the 5x5 conv
● 1x1 conv ⇒ 5x5 conv:
  ○ (28x28) x (1x1x192) x (16) ≈ 2.4M
  ○ + (28x28) x (5x5x16) x (32) ≈ 10M
  ○ ≈ 12.4M total
● vs. a direct 5x5 conv: (28x28) x (5x5x192) x (32) ≈ 120M
● → ~10x less computation!
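A quick sanity check of the arithmetic above (plain Python; multiply counts only, biases and activations ignored; the 28x28x192 input and 32 output channels follow the example on this slide).

H, W, C_in, C_mid, C_out = 28, 28, 192, 16, 32

direct_5x5 = H * W * C_out * (5 * 5 * C_in)               # 5x5 conv straight from 192 channels
bottleneck = H * W * C_mid * (1 * 1 * C_in) \
           + H * W * C_out * (5 * 5 * C_mid)               # 1x1 reduction, then 5x5

print(f"direct 5x5: {direct_5x5 / 1e6:.1f}M multiplies")   # ~120.4M
print(f"bottleneck: {bottleneck / 1e6:.1f}M multiplies")    # ~12.4M (about 10x less)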
Inception → GoogLeNet
GoogLeNet (Inception v1)
Depthwise convolution
Difference:
● So far:
  ○ 2D convolutions are performed over all input channels
  ○ This lets us mix channels
● Depthwise convolution:
  ○ Each channel is kept separate
Approach:
● Split the input tensor into channels & split the kernel into channels
● For each channel, convolve the input with the corresponding filter → a 2D tensor
● Stack the output (2D) tensors back together
Depthwise separable conv
● Depthwise convolution is commonly used in combination with an additional step → depthwise separable convolution:
  ○ 1. Filtering (depthwise convolution)
  ○ 2. Combining (pointwise 1x1 convolution)
[2] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard et al. (Google Inc.)
https://arxiv.org/pdf/1704.04861.pdf
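A minimal sketch of a depthwise separable convolution in PyTorch (layer sizes are illustrative, not taken from the paper): the depthwise step filters each channel separately (groups=in_channels), and the pointwise 1x1 step then combines channels.

import torch
import torch.nn as nn

in_ch, out_ch = 32, 64
depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)  # 1. filtering
pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)                          # 2. combining

x = torch.randn(1, in_ch, 56, 56)
y = pointwise(depthwise(x))
print(y.shape)  # torch.Size([1, 64, 56, 56])

# Parameter comparison with a standard 3x3 convolution:
standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
sep_params = sum(p.numel() for p in depthwise.parameters()) + sum(p.numel() for p in pointwise.parameters())
std_params = sum(p.numel() for p in standard.parameters())
print(sep_params, std_params)  # ~2.4k vs ~18.5k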
Grouped Convolutions:
● First proposed in AlexNet
  ○ Memory constraints (the model was split across two GPUs)
● Decreases the number of operations
  ○ 2 groups → 2x fewer operations
● (+) Learns better representations
  ○ Feature relationships are sparse
● (-) Outputs from a given channel are derived from only a small fraction of the input channels
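A quick sketch of grouped convolution in PyTorch (example sizes are assumed): with groups=2, each output channel only sees half of the input channels, roughly halving parameters and operations.

import torch
import torch.nn as nn

full = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=1)      # standard conv
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=2)   # 2 groups, AlexNet-style

print(sum(p.numel() for p in full.parameters()))     # 36,928
print(sum(p.numel() for p in grouped.parameters()))  # 18,496 (~2x fewer)

x = torch.randn(1, 64, 28, 28)
print(grouped(x).shape)  # torch.Size([1, 64, 28, 28])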
Channel shuffling
ShuffleNet
Channel shuffling:
● Applies grouped convolutions on the 1x1 layers as well
  ○ By grouping filters, computation decreases significantly
● (!) Recall the side effect of grouped convolutions
  ○ Channel shuffling addresses this issue
● (!) Channel shuffling is also differentiable
[3] ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
https://arxiv.org/abs/1707.01083
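A minimal channel-shuffle sketch in PyTorch, following the reshape/transpose trick described in the ShuffleNet paper (tensor sizes here are assumed): it is purely a permutation of channels, so it stays differentiable.

import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(1, 2).contiguous()         # interleave the groups
    return x.view(n, c, h, w)                  # flatten back to (N, C, H, W)

x = torch.randn(1, 8, 4, 4)
y = channel_shuffle(x, groups=2)
print(y.shape)  # torch.Size([1, 8, 4, 4]); channel order becomes 0,4,1,5,2,6,3,7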
SqueezeNet is:
● A smart and small architecture which achieves:
  ○ → AlexNet-level accuracy (on ImageNet) with
    ■ 50x fewer parameters
    ■ ~500x smaller model size after compression (<0.5MB)
  ○ → 3 times faster
  ○ → a Fully Convolutional Network (FCN), i.e. no FC layers
SqueezeNet
Fire module:
● Squeeze layer: only 1x1 filters (bottleneck)
● Expand layer: 1x1 and 3x3 filters
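A hedged PyTorch sketch of a Fire module (channel sizes are illustrative): a 1x1 "squeeze" bottleneck followed by an "expand" stage that concatenates 1x1 and 3x3 filter outputs.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)                       # squeeze layer: only 1x1
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)              # expand: 1x1 branch
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)   # expand: 3x3 branch

    def forward(self, x):
        x = F.relu(self.squeeze(x))
        return torch.cat([F.relu(self.expand1x1(x)),
                          F.relu(self.expand3x3(x))], dim=1)  # concatenate along channels

x = torch.randn(1, 96, 55, 55)
print(Fire(96, 16, 64, 64)(x).shape)  # torch.Size([1, 128, 55, 55])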
REFERENCES
[1] Notes on SqueezeNet
Hao Gao
https://medium.com/@smallfishbigsea/notes-of-squeezenet-4137d51feef4
[3] SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer
https://arxiv.org/abs/1602.07360
Interactive Implementation
Residuals in ConvNets
Power of going deeper = Richer contextual information
Is there any limitation to having more depth?
(plot: error vs. iterations/epochs)
Recap: Backpropagation
(figure: gradient flow during backpropagation; deeper networks struggle to pass strong gradients back to early layers)
The core idea of ResNet is introducing an “identity shortcut (residual) connection”:
● Standard connection vs. a shortcut that skips one or more layers
● Easy gradient flow via shortcuts
(diagrams: Plain, ResNet, and ResNeXt blocks, each from Input to output, with the identity shortcut)
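A minimal sketch of a ResNet-style basic block in PyTorch (layer sizes are assumed): two 3x3 convolutions plus the identity shortcut, so the block learns a residual F(x) and outputs F(x) + x.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                              # shortcut connection
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + identity)             # easy gradient flow via the skip

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])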
REFERENCES
[1] DenResNet: Ensembling Dense Networks and Residual Networks
Victor Cheung
http://cs231n.stanford.edu/reports/2017/pdfs/933.pdf
[5] Hand-Gesture Classification using Deep Convolution and Residual Neural Network
Sandipan Dey
https://sandipanweb.wordpress.com/2018/01/20/hand-gesture-classification-using-deep-convolution-and-residual-neural-network-with-tensorflow-keras-in-python/
Extras: Transition Layer (1x1 conv, Pooling)
Standard Connectivity: successive convolutions
ResNet Connectivity: element-wise feature summation
DenseNet Connectivity: feature concatenation
(diagrams: Standard vs. ResNet vs. DenseNet connectivity, x → y)
DenseNet:
● Power of feature reuse (figure: ResNet vs. DenseNet)
● Supervision to gradients
● Fewer parameters, computationally efficient
  ★ Bottleneck Layer
● Error vs. parameters & computation (plots)
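A hedged sketch of DenseNet-style connectivity in PyTorch (growth rate and sizes are assumed): each layer receives the concatenation of all previous feature maps, so features are re-used instead of re-learned.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_ch + i * growth_rate, growth_rate, 3, padding=1)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = F.relu(layer(torch.cat(features, dim=1)))  # feature concatenation
            features.append(out)
        return torch.cat(features, dim=1)

x = torch.randn(1, 64, 28, 28)
print(DenseBlock(64)(x).shape)  # torch.Size([1, 192, 28, 28]) = 64 + 4*32 channels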
Part 3: Extras
State-of-the-art:
(plots: ImageNet accuracy vs. # of parameters and # of FLOPs)
Source: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
Neural ODEs
● Re-parameterize the continuous dynamics of hidden states with an ODE
● A residual update is very similar to an Euler step:
  ○ ResNet: h_{t+1} = h_t + f(h_t, θ_t)
  ○ ODE-Net: dh(t)/dt = f(h(t), t, θ)
(figure: ResNet vs. ODE-Net)
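A rough, illustrative sketch of the ResNet ↔ Euler analogy above (not from the slides): stacking residual updates h ← h + f(h) is one fixed-step Euler discretization of dh/dt = f(h); an ODE-Net instead treats f as continuous dynamics to integrate.

import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(8, 8), nn.Tanh())  # shared dynamics function f(h)

h0 = torch.randn(1, 8)

# "ResNet" view: a fixed number of discrete residual updates
h = h0
for _ in range(4):
    h = h + f(h)

# "ODE-Net" view: explicit Euler steps on dh/dt = f(h) over t in [0, 1]
h_ode, dt = h0, 1.0 / 4
for _ in range(4):
    h_ode = h_ode + dt * f(h_ode)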
Feature Pyramid Networks
Top-Down Pathway:
applies 1x1 convolutions to the bottom-up feature maps (lateral connections), nearest-neighbour upsampling of the coarser top-down maps, and element-wise addition of the feature maps
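A minimal sketch of one FPN top-down merge step in PyTorch (channel counts and map sizes are assumed): a 1x1 lateral conv on the bottom-up feature map, nearest-neighbour upsampling of the coarser top-down map, then element-wise addition.

import torch
import torch.nn as nn
import torch.nn.functional as F

lateral = nn.Conv2d(512, 256, kernel_size=1)   # 1x1 conv on the bottom-up map C4

c4 = torch.randn(1, 512, 28, 28)               # bottom-up feature map
p5 = torch.randn(1, 256, 14, 14)               # coarser top-down feature map

p4 = lateral(c4) + F.interpolate(p5, scale_factor=2, mode="nearest")  # merge
print(p4.shape)  # torch.Size([1, 256, 28, 28])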
Interactive Implementation
Appendix: State-of-the-art
Cheat Sheet

Convolution Operations:
> Convolutions are basically a filtering operation used in the CV world
> Extracting useful information from images
> Sliding windows (kernels or filters) are used to convolve an input image

Convolution in CNN Architectures:
> convolution: filtering
> stride: sliding step size
> padding: control output size
> pooling: downsampling

1x1 conv:
> feature pooling
> decreases parameters
> decreases computation
> adds nonlinearity
> 6x6x32 * (1x1x32x16) → 6x6x16

ResNeXt:
> Inception style in ResNet
> Depth concatenation, same convolution topology
> Having high cardinality helps in decreasing validation error
> New hyper-parameter: cardinality → width size

DenseNet:
> Connecting all layers to the other layers
> Strong gradient flow
> More diversified features
> Allowing feature re-use
> More memory hungry, but computationally more efficient

THANK YOU FOR JOINING US TODAY!
Machine Learning Tokyo