Convolutional Neural Networks
Gaurav Mittal
2012CSB1013
IIT Ropar
LeNet-5 (LeCun-98), a convolutional neural network for digit recognition
ANN Recap
gasturbinespower.asmedigitalcollection.asme.org
What are CNNs?
http://goodfeli.github.io/dlbook/contents/convnets.html
Motivation
Detection or Classification Tasks
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
What to do with this data?
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Feature Representations
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Feature Representations
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
How is computer perception done?
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Feature Representations???
Computer Vision Features
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Audio Features
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
NLP Features
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Certainly, coming up with features is difficult, time-consuming, and requires expert knowledge.
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Feature Representations
www.cse.ust.hk/~leichen/courses/FYTG.../FYTGS5101-Guoyangxie.pdf
Feature Representations
www.cse.ust.hk/~leichen/courses/FYTG.../FYTGS5101-Guoyangxie.pdf
Learning non-linear functions
www.cse.ust.hk/~leichen/courses/FYTG.../FYTGS5101-Guoyangxie.pdf
Learning non-linear functions
(Figure: Shallow vs. Deep)
www.cse.ust.hk/~leichen/courses/FYTG.../FYTGS5101-Guoyangxie.pdf
Biologically Inspired!
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Features Learned by Deep Training
Distinguishing Features
Shared Weights
http://neuralnetworksanddeeplearning.com/chap6.html
Typical CNN Layer
Convolution
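The convolution of f and g, written as f∗g, is defined as the integral of the product of the two functions after one is reversed and shifted:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$
Convolution is commutative: f∗g = g∗f.
It can be viewed as a weighted average of f at every moment t, with g (reversed and shifted by t) acting as the weighting function; for this interpretation, g needs to be a valid probability density function.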
Discrete Convolution (one axis):
$$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]$$
https://www.wikipedia.org/
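To make the discrete formula concrete, here is a minimal sketch (assuming NumPy, which the slides do not use) that evaluates the sum directly for two finite sequences treated as zero outside their stored indices. The sequences f and g are made-up toy values; the result is compared against `np.convolve` and checked for commutativity.

```python
import numpy as np

def conv1d_full(f, g):
    """Direct evaluation of (f * g)[n] = sum_m f[m] g[n - m].

    f and g are finite sequences assumed to be zero outside their stored
    indices, so the full result has len(f) + len(g) - 1 entries.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            if 0 <= n - m < len(g):  # g[n - m] is zero outside its support
                out[n] += f[m] * g[n - m]
    return out

f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]
print(conv1d_full(f, g))                                  # values 0, 1, 2.5, 4, 1.5
print(np.allclose(conv1d_full(f, g), np.convolve(f, g)))  # agrees with NumPy's 'full' mode
print(np.allclose(conv1d_full(f, g), conv1d_full(g, f)))  # commutativity: f*g == g*f
```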
Cross-Correlation
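• For continuous functions f and g, the cross-correlation is defined as:
$$(f \star g)(\tau) = \int_{-\infty}^{\infty} \overline{f(t)}\, g(t + \tau)\, dt$$
where $\overline{f(t)}$ is the complex conjugate of f(t) (simply f(t) for real-valued functions).
https://www.wikipedia.org/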
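Convolution and Cross-Correlation in Images
For a 2-D image H and a 2-D kernel F of size (2r+1) x (2r+1), indexed from -r to r in each direction:
Cross-correlation:
$$G[i, j] = \sum_{u=-r}^{r}\sum_{v=-r}^{r} F[u, v]\, H[i + u, j + v]$$
Convolution:
$$G[i, j] = \sum_{u=-r}^{r}\sum_{v=-r}^{r} F[u, v]\, H[i - u, j - v]$$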
How do they differ?
Convolution is equivalent to flipping the filter in both dimensions (bottom to top, right to left) and applying cross-correlation.
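The flipping relationship can be checked with a short NumPy sketch (NumPy and the toy arrays are assumptions, not part of the slides). It computes a 'valid' cross-correlation directly, then obtains convolution by cross-correlating with the kernel flipped in both axes. For simplicity the window offsets start at the window's top-left corner rather than at its centre, which only shifts the output indices.

```python
import numpy as np

def cross_correlate2d(H, F):
    """'Valid' 2-D cross-correlation: slide kernel F over image H without flipping."""
    H = np.asarray(H, dtype=float)
    F = np.asarray(F, dtype=float)
    kh, kw = F.shape
    out_h, out_w = H.shape[0] - kh + 1, H.shape[1] - kw + 1
    G = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            G[i, j] = np.sum(F * H[i:i + kh, j:j + kw])
    return G

def convolve2d(H, F):
    """'Valid' 2-D convolution: cross-correlation with F flipped in both dimensions."""
    return cross_correlate2d(H, np.flip(F))  # np.flip with no axis argument flips every axis

H = np.arange(16, dtype=float).reshape(4, 4)  # toy 4 x 4 "image"
F = np.array([[1.0, 0.0],
              [0.0, -1.0]])                   # toy asymmetric kernel
print(cross_correlate2d(H, F))  # every entry is H[i, j] - H[i+1, j+1] = -5
print(convolve2d(H, F))         # flipped kernel gives H[i+1, j+1] - H[i, j] = 5
```

With a symmetric kernel the two operations coincide, which is one reason many CNN libraries implement cross-correlation and still call it convolution.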
2-D Convolution (without kernel flipping)
http://goodfeli.github.io/dlbook/contents/convnets.html
2-D Convolution in Action!
http://i.stack.imgur.com/I7DBr.gif
Variants (for an m x m image and a k x k kernel)
• Full: add enough zero-padding to the image for every pixel to be visited k times in each direction, giving an output of size (m + k - 1) x (m + k - 1).
• Same: add enough zero-padding to the image for the output to be the same size as the image, i.e., m x m.
• Stride s: down-sample the output of the convolution by sampling only every s pixels in each direction. For instance, a 'valid' convolution (no zero-padding) with stride s results in an output of size
$$\left\lfloor \frac{m-k+s}{s} \right\rfloor \times \left\lfloor \frac{m-k+s}{s} \right\rfloor$$
http://goodfeli.github.io/dlbook/contents/convnets.html
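The size rules above can be sketched as a small helper, assuming NumPy and square inputs and kernels. `conv_output_size` and `strided_valid_correlation` are hypothetical helpers written for this illustration; the second one computes a strided 'valid' result directly so the formula can be checked against an actual output shape.

```python
import numpy as np

def conv_output_size(m, k, mode="valid", s=1):
    """Output side length for an m x m image and a k x k kernel.

    'valid': no zero-padding; 'same': padded so the stride-1 output is m x m;
    'full': padded so the stride-1 output is m + k - 1.
    A stride s keeps every s-th position of the stride-1 output.
    """
    stride1 = {"valid": m - k + 1, "same": m, "full": m + k - 1}[mode]
    # Keeping positions 0, s, 2s, ... of stride1 positions gives ceil(stride1 / s);
    # for 'valid' this equals floor((m - k + s) / s).
    return (stride1 + s - 1) // s

def strided_valid_correlation(H, F, s):
    """Direct 'valid' cross-correlation evaluated only at every s-th position."""
    kh, kw = F.shape
    rows = range(0, H.shape[0] - kh + 1, s)
    cols = range(0, H.shape[1] - kw + 1, s)
    return np.array([[np.sum(F * H[i:i + kh, j:j + kw]) for j in cols] for i in rows])

H = np.random.default_rng(0).standard_normal((7, 7))
F = np.ones((3, 3))
print(conv_output_size(7, 3, "valid", s=2), strided_valid_correlation(H, F, 2).shape)  # 3 (3, 3)
print(conv_output_size(7, 3, "same"), conv_output_size(7, 3, "full"))                  # 7 9
```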
Why Convolution?
Local Receptive Field/Sparse Connectivity
Convolution exploits spatially local correlations in the image by enforcing a local connectivity pattern between neurons of adjacent layers: each neuron is connected only to a small region of the previous layer.
Indirect Global Connectivity
• Receptive fields of units in deeper layers are larger than those of units in shallow layers, so deep units are indirectly connected to a large part of the input.
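The growth of receptive fields with depth can be illustrated with a short sketch. It assumes a plain stack of layers described only by (kernel size, stride) and no dilation; `receptive_field` is a hypothetical helper. Each layer widens the field by (k - 1) times the product of all earlier strides.

```python
def receptive_field(layers):
    """Receptive-field side length of one unit at the top of a layer stack.

    layers: list of (kernel_size, stride) pairs, ordered from the input upwards.
    """
    r = 1     # a single input pixel "sees" itself
    jump = 1  # distance, in input pixels, between adjacent units of the current layer
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # three 3x3 layers -> 7
# A LeNet-5-like stack: 5x5 conv, 2x2/2 subsampling, 5x5 conv, 2x2/2 subsampling, 5x5 conv.
print(receptive_field([(5, 1), (2, 2), (5, 1), (2, 2), (5, 1)]))  # 32, i.e. a whole 32 x 32 input
```

Even though every layer is only locally connected, the top unit in the second stack depends on all 32 x 32 input pixels, which is the "indirect global connectivity" described above.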
Example
http://neuralnetworksanddeeplearning.com/chap6.html
Example
http://neuralnetworksanddeeplearning.com/chap6.html
Example
http://neuralnetworksanddeeplearning.com/chap6.html
Shared Weights and Bias
• All neurons in a hidden layer share the same parameterization (weight vector and bias), forming a 'feature map'.
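To see what weight sharing buys, the sketch below counts parameters for one hidden layer. The setting is an assumption borrowed from the example in neuralnetworksanddeeplearning.com (chapter 6): a 28 x 28 input, 5 x 5 local receptive fields, stride 1, hence a 24 x 24 hidden layer. It compares one shared feature map with the same local connectivity without sharing, and with a fully connected layer of the same size.

```python
# Assumed setting: 28 x 28 input, 5 x 5 receptive fields, stride 1.
in_side, k = 28, 5
out_side = in_side - k + 1  # 24 x 24 hidden neurons

# One feature map: a single 5x5 weight set plus a single bias, shared by every neuron.
shared = k * k + 1
# Same local connectivity, but every hidden neuron has its own 5x5 weights and bias.
local_unshared = out_side**2 * (k * k + 1)
# Fully connected: every hidden neuron sees every input pixel, plus one bias each.
fully_connected = in_side**2 * out_side**2 + out_side**2

print(shared, local_unshared, fully_connected)  # 26 14976 452160
```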
Shared Weights and Bias
• Translation Equivariance
o Allows features to be detected regardless of their position in the visual field. (A feature is an input pattern that causes a neuron to activate, e.g., an edge.)
o All neurons in the first hidden layer detect exactly the same feature, just at different locations.
o CNNs are well adapted to the translation invariance of images: move a picture of a cat, and it's still an image of a cat!
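Strictly speaking, the convolution itself is translation equivariant: shifting the input shifts the feature map by the same amount. The 1-D NumPy sketch below demonstrates this with a made-up signal and a hypothetical two-tap edge-detecting kernel. The signal has zero borders, so the shift loses nothing at the edges.

```python
import numpy as np

# Toy 1-D "image": a small pattern surrounded by zeros, and an edge-detecting kernel.
x = np.zeros(20)
x[5:8] = [1.0, 2.0, 3.0]
w = np.array([1.0, -1.0])

def shift(v, t):
    """Shift v right by t >= 0 positions, filling with zeros (no wrap-around)."""
    out = np.zeros_like(v)
    out[t:] = v[:len(v) - t]
    return out

y = np.convolve(x, w, mode="full")                     # feature map of the original input
y_shifted_input = np.convolve(shift(x, 3), w, mode="full")

# Equivariance: conv(shift(x)) == shift(conv(x)); the same detector fires, just 3 positions later.
print(np.allclose(y_shifted_input, shift(y, 3)))  # True
```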
Typical CNN Layer
Non-Linear Activation Function
• Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
• Tanh: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Each layer requires a number of such feature maps to capture sufficient features in the image.
Let the k-th feature map at a given layer be $x^k$, whose filter is determined by the weights $W^k$ and the bias $b_k$. Using the sigmoid function σ as the non-linearity and a filter of size m x m, $x^k$ is obtained from the layer input y as:
$$x^k_{ij} = \sigma\left(\sum_{u=0}^{m-1}\sum_{v=0}^{m-1} W^k_{uv}\, y_{(i+u)(j+v)} + b_k\right)$$
• Each hidden layer is composed of multiple feature maps, $x^k$, $k = 0, \ldots, K$
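The feature-map equation can be sketched directly in NumPy (an assumption; the slides do not give code). `feature_maps` is a hypothetical helper that takes a single-channel input y, a stack of m x m filters, and one bias per filter, and applies the sigmoid to the 'valid' sliding-window sum, using the same (i+u, j+v) indexing as the equation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_maps(y, W, bias):
    """x[k, i, j] = sigmoid(sum_{u,v} W[k, u, v] * y[i+u, j+v] + bias[k]).

    y: 2-D input; W: filters of shape (n_maps, m, m); bias: shape (n_maps,).
    Returns shape (n_maps, H - m + 1, W_in - m + 1): a 'valid' sliding window.
    """
    n_maps, m, _ = W.shape
    out_h, out_w = y.shape[0] - m + 1, y.shape[1] - m + 1
    z = np.empty((n_maps, out_h, out_w))
    for k in range(n_maps):
        for i in range(out_h):
            for j in range(out_w):
                z[k, i, j] = np.sum(W[k] * y[i:i + m, j:j + m]) + bias[k]
    return sigmoid(z)

rng = np.random.default_rng(0)
y = rng.standard_normal((8, 8))     # toy 8 x 8 input
W = rng.standard_normal((4, 3, 3))  # four random 3 x 3 filters -> four feature maps
bias = np.zeros(4)
x = feature_maps(y, W, bias)
print(x.shape)  # (4, 6, 6)
```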
Typical CNN Layer
Pooling
Non-linear down-sampling that simplifies the information in the output of the convolutional layer.
Variants:
• Max pooling (popular)
• Weighted average based on distance
• L2 norm of the neighborhood
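Two of the pooling variants can be sketched in a few lines of NumPy, assuming non-overlapping 2 x 2 windows with stride 2 by default (other window sizes are possible). `pool2d` is a hypothetical helper and the input is a made-up 4 x 4 feature map.

```python
import numpy as np

def pool2d(x, size=2, stride=2, kind="max"):
    """Down-sample a 2-D feature map over size x size windows.

    kind='max' keeps the largest value in each window;
    kind='l2' keeps the L2 norm of the window.
    """
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max() if kind == "max" else np.sqrt(np.sum(window ** 2))
    return out

x = np.array([[1., 2., 0., 1.],
              [3., 4., 1., 0.],
              [0., 0., 2., 2.],
              [1., 0., 2., 2.]])
print(pool2d(x))             # max pooling: rows (4, 1) and (1, 2)
print(pool2d(x, kind="l2"))  # L2 pooling: rows (sqrt(30), sqrt(2)) and (1, 4)
```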
Normalization (Optional)
The response is normalized locally, using a distance-based weighted average function.
Putting It All Together!
LeCun 1998
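As a worked example of how the pieces fit together, the sketch below traces feature-map sizes through the convolution and subsampling stages of LeNet-5 (LeCun 1998): a 32 x 32 input, 5 x 5 convolutions, and 2 x 2 subsampling with stride 2. It only tracks shapes; note that LeNet-5's original subsampling layers are trainable averaging layers rather than max pooling.

```python
def valid(side, k, s=1):
    """Side length after a 'valid' k x k window applied with stride s."""
    return (side - k) // s + 1

side = 32  # LeNet-5 input: 32 x 32 grey-scale image
trace = [("input", 1, side)]
for name, maps, k, s in [("C1 conv 5x5", 6, 5, 1), ("S2 subsample 2x2", 6, 2, 2),
                         ("C3 conv 5x5", 16, 5, 1), ("S4 subsample 2x2", 16, 2, 2),
                         ("C5 conv 5x5", 120, 5, 1)]:
    side = valid(side, k, s)
    trace.append((name, maps, side))

for name, maps, side in trace:
    print(f"{name:18s} {maps:4d} @ {side}x{side}")
# C5's 120 maps of size 1x1 feed the fully connected F6 layer (84 units) and the 10-way output.
```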
Backpropagation
• Loss function
o For classification
• Softmax function with negative log-likelihood
o For regression
• Mean squared error
• Weight update (gradient descent with learning rate η on the loss E):
$$w \leftarrow w - \eta \frac{\partial E}{\partial w}$$
• Pooling layer
o Does not learn anything itself; it just reduces the size of the problem by introducing sparseness.
o Reduces each k x k region to a single value during forward propagation.
o The error is propagated back only to the place where that value came from (for max pooling, the position of the maximum), so the errors are rather sparse.
http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/
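Two of the backpropagation points above can be sketched in NumPy (an assumption; these helpers are hypothetical, not taken from any framework). `softmax_nll` returns the negative log-likelihood of a softmax output and its gradient with respect to the scores, p - one_hot(label). `maxpool_backward` routes each pooled gradient back to the position of the maximum in its non-overlapping window, so every other position receives zero error.

```python
import numpy as np

def softmax_nll(z, label):
    """Softmax + negative log-likelihood for one example; returns (loss, dLoss/dz)."""
    p = np.exp(z - z.max())  # subtracting the max keeps exp() from overflowing
    p /= p.sum()
    loss = -np.log(p[label])
    grad = p.copy()
    grad[label] -= 1.0       # d(-log p[label]) / dz = p - one_hot(label)
    return loss, grad

def maxpool_backward(x, grad_out, size=2):
    """Send each pooled gradient back to the argmax of its non-overlapping size x size window."""
    grad_x = np.zeros_like(x)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            grad_x[i * size + r, j * size + c] += grad_out[i, j]
    return grad_x

z = np.array([2.0, 1.0, 0.1])  # toy class scores
loss, grad = softmax_nll(z, 0)

x = np.array([[1., 2., 0., 1.],
              [3., 4., 1., 0.],
              [0., 0., 2., 2.],
              [1., 0., 2., 2.]])
g = maxpool_backward(x, np.ones((2, 2)))
print(g)  # non-zero only where each window's maximum was: the error is sparse
```

The gradient `grad` would then feed the weight update above through the chain rule.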
Theano
What is Theano?
• Theano is a Python-based mathematical expression compiler whose syntax is quite similar to NumPy's.
http://deeplearning.net/
Key Features
• A single implementation that runs on both CPU and GPU.
http://deeplearning.net/
Theano-based implementations for Deep Learning
Caffe
Torch
Keras
Other Frameworks:
cuDNN
DIGITS
Caffe
Key Features
• Deep learning framework (mainly for training CNNs) developed by the Berkeley Vision and Learning Center (BVLC)
• Speed: able to process over 60M images per day with a single NVIDIA K40 GPU, making it one of the fastest convnet implementations available.
www.caffe.berkeleyvision.org
Sneak Peek into Caffe
(Figures: Convolutional Layer, Max Pooling Layer, Solver)
Age and Gender
Classification using
Convolutional Neural
Networks
Gil Levi and Tal Hassner
The Open University of Israel
IEEE Workshop on Analysis and Modeling of Faces and Gestures (AMFG), at the IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, June 2015
Overview
Dataset - The Adience Benchmark
• Consists of images automatically uploaded
to Flickr from smartphones.
Network Architecture
• Input: all 3 RGB channels; images are first resized to 256 x 256, then cropped to 227 x 227
• Convolutional layer 1: 96 filters of size 3x7x7
• Convolutional layer 2: 256 filters of size 96x5x5
• Convolutional layer 3: 384 filters of size 256x3x3
• Each convolutional layer is followed by a rectified linear operator (ReLU), a max pooling layer over 3x3 regions with 2-pixel strides, and a local normalization layer
• Both fully connected layers contain 512 neurons, followed by ReLU and a dropout layer
• Output to class labels (age / gender)
Measures to reduce overfitting
• A lean network architecture with just 3 convolutional layers and 2 fully connected layers, chosen in view of the size of the dataset and the labels involved (8 age classes and 2 gender classes)
All these measures help keep the number of free parameters in the network low, reducing complexity and thus over-fitting.
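The filter sizes listed under Network Architecture are enough to count most of the free parameters. The sketch below assumes one bias per filter or neuron. It leaves out the first fully connected layer, whose input size depends on strides and padding that the slides do not give.

```python
def conv_params(n_filters, in_channels, k):
    """Weights plus one bias per filter, for n_filters filters of size in_channels x k x k."""
    return n_filters * in_channels * k * k + n_filters

def fc_params(n_in, n_out):
    """Weights plus one bias per output neuron for a fully connected layer."""
    return n_in * n_out + n_out

conv = [conv_params(96, 3, 7), conv_params(256, 96, 5), conv_params(384, 256, 3)]
print(conv, sum(conv))                       # [14208, 614656, 885120] 1513984
print(fc_params(512, 512))                   # second fully connected layer: 262656
print(fc_params(512, 8), fc_params(512, 2))  # output layers for 8 age / 2 gender classes: 4104 1026
```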
Experiments
5-fold cross-validation based on a pre-specified distribution of subject-exclusive folds
What I used
• Trained on an NVIDIA Quadro K2200 with 640 CUDA cores and 4 GB GDDR5 RAM
(Figure: Solver)
Results
Gender Classification (accuracy, %)
Method | Paper | Reimplementation
Single-Crop | 85.9 ± 1.4 | 86.7 ± 1.5
Over-Sample | 86.8 ± 1.4 | 87.4 ± 0.9

Age Estimation (accuracy, %)
Method | Paper: Exact | Paper: One-off | Reimplementation: Exact | Reimplementation: One-off
Single-Crop | 49.5 ± 4.4 | 84.6 ± 1.7 | 49.5 ± 3.6 | 85.4 ± 1.8
Over-Sample | 50.7 ± 5.1 | 84.7 ± 2.2 | 50.6 ± 5.0 | 85.8 ± 1.5
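The two age-estimation columns can be made concrete with a short sketch, assuming age groups are encoded as integer indices in their natural order. Exact accuracy counts only the correct group; one-off accuracy also accepts a prediction in an adjacent age group. The labels below are hypothetical, for illustration only.

```python
import numpy as np

def exact_and_one_off(true_groups, pred_groups):
    """Return (exact accuracy, one-off accuracy) for integer age-group indices.

    Indices follow the ordered groups 0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, 60-.
    One-off also counts a prediction in an adjacent group as correct.
    """
    t = np.asarray(true_groups)
    p = np.asarray(pred_groups)
    exact = float(np.mean(p == t))
    one_off = float(np.mean(np.abs(p - t) <= 1))
    return exact, one_off

# Hypothetical labels for five faces.
print(exact_and_one_off([0, 3, 4, 7, 5], [0, 4, 4, 5, 5]))  # (0.6, 0.8)
```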
Results - Age Estimation Confusion Matrix
Rows: actual labels.

Paper
Actual | 0-2 | 4-6 | 8-13 | 15-20 | 25-32 | 38-43 | 48-53 | 60-
0-2 | 0.699 | 0.147 | 0.028 | 0.006 | 0.005 | 0.008 | 0.007 | 0.009
4-6 | 0.256 | 0.573 | 0.166 | 0.023 | 0.010 | 0.011 | 0.010 | 0.005
8-13 | 0.027 | 0.223 | 0.552 | 0.150 | 0.091 | 0.068 | 0.055 | 0.061
15-20 | 0.003 | 0.019 | 0.081 | 0.239 | 0.106 | 0.055 | 0.049 | 0.028
25-32 | 0.006 | 0.029 | 0.138 | 0.510 | 0.613 | 0.461 | 0.260 | 0.108
38-43 | 0.004 | 0.007 | 0.023 | 0.058 | 0.149 | 0.293 | 0.339 | 0.268
48-53 | 0.002 | 0.001 | 0.004 | 0.007 | 0.017 | 0.055 | 0.146 | 0.165
60- | 0.001 | 0.001 | 0.008 | 0.007 | 0.009 | 0.050 | 0.134 | 0.357

Reimplementation
Actual | 0-2 | 4-6 | 8-13 | 15-20 | 25-32 | 38-43 | 48-53 | 60-
0-2 | 0.741 | 0.139 | 0 | 0.028 | 0 | 0 | 0 | 0.093
4-6 | 0.057 | 0.654 | 0.135 | 0.135 | 0 | 0 | 0 | 0.019
8-13 | 0 | 0.114 | 0 | 0.828 | 0.057 | 0 | 0 | 0
15-20 | 0.018 | 0.119 | 0.065 | 0.653 | 0.106 | 0.015 | 0.010 | 0.010
25-32 | 0.009 | 0.094 | 0.009 | 0.471 | 0.292 | 0.037 | 0.037 | 0.047
38-43 | 0.02 | 0 | 0 | 0.22 | 0.56 | 0.14 | 0.06 | 0
48-53 | 0 | 0.1 | 0.033 | 0.067 | 0.133 | 0.267 | 0.4 | 0
60- | 0.238 | 0.012 | 0 | 0.008 | 0 | 0 | 0 | 0.740
References
• http://deeplearning.net/tutorial/
• http://goodfeli.github.io/dlbook/contents/convnets.html
• http://neuralnetworksanddeeplearning.com/chap6.html
• http://deeplearning.net/software/theano/tutorial/
• http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/
• Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
• www.caffe.berkeleyvision.org
• http://www.openu.ac.il/home/hassner/Adience/index.html
• https://www.wikipedia.org/
• www.cse.ust.hk/~leichen/courses/FYTG.../FYTGS5101-Guoyangxie.pdf
• LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
• Bergstra, James, et al. "Theano: a CPU and GPU math expression compiler." Proceedings of the Python for Scientific Computing Conference (SciPy). Vol. 4. 2010.
• Levi, Gil, and Tal Hassner. "Age and Gender Classification using Convolutional Neural Networks." IEEE Workshop on Analysis and Modeling of Faces and Gestures (AMFG), at the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, June 2015.