Convolutional Neural Networks
Alasdair Newson
LTCI, Télécom Paris, IP Paris
[email protected]
Introduction
∗ Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position, Fukushima, K., Biological Cybernetics, 1980
† Receptive fields and functional architecture of monkey striate cortex, Hubel, D. H. and Wiesel, T. N., 1968
‡ Backpropagation Applied to Handwritten Zip Code Recognition, LeCun, Y. et al., AT&T Bell Laboratories, 1989
Introduction - some history
GPUs turned out to be very efficient for training neural nets (lots of
parallel computations)
CNNs are now applied in many domains:
Image classification
Computer graphics
Image restoration
Medical imaging
Automatic speech recognition
Summary
1 Introduction, notation
2 Convolutional Layers
3 Down-sampling and the receptive field
4 CNN details and variants
5 CNNs in practice
6 Image datasets, well-known CNNs, and applications
Applications of CNNs
7 Interpreting CNNs
Visualising CNNs
Adversarial examples
Introduction - some notation
Notation
x ∈ R^n : input vector
y ∈ R^q : output vector
u_ℓ : feature vector at layer ℓ
θ_ℓ : network parameters at layer ℓ
Convolutional Layers
Convolution operator
Let f and g be two integrable functions. The convolution operator ∗ takes as its input two such functions, and outputs another function h = f ∗ g, which is defined at any point t ∈ R as:

h(t) = (f ∗ g)(t) = ∫_{−∞}^{+∞} f(τ) g(t − τ) dτ
Properties of convolution
1 Associativity: (f ∗ g) ∗ h = f ∗ (g ∗ h)
2 Commutativity: f ∗ g = g ∗ f
Associativity, commutativity
Associativity and commutativity imply that we can carry out convolutions in any order
Hence there is no point in having two or more consecutive convolutions: without a non-linearity in between, they collapse into a single convolution
This is in fact true for any composition of linear maps
Equivariance to translation
Convolution is equivariant to translation: convolving a shifted input f(· − τ) with g gives the correspondingly shifted output (f ∗ g)(· − τ), which contains the same information as f ∗ g †
This is useful, since we want to detect objects anywhere in the image
† if we forget about border conditions for a moment
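As a quick numerical illustration (my own sketch, not from the original slides), equivariance is easy to check in NumPy, using circular convolution so that border conditions do not interfere:

```python
import numpy as np

def circ_conv(f, g, n):
    # Circular convolution of length n, computed via the FFT.
    return np.real(np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)))

f = np.array([0., 1., 4., 2., 0., 0., 0., 0.])
g = np.array([1., -1., 0.5])
n = len(f)

conv_of_shifted = circ_conv(np.roll(f, 2), g, n)   # convolve a shifted input
shifted_conv = np.roll(circ_conv(f, g, n), 2)      # shift the convolved output

assert np.allclose(conv_of_shifted, shifted_conv)  # equivariance: identical
```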
Convolutional Layers - 2D Convolution
2D convolution operator
(f ∗ g)(s, t) = Σ_{i=−∞}^{+∞} Σ_{j=−∞}^{+∞} f(i, j) g(s − i, t − j)
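For a filter with finite support, this sum can be implemented directly with loops; the sketch below (my own, purely illustrative: real frameworks use much faster algorithms) computes a "valid" 2D convolution:

```python
import numpy as np

def conv2d(f, g):
    """'Valid' 2D convolution of image f with filter g, via explicit loops."""
    H, W = f.shape
    h, w = g.shape
    g_flip = g[::-1, ::-1]  # convolution flips the filter (otherwise: correlation)
    out = np.zeros((H - h + 1, W - w + 1))
    for s in range(out.shape[0]):
        for t in range(out.shape[1]):
            out[s, t] = np.sum(f[s:s + h, t:t + w] * g_flip)
    return out

f = np.arange(25.).reshape(5, 5)
g = np.array([[1., 0.], [0., -1.]])
print(conv2d(f, g).shape)  # (4, 4)
```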
Convolutional Layers: Visual Illustration
[Sequence of illustration slides stepping the filter across the input, one output position at a time; the images are not preserved in this text version.]
Convolutional Layers
[Diagram: a 1D convolutional layer y = x ∗ w, mapping inputs x_k through filter weights to outputs y_i; it is referred to in the back-propagation example below.]
We have the derivatives ∂L/∂y_i available
We want to calculate the following quantities: ∂L/∂x_k (for further back-propagation) and ∂L/∂w_k
We shall use the abbreviation dy_i := ∂L/∂y_i
Before considering the general case, let's take an example from the illustration above: say we want to calculate dx_1 := ∂L/∂x_1
By the multi-variate chain rule,

dx_1 = Σ_i (∂L/∂y_i) · (∂y_i/∂x_1)
In the general case:

∂L/∂x_k = Σ_i dy_i · ∂y_i/∂x_k        (multi-variate chain rule)
        = Σ_i dy_i · ∂(x ∗ w)_i/∂x_k
        = Σ_i dy_i · ∂(Σ_j x_j w_{i−j})/∂x_k
        = Σ_i dy_i w_{i−k} = Σ_i dy_i w_{−(k−i)}

In other words, the backward pass is itself a convolution, of dy with the flipped filter. In matrix form, writing the convolution as y = A_w x:

dx = A_w^⊤ dy        (1)
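Equation (1) can be checked numerically. In the sketch below (my own, with y = x ∗ w a full 1D convolution), dx_k = Σ_i dy_i w_{i−k} is exactly a "valid" correlation of dy with w, and a finite difference confirms one coordinate:

```python
import numpy as np

np.random.seed(0)
x, w = np.random.randn(6), np.random.randn(3)
y = np.convolve(x, w)              # full convolution: y_i = sum_j x_j w_{i-j}
dy = np.random.randn(len(y))       # pretend upstream gradient dL/dy

# dL/dx_k = sum_i dy_i w_{i-k}: a 'valid' correlation of dy with w.
dx = np.correlate(dy, w, mode='valid')   # length len(x)

# Finite-difference check of dx[1], with L = <y, dy>.
eps = 1e-6
x_pert = x.copy(); x_pert[1] += eps
numerical = (np.convolve(x_pert, w) - y) @ dy / eps
assert np.isclose(numerical, dx[1], atol=1e-4)
```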
Now for the second part: ∂L/∂w_k
Again, we use the chain rule. For example, for the first filter weight a = w_1: da = Σ_i (∂L/∂y_i) · (∂y_i/∂a)
which gives

da = Σ_i dy_i x_{i−1}
In the general case:

∂L/∂w_k = Σ_i dy_i · ∂y_i/∂w_k        (multi-variate chain rule)
        = Σ_i dy_i · ∂(x ∗ w)_i/∂w_k
        = Σ_i dy_i · ∂(Σ_j x_j w_{i−j})/∂w_k
        = Σ_i dy_i x_{i−k} = Σ_i dy_i x_{−(k−i)}        (k = i − j)

So the filter gradient is, likewise, a convolution between dy and the flipped input.
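The same numerical check works for the filter gradient (again my own sketch): ∂L/∂w_k = Σ_i dy_i x_{i−k} is a "valid" correlation of dy with x:

```python
import numpy as np

np.random.seed(1)
x, w = np.random.randn(6), np.random.randn(3)
y = np.convolve(x, w)                    # y_i = sum_j x_j w_{i-j}
dy = np.random.randn(len(y))             # upstream gradient dL/dy

dw = np.correlate(dy, x, mode='valid')   # dL/dw_k = sum_i dy_i x_{i-k}, length len(w)

eps = 1e-6
w_pert = w.copy(); w_pert[0] += eps
numerical = (np.convolve(x, w_pert) - y) @ dy / eps
assert np.isclose(numerical, dw[0], atol=1e-4)
```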
Convolutional Layers - border conditions
2D+feature convolution
Several filters are used per layer, let us say K filters: {w_1, . . . , w_K}. Each produces one feature map, and the K maps are stacked along the feature dimension:

u_{ℓ+1} = [u_ℓ ∗ w_1, . . . , u_ℓ ∗ w_K]
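In PyTorch, this stacking of the K filter responses is exactly what a Conv2d layer computes; a minimal sketch (the channel counts here are illustrative only):

```python
import torch
import torch.nn as nn

# C_in = 3 input channels, K = 16 filters of spatial size 3x3.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

u = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
print(conv(u).shape)            # torch.Size([1, 16, 32, 32]): one map per filter
```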
Convolutional layers - a note on biases
[In practice, each filter k also has a scalar bias b_k, added to its output feature map.]
Down-sampling and the receptive field
The Receptive Field
† Receptive fields and functional architecture of monkey striate cortex, Hubel, D. H. and Wiesel, T. N., 1968. Illustration from: http://www.yorku.ca/eye/cortfld.htm
[Figure: successive convolution + subsampling stages progressively enlarge the receptive field. Illustration from: Applied Deep Learning, Andrei Bursuc, https://www.di.ens.fr/~lelarge/dldiy/slides/lecture_7/]
Strided convolution
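Strided convolution evaluates the filter only every s positions, so convolution and down-sampling happen in a single operation. A minimal PyTorch sketch (shapes are illustrative):

```python
import torch
import torch.nn as nn

# Stride 2: the output is computed at every other position in each direction.
strided = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3,
                    stride=2, padding=1)
u = torch.randn(1, 16, 32, 32)
print(strided(u).shape)  # torch.Size([1, 32, 16, 16]): spatial size halved
```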
Max pooling
Max pooling replaces each window of its input by the maximum value in that window
Back-propagation of max pooling only passes the gradient through the maximum: for example, if a 2×2 window contains the values 10, 80, 15 and 30, the forward pass outputs 80, and during the backward pass only the position of the 80 receives a non-zero gradient.
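A minimal NumPy sketch of this routing (my own illustration, matching the 2×2 example above):

```python
import numpy as np

window = np.array([[10., 80.],
                   [15., 30.]])
grad_out = 1.0                       # gradient arriving at the pooled output

flat_idx = np.argmax(window)         # forward pass: position of the maximum
pooled = window.flat[flat_idx]       # 80.0

grad_in = np.zeros_like(window)      # backward pass: gradient goes to the
grad_in.flat[flat_idx] = grad_out    # argmax only, zero everywhere else
print(grad_in)                       # [[0. 1.], [0. 0.]]
```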
Dilated Convolution
(u ∗ v)(y, x) = Σ_{i,j,k} u(i, j, k) v(y − Di, x − Dj, k)        (2)

where D is the dilation factor.
∗ Multi-Scale Context Aggregation by Dilated Convolutions, Yu, F. and Koltun, V., ICLR 2016
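A minimal PyTorch sketch of a dilated layer (channel counts illustrative): with D = 2, a 3×3 filter taps a 5×5 neighbourhood, enlarging the receptive field at no extra parameter cost:

```python
import torch
import torch.nn as nn

dilated = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3,
                    dilation=2, padding=2)   # padding=2 preserves spatial size
u = torch.randn(1, 8, 32, 32)
print(dilated(u).shape)  # torch.Size([1, 8, 32, 32])
```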
Locally connected layers / unshared convolution
Like a convolution, but each output position has its own filter weights: there is no weight sharing across positions
How to build your CNN?
Architecture: vanilla CNN
[Figure: a stack of convolution + subsampling blocks, followed by fully connected layers; see the sketch below.]
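A minimal PyTorch sketch of such a vanilla architecture (my own, with illustrative layer sizes for 28×28 grayscale inputs such as MNIST):

```python
import torch
import torch.nn as nn

class VanillaCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Alternating convolution + non-linearity + subsampling blocks.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Fully connected layers on the flattened feature maps (28 -> 14 -> 7).
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

net = VanillaCNN()
print(net(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```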
CNN programming frameworks
Caffe
Open source, developed by the University of California, Berkeley
Networks are defined in separate configuration files
Somewhat laborious to use, less used than other frameworks
Theano
Open source, created by the Université de Montréal
Development has been discontinued due to strong competition
TensorFlow
Open source, developed by Google
Implements a wide range of deep learning functionalities, widely used
PyTorch
Open source, developed by Facebook
Implements a wide range of deep learning functionalities, widely used
MNIST dataset
Caltech 101
ImageNet dataset
LeNet (1989/1998)
Illustration from: Gradient-Based Learning Applied to Document Recognition, LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P., Proceedings of the IEEE, 1998
AlexNet (2012)
Illustration from: ImageNet classification with deep convolutional neural networks, Krizhevsky, A., Sutskever, I. and Hinton, G. E., NIPS, 2012
GoogLeNet (2015)
VGG16 (2015)
Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan, K. and Zisserman, A., ICLR, 2015
Illustration from Mathieu Cord, https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/
Summary of advances in CNNs
[Chart: classification error on the ILSVRC challenge, decreasing sharply from 2011 to 2015: SVM (2011), AlexNet (2012), improved AlexNet variants (2013), VGG16 and GoogLeNet (2014), deep ResNets (2015). Networks discussed: LeNet (1998), AlexNet (2012), GoogLeNet (2014), VGG16 (2015).]
Image classification
∗ Deep Residual Learning for Image Recognition, He, K. et al., CVPR, 2016
Illustration from https://becominghuman.ai/resnet-convolution-neural-network-e10921245d3d
[Figures: the attention mechanism, built from queries Q and keys K.]
∗ Attention is all you need, Vaswani, A. et al., NIPS, 2017
∗ https://paperswithcode.com/sota/image-classification-on-imagenet
∗ Rich feature hierarchies for accurate object detection and semantic segmentation, Girshick, R. et al., CVPR 2014
Motion estimation
Optical flow
Illustration from: BriefMatch: Dense binary feature matching for real-time optical flow estimation, Eilertsen, G., Forssén, P.-E. and Unger, J., Scandinavian Conference on Image Analysis, 2017
Motion estimation with CNNs
∗ FlowNet: Learning Optical Flow with Convolutional Networks, Fischer, P. et al., ICCV 2015
Super-resolution
∗ Learning a deep convolutional network for image super-resolution, Dong, C. et al., ECCV 2014
Point clouds
CNNs require data on a regular grid; point clouds are not in this format
Nevertheless, ways have been found to deal with this, for example by voxelising the points onto a regular 3D grid
∗ 3D ShapeNets: A Deep Representation for Volumetric Shapes, Wu, Z. et al., CVPR, 2015
Interpreting CNNs
Adversarial examples
∗ https://www.darpa.mil/program/explainable-artificial-intelligence
Visualising CNNs
[Figure: filters learned at layer 1 and layer 3 of a trained CNN.]
Much research has therefore been dedicated to visualising CNNs
Idea: "invert" the CNN, find the input x which maximises the output of a certain layer
Understand what this layer is "seeing"
This is possible due to backpropagation
Gradient ascent: on the input image rather than on the weights (see the sketch below)
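A bare-bones sketch of this gradient ascent (my own illustration; the pretrained `model` and the activation score to maximise are hypothetical placeholders):

```python
import torch

def visualise(model, activation_score, steps=200, lr=0.1):
    """Gradient ascent on the input image to maximise a chosen activation."""
    x = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = activation_score(model, x)  # e.g. one neuron, channel or class
        (-score).backward()                 # ascend by minimising the negative
        opt.step()
    return x.detach()

# Hypothetical usage: maximise the logit of class 123.
# img = visualise(model, lambda m, x: m(x)[0, 123])
```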
Visualising features
† Visualizing Higher-Layer Features of a Deep Network, Erhan, D., Bengio, Y., Courville, A. and Vincent, P., University of Montreal, 2009
† Understanding Deep Image Representations by Inverting Them, Mahendran, A. and Vedaldi, A., Conference on Computer Vision and Pattern Recognition, 2015
† Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, Simonyan, K., Vedaldi, A. and Zisserman, A., arXiv preprint arXiv:1312.6034, 2013
DeepDream - a code example for visualizing neural networks, Mordvintsev, A., Olah, C. and Tyka, M., Google Research, 2015
Adversarial examples
We often get the impression that CNNs are the be-all and end-all of AI
They consistently produce state-of-the-art results on images
However, CNNs are not infallible: adversarial examples† !
† Intriguing properties of neural networks, Szegedy, C. et al., arXiv preprint arXiv:1312.6199, 2013
Illustration from: Visualizing the Loss Landscape of Neural Nets, Li, H. et al., NIPS, 2018
Consider a small adversarial perturbation r added to the input, giving x̂ = x + r. The output of a fully connected layer is then ⟨w, x̂⟩ = ⟨w, x⟩ + ⟨w, r⟩
† Explaining and Harnessing Adversarial Examples, Goodfellow, I. J., Shlens, J. and Szegedy, C., ICLR 2015
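The paper cited above introduces the fast gradient sign method (FGSM), which takes r = ε·sign(∇_x L). A minimal sketch (the pretrained `model` and the pair `x`, `label` are hypothetical placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.01):
    """Fast gradient sign method: move x by eps along the gradient sign."""
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    r = eps * x.grad.sign()         # small perturbation, large effect on output
    return (x + r).detach()

# Hypothetical usage: often enough to flip the network's prediction.
# x_adv = fgsm(model, x, label, eps=0.007)
```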
† Universal adversarial perturbations, Moosavi-Dezfooli, S.-M. et al., arXiv preprint, 2017
† Adversarial Examples in the Physical World, Kurakin, A., Goodfellow, I. J. and Bengio, S., ICLR workshop, 2017
Summary