
Convolutional Neural Networks

Alasdair Newson
LTCI, Télécom Paris, IP Paris

A. Newson 1
Introduction

Neural networks provide a highly flexible way to model complex dependencies and patterns in data

In the previous lessons, we saw the following elements :

MLPs : fully connected layers, biases
Activation functions : sigmoid, softmax, ReLU
Optimisation : gradient descent, stochastic gradient descent
Regularisation : weight decay, dropout, batch normalisation
RNNs : for sequential data

(Figure: fully connected layers followed by non-linearities)
Introduction

In MLPs, every layer of the network is fully connected

Unfortunately, there are major drawbacks with such an approach

(Figure: a 256 × 256 input image fully connected to a layer of 1000 hidden units)

Each hidden unit is connected to each input unit

There is high redundancy in these weights :
In the above example, 256 × 256 × 1000 ≈ 65 million weights are required
Introduction

For many types of data with grid-like topological structures (e.g. images), it is not necessary to have so many weights
For these data, the convolution operation is often extremely useful
Reduces the number of parameters to train
Training is faster
Convergence is easier : smaller parameter space

A neural network with convolution operations is known as a Convolutional Neural Network (CNN)
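The weight counts can be checked with a few lines of Python. This is a minimal sketch. It assumes a single-channel input and ignores biases, and the number of filters (64) is an arbitrary illustrative choice. The function names are hypothetical.

```python
def dense_weights(n_inputs: int, n_hidden: int) -> int:
    """Fully connected layer: every hidden unit is connected to every input unit."""
    return n_inputs * n_hidden


def conv_weights(kernel_h: int, kernel_w: int, in_channels: int = 1, n_filters: int = 1) -> int:
    """Convolutional layer: each filter has kernel_h * kernel_w * in_channels weights,
    shared across all image positions, so the count does not depend on the image size."""
    return kernel_h * kernel_w * in_channels * n_filters


print(dense_weights(256 * 256, 1000))       # 65536000, i.e. about 65 million
print(conv_weights(3, 3))                   # 9
print(conv_weights(5, 5))                   # 25
print(conv_weights(3, 3, 1, n_filters=64))  # 576
```

Even with 64 filters, the convolutional layer has five orders of magnitude fewer weights than the fully connected one.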
Introduction - some history

“Neocognitron” of Fukushima∗ : first to incorporate the notion of receptive field into a neural network, based on the work of Hubel and Wiesel† on animal perception
Yann LeCun was the first to propose back-propagation for training convolutional neural networks‡
Automatic learning of parameters instead of hand-crafted weights
However, training was very long : it required 3 days (in 1990)

∗ Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position, Fukushima, K., Biological Cybernetics, 1980
† Receptive fields and functional architecture of monkey striate cortex, Hubel, D. H. and Wiesel, T. N, 1968
‡ Backpropagation Applied to Handwritten Zip Code Recognition, LeCun, Y. et al., AT&T Bell Laboratories
Introduction - some history

In the years 1998-2012, research continued on shallow and deep neural networks, but other machine learning approaches were preferred (GMMs, SVMs etc.)

In 2012, Alex Krizhevsky et al. used Graphics Processing Units (GPUs) to carry out back-propagation on a very deep convolutional neural network
It greatly outperformed classic approaches in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

GPUs turned out to be very efficient for training neural nets (lots of parallel computations)

This signalled the beginning of the deep learning revolution
Introduction - some history

Since 2012, CNNs have completely revolutionised many domains

CNNs produce competitive or state-of-the-art results for most problems in image processing and computer vision

(Figure: applications of deep learning)
Image classification
Image style transfer (A Neural Algorithm of Artistic Style, Gatys et al, CVPR 2015)
Computer graphics (from AtlasNet, Groueix et al, CVPR, 2018)
Image restoration
Medical imaging (Medical Image Classification with Convolutional Neural Network, Li et al., ICARCV, 2014)
Automatic speech recognition

Being applied to an ever-increasing number of problems
Summary

1 Introduction, notation
2 Convolutional Layers
3 Down-sampling and the receptive field
4 CNN details and variants
5 CNNs in practice
6 Image datasets, well-known CNNs, and applications
Applications of CNNs
7 Interpreting CNNs
Visualising CNNs
Adversarial examples

Introduction - some notation

Notations
$x \in \mathbb{R}^n$ : input vector
$y \in \mathbb{R}^q$ : output vector
$u_\ell$ : feature vector at layer $\ell$
$\theta_\ell$ : network parameters at layer $\ell$

(Figure: neural network with L layers)
Introduction

A “Convolutional Neural Network” (CNN) is simply a succession of :
1 Convolutions (filters)
2 Additive biases
3 Down-sampling (“Max-Pooling” etc.)
4 Non-linearities

In this lesson, we will mainly concentrate on convolutional and down-sampling layers
Convolutional Layers

Convolution operator
Let f and g be two integrable functions. The convolution operator ∗ takes as its input two such functions, and outputs another function h = f ∗ g, which is defined at any point t ∈ R as :
$$h(t) = (f * g)(t) = \int_{-\infty}^{+\infty} f(\tau)\, g(t - \tau)\, d\tau.$$

Intuitively, h(t) is the inner product between f and a flipped version of g, shifted by t
Convolutional Layers

In many practical applications, in particular for CNNs, we use the discrete convolution operator, which acts on discretised functions

Discrete convolution operator

Let $(f_n)$ and $(g_n)$ be two summable sequences, with $n \in \mathbb{Z}$. The discrete convolution operator is defined as :
$$(f * g)(n) = \sum_{i=-\infty}^{+\infty} f(i)\, g(n - i)$$

Intuitively, (f ∗ g)(n) is the inner product between f and a flipped version of g, shifted by n
In practice, the filter has a small spatial support, around 3 × 3 or 5 × 5
Therefore, only a small number of parameters need to be trained (9 or 25 for these filters)
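Here is a direct transcription of the discrete formula. It is a sketch that assumes finite sequences which are zero outside their stored indices, so the infinite sum reduces to a finite one. The function name is hypothetical.

```python
from typing import List


def conv1d_full(f: List[float], g: List[float]) -> List[float]:
    """Discrete convolution (f * g)(n) = sum_i f(i) g(n - i).

    f and g are finite sequences assumed to be zero outside indices 0..len-1,
    so the result has len(f) + len(g) - 1 (possibly) non-zero samples.
    """
    out = [0.0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        for i in range(len(f)):
            if 0 <= n - i < len(g):  # g(n - i) is zero outside its support
                out[n] += f[i] * g[n - i]
    return out


x = [1.0, 2.0, 3.0]
w = [0.0, 1.0, 0.5]       # a small 3-tap filter: only 3 parameters
print(conv1d_full(x, w))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```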
Convolutional Layers

Properties of convolution
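1 Associativity : $(f * g) * h = f * (g * h)$

2 Commutativity : $f * g = g * f$

3 Bilinearity : $(\alpha f_1 + \beta f_2) * g = \alpha (f_1 * g) + \beta (f_2 * g)$, and similarly in the second argument; in particular $(\alpha f) * (\beta g) = \alpha\beta (f * g)$, for $(\alpha, \beta) \in \mathbb{R} \times \mathbb{R}$

4 Equivariance to translation : $(f * (g + \tau))(t) = (f * g)(t + \tau)$, where $(g + \tau)(t) := g(t + \tau)$ denotes g shifted by τ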
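The properties can be checked numerically on finite sequences. This is a sketch only. The shift is modelled as a delay by prepending zeros, which keeps indices non-negative. The slide's convention, (g + τ)(t) = g(t + τ), differs only in the sign of τ. Exact equality holds here because all values are small binary fractions; in general, compare up to rounding.

```python
from typing import List


def conv(f: List[float], g: List[float]) -> List[float]:
    """Full discrete convolution of two finite sequences (zero outside their support)."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj  # contribution f(i) g(n - i) with n = i + j
    return h


def delay(g: List[float], tau: int) -> List[float]:
    """Shift g by tau samples (prepend zeros)."""
    return [0.0] * tau + list(g)


f = [1.0, 2.0, -1.0]
g = [0.5, 0.0, 3.0]
h = [2.0, -1.0]

print(conv(f, g) == conv(g, f))                      # commutativity: True
print(conv(conv(f, g), h) == conv(f, conv(g, h)))    # associativity: True
print(conv(f, delay(g, 2)) == delay(conv(f, g), 2))  # shifting g shifts f * g: True
```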
Convolutional Layers

Associativity, commutativity
Associativity+commutativity implies that we can carry out convolutions in any order
Two consecutive convolutions (with no non-linearity in between) are equivalent to a single convolution with $w_1 * w_2$, so there is no point in stacking them
This is in fact true for any composition of linear maps

Equivariance to translation
Equivariance implies that the convolution of a shifted input, (f + τ) ∗ g, is simply a shifted version of f ∗ g : it contains the same information †
This is useful, since we want to detect objects anywhere in the image

† ignoring border conditions for the moment
Convolutional Layers - 2D Convolution

Most often, we are going to be working with images

Therefore, we require a 2D convolution operator : this is defined in a very similar manner to 1D convolution :

2D convolution operator
$$(f * g)(s, t) = \sum_{i=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} f(i, j)\, g(s - i, t - j)$$

Important remarks for the rest of the lesson!

We are going to denote the filters with w
For lighter notation, we write $w_i := w(i)$ (and the same for $x_i$ etc.)
Convolutional Layers : Visual Illustration

(Figure sequence: step-by-step illustration of the convolution operation)
Convolutional Layers

The filter weights $w_i$ determine what type of “feature” can be detected by convolutional layers
Example, Sobel filters :

Horizontal edge : $\begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}$     Vertical edge : $\begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$
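Below is a sketch of the 2D convolution formula restricted to positions where the filter fits inside the image, applied to the two Sobel filters. The function and constant names are hypothetical. Because this is a true convolution, the kernel is flipped, so a dark-to-bright transition gives a negative response with this vertical-edge kernel. Many libraries compute cross-correlation instead (PyTorch documents `torch.nn.Conv2d` as a cross-correlation), which flips that sign. For learned filters the distinction does not matter.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2D convolution (f * g)(s, t) = sum_{i,j} f(i, j) g(s - i, t - j), restricted to
    positions where the kernel lies entirely inside the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # The kernel is read in flipped order: convolution, not cross-correlation
                    acc += image[y + i][x + j] * kernel[kh - 1 - i][kw - 1 - j]
            out[y][x] = acc
    return out


SOBEL_HORIZONTAL_EDGE = [[-1.0, -2.0, -1.0], [0.0, 0.0, 0.0], [1.0, 2.0, 1.0]]
SOBEL_VERTICAL_EDGE = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]

# A 3 x 5 image, dark on the left and bright on the right: a single vertical edge
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(3)]
print(conv2d_valid(image, SOBEL_VERTICAL_EDGE))    # [[-4.0, -4.0, 0.0]]
print(conv2d_valid(image, SOBEL_HORIZONTAL_EDGE))  # [[0.0, 0.0, 0.0]]
```

The vertical-edge filter responds only where its window overlaps the edge. The horizontal-edge filter does not respond at all.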
Convolutional Layers

Convolutional filters can also act as low-pass/smoothing filters

(Figure: input image and low-pass filtered image)
Convolutional Layers

We can also write convolution as a matrix/vector product, as in the case of fully connected layers
Example : discrete Laplacian operator

$$w = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{pmatrix} \;\longrightarrow\; A_w = \begin{pmatrix} 4 & -1 & 0 & \cdots & -1 & \cdots & 0 \\ -1 & 4 & -1 & \cdots & \cdots & -1 & \cdots \\ 0 & -1 & 4 & -1 & \cdots & \cdots & -1 \\ \vdots & & \ddots & \ddots & \ddots & & \vdots \\ 0 & \cdots & -1 & \cdots & \cdots & -1 & 4 \end{pmatrix} \in \mathbb{R}^{K \times n}, \qquad y = A_w x$$

This further illustrates the drastic reduction in weight parameters (9 instead of Kn)
It can be useful to view convolution in this manner (we will use this later)
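To keep the matrix small, this sketch builds $A_w$ for a 1D convolution restricted to positions where the filter fits inside x, rather than for the 2D Laplacian. The helper names are hypothetical. Every row holds the same flipped filter, shifted by one column.

```python
from typing import List

Matrix = List[List[float]]


def conv_matrix(w: List[float], n: int) -> Matrix:
    """Matrix A_w such that A_w x is the 1D convolution of x (length n) with w,
    computed only where the filter lies inside x.

    Each row contains the flipped filter, shifted by one column with respect to
    the previous row: only len(w) distinct weights appear in the whole matrix.
    """
    k = len(w)
    A = [[0.0] * n for _ in range(n - k + 1)]
    for r in range(n - k + 1):
        for j in range(k):
            A[r][r + j] = w[k - 1 - j]
    return A


def matvec(A: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


w = [1.0, 2.0, 3.0]
x = [1.0, 0.0, 0.0, 1.0, 0.0]
A = conv_matrix(w, len(x))
for row in A:
    print(row)       # [3.0, 2.0, 1.0, 0.0, 0.0], then shifted copies
print(matvec(A, x))  # [3.0, 1.0, 2.0]
```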
Convolutional Layers

At this point, it is good to have a more “neural network”-based illustration of CNNs

(Figure: a convolutional layer drawn as a neural network)

We can see two of the main justifications for CNNs

1 Sparse connectivity
2 Weight sharing
Convolutional Layers

Now that we understand convolution, how do we optimise a neural network with convolutional layers ? Back-propagation
Consider a layer with just a convolution with w

We have the derivatives $\frac{\partial L}{\partial y_i}$ available
We want to calculate the following quantities : $\frac{\partial L}{\partial x_k}$ (for further back-propagation) and $\frac{\partial L}{\partial w_k}$
We shall use the abbreviation $dy_i := \frac{\partial L}{\partial y_i}$
Convolutional Layers

Before considering the general case, let us take an example from the illustration above

Say we want to calculate $dx_1 := \frac{\partial L}{\partial x_1}$
Convolutional Layers

Each output $y_i$ depends on the inputs $x_j$ and on the weights $w_k$

Therefore, we can consider that the loss is a function of several variables :

$$L = f(x_1, \dots, x_n, w_1, \dots, w_K, y_1(x_\cdot, w_\cdot), \dots, y_m(x_\cdot, w_\cdot))$$

We use the multi-variate chain rule

$$dx_1 = \sum_i \frac{\partial L}{\partial y_i} \frac{\partial y_i}{\partial x_1}$$
Convolutional Layers

$dx_1 = \, ?$
Convolutional Layers

$$dx_1 = dy_0 \frac{\partial y_0}{\partial x_1} + dy_1 \frac{\partial y_1}{\partial x_1} + dy_2 \frac{\partial y_2}{\partial x_1} = dy_0\, c + dy_1\, b + dy_2\, a$$

As we can see, the order of the weights is flipped
Convolutional Layers
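Now, let us calculate $\frac{\partial L}{\partial x_k}$ for any k

$$\begin{aligned}
\frac{\partial L}{\partial x_k} &= \sum_i dy_i \frac{\partial y_i}{\partial x_k} && \text{(multi-variate chain rule)} \\
&= \sum_i dy_i \frac{\partial (x * w)_i}{\partial x_k} \\
&= \sum_i dy_i \frac{\partial \sum_j x_j w_{i-j}}{\partial x_k} \\
&= \sum_i dy_i\, w_{i-k} = \sum_i dy_i\, w_{-(k-i)}
\end{aligned}$$

More compactly : $dx_k = (dy * \mathrm{flip}(w))_k$, where $\mathrm{flip}(w)_j := w_{-j}$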
Convolutional Layers

Recall that the convolution operator can be written $y = A_w x$, with $A_w$ the convolution matrix
The flipping of the weights corresponds to a transpose of $A_w$

$$dx = A_w^\top dy \qquad (1)$$

This gives an easy method of back-propagation in convolutional layers

In practice, you will not actually have to implement this yourself
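Equation (1) can be verified numerically. This sketch uses a toy loss $L = \sum_m dy_m y_m$, chosen so that $\partial L / \partial y_m = dy_m$ exactly. It uses a 1D "full" convolution matrix (zero outside the filter). It then compares $A_w^\top dy$ with the flipped-weight formula and with central finite differences. The loss and all names are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def conv_full_matrix(w: List[float], n: int) -> Matrix:
    """A_w with (A_w x)_m = sum_i x_i w_{m-i} (full convolution, zero outside w)."""
    K = len(w)
    return [[w[m - i] if 0 <= m - i < K else 0.0 for i in range(n)]
            for m in range(n + K - 1)]


def matvec(A: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def toy_loss(x: List[float], w: List[float], dy: List[float]) -> float:
    # L = sum_m dy_m * y_m with y = A_w x, so that dL/dy_m = dy_m exactly
    y = matvec(conv_full_matrix(w, len(x)), x)
    return sum(d * ym for d, ym in zip(dy, y))


x = [0.5, -1.0, 2.0, 0.0]
w = [1.0, -2.0, 0.5]
dy = [0.1, 0.2, -0.3, 0.4, 0.5, -0.6]  # one value per output y_m (len(x) + len(w) - 1)

# Back-propagation through the convolution: dx = A_w^T dy
dx = matvec(transpose(conv_full_matrix(w, len(x))), dy)

# Same quantity with the weights read in flipped order: dx_k = sum_j dy_{k+j} w_j = (dy * flip(w))_k
dx_flip = [sum(dy[k + j] * w[j] for j in range(len(w))) for k in range(len(x))]

# Independent check with central finite differences
eps = 1e-6
dx_fd = []
for k in range(len(x)):
    x_plus, x_minus = list(x), list(x)
    x_plus[k] += eps
    x_minus[k] -= eps
    dx_fd.append((toy_loss(x_plus, w, dy) - toy_loss(x_minus, w, dy)) / (2 * eps))

print(all(abs(a - b) < 1e-6 for a, b in zip(dx, dx_fd)))     # True
print(all(abs(a - b) < 1e-12 for a, b in zip(dx, dx_flip)))  # True
```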
Convolutional Layers
Now for the second part : $\frac{\partial L}{\partial w_k}$

Again, we use the chain rule. For example $da = \sum_i \frac{\partial L}{\partial y_i} \frac{\partial y_i}{\partial a}$
Convolutional Layers

We have $y_i = a x_{i-1} + b x_i + c x_{i+1}$, hence

$$da = \sum_i dy_i\, x_{i-1}$$
Convolutional Layers

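In the general case, we have:

$$\begin{aligned}
\frac{\partial L}{\partial w_k} &= \sum_i dy_i \frac{\partial y_i}{\partial w_k} && \text{(multi-variate chain rule)} \\
&= \sum_i dy_i \frac{\partial (x * w)_i}{\partial w_k} \\
&= \sum_i dy_i \frac{\partial \sum_j x_j w_{i-j}}{\partial w_k} \\
&= \sum_i dy_i\, x_{i-k} = \sum_i dy_i\, x_{-(k-i)} && (k = i - j)
\end{aligned}$$

More compactly : $dw_k = (dy * \mathrm{flip}(x))_k$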
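This second check covers the weight gradient. It uses the same toy loss $L = \sum_n dy_n y_n$ as an illustrative assumption, and the formula $dw_k = \sum_n dy_n x_{n-k}$ for a 1D full convolution. Every input sample contributes to the gradient of every weight, which is the weight sharing discussed below. Function names are hypothetical.

```python
from typing import List


def conv_full(x: List[float], w: List[float]) -> List[float]:
    # y_n = sum_i x_i w_{n-i} for finite sequences (zero outside their support)
    y = [0.0] * (len(x) + len(w) - 1)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            y[i + j] += xi * wj
    return y


def grad_w(dy: List[float], x: List[float], n_w: int) -> List[float]:
    # dw_k = sum_n dy_n x_{n-k} = (dy * flip(x))_k: every input sample contributes
    return [sum(dy[k + i] * x[i] for i in range(len(x))) for k in range(n_w)]


def toy_loss(x: List[float], w: List[float], dy: List[float]) -> float:
    # L = sum_n dy_n y_n, so that dL/dy_n = dy_n exactly
    return sum(d * yn for d, yn in zip(dy, conv_full(x, w)))


x = [0.5, -1.0, 2.0, 0.0]
w = [1.0, -2.0, 0.5]
dy = [0.1, 0.2, -0.3, 0.4, 0.5, -0.6]

dw = grad_w(dy, x, len(w))

eps = 1e-6
dw_fd = []
for k in range(len(w)):
    w_plus, w_minus = list(w), list(w)
    w_plus[k] += eps
    w_minus[k] -= eps
    dw_fd.append((toy_loss(x, w_plus, dy) - toy_loss(x, w_minus, dy)) / (2 * eps))

print([round(v, 6) for v in dw])                          # [-0.75, 1.2, 0.45]
print(all(abs(a - b) < 1e-6 for a, b in zip(dw, dw_fd)))  # True
```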
Convolutional Layers

Note : the gradient of the loss w.r.t. one parameter $w_k$ involves the entire image

Weights are “shared” across the entire image

This notion of weight sharing is one of the main justifications for using CNNs

In practice, we do not calculate $dw_k$ and $dx_k$ ourselves : we use the automatic differentiation tools of TensorFlow, PyTorch etc.
Convolutional Layers - border conditions

The convolution operator poses a problem at the borders

Theoretically, we consider functions defined over an infinite domain, but which have compact support

In reality, we only have finite vectors/matrices to work on
Convolutional Layers - border conditions

Two common approaches to border conditions

“VALID” approach
Only take shifts/dot products that do not extend beyond Supp(u)
Output size : m − |w| + 1

“SAME” approach
Keep output size m
Need to choose values outside of Supp(u) : zero-padding

(Figure: zero-padding around the input)
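Both border conditions fit in one small 1D function. This is a sketch under two assumptions. First, "same" pads `len(w) // 2` zeros on each side, which keeps the size m only for odd filter lengths. Second, the function name and its `padding` argument are illustrative and do not reproduce any particular library's API.

```python
from typing import List


def conv1d(x: List[float], w: List[float], padding: str = "valid") -> List[float]:
    """1D convolution with 'valid' or 'same' border handling.

    'valid': only positions where the filter lies inside x, output size m - |w| + 1.
    'same' : zero-pad x with len(w) // 2 samples on each side so the output keeps
             size m (assumes an odd filter length).
    """
    K = len(w)
    if padding == "same":
        pad = [0.0] * (K // 2)
        x = pad + list(x) + pad
    elif padding != "valid":
        raise ValueError("padding must be 'valid' or 'same'")
    return [sum(x[m + j] * w[K - 1 - j] for j in range(K)) for m in range(len(x) - K + 1)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 1.0, 1.0]
print(conv1d(x, w, "valid"))  # [6.0, 9.0, 12.0]            (size 5 - 3 + 1 = 3)
print(conv1d(x, w, "same"))   # [3.0, 6.0, 9.0, 12.0, 9.0]  (size 5)
```

The border outputs of "same" (3.0 and 9.0) are smaller because part of the window covers padded zeros.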
2D+feature convolution

Several filters are used per layer, let us say K filters : $\{w_1, \dots, w_K\}$

The resulting vectors/images are then stacked together to produce the next layer’s input $u_{\ell+1} \in \mathbb{R}^{m \times n \times K}$

$$u_{\ell+1} = [u * w_1, \dots, u * w_K]$$

Therefore, the next layer’s filters must have a depth of K. The 2D convolution with an image of depth K is defined as
$$(u * w)_{y,x} = \sum_{i,j,k} u(i, j, k)\, w(y - i, x - j, k)$$

Useful explanation : https://towardsdatascience.com/a-comprehensive-introduction-to-different-types-of-convolutions-in-deep-learning-669281e58215
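This sketch implements the depth-aware formula for a small volume indexed `[row][column][channel]`, computing only positions where the filter fits spatially. Each filter sums over all input channels and yields one output channel, and the K responses are stacked. The helper names and the constant test values are illustrative assumptions.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [row][column][channel]


def conv2d_multichannel(u: Volume, filters: List[Volume]) -> Volume:
    """Convolution of a depth-C image with K filters of depth C, computed only
    where each filter lies entirely inside the image.

    Each filter sums over all C input channels and gives one output channel;
    the K responses are stacked, so the output has depth K.
    """
    H, W, C = len(u), len(u[0]), len(u[0][0])
    kh, kw = len(filters[0]), len(filters[0][0])
    out = []
    for y in range(H - kh + 1):
        row = []
        for x in range(W - kw + 1):
            responses = []
            for f in filters:
                acc = 0.0
                for i in range(kh):
                    for j in range(kw):
                        for c in range(C):
                            # spatially flipped filter: convolution rather than cross-correlation
                            acc += u[y + i][x + j][c] * f[kh - 1 - i][kw - 1 - j][c]
                responses.append(acc)
            row.append(responses)
        out.append(row)
    return out


def constant_volume(h: int, w: int, d: int, value: float) -> Volume:
    return [[[value] * d for _ in range(w)] for _ in range(h)]


u = constant_volume(4, 4, 3, 1.0)          # 4 x 4 image with 3 channels
filters = [constant_volume(3, 3, 3, 1.0),  # K = 2 filters of size 3 x 3 x 3
           constant_volume(3, 3, 3, 0.5)]
features = conv2d_multichannel(u, filters)
print(len(features), len(features[0]), len(features[0][0]))  # 2 2 2
print(features[0][0])                                         # [27.0, 13.5]
```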
Convolutional layers

Illustration of several consecutive convolutional layers with different numbers of filters

Each layer contains an “image” with a depth, where each channel corresponds to a different filter response
Each layer is a concatenation of several features : rich information
Convolutional layers - a note on Biases

A note on biases in neural networks : each output channel (i.e. each filter) is associated with one bias

There is not one bias per pixel

This is coherent with the idea of weight sharing (bias sharing)
Convolutional Layers

In many cases, we are primarily interested in detection

We would like to detect objects wherever they are in the image

Formally, we would like to have some shift invariance property

This is done in CNNs by using subsampling, or some variant :
Strided convolutions
Max pooling
We explain these now
Down-sampling and the receptive field
The Receptive Field

Neural networks were initially inspired by the brain’s functioning

Hubel and Wiesel† showed that the visual cortex of cats and monkeys contained cells which individually responded to different small regions of the visual field
The region which an individual cell responds to is known as the “receptive field” of that cell

† Receptive fields and functional architecture of monkey striate cortex, Hubel, D. H. and Wiesel, T. N, 1968. Illustration from : http://www.yorku.ca/eye/cortfld.htm
The Receptive Field

This idea was imitated in convolutional neural networks by adding down-sampling operations

(Figure: convolution + subsampling)

Illustration from : Applied Deep Learning, Andrei Bursuc, https://www.di.ens.fr/~lelarge/dldiy/slides/lecture_7/
Strided convolution

Strided convolution is simply convolution, followed by subsampling

Subsampling operator (for 1D case)

Let $x \in \mathbb{R}^n$. We define the subsampling step as $\delta > 1$, and the subsampling operator $S_\delta : \mathbb{R}^n \to \mathbb{R}^{n/\delta}$, applied to x, as
$$S_\delta(x)(t) = x(\delta t), \quad \text{for } t = 0, \dots, \frac{n}{\delta} - 1$$
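The following sketch shows that "convolution, then $S_\delta$" and a convolution evaluated only at every δ-th position give the same result. The second form is cheaper, since it never computes the discarded outputs. It uses 1D convolution restricted to positions where the filter fits inside x. Note that the slicing `x[::delta]` keeps ⌈n/δ⌉ samples when δ does not divide n. Names are hypothetical.

```python
from typing import List


def conv1d_valid(x: List[float], w: List[float]) -> List[float]:
    K = len(w)
    return [sum(x[m + j] * w[K - 1 - j] for j in range(K)) for m in range(len(x) - K + 1)]


def subsample(x: List[float], delta: int) -> List[float]:
    """S_delta(x)(t) = x(delta * t): keep one sample out of delta."""
    return x[::delta]


def strided_conv1d(x: List[float], w: List[float], delta: int) -> List[float]:
    """Same result as subsample(conv1d_valid(x, w), delta), without computing discarded outputs."""
    K = len(w)
    return [sum(x[m + j] * w[K - 1 - j] for j in range(K))
            for m in range(0, len(x) - K + 1, delta)]


x = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
w = [1.0, 0.0, -1.0]
print(conv1d_valid(x, w))                # [8.0, 12.0, 16.0, 20.0, 24.0]
print(subsample(conv1d_valid(x, w), 2))  # [8.0, 16.0, 24.0]
print(strided_conv1d(x, w, 2))           # [8.0, 16.0, 24.0]
```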
Max pooling

Max pooling subsampling consists of taking the maximum value over a certain region
This maximum value is the new subsampled value
We will denote the max pooling operator by $S_m$

(Figure: max pooling)
Max pooling

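Back-propagation of max pooling only passes the gradient through the maximum

(Figure: max pooling of the region [10 80 ; 15 30] outputs 80; in back-propagation, the gradient is passed to the position of 80 only, and the other positions receive 0)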
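Here is a sketch of 2 × 2 max pooling with stride 2, covering both the forward pass and the gradient routing described above, on the same [10 80 ; 15 30] region. Two assumptions apply. First, ties go to the first maximum found. Second, an odd last row or column is dropped. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Position = Tuple[int, int]


def max_pool_2x2(x: Matrix) -> Tuple[Matrix, List[List[Position]]]:
    """2 x 2 max pooling with stride 2 (S_m); also records where each maximum came from."""
    pooled, argmax = [], []
    for r in range(0, len(x) - 1, 2):
        pooled_row, argmax_row = [], []
        for c in range(0, len(x[0]) - 1, 2):
            window = [(x[r + i][c + j], (r + i, c + j)) for i in range(2) for j in range(2)]
            value, pos = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            argmax_row.append(pos)
        pooled.append(pooled_row)
        argmax.append(argmax_row)
    return pooled, argmax


def max_pool_2x2_backward(d_pooled: Matrix, argmax: List[List[Position]], shape: Position) -> Matrix:
    """Route each incoming gradient to the input position that held the maximum; all others get 0."""
    dx = [[0.0] * shape[1] for _ in range(shape[0])]
    for r, row in enumerate(argmax):
        for c, (i, j) in enumerate(row):
            dx[i][j] += d_pooled[r][c]
    return dx


x = [[10.0, 80.0],
     [15.0, 30.0]]
pooled, argmax = max_pool_2x2(x)
print(pooled)                                          # [[80.0]]
print(max_pool_2x2_backward([[1.0]], argmax, (2, 2)))  # [[0.0, 1.0], [0.0, 0.0]]
```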
Down-sampling

Conclusion : a cascade of convolutions, non-linearities and subsampling produces (approximately) shift-invariant classification/detection
We can detect Roger wherever he is in the image !

(Figure: convolution + non-linearity + max pooling detects Roger at each of the positions tested ✓ ✓ ✓ ✓)
Dilated Convolution

There is a variant of convolution called dilated convolution∗

Increases the spatial extent of the filter without adding parameters
The filter taps are spaced D pixels apart (D = 1 is standard convolution)

(Figure: dilated filters for D = 1, D = 2, D = 3)

$$(u * w)(y, x) = \sum_{i,j,k} w(i, j, k)\, u(y - Di, x - Dj, k) \qquad (2)$$
where u is the input and w the filter

∗ Multi-Scale Context Aggregation by Dilated Convolutions, Yu, F. and Koltun, V., ICLR 2016
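Here is a 1D sketch of the dilated formula, computed only where every tap falls inside the input. The filter keeps the same 3 weights for every D, but its span grows as (3 − 1)·D + 1. The function name and the ramp input are illustrative assumptions.

```python
from typing import List


def dilated_conv1d_valid(u: List[float], w: List[float], D: int) -> List[float]:
    """y(t) = sum_i w(i) u(t - D i), computed only where every tap falls inside u.

    The filter keeps len(w) weights but spans (len(w) - 1) * D + 1 input samples.
    """
    K = len(w)
    span = (K - 1) * D + 1
    return [sum(w[i] * u[t - D * i] for i in range(K)) for t in range(span - 1, len(u))]


u = [float(v) for v in range(10)]  # the ramp 0, 1, ..., 9
w = [1.0, 0.0, -1.0]               # 3 weights, whatever the dilation

print(dilated_conv1d_valid(u, w, D=1))  # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(dilated_conv1d_valid(u, w, D=3))  # [6.0, 6.0, 6.0, 6.0]
```

With D = 3, the same filter compares samples 6 apart instead of 2 apart.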
Locally connected layers / unshared convolution

We might wish for a mix of a dense layer and a convolutional layer

One possibility : locally-connected layers (sometimes called “unshared convolution”)
Local connectivity but no weight sharing

(Figure: a locally-connected layer drawn as a neural network)

The number of weights increases linearly with the number of pixels, rather than quadratically (as for MLPs)
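To compare the three options, the weights can be counted for a single-channel image mapped to an output of the same size with 3 × 3 neighbourhoods. This is a sketch with those assumptions, and biases are ignored. The function name is hypothetical.

```python
def count_weights(n_pixels: int, kernel_size: int = 3) -> dict:
    """Weights needed to map an n-pixel single-channel image to an n-pixel output (biases ignored)."""
    k2 = kernel_size * kernel_size
    return {
        "dense": n_pixels * n_pixels,        # every output pixel sees every input pixel
        "locally_connected": n_pixels * k2,  # each output pixel has its own k x k weights
        "convolution": k2,                   # a single k x k filter shared by all pixels
    }


for side in (32, 64, 128):
    print(side, count_weights(side * side))
```

Doubling the image side multiplies the locally-connected count by 4 (linear in the number of pixels) and the dense count by 16 (quadratic). The convolutional count stays at 9.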
How to build your CNN ?

We have looked at the following operations : convolutions, additive biases, non-linearities, down-sampling

All of these elements make up convolutional neural networks

However, how do we put these together to create our own CNN ?

Architecture ?
Programming tools ?
Datasets ?
Architecture : vanilla CNN

A simple classification CNN architecture often consists of a feature learning section
Convolution → biases → non-linearities → subsampling
This is repeated until the desired subsampling factor is reached

After this, a classification section is used

Fully connected layer → non-linearity
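The following sketch tracks feature-map shapes through a vanilla architecture, using the standard output-size formula ⌊(n + 2p − k)/s⌋ + 1. The architecture is a hypothetical example for 28 × 28 grey-level images, with two blocks of (3 × 3 convolution with zero-padding 1, then 2 × 2 max pooling). The filter counts 16 and 32 are arbitrary choices, not values from the lesson.

```python
def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1


# Hypothetical vanilla CNN for 28 x 28 grey-level images:
# (3 x 3 convolution with 'SAME' padding -> bias -> ReLU -> 2 x 2 max pooling) twice,
# followed by the fully connected classification section.
feature_layers = [
    ("conv", 3, 1, 1, 16),  # (kind, kernel, stride, padding, filters)
    ("pool", 2, 2, 0, None),
    ("conv", 3, 1, 1, 32),
    ("pool", 2, 2, 0, None),
]

size, channels = 28, 1
for kind, k, s, p, n_filters in feature_layers:
    size = conv_output_size(size, k, s, p)
    if kind == "conv":
        channels = n_filters  # a convolutional layer outputs one channel per filter
    print(f"{kind}: {size} x {size} x {channels}")

n_features = size * size * channels
print("inputs of the fully connected section:", n_features)  # 1568
```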
Architecture

Central question : how to choose the number of layers ?

Complicated, very little theoretical understanding, currently a hot topic of research

However : there are a few rules of thumb to follow

The receptive field of the deepest layer should encompass what we consider to be a fundamental brick of the objects we are analysing

(Figure: growth of the receptive field through convolution, subsampling etc.)

Set the number of layers and subsampling factors according to the problem
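To apply the rule of thumb, the receptive field of the deepest layer can be computed layer by layer. This sketch uses a commonly used recurrence, which the lesson does not state: r ← r + (k − 1)·j and j ← j·s, where j is the distance in input pixels between neighbouring units. The layer stack is a hypothetical example.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field, in input pixels, of one unit of the deepest layer.

    layers: (kernel_size, stride) for each convolution / pooling layer, from input to output.
    jump is the distance, in input pixels, between two neighbouring units of the current layer.
    """
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical stack: (3 x 3 convolution, 2 x 2 max pooling with stride 2) repeated
for n_blocks in (1, 2, 3, 4):
    print(n_blocks, receptive_field([(3, 1), (2, 2)] * n_blocks))
```

The receptive field roughly doubles with each convolution + subsampling block (4, 10, 22, 46 pixels). This is why subsampling factors and depth are chosen together.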
CNN programming frameworks

Caffe
Open source, developed by the University of California, Berkeley
Networks are defined in separate, specific files
Somewhat laborious to use, less used than other frameworks

Theano
Open source, created by the Université de Montréal
Development has been discontinued due to strong competition

TensorFlow
Open source, developed by Google
Implements a wide range of deep learning functionalities, widely used

PyTorch
Open source, developed by Facebook
Implements a wide range of deep learning functionalities, widely used
MNIST dataset

MNIST is a dataset of 28 × 28 pixel grey-level images containing hand-written digits (60,000 training images and 10,000 test images)
The digits are centred in the images and scaled to have roughly the same size
Although quite a “simple” dataset, it is still used to demonstrate the performance of modern CNNs
Caltech 101

Produced in 2003, first major object recognition dataset

9,146 images, 101 object categories, each category contains between
40 and 800 images
Annotations exist for each image : bounding box for the object and a
human-drawn outline

ImageNet dataset

Dataset created in 2009 by researchers from Princeton University

Very large dataset : 14,197,122 hand-annotated images
Used for the ImageNet Large Scale Visual Recognition Challenge, an annual benchmark competition for object recognition algorithms
LeNet (1989/1998)

Created by Yann LeCun in 1989, goal : to recognise handwritten digits

Able to classify digits with 98.9% accuracy, used by the U.S. government to automatically read digits

Illustration from : Gradient-based Learning Applied to Document Recognition, LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P., Proceedings of the IEEE, 1998
AlexNet (2012)

AlexNet : created by Alex Krizhevsky in 2012

Reduced the error rate of the ImageNet Large Scale Visual Recognition Challenge by about 10 percentage points (to a top-5 error rate of 16.4%)
First truly deep neural network
Signalled the beginning of the dominance of deep learning in image processing and computer vision

Illustration from : Imagenet classification with deep convolutional neural networks, Krizhevsky, A., Sutskever, I. and Hinton, G. E, NIPS, 2012
GoogLeNet (2015)

In 2014/2015, Google introduced the “Inception” architecture/module

Major attempt at reducing the total number of parameters
No fully connected layers, only convolutional
2 million parameters instead of 60 million for AlexNet
Novel idea : have variable receptive field sizes in one layer

Going deeper with convolutions, Szegedy et al, CVPR, 2015
GoogLeNet (2015)

Created by Google in 2014, GoogLeNet is a specific implementation of the “inception” architecture
6.6% top-5 test error rate on ImageNet (human error rate around 5%)

Going deeper with convolutions, Szegedy et al, CVPR, 2015
VGG16 (2015)

VGG16 is a 16-layer network, with small receptive fields (3 × 3 filters) and less subsampling
Around 7.5% top-5 test error on ILSVRC

Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan, K. and Zisserman, A., ICLR, 2015
Illustration from Mathieu Cord, https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/
Summary of advances in CNNs

Network      LeNet (1998)   AlexNet (2012)   GoogLeNet (2014)   VGG16 (2015)
Image size   28 × 28        256 × 256 × 3    256 × 256 × 3      224 × 224 × 3
Layers       3              8                22                 16
Parameters   60,000         60 million       2 million          138 million

(Figure: evolution of CNN performance, error on ILSVRC from 2011 to 2015 for SVM, AlexNet, AlexNet bis, VGG16, GoogLeNet and deep ResNets)
Image classification

As we mentioned before, CNNs make sense for data with grid-like structures

In particular, images are most often the target of CNNs

Arguably the most common application of CNNs is image classification

Why is image classification important ? Closely linked to :

Object detection
Tracking
Image search (in large databases for example)

In recent years, the best performing classification algorithms have been using neural networks
Image classification

Why is image classification difficult ?

Images can vary in size, shape, position
We need to deal with variable lighting conditions, occlusions etc.

Let us look at a standard CNN classification network
Image classification

We have input datapoints x, which we wish to classify into several predefined classes $\{c_i\}$, $i = 1, \dots, K$, where K is the number of classes

As we have seen, convolutions, non-linearities and subsampling allow for robust classification that is invariant to many perturbations

The vast majority of CNN classification networks follow this general architecture
Image classification

Residual architectures : ResNet

ResNet∗ (2016) uses skip connections to mitigate the vanishing gradient problem
Similar to LSTM, except that it propagates through network layers, rather than through time

The residual mechanism is used in many subsequent architectures

The latest residual architecture gives 87.54% accuracy on ImageNet

∗ Deep Residual Learning for Image Recognition, He, K. et al, CVPR, 2016
Illustration from https://becominghuman.ai/resnet-convolution-neural-network-e10921245d3d
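A scalar toy model shows why skip connections help the gradient. It is a hypothetical analogy, not the ResNet architecture itself. Each "layer" is F(x) = 0.5·tanh(x). Without skip connections, the chain rule multiplies factors F′(x) ≤ 0.5, so the gradient vanishes with depth. With y = x + F(x), each factor becomes 1 + F′(x) ≥ 1.

```python
import math


def layer(x: float, a: float = 0.5) -> float:
    """Hypothetical scalar 'layer' F(x) = a * tanh(x), standing in for conv + non-linearity."""
    return a * math.tanh(x)


def layer_grad(x: float, a: float = 0.5) -> float:
    return a * (1.0 - math.tanh(x) ** 2)


def input_gradient(x: float, depth: int, residual: bool) -> float:
    """d(output)/d(input) of `depth` stacked layers, accumulated with the chain rule."""
    grad = 1.0
    for _ in range(depth):
        local = layer_grad(x)
        if residual:
            # y = x + F(x)  =>  dy/dx = 1 + F'(x): the skip connection contributes the 1
            x, local = x + layer(x), 1.0 + local
        else:
            # y = F(x)  =>  dy/dx = F'(x) <= 0.5 here, so the product shrinks quickly
            x = layer(x)
        grad *= local
    return grad


print(input_gradient(0.1, depth=20, residual=False))  # tiny: below 0.5 ** 20
print(input_gradient(0.1, depth=20, residual=True))   # at least 1
```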
Image classification

Attention mechanism in image networks

Recall the attention mechanism in RNNs : it addresses the problem of long-range dependencies
Networks exist with attention only : the transformer∗
Also used in image network architectures (usually self-attention)

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}(QK^\top)V \qquad (3)$$

(Vaswani et al.∗ additionally divide $QK^\top$ by $\sqrt{d_k}$, where $d_k$ is the dimension of the keys)

Q, queries: the elements for which we evaluate the importance of the other elements
K, keys: we use these elements for comparison (weighting)
V, values: we use these to “reconstruct” the queries
Often Q, K, V are all computed from the same image patches (usually through learned linear projections) : this is self-attention

∗ Attention is all you need, Vaswani et al, NIPS, 2017
Image classification

Attention mechanism in image networks

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}(QK^\top)V \qquad (4)$$

This equation says that the attention output is a weighted combination of the rows of V

The weights are given by a softmax (one per query) of the dot products between patches in Q and those in K

∗ Attention is all you need, Vaswani et al, NIPS, 2017
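Here is a row-by-row sketch of $\mathrm{Softmax}(QK^\top)V$ without the $1/\sqrt{d_k}$ scaling, which is available through an optional flag. Setting Q = K = V = X, without the learned projections used in practice, is an illustrative assumption, and the three 2D "patch" vectors are made up.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability (the result is unchanged)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def attention(Q: Matrix, K: Matrix, V: Matrix, scale: bool = False) -> Matrix:
    """Softmax(Q K^T) V, one query (row of Q) at a time; scale=True divides by sqrt(d_k)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]  # dot products q . k
        if scale:
            scores = [s / math.sqrt(d_k) for s in scores]
        weights = softmax(scores)  # non-negative, sum to 1
        # weighted combination of the rows of V
        out.append([sum(wt * v[c] for wt, v in zip(weights, V)) for c in range(len(V[0]))])
    return out


# Three hypothetical 2D patch embeddings; self-attention with Q = K = V = X
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = attention(X, X, X)
print([round(v, 3) for v in Y[0]])
```

For the first query, the scores are (1, 0, 1), so the first and third patches receive equal, larger weights than the second.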
Image classification

Attention mechanism in image networks

(Figure: a query patch Q compared with several key patches K)

∗ Attention is all you need, Vaswani et al, NIPS, 2017
Image classification

Attention mechanism in image networks

Combined attention/convolution architectures achieve the best accuracies on ImageNet (to date∗)
CoAtNet-7: 90.88% accuracy on ImageNet

∗ https://paperswithcode.com/sota/image-classification-on-imagenet
Image classification

We can also detect the position of objects in images

R-CNN∗ proposes a simple approach :
1 Propose a list of bounding boxes in the image
2 Pass the resized sub-images through a powerful classification network (to extract features)
3 Classify each sub-image with your favourite classifier

Many variants on this work exist (Fast R-CNN, Faster R-CNN etc.)

∗ Rich feature hierarchies for accurate object detection and semantic segmentation, Girshick, R. et al. CVPR 2014
Motion estimation

Motion estimation is a central task for many image processing and computer vision problems : tracking, video editing
Optical flow involves estimating a vector field $(u, v) : \mathbb{R}^2 \to \mathbb{R}^2$, where each vector gives the displacement of pixel (x, y) from an image $I_1$ to $I_2$

$$I_1(x, y) = I_2(x + u(x, y), y + v(x, y))$$

(Figure: optical flow)

Illustration from : BriefMatch: Dense binary feature matching for real-time optical flow estimation, Eilertsen, G, Forssén, P-E, Unger, J., Scandinavian Conference on Image Analysis, 2017
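Here is a sketch of the brightness-constancy equation: warping $I_2$ with a known flow should reproduce $I_1$. It handles integer displacements only and clamps positions that leave the image. Real optical flow is sub-pixel and needs interpolation. The tiny images and the `warp` name are illustrative assumptions.

```python
from typing import List

Image = List[List[float]]  # indexed [y][x]


def warp(I2: Image, flow_u: List[List[int]], flow_v: List[List[int]]) -> Image:
    """Return W with W(x, y) = I2(x + u(x, y), y + v(x, y)).

    Only integer displacements are handled; positions falling outside the image
    are clamped to the border.
    """
    H, W = len(I2), len(I2[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            xs = min(max(x + flow_u[y][x], 0), W - 1)
            ys = min(max(y + flow_v[y][x], 0), H - 1)
            out[y][x] = I2[ys][xs]
    return out


I1 = [[0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]
I2 = [[0.0, 0.0, 1.0, 0.0],  # the bright pixel has moved one pixel to the right
      [0.0, 0.0, 0.0, 0.0]]
flow_u = [[1] * 4 for _ in range(2)]  # constant horizontal displacement u = 1
flow_v = [[0] * 4 for _ in range(2)]  # no vertical displacement

print(warp(I2, flow_u, flow_v) == I1)  # True: I1(x, y) = I2(x + u, y + v) holds
```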
Motion estimation with CNNs

A major challenge of optical flow estimation is to handle both fine and large-scale motions
This is difficult to do with classical, variational approaches
CNNs have this multi-scale architecture already built in
Example : FlowNet∗ uses this, first extracting meaningful features from the images (in parallel) and then combining them to create the optical flow

∗ FlowNet: Learning Optical Flow with Convolutional Networks, Fischer et al, ICCV 2015
Super-resolution

Image super-resolution : go from a low-resolution image to a higher-resolution one
Relatively straightforward approach with a CNN∗

Drawback : highly dependent on the degradation used to create the lower-resolution images of the training database

∗ Learning a deep convolutional network for image super-resolution, Dong, C. et al, ECCV 2014
Point clouds

CNNs require regular grids. Point cloud data are not in this format
Nevertheless, ways have been found to deal with this

3D ShapeNets∗ split a volume up into sub-regions (voxels) that are processed by CNNs
Each voxel is a Bernoulli random variable representing the probability of this voxel belonging to a shape
This general approach (using voxels) is followed in many other methods

∗ 3D ShapeNets: A deep representation for volumetric shapes, Wu, Z. et al. CVPR, 2015
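A first step of voxel-based approaches is turning a point cloud into a regular grid. This sketch builds only a deterministic binary occupancy grid. It does not reproduce the probabilistic (Bernoulli) model of 3D ShapeNets. It assumes all points lie inside the cube [lo, hi]³. The function name and the sample cloud are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def voxelize(points: List[Point], n: int, lo: float = 0.0, hi: float = 1.0) -> List[List[List[int]]]:
    """Binary n x n x n occupancy grid over the cube [lo, hi]^3 (points assumed inside it).

    A voxel is set to 1 when at least one point falls inside it, which turns an
    irregular point cloud into a regular grid that a (3D) CNN can process.
    """
    grid = [[[0] * n for _ in range(n)] for _ in range(n)]
    step = (hi - lo) / n
    for p in points:
        # voxel index along each axis; points on the upper face go to the last voxel
        i, j, k = (min(int((c - lo) / step), n - 1) for c in p)
        grid[i][j][k] = 1
    return grid


cloud = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.05), (0.9, 0.9, 0.9)]
grid = voxelize(cloud, n=4)
print(sum(v for plane in grid for row in plane for v in row))  # 2 occupied voxels
```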
Interpreting CNNs

As is often the case in deep learning, it is very difficult to understand what is going on in CNNs

Much research is being dedicated to understanding these networks

Explainable AI (XAI) DARPA project∗

We discuss two topics related to interpretability

Visualising CNNs
Adversarial examples

∗ https://www.darpa.mil/program/explainable-artificial-intelligence
Visualising CNNs

We would like to understand what CNNs are learning

Unfortunately filters are difficult to interpret (especially in deeper layers)

(Figure: layer 1 filters and layer 3 filters)
Therefore, much research has been dedicated to visualising CNNs
Visualising CNNs
Idea : “invert” the CNN, find x to maximise the output of a certain layer
Understand what this layer is “seeing”
This is possible thanks to back-propagation

Basic CNN visualisation algorithm

Choose a layer ℓ to visualise
$x_0 \sim \mathcal{N}(0, 1)$
For i = 1 … N
    $x_i = x_{i-1} + \lambda \nabla_x \|u_\ell(x_{i-1})\|$   (gradient ascent)
Return $x_N$
Visualising features

Generalisation: maximise a given filter response

Choose layer ℓ, filter k and element (“pixel”) (i, j)
Random initialisation $x_0$, constrain the norm of the solution x

$$\hat{x} = \arg\max_x u_\ell(x)_{i,j,k} \quad \text{with } \|x\| = \rho$$

Optimisation : gradient ascent

† Erhan, Bengio, Courville, Vincent, Visualizing Higher-Layer Features of a Deep Network, University of Montreal, 2009
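Here is a sketch of the constrained gradient ascent. It uses a gradient step followed by projection back onto ‖x‖ = ρ. To stay self-contained, the "activation" is a hypothetical linear unit ⟨w, x⟩ with an analytic gradient. In a real CNN, the gradient would come from back-propagation. For this toy unit, the optimum is known: x̂ = ρ·w/‖w‖.

```python
import math
import random
from typing import List


def unit_response(x: List[float], w: List[float]) -> float:
    """Hypothetical stand-in for one activation u_l(x)_{i,j,k}: a linear unit <w, x>."""
    return sum(wi * xi for wi, xi in zip(w, x))


def unit_gradient(x: List[float], w: List[float]) -> List[float]:
    """Gradient of <w, x> with respect to x; in a real CNN it would come from back-propagation."""
    return list(w)


def project_to_sphere(x: List[float], rho: float) -> List[float]:
    """Rescale x so that ||x|| = rho (the norm constraint on the solution)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [rho * v / norm for v in x]


def maximise_unit(w: List[float], rho: float = 1.0, lr: float = 0.1,
                  n_iter: int = 200, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    x = project_to_sphere([rng.gauss(0.0, 1.0) for _ in w], rho)  # random x_0
    for _ in range(n_iter):
        g = unit_gradient(x, w)
        # gradient ascent step, then projection back onto ||x|| = rho
        x = project_to_sphere([xi + lr * gi for xi, gi in zip(x, g)], rho)
    return x


w = [3.0, 0.0, 4.0]
x_hat = maximise_unit(w)
print([round(v, 3) for v in x_hat])       # close to w / ||w|| = [0.6, 0.0, 0.8]
print(round(unit_response(x_hat, w), 3))  # close to ||w|| = 5.0
```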
Visualising CNNs

More sophisticated approach: standard inverse problem with regularisation
$$\hat{x} = \arg\min_x \|f(x) - f_0\|_2^2 + \lambda \|x\|_2^2 + \mu \|\nabla x\|_2^2 \qquad (5)$$

† Mahendran and Vedaldi, Understanding Deep Image Representations by Inverting Them, Conference on Computer Vision and Pattern Recognition, 2015
Visualising CNNs

(Figure: maximisation of different activations, layers 1 to 4, applied to the MNIST dataset)

† Erhan, Bengio, Courville, Vincent, Visualizing Higher-Layer Features of a Deep Network, University of Montreal, 2009
Visualising CNNs

Another approach, by Simonyan et al.†, looks for the images that correspond to given classes
Choose a class c, maximise the response of this class

$$\hat{x} = \arg\max_x f(x)_c - \lambda \|x\|_2^2$$

Find an $L_2$-regularised image which maximises the score for a given class c
Initialise with a random input image $x_0$

† Simonyan, Vedaldi, Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, arXiv preprint arXiv:1312.6034, 2013
Visualising CNNs

(Figure: class model visualisation)

† Simonyan, Vedaldi, Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, arXiv preprint arXiv:1312.6034, 2013
Visualising CNNs

Similar idea with Google’s Inception architecture : “Deep Dream”

Maximise a class starting from an input image

(Figure: input image, and the result of maximising the “dogs” category)

Deepdream - a code example for visualizing neural networks, Mordvintsev, A., Olah, C. and Tyka, M., Google Research, 2015
Adversarial examples

We often get the impression that CNNs are the be-all and end-all of AI
They consistently produce state-of-the-art results on images
However, CNNs are not infallible : adversarial examples† !

How was this image created ?

† Intriguing properties of neural networks, Szegedy, C. et al, arXiv preprint arXiv:1312.6199, 2013
Adversarial examples

Szegedy et al.† propose to add a small perturbation r that fools the classifier network f into choosing the wrong class c for $\hat{x} = x + r$

$$\arg\min_r \|r\|_2^2 \quad \text{s.t. } f(x + r) = c, \; x + r \in [0, 1]^n$$

$\hat{x}$ is the closest example to x such that $\hat{x}$ is classified in class c

Minimisation with the box-constrained L-BFGS algorithm

† Intriguing properties of neural networks, Szegedy, C. et al, arXiv preprint arXiv:1312.6199, 2013
Adversarial examples

Common explanation : the space of images is very high-dimensional, and contains many regions that are never explored during training

Example of loss surfaces in commonly used networks (ResNets)

Illustration from Visualizing the Loss Landscape of Neural Nets, Li, H et al, NIPS, 2018
Adversarial examples

Many approaches to adversarial examples exist. Goodfellow et al.† propose a principled way of creating these

Consider the output of a fully connected layer : $\langle w, \hat{x} \rangle = \langle w, x \rangle + \langle w, r \rangle$

Let us set $r = \mathrm{sign}(w)$. What happens to $\langle w, \hat{x} \rangle$ ?

It increases by $\langle w, \mathrm{sign}(w) \rangle = \|w\|_1 = nm$, which grows linearly with the dimension n (m is the average absolute value of the entries of w)
However, $\|r\|_\infty = 1$ does not increase with n

Conclusion : we can add a small vector r that greatly increases the output response $\langle w, \hat{x} \rangle$

† Explaining and Harnessing Adversarial Examples, Goodfellow, I.J, Shlens, J. and Szegedy, C., ICLR 2015
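The dimension argument can be checked with random weights. This sketch uses a small step ε = 0.01, so r = ε·sign(w). The gain ⟨w, r⟩ = ε·m·n grows with n, while ‖r‖∞ stays equal to ε. The uniform random weights and the function name are illustrative assumptions.

```python
import random


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def linear_gain(n: int, eps: float = 0.01, seed: int = 0):
    """For hypothetical random weights w in R^n, perturb with r = eps * sign(w).

    Returns (||r||_inf, <w, r>, eps * m * n) where m is the mean absolute value of w.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    r = [eps * sign(wi) for wi in w]
    gain = sum(wi * ri for wi, ri in zip(w, r))  # change of the output <w, x + r> - <w, x>
    m = sum(abs(wi) for wi in w) / n
    return max(abs(ri) for ri in r), gain, eps * m * n


for n in (10, 1_000, 100_000):
    r_inf, gain, predicted = linear_gain(n)
    print(f"n={n:>6}  ||r||_inf={r_inf}  <w, r>={gain:.3f}  eps*m*n={predicted:.3f}")
```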
Adversarial examples

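Goodfellow et al. linearise the network’s loss L(θ, x, y) with respect to the input, at the current value of the parameters θ
$$L(\theta, x + r, y) \approx L(\theta, x, y) + r^\top \nabla_x L(\theta, x, y)$$

Maximising this approximation under the constraint $\|r\|_\infty \le \varepsilon$ gives $r = \varepsilon\, \mathrm{sign}(\nabla_x L(\theta, x, y))$. Thus, the perturbed image $\hat{x}$ is set to

$$\hat{x} = x + \varepsilon\, \mathrm{sign}(\nabla_x L(\theta, x, y))$$

† Explaining and Harnessing Adversarial Examples, Goodfellow, I.J, Shlens, J. and Szegedy, C., ICLR 2015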
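Here is a sketch of this fast gradient sign step on a model whose input gradient is known in closed form: a hypothetical logistic classifier with made-up weights on 100 "pixels". Clipping x̂ to [0, 1] is an added assumption, borrowed from the box constraint above. No pixel changes by more than 0.15, yet the logit moves by ε·‖w‖₁ = 0.15 × 50 = 7.5 and the prediction flips.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def loss_and_input_grad(w: List[float], b: float, x: List[float], y: int) -> Tuple[float, List[float]]:
    """Cross-entropy of a logistic classifier p = sigmoid(<w, x> + b) and its gradient w.r.t. x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    # dL/dz = p - y and dz/dx = w, hence grad_x L = (p - y) * w
    return loss, [(p - y) * wi for wi in w]


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """x_hat = x + eps * sign(grad_x L), clipped to the valid pixel range [0, 1]."""
    _, g = loss_and_input_grad(w, b, x, y)
    return [min(max(xi + eps * sign(gi), 0.0), 1.0) for xi, gi in zip(x, g)]


def predict(w: List[float], b: float, x: List[float]) -> int:
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5)


w = [0.5, -0.5] * 50  # hypothetical "trained" weights, n = 100
b = 0.0
x = [0.6, 0.4] * 50   # clean input, classified as class 1
x_adv = fgsm(w, b, x, y=1, eps=0.15)

print(predict(w, b, x), predict(w, b, x_adv))     # 1 0
print(max(abs(a - c) for a, c in zip(x_adv, x)))  # about 0.15
```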
Adversarial examples

Even worse, it is possible to create universal adversarial perturbations†

A single perturbation r that fools a network on most images, whatever their class

Simple algorithm: initialise r = 0 and go through the database; for each image x that x + r does not yet fool,

add to r a small extra perturbation that does, then project r onto the set $\{r : \|r\| \le \epsilon\}$
What do these perturbations look like?

†
Universal adversarial perturbations, Moosavi-Dezfooli, S-M, et al arXiv preprint (2017)

Adversarial examples

[Figure: examples of universal adversarial perturbations†]

†
Universal adversarial perturbations, Moosavi-Dezfooli, S-M, et al arXiv preprint (2017)
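The "simple algorithm" for universal perturbations can be sketched on a toy problem. This is a minimal sketch under stated assumptions: the classifier is a hypothetical affine model argmax(W x + b) on random data, the norm is ℓ2, and the small extra perturbation is the minimal step to the nearest decision boundary. To my understanding the original work computes this inner step with DeepFool; for an affine classifier that step has the closed form used below, and the 2% overshoot is an assumed value.

```python
import numpy as np


def minimal_linear_step(W, b, x, overshoot=0.02):
    """Smallest L2 step moving x across the nearest decision boundary of the
    affine classifier argmax(W x + b), slightly overshooting the boundary."""
    f = W @ x + b
    k0 = int(np.argmax(f))
    best = None
    for k in range(W.shape[0]):
        if k == k0:
            continue
        w_diff = W[k] - W[k0]
        f_diff = f[k] - f[k0]                       # <= 0 since k0 is the argmax
        step = (abs(f_diff) / (w_diff @ w_diff)) * w_diff
        if best is None or np.linalg.norm(step) < np.linalg.norm(best):
            best = step
    return (1.0 + overshoot) * best


def project_l2(r, eps):
    """Project r onto the ball {r : ||r||_2 <= eps}."""
    norm = np.linalg.norm(r)
    return r if norm <= eps else r * (eps / norm)


def predict(W, b, X):
    return np.argmax(X @ W.T + b, axis=1)


def universal_perturbation(W, b, X, eps, n_passes=5):
    clean = predict(W, b, X)
    r = np.zeros(X.shape[1])
    for _ in range(n_passes):                       # several passes over the database
        for x, label in zip(X, clean):
            if np.argmax(W @ (x + r) + b) == label:  # x + r does not fool the model yet
                r = project_l2(r + minimal_linear_step(W, b, x + r), eps)
    fooling_rate = np.mean(predict(W, b, X + r) != clean)
    return r, fooling_rate


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 20))
b = np.zeros(3)
X = rng.normal(size=(200, 20))
r, rate = universal_perturbation(W, b, X, eps=2.0)
print(f"||r||_2 = {np.linalg.norm(r):.3f}, fooling rate = {rate:.2f}")
```

The same single vector r is added to every image; the projection is what keeps it small while the updates accumulate over the database.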
Adversarial examples

Conclusion: CNNs are not necessarily robust

Adversarial examples are a significant problem:
Even photos of printed adversarial examples still fool the network†

Explaining and resisting adversarial examples is currently a hot research topic

†
Adversarial Examples in the Physical World, Kurakin, A., Goodfellow, I. J, Bengio, S. et al. ICLR workshop, 2017

Summary

CNNs represent the state of the art in many different domains and problems

If you have an unsolved problem, there is a good chance that CNNs will produce good or even excellent results

However: theoretical understanding is still relatively limited

This leads to problems such as adversarial examples
It is not clear whether CNNs are truly robust and generalisable
This is a hot research topic, and an important one if CNNs are to be used in industrial applications
21/10/2021: last lab work, on CNNs

