
Introduction to

Convolutional Neural Networks

Lecture 5

1
Convolutional Neural Network (CNN)
• A class of neural networks
• Takes an image as input (mostly)
• Makes predictions about the input image

2
History
• The LeNet architecture (1990s)

Gradient-based learning applied to document recognition


LeCun Y, Bottou L, Bengio Y, Haffner P. Proceedings of the IEEE. 1998

3
First Strong Results
• AlexNet 2012
• Winner of ImageNet Large-Scale Visual Recognition Challenge (ILSVRC 2012)
• Top-5 error rate – 15.4% (the next best entry was at 26.2%)

Imagenet classification with deep convolutional neural networks


Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton, 2012

4
Today: CNNs are everywhere
Classification

5
Today: CNNs are everywhere
Object detection Semantic Segmentation

Faster R-CNN: Ren, He, Girshick, Sun, 2015
Semantic Segmentation Using GAN: Nasim, Concetto, and Mubarak, 2017

6
Today: CNNs are everywhere
Image captioning Style transfer

"Show and tell: A neural image caption generator.“ A Neuíal Algoíithm of Aítistic Style
Vinyals, Oriol, et al. CVPR 2015. L. Gatys et al. 2015).

7
CNN – Not just images
• Natural Language Processing (NLP)
• Text classification
• Word to vector
• Audio Research
• Speech recognition
• Can be represented as spectrograms
• Converting data to a matrix (2-D) format
• 1D convolution – Audio, EEG, etc.
• 3D convolution - Videos

8
Background
What we already know!

9
General CNN architecture

10
General CNN architecture

11
What is a (digital) Image? - recap
• Definition: A digital image is defined by integrating and sampling
continuous (analog) data in a spatial domain [Klette, 2014].

Left-hand coordinate system


12
General CNN architecture

13
Filtering - recap
• Image filtering: compute function of local neighborhood at
each position

h = output, f = filter, I = image

h[m, n] = \sum_{k,l} f[k, l] \, I[m + k, n + l]

(k, l): 2-D coordinates within the filter; (m, n): 2-D coordinates in the output
14
Filtering - recap
• Output is linear combination of the neighborhood pixels

15
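To make the recap concrete, here is a minimal NumPy sketch (my own addition, not from the slides) that applies the formula h[m, n] = \sum_{k,l} f[k, l] I[m+k, n+l] at every valid position:

```python
import numpy as np

def filter2d(I, f):
    """Apply filter f to image I at every valid position, as in the recap formula."""
    H, W = I.shape
    Fh, Fw = f.shape
    h = np.zeros((H - Fh + 1, W - Fw + 1))
    for m in range(h.shape[0]):
        for n in range(h.shape[1]):
            # linear combination of the neighborhood pixels
            h[m, n] = np.sum(f * I[m:m + Fh, n:n + Fw])
    return h
```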
Correlation (linear relationship) - recap

f           h
f1 f2 f3    h1 h2 h3
f4 f5 f6    h4 h5 h6
f7 f8 f9    h7 h8 h9

f \otimes h = f_1 h_1 + f_2 h_2 + f_3 h_3
            + f_4 h_4 + f_5 h_5 + f_6 h_6
            + f_7 h_7 + f_8 h_8 + f_9 h_9

16
Convolution – recap

f = image, h = kernel

h           X-flip      then Y-flip
h1 h2 h3    h7 h8 h9    h9 h8 h7
h4 h5 h6    h4 h5 h6    h6 h5 h4
h7 h8 h9    h1 h2 h3    h3 h2 h1

f           flipped h
f1 f2 f3    h9 h8 h7
f4 f5 f6    h6 h5 h4
f7 f8 f9    h3 h2 h1

f * h = f_1 h_9 + f_2 h_8 + f_3 h_7
      + f_4 h_6 + f_5 h_5 + f_6 h_4
      + f_7 h_3 + f_8 h_2 + f_9 h_1
17
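A quick way to check the flip relationship (my own sketch, assuming SciPy is available): correlating with a kernel gives the same result as convolving with the kernel rotated by 180 degrees.

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

f = np.arange(9, dtype=float).reshape(3, 3)                # toy "image" patch
h = np.array([[1., 2., 0.], [0., 3., 0.], [0., 0., 4.]])   # toy asymmetric kernel

corr = correlate2d(f, h, mode='valid')                     # correlation
conv = convolve2d(f, np.flip(h), mode='valid')             # convolution with flipped kernel
assert np.allclose(corr, conv)                             # identical by definition
```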
Sobel Edge Detector

18
General CNN architecture

19
Multi-layer perceptron (MLP) – recap
• …is a ‘fully connected’ neural network with non-linear activation functions.

• ‘Feed-forward’ neural network


20
Source: Nielsen
General CNN architecture

21
Learning phases

Training: training images + labels → image features → train classifier → trained classifier

Testing: image (not in the training set) → image features → apply classifier → prediction

Slide credit: D. Hoiem and L. Lazebnik
22
General CNN architecture

End-to-end learning!


23
Neural Network vs CNN
• Image as input to a plain neural network
• Size of the feature vector = H x W x C
• For a 256x256 RGB image: 196,608 dimensions

• CNN – a special type of neural network
• Operates on volumes of data
• Weight sharing in the form of kernels

Source: http://cs231n.github.io

24
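As a rough illustration (my own numbers, not from the slides; the hidden-layer width of 1000 is a hypothetical choice): flattening a 256x256 RGB image into a single fully connected layer already costs hundreds of millions of weights, while a convolution layer reuses one small shared kernel per filter.

```python
H, W, C = 256, 256, 3
input_dim = H * W * C                       # 196,608 values per image
hidden = 1000                               # hypothetical fully connected layer width
fc_weights = input_dim * hidden             # ~197 million weights, no sharing

k, n_filters = 3, 6                         # a 3x3x3 kernel, 6 filters
conv_weights = k * k * C * n_filters        # 162 shared weights, reused at every position

print(input_dim, fc_weights, conv_weights)  # 196608 196608000 162
```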
Fundamental operation

25
Convolution
• Core building block of a CNN
• Spatial structure of image is preserved

(Figure: a 3x3x3 filter slides over a 32x32x3 image)

A filter/kernel is convolved with the image

26
Convolution
• Convolution at one spatial location

(Figure: the 3x3x3 filter applied at one spatial location of the 32x32x3 image produces 1 number, the result of the convolution at that position)

27
Convolution
• Convolution over whole image

(Figure: convolving the 3x3x3 filter over all spatial locations of the 32x32x3 image produces a 30x30x1 activation map (feature map))

28
Convolution
• Multiple filters
(Figure: two 3x3x3 filters convolved over all spatial locations of the 32x32x3 image produce a 30x30x2 stack of activation maps (feature maps), one map per filter)

29
Convolution layer
• One convolution layer
• 6 3x3x3 kernels
(Figure: a convolution layer with 6 3x3x3 kernels maps the 32x32x3 image to a 30x30x6 output volume)

30
Convolutional Network
• Convolution network is a sequence of these layers

(Figure: 32x32x3 input, 6 5x5x3 filters, 28x28x6 output)

31
Convolutional Network
• Convolution network is a sequence of these layers

(Figure: 32x32x3 input, 6 5x5x3 filters giving 28x28x6, then 16 5x5x6 filters giving 24x24x16)

32
Parameters
(Figure: a single 3x3x3 filter convolved over the 32x32x3 image yields a 30x30x1 activation map (feature map))

33
Parameters
(Figure: a convolution layer with 6 3x3x3 kernels maps the 32x32x3 image to 30x30x6 activation maps)
6 3x3x3 kernels – 6x3x3x3 parameters = 162

34
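A small sanity check of the count on this slide (my own sketch; the bias option is an extra, since the slide counts kernel weights only):

```python
def conv_params(n_filters, kh, kw, in_channels, bias=False):
    """Number of learnable parameters in a convolution layer."""
    p = n_filters * kh * kw * in_channels
    return p + n_filters if bias else p

print(conv_params(6, 3, 3, 3))              # 162, as on the slide
print(conv_params(6, 3, 3, 3, bias=True))   # 168 if one bias per filter is included
```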
Convolution Operation
• Convolution of two functions f and g
function f(t) kernel g(t)

In CNN we use 2D convolutions (mostly)


35
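The definition the slide points to, written out (standard textbook form, not recovered from the slide figure itself):

(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

and its discrete 2-D counterpart used on images:

(I * K)(m, n) = \sum_{k} \sum_{l} I(m - k,\, n - l)\, K(k, l)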
Sobel Edge Detector – recap

(Figure: the image I is convolved with d/dx and d/dy Sobel kernels; the resulting gradient maps are thresholded to obtain edges)
36
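A minimal sketch of that pipeline (my own code, assuming SciPy; the threshold value is an arbitrary placeholder):

```python
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # d/dx kernel
sobel_y = sobel_x.T                                                     # d/dy kernel

def sobel_edges(I, threshold=100.0):
    gx = convolve2d(I, sobel_x, mode='same', boundary='symm')
    gy = convolve2d(I, sobel_y, mode='same', boundary='symm')
    magnitude = np.hypot(gx, gy)       # gradient magnitude
    return magnitude > threshold       # boolean edge map
```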
Demo
Input image (5x5):
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 0 0 0

Filter (3x3):
1 0 1
0 1 0
1 0 1

Output so far:
4
37
Demo
(Same input image and filter; the filter window has moved one step to the right.)

Output so far:
4 3
38
Demo
(The filter window has moved one more step to the right, completing the first output row.)

Output so far:
4 3 4
39
Demo
(The filter window has moved down to the start of the second row.)

Output so far:
4 3 4
2
40
Demo
(The filter window has visited all nine positions.)

Final output (3x3):
4 3 4
2 4 3
1 3 3
41
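The whole demo in a few lines (a sketch, assuming SciPy; note that this particular filter is unchanged by a 180-degree flip, so correlation and convolution give the same result here):

```python
import numpy as np
from scipy.signal import correlate2d

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 0, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

print(correlate2d(image, kernel, mode='valid'))
# [[4 3 4]
#  [2 4 3]
#  [1 3 3]]
```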
Convolution - Intuition

0 0 0 0 0
0 0 1 0 0
0 1 0 1 0
1 0 0 0 1
0 0 0 0 0

42
Convolution - Intuition

43
Convolution - Intuition

Image patch:        Filter:
0 0 0 0 0           0 0 0 0 0
0 0 1 0 0           0 0 1 0 0
0 1 0 1 0     *     0 1 0 1 0
1 0 0 0 1           1 0 0 0 1
0 0 0 0 0           0 0 0 0 0

1x1 + 1x1 + … + 1x1 = 5


44
Convolution - Intuition

45
Convolution - Intuition

Image patch:        Filter:
0 0 0 0 1           0 0 0 0 0
0 0 0 1 0           0 0 1 0 0
0 0 1 0 0     *     0 1 0 1 0
0 1 1 1 1           1 0 0 0 1
0 0 0 0 0           0 0 0 0 0

1x1 = 1
46
Convolution
• Multiple filters
(Figure: two 3x3x3 filters convolved over all spatial locations of the 32x32x3 image produce two 30x30 activation maps (feature maps), one per filter)

47
2D Convolution - dimensions
7x7 map 3x3 filter

48
2D Convolution - dimensions
7x7 map 3x3 filter

49
2D Convolution - dimensions
7x7 map 3x3 filter

50
2D Convolution - dimensions
7x7 map 3x3 filter

51
2D Convolution - dimensions
7x7 map 3x3 filter

Output activation map 5x5


Output size
N-F+1
(7 – 3 + 1) = 5

N – input size
F – filter size

52
Stride
7x7 map 3x3 filter

Filter applied with stride 2

53
Stride
7x7 map 3x3 filter

Filter applied with stride 2

54
Stride
7x7 map 3x3 filter

Filter applied with stride 2

Activation map size 3x3


Output size
(7-3)/2 + 1 = 3

(N-F)/S + 1

55
Stride
7x7 map 3x3 filter

Filter applied with stride 3

56
Stride
7x7 map 3x3 filter

Filter applied with stride 3

Cannot cover the map perfectly

The filter does not fit an integer number of times

57
Stride
7x7 map 3x3 filter
Output size (N-F)/S + 1
N = 7, F = 3

Stride 1
(7-3)/1 + 1 => 5
Stride 2
(7-3)/2 + 1 => 3
Stride 3
(7-3)/3 + 1 => 2.33
58
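A tiny helper to check these numbers (my own sketch; conv_output_size is a made-up name, and it flags stride settings that do not fit):

```python
def conv_output_size(N, F, S=1, P=0):
    """(N - F + 2P)/S + 1; returns None when the filter does not fit evenly."""
    size = (N - F + 2 * P) / S + 1
    return int(size) if size.is_integer() else None

for stride in (1, 2, 3):
    print(stride, conv_output_size(7, 3, S=stride))   # 5, 3, None (2.33 does not fit)
```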
Padding
• Zero padding in the input
(Figure: the 7x7 input surrounded by a one-pixel border of zeros, giving a 9x9 padded map)

For a 7x7 input and a 3x3 filter, with padding of one pixel the output is 7x7.

Size (recall (N-F)/S + 1):
(N - F + 2P)/S + 1
59
Padding
• Zero padding in the input
(Figure: the zero-padded input, as on the previous slide)

Common to see (F-1)/2 padding with stride 1 to preserve the map size:

N = (N - F + 2P)/S + 1
⇒ (N - 1)S = N - F + 2P
⇒ with S = 1: P = (F - 1)/2

60
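Continuing the small helper from the stride slide (a sketch; it assumes conv_output_size from that example is in scope): with stride 1 and P = (F-1)/2 the output size equals the input size, i.e. "same" padding.

```python
N, F = 7, 3
P = (F - 1) // 2                            # = 1 for a 3x3 filter
print(conv_output_size(N, F, S=1, P=P))     # 7: the map size is preserved
```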
Pooling
• Invariance to small translations of the input

61
Pooling
• Makes the representations smaller
• Operates over each activation map independently

62
Pooling
• Kernel size
• Stride

63
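A minimal NumPy sketch of 2x2 max pooling with stride 2 (my own example using a common configuration; in a CNN each activation map would be pooled independently like this):

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Max pooling over a single 2-D activation map."""
    H, W = x.shape
    out_h, out_w = (H - k) // s + 1, (W - k) // s + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool2d(x))   # [[6. 8.] [3. 4.]]
```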
Visualizing CNN

Source: http://cs231n.github.io

64
AlexNet : Network Size
(Layer stack: CONV1, MAX POOL1, NORM1, CONV2, MAX POOL2, NORM2, CONV3, CONV4, CONV5, MAX POOL3, FC6, FC7, FC8)

• Input 227x227x3
• 5 convolution layers
• 3 dense layers
• Output 1000-D vector
65
AlexNet : Network Size
(Same layer stack as the previous slide)

• Input: 227x227x3 images
• First layer (CONV1): 96 11x11 filters applied at stride 4
• What is the output volume size? (227-11)/4 + 1 = 55
• What is the number of parameters? 11x11x3x96 = 35K
66
AlexNet : Network Size
(Same layer stack as the previous slide)

• After CONV1: 55x55x96
• Second layer (POOL1): 3x3 filters applied at stride 2
• What is the output volume size? (55-3)/2 + 1 = 27
• What is the number of parameters in this layer? 0
67
AlexNet : Network Size
(Same layer stack as the previous slide)

• After POOL1: 27x27x96
• Third layer (NORM1): Normalization
• What is the output volume size? 27x27x96
68
AlexNet : Network Size
1. [227x227x3] INPUT
2. [55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0 (35K params)
3. [27x27x96] MAX POOL1: 3x3 filters at stride 2
4. [27x27x96] NORM1: Normalization layer
5. [27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2 (307K params)
6. [13x13x256] MAX POOL2: 3x3 filters at stride 2
7. [13x13x256] NORM2: Normalization layer
8. [13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1 (884K params)
9. [13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1 (1.3M params)
10. [13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1 (442K params)
11. [6x6x256] MAX POOL3: 3x3 filters at stride 2
12. [4096] FC6: 4096 neurons (37M params)
13. [4096] FC7: 4096 neurons (16M params)
14. [1000] FC8: 1000 neurons, class scores (4M params)
69
AlexNet Parameters
conv1: (11*11)*3*96 + 96 = 34944
conv2: (5*5)*96*256 + 256 = 614656
conv3: (3*3)*256*384 + 384 = 885120
conv4: (3*3)*384*384 + 384 = 1327488
conv5: (3*3)*384*256 + 256 = 884992
fc1: (6*6)*256*4096 + 4096 = 37752832
fc2: 4096*4096 + 4096 = 16781312
fc3: 4096*1000 + 1000 = 4097000

70
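A quick reproduction of these counts (my own sketch; it treats every layer as dense and ungrouped, matching the numbers on this slide rather than the original two-GPU split):

```python
def conv_p(kh, kw, c_in, c_out):
    return kh * kw * c_in * c_out + c_out   # weights + biases

def fc_p(n_in, n_out):
    return n_in * n_out + n_out

layers = {
    'conv1': conv_p(11, 11, 3, 96),       # 34944
    'conv2': conv_p(5, 5, 96, 256),       # 614656
    'conv3': conv_p(3, 3, 256, 384),      # 885120
    'conv4': conv_p(3, 3, 384, 384),      # 1327488
    'conv5': conv_p(3, 3, 384, 256),      # 884992
    'fc1':   fc_p(6 * 6 * 256, 4096),     # 37752832
    'fc2':   fc_p(4096, 4096),            # 16781312
    'fc3':   fc_p(4096, 1000),            # 4097000
}
print(sum(layers.values()))               # about 62.4 million parameters in total
```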
Visualizing Convolution

71
Why not correlation neural network?
• It could be
• Deep learning libraries actually implement correlation

• Correlation relates to convolution via a 180-degree rotation of the kernel


• When we learn kernels, we could easily learn them flipped

72
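For instance (a sketch assuming PyTorch; its documentation describes conv2d as cross-correlation), the library applies the kernel without flipping it, and flipping the kernel first recovers the textbook convolution:

```python
import torch
import torch.nn.functional as F

img = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)   # (batch, channels, H, W)
k = torch.tensor([[1., 2., 0.],
                  [0., 3., 0.],
                  [0., 0., 4.]]).reshape(1, 1, 3, 3)               # asymmetric kernel

corr = F.conv2d(img, k)                              # what the library computes (no flip)
conv = F.conv2d(img, torch.flip(k, dims=(2, 3)))     # true convolution: flip the kernel first
print(torch.equal(corr, conv))                       # False: they differ for asymmetric kernels
```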
