Week 8

The document outlines the content of lectures on CNN architecture and popular CNN models, including concepts like convolution layers, pooling, and nonlinearity. It discusses specific models such as LeNet, AlexNet, VGG Net, and GoogLeNet, highlighting their architectures and performance metrics. Additionally, it covers the ILSVRC challenge and its significance in evaluating image classification algorithms.

Course Name: Deep Learning

Faculty Name: Prof. P. K. Biswas


Department : E & ECE, IIT Kharagpur

Topic
Lecture 36: CNN Architecture
Concepts Covered:
 CNN
 CNN Architecture
 Convolution Layer
 Receptive Field
 Nonlinearity
 Pooling
Convolution

1-D Convolution

Discrete: $y(n) = \sum_{p=0}^{\infty} x(p)\,h(n-p)$    Continuous: $y(t) = \int_{0}^{\infty} x(\tau)\,h(t-\tau)\,d\tau$

2-D Convolution

$y(m,n) = \sum_{p=0}^{\infty}\sum_{q=0}^{\infty} x(p,q)\,h(m-p,\,n-q)$
Finite Convolution Kernel

A feature at a point is local in nature.

1-D kernel of length $2A+1$:

$y(n) = \sum_{p=-A}^{A} w(p)\,x(n-p)$

2-D kernel of size $(2A+1)\times(2A+1)$:

$y(m,n) = \sum_{p=-A}^{A}\sum_{q=-A}^{A} w(p,q)\,x(m-p,\,n-q)$
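As an illustration, here is a minimal NumPy sketch of the finite 1-D convolution sum above, assuming zero padding at the borders; the function and variable names are illustrative and not part of the lecture.

import numpy as np

def conv1d_finite(x, w):
    # 1-D finite convolution y(n) = sum_{p=-A}^{A} w(p) x(n-p), with zero padding
    A = len(w) // 2                       # kernel length is 2A+1
    xp = np.pad(x, A)                     # zero padding at both ends
    # w[p + A] stores w(p) for p = -A, ..., A
    return np.array([sum(w[p + A] * xp[n + A - p] for p in range(-A, A + 1))
                     for n in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.25, 0.5, 0.25])           # a 3-tap kernel (A = 1)
print(conv1d_finite(x, w))                # [1.  2.  3.  4.  3.5]
print(np.convolve(x, w, mode='same'))     # cross-check with NumPy's built-in convolution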
Finite Convolution Kernel: Sliding Operation

The slides animate the 1-D case: a 5-tap kernel [W(2), W(1), W(0), W(-1), W(-2)] slides one position at a time over the zero-padded input sequence 0, 0, X(0), X(1), X(2), X(3), ..., X(n-2), X(n-1), X(n), X(n+1), X(n+2), ..., producing the outputs Y(0), Y(1), Y(2), Y(3), ..., Y(n-1), Y(n), Y(n+1) in turn.
2D Convolution
 3 x 3 Kernel
 6 x 6 Image
 Flipping
 0 Padding

The subsequent slides step the flipped 3 x 3 kernel across the zero-padded 6 x 6 image, one output position per slide.
Stride
Stride is the number of steps by which the kernel is moved during convolution. The slides illustrate stride = 1 and stride = 2 for a 3 x 3 kernel on a 7 x 7 input image.
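With input size N, kernel size K, padding P and stride S, the output size is floor((N + 2P - K)/S) + 1; this relation is implied by the figures rather than stated on the slide. A quick Python check for the 7 x 7 input and 3 x 3 kernel:

def conv_output_size(n_in, kernel, stride=1, padding=0):
    # Spatial output size of a convolution: floor((N + 2P - K)/S) + 1
    return (n_in + 2 * padding - kernel) // stride + 1

print(conv_output_size(7, 3, stride=1))   # 5 -> a 5 x 5 feature map
print(conv_output_size(7, 3, stride=2))   # 3 -> a 3 x 3 feature map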
CNN Architecture

Image → Convolution → Nonlinearity → Pooling → Convolution → Nonlinearity → Pooling → Fully Connected Layer → Class
Convolution Layer: 3D Convolution
• A color image has 3 dimensions: height, width and depth (depth being the color channels, i.e., RGB).
• The filters or kernels convolved with the RGB image are therefore also 3D.
• For multiple kernels: all feature maps obtained from the distinct kernels are stacked to form the final output of that layer.
3D Convolution: Visualization
• The kernel strides over the input image.
• At each location $(m, n)$ compute $f(m,n) = \sum_{p}\sum_{q} w(p,q)\,I(p-m,\,q-n)$ and collect the values in the feature map.
• The animation shows the sliding operation at 4 locations, but in reality it is performed over the entire input.

Animation: Arden Dertat, https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
3D Convolution: Visualization
• The red and green boxes are two different feature maps obtained by convolving the same input with two different kernels. The feature maps are stacked along the depth dimension as shown.

Figure: Arden Dertat, https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
3D Convolution: Visualization
• An RGB image of size 32x32x3
• 10 kernels of size 5x5x3
• Output feature map of size 32x32x10

Figure: Arden Dertat, https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
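A short PyTorch sketch reproducing these shapes; padding=2 is an assumption made here so that the 5x5 kernels preserve the 32x32 spatial size shown in the figure.

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                           # one 32x32 RGB image (N, C, H, W)
conv = nn.Conv2d(in_channels=3, out_channels=10,
                 kernel_size=5, stride=1, padding=2)    # 10 kernels of size 5x5x3
print(conv(x).shape)                                    # torch.Size([1, 10, 32, 32])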
Nonlinearity
• ReLU is an element-wise operation (applied per pixel) that replaces all negative pixel values in the feature map by zero.

Figure: Arden Dertat, https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
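A one-line PyTorch illustration of this element-wise operation (the sample values are arbitrary):

import torch

fmap = torch.tensor([[-3.0, 1.5],
                     [0.0, -0.2]])
print(torch.relu(fmap))   # tensor([[0.0000, 1.5000], [0.0000, 0.0000]])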
Pooling
• Replaces the output of a node at certain locations with a summary statistic of nearby locations.
• Spatial pooling can be of different types: max, average, sum, etc.
• Max pooling reports the maximum output within a rectangular neighborhood.
• Pooling helps make the output approximately invariant to small translations.
• Pooling layers downsample each feature map independently, reducing the height and width while keeping the depth intact.
• In a pooling layer, the stride and window size need to be specified.
Pooling
• The figure below shows the result of max pooling using a 2x2 window and stride 2. Each color denotes a different window. Since both the window size and stride are 2, the windows do not overlap.

    3 2 5 6
    8 9 5 3    --max pool, 2x2 window, stride 2-->    9 6
    4 4 6 8                                           4 8
    1 1 2 1

Figure: Arden Dertat, https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
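The same numbers can be reproduced with PyTorch's MaxPool2d; a brief sketch:

import torch
import torch.nn as nn

fmap = torch.tensor([[3., 2., 5., 6.],
                     [8., 9., 5., 3.],
                     [4., 4., 6., 8.],
                     [1., 1., 2., 1.]]).reshape(1, 1, 4, 4)   # (N, C, H, W)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(fmap).reshape(2, 2))   # tensor([[9., 6.], [4., 8.]])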
Pooling
• Pooling reduces the height and the width of the feature map, but the depth remains unchanged, as shown in the figure.
• The pooling operation is carried out independently across each depth slice.

Figure: Arden Dertat, https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
CNN Architecture
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department : E & ECE, IIT Kharagpur

Topic
Lecture 37: Popular CNN Models
Concepts Covered:
 CNN
 LeNet
 AlexNet
 VGG Net
 GoogLeNet
 etc.
CNN Architecture

Image → Convolution → Nonlinearity → Pooling → Convolution → Nonlinearity → Pooling → Fully Connected Layer → Class
MLP vs CNN

 Sparse Connectivity: Every node in the convolution layer receives input from only a small number of nodes in the previous layer (its receptive field), requiring a smaller number of parameters.
 Parameter Sharing: Each member of the convolution kernel is used at every position of the input, dramatically reducing the number of parameters.
 This makes a CNN much more efficient than an MLP, as the parameter-count comparison below illustrates.
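To make the savings concrete, here is a small sketch comparing parameter counts for mapping a 32x32x3 input to 10 feature maps of the same spatial size; the layer sizes are illustrative assumptions, not taken from the lecture.

import torch.nn as nn

fc   = nn.Linear(32 * 32 * 3, 32 * 32 * 10)        # dense, MLP-style mapping
conv = nn.Conv2d(3, 10, kernel_size=5, padding=2)  # 10 shared 5x5x3 kernels

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(fc))     # 31,467,520 parameters (weights + biases)
print(count(conv))   # 760 parameters (10 * 5*5*3 weights + 10 biases)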
Some Popular CNN Models

LeNet 5
• Proposed by Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner for handwritten and machine-printed character recognition.
• Used by many banks for recognition of handwritten numbers on cheques.
• This architecture achieves an error rate as low as 0.95% on test data.

Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner, “Gradient-Based Learning Applied to Document Recognition”, Proc. IEEE, Nov. 1998
LeNet 5
No. of kernels: 6
Kernel size: 5 x 5
Stride: 1
LeNet 5
Average pooling
Window size: 2 x 2
Stride: 2
LeNet 5
No. of kernels: 16
Kernel size: 5 x 5
Stride: 1
LeNet 5
No. of kernels: 16
Kernel size: 5 x 5
Stride: 1
 Break the symmetry in the network.
 Keep the number of connections within reasonable bounds.
LeNet 5
Average pooling
Window size: 2 x 2
Stride: 2
LeNet 5: Summary
Figure: https://engmrk.com/lenet-5-a-classic-cnn-architecture/

IMAGENET Large Scale Visual Recognition Challenge (ILSVRC)
ILSVRC
• IMAGENET Large Scale Visual Recognition Challenge.
• Evaluates algorithms for object detection and image classification on a large image database.
• Helps researchers review state-of-the-art machine learning techniques for object detection across a wide variety of objects.
• Monitors the progress of computer vision for large-scale image indexing for retrieval and annotation.
• The database contains a large number of images from 1000 categories.
• More than 1000 images in every category.
ILSVRC
• Every year of the challenge, the forum also organizes a workshop at one of the premier computer vision conferences.
• The purpose of the workshop is to disseminate the new findings of the challenge.
• Contestants with the most successful and innovative techniques are invited to present their work.
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department : E & ECE, IIT Kharagpur

Topic
Lecture 38: Popular CNN Models II
Concepts Covered:
 CNN
 LeNet

 ILSVRC
 AlexNet
 VGG Net
 GoogLeNet
 etc.
AlexNet
ILSVRC 2012 Winner

Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, Advances in Neural Information Processing Systems, 2012
Sample Images from the ImageNet Dataset

AlexNet
ILSVRC 2012 Winner
Figure: https://www.learnopencv.com/understanding-alexnet/
AlexNet
 60 million parameters and 650,000 neurons.
 The network is split into two pipelines and was trained on two GPUs.
 Input image size 256 x 256 RGB.
 Greyscale images are replicated to obtain 3-channel RGB.
 Random crops of size 227 x 227 are fed to the input layer of AlexNet.
 Trained with the Stochastic Gradient Descent with Momentum optimizer, as sketched below.
 Top-5 error rate 15.3%.
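The slide names Stochastic Gradient Descent with Momentum as the optimizer; a minimal PyTorch sketch follows. The torchvision AlexNet is used as a stand-in for the architecture, and the learning rate, momentum and weight decay values are typical choices rather than values given on the slide.

import torch
import torchvision.models as models

model = models.alexnet(num_classes=1000)               # untrained AlexNet-style network
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01, momentum=0.9,     # typical values, assumed here
                            weight_decay=5e-4)

x = torch.randn(8, 3, 227, 227)                        # a batch of 227x227 random crops
labels = torch.randint(0, 1000, (8,))
loss = torch.nn.functional.cross_entropy(model(x), labels)
loss.backward()
optimizer.step()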
Vanishing Gradient Problem
 AlexNet uses the ReLU activation instead of a sigmoidal function.
 Since the ReLU output is unbounded, AlexNet uses Local Response Normalization (LRN).
 LRN carries out a normalization that amplifies the excited neuron while dampening the surrounding neurons in a local neighbourhood.
 This encourages lateral inhibition: a concept in neurobiology describing the capacity of a neuron to reduce the activity of its neighbours.
Local Response Normalization (Inter-Channel)

$$b^{i}_{x,y} = \frac{a^{i}_{x,y}}{\left(k + \alpha \sum_{j=\max(0,\,i-n/2)}^{\min(N-1,\,i+n/2)} \left(a^{j}_{x,y}\right)^{2}\right)^{\beta}}$$
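PyTorch's torch.nn.LocalResponseNorm implements this inter-channel normalization. In the brief sketch below, the hyperparameter values (n = 5, alpha = 1e-4, beta = 0.75, k = 2) are those reported in the AlexNet paper, not values taken from the slide.

import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

a = torch.randn(1, 96, 55, 55)   # e.g. feature maps after an early conv layer
b = lrn(a)                       # each activation divided by the bracketed sum over nearby channels
print(b.shape)                   # torch.Size([1, 96, 55, 55])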
Local Response Normalization
Figure: https://towardsdatascience.com/difference-between-local-response-normalization-and-batch-normalization-272308c034ac
Local Response Normalization (Intra-Channel)

$$b^{i}_{x,y} = \frac{a^{i}_{x,y}}{\left(k + \alpha \sum_{p=\max(0,\,x-n/2)}^{\min(W,\,x+n/2)} \; \sum_{q=\max(0,\,y-n/2)}^{\min(H,\,y+n/2)} \left(a^{i}_{p,q}\right)^{2}\right)^{\beta}}$$
Local Response Normalization
Figure: https://towardsdatascience.com/difference-between-local-response-normalization-and-batch-normalization-272308c034ac
Reducing Overfitting
 Training the network with different variants of the same image helps avoid overfitting.
 Generate additional data from existing data (augmentation).
 Data augmentation by mirroring.
 Data augmentation by random crops.
 Dropout regularization.
Dropout
 Regularization technique proposed by Srivastava et al. in 2014.
 During training, randomly selected neurons are temporarily dropped from the network (with probability 0.5).
 Their activations are not passed to the downstream neurons in the forward pass.
 In the backward pass, weight updates are not applied to these neurons.

Nitish Srivastava et al., “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, Journal of Machine Learning Research 15 (2014), 1929-1958
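A brief PyTorch illustration of this behaviour; note that PyTorch uses "inverted" dropout, scaling the surviving activations by 1/(1-p) during training instead of rescaling at test time.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each neuron dropped with probability 0.5
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the entries zeroed, survivors scaled by 1/(1-p) = 2

drop.eval()
print(drop(x))   # no dropout at test time: the whole network is used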
Dropout
Figure: https://www.learnopencv.com/understanding-alexnet/
How does it help?
 While training, the weights of neurons are tuned for specific features, which provides some sort of specialization.
 Neighbouring neurons start relying on these specializations (co-adaptation).
 This leads to a neural network model that is too specialized to the training data.
 As neurons are randomly dropped, other neurons have to step in to compensate.
 Thus the network learns multiple independent representations.
Learned Features
How does it help?
 This makes the network less sensitive to specific weights.
 Enhances the generalization capability of the network.
 Less vulnerable to overfitting.
 The whole network is used during testing – there is no dropout.
 Dropout increases the number of iterations needed for the network to converge.
 But it helps avoid overfitting.
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department : E & ECE, IIT Kharagpur

Topic
Lecture 39: Popular CNN Models III
Concepts Covered:
 CNN
 AlexNet
 VGG Net
 Transfer Learning
 GoogLeNet
 ResNet
 etc.
VGG 16
ILSVRC 2014 1st Runner-Up
Visual Geometry Group, Oxford University
VGG 16

Karen Simonyan and Andrew Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”
VGG 16
 The input to the architecture is a color image of size 224x224.
 The image is passed through a stack of convolutional layers.
 Every convolution filter has a very small receptive field: 3×3, stride 1.
 Uses row and column padding to maintain spatial resolution after convolution.
 There are 13 convolution layers.
 There are 5 max-pool layers.
 Max pooling window size 2x2, stride 2.
VGG 16
 Not every convolution layer is followed by a max-pool layer.
 3 fully connected layers.
 The first two FC layers have 4096 channels each.
 The last FC layer has 1000 channels.
 The last layer is a softmax layer with 1000 channels, one for each category of images in the ImageNet database.
 Hidden layers use ReLU as the activation function.
VGG 16
Striking differences from AlexNet:
 All convolution kernels are of size 3x3 with stride 1.
 All max-pool kernels are of size 2x2 with stride 2.
 Variable-size kernels as in AlexNet can be realised using multiple stacked 3x3 kernels.
 This realisation is in terms of the size of the receptive field covered by the kernels.
 Top-5 error rate ~7%.
A sketch of the full layer stack follows.
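A minimal sketch of this stack is given below. The 3x3/stride-1 convolutions, 2x2/stride-2 max pooling and FC widths follow the bullets above; the per-block channel widths (64, 128, 256, 512, 512) come from the Simonyan and Zisserman paper and are assumptions as far as these slides are concerned.

import torch
import torch.nn as nn

# 13 conv layers (3x3, stride 1, padding 1) in 5 blocks, each followed by 2x2 max pooling ('M')
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

def make_features(cfg, in_ch=3):
    layers = []
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

features = make_features(cfg)
classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),                       # one output per ImageNet category
)

x = torch.randn(1, 3, 224, 224)
f = features(x)                                  # (1, 512, 7, 7) after five poolings
print(classifier(torch.flatten(f, 1)).shape)     # torch.Size([1, 1000])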
Transfer Learning
Transfer Learning
Figure: Kevin McGuinness, https://www.slideshare.net/xavigiro/transfer-learning-d2l4-insightdcu-machine-learning-workshop-2017
Transfer Learning
CNN as a Fixed Feature Extractor:
 Take a pre-trained CNN architecture trained on a large dataset (like ImageNet).
 Remove the last fully connected layer of this pre-trained network.
 The remaining CNN acts as a fixed feature extractor for the new dataset, as sketched below.
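A hedged PyTorch sketch of this recipe, using torchvision's pre-trained VGG-16 as the backbone (any pre-trained CNN would do). The 5-class head is an illustrative assumption, and the weights argument spelling depends on the torchvision version (older versions use pretrained=True).

import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.vgg16(weights='IMAGENET1K_V1')   # pre-trained on ImageNet
backbone.classifier = backbone.classifier[:-1]     # remove the last (1000-way) FC layer
for p in backbone.parameters():
    p.requires_grad = False                        # freeze: fixed feature extractor
backbone.eval()

x = torch.randn(4, 3, 224, 224)                    # images from the new dataset
with torch.no_grad():
    feats = backbone(x)                            # one 4096-d feature vector per image
print(feats.shape)                                 # torch.Size([4, 4096])

head = nn.Linear(4096, 5)                          # only this small classifier is trained for the new task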
Transfer Learning
Figures (several slides). Image source: https://becominghuman.ai/what-exactly-does-cnn-see-4d436d8e6e52
Transfer Learning
 Lower layers generate more general features: their knowledge transfers very well to other tasks.
 Higher layers are more task-specific.
 Fine-tuning improves generalization when sufficient examples are available.
 Transfer learning and fine-tuning often lead to better performance than training from scratch on the target dataset.
 Even features transferred from distant tasks often perform better than random initial weights.
Fine Tuning
 The weights of the pre-trained CNN are fine-tuned for the new dataset by continuing the back propagation.
 Fine-tuning can be done for all layers.
 Due to overfitting concerns, the earlier layers of the net may be fixed and fine-tuning done only on the higher layers, as in the sketch below.
 Earlier layers can be fixed because lower layers extract features that are more generic.
 Higher layers, on the other hand, are task-specific.
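A brief sketch of this strategy, again using torchvision's VGG-16; freezing exactly the first three convolution blocks and using a 5-class head are illustrative assumptions.

import torch
import torchvision.models as models

model = models.vgg16(weights='IMAGENET1K_V1')

# Freeze the earlier, more generic convolutional layers (first three blocks here)
for p in model.features[:17].parameters():
    p.requires_grad = False

# Replace the task-specific head and fine-tune only the remaining parameters
model.classifier[6] = torch.nn.Linear(4096, 5)     # 5 classes in the new dataset
optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                            lr=1e-3, momentum=0.9)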
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department : E & ECE, IIT Kharagpur

Topic
Lecture 40: Popular CNN Models IV
Concepts Covered:
 CNN
 AlexNet
 VGG Net
 Transfer Learning
 Challenges in Deep Learning
 GoogLeNet
 ResNet
 etc.
Deep Learning Challenges
Challenges
 Deep learning is data hungry.
 Overfitting or lack of generalization.
 Vanishing/Exploding Gradient Problem.
 Appropriate Learning Rate.
 Covariate Shift.
 Effective training.
Vanishing Gradient
Vanishing Gradient Problem
Figure: https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
Vanishing Gradient Problem

Consider a chain of four layers with weights $W_1, \dots, W_4$: $X \to f_1 \to f_2 \to f_3 \to f_4 \to O$, so that

$O = f_4\big(W_4\, f_3\big(W_3\, f_2\big(W_2\, f_1(W_1 X)\big)\big)\big)$
Vanishing Gradient Problem

The same composition $O = f_4\big(W_4\, f_3\big(W_3\, f_2\big(W_2\, f_1(W_1 X)\big)\big)\big)$, with the intermediate pre-activations labelled $\theta_1, \theta_2, \theta_3, \theta_4$ as defined on the next slide.
Vanishing Gradient Problem

$O = f_4(\theta_4), \quad \theta_4 = W_4 f_3(\theta_3), \quad \theta_3 = W_3 f_2(\theta_2), \quad \theta_2 = W_2 f_1(\theta_1), \quad \theta_1 = W_1 X$

$$\frac{\partial O}{\partial W_1} = \frac{\partial O}{\partial \theta_4}\cdot\frac{\partial \theta_4}{\partial f_3}\cdot\frac{\partial f_3}{\partial \theta_3}\cdot\frac{\partial \theta_3}{\partial f_2}\cdot\frac{\partial f_2}{\partial \theta_2}\cdot\frac{\partial \theta_2}{\partial f_1}\cdot\frac{\partial f_1}{\partial \theta_1}\cdot\frac{\partial \theta_1}{\partial W_1} = X \cdot f_1' \cdot W_2 \cdot f_2' \cdot W_3 \cdot f_3' \cdot W_4 \cdot \frac{\partial O}{\partial \theta_4}$$

$$\frac{\partial O}{\partial W_2} = \frac{\partial O}{\partial \theta_4}\cdot\frac{\partial \theta_4}{\partial f_3}\cdot\frac{\partial f_3}{\partial \theta_3}\cdot\frac{\partial \theta_3}{\partial f_2}\cdot\frac{\partial f_2}{\partial \theta_2}\cdot\frac{\partial \theta_2}{\partial W_2} = f_1 \cdot f_2' \cdot W_3 \cdot f_3' \cdot W_4 \cdot \frac{\partial O}{\partial \theta_4}$$

The gradient reaching the early weights is thus a long product of derivatives $f_i'$ and weights $W_i$; when these factors are small (e.g., sigmoid derivatives are at most 0.25), the product vanishes.
Vanishing Gradient Problem: Remedies
 Choice of activation function: ReLU instead of sigmoid.
 Appropriate initialization of weights (see the sketch after this list).
 Intelligent back-propagation learning algorithms.
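For the initialization remedy, common schemes (not specified on the slide) are readily available in PyTorch; a brief sketch:

import torch.nn as nn

relu_layer = nn.Linear(512, 512)
nn.init.kaiming_normal_(relu_layer.weight, nonlinearity='relu')   # He initialization, suited to ReLU

tanh_layer = nn.Linear(512, 512)
nn.init.xavier_uniform_(tanh_layer.weight)                        # Xavier/Glorot initialization, suited to tanh/sigmoid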