Convolutional Neural Network (CNN) - 5
Basic structure of CNN
• In a CNN, the nodes in each layer are arranged according to the spatial grid structure of the input.
• It is important to maintain these spatial relationships among the input regions through the network layers.
• A CNN functions much like a traditional feed-forward neural network, except that the operations in its layers are organized with sparse connections.
• CNNs are biologically inspired networks used for:
✓ Image classification
✓ Pattern recognition
Cont.
• Each layer in a CNN is a 3-dimensional grid structure (height, width, and depth).
• Depth is the number of channels in the layer (e.g., a color image has 3 channels).
• The depth may increase in later layers, corresponding to the number of newly resulting features.
• Features in lower-level layers capture lines or simple shapes, whereas features in higher layers capture complex shapes like loops.
• Four types of layers in a CNN architecture:
“convolutional - ReLU - pooling - fully connected”
Convolution layer
• A convolutional layer is the main building block of a CNN. It contains a set of filters (or kernels), each organized as a 3-dimensional structural unit.
• The filter is usually square and much smaller than the layer it is applied to.
• The filter has the same depth as the layer it is applied to.
• The depth of the next layer depends on the number of filters used.
• The filter is placed at each position in the image where it fully overlaps with the image.
• At each position, the dot product is computed between the filter entries and the corresponding elements of the local region in the image.
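A minimal NumPy sketch of this sliding dot product, assuming a single-channel input for simplicity (a real CNN filter also spans the full depth of its layer):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the filter over every position where it fully overlaps the
    # image and take the dot product of the overlapping elements
    # (the cross-correlation used in CNN convolution layers).
    H, W = image.shape
    Fh, Fw = kernel.shape
    out = np.zeros((H - Fh + 1, W - Fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = image[i:i + Fh, j:j + Fw]   # local region under the filter
            out[i, j] = np.sum(region * kernel)  # element-wise product, then sum
    return out

image = np.random.rand(32, 32)   # single-channel 32 x 32 input
kernel = np.random.rand(5, 5)    # one 5 x 5 filter
print(conv2d_valid(image, kernel).shape)  # (28, 28)
```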
Cont.
• If the size of the image (input layer) is 32 x 32 and the filter size is 5 x 5, then the size of the next layer is 28 x 28, since in each spatial dimension:
32 – 5 + 1 = 28
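The arithmetic behind this example, as a quick check (a valid convolution shrinks each spatial dimension from W to W - F + 1):

```python
W, F = 32, 5      # image size and filter size from the example
print(W - F + 1)  # 28: the next layer is 28 x 28
```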
Cont.
• How does a filter work in a convolutional layer?
A filter tries to identify a particular type of pattern in a small rectangular region. Thus, a large number of filters is required to capture all possible shapes.
• For example, a horizontal edge detector on a grayscale image with one channel gives zero activation on a vertical edge, whereas a horizontal edge gives a high activation value (a sketch follows).
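As a concrete sketch, a small Prewitt-style kernel (an illustrative choice; the slides' exact filter is not shown) acts as a horizontal edge detector:

```python
import numpy as np

def conv2d_valid(image, kernel):
    H, W = image.shape
    Fh, Fw = kernel.shape
    out = np.zeros((H - Fh + 1, W - Fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + Fh, j:j + Fw] * kernel)
    return out

# Prewitt-style kernel: responds where intensity changes from top to
# bottom, i.e., along horizontal edges.
kernel = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])

horizontal_edge = np.vstack([np.ones((4, 8)), np.zeros((4, 8))])  # bright top, dark bottom
vertical_edge = np.hstack([np.ones((8, 4)), np.zeros((8, 4))])    # bright left, dark right

print(conv2d_valid(horizontal_edge, kernel).max())  # 3.0: high activation
print(conv2d_valid(vertical_edge, kernel).max())    # 0.0: no activation
```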
Hierarchical Feature Engineering
Padding
• One observation is that convolution operations reduce the original size.
• This tends to lose some information along the border of the image. The problem can be resolved by padding.
• Padding: “adding pixels (set to zero) around the border of the feature map in order to maintain the spatial footprint”.
• Padding is performed in all layers, not just in the first layer.
• Padding allows more space for the filter to cover in the image.
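A minimal sketch with NumPy's np.pad, assuming a 28 x 28 feature map and a 5 x 5 filter, so that padding by 2 on each side maintains the spatial footprint (28 + 2*2 - 5 + 1 = 28):

```python
import numpy as np

feature_map = np.random.rand(28, 28)

# Zero-pad 2 pixels on every border of the feature map.
padded = np.pad(feature_map, pad_width=2, mode="constant", constant_values=0)
print(padded.shape)  # (32, 32): a 5 x 5 filter now produces a 28 x 28 output
```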
Strides
• It is not necessary to perform convolution operations at every spatial position in the layer.
• Stride: “a long step, or the distance covered by such a step”; here, it is the number of positions the filter moves between applications.
• It is most common to use a stride of 1 (sometimes 2).
• Larger strides reduce the spatial size of the layers and therefore reduce the required storage.
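Combining filter size F, padding P, and stride S, the output size follows the standard formula floor((W - F + 2P) / S) + 1; a small sketch with illustrative values:

```python
def output_size(W, F, P=0, S=1):
    # floor((W - F + 2P) / S) + 1; reduces to W - F + 1 when P = 0, S = 1.
    return (W - F + 2 * P) // S + 1

print(output_size(32, 5))        # 28: stride 1, no padding
print(output_size(32, 5, S=2))   # 14: a stride of 2 roughly halves the size
print(output_size(32, 5, P=2))   # 32: padding of 2 preserves the footprint
```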
ReLU layers
• A ReLU layer typically follows the convolution layer.
• It has the same form (discussed earlier) as in traditional neural networks.
• It doesn't reduce the size of the layers, because it is a one-to-one mapping of activation values.
• In earlier years, sigmoid and tanh were used.
• More recently, ReLU has been used to improve speed and accuracy.
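ReLU is simply the element-wise function max(0, x); a minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    # One-to-one mapping of activation values; the layer's size is unchanged.
    return np.maximum(0, x)

activations = np.array([[-1.5, 2.0],
                        [ 0.0, -0.3]])
print(relu(activations))  # [[0. 2.] [0. 0.]]
```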
Pooling
• The pooling operation works on small regions in each layer and produces another layer with the same depth.
• Two types of pooling: max-pooling and average pooling.
• The most common type of pooling is max-pooling.
• Max-pooling returns the maximum value in the local region.
• It is common to use a stride greater than 1 in pooling (e.g., 2 x 2 pooling with a stride of 2).
• Pooling operates independently on each feature map to produce another feature map, whereas convolution works on all feature maps simultaneously and produces a single one.
• Advantage: pooling layers downsample (reduce) feature maps by summarizing the presence of features in patches of the feature map.
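A minimal NumPy sketch of 2 x 2 max-pooling with a stride of 2 on a single feature map (sizes are illustrative); in a full layer this would be repeated independently for each feature map:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Each output value is the maximum of one (size x size) patch;
    # with size=2 and stride=2, height and width are halved.
    H, W = feature_map.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i * stride:i * stride + size,
                                j * stride:j * stride + size]
            out[i, j] = patch.max()
    return out

fmap = np.random.rand(28, 28)
print(max_pool(fmap).shape)  # (14, 14)
```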
Fully connected layer
• The fully connected layer is simply a feed-forward neural network. It forms the last few layers of the network.
• The input to the fully connected layer is the output of the final pooling or convolutional layer.
• All neurons in a fully connected layer connect to all neurons in the previous layer.
• In most cases, more than one fully connected layer is used to increase the computational power towards the end of the network.
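A minimal NumPy sketch of the transition into a fully connected layer; the sizes below (16 feature maps of 5 x 5 into 120 neurons) are illustrative assumptions:

```python
import numpy as np

pooled = np.random.rand(16, 5, 5)   # output of the final pooling layer
x = pooled.reshape(-1)              # flatten into a vector of 16 * 5 * 5 = 400 values

W = np.random.randn(120, 400) * 0.01  # every neuron connects to every input
b = np.zeros(120)
out = np.maximum(0, W @ x + b)        # dense layer followed by ReLU
print(out.shape)  # (120,)
```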
Interleaving between layers
• Convolution, pooling, and ReLU layers are interleaved to increase the power of the network.
• In general, ReLU follows convolution.
• Convolution and ReLU are stacked together, one after the other.
• After two or three Convolution-ReLU combinations, max-pooling follows, giving patterns such as CRCRP (see the sketch below),
where ‘C’ refers to Convolution, ‘R’ to ReLU, and ‘P’ to max-pooling.
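A minimal sketch of one CRCRP block in PyTorch; the channel counts and kernel sizes are illustrative assumptions, not taken from the slides:

```python
import torch.nn as nn

crcrp = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),  # C
    nn.ReLU(),                                                            # R
    nn.Conv2d(32, 32, kernel_size=3, padding=1),                          # C
    nn.ReLU(),                                                            # R
    nn.MaxPool2d(kernel_size=2, stride=2),                                # P
)
```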
LeNet-5
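Since the architecture figure is not reproduced here, the following PyTorch sketch shows a common modern re-implementation of LeNet-5 (ReLU and max-pooling in place of the original tanh and average pooling):

```python
import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),    # 32x32x1 -> 28x28x6
    nn.ReLU(),
    nn.MaxPool2d(2, 2),                # -> 14x14x6
    nn.Conv2d(6, 16, kernel_size=5),   # -> 10x10x16
    nn.ReLU(),
    nn.MaxPool2d(2, 2),                # -> 5x5x16
    nn.Flatten(),                      # -> 400 values
    nn.Linear(400, 120),
    nn.ReLU(),
    nn.Linear(120, 84),
    nn.ReLU(),
    nn.Linear(84, 10),                 # 10 output classes (digits)
)

print(lenet5(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```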
Training of CNN
• A CNN is trained using backpropagation (BP), as in traditional feed-forward neural networks.
• Among all kinds of ANNs, BP-trained multilayer feed-forward networks are among the most mature and widely used; they learn by propagating the error backward through the network.
• By some estimates, up to 80% of neural network models apply backpropagation or one of its variants.
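A minimal sketch of one backpropagation training step in PyTorch; the model, data, and hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 6, 5), nn.ReLU(), nn.Flatten(),
                      nn.Linear(6 * 28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 1, 32, 32)    # dummy batch of 8 images
labels = torch.randint(0, 10, (8,))   # dummy class labels

logits = model(images)                # forward pass
loss = loss_fn(logits, labels)        # compute the error
optimizer.zero_grad()
loss.backward()                       # backpropagate the error
optimizer.step()                      # update the weights
```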
Data augmentation
• Data augmentation:
Using new training examples generated by applying transformations to the original examples.
• Data augmentation can reduce overfitting in CNNs, especially in the image-processing domain (because it doesn't change the properties of the objects in the image).
• Popular augmentation techniques:
Some basic but powerful augmentation techniques that are popularly used:
✓ Rotation:
Rotating the image by an angle, clockwise or anticlockwise. One key thing to note about this operation is that image dimensions may not be preserved after rotation.
Cont.
✓ Scale:
The image can be scaled outward or inward. When scaling outward, the final image size will be larger than the original image size.
✓ Crop:
Unlike scaling, we just randomly cut/sample a section from the original image and then resize this section to the original image size.
✓ Translation:
Translation just involves moving the image along the X or Y direction (or both).
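A sketch of these techniques using torchvision.transforms; the parameter values are illustrative assumptions:

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                     # rotation
    transforms.RandomResizedCrop(size=32, scale=(0.8, 1.0)),   # scale + crop
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation
])
# augmented = augment(image)  # apply to a PIL image during training
```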
Successful Variants of CNN
✓ AlexNet
✓ ZFNet
✓ VGG
✓ GoogLeNet
✓ ResNet