UNIT-2 - Part-1
A Convolutional Neural Network (CNN) is a type of deep learning algorithm specifically designed for working with grid-like data, such as
images. It is particularly useful for image recognition, classification, and other tasks that involve spatial data.
Convolutional Neural Network
Applications of CNN
1. Image classification
Applications of CNN (Cont’d)
2. Image Localization
3. Object Detection
Applications of CNN (Cont’d)
4. Face Recognition
Applications of CNN (Cont’d)
5. Image Segmentation
Applications of CNN (Cont’d)
6. Super Resolution
7. Image De-coloring
Applications of CNN (Cont’d)
Pooling Layer: This layer reduces the spatial size of the data (image), making computations more efficient and helping prevent overfitting. It
typically uses methods like max pooling, which selects the maximum value from a region of the feature map.
Fully Connected Layer: After several convolutional and pooling layers, the data is flattened into a vector and passed through fully connected
layers, where all neurons are connected to the previous layer’s outputs. This is typically where classification occurs.
Activation Functions: Non-linear functions like ReLU (Rectified Linear Unit) are applied after convolutional layers to introduce non-linearity,
allowing the network to learn more complex patterns.
Feature Extraction: CNNs automatically detect important features in the data without the need for manual feature extraction, which is a key
reason they are so effective.
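To make the building blocks above concrete, here is a minimal sketch in Python (PyTorch); it is not part of the original slides, and the class name SimpleCNN and all layer sizes are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Convolutional layers: feature extraction
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        # Pooling layer: halves the spatial size (max pooling)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Fully connected layer: classification (assumes 32x32 input images)
        self.fc = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # conv -> ReLU (non-linearity) -> pool
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)               # flatten the feature maps into a vector
        return self.fc(x)                     # class scores

logits = SimpleCNN()(torch.randn(4, 3, 32, 32))   # a batch of 4 RGB images, 32x32
print(logits.shape)                               # torch.Size([4, 10])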
Biological Connections
The CNN architecture is inspired by the human visual cortex
Experiment
Hierarchy of feature extraction process in CNN for recognizing the hexagon pattern
Hierarchy of feature extraction process in CNN for recognizing the pattern
Five-stage Neocognitron representation with its layers and planes.
LeNet Architecture
LeNet-5, the CNN developed by Yann LeCun, originally applied to recognizing handwritten digits on bank cheques
AlexNet Architecture
Architecture of the CNNs applied to digit recognition
Convolution Operation
To calculate the output dimension (no padding, stride 1):
(n - f + 1) x (n - f + 1)
E.g., input size 3x4, kernel/filter size 2x2:
(3 - 2 + 1) x (4 - 2 + 1) = 2x3, so the output is of size 2x3.
Convolution Process:
The convolution operation involves sliding the 2×2 edge detection kernel
over the input matrix and performing a dot product (element-wise
multiplication and sum) between the kernel and the corresponding 2×2 patch
of the input matrix.
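As a small illustration (not part of the original slides), the valid convolution above can be written directly in Python with NumPy; the helper name conv2d_valid is an assumption, and, as in most CNN libraries, the kernel is applied without flipping (cross-correlation).

import numpy as np

def conv2d_valid(x, k):
    n_h, n_w = x.shape
    f_h, f_w = k.shape
    # Output size is (n - f + 1) in each dimension
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product between the kernel and the current patch
            out[i, j] = np.sum(x[i:i + f_h, j:j + f_w] * k)
    return out

x = np.arange(12, dtype=float).reshape(3, 4)   # 3x4 input
k = np.array([[1., -1.], [1., -1.]])           # 2x2 edge-detection-style kernel
print(conv2d_valid(x, k).shape)                # (2, 3), i.e. (3-2+1) x (4-2+1)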
Convolution Operation
Convolutional Neural Network (CNN)
CNN
Visualization of filters
Convolution Operation (cont’d)
Output shape = (n + 2p - f + 1) x (n + 2p - f + 1)
Where n is the input size, f is the filter size, and p is the padding amount.
Padding Operation in CNN
Output shape = (n + 2p - f + 1) x (n + 2p - f + 1)
Where n is the input size, f is the filter size, and p is the padding amount.
https://medium.com/@Tms43/understanding-padding-strides-in-convolutional-neural-networks-cnn-for-effective-image-feature-1b0756a52918
Padding
Types of Padding
There are two common types of padding used in neural networks:
• Valid Padding: This type of padding involves no padding at all. The
convolution operation is performed only on the valid overlap between
the filter and the input. As a result, the output dimensions will be smaller
than the input dimensions.
• Same Padding: In this approach, padding is added to the input so that
the output dimensions after the convolution operation are the same as
the input dimensions. This is typically achieved by adding an
appropriate number of zero-value pixels around the input.
https://medium.com/@Tms43/understanding-padding-strides-in-convolutional-neural-networks-cnn-for-effective-image-feature-1b0756a52918
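As a short illustration (assumed, not from the original slides), the two padding types can be compared in PyTorch; the tensor sizes below are arbitrary.

import torch
import torch.nn as nn

x = torch.randn(1, 1, 8, 8)                        # one 8x8 single-channel image

valid = nn.Conv2d(1, 1, kernel_size=3, padding=0)  # valid padding: no padding at all
same = nn.Conv2d(1, 1, kernel_size=3, padding=1)   # same padding: pad by (f-1)/2 = 1 zeros

print(valid(x).shape)  # torch.Size([1, 1, 6, 6])  output shrinks: 8 - 3 + 1 = 6
print(same(x).shape)   # torch.Size([1, 1, 8, 8])  output keeps the input size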
Strides in Convolution network (Cont’d)
• Increasing the stride value causes the filter to skip over positions, resulting in a smaller output spatial dimension, as sketched below.
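With stride s and padding p, the output size per dimension becomes floor((n + 2p - f) / s) + 1. A brief sketch (assumed example, not from the original slides; the helper conv_out_size is illustrative):

import torch
import torch.nn as nn

def conv_out_size(n, f, p=0, s=1):
    # floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

x = torch.randn(1, 1, 8, 8)
print(conv_out_size(8, 3, p=0, s=1))          # 6
print(conv_out_size(8, 3, p=0, s=2))          # 3: larger stride, smaller output
print(nn.Conv2d(1, 1, 3, stride=2)(x).shape)  # torch.Size([1, 1, 3, 3])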
Pooling Layer in CNN
Pooling in CNN
• Pooling in Convolutional Neural Networks (CNNs) is a downsampling
operation used to reduce the spatial dimensions (width and height) of the
feature maps while retaining important information.
• It serves several purposes, including reducing the number of parameters,
minimizing computational complexity, and making the network more
robust to variations and distortions in the input, such as translation or
small shifts in images.
• Pooling plays a crucial role in CNNs by reducing the spatial dimensions
of feature maps, making computations more efficient, and introducing
translation invariance.
• Proper use of pooling layers helps maintain a balance between capturing
meaningful features and controlling computational costs.
Types of Pooling:
1. Max Pooling:
Description: Max pooling selects the maximum value from a defined region (usually a
small window, such as 2x2 or 3x3) of the input feature map. This operation is applied
with a certain stride to slide the window across the feature map.
Function: It retains the most prominent feature (largest value) within the window, helping
to preserve the strongest activations.
2. Average Pooling:
Description: In average pooling, the average of all values in the pooling window is taken to
represent that region in the downsampled feature map.
Function: It smooths the feature map by averaging values, giving a more balanced view of the
features in a local region.
3. Global Pooling:
Description: Global pooling is a special case where the pooling window covers the entire
spatial dimension of the feature map (i.e., the window size is equal to the feature map size).
This results in a single value per feature map channel.
Function: It is commonly used at the final layers of a CNN (just before classification) to
convert the entire feature map into a single summary value per channel, greatly reducing the
number of parameters.
Types:
Global Max Pooling: Takes the maximum value of the entire feature map.
Global Average Pooling: Takes the average value of the entire feature map.
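A minimal PyTorch sketch (assumed, not part of the original slides) of the three pooling types above, applied to a batch of 8 feature maps of size 6x6:

import torch
import torch.nn as nn

x = torch.randn(1, 8, 6, 6)                        # 8 channels, each 6x6

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)   # largest value in each 2x2 window
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)   # average of each 2x2 window
global_avg = nn.AdaptiveAvgPool2d(1)               # global average pooling: one value per channel
global_max = nn.AdaptiveMaxPool2d(1)               # global max pooling: one value per channel

print(max_pool(x).shape)    # torch.Size([1, 8, 3, 3])
print(avg_pool(x).shape)    # torch.Size([1, 8, 3, 3])
print(global_avg(x).shape)  # torch.Size([1, 8, 1, 1])
print(global_max(x).shape)  # torch.Size([1, 8, 1, 1])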
Problem with convolution: the following issues are solved by the use of pooling
1. Memory issue: convolution produces large feature maps, which increases memory and computation requirements.
2. Translation variance: the convolution operation is location dependent, so a small shift in the input image changes the output.
Pooling solves these issues
1. Memory issue: downsampling reduces the size of the feature maps.
2. Translation invariance is obtained using the pooling operation, which means the network will produce the same output even if the input image is slightly translated.
Max pooling
Average Pooling
• For example, you can slightly shift, rotate, and resize every picture in the
training set by various amounts and add the resulting pictures to the training
set. This forces the model to be more tolerant to variations in the position,
orientation, and size of the objects in the pictures.
• If you want the model to be more tolerant to different lighting conditions,
you can similarly generate many images with various contrasts.
• In general, you can also flip the pictures horizontally (except for text and other non-symmetrical objects).
• By combining these transformations you can greatly increase the size of
your training set.
Data augmentation
Advantages of Data Augmentation:
• Increase the Size of the Dataset:
Data augmentation artificially increases the size of the dataset by generating
new examples from the existing data.
• Create More Diversity in the Dataset:
By applying various transformations to the original images, such as rotation,
flipping, and scaling, data augmentation introduces diversity to the dataset.
• Reduce Overfitting:
Overfitting occurs when a CNN becomes too specialized in the training data
and is unable to generalize well to new data. Data augmentation helps prevent
overfitting by creating more diverse training data.
• Improve Accuracy:
With more examples, the model learns better, which leads to improved
accuracy.
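As an illustration (assumed, not part of the original slides), the transformations described above can be applied on the fly with torchvision.transforms; the exact parameter values are arbitrary choices for the example.

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                    # flip (avoid for text and other asymmetric objects)
    transforms.RandomRotation(degrees=10),                     # slight rotation
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # random shift and resize
    transforms.ColorJitter(brightness=0.2, contrast=0.2),      # lighting and contrast variation
    transforms.ToTensor(),
])

# Typically passed to the training dataset, e.g.
# dataset = torchvision.datasets.ImageFolder("train/", transform=augment)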
Convolution layer (RGB)
Note: The preceding slides show how convolution is performed on an RGB image
Multiple filters-multiple outputs
https://datahacker.rs/convolution-rgb-image/
Convolution Operation (RGB)
https://dev.to/sandeepbalachandran/machine-learning-convolution-with-color-images-2p41
Convolution Operation (RGB)-zero padding
Multiple filters-multiple outputs
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
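A short PyTorch sketch (assumed, not from the original slides): convolving an RGB image uses filters that span all 3 input channels, and using several filters produces one output feature map per filter.

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                    # one RGB image, 32x32
conv = nn.Conv2d(in_channels=3, out_channels=8,  # 8 filters, each of shape 3x3x3
                 kernel_size=3, padding=1)

print(conv.weight.shape)  # torch.Size([8, 3, 3, 3]): 8 filters x 3 channels x 3x3
print(conv(x).shape)      # torch.Size([1, 8, 32, 32]): one feature map per filter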
Variants of Basic Convolution Function
1. Strided convolution
Adding stride
• Skip over some positions of the kernel in order to reduce the computational cost.
• We can think of this as downsampling the output of the full convolution function.
• With stride, the spatial size of the feature map is reduced.
Variants of Basic Convolution Function
2. Zero-Padding
• Zero-pad the input V in order to make it wider.
• Without this feature, the width of the representation shrinks by one pixel
less than the kernel width at each layer.
• Zero-padding the input allows us to control the kernel width and the size of the output independently.
• Without zero padding, we are forced to choose between shrinking the spatial extent of the network rapidly and using small kernels, both of which significantly limit the expressive power of the network.
• Three special cases of the zero padding setting
• No zero padding-valid convolution
• Enough zero padding is added to keep the size of the output equal to
the size of the input-same convolution
• Enough zeros are added for every pixel to be visited k times in each direction, resulting in an output image of width m + k - 1, where m is the image width and k is the kernel width - full convolution
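The three cases can be checked with SciPy (an assumed example, not part of the original slides): for an m x m input and k x k kernel, 'valid' gives m-k+1, 'same' gives m, and 'full' gives m+k-1 per dimension.

import numpy as np
from scipy.signal import convolve2d

x = np.random.rand(6, 6)   # m = 6
k = np.random.rand(3, 3)   # k = 3

print(convolve2d(x, k, mode='valid').shape)  # (4, 4): m - k + 1
print(convolve2d(x, k, mode='same').shape)   # (6, 6): m
print(convolve2d(x, k, mode='full').shape)   # (8, 8): m + k - 1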
Variants of Basic Convolution Function (Cont’d)
Unshared convolution vs. tiled convolution vs. shared convolution (traditional convolution)
Variants of Basic Convolution Function (Cont’d)
Strided Convolution
Effect: Strided convolution reduces the spatial dimensions of the output (i.e.,
downsampling), which can:
• Reduce the computational cost by producing fewer output pixels.
• Incorporate larger receptive fields, capturing more global information.
Padding
Effect:
• Prevents the output from shrinking after each convolution, which helps
maintain spatial resolution.
• Allows the convolution to focus on the edges of the input by preserving
information at the boundaries.
Unshared Convolution
Effect: This allows the network to learn spatially varying features, making it
more flexible and suitable for tasks where different regions of the input have
very different characteristics.
Tiled Convolution
Effect: Tiled convolution strikes a balance between shared and unshared
convolution, giving some degree of spatial flexibility while still maintaining a
smaller parameter count than unshared convolution.
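As my own illustration (not from the original slides), unshared (locally connected) convolution can be sketched in NumPy: the sliding pattern is the same as ordinary convolution, but a separate kernel is used at every output position; tiled convolution would instead cycle through a small set of kernels.

import numpy as np

def unshared_conv2d(x, kernels):
    # kernels has shape (out_h, out_w, f, f): one f x f kernel per output position
    out_h, out_w, f, _ = kernels.shape
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + f, j:j + f] * kernels[i, j])
    return out

x = np.random.rand(5, 5)
kernels = np.random.rand(3, 3, 3, 3)       # a distinct 3x3 kernel for each of the 3x3 output positions
print(unshared_conv2d(x, kernels).shape)   # (3, 3)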
Variants of Basic Convolution Function (Cont’d)
Multi-channel data example: CT scans
Efficient Convolution Algorithms
• Modern convolutional network applications often involve networks
containing more than one million units.
• Powerful implementations that exploit parallel computation resources are essential.
• However, in many cases it is also possible to speed up convolution by
selecting an appropriate convolution algorithm.
• Devising faster ways of performing convolution or approximate
convolution without harming the accuracy of the model is an active area of
research.
• Even techniques that improve the efficiency of only forward propagation
are useful because in the commercial setting, it is typical to devote more
resources to deployment of a network than to its training.
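One concrete example (assumed, not from the original slides) is FFT-based convolution, which computes the same result as direct convolution but is usually faster for large kernels:

import numpy as np
from scipy.signal import convolve2d, fftconvolve

x = np.random.rand(256, 256)
k = np.random.rand(31, 31)

direct = convolve2d(x, k, mode='same')   # direct spatial-domain convolution
fast = fftconvolve(x, k, mode='same')    # the same convolution computed via the Fourier domain

print(np.allclose(direct, fast))         # True (up to floating-point error)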
Extra slide and extra topic: Transfer Learning
Note: This topic is included only for your reference
Transfer Learning
Transfer learning is the reuse of a pre-trained model on a new problem. It is popular in deep learning because it makes it possible to train deep neural networks with comparatively little data. This is very useful in the data science field, since most real-world problems typically do not have millions of labeled data points to train such complex models.
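A hedged PyTorch sketch (assumed, not part of the original slides; requires torchvision 0.13 or newer for the weights argument): reuse a ResNet-18 pre-trained on ImageNet, freeze its feature extractor, and replace the final layer for a new task with, say, 5 classes.

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pre-trained backbone

for param in model.parameters():   # freeze the pre-trained weights
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 5)  # new, trainable classification head

# Only model.fc is now trained on the (small) new dataset.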
Transfer Learning
END OF PART-1