
UNIT-II

Convolutional Neural Networks


By
Rashmi A.R.
Convolutional Neural Network
• Convolutional Neural Networks are also known as ConvNets or CNNs.
• CNNs are used when the data has a grid-like structure, e.g. 1-D (sequences) and 2-D (images) data.
Why not ANN?
1. High computational cost
2. Overfitting
3. Loss of important information, such as the spatial arrangement of pixels

A Convolutional Neural Network (CNN) is a type of deep learning algorithm specifically designed for working with grid-like data, such as images. It is particularly useful for image recognition, classification, and other tasks that involve spatial data.
Convolutional Neural Network
Applications of CNN
1. Image Classification
2. Image Localization
3. Object Detection
4. Face Recognition
5. Image Segmentation
6. Super Resolution
7. Image De-coloring
8. Human Activity Recognition

History of CNN

Key Components of a CNN
Convolutional Layer: The core building block of a CNN, it uses filters (also called kernels) that slide over the input image and perform
element-wise multiplications to capture important features like edges, textures, or patterns.

Pooling Layer: This layer reduces the spatial size of the data (image), making computations more efficient and helping prevent overfitting. It
typically uses methods like max pooling, which selects the maximum value from a region of the feature map.

Fully Connected Layer: After several convolutional and pooling layers, the data is flattened into a vector and passed through fully connected
layers, where all neurons are connected to the previous layer’s outputs. This is typically where classification occurs.

Activation Functions: Non-linear functions like ReLU (Rectified Linear Unit) are applied after convolutional layers to introduce non-linearity,
allowing the network to learn more complex patterns.

Feature Extraction: CNNs automatically detect important features in the data without the need for manual feature extraction, which is a key
reason they are so effective.
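A minimal Keras sketch (an illustrative assumption, not taken from these slides) showing how these building blocks are usually stacked: convolution with ReLU, pooling, then flattening into fully connected layers. The 28×28×1 input and layer sizes are arbitrary choices for illustration.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # convolution + ReLU non-linearity
    MaxPooling2D(pool_size=(2, 2)),                                  # downsample the feature maps
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),                                                       # flatten into a vector
    Dense(64, activation='relu'),                                    # fully connected layer
    Dense(10, activation='softmax'),                                 # classification output
])
model.summary()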
Biological Connections
CNN architecture is inspired by the human visual cortex.
Experiment
Hierarchy of the feature extraction process in a CNN for recognizing a hexagon pattern
Five-stage Neocognitron representation with its layers and planes
LeNet Architecture

LeNet-5, a CNN by Yann LeCun, was used to recognize handwritten digits on bank cheques.
AlexNet Architecture
Architecture of the CNN applied to digit recognition
Convolution Operation
To calculate the output dimension for an n×n input and an f×f filter (applied per dimension for non-square inputs):
(n − f + 1) × (n − f + 1)
Eg. 3×4 → input size
2×2 → kernel/filter size
(3 − 2 + 1) × (4 − 2 + 1) = 2×3 (the output is of dimension 2×3)
Convolution Process:
The convolution operation involves sliding the 2×2 edge-detection kernel over the input matrix and performing a dot product (element-wise multiplication and sum) between the kernel and the corresponding 2×2 patch of the input matrix.
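A plain NumPy sketch of this sliding-window process (illustrative; the 2×2 kernel values are assumed, not from the slide):

import numpy as np

def conv2d_valid(x, k):
    # Valid convolution as used in CNNs (technically cross-correlation):
    # slide the kernel over the input, multiply element-wise, and sum.
    H, W = x.shape
    f, g = k.shape
    out = np.zeros((H - f + 1, W - g + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+f, j:j+g] * k)
    return out

x = np.arange(12).reshape(3, 4)    # 3x4 input, as in the example above
k = np.array([[1, -1], [1, -1]])   # an assumed 2x2 edge-style kernel
print(conv2d_valid(x, k).shape)    # (2, 3) -> matches (3-2+1) x (4-2+1)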
Convolution Operation
Convolutional Neural Network (CNN)
Visualization of filters
Convolution Operation (cont’d)

Top Edge Filter

https://www.midokura.com/unveiling-the-world-of-computer-vision-a-comprehensive-overview/
Edge Detection (Convolution operation)
• The convolution operation is a fundamental building block of a CNN.
• The job of the convolution operation is to extract features such as edges.
Edge Detection - Convolution operation
Various edge-detection filters
Padding in Convolution Network
Why is Padding Important?
Padding is essential for several reasons:
• Dimensionality: Without padding, the size of the output feature map
produced by convolutional operations would shrink with each layer. This
reduction in size can be problematic, especially in deep networks, where
many layers are applied, resulting in a rapidly diminishing feature map
that may lose important information.
• Edge Information: Without padding, the pixels on the edges of an input
would be used much less frequently than those in the center when
convolving with a kernel. Padding ensures that edge pixels are adequately
utilized, preserving information that might otherwise be lost.
• Control Over Output Size: Padding allows for precise control over the
dimensions of the output feature maps. This is particularly useful when
building architectures where the output dimensions need to be planned and
consistent.
Padding Operation in CNN
• In machine learning, particularly in the context of neural
networks and convolutional neural networks (CNNs), padding is a critical
technique used to manage the spatial dimensions of input data.
• Padding is the process of adding layers of zeros or other values outside the
actual data in an input matrix.
• The primary purpose of padding is to preserve the spatial size of the
input so that the output after applying filters (kernels) remains the same
size, or to adjust it according to the desired output dimensions.
• Padding involves adding extra pixel rows/columns around the borders of an
input image.
• For an n×n input image and an f×f filter, the shape of the output feature map without padding is (n−f+1)×(n−f+1).
• To maintain the same spatial dimensions after convolution, we need the output size to satisfy n−f+1 = n after padding, which means padding the input to size (n+f−1)×(n+f−1), i.e. p = (f−1)/2 on each side for odd f.

Output shape = (n + 2p − f + 1) × (n + 2p − f + 1)
where n is the input size, f is the filter size, and p is the padding amount.
Eg. n = 28, f = 3, p = 1: output = (28 + 2 − 3 + 1) × (28 + 2 − 3 + 1) = 28 × 28, the same size as the input.
Padding Operation in CNN

https://medium.com/@Tms43/understanding-padding-strides-in-convolutional-neural-networks-cnn-for-effective-image-feature-1b0756a52918
Padding
Types of Padding
There are two common types of padding used in neural networks:
• Valid Padding: This type of padding involves no padding at all. The
convolution operation is performed only on the valid overlap between
the filter and the input. As a result, the output dimensions will be smaller
than the input dimensions.
• Same Padding: In this approach, padding is added to the input so that
the output dimensions after the convolution operation are the same as
the input dimensions. This is typically achieved by adding an
appropriate number of zero-value pixels around the input.

Keras code demonstrating padding:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), padding='valid', activation='relu', input_shape=(28,28,1)))  # 'valid': output shrinks to 26x26x32
model.add(Conv2D(32, kernel_size=(3,3), padding='same', activation='relu'))  # 'same': output stays 26x26x32
Padding

Impact of padding on Feature Extraction and Output Dimensions


• No Padding: Reduces the dimensions of the input, meaning the
network might lose important information, particularly at the edges.
However, it may focus more on central areas of the input and is
computationally cheaper.
• Same Padding: Keeps the output dimensions the same as the input,
which is especially useful for deep networks. It ensures that edge
information is preserved and that the output size does not shrink,
allowing the model to extract features from the entire input.
Strides in Convolution network
• In convolution operations, the stride defines how much the filter shifts
across the input image after each application.
• By default, the stride is (1, 1), meaning the filter shifts one pixel at a time.
• You can increase the stride value to have the filter skip over pixels,
resulting in a smaller output spatial dimension.

Output shape = (⌊(n + 2p − f) / s⌋ + 1) × (⌊(n + 2p − f) / s⌋ + 1)

where s is the stride value. The formula accounts for padding as well as stride; the floor handles positions where the filter cannot take a final full step.

https://medium.com/@Tms43/understanding-padding-strides-in-convolutional-neural-networks-cnn-for-effective-image-feature-1b0756a52918
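A small helper (illustrative only, not from the slides) that evaluates this formula:

def conv_output_size(n, f, p=0, s=1):
    # output size = floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

print(conv_output_size(28, 3, p=1, s=1))  # 28: 'same' padding preserves the size
print(conv_output_size(28, 3, p=1, s=2))  # 14: stride 2 roughly halves the size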
Strides in Convolution network (Cont’d)

Higher strides allow:


• Capturing higher-level features while dismissing low-level details
• Reducing computation requirements
However, this comes at the cost of some loss of spatial information.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), strides=(2,2), padding='same',
                 activation='relu', input_shape=(28,28,1)))  # stride 2 halves 28x28 to 14x14
Strides in Convolution network

• Increasing the stride value makes the filter skip over pixels, resulting in a smaller output spatial dimension.
Pooling Layer in CNN
Pooling in CNN
• Pooling in Convolutional Neural Networks (CNNs) is a downsampling
operation used to reduce the spatial dimensions (width and height) of the
feature maps while retaining important information.
• It serves several purposes, including reducing the number of parameters,
minimizing computational complexity, and making the network more
robust to variations and distortions in the input, such as translation or
small shifts in images.
• Pooling plays a crucial role in CNNs by reducing the spatial dimensions
of feature maps, making computations more efficient, and introducing
translation invariance.
• Proper use of pooling layers helps maintain a balance between capturing
meaningful features and controlling computational costs.
Types of Pooling:
1. Max Pooling:
Description: Max pooling selects the maximum value from a defined region (usually a
small window, such as 2x2 or 3x3) of the input feature map. This operation is applied
with a certain stride to slide the window across the feature map.
Function: It retains the most prominent feature (largest value) within the window, helping
to preserve the strongest activations.
2. Average Pooling:
Description: In average pooling, the average of all values in the pooling window is taken to
represent that region in the downsampled feature map.
Function: It smooths the feature map by averaging values, giving a more balanced view of the
features in a local region.
3. Global Pooling:
Description: Global pooling is a special case where the pooling window covers the entire
spatial dimension of the feature map (i.e., the window size is equal to the feature map size).
This results in a single value per feature map channel.
Function: It is commonly used at the final layers of a CNN (just before classification) to
convert the entire feature map into a single summary value per channel, greatly reducing the
number of parameters.
Types:
Global Max Pooling: Takes the maximum value of the entire feature map.
Global Average Pooling: Takes the average value of the entire feature map.
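A Keras sketch (assumed layer sizes, for illustration) showing all three pooling types and the resulting shapes:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, GlobalAveragePooling2D

model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # -> 26x26x16
    MaxPooling2D(pool_size=(2, 2)),       # keeps the strongest activation -> 13x13x16
    Conv2D(32, (3, 3), activation='relu'),                           # -> 11x11x32
    AveragePooling2D(pool_size=(2, 2)),   # averages each 2x2 window      -> 5x5x32
    GlobalAveragePooling2D(),             # one value per channel         -> 32
])
model.summary()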
Problems with convolution that are solved by the use of pooling:
1. Memory issue: without downsampling, feature maps stay large and costly to store and process.
2. Translation variance: the convolution operation is location dependent, so the output changes when the same feature appears at a shifted position in the input.
Pooling solves these issues:
1. Memory issue: pooling downsamples the feature maps, reducing memory and computation.
2. Translation invariance is obtained using the pooling operation, meaning the network will produce (nearly) the same output even if the input image is translated.

Max pooling
Average Pooling

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import AveragePooling2D

model = Sequential([AveragePooling2D(pool_size=2, strides=2)])  # halves the width and height


Impact of Pooling in CNNs:
• Reduction in Spatial Dimensions: Pooling layers reduce the width and
height of feature maps, leading to smaller representations and lower
computational costs. This allows deeper layers to focus on higher-level,
more abstract features.
• Translation Invariance: Pooling introduces a degree of translation
invariance, meaning that slight shifts or distortions in the input will not
drastically affect the output, as pooling focuses on the strongest or
average signals.
• Prevention of Overfitting: By reducing the number of parameters and
simplifying feature maps, pooling can help mitigate overfitting,
especially in deep networks.
• Loss of Information: Pooling, especially max pooling, discards a lot of
information, including subtle variations in the data. In some cases, this
may lead to a loss of important details, especially in applications like
object detection or segmentation where precise spatial information is
needed.
Data Augmentation
Data augmentation
• Data augmentation artificially increases the size of the training set by
generating many realistic variants of each training instance.
• This reduces overfitting, making this a regularization technique.
• The generated instances should be as realistic as possible: ideally, given an image from the augmented training set, a human should not be able to tell whether it was augmented or not.

• For example, you can slightly shift, rotate, and resize every picture in the
training set by various amounts and add the resulting pictures to the training
set. This forces the model to be more tolerant to variations in the position,
orientation, and size of the objects in the pictures.
• If you want the model to be more tolerant to different lighting conditions,
you can similarly generate many images with various contrasts.
• In general, you can also flip the pictures horizontally (except for text and other non-symmetrical objects).
• By combining these transformations you can greatly increase the size of
your training set.
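A Keras sketch of such an augmentation pipeline (the layer choices and factors are assumptions for illustration, not prescribed by the slides):

from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),     # mirror left-right (avoid for text!)
    layers.RandomRotation(0.05),         # small random rotations
    layers.RandomTranslation(0.1, 0.1),  # shift up to 10% vertically/horizontally
    layers.RandomZoom(0.1),              # resize/zoom by up to 10%
    layers.RandomContrast(0.2),          # simulate different lighting conditions
])
# Applied on the fly during training, e.g. as the first block of a model.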
Data augmentation
Advantages of Data Augmentation:
• Increase the Size of the Dataset:
Data augmentation artificially increases the size of the dataset by generating
new examples from the existing data.
• Create More Diversity in the Dataset:
By applying various transformations to the original images, such as rotation,
flipping, and scaling, data augmentation introduces diversity to the dataset.
• Reduce Overfitting:
Overfitting occurs when a CNN becomes too specialized in the training data
and is unable to generalize well to new data. Data augmentation helps prevent
overfitting by creating more diverse training data.
• Improve Accuracy:
With more examples, the model learns better, which leads to improved
accuracy.
Convolution layer (RGB)

Note: The preceding slides show how convolution happens on an RGB image: the filter has one channel per input channel (e.g. a 3×3×3 filter for RGB input), and the per-channel products are summed into a single output value. With zero padding, the same idea applies to the padded input.

Multiple filters - multiple outputs: each filter produces one output feature map, so applying K filters yields an output with K channels. A worked shape check follows below.

https://datahacker.rs/convolution-rgb-image/
https://dev.to/sandeepbalachandran/machine-learning-convolution-with-color-images-2p41
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
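The shape check mentioned above, as a short Keras/NumPy sketch (the sizes are assumptions for illustration):

import numpy as np
from tensorflow.keras.layers import Conv2D

x = np.random.rand(1, 32, 32, 3).astype('float32')   # a batch of one 32x32 RGB image
conv = Conv2D(filters=8, kernel_size=(3, 3), padding='valid')
y = conv(x)
print(y.shape)            # (1, 30, 30, 8): (32-3+1) spatial size, 8 feature maps
print(conv.kernel.shape)  # (3, 3, 3, 8): height x width x input channels x filters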
Variants of Basic Convolution Function
1. Strided convolution
Adding stride:
• Skips over some positions of the kernel in order to reduce the computational cost.
• We can think of this as downsampling the output of the full convolution function.
• With stride, the spatial size of each feature map is reduced.
Variants of Basic Convolution Function
2. Zero-Padding
• Zero-pad the input V in order to make it wider.
• Without this feature, the width of the representation shrinks by one pixel less than the kernel width at each layer.
• Zero padding the input allows us to control the kernel width and the size of the output independently.
• Without zero padding, we are forced to choose between shrinking the spatial extent of the network rapidly and using small kernels; both scenarios significantly limit the expressive power of the network.
• Three special cases of the zero-padding setting:
  • No zero padding - valid convolution
  • Enough zero padding is added to keep the size of the output equal to the size of the input - same convolution
  • Enough zeros are added for every pixel to be visited k times in each direction, resulting in an output image of width m + k − 1 - full convolution (m = image width, k = kernel width)
Variants of Basic Convolution Function (Cont’d)

Consider a convolutional network with a kernel of width six at every layer. By adding five implicit zeros to each layer, we prevent the representation from shrinking with depth.
Variants of Basic Convolution Function
3. Unshared convolution:
• In unshared convolution, also known as locally connected convolution, the weights of the convolutional filters are not shared across different spatial locations in the input. Instead, each location in the feature map has its own unique set of weights, allowing the network to learn spatially varying patterns.
• No parameter sharing: a different kernel is used at every step of the slide.
Variants of Basic Convolution Function (Cont’d)
3. Unshared convolution

In the illustrated example, a total of 9 different filters are needed for each channel.

Unshared convolution was used in the DeepFace paper, published at the 2014 IEEE Conference on Computer Vision and Pattern Recognition. A rough code sketch follows below.
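The sketch referenced above: unshared convolution via LocallyConnected2D, a layer that older tf.keras versions provided (it was removed in Keras 3, so treat this as illustrative rather than current API):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LocallyConnected2D

model = Sequential([
    # Like Conv2D, but every output position learns its own, unshared 3x3 weights.
    LocallyConnected2D(16, (3, 3), input_shape=(32, 32, 3), activation='relu'),
])
model.summary()  # note the much larger parameter count than a shared Conv2D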
Variants of Basic Convolution Function (Cont’d)
4. Tiled convolution
Tiled convolution is a compromise between fully shared convolution and
unshared convolution. In tiled convolution, the weights are shared across
some regions of the input, but the sharing pattern is repeated periodically,
rather than across the entire input as in standard convolution.
• Cycle between shared parameter groups
Variants of Basic Convolution Function (Cont’d)

Figure: comparison of unshared convolution, tiled convolution, and shared (traditional) convolution.
Variants of Basic Convolution Function (Cont’d)
Strided Convolution
Effect: Strided convolution reduces the spatial dimensions of the output (i.e.,
downsampling), which can:
• Reduce the computational cost by producing fewer output pixels.
• Incorporate larger receptive fields, capturing more global information.
Padding
Effect:
• Prevents the output from shrinking after each convolution, which helps
maintain spatial resolution.
• Allows the convolution to focus on the edges of the input by preserving
information at the boundaries.
Unshared Convolution
Effect: This allows the network to learn spatially varying features, making it
more flexible and suitable for tasks where different regions of the input have
very different characteristics.
Tiled Convolution
Effect: Tiled convolution strikes a balance between shared and unshared
convolution, giving some degree of spatial flexibility while still maintaining a
smaller parameter count than unshared convolution.
Variants of Basic Convolution Function (Cont’d)

• Strided convolution is efficient for downsampling.


• Padding helps control output size and edge handling.
• Unshared convolution allows spatial variance in the learned features.
• Tiled convolution balances the flexibility of unshared convolution with the
efficiency of shared convolution.

These convolution variants enhance the efficiency and performance of CNNs


by enabling better feature extraction, computational optimization, and
flexibility to handle various types of input data.
Structured Outputs
 Discusses how convolutional networks can generate high-dimensional structured
outputs instead of simple labels or values.
• Convolutional networks can be used to output a high-dimensional, structured
object, rather than just predicting a class label for a classification task or a real
value for a regression task.
• These outputs are typically tensors, such as a tensor where each element
represents the probability that a pixel belongs to a specific class.
• This capability is crucial for tasks like pixel-wise image labeling, enabling the
model to create masks that outline individual objects in an image.
 The challenge of smaller output planes due to down-sampling methods like
pooling is also highlighted.
• One strategy to address this is to use a recurrent convolutional network, which
refines its predictions iteratively by sharing the same convolutional weights across
layers. This refinement process is akin to recurrent neural networks (RNNs), as
seen in tasks like pixel-wise labeling where neighboring pixels' interactions are
used to improve accuracy.
 Structured outputs are also post-processed for tasks like image segmentation,
where graphical models or approximations thereof can help group contiguous
pixels with the same label.
• Recurrent neural networks (RNNs) are capable of refining pixel-wise predictions
in tasks like image segmentation by iteratively processing the information from
previous steps and using it to improve subsequent predictions.
• An RNN-based approach, such as a recurrent convolutional network, addresses the challenge of smaller output planes caused by down-sampling operations such as pooling.
• This means that after a convolutional network has made initial predictions about
each pixel's label (e.g., whether a pixel belongs to a certain object or
background), additional processing steps can be applied to refine the results and
improve the accuracy of image segmentation.
• Image segmentation refers to dividing an image into regions where each region
contains pixels that share common characteristics, such as belonging to the same
object. To improve segmentation accuracy, models often assume that neighboring
pixels with similar properties (e.g., color or intensity) are likely to belong to the
same object.
• Graphical models can be used to represent and enforce these relationships
between neighboring pixels. A graphical model is a mathematical structure that
shows the dependencies between different variables—in this case, pixels. By
modeling these dependencies, the network can better ensure that nearby pixels
with similar properties are grouped together into the same segment.
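A minimal sketch of a structured (per-pixel) output, assuming a toy fully convolutional model; the sizes and class count are illustrative:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

num_classes = 5
model = Sequential([
    Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(64, 64, 3)),
    Conv2D(32, (3, 3), padding='same', activation='relu'),
    # A 1x1 convolution with softmax gives a class distribution at every pixel.
    Conv2D(num_classes, (1, 1), activation='softmax'),
])
print(model.output_shape)  # (None, 64, 64, 5): one probability vector per pixel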
Data Types
The data used with a convolutional network usually consists of several channels, each channel
being the observation of a different quantity at some point in space or time.
Examples:
• Audio waveform - 1-D data
• Audio represented in 2-D (e.g. a time-frequency spectrogram)
• Volumetric data - 3-D (e.g. CT scans)
• Multi-channel variants of each data type
Efficient Convolution Algorithms
• Modern convolutional network applications often involve networks
containing more than one million units.
• Powerful implementations exploiting parallel computation resources are essential.
• However, in many cases it is also possible to speed up convolution by
selecting an appropriate convolution algorithm.
• Devising faster ways of performing convolution or approximate
convolution without harming the accuracy of the model is an active area of
research.
• Even techniques that improve the efficiency of only forward propagation
are useful because in the commercial setting, it is typical to devote more
resources to deployment of a network than to its training.
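One classic example of such an algorithmic speedup (an illustration beyond these slides): when a kernel is separable, i.e. expressible as the outer product of two vectors, a 2-D convolution can be replaced by two much cheaper 1-D convolutions:

import numpy as np
from scipy.signal import convolve2d

col = np.array([1., 2., 1.])
row = np.array([-1., 0., 1.])
k = np.outer(col, row)          # a separable 3x3 (Sobel-like) kernel

x = np.random.rand(64, 64)
direct = convolve2d(x, k, mode='valid')  # direct 2-D convolution: ~f^2 multiplies per pixel
# Equivalent pair of 1-D convolutions: ~2f multiplies per pixel
step = np.apply_along_axis(lambda c: np.convolve(c, col, mode='valid'), 0, x)
sep = np.apply_along_axis(lambda r: np.convolve(r, row, mode='valid'), 1, step)
print(np.allclose(direct, sep))  # True: identical result, less computation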
Extra slide and extra topic: Transfer Learning
Note: This topic is included only for your reference
Transfer Learning

Transfer learning is the reuse of a pre-trained model on a new problem. It is popular in deep learning because it makes it possible to train deep neural networks with comparatively little data. This is very useful in data science, since most real-world problems typically do not have millions of labeled data points for training such complex models from scratch.
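A Keras sketch of this idea (the model choice, input size, and class count are assumptions for illustration): freeze an ImageNet-pretrained base and train only a new classifier head on the small dataset.

from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                      include_top=False, weights='imagenet')
base.trainable = False  # freeze the pre-trained convolutional features

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),          # summarize feature maps per channel
    layers.Dense(10, activation='softmax'),   # new head for the 10-class target task
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(...) on the small labeled dataset trains only the new head.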
Transfer Learning
END OF PART-1
