Module 2

Convolutional Neural Network and Applications

Syllabus:

Hierarchical Structure of Images, Convolution Filters, Convolutional
Neural Network, CNN Math Model, How the Model Learns,
Advantages of Hierarchical Features, CNN on Real Images,
Applications in Use and Practice, Deep Learning and Transfer
Learning, Breakdown of the Convolution (1D and 2D), Core
Components of the Convolutional Layer, Activation Functions,
Pooling and Fully Connected Layers, Training the Network, Transfer
Learning and Fine-Tuning.
Need of CNN
 The multilayer perceptron is not appropriate for all types of data; in the
history of deep learning and neural networks, this limitation motivated new
architectures.
 The convolutional neural network is one of the most significant pieces of
technology for analysing images.
 To do learning, we need multiple data samples; here the data samples are
images.
 Edges, corners, textures, and shapes are characteristic of virtually all
natural images.
Hierarchical Structure of Images
 The overall image is composed of a set of shifted motifs. Each of the
motifs is itself composed of sub-motifs that repeat within the image.
 Each image is composed of a subset of these motifs. Each of the motifs at
layer three is composed of a subset of shifted versions of the sub-motifs,
and each of the sub-motifs is composed of basic, atomic elements, which are
fundamental shapes.
 The goal is to build a model that captures this structure, this
representation of images.
Convolution
 The process of shifting an atomic element, such as the triangle, to every
location in the image is called convolution.
 This is why the network is called a convolutional neural network.
Convolution Filters

A feature map denotes how strongly an atomic element matches the local
region in the image. That is why we call it a map.
 It is a map that reflects the degree of match between the atomic element
and each local region of the image.
 Therefore, we get a stack of multiple feature maps. The usual depiction of
a stack of rectangles or squares is meant to reflect one feature map for
each of the respective filters.
 Using this process, we can identify each of the atomic elements, the
lowest layer in the hierarchy.
 Remember that the sub-motifs are composed of combinations of shifted
versions of the atomic elements.
 We then repeat the process, convolving (shifting) filters to every
two-dimensional location in the feature maps.
CNN
 The hierarchical structure of images refers to the organization of visual
information into multiple levels of abstraction, from simple edges and lines
to complex objects and scenes.
 Convolutional Neural Networks (CNNs) are designed to capture this
hierarchical structure using convolutional filters, which:
1. Detect local patterns: Edges, lines, textures, and simple shapes.
2. Combine local patterns: Form more complex patterns, such as shapes and
objects.
3. Abstract complex patterns: Recognize objects, scenes, and contexts.
Convolutional filters start with early layers detecting simple patterns and
later layers combining these patterns into more complex representations.
CNN Math Model
 In the toy example, we convolve atomic elements (shapes) with every location
in the image to constitute a feature map, where the feature map reflects how
strongly that atomic element is manifested in the image. So, phi 1, phi 2,
through phi k correspond to k filters.
 Instead of being shapes, as in the toy example, the parameters of these
filters are now just pixel values. The pixel values associated with filter
phi 1, filter phi 2, through filter phi k are learnt.
 Next, the stack of feature maps from layer one is convolved with the
layer-two set of filters, represented as psi 1, psi 2, through psi k. We then
convolve the layer-three filters with the layer-two feature maps and finally
get feature maps at the top layer, omega 1, omega 2, through omega k. Those
are sent into a classifier (W).
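A minimal sketch of this layered model in PyTorch (the input size, filter counts, and class count below are illustrative assumptions, not values from the slides):

```python
import torch
import torch.nn as nn

# Three convolutional stages (the phi, psi, and omega filter banks)
# followed by a classifier (W). Grayscale 28x28 input is an assumption.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # layer-1 filters (phi)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # layer-2 filters (psi)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # layer-3 filters (omega)
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 10),                    # classifier weights (W)
)

x = torch.randn(1, 1, 28, 28)  # one dummy grayscale image
print(model(x).shape)          # torch.Size([1, 10])
```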
How the model learns
 Learning means estimating the parameters of our machine such that the
difference between the prediction of the model, l_n, and the true
label, y_n, is small.
 So, our goal is to design, predict, or estimate those model parameters,
Phi, Psi, Omega, and W, such that the total loss or energy function (E)
between our predicted labels and the true labels is small.
 It is very difficult to obtain optimal values for the parameters, i.e.,
the parameters that yield the best match between the predictions of the
model and the true labels.
 In a Convolutional Neural Network (CNN), the energy function refers to the
total loss or cost function that the network aims to minimize during training.
It measures the difference between the network's predictions and the actual
true labels.
 Common data loss functions include Mean Squared Error (MSE), Cross-
Entropy, and Binary Cross-Entropy.
 The goal of training a CNN is to find the optimal values of W and b that
minimize the energy function E(W, b). This is typically done using
optimization algorithms like Stochastic Gradient Descent (SGD), Adam, or
RMSProp.
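As a small illustration, the energy function E over a batch is just the average per-example loss. A minimal PyTorch sketch using the cross-entropy loss (all tensors here are dummy data):

```python
import torch
import torch.nn.functional as F

# E = average loss between predicted labels and true labels over the batch.
logits = torch.randn(8, 10)          # model outputs for 8 images, 10 classes
labels = torch.randint(0, 10, (8,))  # ground-truth class indices
E = F.cross_entropy(logits, labels)  # mean cross-entropy over the batch
print(E.item())
```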
Advantages of Hierarchical Features
 If an architecture were built based only on the filters at layer three, then
each of those layer-three filters would be learnt independently. The learning
of those filters would carry no understanding that they share structure.
 It is the layered structure that motivates the deep architecture. Any time a
particular sub-motif is manifested in any of the motifs at layer three, the
learning of that motif provides information about the other motifs through the
shared structure.
 The idea is that sharing structure between the different elements of the
filters at the various layers reflects that they are all made up of shifted
versions of more fundamental building blocks.
 Data can be used more effectively, and knowledge is shared between the
motifs, such that knowledge of one motif can improve knowledge of another
motif through the shared substructure that they possess.
CNN on real images

 A Convolutional Neural Network (CNN) can be applied to real images for
various tasks such as:
 1. Image Classification: Classify images into different categories (e.g.,
objects, scenes, actions).
 2. Object Detection: Detect and locate objects within images.
 3. Image Segmentation: Segment images into their constituent parts or
objects.
 4. Image Generation: Generate new images based on a given input or context.
When applying CNNs to real images, consider the following:
1. Image Preprocessing: Resize, normalize, and augment images to prepare
them for training.
2. Data Augmentation: Apply random transformations (e.g., rotation,
flipping, cropping) to increase dataset diversity (see the sketch after
this list).
3. Transfer Learning: Leverage pre-trained CNN models and fine-tune them
on your specific dataset.
4. Overfitting: Regularly monitor and prevent overfitting by using techniques
like dropout, regularization, and early stopping.
5. Evaluation Metrics: Choose appropriate metrics (e.g., accuracy, precision,
recall, F1-score) to evaluate model performance.
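A possible sketch of steps 1 and 2 above using torchvision transforms (the sizes and normalization statistics are common ImageNet conventions, assumed here for illustration):

```python
from torchvision import transforms

# Preprocessing (resize, normalize) plus random augmentations
# (crop, flip, rotation) to increase dataset diversity.
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```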
Popular CNN architectures for real images include:
1. LeNet
2. AlexNet
3. VGG
4. ResNet
5. Inception
6. U-Net (for segmentation tasks)
Applications of CNN

GPUs support parallel processing, which enables faster training, increased
throughput, and improved productivity.
In each example, the bar represents a probability. These examples illustrate
the performance of deep architectures, which is quite impressive.
The game of Go is significantly more complicated than a game such as chess.
DeepMind, which is based in the UK, developed an algorithm that at its heart
was based upon deep convolutional neural network technology and reinforcement
learning.
Six images, each analysed by a convolutional neural network architecture,
illustrate the integration of image analysis and text synthesis, which is
based upon the long short-term memory (LSTM) architecture.
Other Real-world applications of CNNs on images

1. Self-driving cars
2. Medical image analysis
3. Facial recognition
4. Image search
5. Quality inspection
Transfer Learning
 Transfer learning is a technique in deep learning where a pre-trained model is
used as a starting point for a new task. The pre-trained model has already
learned general features from a large dataset, and these features can be fine-
tuned for the new task.
Benefits of transfer learning:
1. Reduced training time: Leverage pre-trained models and fine-tune instead of
training from scratch.
2. Improved performance: Pre-trained models have already learned general
features, leading to better performance on new tasks.
3. Smaller dataset requirements: Fine-tune on a smaller dataset instead of requiring
a large dataset for training from scratch.
Common transfer learning scenarios:

 Image classification: Use pre-trained models like VGG or ResNet for
new image classification tasks.
 Natural Language Processing (NLP): Use pre-trained language models
like BERT or Word2Vec for new NLP tasks.
 Speech recognition: Use pre-trained speech recognition models for new
speech recognition tasks.
How to apply transfer learning:

1. Choose a pre-trained model: Select a model relevant to your task.

2. Fine-tune the model: Adjust the model's weights for your specific task.

3. Add new layers: Add task-specific layers on top of the pre-trained model.

4. Train the model: Train the fine-tuned model on your dataset.

Popular pre-trained models for transfer learning: VGG, ResNet, BERT,
Word2Vec, Inception.
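A minimal sketch of these four steps using a torchvision ResNet-18 (assuming a recent torchvision; the two-class head is a hypothetical example):

```python
import torch.nn as nn
from torchvision import models

# Step 1: choose a pre-trained model (ResNet-18 trained on ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Steps 2-3: replace the top layer with a new task-specific head
# (2 output classes is an illustrative assumption).
model.fc = nn.Linear(model.fc.in_features, 2)

# Step 4: train (fine-tune) the whole model on your dataset with a
# standard training loop, typically with a small learning rate.
```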
Applications of Transfer Learning with CNN

 Image analysis is very important in medical fields like radiology,
ophthalmology, and dermatology.

 For example, in dermatology, to build a deep learning algorithm we would
need millions of images of various aspects of skin disease, and we would
need medical doctors to label them. That would be a very expensive and
time-consuming process.
Weights of convolutional layers learned from ImageNet transfer to medical
images, so we only need to learn new parameters at the top of the network.
 One can take a deep neural network designed for ImageNet, together with its
learned parameters, and almost entirely transfer them to new applications,
for example, the analysis of diabetic retinopathy in ophthalmology.
The idea is that we can take the weights of the model that were learned on
ImageNet, transfer those parameters, and then learn only the parameters at
the top of the network, which are directly applicable to the confocal images.
 Consequently, the number of parameters that we have to learn is
significantly reduced, because instead of having to learn all of the
parameters, we learn only those for the medical images.
One successful application of CNNs is in the classification of Diabetic
Retinopathy, a disease that affects the retina of diabetic individuals. CNNs can
learn the features that distinguish unhealthy retinas from healthy ones,
potentially eliminating the need for a trained ophthalmologist.
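A rough sketch of this parameter reduction: freezing the transferred ImageNet weights leaves only the new top layer to be learned (the model choice and class count are illustrative):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False                # freeze transferred weights
model.fc = nn.Linear(model.fc.in_features, 2)  # new head for the medical task

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"learning {trainable:,} of {total:,} parameters")  # ~1k of ~11M
```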
Other Classification Metrics

The model should perform well both in terms of Sensitivity and Specificity.

Sensitivity and Specificity
 Sensitivity quantifies how well the model finds the positive examples in a
dataset. Sensitivity should be as high as possible.
 However, you should be able to see how you can potentially cheat this
metric. If you just label all of the examples in the dataset as positive,
you will get a very high sensitivity score, but this will clearly result in
a lot of false positives, as a doctor classifying all the healthy retinas as
unhealthy.
 We need a complementary metric to help distinguish these false positives
from false negatives, and this is specificity. It is the complementary
metric to sensitivity: within the labelled dataset, it measures how well the
model identifies the negative (healthy) retinas. Specificity should also be
as high as possible, together with sensitivity.
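A small worked example (the counts are made up for illustration): sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP).

```python
# Counts from a hypothetical retina classifier.
tp, fn = 80, 20  # unhealthy retinas correctly / incorrectly classified
tn, fp = 90, 10  # healthy retinas correctly / incorrectly classified

sensitivity = tp / (tp + fn)  # fraction of unhealthy retinas found: 0.8
specificity = tn / (tn + fp)  # fraction of healthy retinas found: 0.9
print(sensitivity, specificity)

# Labeling everything "unhealthy" gives sensitivity 1.0 but specificity 0.0,
# which is why both metrics must be high together.
```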
Breakdown of the Convolution
 The convolution operation involves sliding a filter over an input image to
search for specific features.
 The filter is applied via a convolution operation, where the filter is multiplied
with the input image at each position and the results are summed up. This
process is repeated for different shifts of the filter, resulting in a feature map.
 The convolution operation can be performed in one dimension as well as in two
dimensions for image analysis.
 In 2D convolution, a filter is moved over the image, and at each position, the
filter is multiplied with the corresponding region of the image and the results
are summed up to obtain the convolution value. This process is repeated for all
positions in the image, resulting in a convolved feature map that highlights
specific features.
 The convolution operation is an important step in feature extraction in CNNs, as it
helps to identify and extract relevant features from the input image.
 A high value in the convolution indicates a matched feature. The result of
convolving a filter with a two-dimensional image is a heat map, where bright
spots indicate feature detection.
https://miro.medium.com/v2/resize:fit:640/format:webp/0*jLoqqFsO-52KHTn9.gif
Edge Filter
The convolution operation in 1D and 2D
• Both 1D and 2D convolution involve sliding a filter over the input signal or image.
• In both cases, the filter is multiplied with the corresponding region of the input and
the results are summed up.
• The convolution operation is performed for different shifts of the filter to obtain a
convolved feature map.
 Differences:
• In 1D convolution, the input signal and the filter are both 1-dimensional, while in 2D
convolution, the input image and the filter are both 2-dimensional.
• In 1D convolution, the filter is moved along a single axis, while in 2D convolution,
the filter is moved along both the horizontal and vertical axes of the image.
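A minimal NumPy sketch of both cases with no padding ("valid" mode), so the output shrinks by f - 1 along each axis; the filter values are illustrative. Note that deep-learning "convolution" is, strictly speaking, cross-correlation (the filter is not flipped).

```python
import numpy as np

# 1D: slide a 3-tap edge filter along a signal, multiply and sum.
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
filt_1d = np.array([1.0, 0.0, -1.0])
print(np.correlate(signal, filt_1d, mode="valid"))  # [-2. -2. -2.]

# 2D: slide a 3x3 vertical-edge filter over a 5x5 image.
image = np.arange(25, dtype=float).reshape(5, 5)
filt_2d = np.array([[1.0, 0.0, -1.0]] * 3)
out = np.zeros((3, 3))  # output side: n - f + 1 = 5 - 3 + 1 = 3
for i in range(3):
    for j in range(3):
        # multiply the filter with the local region and sum the result
        out[i, j] = np.sum(image[i:i+3, j:j+3] * filt_2d)
print(out)
```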
Core Components of the Convolutional Layer
The convolutional layer is responsible for applying the convolution
operation to the input image.
The user can choose various features of the convolution, including the
filter size, stride, and number of filters.
The filter size determines the dimensions of the filter used for
convolution, typically ranging from 3x3 to 7x7. If the stride is 1, the
image size is n x n, and the filter size is f x f, then the size of the
feature map is (n - f + 1) x (n - f + 1).
Padding
 Padding in CNN (Convolutional Neural Networks) refers to the
process of adding extra pixels or values around the borders of an
image or feature map. These added pixels are typically zeros,
but can also be mirrored or replicated versions of the existing
border pixels.
 We need padding in CNN for several reasons:
 1.Maintain spatial dimensions: Padding ensures that the spatial
dimensions (height and width) of the feature maps remain the
same after convolutional operations.
 2. Preserve information at borders: Without padding, border
pixels would have fewer neighboring pixels to convolve with, so
information at the borders would be under-represented.
 By adding padding, we can: preserve spatial information, enable the use
of larger filters, reduce border effects, maintain feature map dimensions,
and improve translation equivariance.
 This allows the network to learn more robust features and improves overall
performance.
 Refer the below link for the explanation of padding with an example
https://www.youtube.com/watch?v=PGBop7Ka9AU&list=PLZoTAELRMXVPGU70ZGsckrMdr0FteeRUi&index=29
The stride helps reduce the computational load by downsampling the input.
The filter stride determines how much the filter moves across the image
during convolution, with typical values ranging from 1 to 2.
The filter number determines the number of unique feature detectors operating
on the input.
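These choices combine into a standard output-size rule. A small sketch (the general formula with padding p and stride s reduces to n - f + 1 when p = 0 and s = 1):

```python
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output side length for an n x n input, f x f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(32, 3))            # 30 -> the (n - f + 1) case
print(conv_output_size(32, 3, p=1))       # 32 -> "same" padding
print(conv_output_size(32, 3, p=1, s=2))  # 16 -> strided downsampling
```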
Feature Maps:
•Each filter in the convolutional layer generates a feature map, which represents the presence of a
specific feature in the input. Multiple filters create a stack of feature maps, forming a volume of
feature maps. The filters in subsequent layers need to cover all the feature maps from the
previous layer.
Input Channels:
•Images have multiple colour channels, typically red, green, and blue (RGB).The filters in
the first layer need to have weights corresponding to each colour channel. Each filter has
a 2D extent in each input channel, resulting in a stack of filters for each channel.
Activation functions
Activation functions increase the network's functional capacity and allow it to
capture and model complex patterns in the data, making them essential for
effective machine learning and deep learning tasks.
1. Sigmoid function: The sigmoid function is a commonly used non-linear
activation function. It squashes the input values between 0 and 1, forming a
probabilistic output.
2. Hyperbolic tangent function: The hyperbolic tangent function is another
commonly used non-linear activation function. It squashes the input values
between -1 and 1.
Rectified Linear Unit (ReLU): ReLU is a simple non-linear activation
function that returns the input value if it is greater than zero, and zero
otherwise. It is widely used in neural networks and increases the
network's capacity to represent information within the input.
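A quick NumPy sketch of the three activations on sample inputs (values chosen only for illustration):

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

sigmoid = 1 / (1 + np.exp(-x))  # squashes to (0, 1)
tanh = np.tanh(x)               # squashes to (-1, 1)
relu = np.maximum(0, x)         # passes positives, zeros out negatives

print(sigmoid.round(3))  # [0.119 0.378 0.5   0.622 0.881]
print(tanh.round(3))     # [-0.964 -0.462  0.     0.462  0.964]
print(relu)              # [0.  0.  0.  0.5 2. ]
```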
Pooling and Fully Connected Layers
 Pooling layer plays a crucial role in reducing the computational complexity,
down-sampling the feature maps, and combating overfitting in a CNN.
 By using fully connected layers after the pooling layer, the CNN can process
the high-level features and learn complex relationships between these features
and the output classes.
 This enables the network to make accurate predictions and perform tasks such
as image classification, object detection, and more.
The pooling layer is applied after the convolution and activation layers to reduce
the size of the input passed to subsequent layers. It helps in reducing
computational complexity, making the network easier to train.
The pooling layer uses a filter window to collapse the values within the window
to a single value. The most common type of pooling is maximum pooling, where
the maximum value within the window is selected. This process helps in down-
sampling the feature maps.
Max Pooling and Average Pooling
 In Convolutional Neural Networks (CNNs), pooling is a
downsampling technique used to reduce the spatial dimensions
of feature maps, retaining important information while
decreasing the number of parameters and computations.
 There are two primary types of pooling:1. Max Pooling: - Takes
the maximum value across each patch of the feature map. -
Helps retain the most prominent features. - Commonly used in
CNN architectures.
 2. Average Pooling: - Takes the average value across each
patch of the feature map. - Smoothens the feature map,
reducing the effect of noise. - Less commonly used than max
pooling, but still effective.
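A minimal NumPy sketch of 2x2 max and average pooling with stride 2 (the feature map values are made up):

```python
import numpy as np

fmap = np.array([[1, 3, 2, 0],
                 [5, 6, 1, 2],
                 [7, 2, 9, 4],
                 [0, 1, 3, 8]], dtype=float)

max_pool = np.zeros((2, 2))
avg_pool = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = fmap[2*i:2*i+2, 2*j:2*j+2]  # non-overlapping 2x2 window
        max_pool[i, j] = window.max()        # keep the strongest response
        avg_pool[i, j] = window.mean()       # smooth the responses

print(max_pool)  # [[6. 2.] [7. 9.]]
print(avg_pool)  # [[3.75 1.25] [2.5  6.  ]]
```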
Pros and Cons of types of pooling
 Here are the pros and cons of Max pooling, Min pooling, and
Average pooling:
Max Pooling
 Pros:
1. Retains most prominent features: Max pooling helps retain the most
prominent features in the data.
2. Translation invariance: Max pooling provides translation invariance,
meaning the model is less sensitive to the location of features.
3. Fast computation: Max pooling is computationally efficient.
 Cons:
1. Loses spatial information: Max pooling loses spatial information, which
can be important in some applications.
2. Sensitive to noise: Max pooling can be sensitive to noise, as a single
noisy pixel can affect the output.
 Min Pooling
Pros:
1. Retains least prominent features: Min pooling helps retain the least
prominent features in the data.
2. Robust to noise: Min pooling is more robust to noise, as it is less
affected by single noisy pixels.
Cons:
1. Loses most prominent features: Min pooling loses the most prominent
features in the data.
2. Not commonly used: Min pooling is not commonly used in practice.
 Average Pooling
Pros:
1. Retains spatial information: Average pooling retains more spatial
information compared to max pooling.
2. Robust to noise: Average pooling is more robust to noise, as it averages
out the effects of noisy pixels.
Cons:
1. Washes out prominent features: Average pooling can wash out prominent
features in the data.
2. Computationally expensive: Average pooling can be computationally
expensive compared to max pooling.
In summary, Max pooling is commonly used due to its ability to
retain prominent features and provide translation invariance.
Average pooling is used when retaining spatial information is
important, while Min pooling is less commonly used due to its
limitations.
After the pooling layer, the high-level features are processed using fully
connected layers, similar to a multi-layer perceptron. These layers connect the
features to the output classes for classification.
Training a Convolutional Neural Network (CNN)
1. Input Images and Labels:
•We start with a set of input images, each with a corresponding ground truth
label.
•For a binary classification task like diabetic retinopathy, the labels are +1 for
an unhealthy retina and -1 for a healthy retina.
2. Convolutional Layers:
•The input image is fed through the CNN, starting with a convolution operation.
•The convolution operation applies filters to the input image to extract features,
resulting in a stack of feature maps.
•This process is repeated for multiple layers, with each layer using different
filters.
3. Fully Connected Layers:
•The feature maps are then transformed using fully connected layers.
•These layers use a weight matrix to determine how the feature maps are
transformed into predicted labels.
•The predicted label represents the CNN's prediction for the input image.
4. Loss Function and Empirical Risk:
•To train the CNN, we need to find the parameters that minimize the
difference between the predicted labels and the ground truth labels.
•This is done by setting up a loss function, typically the binary cross entropy
function, to measure the difference.
•The average loss across all input images is called the empirical risk function.
5. Gradient Descent:
•Gradient descent is used to find the best-fit parameters that minimize the loss
function.
•It involves calculating the gradient of the empirical risk function with respect
to the parameters.
•Based on the gradient, we take steps in the direction that reduces the loss,
gradually moving towards the minimum.
6. Stochastic Gradient Descent:
•To calculate the gradient efficiently, we use stochastic gradient descent.
•Instead of using all the input images, we randomly select a subset and
calculate the gradient based on that subset.
•This approach leads to similar solutions at a faster rate compared to using the
entire dataset. Mini-Batch Gradient Descent can be used practically. It uses a
small batch of samples to compute the gradient.
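A compact sketch of steps 1 through 6 as a mini-batch SGD loop in PyTorch (the model, dummy data, batch size, and learning rate are all placeholder assumptions):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 2))  # stand-in CNN
loss_fn = nn.CrossEntropyLoss()                             # the loss in E
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(100, 1, 28, 28)  # dummy dataset
labels = torch.randint(0, 2, (100,))  # dummy ground-truth labels

for epoch in range(5):                # full passes over the data
    for i in range(0, 100, 10):       # mini-batches of 10 samples
        batch_x, batch_y = images[i:i+10], labels[i:i+10]
        optimizer.zero_grad()
        loss = loss_fn(model(batch_x), batch_y)  # empirical risk on batch
        loss.backward()                          # gradient of the loss
        optimizer.step()                         # gradient-descent step
```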
Epoch
 In machine learning, an epoch refers to one complete pass through the
entire training dataset. It is a way to measure the number of times the
model has seen the entire training data. Here is a breakdown:
 1. Iteration: One iteration is one update of the model's weights based
on a single batch of data.
 2. Batch: A batch is a subset of the training data used to update the
model's weights.
 3. Epoch: One epoch is one complete pass through the entire training
dataset, consisting of multiple iterations and batches.
 For example:
- Training dataset: 1000 samples
- Batch size: 100 samples
- Number of iterations per epoch: 10 (1000 samples / 100 samples per batch)
- Number of epochs: 5
 In this example, the model will see the entire training dataset 5 times,
with each epoch consisting of 10 iterations.
 Note that the number of epochs required to train a model
can vary depending on the complexity of the model, the
size of the dataset, and the learning rate.
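The bookkeeping from the example above, as a two-line check:

```python
dataset_size, batch_size, epochs = 1000, 100, 5
iterations_per_epoch = dataset_size // batch_size     # 10
total_weight_updates = epochs * iterations_per_epoch  # 50
print(iterations_per_epoch, total_weight_updates)
```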
Transfer Learning and Fine-Tuning
 Hierarchical representation of image features and Transfer learning are
powerful features of CNNs.
 The main advantage is that by learning and sharing statistical similarities
within the high-level pieces of the image, we can better leverage all the
training data.
 This means that we can learn from examples of similar images to help classify
new images of interest.
 Transfer learning allows us to take a network that has already been trained on a
large database classification task, such as the ImageNet competition, and then
do additional training in a specific domain of interest.
 Low-level features are universal to all images. These features, such as edge
detectors, are similar to the patterns extracted by the visual cortex in a
biological brain. This fundamental principle is exploited in transfer learning to
build up representations of images and improve classification accuracy.
 By reusing the low-level features learned from the large dataset, we can speed
up the training process and improve performance on the specific task. Top-level
features are typically specialized for a particular task
 The hierarchical representation of image features in CNNs and the ability to
transfer learned knowledge from one task to another are key advantages that
contribute to the effectiveness of these networks in image analysis and
classification tasks.