Applications of computer vision:
1. Self-driving cars
2. Medical image analysis
3. Facial recognition
4. Image search
5. Quality inspection
Transfer Learning
Transfer learning is a technique in deep learning where a pre-trained model is
used as a starting point for a new task. The pre-trained model has already
learned general features from a large dataset, and these features can be fine-
tuned for the new task.
Benefits of transfer learning:
1. Reduced training time: Leverage pre-trained models and fine-tune instead of
training from scratch.
2. Improved performance: Pre-trained models have already learned general
features, leading to better performance on new tasks.
3. Smaller dataset requirements: Fine-tune on a smaller dataset instead of requiring
a large dataset for training from scratch.
Common transfer learning scenarios:
1. Use as a feature extractor: Freeze the pre-trained model's weights and train only a new classifier on the features it produces.
2. Fine-tune the model: Adjust the model's weights for your specific task.
3. Add new layers: Add task-specific layers on top of the pre-trained model.
https://www.youtube.com/watch?v=PGBop7Ka9AU&list=PLZoTAELRMXVPGU70ZGsckrMdr0FteeRUi&index=29
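As a concrete illustration of these scenarios, here is a minimal sketch using a ResNet-18 pre-trained on ImageNet via torchvision; the class count, dummy data, and hyperparameters are placeholder assumptions, not part of these notes.

import torch
import torch.nn as nn
from torchvision import models

# Load a model whose general features were already learned on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Feature-extractor scenario: freeze the pre-trained weights.
for param in model.parameters():
    param.requires_grad = False

# New-layer scenario: replace the final fully connected layer with one
# sized for our own task (2 classes here, a placeholder value).
num_classes = 2
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new layer is trained, which is why transfer learning needs
# less data and less training time than training from scratch.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 8 RGB images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()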
The stride helps reduce the computational load by down-sampling the input. The stride determines how far the filter moves across the image at each step of the convolution, with typical values of 1 or 2.
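To make the effect of stride concrete, the output size of a convolution with no padding is (input size - filter size) / stride + 1; a quick sketch with illustrative sizes:

def conv_output_size(input_size, filter_size, stride):
    # Output size of a convolution with no padding.
    return (input_size - filter_size) // stride + 1

# A 32-pixel-wide input with a 3x3 filter:
print(conv_output_size(32, 3, stride=1))  # 30 -> almost no down-sampling
print(conv_output_size(32, 3, stride=2))  # 15 -> roughly half the width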
The number of filters determines how many unique feature detectors operate on the input.
Feature Maps:
• Each filter in the convolutional layer generates a feature map, which represents the presence of a specific feature in the input.
• Multiple filters create a stack of feature maps, forming a volume of feature maps.
• The filters in subsequent layers need to cover all the feature maps from the previous layer.
Input Channels:
• Images have multiple colour channels, typically red, green, and blue (RGB).
• The filters in the first layer need weights for each colour channel: a filter has a 2D spatial extent in every input channel, so each filter is effectively a stack of 2D kernels, one per channel.
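A short PyTorch sketch (the filter count, kernel size, and image size are illustrative) showing how each filter spans all input channels and produces one feature map:

import torch
import torch.nn as nn

# 16 filters, each 3x3 in spatial extent, with weights for all 3 RGB channels.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1)
print(conv.weight.shape)   # torch.Size([16, 3, 3, 3]): 16 filters x 3 channels x 3x3

# One 64x64 RGB image in, a volume of 16 feature maps out.
x = torch.randn(1, 3, 64, 64)
feature_maps = conv(x)
print(feature_maps.shape)  # torch.Size([1, 16, 62, 62])

# A filter in the next layer must cover all 16 feature maps from this layer.
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
print(conv2.weight.shape)  # torch.Size([32, 16, 3, 3])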
Activation functions
Activation functions introduce non-linearity, which increases the network's functional capacity and allows it to capture and model complex patterns in the data, making them essential for effective machine learning and deep learning tasks.
1. Sigmoid function: The sigmoid function is a commonly used non-linear activation function. It squashes the input values between 0 and 1, so its output can be interpreted as a probability.
2. Hyperbolic tangent function: The hyperbolic tangent function is another
commonly used non-linear activation function. It squashes the input values
between -1 and 1.
3. Rectified Linear Unit (ReLU): ReLU is a simple non-linear activation function that returns the input value if it is greater than zero, and zero otherwise. It is widely used in neural networks and increases the network's capacity to represent the information within the input.
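A minimal NumPy sketch of the three activation functions described above:

import numpy as np

def sigmoid(x):
    # Squashes inputs to the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes inputs to the range (-1, 1).
    return np.tanh(x)

def relu(x):
    # Returns the input if it is greater than zero, and zero otherwise.
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))  # values between 0 and 1
print(tanh(x))     # values between -1 and 1
print(relu(x))     # [0.  0.  0.  0.5 2. ]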
Pooling and Fully Connected Layers
The pooling layer plays a crucial role in reducing computational complexity, down-sampling the feature maps, and combating overfitting in a CNN.
By using fully connected layers after the pooling layer, the CNN can process
the high-level features and learn complex relationships between these features
and the output classes.
This enables the network to make accurate predictions and perform tasks such
as image classification, object detection, and more.
The pooling layer is applied after the convolution and activation layers to reduce
the size of the input passed to subsequent layers. It helps in reducing
computational complexity, making the network easier to train.
The pooling layer uses a filter window to collapse the values within the window
to a single value. The most common type of pooling is maximum pooling, where
the maximum value within the window is selected. This process helps in down-
sampling the feature maps.
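As a sketch of this down-sampling, a 2x2 max-pooling window with stride 2 halves the height and width of each feature map (the sizes below are illustrative):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# A stack of 16 feature maps of size 62x62 from the previous layers.
feature_maps = torch.randn(1, 16, 62, 62)
pooled = pool(feature_maps)
print(pooled.shape)  # torch.Size([1, 16, 31, 31]) -- each map is halved in size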
Max Pooling and Average Pooling
In Convolutional Neural Networks (CNNs), pooling is a
downsampling technique used to reduce the spatial dimensions
of feature maps, retaining important information while
decreasing the number of parameters and computations.
There are two primary types of pooling:
1. Max Pooling: Takes the maximum value across each patch of the feature map. Helps retain the most prominent features. Commonly used in CNN architectures.
2. Average Pooling: Takes the average value across each patch of the feature map. Smoothens the feature map, reducing the effect of noise. Less commonly used than max pooling, but still effective.
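A small numeric comparison of the two, assuming a 2x2 pooling window over a made-up 4x4 feature map:

import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [5, 4, 1, 1],
                        [0, 2, 9, 6],
                        [1, 1, 3, 2]], dtype=float)

def pool(fm, size, reduce_fn):
    # Collapse each non-overlapping size x size patch to a single value.
    h, w = fm.shape
    return np.array([[reduce_fn(fm[i:i + size, j:j + size])
                      for j in range(0, w, size)]
                     for i in range(0, h, size)])

print(pool(feature_map, 2, np.max))   # [[5. 2.] [2. 9.]] -- keeps the prominent values
print(pool(feature_map, 2, np.mean))  # [[3.25 1.  ] [1.   5.  ]] -- smooths each patch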
Pros and Cons of Pooling Types
Here are the pros and cons of max pooling, min pooling, and average pooling:
Max Pooling
Pros:
1. Retains most prominent features: Max pooling helps retain the most prominent features in the data.
2. Translation invariance: Max pooling provides translation invariance, meaning the model is less sensitive to the exact location of features.
3. Fast computation: Max pooling is computationally efficient.
Cons:
1. Loses spatial information: Max pooling loses spatial information, which can be important in some applications.
2. Sensitive to noise: Max pooling can be sensitive to noise, as a single noisy pixel can affect the output.
Min Pooling
Pros:
1. Retains least prominent features: Min pooling helps retain the least prominent features in the data.
2. Robust to noise: Min pooling is more robust to noise, as it is less affected by single noisy pixels.
Cons:
1. Loses most prominent features: Min pooling loses the most prominent features in the data.
2. Not commonly used: Min pooling is rarely used in practice.
Average Pooling
Pros:
1. Retains spatial information: Average pooling retains more spatial information compared to max pooling.
2. Robust to noise: Average pooling is more robust to noise, as it averages out the effects of noisy pixels.
Cons:
1. Washes out prominent features: Average pooling can wash out prominent features in the data.
2. Computationally expensive: Average pooling can be computationally expensive compared to max pooling.
In summary, Max pooling is commonly used due to its ability to
retain prominent features and provide translation invariance.
Average pooling is used when retaining spatial information is
important, while Min pooling is less commonly used due to its
limitations.
After the pooling layer, the high-level features are processed using fully
connected layers, similar to a multi-layer perceptron. These layers connect the
features to the output classes for classification.
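A minimal sketch of this step, assuming the pooled stack of 16 feature maps of 31x31 from the earlier example and a placeholder of two output classes:

import torch
import torch.nn as nn

pooled = torch.randn(1, 16, 31, 31)              # pooled feature maps (illustrative sizes)

flattened = torch.flatten(pooled, start_dim=1)   # shape: (1, 16 * 31 * 31)
fc = nn.Linear(16 * 31 * 31, 2)                  # connect the features to 2 output classes
scores = fc(flattened)
print(scores.shape)                              # torch.Size([1, 2])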
Training a Convolutional Neural Network (CNN)
1. Input Images and Labels:
• We start with a set of input images, each with a corresponding ground truth label.
• For a binary classification task like diabetic retinopathy, the labels are +1 for an unhealthy retina and -1 for a healthy retina.
2. Convolutional Layers:
• The input image is fed through the CNN, starting with a convolution operation.
• The convolution operation applies filters to the input image to extract features, resulting in a stack of feature maps.
• This process is repeated for multiple layers, with each layer using different filters.
3. Fully Connected Layers:
• The feature maps are then transformed using fully connected layers.
• These layers use a weight matrix to determine how the feature maps are transformed into predicted labels.
• The predicted label represents the CNN's prediction for the input image.
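Putting these steps together, here is a minimal sketch of the pipeline described above for a binary task such as the diabetic retinopathy example; the layer sizes, image size, and hyperparameters are illustrative, and the +1/-1 labels are represented as class indices 1 (unhealthy) and 0 (healthy) to suit the cross-entropy loss.

import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Convolution + activation + pooling, repeated over two layers.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Fully connected layer mapping the feature maps to predicted labels.
        self.classifier = nn.Linear(32 * 14 * 14, num_classes)

    def forward(self, x):
        x = self.features(x)                # stack of feature maps
        x = torch.flatten(x, start_dim=1)   # flatten for the fully connected layer
        return self.classifier(x)           # predicted label scores

model = SmallCNN()
images = torch.randn(4, 3, 64, 64)          # a batch of 4 RGB images
labels = torch.tensor([1, 0, 1, 0])         # ground-truth labels (unhealthy / healthy)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One training step: forward pass, compare predictions with labels, update weights.
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()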