

OBJECT CLASSIFICATION
USING CNN

OBJECT CLASSIFICATION USING CNN - STEPS INVOLVED:

• STEP 1: Data Collection and Preparation
  • Gather a labeled dataset containing images of the objects you want to classify.
  • Split the dataset into training, validation, and test sets. Common splits include 70-80% for training, 10-15% for validation, and 10-15% for testing (see the sketch after the dataset descriptions below).

COMMON DATASETS USED:

MNIST (Modified National Institute of Standards and Technology) is a well-known dataset used in Computer Vision, built by Yann LeCun et al. It is composed of images of handwritten digits (0-9), split into a training set of 60,000 images and a test set of 10,000 images, where each image is 28 x 28 pixels in width and height.

The CIFAR-10 dataset consists of 60,000 32 x 32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
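A minimal sketch of Step 1, using the Keras-bundled copy of MNIST as the labeled dataset; the 70/15/15 split ratio and the variable names are illustrative, not prescriptive.

# Step 1 sketch: load a labeled dataset and split it into train/validation/test sets.
from sklearn.model_selection import train_test_split
from tensorflow.keras.datasets import mnist

(x, y), _ = mnist.load_data()          # 60,000 training images of handwritten digits
x = x.astype("float32") / 255.0        # scale pixel values to [0, 1]

# 70% for training, then split the remaining 30% evenly into validation and test sets
x_train, x_tmp, y_train, y_tmp = train_test_split(x, y, test_size=0.30, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(x_tmp, y_tmp, test_size=0.50, random_state=42)

print(x_train.shape, x_val.shape, x_test.shape)   # roughly 70% / 15% / 15% of the data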

OBJECT CLASSIFICATION USING CNN - STEPS INVOLVED:

COMMON DATASETS USED:

The ImageNet dataset consists of 1000 object categories, organized according to the WordNet hierarchy.

• STEP 2: Data Augmentation (optional)
  Data augmentation techniques like rotation, scaling, flipping, and cropping can be applied to increase the diversity of the training data and improve the model's generalization. Typical processing types include resizing images, jittering colour, warping images, simulating noise, simulating blur, and cropping images (the sample output images from the original table are omitted here). A sketch follows below.
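A minimal sketch of Step 2 using Keras preprocessing layers (TensorFlow 2.x); the specific transforms and their parameter values are illustrative choices, not the only possible pipeline.

# Step 2 sketch: an on-the-fly augmentation pipeline applied to training images.
import tensorflow as tf
from tensorflow.keras import layers

data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),        # flipping
    layers.RandomRotation(0.1),             # rotation by up to +/-10% of a full turn
    layers.RandomZoom(0.1),                 # scaling
    layers.RandomTranslation(0.1, 0.1),     # shifting, a mild stand-in for cropping
])

# Typically applied as the first layers of the model, or via
# train_ds.map(lambda img, label: (data_augmentation(img, training=True), label)).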

OBJECT CLASSIFICATION USING CNN - STEPS INVOLVED:

• Step 3 – Preprocessing
  • Resize the images to a consistent input size (e.g., 224x224 pixels); consistent preprocessing helps with training stability.

• Step 4 – Build the CNN Model
  • Design the architecture of your CNN model. Common architectures include VGG, ResNet, Inception, or custom designs.
  • Specify the number of convolutional layers, filter sizes, pooling layers, and fully connected layers.
  • Add activation functions like ReLU (Rectified Linear Unit) after convolutional layers (see the sketch after this list).
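A minimal sketch of Steps 3-4: a small custom CNN with a fixed 224x224x3 input. The layer counts, filter sizes, and the 10-class output are illustrative assumptions.

# Steps 3-4 sketch: build a small CNN with convolution, pooling, and dense layers.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),                        # consistent input size (Step 3)
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),                     # fully connected layer
    layers.Dense(10, activation="softmax"),                   # class scores for 10 classes
])
model.summary()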

• AlexNet was the first convolutional network to use a GPU (Graphics Processing Unit) to boost performance.
• A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device.
• The AlexNet architecture consists of 5 convolutional layers, 3 max-pooling layers, 3 fully connected layers, and 1 softmax layer.
• Each convolutional layer consists of convolutional filters and a nonlinear activation function (ReLU).
• The pooling layers perform max pooling.
• The input size is fixed due to the presence of fully connected layers.
• The input size is quoted in most places as 224x224x3, but due to the padding involved it works out to 227x227x3.

Full (simplified) AlexNet architecture:

[227x227x3] INPUT
[55x55x96]  CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96]  MAX POOL1: 3x3 filters at stride 2
[27x27x96]  NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256]   MAX POOL3: 3x3 filters at stride 2
[4096]      FC6: 4096 neurons
[4096]      FC7: 4096 neurons
[1000]      FC8: 1000 neurons (class scores)

Details/Retrospectives:
- first use of ReLU
- used Norm layers (not common anymore)
- data augmentation
- dropout 0.5
- batch size 128
- SGD Momentum 0.9
- Learning rate 1e-2
- L2 weight decay 5e-4
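A Keras sketch of the simplified layer listing above. The original network used local response normalization; BatchNormalization is substituted here as a stand-in (an assumption), and the 1000-way softmax matches the ImageNet setting.

# AlexNet-style sketch: layer shapes in the comments follow the listing above.
from tensorflow.keras import layers, models

alexnet = models.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, (11, 11), strides=4, activation="relu"),       # CONV1 -> 55x55x96
    layers.MaxPooling2D((3, 3), strides=2),                          # POOL1 -> 27x27x96
    layers.BatchNormalization(),                                     # NORM1 (stand-in for LRN)
    layers.Conv2D(256, (5, 5), padding="same", activation="relu"),   # CONV2 -> 27x27x256
    layers.MaxPooling2D((3, 3), strides=2),                          # POOL2 -> 13x13x256
    layers.BatchNormalization(),                                     # NORM2 (stand-in for LRN)
    layers.Conv2D(384, (3, 3), padding="same", activation="relu"),   # CONV3 -> 13x13x384
    layers.Conv2D(384, (3, 3), padding="same", activation="relu"),   # CONV4 -> 13x13x384
    layers.Conv2D(256, (3, 3), padding="same", activation="relu"),   # CONV5 -> 13x13x256
    layers.MaxPooling2D((3, 3), strides=2),                          # POOL3 -> 6x6x256
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),                           # FC6
    layers.Dropout(0.5),                                             # dropout 0.5
    layers.Dense(4096, activation="relu"),                           # FC7
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),                        # FC8: class scores
])
alexnet.summary()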

ResNet – Residual Network (34, 50, 101, 152 layers)

"Deep Residual Learning for Image Recognition"
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Microsoft Research

• Very deep network
• 152 layers
• Won 1st place in the ILSVRC 2015 classification task.

Stacking CNNs deeper

• The authors introduced a deep residual learning framework.
• They hypothesize that it is easier to optimize the residual mapping than to optimize the original (plain) network.
• The proposed hypothesis performed better.
• Layers are used to fit a residual mapping rather than fitting the underlying mapping directly.
• A deeper network should perform better, yet plain deep networks do not perform as well as shallower ones.
• The problem is due to the difficulty of optimizing the learning in very deep plain networks.

Residual Block
• H(x) is the underlying mapping.
• F(x) + x can be realized by feedforward neural networks with "shortcut connections", known as identity mappings.
• The shortcut allows the gradient to be directly backpropagated to earlier layers.
• It does not create any extra parameters or computation.
• Training is achieved by backpropagation (see the sketch after the equations below).

Fitting the residual:
F(x) := H(x) − x
H(x) = F(x) + x
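A Keras sketch of a single residual block: F(x) is two convolutional layers and the shortcut adds x back, so the block outputs H(x) = F(x) + x. The filter count and input shape are illustrative assumptions.

# Residual block sketch: identity shortcut added to the output of two conv layers.
from tensorflow.keras import layers, Input, Model

def residual_block(x, filters=64):
    shortcut = x                                             # identity mapping (the shortcut)
    y = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)    # F(x)
    y = layers.Add()([y, shortcut])                          # F(x) + x
    return layers.Activation("relu")(y)                      # H(x)

inputs = Input(shape=(56, 56, 64))
outputs = residual_block(inputs)
Model(inputs, outputs).summary()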

Full ResNet architecture (figure omitted)

GoogLeNet
• 22 layers
• Inception module
• 5 million parameters (12x fewer than AlexNet)

Inception module
• Design a good local network topology (a network within a network) and then stack these modules on top of each other.
• Naïve Inception module (figure omitted)
• 9 Inception modules are used in the whole architecture (see the sketch after this list).
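A sketch of a naive Inception module: parallel 1x1, 3x3, and 5x5 convolutions plus 3x3 max pooling, concatenated along the channel axis. The filter counts and input shape are illustrative; GoogLeNet's actual modules also add 1x1 bottleneck convolutions.

# Naive Inception module sketch: four parallel branches concatenated depth-wise.
from tensorflow.keras import layers, Input, Model

inputs = Input(shape=(28, 28, 192))                                       # example feature map
b1 = layers.Conv2D(64, (1, 1), padding="same", activation="relu")(inputs)
b2 = layers.Conv2D(128, (3, 3), padding="same", activation="relu")(inputs)
b3 = layers.Conv2D(32, (5, 5), padding="same", activation="relu")(inputs)
b4 = layers.MaxPooling2D((3, 3), strides=1, padding="same")(inputs)
outputs = layers.Concatenate()([b1, b2, b3, b4])                          # stack branch outputs

naive_inception = Model(inputs, outputs)
naive_inception.summary()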

Activation functions – Rectified Linear Unit (ReLU)

A rectified linear unit (ReLU) is an activation function that introduces the property of non-linearity to a deep learning model and helps to mitigate the vanishing gradients issue. It returns the positive part of its argument and is one of the most popular activation functions in deep learning.

Mathematically: f(x) = max(0, x)

Advantages
1. ReLU is computationally efficient and easy to implement.
2. It helps to avoid the vanishing gradient problem, which can occur when using other activation functions.
3. ReLU has been shown to be effective in deep learning models, achieving state-of-the-art results in many applications.

Disadvantages
1. When the input is negative, the output is always zero, which can lead to the "dead neuron" problem where the neuron stops learning and does not contribute to the model's performance.
2. ReLU is not a smooth function, which can cause some optimization algorithms to fail.

Activation functions – Leaky Rectified Linear Unit (Leaky ReLU)

Leaky ReLU (Rectified Linear Unit) is a variation of the ReLU activation function that overcomes the "dying ReLU" problem, where the neuron can become inactive during training and not recover. The Leaky ReLU allows a small, non-zero gradient when the input is negative, which helps to prevent the neuron from dying.

Mathematically: f(x) = x if x > 0, and f(x) = αx if x ≤ 0, where α is a small negative-slope coefficient (e.g., 0.01).

Advantages
1. It avoids the "dying ReLU" problem, where the gradient of the neuron can become zero during training and the neuron stops updating. The small negative slope allows the neuron to have a non-zero gradient, even for negative inputs.
2. It combats bias shift, since neurons are allowed to pass small negative signals to the output.

Disadvantages
1. The negative slope is a hyperparameter that needs to be tuned, which can add complexity to the model.
2. While Leaky ReLU can help prevent the gradient from vanishing for negative inputs, it can still cause the gradient to vanish for very large positive inputs. This can make it difficult to train deep networks with many layers.

Activation functions – Exponential Linear Unit (ELU)

ELU is a smooth and continuous function that allows negative values.

Mathematically: f(x) = x if x > 0, and f(x) = α(e^x − 1) if x ≤ 0.

Advantages
1. It can help to reduce the bias shift and avoid overfitting in neural networks.
2. It has been shown to outperform other activation functions like ReLU and its variants in cases such as regression (where the output should take negative values) and with imbalanced data (ELU can help prevent the vanishing gradient problem when some inputs have very large positive or negative values).
3. It is a smooth and continuous function, which can aid the convergence of gradient-based optimization algorithms.
4. ELU can help to avoid the dead neuron problem that can occur with the ReLU activation function.

Disadvantages
1. The exponential used in the function can be computationally expensive.
2. The value of alpha needs to be carefully chosen to balance the advantages of the function.
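A NumPy sketch of the three activations discussed above; the alpha values are illustrative defaults.

# Activation function sketch: ReLU, Leaky ReLU, and ELU applied to sample inputs.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                              # 0 for x < 0, x otherwise

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)                   # small negative slope for x < 0

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))   # smooth curve for x < 0

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), leaky_relu(x), elu(x), sep="\n")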

• Step 5 – Compile the Model
  • Choose an appropriate loss function, such as a regression loss or a classification loss (see the sketch after this slide).

Loss Function

• After the activations have been applied, the model's output is compared with the required output.
• The loss function is defined as the measurement of the difference, or error, between the actual values and the expected values at the current position:

  Loss function = Actual output − Desired output

• The average over all losses constitutes the cost.
• Loss functions are divided into two categories:
  1. Regression loss
  2. Classification loss – binary and multi-class classification
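A minimal sketch of Step 5, assuming a Keras classification model named model (for example, the one built in Step 4) and integer class labels; the optimizer and metric are illustrative choices.

# Step 5 sketch: compile the model with a classification loss.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",   # classification loss for integer labels
    metrics=["accuracy"],
)
# For a regression problem, a regression loss such as "mse" or "mae" would be used instead.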

Loss Function – Regression Loss

1. Mean Squared Error (MSE)
Squared Error loss for each training example, also known as L2 loss, is the square of the difference between the actual and the predicted values.

2. Mean Squared Logarithmic Error (MSLE)
It measures the ratio between actual and predicted values using their logarithms. It is a good choice for predicting continuous data.

3. Mean Absolute Error (MAE)
Absolute Error for each training example is the distance between the predicted and the actual values, irrespective of the sign. Absolute Error is also known as the L1 loss.

Loss Function – Binary Classification Loss Functions

1. Binary Cross-Entropy Loss
Entropy indicates disorder or uncertainty; it is measured for a random variable X with probability distribution p(X). Cross-entropy is the default loss function to use for binary classification problems. It is intended for use with binary classification where the target values are in the set {0, 1}.

2. Hinge Loss
An alternative to cross-entropy for binary classification problems is the hinge loss function, primarily developed for use with Support Vector Machine (SVM) models. It is intended for use with binary classification where the target values are in the set {-1, 1}.

3. Squared Hinge Loss
It is an extension of hinge loss, mainly used for categorical prediction or yes/no decision problems. A sketch of these losses follows below.
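A NumPy sketch of the losses above for a single batch of values; the targets and predictions are made up purely for illustration.

# Loss function sketch: regression losses and binary classification losses by hand.
import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])      # actual values / binary targets in {0, 1}
y_pred = np.array([0.9, 0.2, 0.6, 0.8])      # predicted values / probabilities

mse  = np.mean((y_true - y_pred) ** 2)                          # Mean Squared Error (L2)
msle = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)      # Mean Squared Logarithmic Error
mae  = np.mean(np.abs(y_true - y_pred))                         # Mean Absolute Error (L1)
bce  = -np.mean(y_true * np.log(y_pred)                         # Binary Cross-Entropy
                + (1 - y_true) * np.log(1 - y_pred))

t = 2 * y_true - 1                                              # hinge targets in {-1, 1}
s = 2 * y_pred - 1                                              # illustrative raw scores
hinge = np.mean(np.maximum(0.0, 1.0 - t * s))                   # Hinge Loss
sq_hinge = np.mean(np.maximum(0.0, 1.0 - t * s) ** 2)           # Squared Hinge Loss

print(mse, msle, mae, bce, hinge, sq_hinge)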

Loss Function – Multi-Class Classification Loss Functions

• Multi-class classification covers those predictive modeling problems where examples are assigned one of more than two classes.

• 1. Multiclass Cross-Entropy Loss
  • Cross-entropy is the default loss function to use for multi-class classification problems.
  • It is the best loss function for text classification.

• 2. Sparse Multiclass Cross-Entropy Loss
  • Sparse cross-entropy performs the same error calculation as cross-entropy and is mainly used to handle large amounts of data.

• Step 6 – Training and Evaluation
  • Train the model on the data in batches and evaluate it using metrics like precision, recall, F1-score, and the confusion matrix to assess its performance (see the sketch after this list).
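A minimal sketch of Step 6, assuming a compiled Keras model and a data split are already available as model, x_train/y_train, x_val/y_val, and x_test/y_test; the batch size, epoch count, and use of scikit-learn metrics are illustrative choices.

# Step 6 sketch: train in batches, then evaluate with precision, recall, F1, and a confusion matrix.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    batch_size=64,
    epochs=10,
)

y_prob = model.predict(x_test)
y_pred = np.argmax(y_prob, axis=1)                # predicted class for each test image
print(classification_report(y_test, y_pred))      # precision, recall, F1-score per class
print(confusion_matrix(y_test, y_pred))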
