3 - Feature Extraction and Images Classification - Part 3

Badji Mokhtar Annaba University

Faculty of Technology
Computer Science Department – Master IATI

Artificial Intelligence and Information Processing


Master 1 IATI
Information Analysis and Processing

Chapter 2: Feature extraction and images classification – Part 3

Year: 2023-2024    Prof: Dr. Hariri Walid


Classification

Artificial Neural Networks (Regularization):

Dropout is a regularization technique used in artificial neural networks during training. It involves randomly "dropping out" (setting to zero) a proportion of the neurons in the network during each training iteration.

A dropout rate of 0.5 in a neural network layer randomly deactivates 50% of its neurons during training.
A dropout rate greater than 0.5 deactivates more than half of the neurons.
A dropout rate less than 0.5 deactivates fewer neurons.

Benefit: prevents overfitting by forcing the network to learn more robust features and by reducing interdependent learning among neurons.
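As an illustration, "inverted" dropout (the variant most frameworks use) can be sketched in plain NumPy; the function name and the rescaling choice are this sketch's own, not a fixed API:

```python
import numpy as np

def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout: zero out roughly `rate` of the units, rescale the rest."""
    if not training or rate == 0.0:
        return activations            # dropout is disabled at inference time
    rng = rng or np.random.default_rng(0)
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged,
    # which is why nothing needs to be rescaled at inference time.
    return activations * mask / keep_prob

h = np.ones(1000)
h_train = dropout(h, rate=0.5)                  # ~50% of units zeroed
h_test = dropout(h, rate=0.5, training=False)   # unchanged
```

With rate 0.5, roughly half of `h_train` is zero and the surviving units are scaled by 2, so the mean stays close to 1.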

Course: Information Analysis and Processing Year: 2023-2024


Classification

Artificial Neural Networks (Regularization):

Early stopping: monitor the performance of the model on a separate validation dataset during training. When the validation loss starts to increase consistently, or stagnates while the training loss continues to decrease, the model is overfitting and further training is unlikely to improve generalization performance, so training is stopped early.

The patience parameter specifies the number of epochs to wait before stopping the training process if no improvement is observed in the monitored metric (the validation loss); the best model is the one saved at the epoch with the lowest validation loss.

Benefit: helps prevent the model from learning noise in the training data and promotes generalization to unseen data.
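The patience logic can be sketched as a small helper (a hypothetical function tracking only the validation loss, not a framework API):

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch (0-indexed) at which training should stop,
    or None if no early stop is triggered."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss                  # new best model: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch                  # no improvement for `patience` epochs

    return None

# Validation loss improves, then worsens from epoch 4 onward.
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.61]
stop_at = early_stopping(losses, patience=3)
```

Here the best loss is 0.55 at epoch 3; after three consecutive epochs without improvement, training stops at epoch 6.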

Feature extraction: Bag of visual features

Steps of the Bag of Features (BoF) process:

BoF is a technique that uses feature extraction methods, like SIFT, SURF, and ORB, to extract local features from the image and cluster them into visual words.

BoF is a powerful technique for extracting and representing features in a compact and meaningful way, but it is computationally expensive and requires large amounts of training data.

Steps: (1) feature extraction; (2) clustering; (3) histogram representation
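Assuming local descriptors have already been extracted (step 1, e.g. with SIFT) and a codebook of visual words has been learned by clustering (step 2), the quantization and histogram steps can be sketched in NumPy as follows (the data here is random, standing in for real descriptors):

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word and
    return a normalized histogram of word counts (the BoF vector)."""
    # Squared Euclidean distance from every descriptor to every centroid.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)                        # quantization (step 2)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                            # histogram (step 3)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(5, 128))       # 5 visual words, 128-D (SIFT-like)
descriptors = rng.normal(size=(40, 128))   # stand-in for step 1's output
h = bof_histogram(descriptors, codebook)   # 5-bin image representation
```

The resulting fixed-length histogram can then be fed to any standard classifier (SVM, k-NN, etc.) regardless of how many local features the image produced.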

Feature extraction: Histogram of Oriented Gradients (HOG)

HOG

 First used for person detection [Dalal and Triggs, CVPR 2005]

 Computes the distribution of gradients (intensity gradients or edge directions) within localized regions of an image
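A rough NumPy sketch of the core HOG idea: a magnitude-weighted histogram of unsigned gradient orientations for a single cell. Real HOG adds block normalization and bin interpolation, which are omitted here for brevity:

```python
import numpy as np

def cell_hog(cell, n_bins=9):
    """Histogram of gradient orientations (0-180 degrees) for one image cell,
    weighted by gradient magnitude, as in the HOG descriptor."""
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180): HOG folds opposite directions together.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = (orientation / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

# A vertical edge: gradients point horizontally, so the 0-degree bin dominates.
cell = np.zeros((8, 8))
cell[:, 4:] = 1.0
hist = cell_hog(cell)
```

Concatenating such per-cell histograms over a dense grid yields the HOG descriptor used for person detection.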

Image classification using deep learning

Importance of Deep Learning

 Machine learning works well only with structured and semi-structured data, while deep learning handles both structured and unstructured data

 Deep learning algorithms can perform complex operations efficiently, while classical machine learning algorithms cannot

 Machine learning algorithms use labeled sample data to extract patterns, while deep learning accepts large volumes of data as input and analyzes them to extract features from an object

 The performance of machine learning algorithms stops improving as the amount of data increases; to keep improving the performance of the model with more data, we need deep learning.


Neural networks

A neural network is a system modeled on the human brain, consisting of an input layer, multiple hidden layers, and an output layer. Data is fed as input to the neurons. The information is transferred to the next layer using appropriate weights and biases. The output is the final value predicted by the artificial neuron.

Each neuron in a neural network performs the following operations:

 The product of each input and the weight of the channel it is passed over is computed
 The sum of these weighted products is computed, which is called the weighted sum
 The bias value of the neuron is added to the weighted sum
 The final sum is passed through a particular function known as the activation function
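The four operations above can be sketched as a single neuron in plain Python (ReLU is chosen here as the activation; the function name is illustrative, not a fixed API):

```python
def neuron(x, w, b, activation=lambda s: max(0.0, s)):
    """One artificial neuron: products, weighted sum, bias, activation (ReLU)."""
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w))  # products, then sum
    return activation(weighted_sum + b)                  # add bias, then activate

# 3 inputs, 3 channel weights, one bias.
y = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], b=0.2)
```

For these values the weighted sum is 0.3, adding the bias gives 0.5, and ReLU leaves the positive value unchanged, so y is 0.5.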


Cost function

The cost function is one of the most significant components of a neural network. The cost is the difference between the neural net's predicted output and the actual output from a set of labeled training data. The minimum cost is obtained by iteratively adjusting the weights and biases throughout the training process.
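As a concrete instance, the mean squared error is one common cost function (among others, e.g. cross-entropy for classification):

```python
def mse_cost(predicted, actual):
    """Mean squared error between network predictions and true labels."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# Three predictions against three labels: errors of 0.1, 0.2 and 0.2.
cost = mse_cost([0.9, 0.2, 0.8], [1.0, 0.0, 1.0])
```

The squared errors are 0.01, 0.04 and 0.04, so the mean cost is 0.03; training adjusts weights and biases to drive this value down.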


Deep Neural Networks (DNN)


Convolutional neural networks: CNN


Invariance and equivariance:

Invariance: the property of a machine learning model where the output remains unchanged despite certain transformations applied to the input data. For example, a model that detects objects in images should still recognize those objects even if the images are rotated or translated.

Equivariance: the property of a machine learning model where the output is transformed in a manner corresponding to the transformations applied to the input data. For instance, in a CNN, as the input image undergoes transformations like rotation, reflection, or translation, the features learned by the model also transform in a consistent manner.


Invariance and equivariance:

Accurate classification:
• The model should accurately classify images regardless of the objects' position or scale within the image.
• Achieved by training with diverse datasets and using data augmentation.
• Techniques like image translation and scaling help the model generalize to various object positions and sizes.

Accurate segmentation:
• The model can identify objects in an image regardless of their position, size, or scale.
• This requires training with diverse datasets and using data augmentation techniques.
• Techniques like image translation and scaling can be applied to ensure an accurate segmentation.


Convolutional networks for 1D inputs:

Figure 1.

1D convolution operation:
Convolutional layers are network layers based on the convolution operation. In 1D, a convolution transforms an input
vector x into an output vector z so that each output 𝑧𝑖 is a weighted sum of nearby inputs.

The same weights are used at every position and are collectively called the convolution kernel or filter. The size of the
region over which inputs are combined is termed the kernel size. For a kernel size of three, we have:

zᵢ = w₁xᵢ₋₁ + w₂xᵢ + w₃xᵢ₊₁

where w = [w₁, w₂, w₃]ᵀ is the kernel
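This weighted sum can be sketched in NumPy (zero padding is assumed at the boundaries, so the output has the same length as the input):

```python
import numpy as np

def conv1d(x, w):
    """1D convolution, kernel size 3, stride 1, zero padding:
    z_i = w1*x[i-1] + w2*x[i] + w3*x[i+1], with x taken as zero outside
    its valid range."""
    x_padded = np.concatenate(([0.0], x, [0.0]))
    return np.array([w @ x_padded[i:i + 3] for i in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
# A difference kernel: each output is x[i-1] - x[i+1].
z = conv1d(x, np.array([1.0, 0.0, -1.0]))
```

Note that the same three weights are reused at every position, which is exactly what distinguishes a convolutional layer from a fully connected one.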


1D convolution operation:
Stride, kernel size, and dilation:

a) Stride: with a stride of two, we evaluate the kernel at every other position, so the first output z1 is computed from a weighted sum centered at x1,
b) and the second output z2 is computed from a weighted sum centered at x3, and so on.
c) Kernel size: the kernel size can also be changed. With a kernel size of five, we take a weighted sum of the nearest five inputs.
d) Dilation: in dilated or atrous convolution (from the French "à trous", with holes), we intersperse zeros in the weight vector, allowing us to combine information over a larger area using fewer weights. (Figure 2.)


1D convolution operation:
Dilation:

d) Dilation skips input values within convolutional layers, effectively increasing the receptive field of the neurons. It helps capture broader contextual information while reducing computational cost compared to traditional convolutions.

Kernel size: the physical dimension of the filter (3 in this example)

Effective kernel size: accounts for the impact of dilation on the receptive field (5 in this example, with a dilation rate of 2)

Effective kernel size = kernel size + (kernel size − 1) × (dilation rate − 1)

Figure 3
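The formula above can be checked with a small helper (the function name is this sketch's own):

```python
def effective_kernel_size(kernel_size, dilation_rate):
    """Receptive width of a dilated kernel, per the formula above."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)

# A size-3 kernel at dilation rates 1, 2, 3 spans 3, 5, 7 input positions.
sizes = [effective_kernel_size(3, d) for d in (1, 2, 3)]
```

A dilation rate of 1 recovers the ordinary kernel size, while each extra unit of dilation inserts one gap between adjacent kernel weights.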


1D convolution operation:
Padding:
In the convolution operation, each output is computed by taking a weighted sum of the previous, current, and subsequent positions in the input.
Question: how do we deal with the first output (where there is no previous input) and the final output (where there is no subsequent input)?

Solution 1: pad the edges of the input with new values (e.g., adding zeros around the edges). Zero padding assumes the input is zero outside its valid range (Figure 4).
Advantage: preserves spatial dimensions during convolution, enabling better extraction of features from the edges and corners of the input images.

Solution 2: discard the output positions where the kernel exceeds the range of input positions. These valid convolutions have the advantage of introducing no extra information at the edges of the input.
Disadvantage: the representation decreases in size.
Figure 4

1D convolution operation:

Padding size = (kernel size − 1) / 2    (the padding that preserves the input size for stride 1)

Output size = (input size − kernel size + 2 × padding) / stride + 1

Where:

Input size is the size of the input feature map.
Kernel size is the size of the convolutional kernel/filter.
Padding is the amount of zero-padding applied to the input.
Stride is the step size of the kernel sliding over the input.

Figure 5
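Both formulas can be wrapped in small helpers and checked against common cases (the function names are this sketch's own):

```python
def conv_output_size(input_size, kernel_size, padding, stride):
    """Spatial size of a convolution's output, per the formula above."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

def same_padding(kernel_size):
    """Padding that preserves the input size for stride 1 and an odd kernel."""
    return (kernel_size - 1) // 2

out_same = conv_output_size(224, 3, same_padding(3), stride=1)   # size preserved
out_valid = conv_output_size(224, 3, padding=0, stride=1)        # shrinks by 2
out_strided = conv_output_size(224, 3, padding=1, stride=2)      # roughly halved
```

With a 3×3 kernel on a 224-wide input: "same" padding keeps 224, a valid convolution gives 222, and stride 2 with padding 1 gives 112.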


Convolution layers:
A convolutional layer computes its output by convolving the input, adding a bias β, and passing each result through an activation function a[·]. With kernel size three, stride one, and dilation rate one, the iᵗʰ hidden unit hᵢ would be computed as:

hᵢ = a[β + w₁xᵢ₋₁ + w₂xᵢ + w₃xᵢ₊₁]

ReLU: f(x) = max(0, x)


Convolution layers vs. fully connected layers:

a) A fully connected layer has a weight connecting each input x to each hidden unit h (colored arrows) and a bias for each hidden unit.
b) 36 weights relate the six inputs to the six hidden units.
c) A convolutional layer with kernel size three.
d) This is a special case of the fully connected matrix where many weights are zero and others are repeated.
e) A convolutional layer with kernel size three and stride two.
f) Also a special case of a fully connected network, with a different sparse weight structure.


Channels:
When applying a single convolution, information is inevitably lost, due to the averaging of nearby inputs and the clipping of the ReLU activation. It is therefore usual to compute several convolutions in parallel. Each convolution produces a new set of hidden variables, termed a feature map or channel.

Multiple convolutions are applied to the input x and stored in channels:
a) A convolution is applied to create hidden units h1 to h6, which form the first channel.
b) A second convolution operation is applied to create hidden units h7 to h12, which form the second channel.
c) If we add a further convolutional layer, there are now two channels at each input position.


2D convolution operation

Convolutional networks are more usually applied to 2D image data, where the convolutional kernel is a 2D object. A 3×3 kernel Ω ∈ ℝ³ˣ³ applied to a 2D input comprising elements xᵢⱼ computes a single layer of hidden units hᵢⱼ as:

h_ij = a[β + Σ_{m=1..3} Σ_{n=1..3} ω_mn · x_{i+m−2, j+n−2}]
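A direct (unvectorized) NumPy sketch of the 2D operation, without padding, bias, or activation, for clarity:

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2D convolution (strictly, cross-correlation, as in CNN layers):
    each output h[i, j] is the weighted sum of the kernel-sized patch at (i, j)."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    h = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            h[i, j] = (x[i:i + kh, j:j + kw] * kernel).sum()
    return h

x = np.arange(16, dtype=float).reshape(4, 4)
h = conv2d(x, np.ones((3, 3)) / 9.0)   # 3x3 averaging kernel
```

On a 4×4 input a 3×3 valid convolution yields a 2×2 output; with the averaging kernel, each output is the mean of its 3×3 patch.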


2D convolution operation

a) The output h23 (shaded output) is a weighted sum of the nine positions from x12 to x34 (shaded inputs).
b) Different outputs are computed by translating the kernel across the image grid in two dimensions.
c–d) With zero padding, positions beyond the image's edge are treated as zero.


Receptive field:
The receptive field of a hidden unit in the network is the region of the original input that feeds into it.

Consider a convolutional network where each convolutional layer has kernel size three. The hidden units in the
first layer take a weighted sum of the three closest inputs, so have receptive fields of size three. The units in the
second layer take a weighted sum of the three closest positions in the first layer, which are themselves weighted
sums of three inputs.

The receptive field in a convolutional neural network with two 3x3 convolutional (conv) layers. In the 2nd conv layer, every
pixel has a 5x5 field of view.


Receptive field of subsequent layers:

With stride 1:

Receptive field of the next layer = kernel size + current receptive field − 1

With stride 2:

Receptive field of the next layer = kernel size + (current receptive field − 1) × 2

Example: stride 1, kernel size 3:

Number of layers:  1  2  3
Receptive field:   3  5  7
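The recurrences above can be sketched as a small calculator (covering only the stride-1 and stride-2 cases given here; the function name is this sketch's own):

```python
def receptive_field(n_layers, kernel_size=3, stride=1):
    """Receptive field after stacking `n_layers` identical conv layers,
    using the stride-1 and stride-2 recurrences above."""
    rf = 1  # a raw input pixel sees only itself
    for _ in range(n_layers):
        if stride == 1:
            rf = kernel_size + rf - 1
        else:  # stride == 2
            rf = kernel_size + (rf - 1) * 2
    return rf

# Stride 1, kernel size 3: the receptive field grows by 2 per layer.
fields = [receptive_field(n) for n in (1, 2, 3)]
```

This reproduces the table: one, two, and three stacked 3-wide layers see 3, 5, and 7 input positions respectively.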


Receptive field and number of layers:

Assuming an input size of 224 × 224 with kernel size 3:

Receptive field size = 1 + L × (kernel size − 1) = 224, so L is 112 layers

where L is the number of layers.

For an input of size 1000, we would need about 500 layers!

A very deep model is needed to cover all the input data!


How to increase the receptive field in a CNN (and decrease the number of layers):

1. Add more convolutional layers (make the network deeper)
2. Add pooling layers or higher-stride convolutions (sub-sampling)
3. Use dilated convolutions

If dilation is applied in a convolution, the kernel is applied with spaces between its weights rather than at every adjacent pixel (a dilation rate of 1 corresponds to an ordinary convolution).

Dilation allows the receptive field of each layer to grow exponentially without an increase in the number of parameters or the amount of computation, thus helping to capture larger spatial contexts while retaining the original resolution.


Pooling: max pooling and average pooling

Pooling layers play a crucial role in convolutional neural networks (CNNs) for several reasons:

Dimensionality reduction: pooling shrinks the size of the data, making computations faster and reducing overfitting.

Translation invariance: it helps the network recognize features regardless of their exact position in the input.

Feature generalization: pooling keeps the most important features, improving the model's ability to generalize to new data.

Output size = (input size − pooling size) / stride + 1


Overview:
