Convolutional Neural Networks 2 Now

Convolutional Neural Networks (CNNs) are inspired by the human visual system, featuring a hierarchical structure that extracts simple to complex features through local connectivity and translation invariance. Key components of CNNs include convolutional layers, ReLU activation functions, pooling layers, and fully connected layers, all of which work together to recognize patterns in images. CNNs have practical applications in image classification, object detection, and facial recognition, significantly advancing the field of computer vision.

(Additional Materials)

Inspiration Behind CNN and Parallels With The Human Visual System
Convolutional neural networks were inspired by the layered architecture of the human visual cortex, and
below are some key similarities and differences:

Illustration of the correspondence between the areas associated with the primary visual cortex and the layers
in a convolutional neural network
- Hierarchical architecture: Both CNNs and the visual cortex have a hierarchical structure, with simple features extracted in early layers and more complex features built up in deeper layers. This allows increasingly sophisticated representations of visual inputs.
- Local connectivity: Neurons in the visual cortex connect only to a local region of the input, not the entire visual field. Similarly, the neurons in a CNN layer are connected only to a local region of the input volume through the convolution operation. This local connectivity enables efficiency.
- Translation invariance: Visual cortex neurons can detect features regardless of their location in the visual field. Pooling layers in a CNN provide a degree of translation invariance by summarizing local features.
- Multiple feature maps: At each stage of visual processing, many different feature maps are extracted. CNNs mimic this through multiple filter maps in each convolution layer.
- Non-linearity: Neurons in the visual cortex exhibit non-linear response properties. CNNs achieve non-linearity through activation functions like ReLU applied after each convolution.
CNNs mimic the human visual system but remain far simpler: they lack the cortex's complex feedback mechanisms and typically rely on supervised rather than unsupervised learning. Despite these differences, they have driven major advances in computer vision.

Key Components of a CNN


The convolutional neural network is made of four main parts.
But how do CNNs learn with those parts?
They help the CNNs mimic how the human brain operates to recognize patterns and features in images:
A. Convolutional layers
B. Rectified Linear Unit (ReLU for short)
C. Pooling layers
D. Fully connected layers
This section defines each of these components through the example of classifying a handwritten digit.
Architecture of the CNNs applied to digit recognition
Convolution layers
This is the first building block of a CNN. As the name suggests, the main mathematical task performed is called convolution: the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called a kernel or a filter; the two terms are used interchangeably.
In the convolution layer, several filters of equal size are applied, and each filter is used to recognize a specific
pattern from the image, such as the curving of the digits, the edges, the whole shape of the digits, and more.
Put simply, in the convolution layer, we use small grids (called filters or kernels) that move over the image.
Each small grid is like a mini magnifying glass that looks for specific patterns in the photo, like lines, curves,
or shapes. As it moves across the photo, it creates a new grid that highlights where it found these patterns.
For example, one filter might be good at finding straight lines, another might find curves, and so on. By using
several different filters, the CNN can get a good idea of all the different patterns that make up the image.
Let’s consider this 32x32 grayscale image of a handwritten digit. The values in the matrix are given for
illustration purposes.

Illustration of the input image and its pixel representation


Also, let’s consider the kernel used for the convolution. It is a matrix with a dimension of 3x3. The weight of each element of the kernel is represented in the grid: zero weights are shown in the black cells and ones in the white cells.

DO WE HAVE TO MANUALLY FIND THESE WEIGHTS?


In real life, the weights of the kernels are determined during the training process of the neural network.
Using these two matrices, we can perform the convolution operation by applying the dot product, which works as follows:
1. Place the kernel at the top-left corner of the image.
2. Perform element-wise multiplication between the kernel and the overlapping image patch.
3. Sum the values of the products.
4. The resulting value corresponds to the first value (top-left corner) of the convolved matrix.
5. Move the kernel by the stride (the step size of the sliding window).
6. Repeat steps 2 to 5 until the image matrix is fully covered.
The dimension of the convolved matrix depends on the stride of the sliding window: the larger the stride, the smaller the output dimension.
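The steps above can be sketched in NumPy. This is a minimal illustration, not an optimized implementation; the 5x5 input values and the alternating 0/1 kernel are made up to echo the grids in the figures.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid convolution: slide the kernel over the image,
    multiply element-wise, and sum each window."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

image = np.array([[0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0],
                  [1, 1, 0, 0, 0],
                  [1, 0, 0, 1, 1],
                  [0, 0, 1, 1, 0]])
# Kernel of zeros and ones, as in the black/white grid above
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
feature_map = convolve2d(image, kernel, stride=1)
print(feature_map.shape)  # (3, 3): a 5x5 input with a 3x3 kernel and stride 1
```

Note how the output formula `(input - kernel) // stride + 1` captures the rule stated above: increasing the stride shrinks the convolved matrix.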

Application of the convolution task using a stride of 1 with 3x3 kernel


Another name for the kernel in the literature is feature detector, because its weights can be fine-tuned to detect specific features in the input image.
For instance:
- A kernel that averages neighboring pixels can be used to blur the input image.
- A kernel that subtracts neighboring pixels can be used to perform edge detection.
The more convolution layers the network has, the more abstract the features its deeper layers can detect.
Activation function
A ReLU activation function is applied after each convolution operation. This function helps the network learn non-linear relationships between the features in the image, making the network more robust at identifying different patterns. It also helps mitigate the vanishing gradient problem.
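ReLU itself is just max(0, x) applied element-wise, which can be sketched in one line of NumPy (the sample values are made up):

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negatives become 0, positives pass through unchanged
    return np.maximum(0, x)

activated = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
print(activated)  # negatives are clipped to 0; 1.5 and 3.0 pass through
```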
Pooling layer
The goal of the pooling layer is to pull the most significant features from the convolved matrix. This is done by applying an aggregation operation that reduces the dimension of the feature map (the convolved matrix), hence reducing the memory used while training the network. Pooling also helps mitigate overfitting.
The most common aggregation functions are:
- Max pooling, which keeps the maximum value within each window of the feature map
- Sum pooling, which takes the sum of the values within each window
- Average pooling, which takes the average of the values within each window
Below is an illustration of one of them, max pooling:

Application of max pooling with a stride of 2 using 2x2 filter
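The max-pooling operation illustrated above can be sketched in NumPy as follows (the 4x4 feature-map values are made up for illustration):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Keep only the largest value in each size x size window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()  # aggregate each window to a single value
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 5, 1],
                 [7, 2, 0, 8],
                 [3, 1, 9, 4]])
pooled = max_pool(fmap, size=2, stride=2)
print(pooled)  # 4x4 input reduced to 2x2: [[6. 5.] [7. 9.]]
```

Swapping `window.max()` for `window.sum()` or `window.mean()` gives sum pooling and average pooling, respectively.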


As the pooling function is applied, the dimension of the feature map becomes smaller.
The output of the last pooling layer is flattened into a one-dimensional vector so that it can be processed by the fully connected layer.
Fully connected layers
These layers form the last part of the convolutional neural network, and their input corresponds to the flattened one-dimensional vector generated by the last pooling layer. ReLU activation functions are applied to them for non-linearity.
Finally, a softmax prediction layer is used to generate probability values for each of the possible output
labels, and the final label predicted is the one with the highest probability score.
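The softmax step can be sketched as follows. The ten logit values standing in for the fully connected layer's output are made up for illustration:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability,
    # then normalize so the outputs sum to 1 (a probability distribution)
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

# Hypothetical output scores for the 10 digit classes 0-9
logits = np.array([1.2, 0.3, 0.1, 2.5, 0.0, 0.4, 0.2, 5.1, 0.6, 0.8])
probs = softmax(logits)
predicted_digit = int(probs.argmax())
print(predicted_digit)  # 7: the class with the highest probability score
```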
PRACTICAL APPLICATIONS OF CNNS
Convolutional Neural Networks have revolutionized the field of computer vision, leading to significant
advancements in many real-world applications. Below are a few examples of how they are applied.
- Image classification: Convolutional neural networks are used for image categorization, where images are assigned to predefined categories. One such use is automatic photo organization on social media platforms.
- Object detection: CNNs are able to identify and locate multiple objects within an image. This capability is crucial in scenarios such as shelf scanning in retail to identify out-of-stock items.
- Facial recognition: this is another major application area of CNNs. For instance, the technology can be embedded into security systems to control access efficiently based on facial features.