
MODULE 3

CONVOLUTIONAL NEURAL NETWORK


• A Convolutional Neural Network (CNN) is a type of Deep Learning neural
network architecture commonly used in Computer Vision.
• Computer vision is a field of Artificial Intelligence that enables a computer
to understand and interpret images and other visual data.
• We use CNNs for image classification.
• CNNs are a special type of ANN that accepts images as input.
Why do we have to use a CNN?
• Grayscale images have pixel values ranging from 0 to 255, i.e., 8-bit pixel
values.
• If the size of the image is NxM, then the size of the input vector will be N*M.
For RGB images, it would be N*M*3, where 3 represents the channels.
• Consider an RGB image of size 30x30. This would require 2,700 input neurons,
that is, 30*30*3 = 2,700.
• An RGB image of size 256x256 would require over 100,000 input neurons, that is,
256*256*3 = 196,608.
• The number of weights and parameters for a 224x224x3 input is very high.
• A single neuron fully connected to this input would have 224*224*3 = 150,528
weights coming into it.
• This would require more computation, memory, and data, as the sketch below
illustrates.
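A rough back-of-the-envelope sketch of why fully connected layers are expensive on raw images. The hidden width of 1000 and the 64-filter 3x3 convolutional layer are illustrative assumptions, not values from the notes:

```python
# Cost of a fully connected first layer on a 224x224 RGB image.
h, w, c = 224, 224, 3
inputs = h * w * c                   # 150,528 input values
hidden = 1000                        # assumed hidden-layer width (illustrative)
fc_weights = inputs * hidden         # ~150 million weights in one dense layer
print(inputs, fc_weights)            # 150528 150528000

# By contrast, a conv layer with 64 filters of size 3x3x3 needs only:
conv_weights = 64 * (3 * 3 * c + 1)  # weights + biases, independent of image size
print(conv_weights)                  # 1792
```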
• In a CNN, each convolutional layer performs convolution on its input.
• A CNN takes its input as an image volume, e.g., a three-channel volume for an
RGB image.
• Basically, an image is taken as input and we apply a kernel/filter on the image
to get the output.
• CNN also enables parameter sharing between the output neurons, which means
that a feature detector (for example, a horizontal edge detector) that's useful in
one part of the image is probably useful in another part of the image.
• Convolutions
• Every output neuron is connected to a small neighborhood in the input through
a weight matrix, also referred to as a kernel or filter.
• We can define multiple kernels for every convolution layer each giving rise to
an output.
• Each filter is moved around the input image.
• The outputs corresponding to each filter are stacked giving rise to an output
volume.


• Padding
• Padded convolution is used when preserving the dimensions of the input matrix
is important to us; it also helps us keep more of the information at the border
of an image.
• We have seen that convolution reduces the size of the feature map.
• To retain the dimensions of the feature map as those of the input map, we pad
the rows and columns with zeros.
• Padding P = (F - 1)/2 preserves the size for stride 1
• F is the size of the kernel matrix
Stride
• Stride refers to the number of pixels the kernel filter moves at each step.
• A stride of 2 means the kernel shifts by 2 pixels before
performing the next convolution operation, skipping a pixel each time.


• In the figure above, the kernel filter slides over the input matrix one pixel
at a time.
• A stride of 2 would move the kernel two pixels at each step before performing
the convolution, as in the image below.


• The output feature map is reduced (roughly 4 times in area) when the stride is
increased from 1 to 2.
• The dimension of the output feature map is (N-F)/S + 1.
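A small sketch of this formula in Python, extended with the padding term P that appears later in these notes, giving Output Size = (N - F + 2P)/S + 1; the example numbers are illustrative:

```python
def conv_output_size(n, f, s=1, p=0):
    """Output size of a convolution: (N - F + 2P) / S + 1."""
    return (n - f + 2 * p) // s + 1

print(conv_output_size(7, 3, s=1))       # 5
print(conv_output_size(7, 3, s=2))       # 3  (larger stride -> smaller map)
print(conv_output_size(7, 3, s=1, p=1))  # 7  ('same' padding, P = (F-1)/2)
```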
• Pooling
• The pooling operation involves sliding a two-dimensional filter over each
channel of the feature map and summarising the features lying within the region
covered by the filter.
• A common CNN model architecture is to have a number of convolution and
pooling layers stacked one after the other.
• Pooling layers are used to reduce the dimensions of the feature maps. Thus, it
reduces the number of parameters to learn and the amount of computation
performed in the network.
• The pooling layer summarises the features present in a region of the feature
map generated by a convolution layer.
• Pooling provides translational invariance by subsampling: reduces the size of
the feature maps. The two commonly used Pooling techniques are max
pooling and average pooling.


• The pooling operation divides a 4x4 matrix into four 2x2 matrices and picks the
value which is the greatest among the four (for max pooling) or the average of
the four (for average pooling).
• This reduces the size of the feature maps which therefore reduces the number
of parameters without missing important information.
• One thing to note here is that the pooling operation reduces the Nx and Ny
values of the input feature map but does not reduce the value of Nc (number
of channels).
• Also, the hyperparameters involved in pooling operation are the filter
dimension, stride, and type of pooling(max or avg).
• There is no parameter for gradient descent to learn.
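A minimal NumPy sketch of 2x2 max and average pooling with stride 2 on a single-channel feature map; the function name and example values are illustrative:

```python
import numpy as np

def pool2x2(fmap, mode="max"):
    """2x2 pooling with stride 2 on a single-channel feature map."""
    h, w = fmap.shape
    # Group the map into non-overlapping 2x2 blocks.
    blocks = fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(fmap, "max"))  # [[ 5.  7.] [13. 15.]]  block maxima
print(pool2x2(fmap, "avg"))  # [[ 2.5  4.5] [10.5 12.5]]  block averages
```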
• Max Pooling
• Max pooling is a pooling operation that selects the maximum element from
the region of the feature map covered by the filter. Thus, the output after the
max-pooling layer would be a feature map containing the most prominent features
of the previous feature map.


• Average Pooling
• Average pooling computes the average of the elements present in the region
of the feature map covered by the filter. Thus, while max pooling gives the most
prominent feature in a particular patch of the feature map, average pooling
gives the average of the features present in a patch.


• Global Pooling
• Global pooling reduces each channel in the feature map to a single value.
Thus, an nh x nw x nc feature map is reduced to 1 x 1 x nc feature map. This
is equivalent to using a filter of dimensions nh x nw i.e. the dimensions of the
feature map.
• Further, it can be either global max pooling or global average pooling.
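A one-line NumPy illustration of global pooling collapsing each channel of an nh x nw x nc feature map to a single value; the shapes are illustrative:

```python
import numpy as np

fmap = np.random.rand(7, 7, 64)   # nh x nw x nc feature map
gap = fmap.mean(axis=(0, 1))      # global average pooling -> shape (64,)
gmp = fmap.max(axis=(0, 1))       # global max pooling     -> shape (64,)
print(gap.shape, gmp.shape)       # (64,) (64,)
```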
• In convolutional neural networks (CNNs), the pooling layer is a common type
of layer that is typically added after convolutional layers. The pooling layer is
used to reduce the spatial dimensions (i.e., the width and height) of the feature
maps, while preserving the depth (i.e., the number of channels).
• The pooling layer works by dividing the input feature map into a set of non-
overlapping regions, called pooling regions. Each pooling region is then
transformed into a single output value, which represents the presence of a
particular feature in that region. The most common types of pooling
operations are max pooling and average pooling.
• In max pooling, the output value for each pooling region is simply the
maximum value of the input values within that region. This has the effect of
preserving the most salient features in each pooling region, while discarding
less relevant information. Max pooling is often used in CNNs for object
recognition tasks, as it helps to identify the most distinctive features of an
object, such as its edges and corners.
• In average pooling, the output value for each pooling region is the average of
the input values within that region. This has the effect of preserving more
information than max pooling, but may also dilute the most salient features.
Average pooling is often used in CNNs for tasks such as image segmentation
and object detection, where a more fine-grained representation of the input is
required.
• Advantages of Pooling Layer:
• Dimensionality reduction: The main advantage of pooling layers is that they
help in reducing the spatial dimensions of the feature maps. This reduces the
computational cost and also helps in avoiding overfitting by reducing the
number of parameters in the model.
• Translation invariance: Pooling layers are also useful in achieving
translation invariance in the feature maps. This means that the position of an
object in the image does not affect the classification result, as the same
features are detected regardless of the position of the object.
• Feature selection: Pooling layers can also help in selecting the most
important features from the input, as max pooling selects the most salient
features and average pooling preserves more information.
Output Feature Map
• The size of the output feature map or volume depends on:
• Size of the input feature map
• Kernel size (Kw, Kh)
• Zero padding
• Stride (Sw, Sh)


CNN ARCHITECTURE
• Convolutional Neural Network consists of multiple layers like the input layer,
Convolutional layer, Pooling layer, and fully connected layers.


• CNN is very useful as it minimises human effort by automatically detecting
the features. For example, for apples and mangoes, it would automatically
detect the distinct features of each class on its own.
• There are two main parts to a CNN architecture
• A convolution tool that separates and identifies the various features of the
image for analysis, in a process called Feature Extraction.
• The feature extraction network consists of many pairs of convolutional and
pooling layers.
• A fully connected layer that utilizes the output from the convolution process
and predicts the class of the image based on the features extracted in previous
stages.
• This CNN model of feature extraction aims to reduce the number of features
present in a dataset. It creates new features which summarise the existing
features contained in the original set of features. There are many CNN layers.


• There are three types of layers that make up the CNN which are the
convolutional layers, pooling layers, and fully-connected (FC) layers. When
these layers are stacked, a CNN architecture will be formed. In addition to
these three layers, there are two more important parameters which are the
dropout layer and the activation function.
• 1. Convolutional Layer
• This layer is the first layer that is used to extract the various features from the
input images. In this layer, the mathematical operation of convolution is
performed between the input image and a filter of a particular size MxM. By
sliding the filter over the input image, the dot product is taken between the
filter and the parts of the input image with respect to the size of the filter
(MxM).
• The output is termed as the Feature map which gives us information about the
image such as the corners and edges. Later, this feature map is fed to other
layers to learn several other features of the input image.
• The convolution layer in a CNN passes the result to the next layer after
applying the convolution operation to the input.
• 2. Pooling Layer
• In most cases, a Convolutional Layer is followed by a Pooling Layer. The
primary aim of this layer is to decrease the size of the convolved feature map
to reduce the computational costs.
• Depending upon the method used, there are several types of Pooling operations.
• It basically summarizes the features generated by a convolution layer.
• In Max Pooling, the largest element is taken from the feature map. Average
Pooling calculates the average of the elements in a predefined sized image
section. The total sum of the elements in the predefined section is computed
in Sum Pooling. The Pooling Layer usually serves as a bridge between the
Convolutional Layer and the FC Layer.
• This CNN model generalises the features extracted by the convolution layer,
and helps the networks to recognise the features independently.
• With the help of this, the computations are also reduced in a network.
• 3. Fully Connected Layer
• The Fully Connected (FC) layer consists of the weights and biases along with
the neurons and is used to connect the neurons between two different layers.
These layers are usually placed before the output layer and form the last few
layers of a CNN Architecture.
• In this, the feature maps from the previous layers are flattened and fed to the
FC layer. The flattened vector then passes through a few more FC layers, where
the mathematical operations usually take place. In this stage, the
classification process begins. Two fully connected layers are used because they
perform better than a single connected layer. These layers in a CNN reduce the
need for human supervision.
• 4. Dropout
• Usually, when all the features are connected to the FC layer, it can cause
overfitting on the training dataset. Overfitting occurs when a particular model
works so well on the training data that it causes a negative impact on the
model's performance when used on new data.
• To overcome this problem, a dropout layer is utilised wherein a few neurons
are dropped from the neural network during training process resulting in
reduced size of the model. On passing a dropout of 0.3, 30% of the nodes are
dropped out randomly from the neural network.
• Dropout results in improving the performance of a machine learning model as
it prevents overfitting by making the network simpler. It drops neurons from
the neural networks during training.
• 5. Activation Functions
• Finally, one of the most important parameters of the CNN model is the
activation function.
• They are used to learn and approximate any kind of continuous and complex
relationship between variables of the network.
• In simple words, it decides which information of the model should fire in the
forward direction and which ones should not.
• It adds non-linearity to the network.
• There are several commonly used activation functions such as the ReLU,
Softmax, tanH and the Sigmoid functions.
• Each of these functions have a specific usage.
• For a binary classification CNN model, the sigmoid function is preferred, and
for multi-class classification, softmax is generally used.
• In simple terms, activation functions in a CNN model determine whether a
neuron should be activated or not.
• They decide whether an input to a neuron is important for the prediction,
using mathematical operations.
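A minimal sketch tying the five pieces (convolution, pooling, fully connected layers, dropout, activation) together, assuming 32x32 RGB inputs and 10 classes; the layer sizes are illustrative, not from the notes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # activation function
    nn.MaxPool2d(2),                             # pooling layer -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # -> 8x8
    nn.Flatten(),                                # flatten for the FC layers
    nn.Linear(32 * 8 * 8, 128),                  # fully connected layer
    nn.ReLU(),
    nn.Dropout(0.3),                             # drops 30% of nodes in training
    nn.Linear(128, 10),                          # output logits; softmax is
)                                                # usually applied in the loss
print(model(torch.randn(1, 3, 32, 32)).shape)    # torch.Size([1, 10])
```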

LeNet-5 CNN Architecture


• In 1998, the LeNet-5 architecture was introduced in a research paper titled
“Gradient-Based Learning Applied to Document Recognition” by Yann
LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. It is one of the
earliest and most basic CNN architectures.
• It consists of 7 layers. The first layer consists of an input image with
dimensions of 32×32. It is convolved with 6 filters of size 5×5 resulting in
dimension of 28x28x6. The second layer is a pooling operation with filter
size 2×2 and a stride of 2. Hence the resulting image dimension will be
14x14x6.
• Similarly, the third layer also involves a convolution operation with 16
filters of size 5×5 followed by a fourth pooling layer with similar filter size of
2×2 and stride of 2. Thus, the resulting image dimension will be reduced to
5x5x16.
• Once the image dimension is reduced, the fifth layer is a fully connected
convolutional layer with 120 filters each of size 5×5. In this layer, each of the
120 units in this layer will be connected to the 400 (5x5x16) units from the
previous layers. The sixth layer is also a fully connected layer with 84 units.
• The final seventh layer will be a softmax output layer with ‘n’ possible classes
depending upon the number of classes in the dataset.
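A sketch of LeNet-5 in PyTorch following the description above (ReLU and average pooling are used here for simplicity; the original paper used tanh-like activations and subsampling layers):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # 32x32 -> 28x28x6
            nn.ReLU(),
            nn.AvgPool2d(2, stride=2),         # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),   # -> 10x10x16
            nn.ReLU(),
            nn.AvgPool2d(2, stride=2),         # -> 5x5x16 = 400 units
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(400, 120),               # fifth layer: 120 units
            nn.ReLU(),
            nn.Linear(120, 84),                # sixth layer: 84 units
            nn.ReLU(),
            nn.Linear(84, n_classes),          # seventh layer: softmax output
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```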
• Advantages of Convolutional Neural Networks (CNNs):
• Good at detecting patterns and features in images, videos, and audio signals.
• Robust to translation, rotation, and scaling of the input.
• End-to-end training, no need for manual feature extraction.
• Can handle large amounts of data and achieve high accuracy.
• Disadvantages of Convolutional Neural Networks (CNNs):
• Computationally expensive to train and require a lot of memory.
• Can be prone to overfitting if not enough data or proper regularization is used.

PRE-TRAINED CONVOLUTIONAL ARCHITECTURES


• ALEXNET
AlexNet comprises 8 layers, of which 5 are convolutional and 3 are fully
connected. Several more layers were stacked onto the LeNet-5 design, forming
AlexNet as demonstrated in the figure. This architecture was the first to use
the ReLU activation function and utilized Dropout layers. The architecture uses
about 60 million parameters.
• Advantages:
1. The model performs classification of images efficiently.
2. The computation performed is fast.
3. More computation and memory efficiency.
4. It is robust.
• Disadvantages:
1. It is difficult to apply to high-resolution images.
• The first two fully-connected layers have 4096 nodes each. After the last
max-pooling mentioned above, we have a total of 6*6*256, i.e., 9,216 nodes or
features, and each of these nodes is connected to each of the nodes in the first
fully-connected layer. So the number of connections in this case is
9216*4096, as computed below.
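A quick arithmetic check of that connection count:

```python
features = 6 * 6 * 256   # 9,216 nodes after the last max-pooling layer
fc1 = 4096               # width of the first fully connected layer
print(features * fc1)    # 37748736 -- about 37.7 million connections
```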

ZFNET
• ZFNet came into the limelight by achieving a significant improvement over AlexNet.
• The ZFNet paper is a golden gem that provides the starting point for many
concepts such as deep feature visualization, feature invariance, feature
evolution, and feature importance.


• Our input is 224x224x3 images.
• Next, 96 convolutions of 7x7 with a stride of 2 are performed, followed
by ReLU activation, 3x3 max pooling with stride 2 and local contrast
normalization.
• Following it are 256 filters of 5x5 each, which are then again local
contrast normalized and pooled.
• The third and fourth layers are identical with 384 kernels of 3x3 each.
• The fifth layer has 256 filters of 3x3, followed by 3x3 max pooling with
stride 2 and local contrast normalization.
• The sixth and seventh layers house 4096 dense units each.
• Finally, we feed into a Dense layer of 1000 neurons i.e. the number of
classes in ImageNet.
• The local normalization tends to uniformize the mean and variance of an
image around a local neighborhood. This is especially useful for correcting
uneven illumination or shading artifacts.
• Local Contrast Normalization is a type of normalization that performs local
subtraction and division normalizations, enforcing a sort of local
competition between adjacent features in a feature map, and between
features at the same spatial location in different feature maps.

• VGG
• VGG was developed to increase the depth of such CNNs in order to increase
the model performance.
• VGG stands for Visual Geometry Group; it is a standard deep Convolutional
Neural Network (CNN) architecture with multiple layers. The "deep" refers
to the number of layers, with VGG-16 and VGG-19 consisting of 16 and 19
weight layers respectively.
• The VGG16 model was developed by the Visual Geometry Group (VGG) and
comprises 13 convolutional and 3 fully connected layers, carrying over the ReLU
custom from AlexNet. More and more layers are stacked on the AlexNet design to
get the VGG model.


• VGG 19
• The concept of the VGG19 model (also VGGNet-19) is the same as the
VGG16 except that it has 19 weight layers.
• The "16" and "19" stand for the number of weight layers in the model
(convolutional and fully connected layers).
• This means that VGG19 has three more convolutional layers than VGG16.
• It is a variant of the VGG model that includes 16 convolutional layers and
3 fully connected layers, along with 5 MaxPool layers and 1 SoftMax output
layer.
• Advantages:
• The model is efficient for transfer learning and small
classification tasks.
• It is more robust.
• Disadvantages:
• Due to its large number of parameters, i.e., about 138 million, it
increases the computation cost.
• The model isn't ideal for very deep networks: the deeper it goes, the more
inclined it is to the vanishing gradients problem.

• Input: The VGGNet takes in an image input size of 224×224.
• Convolutional Layers: VGG’s convolutional layers leverage a minimal
receptive field, i.e., 3×3, the smallest possible size that still captures
up/down and left/right.
• Hidden Layers: All the hidden layers in the VGG network use ReLU. VGG
does not usually leverage Local Response Normalization (LRN) as it
increases memory consumption and training time. Moreover, it makes no
improvements to overall accuracy.
• Fully-Connected Layers: The VGGNet has three fully connected layers. Out
of the three layers, the first two have 4096 channels each, and the third has
1000 channels, 1 for each class.
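A hedged transfer-learning sketch with a pretrained VGG16 via torchvision; the weight-loading argument varies across torchvision versions, and the 5-class head is an assumed example:

```python
import torch.nn as nn
import torchvision.models as models

vgg = models.vgg16(weights="IMAGENET1K_V1")       # ImageNet-pretrained weights

for p in vgg.features.parameters():               # freeze the convolutional base
    p.requires_grad = False

num_classes = 5                                   # assumed target-task classes
vgg.classifier[6] = nn.Linear(4096, num_classes)  # replace the 1000-way head
```
Only the new head (and any layers left unfrozen) is then trained on the small target dataset.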

ResNet-50
• ResNet stands for Residual Network and is a specific type of convolutional
neural network (CNN) introduced in 2015.
• ResNet-50 is a 50-layer convolutional neural network (48 convolutional
layers, one MaxPool layer, and one average pool layer).
• Residual neural networks are a type of artificial neural network (ANN) that
forms networks by stacking residual blocks.
• The ResNet architecture was developed in response to a surprising
observation in deep learning research: adding more layers to a neural
network was not always improving the results.
• This was unexpected because adding a layer to a network should allow it to
learn at least what the previous network learned, plus additional information.
• To address this issue, the ResNet team, led by Kaiming He, developed a
novel architecture that incorporated skip connections.
• These connections allowed the preservation of information from earlier
layers, which helped the network learn better representations of the input
data. With the ResNet architecture, they were able to train networks with as
many as 152 layers.
• ResNet-50 Architecture
• ResNet-50 consists of 50 layers that are divided into 5 blocks, each
containing a set of residual blocks. The residual blocks allow for the
preservation of information from earlier layers, which helps the network to
learn better representations of the input data.

• 1. Convolutional Layers
• The first layer of the network is a convolutional layer that performs
convolution on the input image. This is followed by a max-pooling layer that
downsamples the output of the convolutional layer. The output of the max-
pooling layer is then passed through a series of residual blocks.

2. Residual Blocks
• Each residual block consists of two convolutional layers, each followed by a
batch normalization layer and a rectified linear unit (ReLU) activation
function. The output of the second convolutional layer is then added to the
input of the residual block, which is then passed through another ReLU
activation function. The output of the residual block is then passed on to the
next block.

• 3. Fully Connected Layer
• The final layer of the network is a fully connected layer that takes the output
of the last residual block and maps it to the output classes. The number of
neurons in the fully connected layer is equal to the number of output classes.

Concept of Skip Connection
• Skip connections, also known as identity connections, are a key feature of
ResNet-50.
• They allow for the preservation of information from earlier layers, which
helps the network to learn better representations of the input data.
• Skip connections are implemented by adding the output of an earlier layer to
the output of a later layer.
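A minimal residual-block sketch in PyTorch following the description above (two conv + batch-norm layers, with the block input added back before the final ReLU); the channel count is illustrative:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # skip connection: add the block input back

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)    # shape preserved: torch.Size([1, 64, 56, 56])
```
(The actual ResNet-50 uses three-layer "bottleneck" blocks; this two-layer form matches the description in the text.)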


• Advantages:
• The training process is fast
• Minimized vanishing gradient problem
• It can train deeper networks
• ResNet-50 has several advantages over other networks. One of the main
advantages is its ability to train very deep networks with hundreds of layers.
• This is made possible by the use of residual blocks and skip connections,
which allow for the preservation of information from earlier layers.
• Another advantage of ResNet-50 is its ability to achieve state-of-the-art
results in a wide range of image-related tasks such as object detection, image
classification, and image segmentation.
• Disadvantage:
• The training takes a large amount of time, which can make it infeasible and
impractical for some real-world applications.

CONVOLUTION OPERATION
• Consider an input image; we calculate the value of each output pixel by
taking the weighted sum of the pixels around it.
• The matrix of weights is referred to as the Kernel or Filter.


• Here we calculate the value of the circled pixel by considering it and 3
neighbors around it; assume that the weights w1, w2, w3, w4 are associated with
these 4 pixels respectively.
• In the above case, we have a kernel of size 2X2.


• Now we place the 2X2 filter over the first 2X2 portion of the image and take
the weighted sum and that would give the new value of the first pixel.


• The output of this operation would be: (aw + bx + ey + fz)
• Then we move the filter horizontally by one and place it over the next 2 X 2
portion of the input; in this case pixels of interest would be b, c, f, g and we
compute the output using the same technique and we would get:

• And then again we move the kernel/filter by 1 in the horizontal direction and
take the weighted sum.
• So, after this, the output from the first layer would look like:


• Then we move the kernel down by 1 in the vertical direction, calculate the
output, and move the kernel in the horizontal direction again. In general, we
move the kernel like this: first, we start at the top-left portion of the image
and move the filter horizontally until the row is covered completely; then we
move the filter down in the vertical direction (relative to the top-left
portion of the image), stride it horizontally through the entire row, and
continue. In essence, we move the kernel left to right, top to bottom.
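A direct NumPy implementation of this left-to-right, top-to-bottom sliding-window operation, written as the weighted sum exactly as in the walkthrough above; the example values are illustrative:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel left to right, top to bottom, taking a weighted sum at each stop."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.array([[1., 2., 3.],
                [4., 5., 6.],
                [7., 8., 9.]])
k = np.array([[1., 0.],
              [0., 1.]])           # a 2x2 kernel, as in the example above
print(convolve2d_valid(img, k))   # [[ 6.  8.] [12. 14.]]
```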
• ******Note*****

• Explain padding, pooling, stride.


Motivation for using convolution networks
Convolution leverages three important ideas to improve ML systems:
1. Sparse interactions
2. Parameter sharing
3. Equivariant representations
Convolution also allows for working with inputs of variable size
SPARSE INTERACTION
• Traditional neural network layers use matrix multiplication by a matrix of
parameters describing the interaction between each input and output unit.
• This means that every output unit interacts with every input unit.
• However, convolutional neural networks have sparse interactions.
• This is achieved by making the kernel smaller than the input; e.g., an image
can have thousands or millions of pixels, but while processing it with a kernel
we can detect meaningful information spanning only tens or hundreds of pixels.
• This means that we need to store fewer parameters, which not only reduces the
memory requirement of the model but also improves its statistical efficiency.
SHARED PARAMETERS
• If computing one feature at a spatial point (x1, y1) is useful then it should
also be useful at some other spatial point say (x2, y2).
• It means that for a single two-dimensional slice i.e., for creating one
activation map, neurons are constrained to use the same set of weights.
• In a traditional neural network, each element of the weight matrix is used
once and then never revisited, while a convolutional network has shared
parameters, i.e., for getting the output, the weights applied to one input are
the same as the weights applied elsewhere.
EQUIVARIANT REPRESENTATIONS
• Due to parameter sharing, the layers of a convolutional neural network have
a property of equivariance to translation.
• This means that if we shift the input, the output shifts in the same way, as
the sketch below demonstrates.
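A small NumPy/SciPy check of translation equivariance. Circular ("wrap") padding is used so the equality is exact; with zero padding it holds everywhere except near the borders. The array sizes are illustrative:

```python
import numpy as np
from scipy.signal import correlate2d

x = np.random.rand(8, 8)   # input "image"
k = np.random.rand(3, 3)   # kernel

def conv(img):
    # same-size cross-correlation with circular boundary handling
    return correlate2d(img, k, mode="same", boundary="wrap")

shift_then_conv = conv(np.roll(x, 2, axis=1))  # shift the input, then convolve
conv_then_shift = np.roll(conv(x), 2, axis=1)  # convolve, then shift the output
print(np.allclose(shift_then_conv, conv_then_shift))  # True
```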
IMP: Q. What happens if the stride of convolution layer increases? What
can be the maximum stride? Justify
• Stride is a component of convolutional neural networks, i.e., neural
networks tuned for processing image and video data. Stride is a
parameter of the neural network's filter that modifies the amount of
movement over the image or video. For example, if a neural network's stride
is set to 1, the filter will move one pixel, or unit, at a time. The size of the
filter affects the encoded output volume, so stride is usually set to a whole
integer, rather than a fraction or decimal.
• Naturally, as the stride, or movement, is increased, the resulting output will
be smaller.
• The choice of stride is also important, but it affects the tensor shape after the
convolution, hence the whole network. The general rule is to use stride=1 in
usual convolutions and preserve the spatial size with padding, and use
stride=2 when you want to downsample the image.
• When the stride of a convolutional layer increases, it means that the filter or
kernel moves a larger distance with each step during the convolution
operation. This results in a reduction in the spatial dimensions of the output
feature map. The maximum stride you can use depends on the dimensions of
the input data and the size of the filter.
• The formula to calculate the output size of a convolutional layer is:

• Output Size = (N - F + 2P)/S + 1, where N is the input size, F the filter
size, P the padding, and S the stride.

• Increasing the stride value reduces the output size. The maximum stride you
can use is limited by the filter size and the input size. If the stride is too
large, the filter might not effectively cover the input data, causing
information loss and potentially making the network unable to learn
meaningful features.

• Typically, a common choice for stride is 1, which means the filter moves one
pixel at a time. Larger strides like 2 or 3 are often used in specific situations
to downsample the feature map and reduce computational complexity in
deeper layers of convolutional neural networks. However, the choice of
stride should be made carefully based on the specific task and network
architecture to balance information preservation and computational
efficiency.

Tensor
• Tensors represent deep learning data. They are multidimensional arrays,
used to store multiple dimensions of a dataset. Each dimension is called a
feature. For example, a cube storing data across X, Y, and Z axes is
represented as a 3-dimensional tensor.
• Tensors are multi-dimensional arrays with a uniform type used to represent
different features of the data.

Kernel Flipping
• In a CNN, each filter is learned to detect a specific feature in the input
image. By flipping the filter, we ensure that the filter learns to detect the
same feature regardless of its position in the input image.
• Kernel flipping is a term used in the context of convolutional neural
networks (CNNs) and image processing. It refers to the operation of rotating
a convolutional kernel (also known as a filter or a feature detector) by 180
degrees before applying it to an input image. This operation is also called
kernel rotation.
• In a CNN, convolution involves sliding a kernel over an input image to
perform feature extraction. By flipping or rotating the kernel, you can
achieve the same convolution operation as with the original kernel but in the
opposite direction. Kernel flipping can be useful in certain situations, such as
when designing neural network architectures or for specific image
processing tasks. It allows the network to capture different patterns or
features in the input data.
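A small SciPy check that true convolution equals cross-correlation with a 180-degree-rotated (flipped) kernel, which is the flipping operation described above:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

img = np.random.rand(5, 5)
k = np.random.rand(3, 3)

true_conv = convolve2d(img, k, mode="valid")            # flips the kernel internally
corr_flip = correlate2d(img, np.flip(k), mode="valid")  # flip both axes = 180-degree rotation
print(np.allclose(true_conv, corr_flip))                # True
```
(Most deep learning libraries actually implement cross-correlation and call it convolution; since the kernels are learned, the distinction does not matter in practice.)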
Downsampling
• A downsampling layer helps to reduce the dimensionality of the features.
This helps save computation. Average pooling, max pooling, and global average
pooling are some examples of downsampling layers. An alternative would be
to use strides in the convolution layer to downsample, as sketched below.
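A brief PyTorch comparison of the two downsampling routes; the tensor sizes are illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)                                     # (batch, channels, H, W)

pooled = nn.MaxPool2d(kernel_size=2, stride=2)(x)                 # -> (1, 8, 16, 16)
strided = nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1)(x)  # -> (1, 8, 16, 16)
print(pooled.shape, strided.shape)
```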

VARIANTS OF CONVOLUTION FUNCTIONS


STRUCTURED OUTPUT
• Convolutional networks can be trained to output high-dimensional structured
output rather than just a classification score.
• To produce an output map of the same size as the input map, only same-padded
convolutions can be stacked.
• The output of the first labelling stage can be refined successively by another
convolutional model.
• If the models use tied parameters, this gives rise to a type of recursive
model.



DATATYPES
• The data used with a convolutional network usually consist of several
channels, each channel being the observation of a different quantity at some
point in space or time.
• When output is variable sized, no extra design change needs to be made.
• When output requires fixed size (classification), a pooling stage with kernel
size proportional to input size needs to be used.
For CNNs, the data used usually has several channels (single channel or
multichannel) with different dimensionalities (1-D, 2-D, or 3-D). Each of these
channels represent an observation of a different quantity at some point in space
or time.
2D CONVOLUTION
2D convolution is a fundamental operation in image
processing and computer vision. It involves convolving a 2D input image
matrix with a 2D kernel (also known as a filter or mask) to produce a 2D output
image. The convolution operation is used for various tasks such as edge
detection, blurring, sharpening, and feature extraction in images.
There are several efficient convolution algorithms used in signal processing and
deep learning. Some of the notable ones include:
• Direct Convolution: This is the most basic method, where the convolution
operation is computed by directly summing the element-wise products of the
filter and the input.
• Fast Fourier Transform (FFT)-based Convolution: This approach uses the FFT to
perform convolution in the frequency domain, which can be much faster for large
inputs and filters.
• Winograd Convolution: Winograd algorithms reduce the number of
multiplications required for convolution by transforming the data and filters.
• Strassen Algorithm: This is an efficient method that uses matrix
multiplication techniques to compute convolutions.
• Fast Convolution Algorithms for Deep Learning: Specialized algorithms like
the Fast Convolutional Network (FCN) and Fast Fourier Transform Convolution
(FFT Conv) are designed for deep learning applications, taking advantage of the
structure of convolutional neural networks.
• cuDNN and cuBLAS: These are libraries developed by NVIDIA for GPU-accelerated
deep learning. They provide highly efficient implementations of convolution
operations on NVIDIA GPUs.
• Depthwise Separable Convolution: Commonly used in mobile and embedded
applications, this method factorizes the standard convolution into depthwise
and pointwise convolutions, reducing computation.
• Sparse Convolution Algorithms: These algorithms focus on reducing computation
for sparse inputs and filters, which is common in some applications like
natural language processing.
The choice of algorithm depends on the specific application and the hardware
available. In deep learning, for example, the choice of algorithm is often
influenced by the deep learning framework and the hardware (CPU or GPU)
being used.
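A short SciPy sketch comparing direct and FFT-based 2D convolution; both give the same result, but the FFT route is typically much faster for large kernels (the sizes here are illustrative):

```python
import numpy as np
from scipy.signal import convolve2d, fftconvolve

img = np.random.rand(128, 128)
k = np.random.rand(7, 7)

direct = convolve2d(img, k, mode="same")  # direct spatial-domain convolution
fast = fftconvolve(img, k, mode="same")   # frequency-domain (FFT) convolution
print(np.allclose(direct, fast))          # True
```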

Applications of Convolutional Neural Networks


Convolutional Neural Networks (CNNs) excel in image and video analysis
tasks due to their hierarchical feature extraction. They find applications in
image recognition, object detection, facial recognition, medical image analysis,
self-driving cars, and more. CNNs leverage their convolutional and pooling
layers to automatically learn relevant features, making them pivotal in visual
data processing tasks.
Medical Image Computing – Predictive Analytics, Healthcare Data Science
The most fascinating image recognition CNN use case is medical image
computing.
Medical imaging involves a whole lot of further data analysis that arises from
the initial image recognition.
CNN medical image classification detects anomalies in X-ray and MRI images
with better accuracy than the human eye.
These systems can display the series of photos as well as the differences
between them. This feature lays the groundwork for future predictive analytics.
Medical image classification is based on massive datasets, such as public
health records, which serve as a training basis for the algorithms, together
with patients' confidential data and test results.
These work together to create an analytical platform that monitors the current
status of the patient and forecasts outcomes.
Health Risk Assessment Using Predictive Analytics
In healthcare, saving lives is a top priority. And it is always advantageous to
have the ability to predict the future. Because when it comes to patient care, you
must be prepared for anything. The health risk assessment is an excellent
demonstration.
Convolutional Neural Network Predictive Analytics is used in this field.
Here’s how CNN Health Risk Assessment works:
• CNN uses a grid topology approach to process data, which is a set of spatial
correlations between data points.
• The grid is two-dimensional in the case of images.
• The grid is one-dimensional in the case of time-series and textual data.
• The convolution algorithm is then used to identify certain aspects of the
input, take into account the variations of the input, and determine variable
interactions.
Drug Discovery Using Predictive Analytics
The problem is that drug discovery and development is a time-consuming and
costly process. In drug discovery, scalability and cost-effectiveness are critical.
The process of developing new drugs lends itself well to the implementation of
neural networks. During the development of a new drug, there is a large amount
of data to consider.
The following stages are involved in the drug discovery process:
• Hit discovery is a clustering and classification problem involving the
analysis of observed medical effects; machine learning anomaly detection may
be useful here.
• The algorithm searches the compound database for new activities that can be
used for specific purposes.
• Then, using the Hit to Lead process, the results are narrowed down to the
most relevant; that is what dimensionality reduction and regression are all
about.
• Then there is Lead Optimization, which is the process of combining and
testing lead compounds to find the best approaches to them.
• These stages involve the examination of the organism’s chemical and physical
effects.
