Unit 4
Convolutional Neural Networks (CNNs) are the standard neural network architecture
used for prediction when the input observations are images, which is the case in a wide range
of neural network applications. A CNN typically consists of three types of layers:
convolutional layers, pooling layers, and fully connected layers. The main
advantage of CNNs is that they learn the important features of an image automatically
from the training data, without hand-engineered feature extraction.
The above represents the neural networks that start with n features and then learn
somewhere between √n and n “combinations” of these features to make predictions.
Having our neural network learn combinations of all of the input features, i.e.,
combinations of all of the pixels in the input image, turns out to be very inefficient, since it
ignores the insight described in the prior section: that most of the interesting combinations of
features in images occur in these small patches.
Nevertheless, previously it was at least extremely easy to compute new features that
were combinations of all the input features: if we had f input features and wanted to compute
n new features, we could simply multiply the ndarray containing our input features by an
f × n matrix. The convolution operation, by contrast, computes many combinations of the
pixels from local patches of the input image.
CONVOLUTIONAL LAYERS
A Convolutional Neural Network (CNN) is an extension of the artificial neural
network (ANN) that is predominantly used to extract features from grid-like data,
for example visual datasets such as images or videos, where spatial patterns play an
important role. A CNN consists of multiple layers: an input layer,
convolutional layers, pooling layers, and fully connected layers.
A CNN takes an image as input, processes it, and classifies it under a certain category
such as dog, cat, lion, or tiger. The computer sees an image as an array of pixels whose
size depends on the resolution of the image: h * w * d,
where h = height, w = width, and d = depth (the number of channels). For example, a 6 * 6 RGB
image is a 6 * 6 * 3 array, and a 4 * 4 grayscale image is a 4 * 4 * 1 array.
In a CNN, each input image passes through a sequence of convolution layers with
filters (also known as kernels), along with pooling and fully connected layers. After that, we
apply the softmax function to classify the object with probability values between 0 and 1.
Convolution Layer:
The convolution layer is the first layer to extract features from an input image. By
learning image features from small squares of input data, the convolutional layer preserves
the spatial relationship between pixels. Convolution is a mathematical operation that takes
two inputs: an image matrix and a kernel (filter).
The dimension of the image matrix is $h \times w \times d$.
The dimension of the filter is $f_h \times f_w \times d$.
The dimension of the output is $(h - f_h + 1) \times (w - f_w + 1) \times 1$.
Filters / Kernels:
A filter provides a measure of how closely a patch or region of the input resembles a
feature. A feature may be any prominent aspect: a vertical edge, a horizontal edge,
an arch, a diagonal, etc.
A filter acts as a single template or pattern, which, when convolved across the input,
finds similarities between the stored template & different locations/regions in the
input image.
Let us consider an example of detecting a vertical edge in the input image.
Each element of the 4×4 output matrix is computed from a 3×3 patch of the input, i.e.
exactly three columns and three rows
(the coloured boxes show the output of the filter as it moves over the input image).
The values in the output matrix represent the change in intensity along the
horizontal direction across the corresponding columns of the input image.
The output has the value 0 in the first and last columns, which means there is no change
in intensity in the first three columns and the last three columns of the input image.
On the other hand, the output is 30 in the 2nd and 3rd columns, indicating a change in
intensity across the corresponding columns of the input image.
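The vertical-edge example can be reproduced in a few lines of plain Python. This is a minimal sketch, not a definitive implementation: the 6×6 image (a bright left half with intensity 10 and a dark right half with intensity 0), the 3×3 filter values, and the helper name `conv2d_valid` are assumptions chosen to match the 4×4 output described above. As in most CNN libraries, the filter is applied without flipping (cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    h, w = len(image), len(image[0])
    fh, fw = len(kernel), len(kernel[0])
    output = []
    for i in range(h - fh + 1):          # top row of the current patch
        row = []
        for j in range(w - fw + 1):      # left column of the current patch
            # Element-wise product of the patch and the filter, then sum
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(fh)
                for b in range(fw)
            )
            row.append(total)
        output.append(row)
    return output


# Assumed 6x6 input: bright left half (10), dark right half (0)
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Assumed vertical-edge filter: positive left column, negative right column
vertical_edge = [[1, 0, -1] for _ in range(3)]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)   # each row: [0, 30, 30, 0]
```

The output size follows the rule (h - f_h + 1) × (w - f_w + 1): a 6×6 input and a 3×3 filter give a 4×4 feature map, and the non-zero columns mark where the intensity changes.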
Let's start by considering a 5*5 image whose pixel values are 0 or 1, and a 3*3 filter
matrix:
The result of convolving the 5*5 image matrix with the 3*3 filter matrix is called
a "feature map", and it is the output of the layer.
Convolving an image with different filters can perform operations such as blurring,
sharpening, and edge detection.
Strides:
During convolution, the filter slides from left to right and from top to bottom until it
has passed over the entire input image. The stride S is the number of pixels the filter moves
at each step. With S = 1 the filter moves one pixel at a time; when we
want to downsample the input image and end up with a smaller output, we set S > 1.
Padding:
In a convolutional layer, the pixels located on the corners and the
edges are used much less than those in the middle. A simple and powerful solution to this
problem is padding, which adds rows and columns of zeros around the input image. If we apply
padding P to an input image of size W×H, the padded image has dimensions (W+2P)×(H+2P).
Below we can see an example image before and after padding with P=2, where the dimension
is increased from 5×5 to 9×9.
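To make the effect of stride and padding concrete, the sketch below pads an image with zeros and computes the output size of a convolution along one dimension. The formula floor((n + 2P - f) / S) + 1 is the standard one for an input of size n, a filter of size f, padding P, and stride S; the helper names `zero_pad` and `conv_output_size` are illustrative and not taken from any library.

```python
def zero_pad(image, p):
    """Return a copy of `image` with `p` rows/columns of zeros added on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]              # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)      # left/right border
    padded += [[0] * width for _ in range(p)]             # bottom border
    return padded


def conv_output_size(n, f, stride=1, padding=0):
    """Output size along one dimension: floor((n + 2P - f) / S) + 1."""
    return (n + 2 * padding - f) // stride + 1


image = [[1] * 5 for _ in range(5)]          # 5x5 example image
padded = zero_pad(image, 2)
print(len(padded), len(padded[0]))           # 9 9

print(conv_output_size(5, 3))                        # 3: no padding, stride 1
print(conv_output_size(5, 3, stride=2))              # 2: stride 2 downsamples
print(conv_output_size(5, 3, stride=1, padding=1))   # 5: padding keeps the size
```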
Pooling Layer:
The pooling layer reduces the spatial size of the feature maps, and with it the number of
parameters and computations in the following layers, which matters when the images are
large. Pooling is a "downscaling" of
the feature maps obtained from the previous layers. It can be compared to shrinking an image to
reduce its pixel density. Spatial pooling is also called downsampling or subsampling; it
reduces the dimensionality of each map but retains the important information. Two
types of pooling are commonly used:
1. Max pooling: Max pooling is a pooling operation that selects the maximum element
from the region of the feature map covered by the filter. Thus, the output after max-
pooling layer would be a feature map containing the most prominent features of the
previous feature map.
2. Average pooling: Average pooling computes the average of the elements present in
the region of feature map covered by the filter. Thus, while max pooling gives the
most prominent feature in a particular patch of the feature map, average pooling gives
the average of features present in a patch.
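Both pooling types can be sketched with one small function. The 4×4 feature map below is hypothetical, and the 2×2 window with stride 2 is an assumed (though common) setting; `pool2d` is an illustrative helper name.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling with a square window."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            # Collect the values covered by the window at position (i, j)
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 2.25], [3.5, 5.0]]
```

Each 2×2 patch is reduced to a single number, so the 4×4 map becomes 2×2: max pooling keeps the strongest response in the patch, while average pooling smooths over it.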
Fully Connected Layer / Dense Layer:
In the fully connected layer, the output of the preceding layers is
flattened into a vector and passed through one or more dense layers, which transform it
into one output value per class.
In the above diagram, the feature map matrix will be converted into the vector such as
x1, x2, x3... xn with the help of fully connected layers. We will combine features to create a
model and apply the activation function such as softmax or sigmoid to classify the outputs as
a car, dog, truck, etc.
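The flattening step and the softmax activation can be illustrated as follows. This is a sketch under stated assumptions: the feature maps and the raw class scores are made-up numbers, and the dense layers that would normally turn the flattened vector into those scores are left out for brevity.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2D feature maps into one flat vector x1, x2, ..., xn."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical 2x2 feature maps -> a vector of 4 + 4 = 8 values
feature_maps = [[[6, 4], [7, 9]], [[0, 1], [2, 3]]]
x = flatten(feature_maps)
print(x)     # [6, 4, 7, 9, 0, 1, 2, 3]

# Hypothetical raw scores from the last dense layer for car, dog, truck
classes = ["car", "dog", "truck"]
probs = softmax([2.0, 1.0, 0.1])
best = classes[probs.index(max(probs))]
print(best)  # car
```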
be captured in the feature map. These f feature maps will be created via f convolution
operations.
While each “set of features” detected by a particular set of weights is called a feature
map, in the context of a convolutional Layer, the number of feature maps is referred to as the
number of channels of the Layer—this is why the operation involved with the Layer is called
the multichannel convolution. In addition, the f sets of weights Wi are called the
convolutional filters.
4. RECURRENT NEURAL NETWORKS
INTRODUCTION TO RNN
Sequence Learning Problems: Sequence learning problems are different from other
machine learning problems in two key ways:
The inputs to the model are not of a fixed size
The inputs to the model are dependent on each other
Recurrent neural networks (RNNs) are a type of neural network that are well-suited
for solving sequence learning problems. RNNs work by maintaining a hidden state that is
updated at each time step. The hidden state captures the information from the previous inputs,
which allows the model to predict the next output.
Example:
Consider the task of auto completion. Given a sequence of characters, we want to
predict the next character. For example, given the sequence "d", we want to predict the next
character, which is "e".
An RNN would solve this problem by maintaining a hidden state. The hidden state
would be initialized with the information from the first input character, "d". Then, at the next
time step, the RNN would take the current input character, "e", and the hidden state as input
and produce a prediction for the next character. The hidden state would then be updated with
the new information. This process would be repeated until the end of the sequence. At the end
of the sequence, the RNN would output the final prediction.
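The hidden-state update at the heart of this process can be sketched in plain Python. The update rule h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) is the usual form for a simple RNN; the three-character vocabulary, the hidden size of 4, and the random (untrained) weights are assumptions made only for illustration, and the output layer that would map the hidden state to a next-character prediction is omitted.

```python
import math
import random


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One time step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_new = []
    for i in range(len(h_prev)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def one_hot(char, vocab):
    """Encode a character as a vector with a single 1 at its vocabulary index."""
    return [1.0 if c == char else 0.0 for c in vocab]


random.seed(0)
vocab = ["d", "e", "p"]           # hypothetical tiny vocabulary
hidden_size = 4

# Randomly initialised (untrained) weights, for illustration only
W_xh = [[random.uniform(-0.5, 0.5) for _ in vocab] for _ in range(hidden_size)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(hidden_size)]
        for _ in range(hidden_size)]
b_h = [0.0] * hidden_size

h = [0.0] * hidden_size           # initial hidden state
for char in "dee":                # the same weights are reused at every time step
    h = rnn_step(one_hot(char, vocab), h, W_xh, W_hh, b_h)

print(len(h))                     # 4: the hidden state keeps a fixed size
```

Because the same `rnn_step` is applied at every position, the input sequence can have any length while the hidden state stays the same size.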
Disadvantages of RNNs:
RNNs can be difficult to train
RNNs can be susceptible to vanishing and exploding gradients
RNNs are a powerful tool for solving sequence learning problems. They have been
used to achieve state-of-the-art results in many tasks, such as machine translation, text
summarization, and speech recognition.
To compute the gradients using BPTT, we need to first compute the explicit
derivative of the loss function with respect to the RNN's parameters. This is done by treating
all of the other inputs to the RNN as constants.
However, RNNs also have implicit dependencies, which means that the outputs of the
RNN at a given time step depend on the outputs of the RNN at previous time steps. This
makes it difficult to compute the gradients using the explicit derivative alone.
To address this problem, BPTT uses the chain rule to recursively compute the implicit
derivatives of the loss function with respect to the RNN's parameters. This involves summing
over all of the paths from the loss function to each parameter, where each path is a sequence
of RNN outputs and weights.
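Written out, the sum over paths takes the following form. The notation is an assumption, since the symbols used in the original derivation are not shown here: $L_t$ is the loss at time step $t$, $s_k$ is the hidden state at step $k$, $W$ is the recurrent weight matrix, and $\partial^{+}$ denotes the explicit derivative, taken with all earlier states held constant.

```latex
\frac{\partial L_t}{\partial W}
  = \frac{\partial L_t}{\partial s_t}
    \sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\,
    \frac{\partial^{+} s_k}{\partial W},
\qquad
\frac{\partial s_t}{\partial s_k}
  = \prod_{j=k}^{t-1} \frac{\partial s_{j+1}}{\partial s_j}
```

Each term of the sum corresponds to one path from the loss back to $W$ through the state $s_k$. The product of step-to-step derivatives grows longer as $k$ moves further into the past, which is where the cost of BPTT comes from.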
BPTT can be computationally expensive, but it is a powerful tool for training RNNs.
It has been used to achieve state-of-the-art results on a variety of sequence learning tasks,
such as natural language processing, machine translation, and speech recognition.
Vanishing and exploding gradients can be a major problem for training RNNs. If the
gradients vanish, the parameter updates become negligibly small and the RNN cannot learn
long-range dependencies. If the
gradients explode, the parameter updates become very large, so training becomes unstable
and the loss may diverge instead of converging.
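A rough numerical illustration of why this happens: during backpropagation through time the gradient is multiplied by a similar factor at every time step. The sketch below treats that factor as a single number, which is a deliberate simplification of the real matrix products; the values 0.5 and 1.5 are arbitrary.

```python
def gradient_scale(factor, steps):
    """Scale of a gradient after being multiplied by `factor` at each of `steps` time steps."""
    scale = 1.0
    for _ in range(steps):
        scale *= factor
    return scale


for steps in (10, 50, 100):
    shrinking = gradient_scale(0.5, steps)   # factor below 1: the gradient vanishes
    growing = gradient_scale(1.5, steps)     # factor above 1: the gradient explodes
    print(steps, shrinking, growing)
```

With a factor below 1 the scale falls towards zero as the sequence gets longer, and with a factor above 1 it grows without bound.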
There are a number of techniques that can be used to address the problem of
vanishing and exploding gradients, such as:
Truncated backpropagation: Truncated backpropagation through time only backpropagates the
gradients through a fixed number of time steps. This bounds the length of the gradient
paths, which limits how far the gradients can shrink or grow and reduces the cost of each
update.
Gradient clipping: Gradient clipping rescales the gradients whenever their magnitude
exceeds a certain threshold. This helps to prevent the gradients from
exploding.
Weight initialization: The way that the RNN's parameters are initialized can have a
big impact on the problem of vanishing and exploding gradients. It is important to
initialize the parameters in a way that prevents the gradients from becoming too small
or too large.
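Of the techniques listed above, gradient clipping is the simplest to show in code. The sketch below clips by the L2 norm of the whole gradient vector, which is one common variant; the helper name `clip_by_norm` and the example numbers are illustrative.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale `grads` so that their L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)                  # small gradients are left unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]       # same direction, smaller magnitude


exploding = [30.0, -40.0]                   # L2 norm = 50
clipped = clip_by_norm(exploding, max_norm=25.0)
print(clipped)                              # [15.0, -20.0]

small = [0.3, -0.4]                         # norm of about 0.5, below the threshold
print(clip_by_norm(small, max_norm=25.0))   # [0.3, -0.4]
```

Clipping changes only the length of the gradient, not its direction. In PyTorch the same idea is available as `torch.nn.utils.clip_grad_norm_`, called between the backward pass and the optimizer step.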
Long Short Term Memory (LSTM) and Gated Recurrent Units (GRUs):
Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU) are two types
of recurrent neural networks (RNNs) that are specifically designed to learn long-term
dependencies in sequential data. They are both widely used in a variety of tasks, including
natural language processing, machine translation, speech recognition, and time series
forecasting.
Both LSTMs and GRUs use a gating mechanism to control the flow of information
through the network. This allows them to learn which parts of the input sequence are
important to remember and which parts can be forgotten.
LSTM Architecture:
An LSTM cell has three gates: an input gate, a forget gate, and an output gate.
The input gate controls how much of the current input is added to the cell state
The forget gate controls how much of the previous cell state is forgotten
The output gate controls how much of the cell state is exposed as the hidden state, which
is the output passed on to the next time step
The LSTM cell also has a cell state, which is a long-term memory that stores information
about the previous inputs. The cell state is updated at each time step by the forget gate and
the input gate, and the output gate then determines the hidden state from it.
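In equations, one common formulation of the LSTM cell is given below. The symbols are assumptions introduced here: $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the sigmoid function, $\odot$ element-wise multiplication, and $W$, $U$, $b$ the learned weights and biases of each gate.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(new cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(new hidden state)}
\end{aligned}
```

The cell state update is additive: when $f_t$ is close to 1, information in $c_{t-1}$ passes through almost unchanged, which is what lets gradients survive over many time steps.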
GRU Architecture:
A GRU cell has two gates: a reset gate and an update gate.
The reset gate controls how much of the previous hidden state is used when computing the
new candidate state.
The update gate controls how much of the previous hidden state is kept and how much is
replaced by the candidate state to form the new hidden state.
The GRU cell has no separate cell state and no output gate. Instead, the output of the GRU
cell is simply the updated hidden state.
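One common way to write the GRU update is shown below, with $x_t$ the input, $h_t$ the state carried between time steps, $\sigma$ the sigmoid function, and $\odot$ element-wise multiplication. These symbols are assumptions introduced here, and conventions differ between sources: some swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)} \\
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)} \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) && \text{(candidate state)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new state)}
\end{aligned}
```

A GRU has three sets of weights where an LSTM has four, so for the same state size it has fewer parameters.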
The best choice of architecture for a particular task depends on a number of factors,
including the size and complexity of the dataset, the available computing resources, and the
specific requirements of the task.
As a rule of thumb, LSTMs are often preferred for tasks where the input sequences are very
long or complex, or where accuracy matters more than training cost. GRUs have fewer
parameters and are a good
choice for tasks where the input sequences are shorter or less complex, or where speed and
efficiency are important considerations. In practice it is worth trying both.
RNN CODE
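import keras
from keras import layers
# Model as described below; the Dense activation is not stated in the text, ReLU is assumed
model = keras.Sequential([layers.LSTM(128), layers.Dense(64, activation="relu"),
                          layers.Dense(1, activation="sigmoid")])
model.compile(loss="binary_crossentropy", optimizer="adam")
# Train the model
model.fit(x_train, y_train, epochs=10)
# Make predictions
predictions = model.predict(x_test)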
This code defines a simple RNN model with one LSTM layer, one dense layer, and
one output layer. The LSTM layer has 128 hidden units, and the dense layer has 64 hidden
units. The output layer has a single unit, and it uses the sigmoid activation function to
produce a probability score.
The model is compiled using the binary cross-entropy loss function and the Adam
optimizer. The model is then trained on the training data for 10 epochs.
Once the model is trained, it can be evaluated on the test data to assess its
performance. The model can also be used to make predictions on new data. Here is an
example of how to use the model to make predictions:
This code will print the prediction for the new data sample, which is a probability
score between 0 and 1. A probability score closer to 1 means that the model is more confident
in the prediction.
This is just a simple example of RNN code, and there are many other ways to
implement RNNs in Python. For more complex tasks, you may need to use a different RNN
architecture or add additional layers to the model.
PYTORCH TENSORS
PyTorch is an optimized deep learning tensor library based on Python and Torch, used
for applications that run on GPUs and CPUs. PyTorch is often favored over other deep
learning frameworks such as TensorFlow and Keras because it uses dynamic computation graphs
and has a Pythonic interface.
Advantages of PyTorch:
It is easy to debug and understand the code
It provides many of the same layers as Torch
It includes many loss functions
It can be considered an extension of NumPy to GPUs
It allows building networks whose structure is dependent on computation itself
Layer: A layer is a unit of computation in a neural network. It performs a specific
mathematical operation on the input data.
Optimizer: An optimizer is an algorithm that updates the model's parameters during
training.
Loss: A loss function measures the error between the model's predictions and the
ground truth labels.
Models are created using the torch.nn.Module class. Layers are created using the
different classes provided by the torch.nn module. For example, to create a linear layer, you
would use the torch.nn.Linear class.
Optimizers are created using the classes provided by the torch.optim module. For
example, to create an Adam optimizer, you would use the torch.optim.Adam class.
Loss functions are available as classes in the torch.nn module and as plain functions in
the torch.nn.functional module. For example, a mean squared error loss can be created with
the torch.nn.MSELoss class or computed directly with the
torch.nn.functional.mse_loss function.
Once you have created the model, layers, optimizer, and loss function, you can train
the model using the following steps:
Forward pass: The input data is passed through the model to produce predictions.
Loss calculation: The loss function is used to calculate the error between the
predictions and the ground truth labels.
Backward pass: The gradients of the loss function with respect to the model's
parameters are calculated.
Optimizer step: The optimizer uses the gradients to update the model's parameters.
This process is repeated for a number of epochs until the model converges and
achieves the desired performance.
CNN IN PYTORCH
Convolutional neural networks (CNNs) are a type of neural network that are
specifically designed to work with image data. CNNs are able to learn spatial features in
images, which makes them very effective for tasks such as image classification, object
detection, and image segmentation.
PyTorch is a popular Python library for machine learning. It provides a number of
features that make it easy to build, train, and deploy CNNs.
To implement a CNN in PyTorch, you can use the torch.nn.Conv2d layer. This layer
performs a convolution operation on the input data. The convolution operation is a
mathematical operation that extracts features from the input data.
CNNs also use pooling layers to reduce the spatial size of the input data. This helps to
reduce the number of parameters in the network and makes it more efficient to train. Here is a
simple example of a CNN in PyTorch:
import torch
class CNN(torch.nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        ...
    def forward(self, x):
        ...
        # Pass the flattened output through the fully connected layers
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
# Create the model
model = CNN()
# Train the model
...
This code defines a simple CNN with two convolutional layers, two pooling layers,
and three fully connected layers. The convolutional layers have 6 and 16 filters, respectively.
The pooling layers have a kernel size of 2x2 and a stride of 2. The fully connected layers
have 120, 84, and 10 units, respectively.
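PyTorch modules do not provide built-in fit() or predict() methods. The model is trained
with an explicit training loop (forward pass, loss calculation, backward pass, optimizer
step), and predictions on new data are obtained by calling the model directly, for example
model(x), usually inside a torch.no_grad() block.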
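The following is a minimal sketch of a network with the layer sizes described above, trained with an explicit loop that follows the four steps listed earlier (forward pass, loss calculation, backward pass, optimizer step). Several details are assumptions, since the original layer definitions are not shown: one input channel, 5×5 kernels, 32×32 input images, ReLU activations, the cross-entropy loss, and the random stand-in data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Assumed: 1 input channel, 5x5 kernels, 32x32 input images
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)     # 32x32 -> 28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)    # 14x14 -> 10x10
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))            # -> 6 x 14 x 14
        x = self.pool(F.relu(self.conv2(x)))            # -> 16 x 5 x 5
        x = torch.flatten(x, start_dim=1)               # keep the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                              # raw class scores (logits)


torch.manual_seed(0)
model = CNN()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random stand-in data: 8 grayscale 32x32 images with labels in 0..9
images = torch.randn(8, 1, 32, 32)
labels = torch.randint(0, 10, (8,))

for epoch in range(2):
    optimizer.zero_grad()               # clear gradients from the previous step
    outputs = model(images)             # forward pass
    loss = loss_fn(outputs, labels)     # loss calculation
    loss.backward()                     # backward pass
    optimizer.step()                    # optimizer step

# Prediction: call the model directly, without tracking gradients
model.eval()
with torch.no_grad():
    predicted = model(images).argmax(dim=1)
print(predicted.shape)                  # torch.Size([8])
```

The input size of `fc1` (16 * 5 * 5) depends on the assumed 32×32 images; a different image size or kernel size changes that number.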
For more complex tasks, we may need to use a different CNN architecture or add
additional layers to the model. We can also use PyTorch to implement other types of neural
networks, such as recurrent neural networks (RNNs) and long short-term memory (LSTM)
networks.
For more complex tasks, we may need to use a different RNN architecture or add
additional layers to the model. We can also use PyTorch to implement bidirectional
RNNs, stacked RNNs, and other advanced RNN architectures.
PyTorch also provides a number of tools for training and evaluating RNNs, such as
the torch.optim module and the torch.nn.functional module.
RNN Code:
import torch
class LSTM(torch.nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(LSTM, self).__init__()
        self.lstm = torch.nn.LSTM(input_size, hidden_size, num_layers)
#Make predictions
………