DL Lab Manual

Experiment 1

AIM: Implement the AdaGrad gradient descent learning algorithm to learn the parameters of a supervised single-layer feedforward neural network.

Theory:

AdaGrad (Adaptive Gradient Algorithm) is an optimization method that adjusts the learning
rate dynamically for each parameter based on past gradients. It was designed to perform well
on sparse data and is particularly useful for problems where certain parameters (features) are
updated more frequently than others. This makes it effective for tasks like natural language
processing and recommendation systems.

The core idea of AdaGrad is to scale the learning rate of each parameter individually based
on the historical sum of squared gradients. Parameters with large gradients in the past receive
smaller updates, while parameters with smaller or infrequent gradients receive relatively
larger updates. This allows the model to handle features with varying frequencies more
effectively.

Formula:

For each parameter θ with gradient gₜ at time step t, the AdaGrad update is done as follows:

1. Gradient Accumulation:

   Gₜ = Gₜ₋₁ + gₜ²   (element-wise accumulation of the squared gradients)

2. Parameter Update:

   θ ← θ − (η / (√Gₜ + ε)) · gₜ

where η is the base learning rate and ε is a small constant (e.g., 1e-8) that prevents division by zero.
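As a small illustrative sketch of these two rules (the gradient values below are made up purely for demonstration and are not part of the experiment's program), the per-parameter scaling can be seen in a few lines of NumPy:

import numpy as np

eta, eps = 0.1, 1e-8
theta = np.array([0.5, -0.3])        # two parameters
G = np.zeros_like(theta)             # accumulated squared gradients

# Made-up gradients for three steps: the first parameter sees large
# gradients, the second sees small ones.
for g in [np.array([1.0, 0.01]), np.array([0.8, 0.02]), np.array([1.2, 0.01])]:
    G += g ** 2                                  # gradient accumulation
    theta -= (eta / (np.sqrt(G) + eps)) * g      # per-parameter scaled update
    print(theta, eta / (np.sqrt(G) + eps))       # effective step shrinks faster for parameter 0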

Characteristics:

● Adaptive Learning Rate: AdaGrad adapts the learning rate for each parameter based on the historical gradients. Parameters with larger gradients in the past receive smaller learning rates, and those with smaller or infrequent gradients receive larger learning rates.

● Effective with Sparse Data: AdaGrad is particularly effective for sparse data, as it allows features with smaller gradients (which may be less frequent) to have larger learning rates, ensuring they are updated adequately.

● No Manual Adjustment of Learning Rate: The automatic adjustment of learning rates per parameter reduces the need for manual hyperparameter tuning, making it easier to implement compared to standard gradient descent methods.

Program:

import numpy as np

# Define the activation function (sigmoid)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid function
def sigmoid_derivative(x):
    return x * (1 - x)

# AdaGrad update function
def adagrad_update(params, grads, grad_squared_accum, learning_rate, epsilon=1e-8):
    for key in params.keys():
        # Accumulate squared gradients
        grad_squared_accum[key] += grads[key] ** 2
        # Update the parameters using AdaGrad
        params[key] -= (learning_rate / (np.sqrt(grad_squared_accum[key]) + epsilon)) * grads[key]

# Forward pass function
def forward(X, weights):
    # Linear combination of inputs and weights
    z = np.dot(X, weights['W']) + weights['b']
    # Apply activation function
    output = sigmoid(z)
    return output

# Backpropagation to compute gradients
def backward(X, y, output, weights):
    # Calculate the error (output - target)
    error = output - y
    # Gradient with respect to the pre-activation
    # (using the derivative of the activation function)
    d_output = error * sigmoid_derivative(output)
    # Gradients
    gradients = {
        'W': np.dot(X.T, d_output),
        'b': np.sum(d_output, axis=0, keepdims=True)
    }
    return gradients, error

# Function to train a neural network using AdaGrad
def train_adagrad(X, y, learning_rate=0.01, epochs=1000):
    # Initialize parameters (weights and bias)
    input_dim = X.shape[1]   # Number of features
    output_dim = y.shape[1]  # Number of output neurons
    weights = {
        'W': np.random.randn(input_dim, output_dim),
        'b': np.zeros((1, output_dim))
    }
    # Initialize accumulator for squared gradients (AdaGrad)
    grad_squared_accum = {
        'W': np.zeros_like(weights['W']),
        'b': np.zeros_like(weights['b'])
    }
    # Training loop
    for epoch in range(epochs):
        # Forward pass
        output = forward(X, weights)
        # Compute gradients using backpropagation
        grads, error = backward(X, y, output, weights)
        # Update weights using AdaGrad
        adagrad_update(weights, grads, grad_squared_accum, learning_rate)
        # Calculate and print the loss (mean squared error) every 100 epochs
        loss = np.mean(np.square(error))
        if epoch % 100 == 0:
            print(f'Epoch {epoch}/{epochs}, Loss: {loss:.4f}')
    return weights

# Example usage:
# Sample dataset (X: input features, y: target labels)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # XOR problem
y = np.array([[0], [1], [1], [0]])              # Target outputs

# Training the neural network
weights = train_adagrad(X, y, learning_rate=0.1, epochs=1000)

# Testing the model after training
output = forward(X, weights)
print("Predicted outputs after training:")
print(output)

Conclusion:
AdaGrad is an effective optimization algorithm for handling sparse data by adapting learning rates
based on past gradients. Its automatic adjustment simplifies training, but the learning rate decay can
slow down convergence over time. Despite this, it's well-suited for tasks with uneven feature
distributions.
Experiment No. 02
Aim: To implement a backpropagation algorithm to train a Deep
Neural Network (DNN) with at least two hidden layers.

Requirement:

● Python programming language
● NumPy library for numerical computations
● Matplotlib library for plotting (optional, for visualization)

Theory:

Deep Neural Networks (DNNs): DNNs are a class of artificial neural
networks with multiple layers between the input and output layers. Each
layer consists of neurons, and each neuron in one layer is connected to
every neuron in the next layer. DNNs are capable of modeling complex
patterns in data.

Backpropagation Algorithm: Backpropagation is an algorithm used for training DNNs. It involves the following steps:
1. Forward Pass: Compute the output of the network for a given input by passing the input through the network layers.
2. Compute Loss: Calculate the loss (error) between the predicted output and the actual target.
3. Backward Pass: Compute the gradients of the loss with respect to each weight in the network using the chain rule of calculus.
4. Update Weights: Adjust the weights using the computed gradients to minimize the loss.

Neural Networks: A neural network is a computational model inspired


by the way biological neural networks in the human brain process
information. It consists of layers of interconnected nodes, or neurons,
where each connection has an associated weight. These weights are
adjusted during training to minimize the error between the
network's predictions and the actual target values.

Layers in a Neural Network:

1. Input Layer: The first layer that receives the input features.
2. Hidden Layers: Layers between the input and output layers that perform computations and extract features from the input. The complexity and depth of the neural network are determined by the number of hidden layers and the number of neurons in each layer.
3. Output Layer: The final layer that produces the output of the network.

Activation Functions: Activation functions introduce non-linearity into the network, allowing it to model complex patterns. Common activation functions include sigmoid, tanh, and ReLU.

Forward Propagation: In forward propagation, the input data passes


through each layer of the network to compute the final output. The
process involves:
1. Multiplying the input by the weights.
2. Adding a bias term.
3. Applying an activation function.

Loss Function: The loss function measures the difference between the network's predictions and the actual target values. Common loss functions include:
1. Mean Squared Error (MSE) (for regression tasks): MSE = (1/n) Σ (yᵢ − ŷᵢ)²
2. Cross-Entropy Loss (for classification tasks): L = −Σ yᵢ log(ŷᵢ)

Backpropagation: Backpropagation is the algorithm used to train the


neural network by updating the weights to minimize the loss. The process
involves:

1. Calculating the Gradient: Compute the gradient of the loss


function with respect to each weight using the chain rule of
calculus.
2. Updating Weights: Adjust the weights by subtracting a fraction of
the gradient (determined by the learning rate).

Training Process:

1. Initialization: Initialize the weights and biases randomly.


2. Forward Pass: Compute the output of the network for the given input.
3. Loss Computation: Calculate the loss using the loss function.
4. Backward Pass: Perform backpropagation to compute the gradients.
5. Update Weights: Adjust the weights and biases using the computed
gradients.
6. Iteration: Repeat the process for a specified number of epochs or until
the loss converges.

Overfitting and Regularization:


Overfitting: When the model performs well on the training data but poorly on
unseen data, it is said to overfit. This occurs when the model learns noise in the
training data.

Regularization techniques help prevent overfitting:


1. L2 Regularization (Ridge): Adds a penalty proportional to the square of
the weights.
2. Dropout: Randomly drops neurons during training to prevent co-
adaptation.

Example Problem (XOR):

The XOR function is defined as:


● XOR(0, 0) = 0
● XOR(0, 1) = 1
● XOR(1, 0) = 1
● XOR(1, 1) = 0
A simple DNN with two hidden layers can learn this function using
backpropagation.

Program:

import numpy as np

# Activation functions and their derivatives
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return np.where(x > 0, 1, 0)

# Loss function (Mean Squared Error)
def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Initialize network parameters
input_size = 2
hidden_size1 = 4
hidden_size2 = 4
output_size = 1
learning_rate = 0.01
epochs = 10000

# Initialize weights and biases
weights1 = np.random.rand(input_size, hidden_size1)
biases1 = np.random.rand(hidden_size1)
weights2 = np.random.rand(hidden_size1, hidden_size2)
biases2 = np.random.rand(hidden_size2)
weights3 = np.random.rand(hidden_size2, output_size)
biases3 = np.random.rand(output_size)

# Training data (XOR problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Training the network
for epoch in range(epochs):
    # Forward pass
    z1 = np.dot(X, weights1) + biases1
    a1 = sigmoid(z1)

    z2 = np.dot(a1, weights2) + biases2
    a2 = sigmoid(z2)

    z3 = np.dot(a2, weights3) + biases3
    output = sigmoid(z3)

    # Compute loss
    loss = mean_squared_error(y, output)

    # Backward pass
    output_error = y - output
    output_delta = output_error * sigmoid_derivative(output)

    a2_error = output_delta.dot(weights3.T)
    a2_delta = a2_error * sigmoid_derivative(a2)

    a1_error = a2_delta.dot(weights2.T)
    a1_delta = a1_error * sigmoid_derivative(a1)

    # Update weights and biases
    weights3 += a2.T.dot(output_delta) * learning_rate
    biases3 += np.sum(output_delta, axis=0) * learning_rate

    weights2 += a1.T.dot(a2_delta) * learning_rate
    biases2 += np.sum(a2_delta, axis=0) * learning_rate

    weights1 += X.T.dot(a1_delta) * learning_rate
    biases1 += np.sum(a1_delta, axis=0) * learning_rate

    # Print loss every 1000 epochs
    if epoch % 1000 == 0:
        print(f'Epoch {epoch}, Loss: {loss}')

# Testing the network
print("Final output after training:")
print(output)

Output:
Epoch 0, Loss: 0.2668552327802867
Epoch 1000, Loss: 0.12346139338998134
Epoch 2000, Loss: 0.08922302013674174
Epoch 3000, Loss: 0.06991524317483898
Epoch 4000, Loss: 0.05816550148456629
Epoch 5000, Loss: 0.05003554790511628
Epoch 6000, Loss: 0.044306690631420665
Epoch 7000, Loss: 0.03955813726381803
Epoch 8000, Loss: 0.03551171131863506
Epoch 9000, Loss: 0.0320658267938436

Final outputs:
[[0.05232542]
[0.94755168]
[0.95114438]
[0.05815823]]
Conclusion:

In this implementation, we trained a DNN with two hidden layers using the
backpropagation algorithm. The network was able to learn the XOR problem,
a classic test case for neural networks. By iterating through the training data
and updating the weights and biases using the gradients computed during
backpropagation, the network minimized the loss and improved its predictions
over time.
Experiment 3
Aim: Design and implement a fully connected deep neural network with at least 2 hidden layers for a classification application. Use an appropriate learning algorithm, output function, and loss function.
Pre-Requisite: Python 3
Theory:
Deep Learning Neural Network for Classification :

The task is to create and train a simple neural network for deep learning classification. Convolutional neural networks are essential tools for deep learning and are especially well suited to image data.

The example demonstrates how to:


● Load and explore image data.
● Define the neural network architecture.
● Specify training options.
● Train the neural network.
● Predict the labels of new data and calculate the classification accuracy.

The network is built from various layers, such as:

Image Input Layer: An imageInputLayer is where you specify the image size, which, in this
case, is 28-by-28-by-1. These numbers correspond to the height, width, and the channel
size. The digit data consists of grayscale images, so the channel size (color channel) is 1. For
a color image, the channel size is 3, corresponding to the RGB values. You do not need to
shuffle the data because trainnet, by default, shuffles the data at the beginning of
training. trainnet can also automatically shuffle the data at the beginning of every epoch
during training.
Convolutional Layer: In the convolutional layer, the first argument is filterSize, which is the height and width of the filters the training function uses while scanning along the images. In this example, the number 3 indicates that the filter size is 3-by-3. You can specify different sizes for the height and width of the filter. The number of filters determines the number of feature maps. For a convolutional layer with a default stride of 1, "same" padding ensures that the spatial output size is the same as the input size.
Batch Normalization Layer: Batch normalization layers normalize the activations and gradients propagating through a neural network, making neural network training an easier optimization problem. Use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers, to speed up neural network training and reduce the sensitivity to neural network initialization. Use batchNormalizationLayer to create a batch normalization layer.
Output Function: Softmax, which is suitable for multi-class classification as it provides
class probabilities.
Loss Function: Categorical cross-entropy, which is used for multi-class classification
problems.

Program and Output:
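The program listing is not reproduced in this copy of the manual. As a hedged sketch only (assuming Keras/TensorFlow and the MNIST digit dataset as the classification task), a fully connected network with two hidden layers, a softmax output function, and cross-entropy loss could be written as follows:

import tensorflow as tf
from tensorflow.keras import layers, models

# Load and preprocess MNIST (28x28 grayscale digits, 10 classes)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28 * 28)).astype('float32') / 255.0
x_test = x_test.reshape((-1, 28 * 28)).astype('float32') / 255.0

# Fully connected network with two hidden layers and a softmax output
model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(28 * 28,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Adam optimizer with (sparse) categorical cross-entropy loss
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5, batch_size=64,
          validation_data=(x_test, y_test))

test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")

Here sparse_categorical_crossentropy is used because the labels are integer class indices; with one-hot encoded labels, categorical_crossentropy would be the equivalent choice.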

Conclusion :
The design and implementation of the fully connected deep neural network were successful
in meeting the objectives of the classification task. The choice of architecture, learning
algorithm, output function, and loss function proved effective, and the model demonstrated
strong performance.
Experiment 4

Aim: Design and Implementation of an Autoencoder Model for Image Compression.

Theory :

Autoencoders are a type of neural network designed to learn efficient representations of data. They
consist of two main components: an encoder that compresses the input data into a latent-space
representation, and a decoder that reconstructs the original data from this representation. In the
context of image compression, autoencoders can learn to encode images into a compact form while
preserving as much of the original information as possible, thereby achieving compression.

We will use the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits.

Methodology :

1. Data Preprocessing:

- Load and normalize the MNIST dataset.

- Split the dataset into training and testing sets.

- Reshape the images as needed for input into the autoencoder.

2. Model Architecture:

- Encoder :

- Input Layer: Accepts the image data (28x28).

- Convolutional Layers: Extract features from the image data.

- Pooling Layers: Reduce the dimensionality of the feature maps.

- Flatten Layer: Convert the 2D feature maps to 1D.

- Dense Layer: Produce the latent-space representation.

- Decoder :

- Dense Layer: Expands the latent-space representation back to the size of the feature maps.

- Reshape Layer: Reshape the data to the dimensions before the convolutional layers.

- Transpose Convolutional Layers: Upsample the feature maps.

- Output Layer: Produce the reconstructed image (28x28).

3. Model Implementation:
- Use a deep learning framework such as TensorFlow or PyTorch.

- Define the autoencoder model using the specified architecture.

- Compile the model with a suitable loss function (e.g., Mean Squared Error) and optimizer (e.g.,
Adam).

4. Training:

- Train the autoencoder on the MNIST training set.

- Monitor the loss and adjust hyperparameters as needed.

- Use techniques like early stopping to prevent overfitting.

5. Evaluation:

- Evaluate the autoencoder on the test set.

- Calculate the reconstruction loss to measure how well the model has learned to reconstruct the
images.

- Compare the compressed image size to the original image size to determine the compression ratio.

- Visualize a few sample compressed and reconstructed images to qualitatively assess performance.

6. Analysis:

- Discuss the trade-offs between compression ratio and reconstruction quality.

- Analyze how changes in model architecture or hyperparameters affect performance.

- Consider potential improvements or alternative approaches to enhance compression.
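Since the model code itself is not included here, the following is a minimal sketch of the described encoder-decoder architecture on MNIST, assuming Keras/TensorFlow; the filter counts and the latent dimension (32) are illustrative choices rather than the original values:

import tensorflow as tf
from tensorflow.keras import layers, models

# Load and normalize MNIST; add a channel dimension
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32')[..., None] / 255.0
x_test = x_test.astype('float32')[..., None] / 255.0

latent_dim = 32  # size of the compressed representation (illustrative)

# Encoder: convolution + pooling layers, then a dense bottleneck
encoder = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, activation='relu', padding='same'),
    layers.MaxPooling2D(2),
    layers.Conv2D(8, 3, activation='relu', padding='same'),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(latent_dim)
])

# Decoder: dense expansion, reshape, then transposed convolutions
decoder = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(7 * 7 * 8, activation='relu'),
    layers.Reshape((7, 7, 8)),
    layers.Conv2DTranspose(8, 3, strides=2, activation='relu', padding='same'),
    layers.Conv2DTranspose(16, 3, strides=2, activation='relu', padding='same'),
    layers.Conv2D(1, 3, activation='sigmoid', padding='same')
])

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='mse')

autoencoder.fit(x_train, x_train, epochs=10, batch_size=256,
                validation_data=(x_test, x_test))

# Reconstruct a few test images for qualitative inspection
reconstructed = autoencoder.predict(x_test[:10])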

Output:
Epoch 1/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 210s 847ms/step - loss: 0.6328 - val_loss: 0.5807
Epoch 2/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 154s 786ms/step - loss: 0.5781 - val_loss: 0.5741
Epoch 3/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 151s 770ms/step - loss: 0.5731 - val_loss: 0.5722
Epoch 4/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 150s 766ms/step - loss: 0.5707 - val_loss: 0.5699
Epoch 5/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 153s 782ms/step - loss: 0.5699 - val_loss: 0.5684
Epoch 6/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 153s 779ms/step - loss: 0.5681 - val_loss: 0.5680
Epoch 7/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 154s 783ms/step - loss: 0.5665 - val_loss: 0.5663
Epoch 8/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 152s 776ms/step - loss: 0.5658 - val_loss: 0.5657
Epoch 9/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 216s 849ms/step - loss: 0.5641 - val_loss: 0.5643
Epoch 10/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 157s 803ms/step - loss: 0.5634 - val_loss: 0.5640
Epoch 11/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 192s 749ms/step - loss: 0.5633 - val_loss: 0.5631
Epoch 12/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 143s 728ms/step - loss: 0.5628 - val_loss: 0.5627
Epoch 13/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 143s 728ms/step - loss: 0.5624 - val_loss: 0.5627
Epoch 14/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 143s 731ms/step - loss: 0.5621 - val_loss: 0.5620
Epoch 15/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 742ms/step - loss: 0.5611 - val_loss: 0.5616
Epoch 16/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 741ms/step - loss: 0.5609 - val_loss: 0.5612
Epoch 17/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 146s 747ms/step - loss: 0.5612 - val_loss: 0.5615
Epoch 18/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 152s 775ms/step - loss: 0.5612 - val_loss: 0.5609
Epoch 19/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 205s 793ms/step - loss: 0.5597 - val_loss: 0.5607
Epoch 20/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 742ms/step - loss: 0.5593 - val_loss: 0.5610
Epoch 21/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 161s 820ms/step - loss: 0.5594 - val_loss: 0.5603
Epoch 22/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 162s 825ms/step - loss: 0.5599 - val_loss: 0.5600
Epoch 23/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 157s 801ms/step - loss: 0.5594 - val_loss: 0.5602
Epoch 24/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 150s 767ms/step - loss: 0.5595 - val_loss: 0.5597
Epoch 25/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 740ms/step - loss: 0.5595 - val_loss: 0.5599
Epoch 26/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 153s 779ms/step - loss: 0.5588 - val_loss: 0.5598
Epoch 27/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 156s 798ms/step - loss: 0.5584 - val_loss: 0.5596
Epoch 28/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 164s 836ms/step - loss: 0.5585 - val_loss: 0.5591
Epoch 29/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 151s 773ms/step - loss: 0.5581 - val_loss: 0.5593
Epoch 30/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 754ms/step - loss: 0.5583 - val_loss: 0.5591
Epoch 31/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 757ms/step - loss: 0.5580 - val_loss: 0.5616
Epoch 32/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 757ms/step - loss: 0.5581 - val_loss: 0.5590
Epoch 33/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 159s 813ms/step - loss: 0.5582 - val_loss: 0.5596
Epoch 34/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 205s 828ms/step - loss: 0.5584 - val_loss: 0.5595
Epoch 35/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 167s 850ms/step - loss: 0.5574 - val_loss: 0.5584
Epoch 36/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 163s 834ms/step - loss: 0.5579 - val_loss: 0.5584
Epoch 37/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 168s 857ms/step - loss: 0.5576 - val_loss: 0.5585
Epoch 38/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 165s 844ms/step - loss: 0.5576 - val_loss: 0.5584
Epoch 39/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 171s 871ms/step - loss: 0.5573 - val_loss: 0.5584
Epoch 40/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 152s 776ms/step - loss: 0.5574 - val_loss: 0.5591
Epoch 41/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 755ms/step - loss: 0.5569 - val_loss: 0.5591
Epoch 42/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 742ms/step - loss: 0.5573 - val_loss: 0.5578
Epoch 43/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 754ms/step - loss: 0.5578 - val_loss: 0.5577
Epoch 44/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 162s 829ms/step - loss: 0.5576 - val_loss: 0.5578
Epoch 45/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 146s 742ms/step - loss: 0.5575 - val_loss: 0.5577
Epoch 46/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 146s 747ms/step - loss: 0.5564 - val_loss: 0.5575
Epoch 47/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 756ms/step - loss: 0.5566 - val_loss: 0.5581
Epoch 48/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 165s 841ms/step - loss: 0.5562 - val_loss: 0.5575
Epoch 49/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 164s 834ms/step - loss: 0.5565 - val_loss: 0.5579
Epoch 50/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 170s 870ms/step - loss: 0.5565 - val_loss: 0.5574
313/313 ━━━━━━━━━━━━━━━━━━━━ 17s 42ms/step

Conclusion :

An autoencoder model for image compression was implemented successfully.


Experiment 5
Aim: Design and implement a CNN model for a digit recognition application.
Pre-Requisite: Python 3.4 or above
Theory:
CNN:
A Convolutional Neural Network (CNN) is a type of deep learning algorithm that
is particularly well suited for image recognition and processing tasks. It is made
up of multiple layers, including convolutional layers, pooling layers, and fully
connected layers.

digit recognition application :


In the current age of digitization, handwriting recognition plays an important
role in information processing. There is a lot of information available on paper
and less processing of digital files than the processing of traditional paper files.
The purpose of the handwriting recognition system is to convert handwritten
letters into machine-readable formats. Major applications include vehicle licence-plate identification, postal paper-sorting services, cheque scanning in the check truncation system (CTS), historical document preservation in archaeology departments, old document automation in libraries and banks, and more.
All of these areas deal with large databases and therefore require high
identification accuracy, low computational complexity, and consistent
performance of the identification system. Over time, the number of fields that
can implement deep learning is increasing. In deep learning, convolutional
neural networks (CNN) are being used for visual image analysis. CNN can be
used in object detection, facial recognition, robotics, video analysis,
segmentation, pattern recognition, natural language processing, spam
detection, topical gradation, regression analysis, speech recognition, image
classification.
Handwritten digit recognition using deep convolutional neural networks (CNNs) has reached near-human accuracy in these areas. CNNs have recently become one of the most attractive approaches and a deciding factor in several challenging machine learning applications. Considering all the factors stated above, we choose a CNN for the challenging task of image classification and use it to identify handwritten digits, which appear widely in educational and business transactions. Since there are many real-life applications of handwritten digit recognition, a Convolutional Neural Network (CNN) is used.
Block diagram:
Program and Output:

# example of loading the mnist dataset


from tensorflow.keras.datasets import mnist
from matplotlib import pyplot as plt
# load dataset
(trainX, trainy), (testX, testy) = mnist.load_data()
# summarize loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))
# plot first few images
for i in range(9):
    # define subplot
    plt.subplot(330 + 1 + i)
    # plot raw pixel data
    plt.imshow(trainX[i], cmap=plt.get_cmap('gray'))
# show the figure
plt.show()
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

# Load the MNIST dataset


(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()

# Preprocess the data


train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

# Define the CNN model


model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model


model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])

# Train the model


history = model.fit(train_images, train_labels, epochs=5,
validation_data=(test_images, test_labels))

# Evaluate the model


test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")

# Plot training & validation accuracy values


plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

# Plot training & validation loss values


plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

Epoch 1/5
1875/1875 [==============================] - 59s 31ms/step - loss:
0.1407 - accuracy: 0.9565 - val_loss: 0.0556 - val_accuracy: 0.9822
Epoch 2/5
1875/1875 [==============================] - 56s 30ms/step - loss:
0.0454 - accuracy: 0.9861 - val_loss: 0.0298 - val_accuracy: 0.9895
Epoch 3/5
1875/1875 [==============================] - 57s 31ms/step - loss:
0.0327 - accuracy: 0.9903 - val_loss: 0.0394 - val_accuracy: 0.9877
Epoch 4/5
1875/1875 [==============================] - 56s 30ms/step - loss:
0.0247 - accuracy: 0.9926 - val_loss: 0.0292 - val_accuracy: 0.9918
Epoch 5/5
1875/1875 [==============================] - 55s 29ms/step - loss:
0.0196 - accuracy: 0.9939 - val_loss: 0.0304 - val_accuracy: 0.9919
313/313 [==============================] - 3s 9ms/step - loss: 0.0304 -
accuracy: 0.9919
Test accuracy: 0.9919000267982483
Conclusion: Thus, we found that the CNN gave highly accurate results for digit recognition. This leads us to conclude that CNNs are well suited to image-based prediction problems such as this one.
Experiment 6
Aim: Design and Implement a CNN model for image classification.
Pre-Requisite: Python 3.4 or above
Theory:
CNN:
A Convolutional Neural Network (CNN) is a type of deep learning algorithm that is particularly well-
suited for image recognition and processing tasks. It is made up of multiple layers, including
convolutional layers, pooling layers, and fully connected layers.

CNNs are trained using a large dataset of labeled images, where the network learns to recognize patterns and features that are associated with specific objects or classes. They have proven to be highly effective in image-related tasks, achieving state-of-the-art performance in various computer vision applications. Their ability to automatically learn hierarchical representations of features makes them well suited for tasks where the spatial relationships and patterns in the data are crucial for accurate predictions. CNNs are widely used in areas such as image classification, object detection, facial recognition, and medical image analysis.

Image Classification :

Image classification is the process of assigning a label to an image from a predefined set of
categories. This is commonly achieved using convolutional neural networks (CNNs), which
are adept at learning hierarchical features from images. CNNs use convolutional layers to
detect edges and textures, pooling layers to reduce spatial dimensions, and fully connected
layers for classification. By training on large datasets, CNNs can accurately classify images
into various categories such as animals, objects, and scenes. This technology is widely used in
applications like object detection, face recognition, medical imaging, and autonomous driving,
enabling machines to interpret and understand visual data.

The typical process of image classification involves the following steps (a minimal code sketch is given after this list):

● Data Collection: Gather a dataset containing a large number of labelled images. Each image in the dataset should be associated with a specific class or category.
● Data Preprocessing: Prepare the data for training by resizing images to a common size, normalizing pixel values, and splitting the dataset into training and testing sets. Data augmentation techniques can also be used to artificially increase the size of the training dataset and improve the model's generalization.
● Feature Extraction: In traditional computer vision approaches, engineers would manually design features like edge detectors, texture descriptors, etc. However, with deep learning, convolutional neural networks (CNNs) can automatically learn relevant features from raw pixel data.
● Model Training: Train a machine learning model, often a CNN, on a training dataset. During training, the model learns to identify patterns and features that are useful for distinguishing different classes.
● Model Evaluation: After training, the model's performance is evaluated using the testing dataset. Accuracy and other metrics are calculated to assess how well the model generalizes to unseen data.
● Model Deployment: Once the model's performance is satisfactory, it can be deployed to classify new, unseen images. This could be in the form of a mobile app, a web service, or integration into other software systems.
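As a hedged sketch of these steps (the CIFAR-10 dataset and the layer sizes are assumptions made for illustration; the original program is not reproduced in this copy), a CNN image classifier in Keras could look like:

import tensorflow as tf
from tensorflow.keras import datasets, layers, models

# Load CIFAR-10 (60,000 32x32 colour images in 10 classes) and normalize
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

# Simple CNN: convolution + pooling blocks followed by dense layers
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))

test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.4f}")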
Outputs:
Conclusion: In summary, image classification with Convolutional Neural Networks (CNNs)
enables accurate labeling of images by extracting and learning key features automatically. This
technology is crucial for tasks ranging from object recognition to medical diagnostics,
continually advancing our ability to interpret and utilize visual data in diverse applications.
Experiment 7

LSTM Model in Deep Learning


1. Aim

To explore the application and effectiveness of Long Short-Term


Memory (LSTM) networks in modeling and predicting sequential data.
2. Software Requirements

To implement and experiment with LSTM models, the following software


and tools are required:
• Programming Language: Python
• Deep Learning Framework: TensorFlow or PyTorch
• Libraries: NumPy, Pandas, Matplotlib, Scikit-learn
• Development Environment: Jupyter Notebook or any IDE like PyCharm
• Hardware: GPU (optional but recommended for faster training)
3. Theory

Long Short-Term Memory (LSTM) networks are a specialized form of


recurrent neural networks (RNNs) designed to address the limitations of
traditional RNNs, particularly the vanishing gradient problem. LSTMs are
capable of learning long-term dependencies in sequential data, making them
suitable for tasks where context over time is crucial.
Architecture:

• Cell State: The cell state is the key component of LSTMs, acting as a
conveyor belt that carries information across the sequence. It allows
information to flow unchanged, which helps in maintaining long-
term dependencies.
• Gates: LSTMs use three gates to control the flow of information:

o Forget Gate: This gate decides what information to discard from


the cell state. It helps the model to forget irrelevant information
from the past.
o Input Gate: This gate determines which new information to add to
the cell state. It helps in updating the cell state with new, relevant
information.
o Output Gate: This gate controls the output based on the cell state and
input. It decides what part of the cell state should be outputted.

Working Mechanism:

1. Forget Gate: The forget gate takes the previous hidden state and the
current input to decide what information to discard from the cell
state. This is crucial for removing irrelevant information and focusing
on important data.

2. Input Gate: The input gate updates the cell state with new
information. It decides which values from the input should be updated
in the cell state.

3. Cell State Update: The cell state is updated by combining the old cell
state (after forgetting some information) and the new candidate
values (from the input gate).

4. Output Gate: The output gate decides what the next hidden state
should be. This hidden state is used for predictions and also fed back
into the network for the next time step.
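To make these four steps concrete, the following is a small NumPy sketch of a single LSTM cell step under the standard formulation; the weight shapes, random initialization, and sizes are illustrative assumptions, not values from this experiment:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x_t] to the four stacked gate pre-activations."""
    concat = np.concatenate([h_prev, x_t])
    z = W @ concat + b                     # stacked pre-activations for all four gates
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])                    # forget gate: what to discard from c_prev
    i = sigmoid(z[H:2*H])                  # input gate: what new information to add
    g = np.tanh(z[2*H:3*H])                # candidate cell values
    o = sigmoid(z[3*H:4*H])                # output gate: what part of the cell to expose
    c_t = f * c_prev + i * g               # cell state update
    h_t = o * np.tanh(c_t)                 # new hidden state
    return h_t, c_t

# Illustrative sizes: 3 input features, hidden size 4
H, D = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_cell_step(rng.normal(size=D), h, c, W, b)
print(h, c)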
Applications: LSTM networks have been successfully applied in various fields:

Natural Language Processing (NLP): LSTMs are used for language


modeling, machine translation, and text generation. They can
understand the context of words in a sentence, making them
effective for tasks like sentiment analysis and chatbots.

Time Series Forecasting: LSTMs are used for predicting future values in
a time series, such as stock prices, weather conditions, and sales
forecasting. Their ability to remember past information makes them
ideal for these tasks.

Speech Recognition: LSTMs improve the accuracy of speech-to-text


systems by understanding the temporal dependencies in speech
signals. They can handle variations in speech patterns and accents.
Program:

import numpy as np

import tensorflow as tf

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import LSTM, Dense

import matplotlib.pyplot as plt

# Generate a sine wave

def generate_sine_wave(seq_length, num_samples):
    X = []
    y = []
    for _ in range(num_samples):
        start = np.random.randint(0, 100)
        x = np.sin(np.linspace(start, start + seq_length, seq_length))
        X.append(x)
        y.append(np.sin(start + seq_length))
    return np.array(X), np.array(y)

seq_length = 50
num_samples = 1000
X, y = generate_sine_wave(seq_length, num_samples)

# Reshape data for LSTM [samples, time steps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))

# Define and compile the LSTM model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(seq_length, 1)))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mse')
model.summary()

# Train the model and plot the loss curves
history = model.fit(X, y, epochs=200, validation_split=0.2, verbose=1)

plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.legend()
plt.show()

# Generate new data for prediction

X_new, y_new = generate_sine_wave(seq_length, 1)

X_new = X_new.reshape((X_new.shape[0], X_new.shape[1], 1))

# Predict

y_pred = model.predict(X_new, verbose=0)

print(f"True value: {y_new[0]}, Predicted value: {y_pred[0][0]}")

Model Summary

Output

True value: 0.8509035245341184, Predicted value: 0.8456783294677734


4. Conclusion

LSTM networks have revolutionized the field of deep learning by providing a


robust solution for modeling sequential data. Their ability to maintain long-
term dependencies and handle complex temporal patterns makes them
invaluable for a wide range of applications. Future research can focus on
optimizing LSTM architectures and exploring hybrid models to further
enhance their performance.
Experiment 8
Aim: Design and Implement GRU for real-life applications
Pre-Requisite: Python 3.4 or above

Theory:

GRU:
GRU (Gated Recurrent Unit) is a type of Recurrent Neural Network (RNN) architecture,
commonly used in natural language processing tasks like chatbot implementation. GRU is a
simpler alternative to LSTM (Long Short-Term Memory), both of which are designed to
address the problem of vanishing gradients in traditional RNNs by using gating mechanisms
to control the flow of information.

Key Components of GRU:


● Update Gate: Decides how much of the past information needs to be passed along to
the future.
● Reset Gate: Decides how much of the past information should be forgotten.

GRU Architecture in Chatbots


● Input Layer: Takes the tokenized user query.
● Embedding Layer: Represents the input query in vector space using embeddings
(e.g., Word2Vec, GloVe).
● GRU Layer: The core recurrent layer that processes sequential information,
maintaining the state.
● Dense Layer: Maps the GRU output to the final output, such as the next word or
intent classification.

GRU Workflow in a Chatbot


1. User Input (Query): The input is tokenized and converted into embeddings.

2. GRU Cell: Each token is fed into the GRU cell one at a time. The GRU cell keeps
track of the state using its gating mechanism.
● Update Gate: Decides how much of the hidden state should be carried forward.

● Reset Gate: Determines how much of the previous state should be discarded.

3. Hidden State: This is passed from one time step to the next, allowing the model to
"remember" the context of the conversation.

4. Prediction Layer: After processing the entire sequence of inputs, the hidden state is
passed through a fully connected layer to predict the chatbot’s response or the next
word in the sentence.
GRU Equations

Let:

● xₜ be the input at time step t
● hₜ be the hidden state at time t
● σ represent the sigmoid activation function
● tanh represent the hyperbolic tangent activation function

Update Gate:

Controls how much of the previous state is carried forward.

zₜ = σ(W_z · [hₜ₋₁, xₜ])

Reset Gate:

Controls how much of the previous state is reset.

rₜ = σ(W_r · [hₜ₋₁, xₜ])

New Memory:

This is the candidate hidden state for the current time step.

h̃ₜ = tanh(W_h · [rₜ ∗ hₜ₋₁, xₜ])

Final Hidden State:

This is the actual hidden state at time step t, a combination of the previous hidden state and the new memory.

hₜ = (1 − zₜ) ∗ hₜ₋₁ + zₜ ∗ h̃ₜ
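As a hedged NumPy illustration of these equations (separate weight matrices W_z, W_r, and W_h are assumed as above, with made-up shapes and values), a single GRU step can be written as:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU time step, following the update/reset/new-memory equations above."""
    concat = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ concat)                                     # update gate
    r = sigmoid(W_r @ concat)                                     # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # new memory (candidate state)
    return (1 - z) * h_prev + z * h_tilde                         # final hidden state

# Illustrative sizes: input dimension 3, hidden size 4
D, H = 3, 4
rng = np.random.default_rng(0)
W_z = rng.normal(size=(H, H + D))
W_r = rng.normal(size=(H, H + D))
W_h = rng.normal(size=(H, H + D))

h = np.zeros(H)
for x_t in rng.normal(size=(5, D)):   # feed a short sequence of 5 inputs
    h = gru_step(x_t, h, W_z, W_r, W_h)
print(h)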

Program and Output:
import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.preprocessing import MinMaxScaler

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import GRU, Dense


# Generate a synthetic dataset (time series data)

def generate_time_series(n_steps):
    time = np.arange(0, n_steps)
    series = 0.5 * np.sin(0.1 * time) + 0.2 * np.random.randn(n_steps)
    return series

# Prepare data for the GRU model

def create_dataset(series, n_steps):
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    return np.array(X), np.array(y)

# Hyperparameters

n_steps = 50 # Lookback window

n_epochs = 100 # Number of training epochs

batch_size = 32 # Batch size

# Generate synthetic time series data

n_data_points = 1000

series = generate_time_series(n_data_points)

# Plot the generated time series

plt.plot(series)

plt.title('Generated Time Series')

plt.show()

# Scale the data

scaler = MinMaxScaler(feature_range=(0, 1))


scaled_series = scaler.fit_transform(series.reshape(-1, 1)).flatten()

# Prepare the dataset

X, y = create_dataset(scaled_series, n_steps)

X = X.reshape((X.shape[0], X.shape[1], 1)) # Reshaping for GRU input

# Split into training and testing sets (80% train, 20% test)

split_index = int(0.8 * len(X))

X_train, X_test = X[:split_index], X[split_index:]

y_train, y_test = y[:split_index], y[split_index:]

# Build the GRU model

model = Sequential()

model.add(GRU(50, activation='tanh', return_sequences=False, input_shape=(n_steps, 1)))

model.add(Dense(1)) # Output layer for predicting a single value

# Compile the model

model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model

history = model.fit(X_train, y_train, epochs=n_epochs, batch_size=batch_size,
                    validation_data=(X_test, y_test))

# Plot the training and validation loss

plt.plot(history.history['loss'], label='Training Loss')

plt.plot(history.history['val_loss'], label='Validation Loss')

plt.title('Training and Validation Loss')

plt.legend()

plt.show()

# Make predictions on the test data


predicted = model.predict(X_test)

# Inverse transform the predicted and true values

predicted = scaler.inverse_transform(predicted)

y_test_scaled = scaler.inverse_transform(y_test.reshape(-1, 1))

# Plot the true vs. predicted values

plt.plot(y_test_scaled, label='True Values')

plt.plot(predicted, label='Predicted Values')

plt.title('True vs. Predicted Time Series')

plt.legend()

plt.show()
6/6 ━━━━━━━━━━━━━━━━━━━━ 1s 64ms/step
Conclusion :
In conclusion, implementing Gated Recurrent Units (GRUs) offers a robust and efficient
alternative to traditional Recurrent Neural Networks (RNNs) for handling sequential data. By
leveraging their gating mechanisms, GRUs address common issues such as vanishing
gradients and long-term dependency problems, making them particularly effective for tasks
involving complex sequences and time-series data.
Experiment 9

AIM: Design and implement an RNN for classification of temporal data, sequence-to-sequence data modelling, etc.

THEORY

RNNs are specifically designed to process sequential data, such as time series, text, or
audio. They can naturally capture dependencies and patterns present in sequences,
which is crucial for tasks like machine translation, text summarization, and speech
recognition.

They are commonly used in an encoder-decoder architecture, which is well suited for sequence-to-sequence tasks. The encoder processes the input sequence and creates a fixed-length context vector, which is then used by the decoder to generate the output sequence.

This architecture is highly effective in tasks like machine translation, where the input
and output have different lengths. While RNNs have limitations, such as the vanishing
gradient problem and difficulty in capturing long-range dependencies, they have been
foundational in sequence-to-sequence modelling and have laid the groundwork for
more advanced architectures like Transformers.
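As a hedged sketch of this encoder-decoder idea (the sequence-reversal task, vocabulary size, and layer sizes below are assumptions chosen only for illustration, not part of this experiment's program), a minimal Keras sequence-to-sequence model could look like:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, RepeatVector, TimeDistributed, Dense

timesteps, vocab, n_samples = 6, 10, 2000

# Toy task: learn to output the input sequence reversed
X_int = np.random.randint(0, vocab, size=(n_samples, timesteps))
y_int = X_int[:, ::-1]
X = tf.keras.utils.to_categorical(X_int, vocab)   # one-hot inputs
y = tf.keras.utils.to_categorical(y_int, vocab)   # one-hot targets

model = Sequential([
    SimpleRNN(64, input_shape=(timesteps, vocab)),       # encoder -> fixed-length context vector
    RepeatVector(timesteps),                              # feed the context to every decoder step
    SimpleRNN(64, return_sequences=True),                 # decoder
    TimeDistributed(Dense(vocab, activation='softmax'))   # per-step output distribution
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=10, batch_size=64, verbose=0)

pred = model.predict(X[:1]).argmax(axis=-1)
print("input:             ", X_int[0])
print("predicted reversed:", pred[0])

The RepeatVector layer feeds the fixed-length context vector produced by the encoder to every time step of the decoder, mirroring the encoder-decoder architecture described above.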

There are various types of sequence models based on whether the input and output of the model are sequence data or non-sequence data. They are as follows:

1) One-to-sequence (One-to-many): In a one-to-sequence model, the input data is non-sequence data and the output data is sequence data. One classic example is image captioning, where the input is a single image and the output is a sequence of words.

2) Sequence-to-one (Many-to-one): In a sequence-to-one model, the input data is sequence data and the output data is non-sequence data. For example, a sequence of words (a sentence) is fed into the network and the output is a positive or negative sentiment. This is also called sentiment analysis.

3) Sequence-to-sequence (Many-to-many): In a sequence-to-sequence model, the input data is sequence data and the output data is sequence data. Take the example of a machine translation system: the input is a sequence of words (a sentence) in one language and the output is another sequence of words in another language.

Below are some popular machine learning applications that are based on sequential data:
1. Time Series: Predicting time series, such as stock market projections.
2. Text mining and sentiment analysis.
3. Machine Translation: Given input in a single language, sequence models are used to translate the input into several languages.
4. Image Captioning: Assessing the current action and creating a caption for the image.
5. Deep recurrent neural networks for speech recognition.
6. Recurrent neural networks for creating classical music.
7. Recurrent neural networks for predicting transcription factor binding sites based on DNA sequence analysis.

Program:

# Import libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

# Generate synthetic sequence data
def generate_sequence(n_timesteps):
    X = np.random.rand(n_timesteps)
    y = np.array([np.sum(X)])
    return X, y

# Create dataset
n_samples = 1000
n_timesteps = 10

X = []
y = []

for _ in range(n_samples):
    seq_x, seq_y = generate_sequence(n_timesteps)
    X.append(seq_x)
    y.append(seq_y)

X = np.array(X)
y = np.array(y)

# Reshape y to have correct dimensions
y = y.reshape((n_samples, 1))

# Normalize the data
scaler_X = MinMaxScaler()
scaler_y = MinMaxScaler()

X = scaler_X.fit_transform(X)
y = scaler_y.fit_transform(y)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the RNN model
model = Sequential()
model.add(SimpleRNN(50, activation='relu', input_shape=(n_timesteps, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# Print model summary
model.summary()

# Reshape X_train and X_test for RNN [samples, time steps, features]
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

# Train the model
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_test, y_test), verbose=1)

# Evaluate the model
loss = model.evaluate(X_test, y_test, verbose=1)
print(f'Test Loss: {loss}')

# Make predictions
y_pred = model.predict(X_test)

# Inverse transform predictions and actual values to compare
y_pred = scaler_y.inverse_transform(y_pred)
y_test = scaler_y.inverse_transform(y_test)

print(f'First 5 predictions: {y_pred[:5]}')
print(f'First 5 actual values: {y_test[:5]}')

# Plot training & validation loss
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
plt.show()

# Plot predictions vs actual values
plt.scatter(y_test, y_pred, alpha=0.5)
plt.title('Predicted vs Actual Values')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

Output:
Conclusion:
Sequence Models are a sequence modeling technique that is used for analyzing sequence data.
There are three types of sequence models: one-to-sequence, sequence-to-one and sequence to
sequence. Sequence models can be used in different applications such as image captioning,
smart replies on chat tools and predicting movie ratings based on user feedback.
