DL Lab Manual
Experiment No. 01
Aim: To implement the AdaGrad (Adaptive Gradient) optimization algorithm for training a simple neural network.
Theory:
AdaGrad (Adaptive Gradient Algorithm) is an optimization method that adjusts the learning
rate dynamically for each parameter based on past gradients. It was designed to perform well
on sparse data and is particularly useful for problems where certain parameters (features) are
updated more frequently than others. This makes it effective for tasks like natural language
processing and recommendation systems.
The core idea of AdaGrad is to scale the learning rate of each parameter individually based
on the historical sum of squared gradients. Parameters with large gradients in the past receive
smaller updates, while parameters with smaller or infrequent gradients receive relatively
larger updates. This allows the model to handle features with varying frequencies more
effectively.
Formula:
1. Gradient Accumulation: G_t = G_{t-1} + g_t^2, where g_t is the gradient of the loss with respect to the parameter at step t and G_t is the running sum of squared gradients, kept separately for every parameter.
2. Parameter Update: θ_{t+1} = θ_t − η · g_t / (√(G_t) + ε), where η is the global learning rate and ε is a small constant (e.g. 1e-8) added for numerical stability.
Characteristics:
• Per-parameter adaptive learning rates derived from the accumulated squared gradients.
• Works well for sparse features: rarely updated parameters keep a relatively large effective learning rate.
• Requires little manual tuning of the learning-rate schedule.
• The accumulated sum only grows, so the effective learning rate shrinks monotonically and can eventually stall training.
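Before the full program, the update rule can be illustrated in isolation. The sketch below applies AdaGrad to a single parameter; the gradient values are invented purely for illustration.
import numpy as np

eta, eps = 0.1, 1e-8      # global learning rate and numerical-stability constant
theta = 0.0               # a single parameter
accum = 0.0               # running sum of squared gradients

for grad in [0.9, 0.5, 0.1, 0.05]:                  # example gradients over four steps
    accum += grad ** 2                              # 1. gradient accumulation
    theta -= eta * grad / (np.sqrt(accum) + eps)    # 2. scaled parameter update
    print(round(theta, 4))
Note how the effective step size η / √(G_t) shrinks as the accumulated sum grows, which is exactly the learning-rate decay described above.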
Program:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is already the sigmoid output, so the derivative is x * (1 - x)
    return x * (1 - x)

def forward(X, weights):
    z = X.dot(weights['W']) + weights['b']
    output = sigmoid(z)
    return output

def train_adagrad(X, y, lr=0.5, epochs=1000, eps=1e-8):
    np.random.seed(0)
    weights = {
        'W': np.random.randn(X.shape[1], 1),
        'b': np.zeros((1, 1))
    }
    # Initialize accumulator for squared gradients (AdaGrad)
    grad_squared_accum = {
        'W': np.zeros_like(weights['W']),
        'b': np.zeros_like(weights['b'])
    }
    # Training loop
    for epoch in range(epochs):
        # Forward pass
        output = forward(X, weights)
        error = output - y
        # Calculate the gradient of the output with respect to the weights
        # (using the derivative of the activation function)
        delta = error * sigmoid_derivative(output)
        gradients = {
            'W': X.T.dot(delta),
            'b': np.sum(delta, axis=0, keepdims=True)
        }
        # AdaGrad update: accumulate squared gradients and scale the learning rate
        for key in weights:
            grad_squared_accum[key] += gradients[key] ** 2
            weights[key] -= lr * gradients[key] / (np.sqrt(grad_squared_accum[key]) + eps)
        # Calculate and print the loss (mean squared error) every 100 epochs
        loss = np.mean(np.square(error))
        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Loss: {loss}")
    return weights

# Example usage:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # XOR problem
y = np.array([[0], [1], [1], [0]])
weights = train_adagrad(X, y)
output = forward(X, weights)
print(output)
Output:
Conclusion:
AdaGrad is an effective optimization algorithm for handling sparse data by adapting learning rates
based on past gradients. Its automatic adjustment simplifies training, but the learning rate decay can
slow down convergence over time. Despite this, it's well-suited for tasks with uneven feature
distributions.
Experiment No. 02
Aim: To implement a backpropagation algorithm to train a Deep
Neural Network (DNN) with at least two hidden layers.
Requirement: Python 3 with NumPy
Theory:
Deep Neural Networks (DNNs): DNNs are a class of artificial neural
networks with multiple layers between the input and output layers. Each
layer consists of neurons, and each neuron in one layer is connected to
every neuron in the next layer. DNNs are capable of modeling complex
patterns in data.
Loss Function: The loss function measures the difference between the
network’s predictions and the actual target values. Common loss
functions include:
1. Mean Squared Error (MSE) (for regression tasks): MSE = (1/n) Σ_i (y_i − ŷ_i)^2
2. Cross-Entropy Loss (for classification tasks): CE = −Σ_i y_i log(ŷ_i), which reduces to −[y log ŷ + (1 − y) log(1 − ŷ)] for binary targets. Both are sketched in code below.
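As a quick illustration, both losses can be computed directly in NumPy; the target and prediction arrays below are made-up values for a binary task.
import numpy as np

y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.9, 0.8, 0.2])

# Mean Squared Error
mse = np.mean((y_true - y_pred) ** 2)
# Binary cross-entropy
cross_entropy = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(mse, cross_entropy)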
Training Process: the network repeats a forward pass (compute the activations layer by layer), a loss computation, a backward pass (propagate the error and compute gradients with the chain rule), and a weight update (gradient descent), until the loss stops improving.
Program:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is already the sigmoid output
    return x * (1 - x)

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return np.where(x > 0, 1, 0)

# Loss function (Mean Squared Error)
def mean_squared_error(y_true, y_pred):
    return np.mean(np.square(y_true - y_pred))

# XOR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Two hidden layers with 4 neurons each, plus a single output neuron
np.random.seed(42)
weights1, bias1 = np.random.randn(2, 4), np.zeros((1, 4))
weights2, bias2 = np.random.randn(4, 4), np.zeros((1, 4))
weights3, bias3 = np.random.randn(4, 1), np.zeros((1, 1))
lr = 0.1

for epoch in range(10000):
    # Forward pass
    a1 = sigmoid(X.dot(weights1) + bias1)
    a2 = sigmoid(a1.dot(weights2) + bias2)
    output = sigmoid(a2.dot(weights3) + bias3)
    # Compute loss
    loss = mean_squared_error(y, output)
    # Backward pass
    output_error = y - output
    output_delta = output_error * sigmoid_derivative(output)
    a2_error = output_delta.dot(weights3.T)
    a2_delta = a2_error * sigmoid_derivative(a2)
    a1_error = a2_delta.dot(weights2.T)
    a1_delta = a1_error * sigmoid_derivative(a1)
    # Weight and bias updates (the error is y - output, so the updates are added)
    weights3 += lr * a2.T.dot(output_delta)
    bias3 += lr * output_delta.sum(axis=0, keepdims=True)
    weights2 += lr * a1.T.dot(a2_delta)
    bias2 += lr * a2_delta.sum(axis=0, keepdims=True)
    weights1 += lr * X.T.dot(a1_delta)
    bias1 += lr * a1_delta.sum(axis=0, keepdims=True)
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

print("Final outputs:")
print(output)
Output:
Epoch 0, Loss: 0.2668552327802867
Epoch 1000, Loss: 0.12346139338998134
Epoch 2000, Loss: 0.08922302013674174
Epoch 3000, Loss: 0.06991524317483898
Epoch 4000, Loss: 0.05816550148456629
Epoch 5000, Loss: 0.05003554790511628
Epoch 6000, Loss: 0.044306690631420665
Epoch 7000, Loss: 0.03955813726381803
Epoch 8000, Loss: 0.03551171131863506
Epoch 9000, Loss: 0.0320658267938436
Final outputs:
[[0.05232542]
[0.94755168]
[0.95114438]
[0.05815823]]
Conclusion:
In this implementation, we trained a DNN with two hidden layers using the
backpropagation algorithm. The network was able to learn the XOR problem,
a classic test case for neural networks. By iterating through the training data
and updating the weights and biases using the gradients computed during
backpropagation, the network minimized the loss and improved its predictions
over time.
Experiment 3
Aim: To design and implement a fully connected deep neural network with at least 2 hidden layers for a classification application, using an appropriate learning algorithm, output function, and loss function.
Pre-Requisite: Python 3
Theory:
Deep Neural Network for Classification:
The goal is to create and train a fully connected (dense) neural network for a classification task. The input features pass through two or more hidden layers with non-linear activations; the output layer applies a softmax (or a sigmoid for binary problems) to produce class probabilities, and the network is trained with a cross-entropy loss using an optimizer such as Adam or SGD.
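One possible implementation sketch is shown below, assuming Keras and the MNIST digit dataset; both the dataset and the layer sizes are illustrative choices rather than the exact configuration used for this experiment.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load and normalize the MNIST digit data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Fully connected network with two hidden layers
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')   # output function: softmax over 10 classes
])

# Learning algorithm: Adam; loss function: cross-entropy
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))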
Conclusion :
The design and implementation of the fully connected deep neural network were successful
in meeting the objectives of the classification task. The choice of architecture, learning
algorithm, output function, and loss function proved effective, and the model demonstrated
strong performance.
Experiment 4
Aim: To design and implement an autoencoder for image compression.
Theory:
Autoencoders are a type of neural network designed to learn efficient representations of data. They
consist of two main components: an encoder that compresses the input data into a latent-space
representation, and a decoder that reconstructs the original data from this representation. In the
context of image compression, autoencoders can learn to encode images into a compact form while
preserving as much of the original information as possible, thereby achieving compression.
We will use the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits.
Methodology :
1. Data Preprocessing: Load the image dataset, scale the pixel values to the range [0, 1], and reshape the images to 28x28x1 so that they can be fed to convolutional layers.
2. Model Architecture:
- Encoder:
- Convolutional and Pooling Layers: Extract spatial features while progressively reducing the spatial dimensions.
- Flatten and Dense Layer: Compress the resulting feature maps into a compact latent-space vector.
- Decoder:
- Dense Layer: Expands the latent-space representation back to the size of the feature maps.
- Reshape Layer: Reshapes the data to the dimensions before the convolutional layers.
- Upsampling/Transposed Convolution Layers: Restore the original 28x28 image from the feature maps.
3. Model Implementation:
- Use a deep learning framework such as TensorFlow or PyTorch.
- Compile the model with a suitable loss function (e.g., Mean Squared Error) and optimizer (e.g.,
Adam).
4. Training: Train the autoencoder to reconstruct its own input, i.e., fit the model with the images used as both input and target, monitoring a validation set to detect overfitting.
5. Evaluation:
- Calculate the reconstruction loss to measure how well the model has learned to reconstruct the
images.
- Compare the compressed image size to the original image size to determine the compression ratio.
- Visualize a few sample compressed and reconstructed images to qualitatively assess performance.
6. Analysis: Discuss the trade-off between compression ratio and reconstruction quality, and how the size of the latent space affects both. A compact implementation sketch is given below.
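The sketch below uses Keras; the latent size of 64 and the layer widths are illustrative assumptions rather than the exact configuration behind the results that follow.
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 64   # assumed size of the compressed representation

# Load MNIST and scale pixel values to [0, 1]
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Encoder: convolution + pooling, then a dense layer down to the latent vector
encoder = models.Sequential([
    layers.Conv2D(32, 3, activation='relu', padding='same', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation='relu', padding='same'),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(latent_dim)
])

# Decoder: dense layer expands the latent vector, reshape, then upsample back to 28x28
decoder = models.Sequential([
    layers.Dense(7 * 7 * 64, activation='relu', input_shape=(latent_dim,)),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(64, 3, strides=2, activation='relu', padding='same'),
    layers.Conv2DTranspose(32, 3, strides=2, activation='relu', padding='same'),
    layers.Conv2D(1, 3, activation='sigmoid', padding='same')
])

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256,
                validation_data=(x_test, x_test))
reconstructed = autoencoder.predict(x_test)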
Output:
Epoch 1/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 210s 847ms/step - loss: 0.6328 - val_loss: 0.5807
Epoch 2/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 154s 786ms/step - loss: 0.5781 - val_loss: 0.5741
Epoch 3/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 151s 770ms/step - loss: 0.5731 - val_loss: 0.5722
Epoch 4/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 150s 766ms/step - loss: 0.5707 - val_loss: 0.5699
Epoch 5/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 153s 782ms/step - loss: 0.5699 - val_loss: 0.5684
Epoch 6/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 153s 779ms/step - loss: 0.5681 - val_loss: 0.5680
Epoch 7/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 154s 783ms/step - loss: 0.5665 - val_loss: 0.5663
Epoch 8/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 152s 776ms/step - loss: 0.5658 - val_loss: 0.5657
Epoch 9/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 216s 849ms/step - loss: 0.5641 - val_loss: 0.5643
Epoch 10/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 157s 803ms/step - loss: 0.5634 - val_loss: 0.5640
Epoch 11/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 192s 749ms/step - loss: 0.5633 - val_loss: 0.5631
Epoch 12/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 143s 728ms/step - loss: 0.5628 - val_loss: 0.5627
Epoch 13/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 143s 728ms/step - loss: 0.5624 - val_loss: 0.5627
Epoch 14/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 143s 731ms/step - loss: 0.5621 - val_loss: 0.5620
Epoch 15/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 742ms/step - loss: 0.5611 - val_loss: 0.5616
Epoch 16/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 741ms/step - loss: 0.5609 - val_loss: 0.5612
Epoch 17/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 146s 747ms/step - loss: 0.5612 - val_loss: 0.5615
Epoch 18/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 152s 775ms/step - loss: 0.5612 - val_loss: 0.5609
Epoch 19/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 205s 793ms/step - loss: 0.5597 - val_loss: 0.5607
Epoch 20/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 742ms/step - loss: 0.5593 - val_loss: 0.5610
Epoch 21/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 161s 820ms/step - loss: 0.5594 - val_loss: 0.5603
Epoch 22/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 162s 825ms/step - loss: 0.5599 - val_loss: 0.5600
Epoch 23/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 157s 801ms/step - loss: 0.5594 - val_loss: 0.5602
Epoch 24/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 150s 767ms/step - loss: 0.5595 - val_loss: 0.5597
Epoch 25/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 740ms/step - loss: 0.5595 - val_loss: 0.5599
Epoch 26/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 153s 779ms/step - loss: 0.5588 - val_loss: 0.5598
Epoch 27/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 156s 798ms/step - loss: 0.5584 - val_loss: 0.5596
Epoch 28/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 164s 836ms/step - loss: 0.5585 - val_loss: 0.5591
Epoch 29/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 151s 773ms/step - loss: 0.5581 - val_loss: 0.5593
Epoch 30/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 754ms/step - loss: 0.5583 - val_loss: 0.5591
Epoch 31/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 757ms/step - loss: 0.5580 - val_loss: 0.5616
Epoch 32/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 757ms/step - loss: 0.5581 - val_loss: 0.5590
Epoch 33/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 159s 813ms/step - loss: 0.5582 - val_loss: 0.5596
Epoch 34/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 205s 828ms/step - loss: 0.5584 - val_loss: 0.5595
Epoch 35/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 167s 850ms/step - loss: 0.5574 - val_loss: 0.5584
Epoch 36/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 163s 834ms/step - loss: 0.5579 - val_loss: 0.5584
Epoch 37/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 168s 857ms/step - loss: 0.5576 - val_loss: 0.5585
Epoch 38/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 165s 844ms/step - loss: 0.5576 - val_loss: 0.5584
Epoch 39/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 171s 871ms/step - loss: 0.5573 - val_loss: 0.5584
Epoch 40/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 152s 776ms/step - loss: 0.5574 - val_loss: 0.5591
Epoch 41/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 755ms/step - loss: 0.5569 - val_loss: 0.5591
Epoch 42/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 145s 742ms/step - loss: 0.5573 - val_loss: 0.5578
Epoch 43/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 754ms/step - loss: 0.5578 - val_loss: 0.5577
Epoch 44/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 162s 829ms/step - loss: 0.5576 - val_loss: 0.5578
Epoch 45/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 146s 742ms/step - loss: 0.5575 - val_loss: 0.5577
Epoch 46/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 146s 747ms/step - loss: 0.5564 - val_loss: 0.5575
Epoch 47/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 148s 756ms/step - loss: 0.5566 - val_loss: 0.5581
Epoch 48/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 165s 841ms/step - loss: 0.5562 - val_loss: 0.5575
Epoch 49/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 164s 834ms/step - loss: 0.5565 - val_loss: 0.5579
Epoch 50/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 170s 870ms/step - loss: 0.5565 - val_loss: 0.5574
313/313 ━━━━━━━━━━━━━━━━━━━━ 17s 42ms/step
Conclusion:
The convolutional autoencoder learned a compact latent representation of the images: the reconstruction loss decreased steadily during training and levelled off at a validation loss of about 0.557 after 50 epochs, indicating that the compressed representation retains enough information to reconstruct the inputs reasonably well.
Experiment 5
Output:
Epoch 1/5
1875/1875 [==============================] - 59s 31ms/step - loss:
0.1407 - accuracy: 0.9565 - val_loss: 0.0556 - val_accuracy: 0.9822
Epoch 2/5
1875/1875 [==============================] - 56s 30ms/step - loss:
0.0454 - accuracy: 0.9861 - val_loss: 0.0298 - val_accuracy: 0.9895
Epoch 3/5
1875/1875 [==============================] - 57s 31ms/step - loss:
0.0327 - accuracy: 0.9903 - val_loss: 0.0394 - val_accuracy: 0.9877
Epoch 4/5
1875/1875 [==============================] - 56s 30ms/step - loss:
0.0247 - accuracy: 0.9926 - val_loss: 0.0292 - val_accuracy: 0.9918
Epoch 5/5
1875/1875 [==============================] - 55s 29ms/step - loss:
0.0196 - accuracy: 0.9939 - val_loss: 0.0304 - val_accuracy: 0.9919
313/313 [==============================] - 3s 9ms/step - loss: 0.0304 -
accuracy: 0.9919
Test accuracy: 0.9919000267982483
Conclusion: The CNN achieved the highest accuracy for handwritten digit recognition, reaching a test accuracy of about 99.2%. This shows that CNNs are particularly well suited to image-based classification problems.
Experiment 6
Aim: Design and Implement a CNN model for image classification.
Pre-Requisite: Python 3.4 or above
Theory:
CNN:
A Convolutional Neural Network (CNN) is a type of deep learning algorithm that is particularly well-
suited for image recognition and processing tasks. It is made up of multiple layers, including
convolutional layers, pooling layers, and fully connected layers.
CNNs are trained on large datasets of labeled images, where the network learns to recognize patterns and features associated with specific objects or classes. They have proven to be highly effective in image-related tasks, achieving state-of-the-art performance in many computer vision applications. Their ability to automatically learn hierarchical representations of features makes them well suited to tasks where the spatial relationships and patterns in the data are crucial for accurate predictions. CNNs are widely used in areas such as image classification, object detection, facial recognition, and medical image analysis.
Image Classification :
Image classification is the process of assigning a label to an image from a predefined set of
categories. This is commonly achieved using convolutional neural networks (CNNs), which
are adept at learning hierarchical features from images. CNNs use convolutional layers to
detect edges and textures, pooling layers to reduce spatial dimensions, and fully connected
layers for classification. By training on large datasets, CNNs can accurately classify images
into various categories such as animals, objects, and scenes. This technology is widely used in
applications like object detection, face recognition, medical imaging, and autonomous driving,
enabling machines to interpret and understand visual data.
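A minimal Keras sketch of such a CNN classifier is shown below; the MNIST digits and the layer sizes are illustrative assumptions, not the exact configuration used in this experiment.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST, scale to [0, 1], and add a channel dimension for the conv layers
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test))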
Experiment 7
Aim: To design and implement an LSTM (Long Short-Term Memory) network for sequence prediction.
Theory:
LSTM networks are a type of recurrent neural network designed to capture long-term dependencies in sequential data. Their main components are:
• Cell State: The cell state is the key component of LSTMs, acting as a conveyor belt that carries information across the sequence. It allows information to flow largely unchanged, which helps in maintaining long-term dependencies.
• Gates: LSTMs use three gates (forget, input, and output) to control the flow of information; their roles are described below, and the corresponding equations are listed after this section.
Working Mechanism:
1. Forget Gate: The forget gate takes the previous hidden state and the
current input to decide what information to discard from the cell
state. This is crucial for removing irrelevant information and focusing
on important data.
2. Input Gate: The input gate updates the cell state with new
information. It decides which values from the input should be updated
in the cell state.
3. Cell State Update: The cell state is updated by combining the old cell
state (after forgetting some information) and the new candidate
values (from the input gate).
4. Output Gate: The output gate decides what the next hidden state
should be. This hidden state is used for predictions and also fed back
into the network for the next time step.
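For reference, one standard formulation of these four steps as equations (σ is the sigmoid function, ⊙ denotes element-wise multiplication, x_t is the current input and h_{t-1} the previous hidden state):
Forget gate: f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
Input gate: i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
Candidate cell state: c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
Cell state update: c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
Output gate: o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
Hidden state: h_t = o_t ⊙ tanh(c_t)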
Applications: LSTM networks have been successfully applied in various fields:
Time Series Forecasting: LSTMs are used for predicting future values in
a time series, such as stock prices, weather conditions, and sales
forecasting. Their ability to remember past information makes them
ideal for these tasks.
Program:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Generate training windows from a sine wave: each sample is a window of the
# wave and the target is the value that follows the window.
def generate_sine_wave(seq_length, num_samples):
    X = []
    y = []
    for _ in range(num_samples):
        start = np.random.uniform(0, 2 * np.pi)
        x = np.sin(start + np.arange(seq_length))
        X.append(x)
        y.append(np.sin(start + seq_length))
    return np.array(X)[..., np.newaxis], np.array(y)

seq_length = 50
num_samples = 1000
X, y = generate_sine_wave(seq_length, num_samples)

# LSTM model: one LSTM layer followed by a single output neuron
model = Sequential()
model.add(LSTM(50, input_shape=(seq_length, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()

model.fit(X, y, epochs=10, batch_size=32)

# Predict on the training windows and compare with the true next values
y_pred = model.predict(X)
plt.plot(y[:100], label='True')
plt.plot(y_pred[:100], label='Predicted')
plt.legend()
plt.show()
Model Summary
Output
Experiment 8
Aim: To design and implement a GRU (Gated Recurrent Unit) network for sequential data modelling.
Theory:
GRU:
GRU (Gated Recurrent Unit) is a type of Recurrent Neural Network (RNN) architecture,
commonly used in natural language processing tasks like chatbot implementation. GRU is a
simpler alternative to LSTM (Long Short-Term Memory), both of which are designed to
address the problem of vanishing gradients in traditional RNNs by using gating mechanisms
to control the flow of information.
1. Input Representation: Each word of the input is tokenized and converted into a vector (for example, through an embedding layer) before it is fed to the network.
2. GRU Cell: Each token is fed into the GRU cell one at a time. The GRU cell keeps track of the state using its gating mechanism.
● Update Gate: Decides how much of the hidden state should be carried forward.
● Reset Gate: Determines how much of the previous state should be discarded.
3. Hidden State: This is passed from one time step to the next, allowing the model to
"remember" the context of the conversation.
4. Prediction Layer: After processing the entire sequence of inputs, the hidden state is passed through a fully connected layer to predict the chatbot’s response or the next word in the sentence (a minimal model sketch follows this list).
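A minimal Keras sketch of this token → GRU → prediction pipeline is shown below; the vocabulary size, embedding dimension, and number of GRU units are illustrative assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

vocab_size = 5000   # assumed vocabulary size
model = Sequential([
    Embedding(vocab_size, 64),                # token ids -> dense vectors
    GRU(128),                                 # final hidden state summarises the sequence
    Dense(vocab_size, activation='softmax')   # predict the next word / response token
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')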
GRU Equations
Let x_t be the input at time step t, h_{t-1} the previous hidden state, σ the sigmoid function, and ⊙ element-wise multiplication.
Update Gate: z_t = σ(W_z x_t + U_z h_{t-1})
Reset Gate: r_t = σ(W_r x_t + U_r h_{t-1})
Candidate Hidden State: h̃_t = tanh(W x_t + U (r_t ⊙ h_{t-1}))
This is the candidate hidden state for the current time step.
Hidden State: h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
This is the actual hidden state at time step t, a combination of the previous hidden state and the new memory.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

def generate_time_series(n_steps):
    # Synthetic series: a sine wave with a small amount of Gaussian noise
    t = np.arange(n_steps)
    series = np.sin(0.1 * t) + 0.1 * np.random.randn(n_steps)
    return series

def create_dataset(series, n_steps):
    # Each input is a window of n_steps values; the target is the next value
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    return np.array(X), np.array(y)

# Hyperparameters
n_data_points = 1000
n_steps = 50

series = generate_time_series(n_data_points)
plt.plot(series)
plt.show()

# Scale to [0, 1] and build the windowed dataset
scaler = MinMaxScaler()
scaled_series = scaler.fit_transform(series.reshape(-1, 1)).flatten()
X, y = create_dataset(scaled_series, n_steps)
X = X[..., np.newaxis]

# Split into training and testing sets (80% train, 20% test)
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# GRU model
model = Sequential()
model.add(GRU(50, input_shape=(n_steps, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.legend()
plt.show()

# Predict on the test set and undo the scaling before plotting
predicted = model.predict(X_test)
predicted = scaler.inverse_transform(predicted)
actual = scaler.inverse_transform(y_test.reshape(-1, 1))
plt.plot(actual, label='Actual')
plt.plot(predicted, label='Predicted')
plt.legend()
plt.show()
Output:
6/6 ━━━━━━━━━━━━━━━━━━━━ 1s 64ms/step
Conclusion :
In conclusion, implementing Gated Recurrent Units (GRUs) offers a robust and efficient
alternative to traditional Recurrent Neural Networks (RNNs) for handling sequential data. By
leveraging their gating mechanisms, GRUs address common issues such as vanishing
gradients and long-term dependency problems, making them particularly effective for tasks
involving complex sequences and time-series data.
Experiment 9
AIM: To design and implement an RNN for classification of temporal data and sequence-to-sequence data modelling.
THEORY
RNNs are specifically designed to process sequential data, such as time series, text, or
audio. They can naturally capture dependencies and patterns present in sequences,
which is crucial for tasks like machine translation, text summarization, and speech
recognition.
This architecture is highly effective in tasks like machine translation, where the input
and output have different lengths. While RNNs have limitations, such as the vanishing
gradient problem and difficulty in capturing long-range dependencies, they have been
foundational in sequence-to-sequence modelling and have laid the groundwork for
more advanced architectures like Transformers.
There are various types of sequence models, depending on whether the input and the output of the model are sequence data or non-sequence data: one-to-sequence, sequence-to-one, and sequence-to-sequence models.
Below are some popular machine learning applications that are based on sequential data:
1. Time series forecasting: predicting future values of a series, such as stock market prices.
2. Text mining and sentiment analysis.
3. Machine translation: given input in one language, sequence models translate it into other languages.
4. Image captioning: assessing the content of an image and generating a caption for it.
5. Speech recognition using deep recurrent neural networks.
6. Music generation: recurrent neural networks are used to compose classical music.
7. Predicting transcription-factor binding sites from DNA sequence analysis.
Program:
1. Import libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

2. Create dataset
# Each sample is a short random sequence; the target is assumed here to be the
# sum of the sequence (the original generate_sequence definition is not shown).
def generate_sequence(n_timesteps):
    seq_x = np.random.rand(n_timesteps)
    seq_y = np.sum(seq_x)
    return seq_x, seq_y

n_samples = 1000
n_timesteps = 10
X = []
y = []
for _ in range(n_samples):
    seq_x, seq_y = generate_sequence(n_timesteps)
    X.append(seq_x)
    y.append(seq_y)
X = np.array(X)
y = np.array(y).reshape(-1, 1)

3. Scale the data and split into training and testing sets
scaler_X = MinMaxScaler()
scaler_y = MinMaxScaler()
X = scaler_X.fit_transform(X)
y = scaler_y.fit_transform(y)
X = X[..., np.newaxis]          # shape (samples, timesteps, features)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

4. Build and train the RNN, then predict
model = Sequential()
model.add(SimpleRNN(32, input_shape=(n_timesteps, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

# Make predictions
y_pred = model.predict(X_test)
print(scaler_y.inverse_transform(y_pred[:5]))
Output:
Conclusion:
Sequence models are modelling techniques used for analyzing sequential data. There are three types of sequence models: one-to-sequence, sequence-to-one, and sequence-to-sequence. Sequence models can be used in applications such as image captioning, smart replies in chat tools, and predicting movie ratings based on user feedback.