DEEP LEARNING LAB Manuals
DEEP LEARNING LAB Manuals
Lab Manual
CS704 Departmental Elective Lab
( Deep & Reinforcement Learning)
NAME:
ENROLLMENT NUMBER:
Faculty Incharge
Prof. Shiv Kumar Tiwari
CSE, GGCT
Session: 2023-24
SESSION-2023-24
Vision & Mission of
Institute
SEM-7TH(ODD)
MISSION OF INSTITUTE
• To provide quality professional education using latest ICT tools and incorporating latest
developments
• To create world class infrastructure that meets both curriculum and research &
innovation
• To promote professional ethics, leadership skills and societal responsibilities
• To prepare students for lifelong learning to meet future challenges
• To promote interdisciplinary culture leading to world class translational research
aligned with the sustainable development goals
SESSION-2023-24
Vision & Mission Of
Department
SEM-7TH(ODD)
To provide strong fundamentals and technical skills in Computer Science and Engineering
through advanced teaching learning methodologies.
To transform lives of the students by nurturing ethical values, creativity and novelty to become
Entrepreneurs and establish start-ups.
To focus on sustainable solutions and improve the quality of life for the welfare of society
To inculcate research attitude among the students and to provide opportunities to carryout
inter-disciplinary research through the linkages with industry and academia.
SESSION-2023-24
Program Educational
Objectives
SEM-7TH(ODD)
•PE01. Occupy position in national and multinational organizations and work as individual
•PEO2 : Continue to learn and communicate in the core and allied areas of Computer
•PEO 3: Become an entrepreneur in the domain of Computer Science and Engineering and
Program Outcomes
After Successful completion of the program, the student will be able to:
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the
professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings
10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with the society at large, such as, being able to comprehend and write effective reports
and design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering
and management principles and apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
SESSION-2023-24
Program Specific
Outcomes
SEM-7TH(ODD)
PSO1: Understand various concepts of Computing, Statistics, Mathematics and basic sciences
applicable to the domain and solve problems using fundamental knowledge.
PSO2: Apply the computing knowledge for professional software development and develop
strong problem solving, analyzing, design a system, component or process and decision-
making abilities.
PSO3: Use programming languages, tools and techniques to conduct research and demonstrate
appropriate emerging skills to provide solutions to problems in various interdisciplinary fields.
List of Experiments
#load training data and split into train and test sets
mnist = tf.keras.datasets.mnist
The expected output upon execution of the above code snippet is:
Epoch 1/5
1875/1875 [==============================] - 7s 3ms/step - loss: 0.2954 - accuracy: 0.9138
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.1422 - accuracy: 0.9572
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1070 - accuracy: 0.9664
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0886 - accuracy: 0.9721
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0753 - accuracy: 0.9764
<keras.callbacks.History at 0x7f84a21a2760>
Next, we test the model accuracy on test set:
#test model accuracy on test set
model.evaluate(x_test,y_test,verbose=2)
The expected output upon execution of the above code snippet is:
313/313 - 1s - loss: 0.0809 - accuracy: 0.9744 - 655ms/epoch - 2ms/step
[0.0809035375714302, 0.974399983882904]
Next, we extend the base model to predict softmax output:
#extend the base model to predict softmax output
probability_model = tf.keras.Sequential([model,
tf.keras.layers.Softmax()])
Installing packages in Google Colab
One can use the code cell in Colab not only to run Python code but also to run shell commands.
Just add a ! before a command. The exclamation point tells the notebook cell to run the following
command as a shell command. Most general packages needed for deep learning come pre-
installed. In some cases, you might need less popular libraries, or you might need to run code on
a different version of a library. To do this, you’ll need to install packages manually.
The package manager used for installing packages is pip. To install a particular version of
TensorFlow use this command:
!pip3 install tensorflow
Downloading a dataset
When you’re training a machine learning model on your local machine, you’re likely to have
trouble with the storage and bandwidth costs that come with downloading and storing the dataset
required for training a model. Deep learning datasets can be massive in size, ranging between 20
to 50 Gb. Downloading them is most challenging if you’re living in a developing country, where
getting high-speed internet isn’t possible. The most efficient way to use datasets is to use a cloud
interface to download them, rather than manually uploading the dataset from a local
machine. Thankfully, Colab gives us a variety of ways to download the dataset from common
data hosting platforms.
To download an existing dataset from Kaggle, we can follow the steps outlined below:
1. Go to your Kaggle Account and click on “Create New API Token”. This will download a
kaggle.json file to your machine.
2. Go to your Google Colab project file, and run the following commands:
! pip install -q kaggle
from google.colab import files
NOTE: The curl command will download the dataset in the Colab workspace, which will be lost
every time the runtime is disconnected. Hence a safe practice is to move the dataset into your
cloud drive as soon as the dataset is downloaded completely.
Downloading the dataset from GCP or Google Drive
Google Cloud Platform is a cloud computing and storage platform. You can use it to store large
datasets, and you can import that dataset directly from the cloud into Colab. To upload and
download files on GCP, first you need to authenticate your Google account.
from google.colab import auth
auth.authenticate_user()
After that, install gsutil to upload and download files, and then init gcloud.
!curl https://sdk.cloud.google.com | bash
!gcloud init
Once you have configured these options, you can use the following commands to
download/upload files to and from Google Cloud Storage.
Initiating a runtime with GPU/TPU enabled
Deep learning is a computationally expensive process, a lot of calculations need to be executed at
the same time to train a model. To mitigate this issue, Google Colab offers us not only the classic
CPU runtime but also an option for a GPU and TPU runtime as well.
The CPU runtime is best for training large models because of the high memory it provides. The
GPU runtime shows better flexibility and programmability for irregular computations, such as
small batches and nonMatMul computations. The TPU runtime is highly-optimized for large
batches and CNNs and has the highest training throughput. If you have a smaller model to train, I
suggest training the model on GPU/TPU runtime to use Colab to its full potential.
To create a GPU/TPU enabled runtime, you can click on runtime in the toolbar menu below the
file name. From there, click on “Change runtime type”, and then select GPU or TPU under the
Hardware Accelerator dropdown menu.
Exercise: Try yourself uploading a dataset and perform basic data pre- processing
on it.
2. Study and implementation of feed forward Neural
Network
What is a (Neural Network) NN?
Simple NN graph:
Output:
labels_orig = labels
#converting the multi-class to binary class
labels = np.mod(labels_orig,2)
Output:
Output:
(750, 2) (250, 2)
Implementation of Feed Forward Neural self.db1 = (self.h3-y) * self.h3*(1-
Network code self.h3) * self.w5 * self.h1*(1-self.h1)
def sigmoid(sop):
return 1.0/(1+numpy.exp(-1*sop))
def error(predicted, target):
return numpy.power(predicted-target, 2)
def error_predicted_deriv(predicted, target):
return 2*(predicted-target)
def sigmoid_sop_deriv(sop):
return sigmoid(sop)*(1.0-sigmoid(sop))
def sop_w_deriv(x):
return x
def update_w(w, grad, learning_rate):
return w - learning_rate*grad
x1=0.1
x2=0.4
target = 0.7
learning_rate = 0.01
w1=numpy.random.rand()
w2=numpy.random.rand()
predicted_output = []
network_error = []
old_err = 0
for k in range(80000):
# Forward Pass
y = w1*x1 + w2*x2
predicted = sigmoid(y)
err = error(predicted, target)
predicted_output.append(predicted)
network_error.append(err)
# Backward Pass
g1 = error_predicted_deriv(predicted, target)
g2 = sigmoid_sop_deriv(y)
g3w1 = sop_w_deriv(x1)
g3w2 = sop_w_deriv(x2)
gradw1 = g3w1*g2*g1
gradw2 = g3w2*g2*g1
Output:
Streaming output truncated to the last 5000 lines.
0.6999984864252651
0.6999984866522116
0.6999984868791244
0.699998487106003
0.6999984873328476
.
.
.
.
0.6999992843293953
0.6999992844367032
0.6999992845439951
0.6999992846512709
0.6999992847585306
Example: Execute the error plots for 80,000 epochs and show the saturated error outputvalue.
Also write a short note about your analysis with the output value.
4. Implement Batch gradient descent, Stochastic gradient
descent and mini batch gradient descent.
Objective:
To implement Batch gradient descent, Stochastic gradient descent and mini batch gradient
descent.
Batch gradient descent uses all training samples in forward pass to calculate cumulative error and then we
adjust weights using derivatives. In stochastic GD, we randomly pick one training sample, perform
forward pass, compute the error and immediately adjust weights.
So the key difference here is that to adjust weights, batch GD will use all training samples where as
stochastic GD will use one randomly picked training sample.
Mini batch is intermediate version of batch GD and stochastic GD. In mini gradient descent you will use a
batch of samples in each iteration. For example if you have total 50 training samples, you can take a batch
of 10 samples, calculate cumulative error for those 10 samples and then adjust weights.
To summarize it, In SGD we adjust weights after every one sample. In Batch we adjust weights after
going through all samples but in mini batch we do after every m samples (where m is batch size and it is 0
< m < n, where n is total number of samples).
Gradient descent allows you to find weights (w1,w2,w3) and bias in the following linear equation for
housing price prediction.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
df = pd.read_csv("homeprices.csv")
df.sample(5)
output:
index,area,bedrooms,price
5,1170,2,38.0
14,2250,3,101.0
8,1310,3,50.0
10,1800,3,82.0
0,1056,2,39.07
scaled_X = sx.fit_transform(df.drop('price',axis='columns'))
scaled_y = sy.fit_transform(df['price'].values.reshape(df.shape[0],1))
scaled_X
scaled_y
scaled_y.reshape(20,)
number_of_features = X.shape[1]
# numpy array with 1 row and columns equal to number of features. In
# our case number_of_features = 2 (area, bedroom)
w = np.ones(shape=(number_of_features))
b = 0
total_samples = X.shape[0] # number of rows in X
cost_list = []
epoch_list = []
for i in range(epochs):
y_predicted = np.dot(w, X.T) + b
w_grad = -(2/total_samples)*(X.T.dot(y_true-y_predicted))
b_grad = -(2/total_samples)*np.sum(y_true-y_predicted)
w = w - learning_rate * w_grad
b = b - learning_rate * b_grad
cost = np.mean(np.square(y_true-
y_predicted)) # MSE (Mean Squared Error)
if i%10==0:
cost_list.append(cost)
epoch_list.append(i)
Output:
(array([0.70712464, 0.67456527]), -0.23034857438407427, 0.0068641890429808105)
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list,cost_list)
Output:
def predict(area,bedrooms,w,b):
scaled_X = sx.transform([[area, bedrooms]])[0]
# here w1 = w[0] , w2 = w[1], w3 = w[2] and bias is b
# equation for price is w1*area + w2*bedrooms + w3*age + bias
# scaled_X[0] is area
# scaled_X[1] is bedrooms
# scaled_X[2] is age
scaled_price = w[0] * scaled_X[0] + w[1] * scaled_X[1] + b
# once we get price prediction we need to to rescal it back to origina
l value
# also since it returns 2D array, to get single value we need to do va
lue[0][0]
return sy.inverse_transform([[scaled_price]])[0][0]
predict(2600,4,w,b)
predict(1000,2,w,b)
predict(1500,3,w,b)
number_of_features = X.shape[1]
# numpy array with 1 row and columns equal to number of features. In
# our case number_of_features = 3 (area, bedroom and age)
w = np.ones(shape=(number_of_features))
b = 0
total_samples = X.shape[0]
cost_list = []
epoch_list = []
for i in range(epochs):
random_index = random.randint(0,total_samples-
1) # random index from total samples
sample_x = X[random_index]
sample_y = y_true[random_index]
w_grad = -(2/total_samples)*(sample_x.T.dot(sample_y-y_predicted))
b_grad = -(2/total_samples)*(sample_y-y_predicted)
w = w - learning_rate * w_grad
b = b - learning_rate * b_grad
cost = np.square(sample_y-y_predicted)
Output:
(array([0.70999659, 0.67807531]), -0.23262199997984362,
0.011241941221627246)
w , b
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list_sgd,cost_list_sgd)
Output:
predict(2600,4,w_sgd, b_sgd)
predict(1000,2,w_sgd, b_sgd)
np.random.permutation(20)
Output:
array([16, 10, 7, 8, 3, 17, 15, 13, 2, 12, 4, 1, 18, 9, 14, 6, 5, 19, 0,
11])
number_of_features = X.shape[1]
# numpy array with 1 row and columns equal to number of features. In
# our case number_of_features = 3 (area, bedroom and age)
w = np.ones(shape=(number_of_features))
b = 0
total_samples = X.shape[0] # number of rows in X
cost_list = []
epoch_list = []
num_batches = int(total_samples/batch_size)
for i in range(epochs):
random_indices = np.random.permutation(total_samples)
X_tmp = X[random_indices]
y_tmp = y_true[random_indices]
for j in range(0,total_samples,batch_size):
Xj = X_tmp[j:j+batch_size]
yj = y_tmp[j:j+batch_size]
y_predicted = np.dot(w, Xj.T) + b
w_grad = -(2/len(Xj))*(Xj.T.dot(yj-y_predicted))
b_grad = -(2/len(Xj))*np.sum(yj-y_predicted)
w = w - learning_rate * w_grad
b = b - learning_rate * b_grad
cost = np.mean(np.square(yj-
y_predicted)) # MSE (Mean Squared Error)
if i%10==0:
cost_list.append(cost)
epoch_list.append(i)
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list,cost_list)
Output:
def predict(area,bedrooms,w,b):
scaled_X = sx.transform([[area, bedrooms]])[0]
# here w1 = w[0] , w2 = w[1], w3 = w[2] and bias is b
# equation for price is w1*area + w2*bedrooms + w3*age + bias
# scaled_X[0] is area
# scaled_X[1] is bedrooms
# scaled_X[2] is age
scaled_price = w[0] * scaled_X[0] + w[1] * scaled_X[1] + b
# once we get price prediction we need to to rescal it back to origina
l value
# also since it returns 2D array, to get single value we need to do va
lue[0][0]
return sy.inverse_transform([[scaled_price]])[0][0]
predict(2600,4,w,b)
Output:
128.65424087579652
predict(1000,2,w,b)
Output:
29.9855861683337
Exercise:
Implement Gradient Descent for Neural Network (or Logistic Regression) by Predicting if a
person would buy life insurance based on his age.
5. Study the effect of batch normalization and dropout in
neural network classifier PCA.
Introduction
In this lab exercise, we will discuss why we need batch normalization and dropout in deep neural
networks followed by experiments using Pytorch on a standard data set to see the effects of batch
normalization and dropout.
Batch Normalization
By normalizing the inputs we are able to bring all the inputs features to the same scale. In the
neural network, we need to compute the pre-activation for the first neuron of the first layer a₁₁.
We know that pre-activation is nothing but the weighted sum of inputs plus bias. In other words,
it is the dot product between the first row of the weight matrix W₁ and the input matrix X plus
bias b₁₁.
The mathematical equation for pre-activation at each layer ‘i’ is given by,
The activation at each layer is equal to applying the activation function to the output of the pre-
activation of that layer. The mathematical equation for the activation at each layer ‘i’ is given by,
In order to bring all the activation values to the same scale, we normalize the activation values
such that the hidden representation doesn’t vary drastically and also helps us to get improvement
in the training speed.
Why is it called batch normalization?
Since we are computing the mean and standard deviation from a single batch as opposed to
computing it from the entire data. Batch normalization is done individually at each hidden
neuron in the network.
To get a better insight into how batch normalization helps in faster converge of the network, we
will look at the distribution of values across multiple hidden layers in the network during the
training phase.
For consistency, we will plot the output of the second linear layer from the two networks and
compare the distributions of the output from that layer across the networks. The results look like
this:
From the graphs, we can conclude that the distribution of values without batch normalization has
changed significantly between iterations of inputs within each epoch which means that the
subsequent layers in the network without batch normalization are seeing a varying distribution of
input data. But the change in the distribution of values for the model with batch normalization
seems to be slightly negligible.
Dropout
In this section of the lab, we discuss the concept of dropout in neural networks specifically how
it helps to reduce overfitting and generalization error. After that, we will implement a neural
network with and without dropout to see how dropout influences the performance of a network
using Pytorch.
Dropout is a regularization technique that “drops out” or “deactivates” few neurons in the neural
network randomly in order to avoid the problem of overfitting.
The idea of Dropout
Training one deep neural network with large parameters on the data might lead to overfitting.
Can we train multiple neural networks with different configurations on the same dataset and take
the average value of these predictions?.
But creating an ensemble of neural networks with different architectures and training them
wouldn’t be feasible in practice. Dropout to the rescue.
Dropout deactivates the neurons randomly at each training step instead of training the data on the
original network, we train the data on the network with dropped out nodes. In the next iteration
of the training step, the hidden neurons which are deactivated by dropout changes because of its
probabilistic behavior. In this way, by applying dropout i.e…deactivating certain individual
nodes at random during training we can simulate an ensemble of neural network with different
architectures.
Dropout at Training time
In each training iteration, each node in the network is associated with a probability p whether to
keep in the network or to deactivate it (dropout) out of the network with probability 1-p. That
means the weights associated with the nodes got updated only p fraction of times because nodes
are active only p times during training.
Dropout at Test time
During test time, we consider the original neural network with all activations present and scale
the output of each node by a value p. Since each node is activated the only p times.
To show the overfitting, we will train two networks — one without dropout and another with
dropout. The network without dropout has 3 fully connected hidden layers with ReLU as the
activation function for the hidden layers and the network with dropout also has similar
architecture but with dropout applied after first & second Linear layer.
In this example, I have used a dropout fraction of 0.5 after the first linear layer and 0.2 after the
second linear layer. Once we train the two different models i.e…one without dropout and another
with dropout and plot the test results, it would look like this:
From the above graphs, we can conclude that as we increase the number of epochs the model
without dropout is overfitting the data. The model without dropout is learning the noise
associated with the data instead of generalizing for the data. We can see that the loss associated
with the model without drop increases as we increase the number of epochs unlike the loss
associated with the model with dropout.
Example:
Outline
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import seaborn as sns
Dataset and visualisation
trainset = torchvision.datasets.MNIST(root='./data', train=True,
download=True,
transform=transforms.ToTensor())
def imshow(img, title):
plt.figure(figsize=(batch_size * 4, 4))
plt.axis('off')
plt.imshow(np.transpose(img, (1, 2, 0)))
plt.title(title)
plt.show()
def show_batch_images(dataloader):
images, labels = next(iter(dataloader))
img = torchvision.utils.make_grid(images)
imshow(img, title=[str(x.item()) for x in labels])
Batch Normalisation
class MyNetBN(nn.Module): nn.Linear(48, 24),
def __init (self): nn.BatchNorm1d(24),
super(MyNetBN, self). ini nn.ReLU(),
t () nn.Linear(24, 10)
self.classifier = nn.Seque )
ntial(
nn.Linear(784, 48), def forward(self, x):
nn.BatchNorm1d(48), x = x.view(x.size(0), -1)
nn.ReLU(), x = self.classifier(x)
return x b = model_bn.classifier[1](b)
model_bn = MyNetBN() b = b.detach().numpy().ravel()
print(model_bn)
batch_size = 512 sns.distplot(b, kde=True, color='g
trainloader = torch.utils.data.Dat ', label='BatchNorm')
aLoader(trainset, batch_size=batch plt.title('%d: Loss = %0.2f, Loss
_size, shuffle=True) with bn = %0.2f' % (i, loss.item()
loss_fn = nn.CrossEntropyLoss() , loss_bn.item()))
opt_bn = optim.SGD(model_bn.parame plt.legend()
ters(), lr=0.01) plt.show()
plt.pause(0.5)
oss_arr = []
loss_bn_arr = []
model_bn.train()
max_epochs = 2
print(' ')
for epoch in range(max_epochs):
for i, data in enumerate(trainload
er, 0):
plt.plot(loss_bn_arr, 'g', label='
BatchNorm')
inputs, labels = data
plt.legend()
plt.show()
# training steps for bn model
opt_bn.zero_grad()
outputs_bn = model_bn(inputs)
loss_bn = loss_fn(outputs_bn, labe
ls)
loss_bn.backward()
opt_bn.step()
loss_bn_arr.append(loss_bn.item())
if i % 10 == 0:
inputs = inputs.view(inputs.size(0
), -1)
model_bn.eval()
b = model_bn.classifier[0](inputs)
Outputs:
Exercise
Write down the dropout for the above program
6. Study of Singular value Decomposition for
dimensionality reduction.
Introduction:
Reducing the number of input variables for a predictive model is referred to as dimensionality
reduction. Fewer input variables can result in a simpler predictive model that may have
better performance when making predictions on new data.
Perhaps the more popular technique for dimensionality reduction in machine learning is
Singular Value Decomposition, or SVD for short. This is a technique that comes from the
field of linear algebra and can be used as a data preparation technique to create a projection
of a sparse dataset prior to fitting a model.
Singular Value Decomposition, or SVD, might be the most popular technique for
dimensionality reduction when data is sparse. Sparse data refers to rows of data where many
of the values are zero. This is often the case in some problem domains like recommender
systems where a user has a rating for very few movies or songs in the database and zero
ratings for all other cases. Another common example is a bag of words model of a text
document, where the document has a count or frequency for some words and most words
have a 0 value.
Examples of sparse data appropriate for applying SVD for dimensionality reduction:
Recommender Systems
Customer-Product purchases
User-Song Listen Counts
User-Movie Ratings
Text Classification
One Hot Encoding
Bag of Words Counts
TF/IDF
SVD can be thought of as a projection method where data with m-columns (features) is
projected into a subspace with m or fewer columns, whilst retaining the essence of the
original data.
The SVD is used widely both in the calculation of other matrix operations, such as matrix
inverse, but also as a data reduction method in machine learning.
# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
scores = evaluate_model(model, X, y)
results.append(scores)
names.append(name)
print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.xticks(rotation=45)
pyplot.show()
Output:
Exercise: Draw the whisker plot for the above solution for creating the distribution of
accuracy scores for each configured number of dimensions and also find out the predicted
class by using combination of SVD transform and logistic regression model.
7.Train a sentiment analysis model on IMDB dataset, use
RNN layers.
Sentiment analysis probably is one the most common applications in Natural Language
processing. I don’t have to emphasize how important customer service tool sentiment
analysis has become. So here we are, we will train a classifier movie reviews in IMDB data
set, using Recurrent Neural Networks. Recurrent Neural Networks (RNNs) are a type of
neural network that are well suited for natural language processing tasks, such as sentiment
analysis. To perform sentiment analysis on IMDB movie reviews using an RNN, you would
need to first pre-process the text data by tokenizing the reviews and creating numerical
representations of the words, such as word embeddings. Then you would train the RNN on
the pre-processed data, using the review text as input and the corresponding sentiment
(positive or negative) as the output. Once the model is trained, you can use it to predict the
sentiment of new reviews by inputting the text into the trained model and interpreting the
output.
A recurrent neural network (RNN) is a class of artificial neural networks where connections
between nodes form a directed graph along a temporal sequence. This allows it to exhibit
temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their
internal state (memory) to process variable length sequences of inputs. This makes them
applicable to tasks such as unsegmented, connected handwriting recognition or speech
recognition.
The term “recurrent neural network” is used indiscriminately to refer to two broad classes of
networks with a similar general structure, where one is finite impulse and the other is infinite
impulse. Both classes of networks exhibit temporal dynamic behavior. A finite impulse
recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly
feedforward neural network, while an infinite impulse recurrent network is a directed cyclic
graph that cannot be unrolled.
Both finite impulse and infinite impulse recurrent networks can have additional stored states,
and the storage can be under direct control by the neural network. The storage can also be
replaced by another network or graph, if that incorporates time delays or has feedback loops.
Such controlled states are referred to as gated state or gated memory, and are part of long
short-term memory networks (LSTMs) and gated recurrent units. This is also called Feedback
Neural Network (FNN).
In this example, we first load the IMDB dataset using the imdb.load_data function. The
dataset consists of 25,000 labeled movie reviews for training and 25,000 for testing. The
reviews are already preprocessed and encoded as sequences of integers, where each integer
represents a specific word in a vocabulary of 5,000 words.
Next, we use the sequence.pad_sequences function to pad the sequences to a fixed length of
500, to make sure that all the input sequences have the same length.
Then, we define the architecture of the RNN model using the Keras library. The model
includes an Embedding layer, which maps the input sequences to a high-dimensional space,
an LSTM layer, which processes the input sequences, and a Dense layer, which is used for
classification.
After that, we compile the model by specifying the loss function, the optimizer, and the
evaluation metric.
Finally, we train the model using the fit function. The training process consists of multiple
iterations over the training data, called epochs, and at the end of each epoch, the model's
performance is evaluated using the validation data (x_test, y_test).
Once the model is trained, it can be used to classify new reviews as positive or negative by
calling the predict method and passing in the review as a padded sequence of integers.
8.Implementing Autoencoder for encoding the real-
world data.
Introduction
Building Auto encoders in Keras
"Auto encoding" is a data compression algorithm where the compression and decompression
functions are 1) data-specific, 2) lossy, and 3) learned automatically from examples rather than
engineered by a human. Additionally, in almost all contexts where the term "auto encoder" is
used, the compression and decompression functions are implemented with neural networks.
1) Auto encoders are data-specific, which means that they will only be able to compress data
similar to what they have been trained on. This is different from, say, the MPEG-2 Audio Layer
III (MP3) compression algorithm, which only holds assumptions about "sound" in general, but
not about specific types of sounds. An auto encoder trained on pictures of faces would do a
rather poor job of compressing pictures of trees, because the features it would learn would be
face-specific.
2) Auto encoders are lossy, which means that the decompressed outputs will be degraded
compared to the original inputs (similar to MP3 or JPEG compression). This differs from lossless
arithmetic compression.
3) Auto encoders are learned automatically from data examples, which is a useful property: it
means that it is easy to train specialized instances of the algorithm that will perform well on a
specific type of input. It doesn't require any new engineering, just appropriate training data.
To build an auto encoder, you need three things: an encoding function, a decoding function, and
a distance function between the amount of information loss between the compressed
representation of your data and the decompressed representation (i.e. a "loss" function). The
encoder and decoder will be chosen to be parametric functions (typically neural networks), and
to be differentiable with respect to the distance function, so the parameters of the
encoding/decoding functions can be optimizing to minimize the reconstruction loss, using
Stochastic Gradient Descent. It's simple! And you don't even need to understand any of these
words to start using auto encoders in practice.
Code:
import keras
from keras import layers
# This is the size of our encoded representations
encoding_dim = 32
# 32 floats -
> compression of factor 24.5, assuming the input is 784 floats
# This is our input image
input_img = keras.Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = layers.Dense(784, activation='sigmoid')(encoded)
# This model maps an input to its reconstruction
autoencoder = keras.Model(input_img, decoded)
# Let's also create a separate encoder model:
# This model maps an input to its encoded representation
encoder = keras.Model(input_img, encoded)
# This is our encoded (32-dimensional) input
encoded_input = keras.Input(shape=(encoding_dim,))
# Retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# Create the decoder model
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)
autoencoder.fit(x_train, x_train,
epochs=50,
batch_size=256,
shuffle=True,
validation_data=(x_test, x_test))
# Encode and decode some digits
# Note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
# Display reconstruction
ax = plt.subplot(2, n, i + 1 + n)
plt.imshow(decoded_imgs[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
Output:
Example: As we cannot limit ourselves to a single layer as encoder or decoder.
Implement same method by using stack of layers with Convolutional auto encoder
9.Study and implementation of LSTM
Introduction
Long Short Term Memory (LSTM) : Long short-term memory (LSTM) units (or blocks) are a
building unit for layers of a recurrent neural network (RNN). A RNN composed of LSTM units
is often called an LSTM network. A common LSTM unit is composed of a cell, an input gate, an
output gate and a forget gate. The cell is responsible for "remembering" values over arbitrary
time intervals; hence the word "memory" in LSTM. Each of the three gates can be thought of as
a "conventional" artificial neuron, as in a multi-layer (or feed forward) neural network: that is,
they compute an activation (using an activation function) of a weighted sum. Intuitively, they can
be thought as regulators of the flow of values that goes through the connections of the LSTM;
hence the denotation "gate". There are connections between these gates and the cell.
The expression long short-term refers to the fact that LSTM is a model for the short-term
memory which can last for a long period of time. An LSTM is well-suited to classify, process
and predict time series given time lags of unknown size and duration between important events.
LSTMs were developed to deal with the exploding and vanishing gradient problem when
training traditional RNNs.
Example
Recurrent neural networks have a wide array of applications. These include time series analysis,
document classification, and speech and voice recognition. In contrast to feed forward artificial
neural networks, the predictions made by recurrent neural networks are dependent on previous
predictions.
Draw a straight line. Let us see, if LSTM can learn the relationship of a straight line and predict
it.
First let us create the dataset depicting a straight line.
x = numpy.arange (1,500,1)
y = 0.4 * x + 30
plt.plot(x,y)
Now that the data has been created and split into train and test. Let’s convert the time series data
into the form of supervised learning data according to the value of look-back period, which is
essentially the number of lags which are seen to predict the value at time ‘t’.
So a time series like this −
time variable_x
t1 x1
t2 x2
: :
: :
T xT
When look-back period is 1, is converted to −
x1 x2
x2 x3
: :
: :
xT-1 xT
In [404]:
def create_dataset(n_X, look_back):
dataX, dataY = [], []
for i in range(len(n_X)-look_back):
a = n_X[i:(i+look_back), ]
dataX.append(a)
dataY.append(n_X[i + look_back, ])
return numpy.array(dataX), numpy.array(dataY)
look_back = 1
trainx,trainy = create_dataset(train, look_back)
testx,testy = create_dataset(test, look_back)
model = Sequential()
model.add(LSTM(256, return_sequences = True, input_shape = (trainx.shape[1], 2)))
model.add(LSTM(128,input_shape = (trainx.shape[1], 2)))
model.add(Dense(2))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
model.fit(trainx, trainy, epochs = 2000, batch_size = 10, verbose = 2, shuffle = False)
model.save_weights('LSTMBasic1.h5')
model.load_weights('LSTMBasic1.h5')
predict = model.predict(testx)
Now let’s see what our predictions look like.
plt.plot(testx.reshape(398,2)[:,0:1], testx.reshape(398,2)[:,1:2])
plt.plot(predict[:,0:1], predict[:,1:2])
Exercise
The International Airline Passengers prediction problem. This is a problem where, given a year
and a month, the task is to predict the number of international airline passengers in units of
1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144
observations.
10.Implementing word2vec for the real-world data.