DL Student Lab Manual
DL Student Lab Manual
MR20-1CS0284
School of Engineering
CSE-AI & ML DEPARTMENT (III YEAR II SEMESTER)
Page
S No. Name of the Experiment
No.
1 Getting familiar with the usage of Google colab and
using GPU as processing unit.
2 Study and implementation of feed forward NN.
3 Study and implementation of back propagation.
4 Implement Batch gradient descent, Stochastic gradient
descent and mini batch gradient descent.
5 Study the effect of batch normalization and dropout in
neural network classifier PCA.
6 Study of Singular value Decomposition for
dimensionality reduction..
7 Train a sentiment analysis model on IMDB dataset, use
RNN layers.
8 Perform Object detection USING CNN.
9 Implementing Autoencoder for encoding the real-world
data.
10 Study and implementation of LSTM
11 Implementation of GAN for generating Handwritten
Digits images.
12 Implementing word2vec for the rela-world data.
MALLA REDDY UNIVERSITY CSE - AI & ML DEPARTMENT
1. Getting familiar with the usage of Google Colab and using GPU
as processing unit
Introduction
Colaboratory by Google (Google Colab in short) is a Jupyter notebook based runtime
environment which allows you to run code entirely on the cloud. This is necessary because it
means that you can train large scale ML and DL models even if you don’t have access to a
powerful machine or a high-speed internet access. Google Colab supports both GPU and TPU
instances, which makes it a perfect tool for deep learning and data analytics enthusiasts because
of computational limitations on local machines. Since a Colab notebook can be accessed
remotely from any machine through a browser, it’s well suited for commercial purposes as well.
In this tutorial you will learn:
Getting around in Google Colab
Installing python libraries in Colab
Downloading large datasets in Colab
Training a Deep learning model in Colab
Using TensorBoard in Colab
Creating your first .ipynb notebook in colab
Open a browser of your choice and go to colab.research.google.com and sign in using your
Google account. Click on a new notebook to create a new runtime instance. In the top left corner,
you can change the name of the notebook from “Untitled.ipynb“ to the name of your choice by
clicking on it. The cell execution block is where you type your code. To execute the cell, press
shift + enter. The variable declared in one cell can be used in other cells as a global variable.
The environment automatically prints the value of the variable in the last line of the code block if
stated explicitly.
Training a sample tensorflow model
Training a machine learning model in Colab is very easy. The best part about it is not having to
set up a custom runtime environment, it’s all handled for you. For example, let’s look at training
a basic deep learning model to recognize handwritten digits trained on the MNIST dataset. The
data is loaded from the standard Keras dataset archive. The model is very basic, it categorizes
images as numbers and recognizes them.
Setup:
#import necessary libraries
import tensorflow as tf
#load training data and split into train and test sets
mnist = tf.keras.datasets.mnist
The output for this code snippet will look like this:
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
Next, we define the Google Colab model using Python:
#define model
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28,28)),
tf.keras.layers.Dense(128,activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10)
])
The expected output upon execution of the above code snippet is:
Epoch 1/5
1875/1875 [==============================] - 7s 3ms/step - loss: 0.2954 - accuracy: 0.9138
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.1422 - accuracy: 0.9572
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1070 - accuracy: 0.9664
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0886 - accuracy: 0.9721
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0753 - accuracy: 0.9764
<keras.callbacks.History at 0x7f84a21a2760>
Next, we test the model accuracy on test set:
#test model accuracy on test set
model.evaluate(x_test,y_test,verbose=2)
The expected output upon execution of the above code snippet is:
313/313 - 1s - loss: 0.0809 - accuracy: 0.9744 - 655ms/epoch - 2ms/step
[0.0809035375714302, 0.974399983882904]
model,
tf.keras.layers.Softmax()])
Downloading a dataset
When you’re training a machine learning model on your local machine, you’re likely to have
trouble with the storage and bandwidth costs that come with downloading and storing the dataset
required for training a model. Deep learning datasets can be massive in size, ranging between 20
to 50 Gb. Downloading them is most challenging if you’re living in a developing country, where
getting high-speed internet isn’t possible. The most efficient way to use datasets is to use a cloud
interface to download them, rather than manually uploading the dataset from a local
machine. Thankfully, Colab gives us a variety of ways to download the dataset from common
data hosting platforms.
To download an existing dataset from Kaggle, we can follow the steps outlined below:
1. Go to your Kaggle Account and click on “Create New API Token”. This will download a
kaggle.json file to your machine.
2. Go to your Google Colab project file, and run the following commands:
! pip install -q kaggle
from google.colab import files
NOTE: By default, the Colab notebook uses Python shell. To run terminal commands in Colab,
you will have to use “!” at the beginning of the command.
For example, to download a file from some.url and save it as some.file, you can use the
following command in Colab:
!curl http://some.url --output some.file
NOTE: The curl command will download the dataset in the Colab workspace, which will be lost
every time the runtime is disconnected. Hence a safe practice is to move the dataset into your
cloud drive as soon as the dataset is downloaded completely.
Downloading the dataset from GCP or Google Drive
Google Cloud Platform is a cloud computing and storage platform. You can use it to store large
datasets, and you can import that dataset directly from the cloud into Colab. To upload and
download files on GCP, first you need to authenticate your Google account.
from google.colab import auth
auth.authenticate_user()
After that, install gsutil to upload and download files, and then init gcloud.
!curl https://sdk.cloud.google.com | bash
!gcloud init
Once you have configured these options, you can use the following commands to
download/upload files to and from Google Cloud Storage.
Initiating a runtime with GPU/TPU enabled
Deep learning is a computationally expensive process, a lot of calculations need to be executed at
the same time to train a model. To mitigate this issue, Google Colab offers us not only the classic
CPU runtime but also an option for a GPU and TPU runtime as well.
The CPU runtime is best for training large models because of the high memory it provides. The
GPU runtime shows better flexibility and programmability for irregular computations, such as
small batches and nonMatMul computations. The TPU runtime is highly-optimized for large
batches and CNNs and has the highest training throughput. If you have a smaller model to train, I
suggest training the model on GPU/TPU runtime to use Colab to its full potential.
To create a GPU/TPU enabled runtime, you can click on runtime in the toolbar menu below the
file name. From there, click on “Change runtime type”, and then select GPU or TPU under the
Hardware Accelerator dropdown menu.
Exercise: Try yourself uploading a dataset and perform basic data pre-
processing on it
Simple NN graph:
Input Layer — contains one or more input nodes. For example, suppose you want to
predict whether it will rain tomorrow and base your decision on two variables, humidity
and wind speed. In that case, your first input would be the value for humidity, and the
second input would be the value for wind speed.
Hidden Layer — this layer houses hidden nodes, each containing an activation
function (more on these later). Note that a Neural Network with multiple hidden layers is
known as Deep Neural Network.
Output Layer — contains one or more output nodes. Following the same weather
prediction example above, you could choose to have only one output node generating a
rain probability (where >0.5 means rain tomorrow, and ≤0.5 no rain tomorrow).
Alternatively, you could have two output nodes, one for rain and another for no rain.
Note, you can use a different activation function for output nodes vs. hidden nodes.
Connections — lines joining different nodes are known as connections. These
contain kernels (weights) and biases, the parameters that get optimized during the
training of a neural network.
Kernels (weights) — used to scale input and hidden node values. Each connection
typically holds a different weight.
Biases — used to adjust scaled values before passing them through an activation
function.
Activation functions — think of activation functions as standard curves (building
blocks) used by the Neural Network to create a custom curve to fit the training data.
Passing different input values through the network selects different sections of the
standard curve, which are then assembled into a final custom-fit curve.
Loss functions, optimizers, and training
Training Neural Networks involves a complicated process known as backpropagation. I
will not go through a step-by-step explanation of how backpropagation works since it is a big
enough topic deserving a separate article. Instead, let me briefly introduce you to loss functions
and optimizers and summarize what happens when we “train” a Neural Network.
Loss — represents the “size” of error between the true values/labels and
the predicted values/labels. The goal of training a Neural Network is to minimize this
loss. The smaller the loss, the closer the match between the true and the predicted data.
There are many loss functions to choose from, with Binary Crossentropy, Categorical
Crossentropy, and Mean Squared Error being the most common.
Optimizers — are the algorithms used in backpropagation. The goal of an optimizer is
to find the optimum set of kernels (weights) and biases to minimize the loss. Optimizers
typically use a gradient descent approach, which allows them to iteratively find the “best”
possible configuration of weights and biases. The most commonly used ones are SGD,
ADAM, and RMS Prop.
Training a Neural Network is basically fitting a custom curve through the training data
until it can approximate it as well as possible. The graph below illustrates what a custom-fitted
curve could look like in a specific scenario. This example contains a set of data that seem to flip
between 0 and 1 as the value for input increases.
Setup
We’ll need the following data and libraries:
Australian weather data from Kaggle (license: Creative Commons, original source of the data:
Commonwealth of Australia, Bureau of Meteorology).
Pandas and Numpy for data manipulation
Plotly for data visualizations
Tensorflow/Keras for Neural Networks
Scikit-learn library for splitting the data into train-test samples, and for some basic model
evaluation
def __init__(self):
self.w = None
self.b = None
Output:
labels_orig = labels
#converting the multi-class to binary class
labels = np.mod(labels_orig,2)
Output:
Output:
(750, 2) (250, 2)
Implementation of Feed Forward Neural self.db1 = (self.h3-y) * self.h3*(1-
Network code self.h3) * self.w5 * self.h1*(1-self.h1)
m = X.shape[1]
self.w1 -= learning_rate * dw1 / m Y_pred_train = ffn.predict(X_train)
self.w2 -= learning_rate * dw2 / m Y_pred_binarised_train = (Y_pred_train >= 0.5).as
self.w3 -= learning_rate * dw3 / m type("int").ravel()
self.w4 -= learning_rate * dw4 / m Y_pred_val = ffn.predict(X_val)
self.w5 -= learning_rate * dw5 / m Y_pred_binarised_val = (Y_pred_val >= 0.5).asty
self.w6 -= learning_rate * dw6 / m pe("int").ravel()
self.b1 -= learning_rate * db1 / m accuracy_train = accuracy_score(Y_pred_binarise
self.b2 -= learning_rate * db2 / m d_train, Y_train)
self.b3 -= learning_rate * db3 / m accuracy_val = accuracy_score(Y_pred_binarised_
val, Y_val)
if display_loss:
Y_pred = self.predict(X) print("Training accuracy", round(accuracy_train, 2
loss[i] = mean_squared_error(Y_pred, Y) ))
print("Validation accuracy", round(accuracy_val, 2
if display_loss: ))
plt.plot(loss.values())
plt.xlabel('Epochs') Output:
plt.ylabel('Mean Squared Error') Training accuracy 0.73
plt.show() Validation accuracy 0.72
def sigmoid(sop):
return 1.0/(1+numpy.exp(-1*sop))
def error(predicted, target):
return numpy.power(predicted-target, 2)
def error_predicted_deriv(predicted, target):
return 2*(predicted-target)
def sigmoid_sop_deriv(sop):
return sigmoid(sop)*(1.0-sigmoid(sop))
def sop_w_deriv(x):
return x
def update_w(w, grad, learning_rate):
return w - learning_rate*grad
x1=0.1
x2=0.4
target = 0.7
learning_rate = 0.01
w1=numpy.random.rand()
w2=numpy.random.rand()
predicted_output = []
network_error = []
old_err = 0
for k in range(80000):
# Forward Pass
y = w1*x1 + w2*x2
predicted = sigmoid(y)
err = error(predicted, target)
predicted_output.append(predicted)
network_error.append(err)
# Backward Pass
g1 = error_predicted_deriv(predicted, target)
g2 = sigmoid_sop_deriv(y)
g3w1 = sop_w_deriv(x1)
g3w2 = sop_w_deriv(x2)
gradw1 = g3w1*g2*g1
gradw2 = g3w2*g2*g1
print(predicted)
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(network_error)
matplotlib.pyplot.title("Iteration Number vs Error")
matplotlib.pyplot.xlabel("Iteration Number")
matplotlib.pyplot.ylabel("Error")
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(predicted_output)
matplotlib.pyplot.title("Iteration Number vs Prediction")
matplotlib.pyplot.xlabel("Iteration Number")
matplotlib.pyplot.ylabel("Prediction")
Output:
Streaming output truncated to the last 5000 lines.
0.6999984864252651
0.6999984866522116
0.6999984868791244
0.699998487106003
0.6999984873328476
.
.
.
.
0.6999992843293953
0.6999992844367032
0.6999992845439951
0.6999992846512709
0.6999992847585306
Example: Execute the error plots for 80,000 epochs and show the saturated error output
value. Also write a short note about your analysis with the output value.
Solution:
Objective:
To implement Batch gradient descent, Stochastic gradient descent and mini batch gradient
descent.
Batch gradient descent uses all training samples in forward pass to calculate cumulative error and then we
adjust weights using derivatives. In stochastic GD, we randomly pick one training sample, perform
forward pass, compute the error and immediately adjust weights.
So the key difference here is that to adjust weights, batch GD will use all training samples where as
stochastic GD will use one randomly picked training sample.
Mini batch is intermediate version of batch GD and stochastic GD. In mini gradient descent you will use a
batch of samples in each iteration. For example if you have total 50 training samples, you can take a batch
of 10 samples, calculate cumulative error for those 10 samples and then adjust weights.
To summarize it, In SGD we adjust weights after every one sample. In Batch we adjust weights after
going through all samples but in mini batch we do after every m samples (where m is batch size and it is 0
< m < n, where n is total number of samples).
Gradient descent allows you to find weights (w1,w2,w3) and bias in the following linear equation for
housing price prediction.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
df = pd.read_csv("homeprices.csv")
df.sample(5)
output:
index,area,bedrooms,price
5,1170,2,38.0
14,2250,3,101.0
8,1310,3,50.0
10,1800,3,82.0
0,1056,2,39.07
scaled_X = sx.fit_transform(df.drop('price',axis='columns'))
scaled_y = sy.fit_transform(df['price'].values.reshape(df.shape[0],1))
scaled_X
scaled_y
scaled_y.reshape(20,)
number_of_features = X.shape[1]
# numpy array with 1 row and columns equal to number of features. In
# our case number_of_features = 2 (area, bedroom)
w = np.ones(shape=(number_of_features))
b = 0
total_samples = X.shape[0] # number of rows in X
cost_list = []
epoch_list = []
for i in range(epochs):
y_predicted = np.dot(w, X.T) + b
w_grad = -(2/total_samples)*(X.T.dot(y_true-y_predicted))
b_grad = -(2/total_samples)*np.sum(y_true-y_predicted)
w = w - learning_rate * w_grad
b = b - learning_rate * b_grad
cost = np.mean(np.square(y_true-
y_predicted)) # MSE (Mean Squared Error)
if i%10==0:
cost_list.append(cost)
epoch_list.append(i)
Output:
(array([0.70712464, 0.67456527]), -0.23034857438407427, 0.0068641890429808105)
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list,cost_list)
Output:
def predict(area,bedrooms,w,b):
scaled_X = sx.transform([[area, bedrooms]])[0]
# here w1 = w[0] , w2 = w[1], w3 = w[2] and bias is b
# equation for price is w1*area + w2*bedrooms + w3*age + bias
# scaled_X[0] is area
# scaled_X[1] is bedrooms
# scaled_X[2] is age
scaled_price = w[0] * scaled_X[0] + w[1] * scaled_X[1] + b
# once we get price prediction we need to to rescal it back to origina
l value
predict(2600,4,w,b)
predict(1000,2,w,b)
predict(1500,3,w,b)
number_of_features = X.shape[1]
# numpy array with 1 row and columns equal to number of features. In
# our case number_of_features = 3 (area, bedroom and age)
w = np.ones(shape=(number_of_features))
b = 0
total_samples = X.shape[0]
cost_list = []
epoch_list = []
for i in range(epochs):
random_index = random.randint(0,total_samples-
1) # random index from total samples
sample_x = X[random_index]
sample_y = y_true[random_index]
w_grad = -(2/total_samples)*(sample_x.T.dot(sample_y-y_predicted))
b_grad = -(2/total_samples)*(sample_y-y_predicted)
w = w - learning_rate * w_grad
b = b - learning_rate * b_grad
cost = np.square(sample_y-y_predicted)
cost_list.append(cost)
epoch_list.append(i)
Output:
(array([0.70999659, 0.67807531]), -0.23262199997984362,
0.011241941221627246)
w , b
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list_sgd,cost_list_sgd)
Output:
predict(2600,4,w_sgd, b_sgd)
predict(1000,2,w_sgd, b_sgd)
np.random.permutation(20)
Output:
array([16, 10, 7, 8, 3, 17, 15, 13, 2, 12, 4, 1, 18, 9, 14, 6, 5, 19, 0,
11])
number_of_features = X.shape[1]
# numpy array with 1 row and columns equal to number of features. In
# our case number_of_features = 3 (area, bedroom and age)
w = np.ones(shape=(number_of_features))
b = 0
total_samples = X.shape[0] # number of rows in X
cost_list = []
epoch_list = []
num_batches = int(total_samples/batch_size)
for i in range(epochs):
random_indices = np.random.permutation(total_samples)
X_tmp = X[random_indices]
y_tmp = y_true[random_indices]
for j in range(0,total_samples,batch_size):
Xj = X_tmp[j:j+batch_size]
yj = y_tmp[j:j+batch_size]
y_predicted = np.dot(w, Xj.T) + b
w_grad = -(2/len(Xj))*(Xj.T.dot(yj-y_predicted))
b_grad = -(2/len(Xj))*np.sum(yj-y_predicted)
w = w - learning_rate * w_grad
b = b - learning_rate * b_grad
cost = np.mean(np.square(yj-
y_predicted)) # MSE (Mean Squared Error)
if i%10==0:
cost_list.append(cost)
epoch_list.append(i)
Output:
(array([0.71015977, 0.67813327]), -0.23343143725261098,
0.0025898311387083403)
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list,cost_list)
Output:
def predict(area,bedrooms,w,b):
scaled_X = sx.transform([[area, bedrooms]])[0]
# here w1 = w[0] , w2 = w[1], w3 = w[2] and bias is b
# equation for price is w1*area + w2*bedrooms + w3*age + bias
# scaled_X[0] is area
# scaled_X[1] is bedrooms
# scaled_X[2] is age
scaled_price = w[0] * scaled_X[0] + w[1] * scaled_X[1] + b
# once we get price prediction we need to to rescal it back to origina
l value
# also since it returns 2D array, to get single value we need to do va
lue[0][0]
return sy.inverse_transform([[scaled_price]])[0][0]
predict(2600,4,w,b)
Output:
128.65424087579652
predict(1000,2,w,b)
Output:
29.9855861683337
Exercise:
Implement Gradient Descent for Neural Network (or Logistic Regression) by Predicting if a
person would buy life insurance based on his age.
Solution:
Introduction
In this lab exercise, we will discuss why we need batch normalization and dropout in deep neural
networks followed by experiments using Pytorch on a standard data set to see the effects of batch
normalization and dropout.
Batch Normalization
By normalizing the inputs we are able to bring all the inputs features to the same scale. In the
neural network, we need to compute the pre-activation for the first neuron of the first layer a₁₁.
We know that pre-activation is nothing but the weighted sum of inputs plus bias. In other words,
it is the dot product between the first row of the weight matrix W₁ and the input matrix X plus
bias b₁₁.
The mathematical equation for pre-activation at each layer ‘i’ is given by,
The activation at each layer is equal to applying the activation function to the output of the pre-
activation of that layer. The mathematical equation for the activation at each layer ‘i’ is given by,
In order to bring all the activation values to the same scale, we normalize the activation values
such that the hidden representation doesn’t vary drastically and also helps us to get improvement
in the training speed.
Why is it called batch normalization?
Since we are computing the mean and standard deviation from a single batch as opposed to
computing it from the entire data. Batch normalization is done individually at each hidden
neuron in the network.
To get a better insight into how batch normalization helps in faster converge of the network, we
will look at the distribution of values across multiple hidden layers in the network during the
training phase.
For consistency, we will plot the output of the second linear layer from the two networks and
compare the distributions of the output from that layer across the networks. The results look like
this:
From the graphs, we can conclude that the distribution of values without batch normalization has
changed significantly between iterations of inputs within each epoch which means that the
III YEAR II SEMESTER Deep Learning Lab
Page 33
MALLA REDDY UNIVERSITY CSE - AI & ML DEPARTMENT
subsequent layers in the network without batch normalization are seeing a varying distribution of
input data. But the change in the distribution of values for the model with batch normalization
seems to be slightly negligible.
Dropout
In this section of the lab, we discuss the concept of dropout in neural networks specifically how
it helps to reduce overfitting and generalization error. After that, we will implement a neural
network with and without dropout to see how dropout influences the performance of a network
using Pytorch.
Dropout is a regularization technique that “drops out” or “deactivates” few neurons in the neural
network randomly in order to avoid the problem of overfitting.
The idea of Dropout
Training one deep neural network with large parameters on the data might lead to overfitting.
Can we train multiple neural networks with different configurations on the same dataset and take
the average value of these predictions?.
But creating an ensemble of neural networks with different architectures and training them
wouldn’t be feasible in practice. Dropout to the rescue.
Dropout deactivates the neurons randomly at each training step instead of training the data on the
original network, we train the data on the network with dropped out nodes. In the next iteration
of the training step, the hidden neurons which are deactivated by dropout changes because of its
probabilistic behavior. In this way, by applying dropout i.e…deactivating certain individual
nodes at random during training we can simulate an ensemble of neural network with different
architectures.
In each training iteration, each node in the network is associated with a probability p whether to
keep in the network or to deactivate it (dropout) out of the network with probability 1-p. That
means the weights associated with the nodes got updated only p fraction of times because nodes
are active only p times during training.
Dropout at Test time
During test time, we consider the original neural network with all activations present and scale
the output of each node by a value p. Since each node is activated the only p times.
To show the overfitting, we will train two networks — one without dropout and another with
dropout. The network without dropout has 3 fully connected hidden layers with ReLU as the
activation function for the hidden layers and the network with dropout also has similar
architecture but with dropout applied after first & second Linear layer.
In this example, I have used a dropout fraction of 0.5 after the first linear layer and 0.2 after the
second linear layer. Once we train the two different models i.e…one without dropout and another
with dropout and plot the test results, it would look like this:
From the above graphs, we can conclude that as we increase the number of epochs the model
without dropout is overfitting the data. The model without dropout is learning the noise
associated with the data instead of generalizing for the data. We can see that the loss associated
with the model without drop increases as we increase the number of epochs unlike the loss
associated with the model with dropout.
Example:
Outline
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import seaborn as sns
plt.figure(figsize=(batch_size * 4, 4))
plt.axis('off')
plt.imshow(np.transpose(img, (1, 2, 0)))
plt.title(title)
plt.show()
def show_batch_images(dataloader):
images, labels = next(iter(dataloader))
img = torchvision.utils.make_grid(images)
imshow(img, title=[str(x.item()) for x in labels])
Batch Normalisation
class MyNetBN(nn.Module): nn.Linear(48, 24),
def __init__(self): nn.BatchNorm1d(24),
super(MyNetBN, self).__ini nn.ReLU(),
t__() nn.Linear(24, 10)
self.classifier = nn.Seque )
ntial(
nn.Linear(784, 48), def forward(self, x):
nn.BatchNorm1d(48), x = x.view(x.size(0), -1)
nn.ReLU(), x = self.classifier(x)
return x b = model_bn.classifier[1](b)
model_bn = MyNetBN() b = b.detach().numpy().ravel()
print(model_bn)
batch_size = 512 sns.distplot(b, kde=True, color='g
trainloader = torch.utils.data.Dat ', label='BatchNorm')
aLoader(trainset, batch_size=batch plt.title('%d: Loss = %0.2f, Loss
_size, shuffle=True) with bn = %0.2f' % (i, loss.item()
loss_fn = nn.CrossEntropyLoss() , loss_bn.item()))
opt_bn = optim.SGD(model_bn.parame plt.legend()
ters(), lr=0.01) plt.show()
plt.pause(0.5)
oss_arr = []
loss_bn_arr = []
model_bn.train()
max_epochs = 2
print('----------------------')
for epoch in range(max_epochs):
for i, data in enumerate(trainload
er, 0):
plt.plot(loss_bn_arr, 'g', label='
BatchNorm')
inputs, labels = data
plt.legend()
plt.show()
# training steps for bn model
opt_bn.zero_grad()
outputs_bn = model_bn(inputs)
loss_bn = loss_fn(outputs_bn, labe
ls)
loss_bn.backward()
opt_bn.step()
loss_bn_arr.append(loss_bn.item())
if i % 10 == 0:
inputs = inputs.view(inputs.size(0
), -1)
model_bn.eval()
b = model_bn.classifier[0](inputs)
Outputs:
Exercise
Write down the dropout for the above program
Singular Value Decomposition, or SVD, might be the most popular technique for
dimensionality reduction when data is sparse. Sparse data refers to rows of data where many
of the values are zero. This is often the case in some problem domains like recommender
systems where a user has a rating for very few movies or songs in the database and zero
ratings for all other cases. Another common example is a bag of words model of a text
document, where the document has a count or frequency for some words and most words
have a 0 value.
Examples of sparse data appropriate for applying SVD for dimensionality reduction:
Recommender Systems
Customer-Product purchases
User-Song Listen Counts
User-Movie Ratings
Text Classification
SVD can be thought of as a projection method where data with m-columns (features) is
projected into a subspace with m or fewer columns, whilst retaining the essence of the
original data.
The SVD is used widely both in the calculation of other matrix operations, such as matrix
inverse, but also as a data reduction method in machine learning.
# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
III YEAR II SEMESTER Deep Learning Lab
Page 43
MALLA REDDY UNIVERSITY CSE - AI & ML DEPARTMENT
Output:
Exercise: Draw the whisker plot for the above solution for creating the distribution of
accuracy scores for each configured number of dimensions and also find out the predicted
class by using combination of SVD transform and logistic regression model.
Solution:
In this example, we first load the IMDB dataset using the imdb.load_data function. The
dataset consists of 25,000 labeled movie reviews for training and 25,000 for testing. The
reviews are already preprocessed and encoded as sequences of integers, where each integer
represents a specific word in a vocabulary of 5,000 words.
Next, we use the sequence.pad_sequences function to pad the sequences to a fixed length of
500, to make sure that all the input sequences have the same length.
Then, we define the architecture of the RNN model using the Keras library. The model
includes an Embedding layer, which maps the input sequences to a high-dimensional space,
an LSTM layer, which processes the input sequences, and a Dense layer, which is used for
classification.
After that, we compile the model by specifying the loss function, the optimizer, and the
evaluation metric.
Finally, we train the model using the fit function. The training process consists of multiple
iterations over the training data, called epochs, and at the end of each epoch, the model's
performance is evaluated using the validation data (x_test, y_test).
Once the model is trained, it can be used to classify new reviews as positive or negative by
calling the predict method and passing in the review as a padded sequence of integers.
Exercise:
Perform sentiment analysis using a Recurrent Neural Network (RNN) with python code
using simple data set.
Object Detection is the process of finding real-world object instances like car, bike,
TV, flowers, and humans in still images or Videos. It allows for the recognition, localization,
and detection of multiple objects within an image which provides us with a much better
understanding of an image as a whole. It is commonly used in applications such as image
retrieval, security, surveillance, and advanced driver assistance systems (ADAS). Image
classification is straight forward, but the differences between object localization and object
detection can be confusing, especially when all three tasks may be just as equally referred to
as object recognition.
Image classification involves assigning a class label to an image, whereas object
localization involves drawing a bounding box around one or more objects in an image. Object
detection is more challenging and combines these two tasks and draws a bounding box
around each object of interest in the image and assigns them a class label. Together, all of
these problems are referred to as object recognition.
As such, we can distinguish between these three computer vision tasks:
Image Classification: Predict the type or class of an object in an image.
Input: An image with a single object, such as a photograph.
Output: A class label (e.g. one or more integers that are mapped to class
labels).
Object Localization: Locate the presence of objects in an image and indicate their
location with a bounding box.
Input: An image with one or more objects, such as a photograph.
Output: One or more bounding boxes (e.g. defined by a point, width, and
height).
Object Detection: Locate the presence of objects with a bounding box and types or
classes of the located objects in an image.
Input: An image with one or more objects, such as a photograph.
Output: One or more bounding boxes (e.g. defined by a point, width, and
height), and a class label for each bounding box.
Object Detection can be done via multiple ways:
Below is an example comparing single object localization and object detection, taken from
the ILSVRC paper. Note the difference in ground truth expectations in each case.
Every Object Detection Algorithm has a different way of working, but they all work on the
same principle.
Feature Extraction: They extract features from the input images at hands and use these
features to determine the class of the image. Be it through MatLab, Open CV, Viola Jones or
Deep Learning.
SOLUTION:
Now to Download TensorFlow and TensorFlow GPU you can use pip or conda commands:
# For CPU
pip install tensorflow
# For GPU
pip install tensorflow-gpu
For all the other libraries we can use pip or conda to install them. The code is provided below
pip install --user Cython
pip install --user contextlib2
pip install --user pillow
pip install --user lxml
pip install --user jupyter
pip install --user matplotlib
Now you need to Clone or Download TensorFlow’s Model from Github. Once downloaded
and extracted rename the “models-masters” to just “models“.
Now for simplicity, we are going to keep “models” and “protobuf” under one folder
“Tensorflow“.
Next, we need to go inside the Tensorflow folder and then inside research folder and run
protobuf from there using this command:
To check whether this worked or not, you can go to the protos folder inside
models>object_detection>protos and there you can see that for every proto file there’s one
python file created.
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
Next, we will download the model which is trained on the COCO dataset. COCO stands for
Common Objects in Context, this dataset contains around 330K labeled images. Now the
model selection is important as you need to make an important tradeoff between Speed and
Accuracy. Depending upon your requirement and the system memory, the correct model must
be selected.
Inside “models>research>object_detection>g3doc>detection_model_zoo” contains all the
models with different speed and accuracy(mAP).
Next, we provide the required model and the frozen inference graph generated by
Tensorflow to use.
MODEL_NAME = This code will download that model from
'ssd_mobilenet_v1_coco_2017_11_17' the internet and extract the frozen
MODEL_FILE = MODEL_NAME + '.tar.gz' inference graph of that model.
DOWNLOAD_BASE = '<a opener = urllib.request.URLopener()
href="http://download.tensorflow.org/m opener.retrieve(DOWNLOAD_BASE +
odels/object_detection/">http://downloa MODEL_FILE, MODEL_FILE)
d.tensorflow.org/models/object_detectio tar_file = tarfile.open(MODEL_FILE)
n/</a>' for file in tar_file.getmembers():
PATH_TO_CKPT = MODEL_NAME + file_name =
'/frozen_inference_graph.pb' os.path.basename(file.name)
PATH_TO_LABELS = os.path.join('data', if 'frozen_inference_graph.pb' in
'mscoco_label_map.pbtxt') file_name:
NUM_CLASSES = 90 tar_file.extract(file, os.getcwd())
detection_graph = tf.Graph()
with detection_graph.as_default(): boxes and provide the class and the class
od_graph_def = tf.GraphDef() score of that particular object.
with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as def
fid: run_inference_for_single_image(image,
serialized_graph = fid.read() graph):
with graph.as_default():
od_graph_def.ParseFromString(serialized with tf.Session() as sess:
_graph) # Get handles to input and output
tf.import_graph_def(od_graph_def, tensors
name='') ops =
Next, we are going to load all the labels tf.get_default_graph().get_operations()
label_map = all_tensor_names = {output.name for
label_map_util.load_labelmap(PATH_TO_ op in ops for output in op.outputs}
LABELS) tensor_dict = {}
categories=label_map_util.convert_label_ for key in [
map_to_categories(label_map, 'num_detections', 'detection_boxes',
max_num_classes=NUM_CLASSES, 'detection_scores',
use_display_name=True) 'detection_classes',
category_index = 'detection_masks'
label_map_util.create_category_index(cat ]:
egories) tensor_name = key + ':0'
Now we will convert the images data into if tensor_name in all_tensor_names:
a numPy array for processing. tensor_dict[key] =
def tf.get_default_graph().get_tensor_by_na
load_image_into_numpy_array(image): me(
(im_width, im_height) = image.size tensor_name)
return if 'detection_masks' in tensor_dict:
np.array(image.getdata()).reshape( # The following processing is only for
(im_height, im_width, single image
3)).astype(np.uint8) detection_boxes =
The path to the images for the testing tf.squeeze(tensor_dict['detection_boxes'],
purpose is defined here. Here we have a [0])
naming convention “image[i]” for i in (1 to detection_masks =
n+1), n being the number of images tf.squeeze(tensor_dict['detection_masks']
provided. , [0])
PATH_TO_TEST_IMAGES_DIR = # Reframe is required to translate
'test_images' mask from box coordinates to image
TEST_IMAGE_PATHS = [ coordinates and fit the image size.
os.path.join(PATH_TO_TEST_IMAGES_DIR, real_num_detection =
'image{}.jpg'.format(i)) tf.cast(tensor_dict['num_detections'][0],
This code runs the inference for a single tf.int32)
image, where it detects the objects, make
"Auto encoding" is a data compression algorithm where the compression and decompression
functions are 1) data-specific, 2) lossy, and 3) learned automatically from examples rather than
engineered by a human. Additionally, in almost all contexts where the term "auto encoder" is
used, the compression and decompression functions are implemented with neural networks.
1) Auto encoders are data-specific, which means that they will only be able to compress data
similar to what they have been trained on. This is different from, say, the MPEG-2 Audio Layer
III (MP3) compression algorithm, which only holds assumptions about "sound" in general, but
not about specific types of sounds. An auto encoder trained on pictures of faces would do a
rather poor job of compressing pictures of trees, because the features it would learn would be
face-specific.
2) Auto encoders are lossy, which means that the decompressed outputs will be degraded
compared to the original inputs (similar to MP3 or JPEG compression). This differs from lossless
arithmetic compression.
3) Auto encoders are learned automatically from data examples, which is a useful property: it
means that it is easy to train specialized instances of the algorithm that will perform well on a
specific type of input. It doesn't require any new engineering, just appropriate training data.
To build an auto encoder, you need three things: an encoding function, a decoding function, and
a distance function between the amount of information loss between the compressed
representation of your data and the decompressed representation (i.e. a "loss" function). The
encoder and decoder will be chosen to be parametric functions (typically neural networks), and
to be differentiable with respect to the distance function, so the parameters of the
encoding/decoding functions can be optimizing to minimize the reconstruction loss, using
Stochastic Gradient Descent. It's simple! And you don't even need to understand any of these
words to start using auto encoders in practice.
Code:
import keras
from keras import layers
# This is the size of our encoded representations
encoding_dim = 32
# 32 floats -
> compression of factor 24.5, assuming the input is 784 floats
# This is our input image
input_img = keras.Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = layers.Dense(784, activation='sigmoid')(encoded)
# This model maps an input to its reconstruction
autoencoder = keras.Model(input_img, decoded)
# Let's also create a separate encoder model:
# This model maps an input to its encoded representation
encoder = keras.Model(input_img, encoded)
# This is our encoded (32-dimensional) input
encoded_input = keras.Input(shape=(encoding_dim,))
# Retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# Create the decoder model
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
# Display reconstruction
ax = plt.subplot(2, n, i + 1 + n)
plt.imshow(decoded_imgs[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
Output:
Introduction
Long Short Term Memory (LSTM) : Long short-term memory (LSTM) units (or blocks) are a
building unit for layers of a recurrent neural network (RNN). A RNN composed of LSTM units
is often called an LSTM network. A common LSTM unit is composed of a cell, an input gate, an
output gate and a forget gate. The cell is responsible for "remembering" values over arbitrary
time intervals; hence the word "memory" in LSTM. Each of the three gates can be thought of as
a "conventional" artificial neuron, as in a multi-layer (or feed forward) neural network: that is,
they compute an activation (using an activation function) of a weighted sum. Intuitively, they can
be thought as regulators of the flow of values that goes through the connections of the LSTM;
hence the denotation "gate". There are connections between these gates and the cell.
The expression long short-term refers to the fact that LSTM is a model for the short-term
memory which can last for a long period of time. An LSTM is well-suited to classify, process
and predict time series given time lags of unknown size and duration between important events.
LSTMs were developed to deal with the exploding and vanishing gradient problem when
training traditional RNNs.
Example
Recurrent neural networks have a wide array of applications. These include time series analysis,
document classification, and speech and voice recognition. In contrast to feed forward artificial
neural networks, the predictions made by recurrent neural networks are dependent on previous
predictions.
Draw a straight line. Let us see, if LSTM can learn the relationship of a straight line and predict
it.
First let us create the dataset depicting a straight line.
x = numpy.arange (1,500,1)
y = 0.4 * x + 30
plt.plot(x,y)
Now that the data has been created and split into train and test. Let’s convert the time series data
into the form of supervised learning data according to the value of look-back period, which is
essentially the number of lags which are seen to predict the value at time ‘t’.
So a time series like this −
time variable_x
t1 x1
t2 x2
: :
: :
T xT
When look-back period is 1, is converted to −
x1 x2
x2 x3
: :
: :
xT-1 xT
In [404]:
def create_dataset(n_X, look_back):
dataX, dataY = [], []
for i in range(len(n_X)-look_back):
a = n_X[i:(i+look_back), ]
dataX.append(a)
dataY.append(n_X[i + look_back, ])
return numpy.array(dataX), numpy.array(dataY)
look_back = 1
trainx,trainy = create_dataset(train, look_back)
testx,testy = create_dataset(test, look_back)
model = Sequential()
model.add(LSTM(256, return_sequences = True, input_shape = (trainx.shape[1], 2)))
model.add(LSTM(128,input_shape = (trainx.shape[1], 2)))
model.add(Dense(2))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
model.fit(trainx, trainy, epochs = 2000, batch_size = 10, verbose = 2, shuffle = False)
model.save_weights('LSTMBasic1.h5')
model.load_weights('LSTMBasic1.h5')
predict = model.predict(testx)
Now let’s see what our predictions look like.
plt.plot(testx.reshape(398,2)[:,0:1], testx.reshape(398,2)[:,1:2])
plt.plot(predict[:,0:1], predict[:,1:2])
Exercise
The International Airline Passengers prediction problem. This is a problem where, given a year
and a month, the task is to predict the number of international airline passengers in units of
1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144
observations.
Solution:
GANs consist of two neural networks, one trained to generate data and the other
trained to distinguish fake data from real data (hence the “adversarial” nature of the
model).
Exercise:
With this sample example, students are asked to perform Handwritten Digits Generator
with a GAN.
device = ""
if torch.cuda.is_available():
device = torch.device("cuda")
else:
device = torch.device("cpu")
transform = transforms.Compose(
[transforms.ToTensor(), Implementing the Discriminator
transforms.Normalize((0.5,), (0.5,))] and the Generator
) class Discriminator(nn.Module):
train_set = torchvision.datasets.MNIST( def __init__(self):
root=".", train=True, download=True, super().__init__()
transform=transform self.model = nn.Sequential(
) nn.Linear(784, 1024),
nn.ReLU(),
batch_size = 32 nn.Dropout(0.3),
train_loader = torch.utils.data.DataLoader( nn.Linear(1024, 512),
train_set, batch_size=batch_size, nn.ReLU(),
shuffle=True nn.Dropout(0.3),
) nn.Linear(512, 256),
nn.ReLU(),
To plot some samples of the nn.Dropout(0.3),
training data nn.Linear(256, 1),
real_samples, mnist_labels = nn.Sigmoid(),
next(iter(train_loader)) )
for i in range(16):
ax = plt.subplot(4, 4, i + 1) def forward(self, x):
plt.imshow(real_samples[i].reshape(28, x = x.view(x.size(0), 784)
28), cmap="gray_r") output = self.model(x)
plt.xticks([]) return output
plt.yticks([])
discriminator =
output: Discriminator().to(device=device)
class Generator(nn.Module):
def __init__(self):
super().__init__()
self.model = nn.Sequential(
# Show loss
Exercise
Check the Samples Generated by the GAN. Also write your experience with
what you did and output you got.
model.save("word2vec.model")
In this example, the text data is tokenized into sentences and words using the sent_tokenize and
word_tokenize functions from NLTK. Then, the Word2Vec class from the gensim library is used to train
the model on the tokenized data. The min_count parameter is used to specify the minimum number of
occurrences of a word in the dataset for it to be included in the vocabulary.
Additionally, as you see in the last 3 lines, we are using the model to access vector for one word, perform
cosine similarity between two words and find the most similar word based on the provided positive and
negative words
It is important to note that the word2vec algorithm works better with large datasets, so for better
performance, it is recommended to use a larger dataset of text data when training the model.
Exercise
Ex 1: use pre-trained word2vec models from Google or Wikipedia
Ex 2: Write Python code that uses the gensim library to train a word2vec model on a pre-defined
dataset: