
Gyan Ganga College of Technology, Jabalpur

Department of Computer Science & Engineering

Lab Manual
CS704 Departmental Elective Lab
(Deep & Reinforcement Learning)

NAME:

ENROLLMENT NUMBER:

Faculty Incharge
Prof. Shiv Kumar Tiwari
CSE, GGCT

Session: 2023-24
Vision & Mission of the Institute

VISION OF THE INSTITUTE


• To be a centre of excellence in transformative education and research, producing excellent professionals, researchers and entrepreneurs equipped with cutting-edge technologies, values and ethics to meet the needs of society.

MISSION OF INSTITUTE
• To provide quality professional education using latest ICT tools and incorporating latest
developments
• To create world class infrastructure that meets both curriculum and research &
innovation
• To promote professional ethics, leadership skills and societal responsibilities
• To prepare students for lifelong learning to meet future challenges
• To promote interdisciplinary culture leading to world class translational research
aligned with the sustainable development goals
Vision & Mission of the Department

VISION OF THE DEPARTMENT


To impart knowledge and relevant practices in computer science and engineering and to inculcate human values, transforming students into excellent professionals and potential resources who contribute to society through advanced computing solutions.

MISSION OF THE DEPARTMENT

• To provide strong fundamentals and technical skills in Computer Science and Engineering through advanced teaching-learning methodologies.

• To transform the lives of students by nurturing ethical values, creativity and novelty so that they become entrepreneurs and establish start-ups.

• To focus on sustainable solutions and improve the quality of life for the welfare of society.

• To inculcate a research attitude among the students and to provide opportunities to carry out inter-disciplinary research through linkages with industry and academia.
Program Educational Objectives


After successful completion of the program, the graduate will:

• PEO1: Occupy positions in national and multinational organizations and work as an individual and in teams with the highest ethics, exhibiting leadership skills.

• PEO2: Continue to learn and communicate in the core and allied areas of Computer Science and Engineering and adapt to constantly changing technologies in Computer Science and Engineering.

• PEO3: Become an entrepreneur in the domain of Computer Science and Engineering and other allied areas and participate in the growth of the nation.

Program Outcomes
After Successful completion of the program, the student will be able to:

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization for the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, research literature, and analyse complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for public
health and safety, and cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of information to
provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools, including prediction and modeling to complex engineering activities, with an
understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the
professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings
10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with the society at large, such as, being able to comprehend and write effective reports
and design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering
and management principles and apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
Program Specific Outcomes

On successful completion of the program, the student will be able to:

PSO1: Understand various concepts of Computing, Statistics, Mathematics and basic sciences
applicable to the domain and solve problems using fundamental knowledge.
PSO2: Apply computing knowledge for professional software development and develop strong abilities in problem solving, analysis, designing a system, component or process, and decision-making.
PSO3: Use programming languages, tools and techniques to conduct research and demonstrate
appropriate emerging skills to provide solutions to problems in various interdisciplinary fields.
List of Experiments
(columns for Date, Signature and Marks to be filled in for each experiment)

1. Getting familiar with the usage of Google Colab and using GPU as the processing unit.
2. Study and implementation of a feed forward neural network.
3. Study and implementation of back propagation.
4. Implement batch gradient descent, stochastic gradient descent and mini-batch gradient descent.
5. Study the effect of batch normalization and dropout in a neural network classifier.
6. Study of Singular Value Decomposition for dimensionality reduction.
7. Train a sentiment analysis model on the IMDB dataset using RNN layers.
8. Perform object detection using CNN.
9. Implement an autoencoder for encoding real-world data.
10. Study and implementation of LSTM.
11. Implementation of a GAN for generating handwritten digit images.
12. Implement word2vec for real-world data.
1. Getting familiar with the usage of Google Colab and using GPU
as processing unit
Introduction
Colaboratory by Google (Google Colab for short) is a Jupyter notebook-based runtime environment which allows you to run code entirely in the cloud. This is useful because it means you can train large-scale ML and DL models even if you don't have access to a powerful machine or high-speed internet. Google Colab supports both GPU and TPU instances, which makes it a perfect tool for deep learning and data analytics enthusiasts facing computational limitations on local machines. Since a Colab notebook can be accessed remotely from any machine through a browser, it is well suited for commercial purposes as well.
In this tutorial you will learn:
• Getting around in Google Colab
• Installing Python libraries in Colab
• Downloading large datasets in Colab
• Training a deep learning model in Colab
• Using TensorBoard in Colab
Creating your first .ipynb notebook in Colab
Open a browser of your choice, go to colab.research.google.com and sign in using your Google account. Click on a new notebook to create a new runtime instance. In the top left corner, you can change the name of the notebook from “Untitled.ipynb” to a name of your choice by clicking on it. The cell execution block is where you type your code. To execute a cell, press Shift + Enter. A variable declared in one cell can be used in other cells as a global variable. The environment automatically displays the value of the expression on the last line of a code cell, without needing an explicit print statement.
Training a sample TensorFlow model
Training a machine learning model in Colab is very easy. The best part is not having to set up a custom runtime environment; it is all handled for you. For example, let's look at training a basic deep learning model to recognize handwritten digits, trained on the MNIST dataset. The data is loaded from the standard Keras dataset archive. The model is very basic: it classifies images of handwritten digits into the ten digit classes.
Setup:
#import necessary libraries
import tensorflow as tf

#load training data and split into train and test sets
mnist = tf.keras.datasets.mnist

(x_train,y_train), (x_test,y_test) = mnist.load_data()


x_train, x_test = x_train / 255.0, x_test / 255.0
The output for this code snippet will look like this:
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
Next, we define the model using Keras:
#define model
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28,28)),
tf.keras.layers.Dense(128,activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10)
])
#define loss function variable
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
#define optimizer,loss function and evaluation metric
model.compile(optimizer='adam',
loss=loss_fn,
metrics=['accuracy'])
#train the model
model.fit(x_train,y_train,epochs=5)

The expected output upon execution of the above code snippet is:
Epoch 1/5
1875/1875 [==============================] - 7s 3ms/step - loss: 0.2954 - accuracy: 0.9138
Epoch 2/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.1422 - accuracy: 0.9572
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1070 - accuracy: 0.9664
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0886 - accuracy: 0.9721
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0753 - accuracy: 0.9764
<keras.callbacks.History at 0x7f84a21a2760>
Next, we test the model accuracy on test set:
#test model accuracy on test set
model.evaluate(x_test,y_test,verbose=2)
The expected output upon execution of the above code snippet is:
313/313 - 1s - loss: 0.0809 - accuracy: 0.9744 - 655ms/epoch - 2ms/step
[0.0809035375714302, 0.974399983882904]
Next, we extend the base model to predict softmax output:
#extend the base model to predict softmax output
probability_model = tf.keras.Sequential([model,
tf.keras.layers.Softmax()])
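As a quick check (this snippet is an addition, not part of the original listing), the wrapped model can be called on a few test images and the most probable class compared with the true labels:

import numpy as np

probs = probability_model.predict(x_test[:5])   # shape (5, 10); each row sums to 1
print(np.argmax(probs, axis=1))                 # predicted digits
print(y_test[:5])                               # true labels for comparison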
Installing packages in Google Colab
One can use the code cell in Colab not only to run Python code but also to run shell commands.
Just add a ! before a command. The exclamation point tells the notebook cell to run the following
command as a shell command. Most general packages needed for deep learning come pre-
installed. In some cases, you might need less popular libraries, or you might need to run code on
a different version of a library. To do this, you’ll need to install packages manually.
The package manager used for installing packages is pip. To install a package, or a particular version of one (using pip's name==version syntax), run a command such as:
!pip3 install tensorflow

Downloading a dataset
When you're training a machine learning model on your local machine, you're likely to run into the storage and bandwidth costs that come with downloading and storing the dataset required for training. Deep learning datasets can be massive, often ranging from 20 to 50 GB. Downloading them can be especially challenging where high-speed internet is not available. The most efficient way to use datasets is to use a cloud interface to download them, rather than manually uploading the dataset from a local machine. Thankfully, Colab gives us a variety of ways to download datasets from common data hosting platforms.
To download an existing dataset from Kaggle, we can follow the steps outlined below:
1. Go to your Kaggle Account and click on “Create New API Token”. This will download a
kaggle.json file to your machine.
2. Go to your Google Colab project file, and run the following commands:
! pip install -q kaggle
from google.colab import files

# choose the kaggle.json file that you downloaded


files.upload()
! mkdir ~/.kaggle
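The listing above stops after creating the .kaggle folder. A typical continuation (the dataset slug below is only a placeholder, not from the original manual) copies the token into that folder, restricts its permissions, and then calls the Kaggle CLI:

! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
# replace <owner>/<dataset-name> with the dataset you actually want
! kaggle datasets download -d <owner>/<dataset-name>
! unzip -q <dataset-name>.zip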

Downloading the dataset from any generic website


Depending on the browser that you’re using, there are extensions available to convert the dataset
download link into `curl` or `wget` format. You can use this to efficiently download the dataset.
1. For Firefox, there’s the cliget browser extension: cliget – Get this Extension for Firefox
(en-US)
2. For Chrome, there’s the CurlWget extension: Ad Added CurlWget 52
These extensions will generate a curl/wget command as soon as you click on any download
button in your browser.
You can then copy that command and execute it in your Colab notebook to download the dataset.
NOTE: By default, the Colab notebook uses Python shell. To run terminal commands in Colab,
you will have to use “!” at the beginning of the command.
For example, to download a file from some.url and save it as some.file, you can use the
following command in Colab:
!curl http://some.url --output some.file

NOTE: The curl command will download the dataset in the Colab workspace, which will be lost
every time the runtime is disconnected. Hence a safe practice is to move the dataset into your
cloud drive as soon as the dataset is downloaded completely.
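One convenient way to do that (a short sketch added here, not part of the original text) is to mount your Google Drive in the notebook and copy the downloaded file into it:

from google.colab import drive
drive.mount('/content/drive')

# copy the downloaded file into Drive so it survives runtime disconnects
!cp some.file /content/drive/MyDrive/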
Downloading the dataset from GCP or Google Drive
Google Cloud Platform is a cloud computing and storage platform. You can use it to store large
datasets, and you can import that dataset directly from the cloud into Colab. To upload and
download files on GCP, first you need to authenticate your Google account.
from google.colab import auth
auth.authenticate_user()

After that, install gsutil to upload and download files, and then init gcloud.
!curl https://sdk.cloud.google.com | bash
!gcloud init

Once you have configured these options, you can use the following commands to
download/upload files to and from Google Cloud Storage.
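The commands themselves are not listed here; as an illustration (bucket and file names below are placeholders), gsutil cp copies objects in either direction:

# download an object from a Cloud Storage bucket into the Colab workspace
!gsutil cp gs://<your-bucket-name>/dataset.zip /content/

# upload a local file back to the bucket
!gsutil cp /content/results.csv gs://<your-bucket-name>/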
Initiating a runtime with GPU/TPU enabled
Deep learning is a computationally expensive process: a lot of calculations need to be executed at the same time to train a model. To mitigate this issue, Google Colab offers not only the classic CPU runtime but also GPU and TPU runtimes.
The CPU runtime is best suited for training large models because of the high memory it provides. The GPU runtime offers better flexibility and programmability for irregular computations, such as small batches and non-MatMul computations. The TPU runtime is highly optimized for large batches and CNNs and has the highest training throughput. If you have a smaller model to train, I suggest training it on the GPU/TPU runtime to use Colab to its full potential.
To create a GPU/TPU enabled runtime, you can click on runtime in the toolbar menu below the
file name. From there, click on “Change runtime type”, and then select GPU or TPU under the
Hardware Accelerator dropdown menu.
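After switching the hardware accelerator, it is worth confirming that the GPU is actually visible; the check below is an addition to the manual but uses only standard calls:

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # non-empty list when a GPU runtime is active

# the shell command below also shows the allocated GPU and its memory
!nvidia-smi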
Exercise: Try uploading a dataset yourself and perform basic data pre-processing on it.
2. Study and implementation of feed forward Neural
Network
What is a (Neural Network) NN?

• A single neuron == linear regression without an activation function (a perceptron).

• Basically, a single neuron calculates the weighted sum of its inputs (W.T*X) and then we can set a threshold to predict the output in a perceptron. If the weighted sum of the inputs crosses the threshold, the perceptron fires; if not, it doesn't.
• A perceptron can take real-valued or boolean inputs.
• Actually, when w⋅x+b=0 the perceptron outputs 0.
• A disadvantage of the perceptron is that it only outputs binary values, and a small change in weight and bias can flip the output. We need a system which can modify the output slightly according to small changes in weight and bias. This is where the sigmoid function comes into the picture.
• If we replace the perceptron's threshold with a sigmoid function, we can make slight changes in the output. E.g. with a perceptron the output = 0; you slightly change the weights and bias and the output jumps to 1, even though the desired output is 0.7. With a sigmoid, the output moves smoothly, e.g. from 0 towards 0.7, as the weights and bias change slightly.
• If we apply the sigmoid activation function, a single neuron acts as logistic regression. We can understand the difference between the perceptron and the sigmoid function by looking at the sigmoid function graph.

Simple NN graph:

A visual explanation of how Feed Forward NNs work


First, let’s familiarize ourselves with the basic structure of a Neural Network.
• Input Layer — contains one or more input nodes. For example, suppose you want to predict whether it will rain tomorrow and base your decision on two variables, humidity and wind speed. In that case, your first input would be the value for humidity, and the second input would be the value for wind speed.
• Hidden Layer — this layer houses hidden nodes, each containing an activation function (more on these later). Note that a Neural Network with multiple hidden layers is known as a Deep Neural Network.
• Output Layer — contains one or more output nodes. Following the same weather prediction example above, you could choose to have only one output node generating a rain probability (where >0.5 means rain tomorrow, and ≤0.5 no rain tomorrow). Alternatively, you could have two output nodes, one for rain and another for no rain. Note that you can use a different activation function for output nodes vs. hidden nodes.
• Connections — lines joining different nodes are known as connections. These contain kernels (weights) and biases, the parameters that get optimized during the training of a neural network.

Parameters and activation functions


Let’s take a closer look at kernels (weights) and biases to understand what they do. For
simplicity, we create a basic neural network with one input node, two hidden nodes, and one
output node (1–2–1)
• Kernels (weights) — used to scale input and hidden node values. Each connection typically holds a different weight.
• Biases — used to adjust scaled values before passing them through an activation function.
• Activation functions — think of activation functions as standard curves (building blocks) used by the Neural Network to create a custom curve to fit the training data. Passing different input values through the network selects different sections of the standard curve, which are then assembled into a final custom-fit curve.
Loss functions, optimizers, and training
Training Neural Networks involves a complicated process known as backpropagation. I
will not go through a step-by-step explanation of how backpropagation works since it is a big
enough topic deserving a separate article. Instead, let me briefly introduce you to loss functions
and optimizers and summarize what happens when we “train” a Neural Network.
• Loss — represents the “size” of the error between the true values/labels and the predicted values/labels. The goal of training a Neural Network is to minimize this loss. The smaller the loss, the closer the match between the true and the predicted data. There are many loss functions to choose from, with Binary Crossentropy, Categorical Crossentropy, and Mean Squared Error being the most common.
• Optimizers — are the algorithms used in backpropagation. The goal of an optimizer is to find the optimum set of kernels (weights) and biases to minimize the loss. Optimizers typically use a gradient descent approach, which allows them to iteratively find the “best” possible configuration of weights and biases. The most commonly used ones are SGD, ADAM, and RMSProp (see the short compile() sketch after this list).
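As a small illustration (added here, using the Keras API from Experiment 1 rather than the from-scratch network below), the loss and optimizer are simply arguments to model.compile, so they can be swapped without touching the network itself; the tiny two-input model matches the humidity/wind-speed example above and is only a sketch:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# string shortcuts are enough for the common choices
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# explicit objects allow hyperparameters such as the learning rate to be tuned
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])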
Training a Neural Network is basically fitting a custom curve through the training data
until it can approximate it as well as possible. The graph below illustrates what a custom-fitted
curve could look like in a specific scenario. This example contains a set of data that seem to flip
between 0 and 1 as the value for input increases.
Setup
We’ll need the following data and libraries:
Australian weather data from Kaggle (license: Creative Commons, original source of the data:
Commonwealth of Australia, Bureau of Meteorology).
Pandas and Numpy for data manipulation
Plotly for data visualizations
Tensorflow/Keras for Neural Networks
Scikit-learn library for splitting the data into train-test samples, and for some basic model
evaluation

Example: Implementation of Feed Forward Neural Network.


Code along with outputs:
#Importing required libraries for the code implementation
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.colors
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error, log_loss  # log_loss is used by the "ce" loss option below
from tqdm import tqdm_notebook

from sklearn.preprocessing import OneHotEncoder


from sklearn.datasets import make_blobs

Creating sigmoid neuron


class SigmoidNeuron:
    # A class for a sigmoid neuron

    def __init__(self):
        self.w = None
        self.b = None

    def perceptron(self, x):
        return np.dot(x, self.w.T) + self.b

    def sigmoid(self, x):
        return 1.0/(1.0 + np.exp(-x))

    def grad_w_mse(self, x, y):
        y_pred = self.sigmoid(self.perceptron(x))
        return (y_pred - y) * y_pred * (1 - y_pred) * x

    def grad_b_mse(self, x, y):
        y_pred = self.sigmoid(self.perceptron(x))
        return (y_pred - y) * y_pred * (1 - y_pred)

    def grad_w_ce(self, x, y):
        y_pred = self.sigmoid(self.perceptron(x))
        if y == 0:
            return y_pred * x
        elif y == 1:
            return -1 * (1 - y_pred) * x
        else:
            raise ValueError("y should be 0 or 1")

    def grad_b_ce(self, x, y):
        y_pred = self.sigmoid(self.perceptron(x))
        if y == 0:
            return y_pred
        elif y == 1:
            return -1 * (1 - y_pred)
        else:
            raise ValueError("y should be 0 or 1")

    def fit(self, X, Y, epochs=1, learning_rate=1, initialise=True, loss_fn="mse", display_loss=False):

        # initialise w, b
        if initialise:
            self.w = np.random.randn(1, X.shape[1])
            self.b = 0

        if display_loss:
            loss = {}

        for i in tqdm_notebook(range(epochs), total=epochs, unit="epoch"):
            dw = 0
            db = 0
            for x, y in zip(X, Y):
                if loss_fn == "mse":
                    dw += self.grad_w_mse(x, y)
                    db += self.grad_b_mse(x, y)
                elif loss_fn == "ce":
                    dw += self.grad_w_ce(x, y)
                    db += self.grad_b_ce(x, y)
            m = X.shape[1]
            self.w -= learning_rate * dw/m
            self.b -= learning_rate * db/m

            if display_loss:
                Y_pred = self.sigmoid(self.perceptron(X))
                if loss_fn == "mse":
                    loss[i] = mean_squared_error(Y, Y_pred)
                elif loss_fn == "ce":
                    loss[i] = log_loss(Y, Y_pred)

        if display_loss:
            plt.plot(loss.values())
            plt.xlabel('Epochs')
            if loss_fn == "mse":
                plt.ylabel('Mean Squared Error')
            elif loss_fn == "ce":
                plt.ylabel('Log Loss')
            plt.show()

    def predict(self, X):
        Y_pred = []
        for x in X:
            y_pred = self.sigmoid(self.perceptron(x))
            Y_pred.append(y_pred)
        return np.array(Y_pred)

#custom cmap
my_cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["red", "yellow", "green"])

Generate data

data, labels = make_blobs(n_samples=1000, centers=4, n_features=2, random_state=0)
print(data.shape, labels.shape)

Output:
(1000, 2) (1000,)

plt.scatter(data[:,0], data[:,1], c = labels, cmap = my_cmap)


plt.show()

Output:

labels_orig = labels
#converting the multi-class to binary class
labels = np.mod(labels_orig,2)

plt.scatter(data[:,0], data[:,1], c=labels, cmap=my_cmap)


plt.show()

Output:

X_train, X_val, Y_train, Y_val = train_test_split(data, labels, stratify=labels,random_state=0)


print(X_train.shape, X_val.shape)

Output:
(750, 2) (250, 2)
Implementation of Feed Forward Neural Network code

class FirstFFNetwork:

    def __init__(self):
        self.w1 = np.random.randn()
        self.w2 = np.random.randn()
        self.w3 = np.random.randn()
        self.w4 = np.random.randn()
        self.w5 = np.random.randn()
        self.w6 = np.random.randn()
        self.b1 = 0
        self.b2 = 0
        self.b3 = 0

    def sigmoid(self, x):
        return 1.0/(1.0 + np.exp(-x))

    def forward_pass(self, x):
        self.x1, self.x2 = x
        self.a1 = self.w1*self.x1 + self.w2*self.x2 + self.b1
        self.h1 = self.sigmoid(self.a1)
        self.a2 = self.w3*self.x1 + self.w4*self.x2 + self.b2
        self.h2 = self.sigmoid(self.a2)
        self.a3 = self.w5*self.h1 + self.w6*self.h2 + self.b3
        self.h3 = self.sigmoid(self.a3)
        return self.h3

    def grad(self, x, y):
        self.forward_pass(x)

        self.dw5 = (self.h3-y) * self.h3*(1-self.h3) * self.h1
        self.dw6 = (self.h3-y) * self.h3*(1-self.h3) * self.h2
        self.db3 = (self.h3-y) * self.h3*(1-self.h3)

        self.dw1 = (self.h3-y) * self.h3*(1-self.h3) * self.w5 * self.h1*(1-self.h1) * self.x1
        self.dw2 = (self.h3-y) * self.h3*(1-self.h3) * self.w5 * self.h1*(1-self.h1) * self.x2
        self.db1 = (self.h3-y) * self.h3*(1-self.h3) * self.w5 * self.h1*(1-self.h1)

        self.dw3 = (self.h3-y) * self.h3*(1-self.h3) * self.w6 * self.h2*(1-self.h2) * self.x1
        self.dw4 = (self.h3-y) * self.h3*(1-self.h3) * self.w6 * self.h2*(1-self.h2) * self.x2
        self.db2 = (self.h3-y) * self.h3*(1-self.h3) * self.w6 * self.h2*(1-self.h2)

    def fit(self, X, Y, epochs=1, learning_rate=1, initialise=True, display_loss=False):

        # initialise w, b
        if initialise:
            self.w1 = np.random.randn()
            self.w2 = np.random.randn()
            self.w3 = np.random.randn()
            self.w4 = np.random.randn()
            self.w5 = np.random.randn()
            self.w6 = np.random.randn()
            self.b1 = 0
            self.b2 = 0
            self.b3 = 0

        if display_loss:
            loss = {}

        for i in tqdm_notebook(range(epochs), total=epochs, unit="epoch"):
            dw1, dw2, dw3, dw4, dw5, dw6, db1, db2, db3 = [0]*9
            for x, y in zip(X, Y):
                self.grad(x, y)
                dw1 += self.dw1
                dw2 += self.dw2
                dw3 += self.dw3
                dw4 += self.dw4
                dw5 += self.dw5
                dw6 += self.dw6
                db1 += self.db1
                db2 += self.db2
                db3 += self.db3

            m = X.shape[1]
            self.w1 -= learning_rate * dw1 / m
            self.w2 -= learning_rate * dw2 / m
            self.w3 -= learning_rate * dw3 / m
            self.w4 -= learning_rate * dw4 / m
            self.w5 -= learning_rate * dw5 / m
            self.w6 -= learning_rate * dw6 / m
            self.b1 -= learning_rate * db1 / m
            self.b2 -= learning_rate * db2 / m
            self.b3 -= learning_rate * db3 / m

            if display_loss:
                Y_pred = self.predict(X)
                loss[i] = mean_squared_error(Y_pred, Y)

        if display_loss:
            plt.plot(loss.values())
            plt.xlabel('Epochs')
            plt.ylabel('Mean Squared Error')
            plt.show()

    def predict(self, X):
        Y_pred = []
        for x in X:
            y_pred = self.forward_pass(x)
            Y_pred.append(y_pred)
        return np.array(Y_pred)

# create and train the network (the hyperparameter values here are illustrative)
ffn = FirstFFNetwork()
ffn.fit(X_train, Y_train, epochs=2000, learning_rate=.01, display_loss=True)

Y_pred_train = ffn.predict(X_train)
Y_pred_binarised_train = (Y_pred_train >= 0.5).astype("int").ravel()
Y_pred_val = ffn.predict(X_val)
Y_pred_binarised_val = (Y_pred_val >= 0.5).astype("int").ravel()
accuracy_train = accuracy_score(Y_pred_binarised_train, Y_train)
accuracy_val = accuracy_score(Y_pred_binarised_val, Y_val)

print("Training accuracy", round(accuracy_train, 2))
print("Validation accuracy", round(accuracy_val, 2))

Output:
Training accuracy 0.73
Validation accuracy 0.72

#visualize the results
plt.scatter(X_train[:,0], X_train[:,1], c=Y_pred_binarised_train, cmap=my_cmap, s=15*(np.abs(Y_pred_binarised_train-Y_train)+.2))
plt.show()

Exercise: Try to improve the accuracy in the above example.


3. Study and implementation of back propagation.

Why use the back propagation algorithm?


Earlier we discussed that the network is trained using 2 passes: forward and backward. At the
end of the forward pass, the network error is calculated, and should be as small as possible.
If the current error is high, the network didn’t learn properly from the data. What does this
mean? It means that the current set of weights isn’t accurate enough to reduce the network
error and make accurate predictions. As a result, we should update network weights to reduce
the network error.
The back propagation algorithm is one of the algorithms responsible for updating network
weights with the objective of reducing the network error. It’s quite important.
Here are some of the advantages of the back propagation algorithm:
 It’s memory-efficient in calculating the derivatives, as it uses less memory compared
to other optimization algorithms, like the genetic algorithm. This is a very important
feature, especially with large networks.
 The back propagation algorithm is fast, especially for small and medium-sized
networks. As more layers and neurons are added, it starts to get slower as more
derivatives are calculated.
 This algorithm is generic enough to work with different network architectures, like
convolutional neural networks, generative adversarial networks, fully-connected
networks, and more.
 There are no parameters to tune the back propagation algorithm, so there’s less
overhead. The only parameters in the process are related to the gradient descent
algorithm, like learning rate.
How back propagation algorithm works
How the algorithm works is best explained based on a simple network, like the one given in
the next figure. It only has an input layer with 2 inputs (X1 and X2), and an output layer with
1 output. There are no hidden layers.
The weights of the inputs are W1 and W2, respectively. The bias is treated as a new input
neuron to the output neuron which has a fixed value +1 and a weight b. Both the weights and
biases could be referred to as parameters.
Code along with outputs:
import numpy
import matplotlib.pyplot

def sigmoid(sop):
    return 1.0/(1+numpy.exp(-1*sop))

def error(predicted, target):
    return numpy.power(predicted-target, 2)

def error_predicted_deriv(predicted, target):
    return 2*(predicted-target)

def sigmoid_sop_deriv(sop):
    return sigmoid(sop)*(1.0-sigmoid(sop))

def sop_w_deriv(x):
    return x

def update_w(w, grad, learning_rate):
    return w - learning_rate*grad

x1=0.1
x2=0.4
target = 0.7
learning_rate = 0.01

w1=numpy.random.rand()
w2=numpy.random.rand()

print("Initial W : ", w1, w2)

predicted_output = []
network_error = []

old_err = 0
for k in range(80000):
    # Forward Pass
    y = w1*x1 + w2*x2
    predicted = sigmoid(y)
    err = error(predicted, target)

    predicted_output.append(predicted)
    network_error.append(err)

    # Backward Pass
    g1 = error_predicted_deriv(predicted, target)

    g2 = sigmoid_sop_deriv(y)
    g3w1 = sop_w_deriv(x1)
    g3w2 = sop_w_deriv(x2)

    gradw1 = g3w1*g2*g1
    gradw2 = g3w2*g2*g1

    w1 = update_w(w1, gradw1, learning_rate)
    w2 = update_w(w2, gradw2, learning_rate)

    print(predicted)
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(network_error)
matplotlib.pyplot.title("Iteration Number vs Error")
matplotlib.pyplot.xlabel("Iteration Number")
matplotlib.pyplot.ylabel("Error")
matplotlib.pyplot.figure()
matplotlib.pyplot.plot(predicted_output)
matplotlib.pyplot.title("Iteration Number vs Prediction")
matplotlib.pyplot.xlabel("Iteration Number")
matplotlib.pyplot.ylabel("Prediction")

Output:
Streaming output truncated to the last 5000 lines.
0.6999984864252651
0.6999984866522116
0.6999984868791244
0.699998487106003
0.6999984873328476
.
.
.
.
0.6999992843293953
0.6999992844367032
0.6999992845439951
0.6999992846512709
0.6999992847585306

Exercise: Plot the error over 80,000 epochs and report the saturated error output value. Also write a short note on your analysis of the output value.
4. Implement Batch gradient descent, Stochastic gradient
descent and mini batch gradient descent.
Objective:

To implement Batch gradient descent, Stochastic gradient descent and mini batch gradient
descent.

Introduction to the Topic:


We will use very simple home prices data set to implement batch and stochastic gradient descent in
python.

Batch gradient descent uses all training samples in forward pass to calculate cumulative error and then we
adjust weights using derivatives. In stochastic GD, we randomly pick one training sample, perform
forward pass, compute the error and immediately adjust weights.

So the key difference here is that to adjust the weights, batch GD uses all training samples, whereas stochastic GD uses one randomly picked training sample.

Mini-batch GD is an intermediate version of batch GD and stochastic GD. In mini-batch gradient descent you use a batch of samples in each iteration. For example, if you have 50 training samples in total, you can take a batch of 10 samples, calculate the cumulative error for those 10 samples and then adjust the weights.

To summarize: in SGD we adjust weights after every single sample, in batch GD we adjust weights after going through all samples, and in mini-batch GD we do it after every m samples (where m is the batch size and 0 < m < n, with n the total number of samples).

Gradient descent allows you to find the weights (here w1 and w2, one per input feature) and the bias in the following linear equation for housing price prediction.
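For the two input features used in this example (area and bedrooms), the linear model being fitted is

price = w1 * area + w2 * bedrooms + bias

and gradient descent iteratively updates w1, w2 and the bias so as to minimize the mean squared error between the predicted and actual prices.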

Example and solution:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
df = pd.read_csv("homeprices.csv")
df.sample(5)

output:
index,area,bedrooms,price
5,1170,2,38.0
14,2250,3,101.0
8,1310,3,50.0
10,1800,3,82.0
0,1056,2,39.07

from sklearn import preprocessing


sx = preprocessing.MinMaxScaler()
sy = preprocessing.MinMaxScaler()

scaled_X = sx.fit_transform(df.drop('price',axis='columns'))
scaled_y = sy.fit_transform(df['price'].values.reshape(df.shape[0],1))

scaled_X
scaled_y
scaled_y.reshape(20,)

(1) Batch Gradient Descent Implementation


def batch_gradient_descent(X, y_true, epochs, learning_rate = 0.01):

    number_of_features = X.shape[1]
    # numpy array with 1 row and columns equal to number of features. In
    # our case number_of_features = 2 (area, bedroom)
    w = np.ones(shape=(number_of_features))
    b = 0
    total_samples = X.shape[0]  # number of rows in X

    cost_list = []
    epoch_list = []

    for i in range(epochs):
        y_predicted = np.dot(w, X.T) + b

        w_grad = -(2/total_samples)*(X.T.dot(y_true-y_predicted))
        b_grad = -(2/total_samples)*np.sum(y_true-y_predicted)

        w = w - learning_rate * w_grad
        b = b - learning_rate * b_grad

        cost = np.mean(np.square(y_true-y_predicted))  # MSE (Mean Squared Error)

        if i%10==0:
            cost_list.append(cost)
            epoch_list.append(i)

    return w, b, cost, cost_list, epoch_list

w, b, cost, cost_list, epoch_list = batch_gradient_descent(scaled_X, scaled_y.reshape(scaled_y.shape[0],), 500)
w, b, cost

Output:
(array([0.70712464, 0.67456527]), -0.23034857438407427, 0.0068641890429808105)

plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list,cost_list)

Output:

def predict(area, bedrooms, w, b):
    scaled_X = sx.transform([[area, bedrooms]])[0]
    # here w1 = w[0], w2 = w[1] and the bias is b
    # the equation for price is w1*area + w2*bedrooms + bias
    # scaled_X[0] is area, scaled_X[1] is bedrooms
    scaled_price = w[0] * scaled_X[0] + w[1] * scaled_X[1] + b
    # once we get the price prediction we need to rescale it back to the original value
    # also, since inverse_transform returns a 2D array, we take value[0][0] to get a single value
    return sy.inverse_transform([[scaled_price]])[0][0]

predict(2600,4,w,b)
predict(1000,2,w,b)
predict(1500,3,w,b)

(2) Stochastic Gradient Descent Implementation

# we will use the random library to pick a random training sample
import random
random.randint(0,6)  # randint gives a random number between the two numbers specified in the arguments

def stochastic_gradient_descent(X, y_true, epochs, learning_rate = 0.01):

    number_of_features = X.shape[1]
    # numpy array with 1 row and columns equal to number of features. In
    # our case number_of_features = 2 (area, bedrooms)
    w = np.ones(shape=(number_of_features))
    b = 0
    total_samples = X.shape[0]

    cost_list = []
    epoch_list = []

    for i in range(epochs):
        random_index = random.randint(0, total_samples-1)  # random index from total samples
        sample_x = X[random_index]
        sample_y = y_true[random_index]

        y_predicted = np.dot(w, sample_x.T) + b

        w_grad = -(2/total_samples)*(sample_x.T.dot(sample_y-y_predicted))
        b_grad = -(2/total_samples)*(sample_y-y_predicted)

        w = w - learning_rate * w_grad
        b = b - learning_rate * b_grad

        cost = np.square(sample_y-y_predicted)

        if i%100==0:  # at every 100th iteration record the cost and epoch value
            cost_list.append(cost)
            epoch_list.append(i)

    return w, b, cost, cost_list, epoch_list

w_sgd, b_sgd, cost_sgd, cost_list_sgd, epoch_list_sgd = stochastic_gradient_descent(scaled_X, scaled_y.reshape(scaled_y.shape[0],), 10000)
w_sgd, b_sgd, cost_sgd

Output:
(array([0.70999659, 0.67807531]), -0.23262199997984362,
0.011241941221627246)

w , b
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list_sgd,cost_list_sgd)

Output:

predict(2600,4,w_sgd, b_sgd)
predict(1000,2,w_sgd, b_sgd)
np.random.permutation(20)

Output:
array([16, 10, 7, 8, 3, 17, 15, 13, 2, 12, 4, 1, 18, 9, 14, 6, 5, 19, 0,
11])

(3) Mini Batch Gradient Descent Implementation


def mini_batch_gradient_descent(X, y_true, epochs = 100, batch_size = 5, learning_rate = 0.01):

    number_of_features = X.shape[1]
    # numpy array with 1 row and columns equal to number of features. In
    # our case number_of_features = 2 (area, bedrooms)
    w = np.ones(shape=(number_of_features))
    b = 0
    total_samples = X.shape[0]  # number of rows in X

    if batch_size > total_samples:  # in this case mini-batch becomes the same as batch gradient descent
        batch_size = total_samples

    cost_list = []
    epoch_list = []

    num_batches = int(total_samples/batch_size)

    for i in range(epochs):
        random_indices = np.random.permutation(total_samples)
        X_tmp = X[random_indices]
        y_tmp = y_true[random_indices]

        for j in range(0, total_samples, batch_size):
            Xj = X_tmp[j:j+batch_size]
            yj = y_tmp[j:j+batch_size]
            y_predicted = np.dot(w, Xj.T) + b

            w_grad = -(2/len(Xj))*(Xj.T.dot(yj-y_predicted))
            b_grad = -(2/len(Xj))*np.sum(yj-y_predicted)

            w = w - learning_rate * w_grad
            b = b - learning_rate * b_grad

            cost = np.mean(np.square(yj-y_predicted))  # MSE (Mean Squared Error)

        if i%10==0:
            cost_list.append(cost)
            epoch_list.append(i)

    return w, b, cost, cost_list, epoch_list

w, b, cost, cost_list, epoch_list = mini_batch_gradient_descent(
    scaled_X,
    scaled_y.reshape(scaled_y.shape[0],),
    epochs = 120,
    batch_size = 5
)
w, b, cost
Output:
(array([0.71015977, 0.67813327]), -0.23343143725261098,
0.0025898311387083403)

plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list,cost_list)

Output:

def predict(area, bedrooms, w, b):
    scaled_X = sx.transform([[area, bedrooms]])[0]
    # here w1 = w[0], w2 = w[1] and the bias is b
    # the equation for price is w1*area + w2*bedrooms + bias
    # scaled_X[0] is area, scaled_X[1] is bedrooms
    scaled_price = w[0] * scaled_X[0] + w[1] * scaled_X[1] + b
    # once we get the price prediction we need to rescale it back to the original value
    # also, since inverse_transform returns a 2D array, we take value[0][0] to get a single value
    return sy.inverse_transform([[scaled_price]])[0][0]
predict(2600,4,w,b)
Output:
128.65424087579652
predict(1000,2,w,b)
Output:
29.9855861683337

Exercise:
Implement gradient descent for a neural network (or logistic regression) to predict whether a person will buy life insurance based on their age.
5. Study the effect of batch normalization and dropout in a
neural network classifier.

Introduction
In this lab exercise, we will discuss why we need batch normalization and dropout in deep neural
networks followed by experiments using Pytorch on a standard data set to see the effects of batch
normalization and dropout.
Batch Normalization
By normalizing the inputs we are able to bring all the inputs features to the same scale. In the
neural network, we need to compute the pre-activation for the first neuron of the first layer a₁₁.
We know that pre-activation is nothing but the weighted sum of inputs plus bias. In other words,
it is the dot product between the first row of the weight matrix W₁ and the input matrix X plus
bias b₁₁.

The mathematical equation for the pre-activation at each layer ‘i’ is given by

aᵢ = Wᵢ hᵢ₋₁ + bᵢ

The activation at each layer is obtained by applying the activation function to the pre-activation of that layer. The mathematical equation for the activation at each layer ‘i’ is given by

hᵢ = g(aᵢ)

In order to bring all the activation values to the same scale, we normalize the activation values so that the hidden representation doesn't vary drastically, which also helps us to improve the training speed.
Why is it called batch normalization?
Because we compute the mean and standard deviation from a single mini-batch, as opposed to computing them from the entire data set. Batch normalization is applied individually at each hidden neuron in the network.
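For reference (these formulas are added here; they are the standard definition rather than part of the original text), batch normalization transforms each pre-activation value xᵢ of a mini-batch B of size m as

μ_B = (1/m) Σᵢ xᵢ
σ²_B = (1/m) Σᵢ (xᵢ − μ_B)²
x̂ᵢ = (xᵢ − μ_B) / √(σ²_B + ε)
yᵢ = γ · x̂ᵢ + β

where γ and β are learnable scale and shift parameters and ε is a small constant for numerical stability.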

To get a better insight into how batch normalization helps in faster converge of the network, we
will look at the distribution of values across multiple hidden layers in the network during the
training phase.
For consistency, we will plot the output of the second linear layer from the two networks and
compare the distributions of the output from that layer across the networks. The results look like
this:

From the graphs, we can conclude that the distribution of values without batch normalization changes significantly between iterations of inputs within each epoch, which means that the subsequent layers in the network without batch normalization see a varying distribution of input data. In contrast, the change in the distribution of values for the model with batch normalization is almost negligible.
Dropout
In this section of the lab, we discuss the concept of dropout in neural networks specifically how
it helps to reduce overfitting and generalization error. After that, we will implement a neural
network with and without dropout to see how dropout influences the performance of a network
using Pytorch.
Dropout is a regularization technique that “drops out” or “deactivates” few neurons in the neural
network randomly in order to avoid the problem of overfitting.
The idea of Dropout
Training one deep neural network with a large number of parameters on the data might lead to overfitting. Could we instead train multiple neural networks with different configurations on the same dataset and take the average of their predictions?

But creating an ensemble of neural networks with different architectures and training them wouldn't be feasible in practice. Dropout to the rescue.
Dropout deactivates neurons randomly at each training step: instead of training on the original network, we train on a network with dropped-out nodes. In the next training iteration, the set of hidden neurons deactivated by dropout changes because of its probabilistic behavior. In this way, by applying dropout, i.e. deactivating certain individual nodes at random during training, we can simulate an ensemble of neural networks with different architectures.
Dropout at Training time

In each training iteration, each node in the network is kept with probability p or deactivated (dropped out) with probability 1-p. That means the weights associated with a node get updated only a fraction p of the time, because the node is active only p of the time during training.
Dropout at Test time

At test time, we use the original neural network with all nodes present and scale the output of each node by p, since each node was active only a fraction p of the time during training.
To show the overfitting, we will train two networks — one without dropout and another with
dropout. The network without dropout has 3 fully connected hidden layers with ReLU as the
activation function for the hidden layers and the network with dropout also has similar
architecture but with dropout applied after first & second Linear layer.
In this example, I have used a dropout fraction of 0.5 after the first linear layer and 0.2 after the second linear layer. Once we train the two different models, i.e. one without dropout and another with dropout, and plot the test results, it would look like this:
From the above graphs, we can conclude that as we increase the number of epochs the model without dropout is overfitting the data. The model without dropout is learning the noise associated with the data instead of generalizing from the data. We can see that the loss associated with the model without dropout increases as we increase the number of epochs, unlike the loss associated with the model with dropout.
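As a minimal sketch (added here; the full comparison network described above is left for the exercise at the end of this experiment), dropout in PyTorch is just an nn.Dropout layer placed after the layers you want to regularize; it randomly zeroes activations with the given probability during training and is switched off automatically in eval() mode:

import torch.nn as nn

# layer sizes follow the MyNetBN example below; the dropout fractions 0.5 and 0.2 follow the text above
model_drp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 48),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(48, 24),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(24, 10)
)

model_drp.train()  # dropout active during training
model_drp.eval()   # dropout disabled for evaluation (PyTorch uses inverted dropout,
                   # so no extra rescaling of the outputs is needed at test time)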

Example:
Outline

1. Load dataset and visualise


2. Add batch normalization layers
3. Comparison with batch normalization layers
4. Add dropout layer
5. Comparison with dropout layer

import torch

import matplotlib.pyplot as plt


import numpy as np

import torchvision
import torchvision.transforms as transforms

import torch.nn as nn
import torch.optim as optim
import seaborn as sns
Dataset and visualisation
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                      download=True,
                                      transform=transforms.ToTensor())

# note: batch_size and trainloader are defined in the next code block
def imshow(img, title):
    plt.figure(figsize=(batch_size * 4, 4))
    plt.axis('off')
    plt.imshow(np.transpose(img, (1, 2, 0)))
    plt.title(title)
    plt.show()

def show_batch_images(dataloader):
    images, labels = next(iter(dataloader))

    img = torchvision.utils.make_grid(images)
    imshow(img, title=[str(x.item()) for x in labels])

    return images, labels

images, labels = show_batch_images(trainloader)

Batch Normalisation
class MyNetBN(nn.Module):
    def __init__(self):
        super(MyNetBN, self).__init__()
        self.classifier = nn.Sequential(
            nn.Linear(784, 48),
            nn.BatchNorm1d(48),
            nn.ReLU(),
            nn.Linear(48, 24),
            nn.BatchNorm1d(24),
            nn.ReLU(),
            nn.Linear(24, 10)
        )

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

model_bn = MyNetBN()
print(model_bn)

batch_size = 512
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)

loss_fn = nn.CrossEntropyLoss()
opt_bn = optim.SGD(model_bn.parameters(), lr=0.01)

loss_arr = []     # losses of the companion model without batch norm (its definition is not shown in this extract)
loss_bn_arr = []  # losses of the batch norm model
model_bn.train()

max_epochs = 2

for epoch in range(max_epochs):
    for i, data in enumerate(trainloader, 0):

        inputs, labels = data

        # training steps for bn model
        opt_bn.zero_grad()
        outputs_bn = model_bn(inputs)
        loss_bn = loss_fn(outputs_bn, labels)
        loss_bn.backward()
        opt_bn.step()

        loss_bn_arr.append(loss_bn.item())

        if i % 10 == 0:

            inputs = inputs.view(inputs.size(0), -1)

            model_bn.eval()

            # pre-activations of the first hidden layer after batch normalization
            b = model_bn.classifier[0](inputs)
            b = model_bn.classifier[1](b)
            b = b.detach().numpy().ravel()

            sns.distplot(b, kde=True, color='g', label='BatchNorm')
            # `loss` below refers to the companion model trained without batch norm
            plt.title('%d: Loss = %0.2f, Loss with bn = %0.2f' % (i, loss.item(), loss_bn.item()))
            plt.legend()
            plt.show()
            plt.pause(0.5)

            model_bn.train()

plt.plot(loss_bn_arr, 'g', label='BatchNorm')
plt.legend()
plt.show()
Outputs:
Exercise: Write the dropout version of the above program.
6. Study of Singular value Decomposition for
dimensionality reduction.
Introduction:
Reducing the number of input variables for a predictive model is referred to as dimensionality
reduction. Fewer input variables can result in a simpler predictive model that may have
better performance when making predictions on new data.
Perhaps the more popular technique for dimensionality reduction in machine learning is
Singular Value Decomposition, or SVD for short. This is a technique that comes from the
field of linear algebra and can be used as a data preparation technique to create a projection
of a sparse dataset prior to fitting a model.

Dimensionality Reduction and SVD


Dimensionality reduction refers to reducing the number of input variables for a dataset. If
your data is represented using rows and columns, such as in a spreadsheet, then the input
variables are the columns that are fed as input to a model to predict the target variable. Input
variables are also called features.
We can consider the columns of data representing dimensions on an n-dimensional feature
space and the rows of data as points in that space. This is a useful geometric interpretation of
a dataset. Having a large number of dimensions in the feature space can mean that the
volume of that space is very large, and in turn, the points that we have in that space (rows of
data) often represent a small and non-representative sample.
This can dramatically impact the performance of machine learning algorithms fit on data with
many input features, generally referred to as the “curse of dimensionality.” Therefore, it is
often desirable to reduce the number of input features. This reduces the number of
dimensions of the feature space, hence the name “dimensionality reduction.”
A popular approach to dimensionality reduction is to use techniques from the field of linear
algebra. This is often called “feature projection” and the algorithms used are referred to as
“projection methods.” Projection methods seek to reduce the number of dimensions in the
feature space whilst also preserving the most important structure or relationships between the
variables observed in the data.

Singular Value Decomposition, or SVD, might be the most popular technique for
dimensionality reduction when data is sparse. Sparse data refers to rows of data where many
of the values are zero. This is often the case in some problem domains like recommender
systems where a user has a rating for very few movies or songs in the database and zero
ratings for all other cases. Another common example is a bag of words model of a text
document, where the document has a count or frequency for some words and most words
have a 0 value.
Examples of sparse data appropriate for applying SVD for dimensionality reduction:

• Recommender Systems
• Customer-Product purchases
• User-Song Listen Counts
• User-Movie Ratings
• Text Classification
• One Hot Encoding
• Bag of Words Counts
• TF/IDF

SVD can be thought of as a projection method where data with m-columns (features) is
projected into a subspace with m or fewer columns, whilst retaining the essence of the
original data.

The SVD is used widely both in the calculation of other matrix operations, such as matrix
inverse, but also as a data reduction method in machine learning.
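To make the idea of projection concrete before the full experiment, here is a tiny sketch (added here, not part of the original example) that projects a 10-column matrix onto 3 components with scikit-learn's TruncatedSVD:

from numpy.random import rand
from sklearn.decomposition import TruncatedSVD

X = rand(100, 10)                               # toy data: 100 rows, 10 features

svd = TruncatedSVD(n_components=3, random_state=1)
X_reduced = svd.fit_transform(X)                # project onto a 3-dimensional subspace

print(X.shape, '->', X_reduced.shape)           # (100, 10) -> (100, 3)
print(svd.explained_variance_ratio_.sum())      # fraction of variance retained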

Example and Code:


# compare svd number of components with logistic regression algorithm for classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from matplotlib import pyplot

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                               n_redundant=5, random_state=7)
    return X, y

# get a list of models to evaluate
def get_models():
    models = dict()
    for i in range(1, 20):
        steps = [('svd', TruncatedSVD(n_components=i)), ('m', LogisticRegression())]
        models[str(i)] = Pipeline(steps=steps)
    return models

# evaluate a given model using cross-validation
def evaluate_model(model, X, y):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1,
                             error_score='raise')
    return scores

# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
    scores = evaluate_model(model, X, y)
    results.append(scores)
    names.append(name)
    print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.xticks(rotation=45)
pyplot.show()

Output:

>1 0.542 (0.046)


>2 0.626 (0.050)
>3 0.719 (0.053)
>4 0.722 (0.052)
>5 0.721 (0.054)
>6 0.729 (0.045)
>7 0.802 (0.034)
>8 0.800 (0.040)
>9 0.814 (0.037)
>10 0.814 (0.034)
>11 0.817 (0.037)
>12 0.820 (0.038)
>13 0.820 (0.036)
>14 0.825 (0.036)
>15 0.865 (0.027)
>16 0.865 (0.027)
>17 0.865 (0.027)
>18 0.865 (0.027)
>19 0.865 (0.027)

Exercise: Draw the box-and-whisker plot for the above solution showing the distribution of accuracy scores for each configured number of dimensions, and also find the predicted class using a combination of the SVD transform and a logistic regression model.
7.Train a sentiment analysis model on IMDB dataset, use
RNN layers.
Sentiment analysis is probably one of the most common applications in Natural Language Processing. I don't have to emphasize how important a customer-service tool sentiment analysis has become. So here we are: we will train a classifier on movie reviews from the IMDB data set, using Recurrent Neural Networks. Recurrent Neural Networks (RNNs) are a type of
neural network that are well suited for natural language processing tasks, such as sentiment
analysis. To perform sentiment analysis on IMDB movie reviews using an RNN, you would
need to first pre-process the text data by tokenizing the reviews and creating numerical
representations of the words, such as word embeddings. Then you would train the RNN on
the pre-processed data, using the review text as input and the corresponding sentiment
(positive or negative) as the output. Once the model is trained, you can use it to predict the
sentiment of new reviews by inputting the text into the trained model and interpreting the
output.
A recurrent neural network (RNN) is a class of artificial neural networks where connections
between nodes form a directed graph along a temporal sequence. This allows it to exhibit
temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their
internal state (memory) to process variable length sequences of inputs. This makes them
applicable to tasks such as unsegmented, connected handwriting recognition or speech
recognition.
The term “recurrent neural network” is used indiscriminately to refer to two broad classes of
networks with a similar general structure, where one is finite impulse and the other is infinite
impulse. Both classes of networks exhibit temporal dynamic behavior. A finite impulse
recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly
feedforward neural network, while an infinite impulse recurrent network is a directed cyclic
graph that cannot be unrolled.
Both finite impulse and infinite impulse recurrent networks can have additional stored states,
and the storage can be under direct control by the neural network. The storage can also be
replaced by another network or graph, if that incorporates time delays or has feedback loops.
Such controlled states are referred to as gated state or gated memory, and are part of long
short-term memory networks (LSTMs) and gated recurrent units. This is also called Feedback
Neural Network (FNN).

Sentiment analysis on the IMDB dataset using a Recurrent


Neural Network (RNN) in Python:
Sentiment analysis is the process of determining the sentiment or emotion expressed in a
piece of text, such as a review or a tweet. One way to perform sentiment analysis is by using
a Recurrent Neural Network (RNN) in Python. Here is an example of how to do sentiment
analysis on the IMDB dataset using an RNN:
import numpy as np
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
# Load the IMDB dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=5000)

# Pad the sequences to a fixed length


max_length = 500
x_train = sequence.pad_sequences(x_train, maxlen=max_length)
x_test = sequence.pad_sequences(x_test, maxlen=max_length)

# Define the model architecture


model = Sequential()
model.add(Embedding(5000, 128, input_length=max_length))

model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))


model.add(Dense(1, activation='sigmoid'))

# Compile the model


model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model


model.fit(x_train, y_train, batch_size=32, epochs=15, validation_data=(x_test, y_test))

In this example, we first load the IMDB dataset using the imdb.load_data function. The
dataset consists of 25,000 labeled movie reviews for training and 25,000 for testing. The
reviews are already preprocessed and encoded as sequences of integers, where each integer
represents a specific word in a vocabulary of 5,000 words.
Next, we use the sequence.pad_sequences function to pad the sequences to a fixed length of
500, to make sure that all the input sequences have the same length.
Then, we define the architecture of the RNN model using the Keras library. The model
includes an Embedding layer, which maps each word index to a dense 128-dimensional vector,
an LSTM layer, which processes the sequence of embeddings, and a Dense layer with a sigmoid
activation, which produces the final classification.
After that, we compile the model by specifying the loss function, the optimizer, and the
evaluation metric.
Finally, we train the model using the fit function. The training process consists of multiple
iterations over the training data, called epochs, and at the end of each epoch, the model's
performance is evaluated using the validation data (x_test, y_test).
Once the model is trained, it can be used to classify new reviews as positive or negative by
calling the predict method and passing in the review as a padded sequence of integers.
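As an illustrative sketch only (not part of the original listing), a new review could be scored as shown below. The encode_review helper and the sample review text are assumptions introduced for demonstration; the encoding follows the default start_char/oov_char/index_from conventions of imdb.load_data, and model and max_length are taken from the code above.

import numpy as np
from keras.datasets import imdb
from keras.preprocessing import sequence

# Map words to the integer ids used by imdb.load_data (illustrative helper)
word_index = imdb.get_word_index()

def encode_review(text, num_words=5000, index_from=3):
    ids = [1]  # 1 = start-of-sequence marker used by imdb.load_data
    for w in text.lower().split():
        idx = word_index.get(w)
        if idx is not None and idx + index_from < num_words:
            ids.append(idx + index_from)
        else:
            ids.append(2)  # 2 = out-of-vocabulary marker
    return ids

review = "this movie was surprisingly good and the acting was great"
x_new = sequence.pad_sequences([encode_review(review)], maxlen=max_length)
prob = model.predict(x_new)[0][0]
print("positive" if prob >= 0.5 else "negative", prob)

A probability above 0.5 is read here as a positive review; the 0.5 threshold and the simple whitespace tokenization are design choices of this sketch, not requirements of the lab.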
8. Implementing Autoencoder for encoding the real-world data
Introduction
Building autoencoders in Keras, covering:
• a simple autoencoder based on a fully-connected layer
• a sparse autoencoder
• a deep fully-connected autoencoder
• a deep convolutional autoencoder
• an image denoising model
• a sequence-to-sequence autoencoder
• a variational autoencoder
What are autoencoders?

"Autoencoding" is a data compression algorithm where the compression and decompression
functions are 1) data-specific, 2) lossy, and 3) learned automatically from examples rather than
engineered by a human. Additionally, in almost all contexts where the term "autoencoder" is
used, the compression and decompression functions are implemented with neural networks.

1) Autoencoders are data-specific, which means that they will only be able to compress data
similar to what they have been trained on. This is different from, say, the MPEG-2 Audio Layer
III (MP3) compression algorithm, which only holds assumptions about "sound" in general, but
not about specific types of sounds. An autoencoder trained on pictures of faces would do a
rather poor job of compressing pictures of trees, because the features it would learn would be
face-specific.

2) Autoencoders are lossy, which means that the decompressed outputs will be degraded
compared to the original inputs (similar to MP3 or JPEG compression). This differs from lossless
arithmetic compression.

3) Autoencoders are learned automatically from data examples, which is a useful property: it
means that it is easy to train specialized instances of the algorithm that will perform well on a
specific type of input. It doesn't require any new engineering, just appropriate training data.

To build an autoencoder, you need three things: an encoding function, a decoding function, and
a distance function that measures the information loss between the decompressed representation
of your data and the original input (i.e. a "loss" function). The encoder and decoder are chosen to
be parametric functions (typically neural networks) that are differentiable with respect to the
distance function, so the parameters of the encoding/decoding functions can be optimized to
minimize the reconstruction loss using stochastic gradient descent. It's simple! And you don't
even need to understand any of these words to start using autoencoders in practice.

Code:
import keras
from keras import layers
# This is the size of our encoded representations
encoding_dim = 32
# 32 floats -> compression factor of 24.5, assuming the input is 784 floats
# This is our input image
input_img = keras.Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = layers.Dense(784, activation='sigmoid')(encoded)
# This model maps an input to its reconstruction
autoencoder = keras.Model(input_img, decoded)
# Let's also create a separate encoder model:
# This model maps an input to its encoded representation
encoder = keras.Model(input_img, encoded)
# This is our encoded (32-dimensional) input
encoded_input = keras.Input(shape=(encoding_dim,))
# Retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# Create the decoder model
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
# Encode and decode some digits
# Note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
# Use Matplotlib to display original digits and their reconstructions
import matplotlib.pyplot as plt

n = 10  # How many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Output:
Example: Since we need not limit ourselves to a single layer for the encoder or decoder,
implement the same method using a stack of layers, i.e. a convolutional autoencoder (see the
sketch below).
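A minimal sketch of such a convolutional encoder/decoder stack is given below as a possible starting point; the layer sizes and filter counts are illustrative assumptions, not values prescribed by the exercise.

import keras
from keras import layers

# Sketch of a convolutional autoencoder for 28x28x1 MNIST images
# (filter counts are illustrative choices, not fixed by the exercise)
input_img = keras.Input(shape=(28, 28, 1))

x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)  # 7 x 7 x 8 bottleneck

x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

conv_autoencoder = keras.Model(input_img, decoded)
conv_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Note: inputs must be reshaped to (num_samples, 28, 28, 1) instead of flattened to 784.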
9. Study and implementation of LSTM
Introduction
Long Short Term Memory (LSTM) : Long short-term memory (LSTM) units (or blocks) are a
building unit for layers of a recurrent neural network (RNN). A RNN composed of LSTM units
is often called an LSTM network. A common LSTM unit is composed of a cell, an input gate, an
output gate and a forget gate. The cell is responsible for "remembering" values over arbitrary
time intervals; hence the word "memory" in LSTM. Each of the three gates can be thought of as
a "conventional" artificial neuron, as in a multi-layer (or feed forward) neural network: that is,
they compute an activation (using an activation function) of a weighted sum. Intuitively, they can
be thought of as regulators of the flow of values that passes through the connections of the
LSTM; hence the designation "gate". There are connections between these gates and the cell.

The expression long short-term refers to the fact that LSTM is a model for the short-term
memory which can last for a long period of time. An LSTM is well-suited to classify, process
and predict time series given time lags of unknown size and duration between important events.
LSTMs were developed to deal with the exploding and vanishing gradient problem when
training traditional RNNs.
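As a quick, illustrative sketch of the input/output conventions (the numbers below are arbitrary choices, not part of the lab exercise), a single Keras LSTM layer expects input of shape (batch, time steps, features) and returns one hidden-state vector per sequence:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

# 4 sequences, 10 time steps, 2 features per step (illustrative values)
x = np.random.rand(4, 10, 2)

model = Sequential()
model.add(LSTM(8, input_shape=(10, 2)))  # 8 memory units; the gates are handled internally
print(model.predict(x).shape)            # (4, 8): one 8-dimensional state per sequence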
Example

Recurrent neural networks have a wide array of applications. These include time series analysis,
document classification, and speech and voice recognition. In contrast to feedforward artificial
neural networks, the predictions made by recurrent neural networks are dependent on previous
predictions.

Draw a straight line. Let us see if an LSTM can learn the relationship of a straight line and
predict it.
First let us create the dataset depicting a straight line.

import numpy
import matplotlib.pyplot as plt

x = numpy.arange(1, 500, 1)
y = 0.4 * x + 30
plt.plot(x, y)

trainx, testx = x[0:int(0.8*(len(x)))], x[int(0.8*(len(x))):]
trainy, testy = y[0:int(0.8*(len(y)))], y[int(0.8*(len(y))):]
train = numpy.array(list(zip(trainx, trainy)))
test = numpy.array(list(zip(testx, testy)))

Now that the data has been created and split into train and test sets, let's convert the time series
data into the form of supervised learning data according to the value of the look-back period,
which is essentially the number of lags that are used to predict the value at time 't'.
So a time series like this −
time variable_x
t1 x1
t2 x2
: :
: :
T xT
When the look-back period is 1, it is converted to −
x1 x2
x2 x3
: :
: :
xT-1 xT
def create_dataset(n_X, look_back):
    dataX, dataY = [], []
    for i in range(len(n_X) - look_back):
        a = n_X[i:(i + look_back), ]
        dataX.append(a)
        dataY.append(n_X[i + look_back, ])
    return numpy.array(dataX), numpy.array(dataY)

look_back = 1
trainx, trainy = create_dataset(train, look_back)
testx, testy = create_dataset(test, look_back)

trainx = numpy.reshape(trainx, (trainx.shape[0], 1, 2))
testx = numpy.reshape(testx, (testx.shape[0], 1, 2))
Now we will train our model.
Small batches of training data are shown to the network; one complete pass in which the entire
training data is shown to the model (in batches) and the error is calculated is called an epoch.
Epochs are run until the error stops decreasing.
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=(trainx.shape[1], 2)))
model.add(LSTM(128, input_shape=(trainx.shape[1], 2)))
model.add(Dense(2))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainx, trainy, epochs=2000, batch_size=10, verbose=2, shuffle=False)
model.save_weights('LSTMBasic1.h5')
model.load_weights('LSTMBasic1.h5')
predict = model.predict(testx)

Now let's see what our predictions look like.

plt.plot(testx.reshape(-1, 2)[:, 0:1], testx.reshape(-1, 2)[:, 1:2])
plt.plot(predict[:, 0:1], predict[:, 1:2])
Exercise
The International Airline Passengers prediction problem. This is a problem where, given a year
and a month, the task is to predict the number of international airline passengers in units of
1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144
observations.
10. Implementing word2vec for the real-world data

The basic process for training a word2vec model is as follows:
1. Collect a large dataset of text data. This data is typically in the form of a list of sentences or a
collection of documents.
2. Tokenize the text data into individual words. This is often done using a tokenizer, such as the
NLTK library in Python.
3. Build a vocabulary of all the words in the dataset. This vocabulary will be used to create the word
vectors.
4. Train the model. The training process involves iterating over the dataset, and for each word,
predicting the words that are likely to appear in its context. This process is done using a neural
network with a single hidden layer.
5. Save the model. After the model is trained, it can be saved for later use.
6. Use the model. Once the word2vec model is trained, it can be used to perform several
operations, such as finding the similarity between words, finding the odd one out, completing
analogies, and more.
The NLTK (Natural Language Toolkit) library in Python provides a convenient way to train a word2vec
model using its built-in tokenizer and utilities. Here is an example of how to train a word2vec model
using NLTK:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from gensim.models import Word2Vec

nltk.download('punkt')  # tokenizer models required by sent_tokenize / word_tokenize

# Collect the dataset (text data)
text = "This is some sample text data. It can be any kind of text data, such as a book, a news article, or a webpage."

# Tokenize the text into sentences
sentences = sent_tokenize(text)

# Tokenize the sentences into words
sentences = [word_tokenize(sent) for sent in sentences]

# Train the word2vec model
model = Word2Vec(sentences, min_count=1)

# Access the vector for one word
vector = model.wv['text']

# Compute the cosine similarity between two words
similarity = model.wv.similarity('text', 'data')

# Perform an analogy-style operation (most similar word given positive and negative examples)
word_vec = model.wv.most_similar(positive=['text'], negative=['data'], topn=1)

# Save the model
model.save("word2vec.model")
In this example, the text data is tokenized into sentences and words using the sent_tokenize and
word_tokenize functions from NLTK. Then, the Word2Vec class from the gensim library is used to train
the model on the tokenized data. The min_count parameter is used to specify the minimum number of
occurrences of a word in the dataset for it to be included in the vocabulary.
Additionally, as the final lines show, the trained model's word vectors (model.wv) are used to
access the vector for a single word, compute the cosine similarity between two words, and find
the most similar word given positive and negative examples.
It is important to note that the word2vec algorithm works better with large datasets, so for better
performance, it is recommended to use a larger dataset of text data when training the model.
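As a hedged illustration of working with a large corpus without training from scratch (and as a possible starting point for Ex 1 below), gensim's downloader API can load pre-trained vectors; "word2vec-google-news-300" is the gensim-data identifier for the Google News vectors, and loading it triggers a large one-time download:

import gensim.downloader as api

# Load pre-trained Google News word2vec vectors (returns a KeyedVectors instance)
wv = api.load("word2vec-google-news-300")

print(wv.similarity("movie", "film"))
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))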
Exercise
Ex 1: use pre-trained word2vec models from Google or Wikipedia
Ex 2: Write Python code that uses the gensim library to train a word2vec model on a pre-defined
dataset.