Lab Report 03

The document details the design and implementation of a multi-layer neural network algorithm using the MNIST dataset for handwritten digit recognition and an XOR dataset for classification. It covers the initialization of weights, forward and backward passes, and training of the network, including accuracy evaluations for different learning rates. Limitations of the multi-layer perceptron learning algorithm are also discussed, highlighting issues like local minima, underfitting, and overfitting.

Uploaded by

Sadbin Mohshin

Experiment No.

03

Name of the Experiment: Design and implementation of Multi-layer Neural Networks algorithm
(i.e., Back-propagation learning neural networks algorithm)

Dataset: MNIST dataset

The MNIST database (Modified National Institute of Standards and Technology database) is a
large dataset of handwritten digits. It was produced from NIST's original datasets. Half of the
training set and half of the test set were taken from NIST's training dataset, while the other half of
the training set and the other half of the test set were taken from NIST's testing dataset.

Characteristics of Dataset:
• Large dataset of handwritten digits
• Total 70,000 images
• 60,000 training images and 10,000 testing images
• The size of every image is 28x28 pixels.
• Number of total features is 784.
• Total 10 classes

Implementation:

First, all the dependencies are loaded.


Code:

import numpy as np
import time
import matplotlib.pyplot
%matplotlib inline

Then the dataset is converted from the IDX files and stored as CSV files.

def convert(imgf, labelf, outf, n):
    # Open the image file, the CSV output file, and the label file
    f = open(imgf, "rb")
    o = open(outf, "w")
    l = open(labelf, "rb")

    # Skip the file headers: 16 bytes for images, 8 bytes for labels
    f.read(16)
    l.read(8)
    images = []

    # Each row holds the label followed by the 28*28 pixel values
    for i in range(n):
        image = [ord(l.read(1))]
        for j in range(28*28):
            image.append(ord(f.read(1)))
        images.append(image)

    for image in images:
        o.write(",".join(str(pix) for pix in image)+"\n")
    f.close()
    o.close()
    l.close()

convert("/content/drive/MyDrive/4-2/CSE 4203/train-images.idx3-ubyte",
        "/content/drive/MyDrive/4-2/CSE 4203/train-labels.idx1-ubyte",
        "mnist_train.csv", 60000)
convert("/content/drive/MyDrive/4-2/CSE 4203/t10k-images.idx3-ubyte",
        "/content/drive/MyDrive/4-2/CSE 4203/t10k-labels.idx1-ubyte",
        "mnist_test.csv", 10000)
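
The 16 and 8 bytes skipped by convert are the IDX file headers. The sketch below is an illustration and not part of the original experiment: it parses such a header from synthetic in-memory bytes, and the helper name parse_idx_header and the fake data are assumptions made for the example.

```python
import struct

def parse_idx_header(data: bytes):
    """Return (magic, dims) for an IDX byte string (hypothetical helper)."""
    # The magic number is a big-endian 32-bit integer; its low byte is the
    # number of dimensions that follow, each as a big-endian 32-bit size.
    magic = struct.unpack(">I", data[:4])[0]
    ndim = magic & 0xFF
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    return magic, dims

# Synthetic stand-ins for the real files: 2 blank 28x28 images and 2 labels.
fake_images = struct.pack(">IIII", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
fake_labels = struct.pack(">II", 2049, 2) + bytes([5, 0])

img_magic, img_dims = parse_idx_header(fake_images)
lbl_magic, lbl_dims = parse_idx_header(fake_labels)

# Header length = 4 bytes of magic + 4 bytes per dimension
img_header_len = 4 + 4 * len(img_dims)
lbl_header_len = 4 + 4 * len(lbl_dims)
print(img_dims, img_header_len, lbl_dims, lbl_header_len)
```

The image header carries three sizes (count, rows, columns) and the label header carries one (count), which is why the two files need different skips.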

After that, the training and test datasets are loaded and the pixel values are scaled from the range 0 to 255 into the range 0.01 to 1.0.

train_file = open("/content/mnist_train.csv", 'r')
train_list = train_file.readlines()
train_file.close()

# Select one training record (index 0 is an assumption; the original index is not given)
all_values = train_list[0].split(',')
# np.asarray(..., dtype=float) replaces np.asfarray, which was removed in NumPy 2.0
scaled_input_train = (np.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01

test_file = open("/content/mnist_test.csv", 'r')
test_list = test_file.readlines()
test_file.close()

all_values = test_list[100].split(',')
image_array = np.asarray(all_values[1:], dtype=float).reshape((28,28))

scaled_input_test = (np.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01
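
As a quick check of the scaling formula, the following sketch applies it to three raw pixel values. The function name scale_pixels is an assumption introduced for the example.

```python
import numpy as np

def scale_pixels(pixels):
    """Map raw pixel values in [0, 255] to [0.01, 1.0] (hypothetical helper)."""
    return (np.asarray(pixels, dtype=float) / 255.0 * 0.99) + 0.01

# Darkest, mid-grey, and brightest pixel
scaled = scale_pixels([0, 128, 255])
print(scaled)
```

The lower bound of 0.01 keeps every input strictly positive. In this network a weight update is the product of an error term and the input, so an input of exactly zero would leave its weights unchanged.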


At the start of the MLP algorithm, the weights are initialized. The code below uses no separate threshold (bias) terms.

class NN:
    def __init__(self, sizes, epochs, lr):
        self.sizes = sizes
        self.epochs = epochs
        self.lr = lr

        # number of nodes in each layer
        input_layer=self.sizes[0]
        hidden_1=self.sizes[1]
        hidden_2=self.sizes[2]
        output_layer=self.sizes[3]

        # each weight matrix has shape (nodes in next layer, nodes in previous layer)
        self.params = {
            'W1':np.random.randn(hidden_1, input_layer) * np.sqrt(1. / hidden_1),
            'W2':np.random.randn(hidden_2, hidden_1) * np.sqrt(1. / hidden_2),
            'W3':np.random.randn(output_layer, hidden_2) * np.sqrt(1. / output_layer)
        }
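
To make the resulting shapes concrete, here is a minimal sketch of the same initialization scheme written as a standalone function. The name init_params and the seed argument are assumptions added for reproducibility; they do not appear in the report.

```python
import numpy as np

def init_params(sizes, seed=0):
    """Build one weight matrix per pair of adjacent layers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    params = {}
    for i in range(1, len(sizes)):
        fan_out, fan_in = sizes[i], sizes[i - 1]
        # Same scaling as the report: standard normal times sqrt(1 / receiving-layer size)
        params[f"W{i}"] = rng.standard_normal((fan_out, fan_in)) * np.sqrt(1.0 / fan_out)
    return params

params = init_params([784, 128, 64, 10])
shapes = {name: w.shape for name, w in params.items()}
n_weights = sum(w.size for w in params.values())
print(shapes, n_weights)
```

For the sizes [784, 128, 64, 10] used later in the report, this gives 128*784 + 64*128 + 10*64 = 109184 trainable weights.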

Next, the sigmoid and softmax functions are defined.

    def sigmoid(self, x, derivative=False):
        if derivative:
            return (np.exp(-x))/((np.exp(-x)+1)**2)
        return 1/(1 + np.exp(-x))

    def softmax(self, x, derivative=False):
        # Numerically stable with large exponentials
        exps = np.exp(x - x.max())
        if derivative:
            return exps / np.sum(exps, axis=0) * (1 - exps / np.sum(exps, axis=0))
        return exps / np.sum(exps, axis=0)
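
Two properties of these functions can be checked numerically. The sketch below, which uses standalone copies of the functions rather than the class methods, shows that subtracting the maximum keeps softmax finite for large inputs, and that the sigmoid derivative used above equals the more common s * (1 - s) form. The test inputs are arbitrary values chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtracting the maximum does not change the result but avoids overflow
    exps = np.exp(x - x.max())
    return exps / np.sum(exps, axis=0)

# np.exp(1000.0) on its own would overflow to inf; the shifted version stays finite
z = np.array([1000.0, 1001.0, 1002.0])
probs = softmax(z)

x = np.linspace(-5.0, 5.0, 11)
form_a = np.exp(-x) / (np.exp(-x) + 1) ** 2     # form used in the report
form_b = sigmoid(x) * (1 - sigmoid(x))          # equivalent s * (1 - s) form
print(probs, np.allclose(form_a, form_b))
```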
During the forward pass, each layer computes its output and passes it as input to the next layer.

    def forward_pass(self, x_train):
        params = self.params

        # input layer activations are the input sample
        params['A0'] = x_train

        # input layer to hidden layer 1
        params['Z1'] = np.dot(params["W1"], params['A0'])
        params['A1'] = self.sigmoid(params['Z1'])

        # hidden layer 1 to hidden layer 2
        params['Z2'] = np.dot(params["W2"], params['A1'])
        params['A2'] = self.sigmoid(params['Z2'])

        # hidden layer 2 to output layer
        params['Z3'] = np.dot(params["W3"], params['A2'])
        params['A3'] = self.softmax(params['Z3'])

        return params['A3']
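
The same forward pass can be sketched as a loop over a list of weight matrices. This is an illustrative rewrite under assumed names (forward, weights) with random weights and a random input, not the trained network from the report.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    exps = np.exp(x - x.max())
    return exps / np.sum(exps, axis=0)

def forward(weights, x):
    """Sigmoid on every hidden layer, softmax on the output layer."""
    a = x
    for w in weights[:-1]:
        a = sigmoid(w @ a)
    return softmax(weights[-1] @ a)

rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
weights = [
    rng.standard_normal((sizes[i + 1], sizes[i])) * np.sqrt(1.0 / sizes[i + 1])
    for i in range(len(sizes) - 1)
]
x = rng.uniform(0.01, 1.0, size=784)   # stand-in for one scaled image
out = forward(weights, x)
print(out.shape, out.sum())
```

The output is a vector of 10 non-negative values that sum to 1, one per digit class.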

Using backpropagation, the error is propagated backward from the output layer to the input layer, and the change applied to each weight is computed in proportion to that error.

    def backward_pass(self, y_train, output):
        params = self.params
        change_w = {}

        # Calculate W3 update
        error = 2 * (output - y_train) / output.shape[0] * self.softmax(params['Z3'], derivative=True)
        change_w['W3'] = np.outer(error, params['A2'])

        # Calculate W2 update
        error = np.dot(params['W3'].T, error) * self.sigmoid(params['Z2'], derivative=True)
        change_w['W2'] = np.outer(error, params['A1'])

        # Calculate W1 update
        error = np.dot(params['W2'].T, error) * self.sigmoid(params['Z1'], derivative=True)
        change_w['W1'] = np.outer(error, params['A0'])

        return change_w
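
Each update in the backward pass combines two operations: an outer product that produces a matrix with the same shape as the weights, and a multiplication by the transposed weights that sends the error to the previous layer. The sketch below shows only these shapes; the arrays delta3, a2, and w3 are random placeholders, not values from the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
delta3 = rng.standard_normal(10)      # placeholder error signal at the 10 output nodes
a2 = rng.uniform(size=64)             # placeholder activations of hidden layer 2
w3 = rng.standard_normal((10, 64))    # placeholder weights, hidden layer 2 to output

# Entry [i, j] of the update is delta3[i] * a2[j], so it matches w3 in shape
grad_w3 = np.outer(delta3, a2)

# Error passed back to the 64 nodes of hidden layer 2, before the sigmoid derivative
delta2_pre = w3.T @ delta3
print(grad_w3.shape, delta2_pre.shape)
```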

Then the weights are updated.

    def update_network_parameters(self, changes_to_w):
        # Move each weight matrix against its gradient, scaled by the learning rate
        for key, value in changes_to_w.items():
            self.params[key] -= self.lr * value
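
The effect of the learning rate in this update rule can be seen on a toy problem. The sketch below applies the same rule, w -= lr * gradient, to the one-dimensional function f(w) = w**2. The function descend and the value 1.1 are assumptions chosen for illustration; 0.001 and 0.05 are two of the learning rates evaluated in this report.

```python
def descend(lr, steps=100, w0=1.0):
    """Minimize f(w) = w**2 with the update w -= lr * grad, where grad = 2 * w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

# Distance from the minimum at w = 0 after 100 steps, per learning rate
final = {lr: abs(descend(lr)) for lr in (0.001, 0.05, 1.1)}
for lr, dist in final.items():
    print(lr, dist)
```

On this toy function the small rate moves slowly, the moderate rate gets much closer to the minimum in the same number of steps, and the overly large rate moves away from the minimum at every step.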

Now the network is trained.

    def compute_accuracy(self, test_data, output_nodes):
        predictions = []

        # iterate over the data passed in (the original looped over train_list)
        for x in test_data:
            all_values = x.split(',')
            # scale and shift the inputs
            inputs = (np.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01
            # create the target output values (all 0.01, except the desired label which is 0.99)
            targets = np.zeros(output_nodes) + 0.01
            # all_values[0] is the target label for this record
            targets[int(all_values[0])] = 0.99
            output = self.forward_pass(inputs)
            pred = np.argmax(output)
            predictions.append(pred == np.argmax(targets))

        return np.mean(predictions)

    def train(self, train_list, test_list, output_nodes):
        start_time = time.time()
        for iteration in range(self.epochs):
            for x in train_list:
                all_values = x.split(',')
                # scale and shift the inputs
                inputs = (np.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01
                # create the target output values (all 0.01, except the desired label which is 0.99)
                targets = np.zeros(output_nodes) + 0.01
                # all_values[0] is the target label for this record
                targets[int(all_values[0])] = 0.99
                output = self.forward_pass(inputs)
                changes_to_w = self.backward_pass(targets, output)
                self.update_network_parameters(changes_to_w)

            accuracy = self.compute_accuracy(test_list, output_nodes)
            print('Epoch: {0}, Time Spent: {1:.2f}s, Accuracy: {2:.2f}%'.format(
                iteration+1, time.time() - start_time, accuracy * 100
            ))

nn = NN(sizes=[784, 128, 64, 10], epochs=10, lr=0.001)
nn.train(train_list, test_list, 10)
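
The per-record preparation inside the training loop can be sketched on its own. The example below builds the target vector and the scaled input for one synthetic CSV row; the helper name make_target and the all-zero image are assumptions made for the example.

```python
import numpy as np

def make_target(label, output_nodes=10):
    """Target vector: 0.01 everywhere except 0.99 at the true label (hypothetical helper)."""
    targets = np.zeros(output_nodes) + 0.01
    targets[int(label)] = 0.99
    return targets

# Synthetic CSV row in the same layout as mnist_train.csv: label 7, then 784 pixels
record = "7," + ",".join(["0"] * 784)
all_values = record.split(",")

targets = make_target(all_values[0])
inputs = (np.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01
print(targets, inputs.shape)
```

Accuracy is then the fraction of records for which the index of the largest network output equals the index of the largest target value.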

Table 3.1: Evaluation of correctness and accuracy

         Learning rate = 0.001          Learning rate = 0.01           Learning rate = 0.05
Epoch    Time spent (s)  Accuracy (%)   Time spent (s)  Accuracy (%)   Time spent (s)  Accuracy (%)
1         76.40          23.36           85.05          51.47           75.38          73.53
2        156.99          28.21          164.65          56.15          155.20          75.27
3        233.56          33.65          242.52          60.13          231.23          79.42
4        314.37          39.00          320.58          66.85          311.87          81.18
5        390.51          43.31          397.03          71.10          387.57          82.71
6        468.83          46.22          476.86          73.64          465.46          83.93
7        546.70          48.07          551.20          75.20          542.02          84.78
8        621.76          49.25          628.39          74.55          620.20          85.28

Dataset: XOR data set

Fig 3.1: Dataset in the feature space

The figure above shows that the dataset is not linearly separable. The multi-layer perceptron
learning algorithm is therefore applied to see whether the model can classify the dataset.
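
The lack of linear separability can also be illustrated numerically. The sketch below tries every single-layer threshold classifier on a coarse grid of weights and biases and records the best accuracy on the four XOR points. The grid range and step are arbitrary choices for this example, and a grid search is only an illustration, not a proof.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Candidate values for w1, w2 and the bias b: -2.0, -1.5, ..., 2.0
grid = np.arange(-2.0, 2.01, 0.5)

best_acc = 0.0
for w1, w2, b in itertools.product(grid, repeat=3):
    # Linear threshold unit: predict 1 when w1*x1 + w2*x2 + b > 0
    pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
    best_acc = max(best_acc, float(np.mean(pred == y)))

print("Best accuracy of a single linear unit:", best_acc)
```

No setting classifies all four points correctly; the best a single linear unit achieves is three out of four.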

Characteristics of Dataset:
● XOR dataset
● Total 4 samples
● 4 training samples and 4 testing samples
● Number of features is 2.
● Total 2 classes

Implementation:

At first all the dependencies are loaded.

Code:

import numpy as np
import math
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

Then the dataset is defined as the input data X and the class labels y. No separate split is made:
the same four samples are used for both training and testing.

# Define the input and output data for the XOR problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

After that, the weights and biases are initialized as follows.

class MLP:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize the weights for the hidden and output layers
        self.weights_hidden = np.random.normal(size=(input_size, hidden_size))
        self.weights_output = np.random.normal(size=(hidden_size, output_size))

        # Initialize the biases for the hidden and output layers
        self.bias_hidden = np.zeros((1, hidden_size))
        self.bias_output = np.zeros((1, output_size))
Next, the sigmoid function and its derivative are defined.

# Define the sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Define the derivative of the sigmoid activation function.
# x is expected to be the sigmoid output (the activation), not the pre-activation.
def sigmoid_derivative(x):
    return x * (1 - x)
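
Because sigmoid_derivative is written in terms of the sigmoid output, it must be given the activation rather than the raw input. The following sketch compares both uses against a central-difference estimate of the derivative; the test points in z and the step eps are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # a is the sigmoid output
    return a * (1 - a)

z = np.array([-2.0, 0.0, 1.5])
eps = 1e-6

# Central-difference estimate of d(sigmoid)/dz
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

correct = sigmoid_derivative(sigmoid(z))   # pass the activation
wrong = sigmoid_derivative(z)              # passing z itself computes z * (1 - z)
print(numeric, correct, wrong)
```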

During the forward pass, each layer computes its output and passes it as input to the next layer.

    def feedforward(self, X):
        # Perform the feedforward pass through the MLP
        self.hidden = sigmoid(np.dot(X, self.weights_hidden) + self.bias_hidden)
        self.output = sigmoid(np.dot(self.hidden, self.weights_output) + self.bias_output)

Using backpropagation, the error is propagated backward from the output layer to the input layer, and the change applied to each weight is computed in proportion to that error.

    def backpropagation(self, X, y, learning_rate):
        # Calculate the error between the predicted output and the true output
        output_error = y - self.output

        # Calculate the derivative of the sigmoid at the output layer
        output_derivative = sigmoid_derivative(self.output)

        # Calculate the update terms for the weights and biases of the output layer
        output_weights_derivative = np.dot(self.hidden.T, output_error * output_derivative)
        output_bias_derivative = np.sum(output_error * output_derivative, axis=0, keepdims=True)

        # Calculate the error for the hidden layer
        hidden_error = np.dot(output_error * output_derivative, self.weights_output.T)

        # Calculate the derivative of the sigmoid at the hidden layer
        hidden_derivative = sigmoid_derivative(self.hidden)

        # Calculate the update terms for the weights and biases of the hidden layer
        hidden_weights_derivative = np.dot(X.T, hidden_error * hidden_derivative)
        hidden_bias_derivative = np.sum(hidden_error * hidden_derivative, axis=0, keepdims=True)

Then the weights are updated. These lines are the last part of the backpropagation method.

        # Update the weights and biases using the derivatives and the learning rate.
        # The sign is positive because output_error is defined as y - self.output.
        self.weights_hidden += learning_rate * hidden_weights_derivative
        self.bias_hidden += learning_rate * hidden_bias_derivative
        self.weights_output += learning_rate * output_weights_derivative
        self.bias_output += learning_rate * output_bias_derivative
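
One way to check update terms of this form is a numerical gradient check. The sketch below assumes the loss is half the sum of squared errors, which is consistent with the error term y - output but is not stated in the report. It recomputes the hidden-layer update in standalone form and compares it with a central-difference gradient; all function and variable names are introduced only for this example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(X, w_h, b_h, w_o, b_o):
    hidden = sigmoid(X @ w_h + b_h)
    return hidden, sigmoid(hidden @ w_o + b_o)

def loss(X, y, w_h, b_h, w_o, b_o):
    # Assumed loss: half the sum of squared errors
    _, out = forward(X, w_h, b_h, w_o, b_o)
    return 0.5 * np.sum((y - out) ** 2)

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
w_h, b_h = rng.normal(size=(2, 2)), np.zeros((1, 2))
w_o, b_o = rng.normal(size=(2, 1)), np.zeros((1, 1))

# Update directions computed the same way as in the backpropagation step
hidden, out = forward(X, w_h, b_h, w_o, b_o)
delta_o = (y - out) * out * (1 - out)
step_w_o = hidden.T @ delta_o
delta_h = (delta_o @ w_o.T) * hidden * (1 - hidden)
step_w_h = X.T @ delta_h

# Central-difference gradient of the loss for each hidden-layer weight
eps = 1e-6
numeric_w_h = np.zeros_like(w_h)
for idx in np.ndindex(*w_h.shape):
    plus, minus = w_h.copy(), w_h.copy()
    plus[idx] += eps
    minus[idx] -= eps
    numeric_w_h[idx] = (loss(X, y, plus, b_h, w_o, b_o)
                        - loss(X, y, minus, b_h, w_o, b_o)) / (2 * eps)

# The update direction should be the negative gradient of the loss
print(np.allclose(step_w_h, -numeric_w_h, atol=1e-6))
```

Under that assumed loss, the update direction equals the negative gradient, so adding learning_rate times this term performs gradient descent.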
Now the network is trained.

    def train(self, X, y, epochs, learning_rate):
        # Train the MLP for the specified number of epochs, one sample at a time
        for epoch in range(epochs):
            for i in range(len(X)):
                self.feedforward(X[i:i+1])
                self.backpropagation(X[i:i+1], y[i:i+1], learning_rate)

    def predict(self, X):
        # Make a prediction using the trained MLP
        self.feedforward(X)
        return self.output.round()

mlp = MLP(input_size=2, hidden_size=2, output_size=1)

epochs = 1000
learning_rate = 0.01
mlp.train(X, y, epochs, learning_rate)

y_pred = mlp.predict(X)
print("Predictions:", y_pred)
print("Accuracy:", np.mean(y_pred == y))
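
For reference, the fragments of this section can be assembled into one runnable script. The version below is a condensed sketch, not the exact code that produced the reported results: the seed parameter and the local random generator are additions for reproducibility, and the learning rate of 0.5 is taken from the evaluation table rather than from the listing. The accuracy obtained depends on the random initial weights.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class MLP:
    def __init__(self, input_size, hidden_size, output_size, seed=0):
        rng = np.random.default_rng(seed)   # seed is an added, hypothetical parameter
        self.weights_hidden = rng.normal(size=(input_size, hidden_size))
        self.weights_output = rng.normal(size=(hidden_size, output_size))
        self.bias_hidden = np.zeros((1, hidden_size))
        self.bias_output = np.zeros((1, output_size))

    def feedforward(self, X):
        self.hidden = sigmoid(X @ self.weights_hidden + self.bias_hidden)
        self.output = sigmoid(self.hidden @ self.weights_output + self.bias_output)
        return self.output

    def backpropagation(self, X, y, learning_rate):
        # Compute both error terms before changing any weights
        delta_o = (y - self.output) * self.output * (1 - self.output)
        delta_h = (delta_o @ self.weights_output.T) * self.hidden * (1 - self.hidden)
        self.weights_output += learning_rate * (self.hidden.T @ delta_o)
        self.bias_output += learning_rate * delta_o.sum(axis=0, keepdims=True)
        self.weights_hidden += learning_rate * (X.T @ delta_h)
        self.bias_hidden += learning_rate * delta_h.sum(axis=0, keepdims=True)

    def train(self, X, y, epochs, learning_rate):
        for _ in range(epochs):
            for i in range(len(X)):
                self.feedforward(X[i:i + 1])
                self.backpropagation(X[i:i + 1], y[i:i + 1], learning_rate)

    def predict(self, X):
        return self.feedforward(X).round()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

mlp = MLP(input_size=2, hidden_size=2, output_size=1)
mlp.train(X, y, epochs=1000, learning_rate=0.5)

y_pred = mlp.predict(X)
accuracy = float(np.mean(y_pred == y))
print("Predictions:", y_pred.ravel())
print("Accuracy:", accuracy)
```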

Table 3.2: Evaluation of correctness and accuracy

Epochs   Learning rate   Accuracy (%)
1000     0.1              50
1000     0.3              75
1000     0.5             100

Limitations of the multi-layer perceptron learning algorithm:

⮚ Can get stuck in local minima

⮚ Underfitting

⮚ Overfitting

⮚ Divergence

Conclusion:
In conclusion, MLP is a powerful and versatile neural network model that has proven to be
effective in various machine learning applications. Its ability to learn and generalize from data, as
well as its flexibility in terms of network architecture and activation functions, make it a popular
choice in the field. However, its limitations in terms of overfitting and computational resources
should also be taken into consideration when using MLP in practical applications. Overall, MLP
is a valuable tool in the field of machine learning and can provide valuable insights and predictions
when used appropriately.
