MLP v4
September 8, 2024
INFERENCE:
The same network architecture built with the inbuilt PyTorch and TensorFlow packages runs tremendously
faster than the hand-coded one. The accuracy values and confusion matrices also do not match
exactly, which might be due to approximations or implementation differences inside the inbuilt packages.
For the batch size and learning rate specified in the question, the network takes a long time to execute;
hence, for practical comparison, the code was run for 4 epochs with a batch size of
256 and a learning rate of 0.3, with the following observations:
The accuracy with sigmoid, relu and tanh (all else being the same) is 88.11%, 97.3% and 96.61%
respectively. Hence the network works best with relu, followed by tanh and then sigmoid.
Further hyperparameter tuning was done by varying the batch size, number of epochs and learning
rate. The inference is similar under other settings as well. For example, with a
batch size of 128, a learning rate of 0.3 and a maximum of 8 epochs, the accuracies obtained were 98.10%,
97.64% and 93.06% for relu, tanh and sigmoid respectively.
For the same hyperparameters, the PyTorch implementation with relu activation gave an accuracy of 93.14%.
Additionally, using L2 regularisation with weights of 0.1, 0.01 and 0.001 in all layers did not
change the accuracy much. For example, no regularisation in the hidden layers and 0.0001 regularisation in the
final layer gave an accuracy of 93.16% with sigmoid activation. However, regularisation of
0, 0, 0.0001 and 0.0001 in the respective layers with relu activation improved the accuracy to 99.19%.
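In the Keras implementation, this per-layer weighting is attached through each Dense layer's kernel_regularizer, as in the regularised model later in this notebook. A minimal sketch of the pattern follows; the size of the first hidden layer is not visible in the surviving cells, so the 500 used here is only a placeholder:

import tensorflow as tf
from tensorflow.keras import regularizers

activation1 = 'relu'
lambda1, lambda2, lambda3, lambda4 = 0, 0, 0.0001, 0.0001  # per-layer L2 weights
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # 500 units is a placeholder for the first hidden layer's (unshown) size
    tf.keras.layers.Dense(500, activation=activation1,
                          kernel_regularizer=regularizers.l2(lambda1)),
    tf.keras.layers.Dense(250, activation=activation1,
                          kernel_regularizer=regularizers.l2(lambda2)),
    tf.keras.layers.Dense(100, activation=activation1,
                          kernel_regularizer=regularizers.l2(lambda3)),
    tf.keras.layers.Dense(10, activation='softmax',
                          kernel_regularizer=regularizers.l2(lambda4)),
])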
Apart from accuracy, from an implementation point of view tanh consumed more time per
epoch. A larger batch size also reduced computation time, since it reduces the number of
iterations per epoch. Regularisation did not give much improvement in the results. The training and testing
losses followed nearly aligned trajectories, except that the training-loss curve was somewhat noisier
than the testing one.
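As a quick sanity check on the batch-size point: with the 60,000 MNIST training images, the number of weight updates per epoch falls in inverse proportion to the batch size, which is what drives the timing difference:

# updates per epoch = ceil(60000 / batch_size) for the batch sizes tried here
for bs in (64, 128, 256, 1024):
    print(bs, -(-60000 // bs), 'iterations per epoch')
# 64 -> 938, 128 -> 469, 256 -> 235, 1024 -> 59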
[41]: print('program starts')
program starts
# from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
update_interval=200
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)
train_loss_interval.append(logs['loss'])
validation_data=(x_test, y_test),
verbose=0, # Suppress default output
callbacks=[tf.keras.callbacks.LambdaCallback(on_batch_end=lambda batch, logs: batch_loss_callback(batch, logs))])
# Make predictions
y_predicted = model.predict(x_test)
y_predicted_labels = [np.argmax(i) for i in y_predicted]
update_interval=200
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)
tf.keras.layers.Dense(250, activation=activation1, kernel_regularizer=regularizers.l2(lambda2)),
tf.keras.layers.Dense(100, activation=activation1, kernel_regularizer=regularizers.l2(lambda3)),
tf.keras.layers.Dense(10, activation='softmax', kernel_regularizer=regularizers.l2(lambda4))
])
train_loss_interval.append(logs['loss'])
callbacks=[tf.keras.callbacks.LambdaCallback(on_batch_end=lambda batch, logs: batch_loss_callback(batch, logs))])
# Make predictions
y_predicted = model.predict(x_test)
y_predicted_labels = [np.argmax(i) for i in y_predicted]
plt.plot(train_loss_interval, label=f'Training Loss (Every {update_interval} Batches)')
plt.plot(val_loss_interval, label=f'Validation Loss (Every {update_interval} Batches)')
def activate(z,activation):
    # header and relu branch reconstructed from the call sites and the
    # matching derivative function below
    if activation=="relu":
        return np.maximum(z,0)
    elif activation=='tanh':
        x=np.exp(z)
        y=np.exp(-z)
        num=x-y
        den=x+y
        return num/den
    else:
        return 1/(1+np.exp(-z))
def deri_activate(a,activation):
    if activation=="relu":
        array=np.where(a>0,1,0)
        return array
    elif activation=='tanh':
        return 1-a*a
    else:
        return a*(1-a)
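Note that deri_activate takes the activation output a rather than the pre-activation z. This works because each derivative can be written in terms of the activation value itself:

tanh'(z) = 1 - tanh(z)^2 = 1 - a^2
sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) = a * (1 - a)
relu'(z) = 1 if z > 0 else 0, and z > 0 exactly where a > 0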
def cross_entropy(epoch,y_hat,y_label):
    loss1=-y_label*np.log(y_hat)
    loss=np.sum(loss1)/(np.size(loss1))
    return loss
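Since y_label here is one-hot with shape (10, m), np.size(loss1) is 10m, so the value returned is the batch-mean cross-entropy scaled by a constant 1/10:

loss = -(1 / (10m)) * sum_{i,k} y[i,k] * log(y_hat[i,k])

The constant factor only rescales the reported loss curves; the gradients used in back_prop are computed directly from a4 - y and are unaffected.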
[52]: def forward_prop(a0,w1,w2,w3,w4,b1,b2,b3,b4,activation):
          m=a0.shape[1]
          z1=np.dot(w1,a0)+b1
          a1=activate(z1,activation)
          z2=np.dot(w2,a1)+b2
          a2=activate(z2,activation)
          z3=np.dot(w3,a2)+b3
          a3=activate(z3,activation)
          z4=np.dot(w4,a3)+b4
          # softmax output layer
          a4=np.exp(z4)
          s4=np.sum(a4,0)
          a4=(1/s4)*a4
          return a1,a2,a3,a4
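One caveat in the softmax at the end of forward_prop: np.exp(z4) can overflow for large logits. A common numerically stable variant (a sketch, not what was run for the results below) subtracts the per-column maximum first, which leaves the output unchanged since softmax is shift-invariant:

# numerically stable softmax sketch; same output as the version above
z4s = z4 - np.max(z4, axis=0, keepdims=True)
a4 = np.exp(z4s)
a4 = a4 / np.sum(a4, axis=0, keepdims=True)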
def back_prop(w2,w3,w4,a0,a1,a2,a3,a4,y_train,activation):
    # header reconstructed from the call sites in the training loops below
    m=a0.shape[1]
    d4=a4-y_train
    dw4=np.dot(d4,np.transpose(a3))
    db4=np.dot(d4,np.ones([m,1]))
    dw4=dw4/m
    db4=db4/m
    df=deri_activate(a3,activation)
    d3=np.dot(np.transpose(w4),d4)*df
    dw3=np.dot(d3,np.transpose(a2))
    db3=np.dot(d3,np.ones([m,1]))
    dw3=dw3/m
    db3=db3/m
    df=deri_activate(a2,activation)
    d2=np.dot(np.transpose(w3),d3)*df
    dw2=np.dot(d2,np.transpose(a1))
    db2=np.dot(d2,np.ones([m,1]))
    dw2=dw2/m
    db2=db2/m
    df=deri_activate(a1,activation)
    d1=np.dot(np.transpose(w2),d2)*df
    dw1=np.dot(d1,np.transpose(a0))
    db1=np.dot(d1,np.ones([m,1]))
    dw1=dw1/m
    db1=db1/m
    return dw1,dw2,dw3,dw4,db1,db2,db3,db4
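The code above implements the standard backpropagation recursion for a softmax output with cross-entropy loss, averaged over the m samples in the batch:

d4 = a4 - y
d_l = (W_{l+1}^T d_{l+1}) ⊙ f'(z_l)   for l = 3, 2, 1
dW_l = (1/m) d_l a_{l-1}^T,   db_l = (1/m) d_l 1_m

where 1_m is the all-ones vector (the np.dot with np.ones([m,1]) sums each delta over the batch).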
def evaluation(y_pred,y_label):
    # header and confusion-matrix construction reconstructed from the call
    # sites; assumes sklearn.metrics imported as metrics, which matches the
    # ConfusionMatrixDisplay-style plotting that survived the export
    confusion_matrix=metrics.confusion_matrix(y_label,y_pred)
    cm_display=metrics.ConfusionMatrixDisplay(confusion_matrix=confusion_matrix)
    cm_display.plot(cmap=plt.cm.Blues)
    plt.title('Confusion Matrix')
    plt.show()
    TP=np.array([confusion_matrix[i][j] for i in range(10) for j in range(10) if i==j])
    accuracy=np.sum(TP)/np.sum(confusion_matrix)
    return confusion_matrix,accuracy
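Since TP collects the diagonal entries of the confusion matrix, the reported accuracy is accuracy = sum_i CM[i,i] / sum_{i,j} CM[i,j], i.e. the fraction of test samples whose predicted class matches the true one.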
#data load and split
data=tf.keras.datasets.mnist
(x_train1,y_train1),(x_test1,y_test1)=data.load_data()
#normalise
x_train1 = x_train1 / 255
x_test1 = x_test1 / 255
#Hyperparameters
batch_size = 1024
alpha=0.01
max_epoch=4
activation='relu'
num_classes=10
x_train_batches, y_train_batches = create_batches(x_train1, y_train1, batch_size)
x_test=x_test1.reshape(x_test1.shape[0], 28 * 28)
x_test=np.transpose(x_test)
[83]: train_loss_array=[]
test_loss_array=[]
iteration=0
for epoch in range(1,max_epoch+1):
# alpha=alpha*np.exp(-0.01*epoch)
for b in range(len(x_train_batches)):
iteration=iteration+1
print('\repoch {} batch {} iteration {}'.format(epoch,b,iteration),end='',flush=True)
y_test=np.zeros((y_test1.shape[0], num_classes))
y_test[np.arange(y_test1.shape[0]),y_test1] = 1
y_test = np.transpose(y_test)
# Training (per-batch input prep reconstructed to mirror the x_test / y_test handling above)
x_train=np.transpose(x_train_batches[b].reshape(-1,28*28))
y_train=np.zeros((y_train_batches[b].shape[0],num_classes))
y_train[np.arange(y_train_batches[b].shape[0]),y_train_batches[b]]=1
y_train=np.transpose(y_train)
a1,a2,a3,a4=forward_prop(x_train,w1,w2,w3,w4,b1,b2,b3,b4,activation)
dw1,dw2,dw3,dw4,db1,db2,db3,db4=back_prop(w2,w3,w4,x_train,a1,a2,a3,a4,y_train,activation)
w1,w2,w3,w4,b1,b2,b3,b4=grad_descent(alpha,w1,w2,w3,w4,b1,b2,b3,b4,dw1,dw2,dw3,dw4,db1,db2,db3,db4)
loss=cross_entropy(epoch,a4,y_train)
train_loss_array.append(loss)
#Testing(Validation)
a1,a2,a3,a4=forward_prop(x_test,w1,w2,w3,w4,b1,b2,b3,b4,activation)
y_pred=np.array([np.argmax(x) for x in np.transpose(a4)])
loss=cross_entropy(epoch,a4,y_test)
test_loss_array.append(loss)
[84]: #Evaluation
c_mat,accuracy=evaluation(y_pred,y_test1)
print("WITH MY FUNCTIONS")
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(c_mat)
plt.figure()
y1=train_loss_array[199::200]
y2=test_loss_array[199::200]
x=np.arange(199,len(train_loss_array),200)
#x=np.arange(len(y1))
plt.plot(x,y1,label='Train loss',color='blue')
plt.plot(x,y2,label='Test loss',color='red')
plt.legend()
plt.xlabel('Batch')
plt.ylabel('loss')
WITH MY FUNCTIONS
Final Test Accuracy: 81.74% and conf mat is
[[ 941 0 4 4 1 3 17 2 8 0]
[ 0 1111 3 7 0 1 4 0 9 0]
[ 26 27 818 49 22 1 33 19 36 1]
[ 5 2 24 874 2 9 8 28 50 8]
[ 2 12 7 1 777 1 14 3 10 155]
[ 24 40 9 189 56 416 36 29 64 29]
[ 20 10 39 0 14 11 859 0 5 0]
[ 4 44 24 3 9 1 1 885 10 47]
[ 14 26 21 101 18 15 17 20 722 20]
[ 15 19 14 11 99 6 1 55 18 771]]
0.2 Tanh Activation
[14]: batch_size = 1024
alpha=0.3
max_epoch=3
activation='tanh'
num_classes=10
x_train_batches, y_train_batches = create_batches(x_train1, y_train1, batch_size)
x_test=x_test1.reshape(x_test1.shape[0], 28 * 28)
x_test=np.transpose(x_test)
train_loss_array=[]
test_loss_array=[]
w1=init*np.random.uniform(-m1,m1,[n1,n0])
w2=init*np.random.uniform(-m2,m2,[n2,n1])
w3=init*np.random.uniform(-m3,m3,[n3,n2])
w4=init*np.random.uniform(-m4,m4,[n4,n3])
b1=init*np.zeros([n1,1])
b2=init*np.zeros([n2,1])
b3=init*np.zeros([n3,1])
b4=init*np.zeros([n4,1])
y_test=np.zeros((y_test1.shape[0], num_classes))
y_test[np.arange(y_test1.shape[0]),y_test1] = 1
y_test = np.transpose(y_test)
# Training
a1,a2,a3,a4=forward_prop(x_train,w1,w2,w3,w4,b1,b2,b3,b4,activation)
dw1,dw2,dw3,dw4,db1,db2,db3,db4=back_prop(w2,w3,w4,x_train,a1,a2,a3,a4,y_train,activation)
w1,w2,w3,w4,b1,b2,b3,b4=grad_descent(alpha,w1,w2,w3,w4,b1,b2,b3,b4,dw1,dw2,dw3,dw4,db1,db2,db3,db4)
loss=cross_entropy(epoch,a4,y_train)
train_loss_array.append(loss)
#Testing(Validation)
a1,a2,a3,a4=forward_prop(x_test,w1,w2,w3,w4,b1,b2,b3,b4,activation)
y_pred=np.array([np.argmax(x) for x in np.transpose(a4)])
loss=cross_entropy(epoch,a4,y_test)
test_loss_array.append(loss)
c_mat,accuracy=evaluation(y_pred,y_test1)
print("WITH MY FUNCTIONS")
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(c_mat)
plt.figure()
y1=train_loss_array[199::200]
y2=test_loss_array[199::200]
x=np.arange(199,len(train_loss_array),200)
plt.plot(x,y1,label='Train loss',color='blue')
plt.plot(x,y2,label='Test loss',color='red')
plt.legend()
plt.xlabel('Batch')
plt.ylabel('loss')
WITH MY FUNCTIONS
Final Test Accuracy: 97.32% and conf mat is
[[ 972 0 0 0 1 3 0 1 3 0]
[ 0 1126 1 2 0 1 1 2 2 0]
[ 7 1 1004 3 3 0 2 5 7 0]
[ 1 0 5 976 0 13 0 10 3 2]
[ 0 0 4 0 960 0 2 3 1 12]
[ 3 1 0 5 2 873 1 2 3 2]
[ 6 3 1 0 7 26 914 0 1 0]
[ 2 4 8 4 0 0 0 1003 0 7]
[ 6 2 2 6 4 8 0 6 939 1]
[ 3 3 1 8 12 6 0 9 2 965]]
[14]: Text(0, 0.5, 'loss')
num_classes=10
x_train_batches, y_train_batches = create_batches(x_train1, y_train1, batch_size)
x_test=x_test1.reshape(x_test1.shape[0], 28 * 28)
x_test=np.transpose(x_test)
train_loss_array=[]
test_loss_array=[]
w1=init*np.random.uniform(-m1,m1,[n1,n0])
w2=init*np.random.uniform(-m2,m2,[n2,n1])
w3=init*np.random.uniform(-m3,m3,[n3,n2])
w4=init*np.random.uniform(-m4,m4,[n4,n3])
b1=init*np.zeros([n1,1])
b2=init*np.zeros([n2,1])
b3=init*np.zeros([n3,1])
b4=init*np.zeros([n4,1])
y_test=np.zeros((y_test1.shape[0], num_classes))
y_test[np.arange(y_test1.shape[0]),y_test1] = 1
y_test = np.transpose(y_test)
# Training
a1,a2,a3,a4=forward_prop(x_train,w1,w2,w3,w4,b1,b2,b3,b4,activation)
dw1,dw2,dw3,dw4,db1,db2,db3,db4=back_prop(w2,w3,w4,x_train,a1,a2,a3,a4,y_train,activation)
w1,w2,w3,w4,b1,b2,b3,b4=grad_descent(alpha,w1,w2,w3,w4,b1,b2,b3,b4,dw1,dw2,dw3,dw4,db1,db2,db3,db4)
loss=cross_entropy(epoch,a4,y_train)
train_loss_array.append(loss)
#Testing(Validation)
a1,a2,a3,a4=forward_prop(x_test,w1,w2,w3,w4,b1,b2,b3,b4,activation)
y_pred=np.array([np.argmax(x) for x in np.transpose(a4)])
loss=cross_entropy(epoch,a4,y_test)
test_loss_array.append(loss)
c_mat,accuracy=evaluation(y_pred,y_test1)
print("WITH MY FUNCTIONS")
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(c_mat)
plt.figure()
y1=train_loss_array[199::200]
y2=test_loss_array[199::200]
x=np.arange(199,len(train_loss_array),200)
plt.plot(x,y1,label='Train loss',color='blue')
plt.plot(x,y2,label='Test loss',color='red')
plt.legend()
plt.xlabel('Batch')
plt.ylabel('loss')
WITH MY FUNCTIONS
Final Test Accuracy: 97.90% and conf mat is
[[ 971 1 1 0 0 2 1 1 1 2]
[ 0 1134 0 0 0 1 0 0 0 0]
[ 1 3 1015 3 1 0 2 5 2 0]
[ 0 1 4 990 0 4 0 3 5 3]
[ 0 0 5 0 963 1 2 0 0 11]
[ 3 1 0 9 1 869 1 1 3 4]
[ 5 4 2 0 8 21 918 0 0 0]
[ 1 3 8 1 2 0 0 1002 1 10]
[ 2 3 4 4 4 6 1 2 944 4]
[ 1 2 0 3 8 3 1 6 1 984]]
[15]: Text(0, 0.5, 'loss')
activation='sigmoid'
max_epoch=15
alpha=0.01
batch_size=64
print("WITH INBUILT FUNCTION AND PACKAGES")
cm,accuracy=ANN_func(x_train1,y_train1,x_test1,y_test1,activation,alpha,batch_size,max_epoch)
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(cm)
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step
[32]: activation='sigmoid'
max_epoch=3
alpha=0.01
batch_size=1024
print("WITH INBUILT FUNCTION AND PACKAGES-reularisation")
lambda1,lambda2,lambda3,lambda4=0,0,0.0001,0.0001
cm,accuracy=ANN_func_regularised(x_train1,y_train1,x_test1,y_test1,activation,alpha,batch_size,max_epoch)
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(cm)
[ 25 3 26 130 10 570 24 24 61 19]
[ 28 2 39 1 17 17 844 0 10 0]
[ 12 42 11 2 6 1 0 891 8 55]
[ 6 23 17 45 26 59 13 10 747 28]
[ 13 9 1 4 166 21 2 57 10 726]], shape=(10, 10),
dtype=int32)
[33]: activation='sigmoid'
max_epoch=3
batch_size=1024
alpha=0.01
print("WITH INBUILT FUNCTION AND PACKAGES")
cm,accuracy=ANN_func(x_train1,y_train1,x_test1,y_test1,activation,alpha,batch_size,max_epoch)
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(cm)
Final Test Accuracy: 80.70% and conf mat is
tf.Tensor(
[[ 909 0 1 1 3 52 10 1 3 0]
[ 0 1106 10 5 0 1 2 0 10 1]
[ 15 41 826 12 20 6 30 17 63 2]
[ 3 15 21 837 2 52 1 22 52 5]
[ 1 1 3 1 702 0 33 8 9 224]
[ 45 8 16 127 11 562 33 13 56 21]
[ 33 2 32 0 20 15 847 1 8 0]
[ 1 53 11 1 5 3 0 893 7 54]
[ 21 29 45 83 13 45 20 19 668 31]
[ 9 5 1 8 138 16 5 99 8 720]], shape=(10, 10),
dtype=int32)
[34]: activation='sigmoid'
max_epoch=3
batch_size=1024
alpha=0.01
print("WITH INBUILT FUNCTION AND PACKAGES-reularisation")
lambda1,lambda2,lambda3,lambda4=0,0,0.0001,0.0001
cm,accuracy=ANN_func_regularised(x_train1,y_train1,x_test1,y_test1,activation,alpha,batch_size,max_epoch)
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(cm)
Final Test Accuracy: 81.12% and conf mat is
tf.Tensor(
[[ 935 0 2 1 2 30 9 1 0 0]
[ 0 1101 7 4 0 1 1 2 19 0]
[ 17 38 792 26 25 12 72 13 37 0]
[ 12 3 30 820 2 61 1 23 52 6]
[ 1 7 2 0 754 0 32 2 10 174]
[ 28 3 29 167 12 552 40 23 24 14]
[ 26 2 30 0 15 17 865 0 3 0]
[ 7 53 7 1 11 6 0 882 10 51]
[ 7 23 34 99 17 61 23 13 666 31]
[ 10 6 0 8 145 22 4 63 6 745]], shape=(10, 10),
dtype=int32)
[35]: activation='tanh'
max_epoch=3
batch_size=1024
alpha=0.01
print("WITH INBUILT FUNCTION AND PACKAGES")
cm,accuracy=ANN_func(x_train1,y_train1,x_test1,y_test1,activation,alpha,batch_size,max_epoch)
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(cm)
[69]: activation='tanh'
max_epoch=3
batch_size=128
alpha=0.01
print("WITH INBUILT FUNCTION AND PACKAGES-reularisation")
lambda1,lambda2,lambda3,lambda4=0,0,0.0001,0.0001
cm,accuracy=ANN_func_regularised(x_train1,y_train1,x_test1,y_test1,activation,alpha,batch_size,max_epoch)
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(cm)
[ 10 3 919 13 14 1 16 17 33 6]
[ 3 0 21 921 0 23 3 15 15 9]
[ 1 2 4 2 908 1 16 2 6 40]
[ 11 5 7 49 10 749 14 10 30 7]
[ 11 3 6 1 11 16 906 1 3 0]
[ 3 11 28 6 11 1 0 937 2 29]
[ 11 9 10 28 9 27 16 11 840 13]
[ 12 7 4 10 44 10 0 31 7 884]], shape=(10, 10),
dtype=int32)
[37]: activation='relu'
max_epoch=3
batch_size=1024
alpha=0.01
print("WITH INBUILT FUNCTION AND PACKAGES")
cm,accuracy=ANN_func(x_train1,y_train1,x_test1,y_test1,activation,alpha,batch_size,max_epoch)
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(cm)
Final Test Accuracy: 97.25% and conf mat is
tf.Tensor(
[[ 965 0 0 1 1 5 4 2 2 0]
[ 0 1124 3 1 0 1 4 1 1 0]
[ 7 2 1000 1 2 0 4 11 5 0]
[ 0 0 9 978 0 7 0 7 6 3]
[ 1 0 6 0 961 0 2 2 2 8]
[ 4 1 0 4 2 869 6 0 4 2]
[ 5 3 2 0 9 5 933 0 1 0]
[ 0 8 9 1 0 0 0 1000 2 8]
[ 4 0 2 7 5 7 6 7 934 2]
[ 3 8 1 7 14 3 1 10 1 961]], shape=(10, 10),
dtype=int32)
[68]: activation='relu'
max_epoch=3
batch_size=128
alpha=0.01
print("WITH INBUILT FUNCTION AND PACKAGES-reularisation")
lambda1,lambda2,lambda3,lambda4=0,0,0.0001,0.0001
cm,accuracy=ANN_func_regularised(x_train1,y_train1,x_test1,y_test1,activation,alpha,batch_size,max_epoch)
print(f"Final Test Accuracy: {accuracy * 100:.2f}% and conf mat is ")
print(cm)
Final Test Accuracy: 91.67% and conf mat is
tf.Tensor(
[[ 951 0 4 2 0 10 8 2 3 0]
[ 0 1116 1 4 0 1 4 2 7 0]
[ 10 7 914 18 19 0 15 16 31 2]
[ 2 1 16 933 1 21 1 18 13 4]
[ 1 3 4 2 928 0 11 2 3 28]
[ 9 2 7 44 11 764 17 8 21 9]
[ 15 3 4 1 15 17 899 2 2 0]
[ 3 20 21 8 8 1 0 941 1 25]
[ 5 7 9 34 12 24 12 12 847 12]
[ 8 8 5 14 56 11 1 27 5 874]], shape=(10, 10),
dtype=int32)