DL - Assignment 1
Implementation of Neural Network
Submission:
Submit all of your code and results in a single zip file named FirstName_RollNumber_01.zip
• Submit a single zip file containing:
(a) code (b) report (c) saved models (d) Readme.txt
• There should be a Report.pdf detailing your experience and highlighting any interesting results.
Kindly don't explain your code in the report; just explain the results. Your report should include
your comments on the results of all the steps, with images, for example what happened when
you changed the learning rate, etc.
• Readme.txt should explain how to run your code; preferably, it should accept command line
arguments, e.g. the dataset path used for training the model.
• The assignment is only acceptable in .py files. No Jupyter notebooks.
• In the root directory, there should be 2 python files, a report and a folder containing saved
models.
• Root directory should be named as FirstName_RollNumber_01
• Your code files should be named as ‘rollNumber_01_task1.py’.
• Follow all the naming conventions.
• For each convention, there is a 3% penalty if you don’t follow it.
• Email the instructor or TA if there are any questions. You cannot look at or use others’
code; however, you can discuss with each other. Plagiarism will lead to a straight zero, with
additional consequences as well.
• 2% (of obtained marks) deduction per day for late submission.
• The submissions will only be accepted till 27 April midnight.
• DON’T RESUBMIT THE DATASETS PROVIDED IN YOUR SUBMISSION.
Note: For this assignment (and for others in general) you are not allowed to search online for any kind of
implementation. Do not share code or look at others’ code. You should not be in possession of any
implementation related to the assignment other than your own. In case of any confusion, please reach
out to the TAs (email them or visit them during their office hours).
Objectives: In this assignment you will write the code for training neural networks. The goals of this
assignment are laid out in the tasks below.
Report: You have to write a report explaining your implementation logic: how and why you
distributed the data into three parts, i.e. train, validation, and test sets; how you arrived at the
optimized weights (how did you find the learning rate?); which activation function performs best on
the given dataset; and, finally, your own comments.
Task 2: Implement a Neural Network with a single hidden layer, multiple activation functions, and a
cross entropy loss function (10 Points)
Steps for the implementation are below:
1. You are given a class named Neural_Network which takes the sizes of the input, output and hidden
layers as parameters.
a. Input size: size of the input feature vector to our Neural Network. In our case it is 2; pass
it as a parameter.
b. Hidden layer: number of neurons in the hidden layer of the architecture.
c. Output size: size of the output layer, i.e., the number of neurons in the output layer.
d. Randomly initialize W1 and W2, i.e. the weight matrices connecting the input layer to the
hidden layer and the hidden layer to the output layer, respectively.
e. Pass the activation function as an input to the class, e.g. ‘Sigmoid’, ‘Tanh’ or ‘Relu’.
2. Below is the list of functions which you need to implement within the class Neural_Network;
a description of each function is provided in the source code. (A minimal sketch of the class
follows this list.)
a. Feedforward(self, X)
i. X is input feature(s)
ii. Return predicted vector(s)
b. Backpropagation(self, X, Y, y_pred, lr)
i. X is input feature(s)
ii. Y is actual label(s)
iii. y_pred is predicted value(s)
iv. lr is learning rate
c. Sigmoid(self, s)
i. Return sigmoid applied on s value(s)
d. Sigmoid_derivative(self, s)
i. Return derivative of sigmoid, on s
e. tanh(self, s)
i. Return tanh applied on s value(s)
f. tanh_derivative(self, s)
i. Return derivative of tanh, on s
g. relu(self, s)
i. Return relu applied on s value(s)
h. relu_derivative(self, s)
i. Return derivative of relu, on s
i. Crossentropy(self, Y, Y_pred)
i. Y is the actual label(s)
ii. Y_pred is the predicted label(s)
iii. Return error based on cross entropy
j. Train(self, trainX, trainY, epochs=100, learningRate=0.001, plot_err=True,
validationX=None, validationY=None)
i. trainX is the training feature dataset in row format
ii. trainY is the label of the dataset
iii. epochs is the number of times the entire dataset will be passed to the network
for training; default value is 100.
iv. learningRate is the constant used for weight update.
v. plot_err bool variable if you want to plot error on a graph or not
vi. validationX is the validation feature dataset in row format, show validation error
in each epoch
vii. validationY is the validation label of the dataset.
k. Save_model(self,name)
i. Save the model under the name of ‘name’
l. Load_model(self,name)
i. Load the model using ‘name’.
m. Accuracy(self, testX, testY)
i. testX is dataset for testing
ii. testY is the labels of test data
iii. plot accuracy on an image
iv. return accuracy
n. predict(self, testX)
i. testX is the test row feature
ii. return predicted value on testX
o. main()
i. Call all the functions to train and test the network in this function.
ii. Find the accuracy and loss curves of training and validation data and print the
accuracy of the test set and return all these values.
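For orientation, here is a minimal, hedged sketch of how such a class could be organized. It shows only the sigmoid path, assumes samples are stored one per row, assumes one-hot label matrices, and omits biases (the steps above only list W1 and W2); treat it as a starting shape, not the required implementation.

import numpy as np

class Neural_Network:
    def __init__(self, input_size, hidden_size, output_size, activation='Sigmoid'):
        # Randomly initialize the two weight matrices (no biases shown,
        # since the steps above only list W1 and W2).
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.activation = activation

    def sigmoid(self, s):
        return 1.0 / (1.0 + np.exp(-s))

    def sigmoid_derivative(self, s):
        # Assumes s already holds sigmoid(z), as cached during feedforward.
        return s * (1.0 - s)

    def feedforward(self, X):
        # X: one sample per row, shape (n, input_size).
        self.a1 = self.sigmoid(X @ self.W1)      # hidden activations
        return self.sigmoid(self.a1 @ self.W2)   # output activations

    def crossentropy(self, Y, Y_pred):
        eps = 1e-12  # guard against log(0)
        return -np.mean(np.sum(Y * np.log(Y_pred + eps), axis=1))

    def backpropagation(self, X, Y, y_pred, lr):
        # For sigmoid outputs with cross-entropy, the output-layer delta
        # simplifies to (y_pred - Y).
        delta2 = y_pred - Y
        delta1 = (delta2 @ self.W2.T) * self.sigmoid_derivative(self.a1)
        self.W2 -= lr * (self.a1.T @ delta2)
        self.W1 -= lr * (X.T @ delta1)

    def train(self, trainX, trainY, epochs=100, learningRate=0.001):
        for _ in range(epochs):
            y_pred = self.feedforward(trainX)
            self.backpropagation(trainX, trainY, y_pred, learningRate)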
Task 3: Implement a multi-layer Neural Network for multi-class classification.
In this part, you will create a complete neural network architecture consisting of multiple layers. You
are required to report results with 2 hidden layers on the MNIST dataset. Follow the architecture of
your network as shown in the diagram below.
MNIST Dataset:
The dataset is attached with the assignment in the Task3_Data folder. This dataset contains 60000
training and 10000 test samples. Each sample is a grayscale image of size 28x28. There are 10
classes in which each sample is an image of one of the digits (0 to 9). Please note that in the MNIST
dataset there are 10 categories. If we randomly guess one category, there is a 1/10 probability that
it would be correct. Therefore, you cannot (theoretically) make a classifier that performs worse
than that. If you get less than 10% accuracy in any of your experiments, you can safely assume that
you are doing something fundamentally wrong. If your final results are less than 20% in terms of
accuracy, your solution(s) will not be graded. To load the dataset, we are providing you a function
load_dataset that returns the training and test sets (features and labels):
load_dataset(path_to_dataset)
Note: You can also use any other custom function you want to load the data.
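A hypothetical usage line; the exact return signature of the provided load_dataset is an assumption here, inferred from the variable names (train_set_x, train_set_y, test_set_x, test_set_y) used later in this handout:

# Hypothetical: check the provided function for the real return values.
train_set_x, train_set_y, test_set_x, test_set_y = load_dataset("Task3_Data")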
Points to keep in mind:
You can visualize an individual sample, for example:
import matplotlib.pyplot as plt
index = 25
plt.imshow(train_set_x[index])
plt.show()
To verify that all the data is loaded, print the dimensions of each variable; the shapes should match
the dataset description above (60000 training and 10000 test images of size 28x28).
Vectorization of Samples:
To input samples into the neural network, we need to convert each 2D matrix into a one-
dimensional vector. Reshape your images of shape (num_px, num_px, 1) to a flattened
numpy array of shape (num_px*num_px, 1). After reshaping, your training and test
datasets should be numpy arrays where each row represents a flattened image, i.e., the
data should have dimensions (num_samples, num_px*num_px).
HINT: there is a reshape function in numpy allowing you to reshape the whole matrix with a
single call. Do check that you have vectorized the images correctly.
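A minimal sketch of this step, using a dummy array in place of the loaded images:

import numpy as np

images = np.zeros((60000, 28, 28))          # stand-in for train_set_x
flat = images.reshape(images.shape[0], -1)  # one call flattens every image
print(flat.shape)                           # (60000, 784): one row per image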
Mean subtraction translates our dataset by the mean of the whole data, centering it at zero.
Validation Set:
Divide the training dataset, after flattening, into train and validation sets. You can try different
fractions for this division. You can use the sklearn.model_selection.train_test_split() function for this
task. Try different versions where you also shuffle the data before dividing it into these two sets.
Note: Do not use the final instances for the validation set. The dataset loading code loads the
instances in order, so you might not get any instances of the last class (which, given the ordering,
will probably be class 9) in the training set at all this way. Remember to shuffle the training data
before the train/test split, or use the library's built-in shuffling.
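A sketch of one possible split, with stand-ins for the flattened features and labels; the 90/10 fraction and the random seed are arbitrary choices to experiment with:

import numpy as np
from sklearn.model_selection import train_test_split

flat_x = np.zeros((60000, 784))   # stand-in for the flattened training images
y = np.zeros(60000, dtype=int)    # stand-in for the training labels
# shuffle=True guards against the in-order loading issue described above.
train_x, val_x, train_y, val_y = train_test_split(
    flat_x, y, test_size=0.1, shuffle=True, random_state=0)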
t-SNE: t-Distributed Stochastic Neighbor Embedding (t-SNE) is an unsupervised,
non-linear technique primarily used for data exploration and visualizing
high-dimensional data. In simpler terms, t-SNE gives you a feel or intuition of how
high dimensional data is arranged in space.
You can use the sklearn library to compute t-SNE and visualize data points, as sketched below.
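A minimal sketch using random stand-in data (t-SNE is slow, so a small subset such as 500 points is typical):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

x = np.random.rand(500, 784)          # stand-in for 500 flattened images
y = np.random.randint(0, 10, 500)     # stand-in for their class labels
emb = TSNE(n_components=2).fit_transform(x)  # project to 2-D
plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap='tab10', s=8)
plt.colorbar()
plt.show()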
Once you have loaded the data, you need to implement the several helper functions below.
1. Data preprocessing
data = meanSubtraction(data)
Note that this function must be called before splitting the data into train, test and validation sets.
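A minimal sketch, assuming data has one sample per row; per the note above, apply it before splitting:

import numpy as np

def meanSubtraction(data):
    # Subtract the per-feature mean image computed over all samples (rows).
    return data - data.mean(axis=0)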
2. Initialize Network
net = init_network(no_of_layers, input_dim, neurons_per_layer)
For example if you pass following parameters to this function:
net = init_network(2, 784, [100, 50, 10])
● Use np.random.randn() to initialize the weight matrices, or you can use any other weight
initialization method you learned in class.
It should return you the network architecture with parameters initialized:
Size of net(1).w = 784x100
Size of net(1).b = 100
Size of net(2).w = 100x50
Size of net(2).b = 50
Size of net(3).w = 50x10
Size of net(3).b = 10
You should add the empty arrays in each layer to store the activations and local gradients of each layer;
this will help you in back-propagation.
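One possible sketch matching the sizes above; representing each layer as a dict is an assumption (any structure holding w, b, and slots for activations/local gradients works), and no_of_layers is implied here by the length of neurons_per_layer:

import numpy as np

def init_network(no_of_layers, input_dim, neurons_per_layer):
    # neurons_per_layer lists the hidden layers followed by the output layer.
    net = []
    prev = input_dim
    for n in neurons_per_layer:
        net.append({
            'w': np.random.randn(prev, n) * 0.01,
            'b': np.zeros(n),
            'a': None,           # activations, filled during feed-forward
            'local_grad': None,  # activation derivatives, for back-propagation
        })
        prev = n
    return net

net = init_network(2, 784, [100, 50, 10])
for i, layer in enumerate(net, 1):
    print(f"net({i}).w: {layer['w'].shape}  net({i}).b: {layer['b'].shape}")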
3. Training
net = train_sgd(net, train_set_x, train_set_y, test_set_x,
test_set_y, learning_rate, drop_out, batch_size, training_epochs)
• train_set_x input training data of MxN size, where M is the feature dimension (784 for the
flattened MNIST images)
• batch_size tells how many examples to pick for each iteration e.g. 20
• training_epochs how many training epochs to run.
Please make sure your code is modular. You can divide your training process into the following functions:
➢ feed-forward - this function will forward your input examples through the network and will
also compute local gradients along the way at each layer
➢ back-propagating the loss - multiplying gradients from above into the current local gradients
➢ validate/test - function will test how well your network is doing in terms of loss and
accuracy.
You need to convert the labels to one-hot encoding, because we now have 10 classes with
labels 0, 1, 2, ..., 9. For each training sample you need to generate a vector of length 10
whose entries are all zeros except at the index of its original label, which will be 1. For example:
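A minimal numpy sketch of this conversion (indexing into an identity matrix is one common idiom):

import numpy as np

labels = np.array([3, 0, 9])
one_hot = np.eye(10)[labels]   # label 3 -> [0 0 0 1 0 0 0 0 0 0]
print(one_hot)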
4. Feed Forward step
net = feedForward(net, batch_data, keep_prob)
z_i = W^T x_i + b
a_i = sigmoid(z_i)
Perform these two operations at all the layers and store the value of a in net(layer).a.
At this step, also store the value of the derivative of your activation function in net(layer).local_grad.
The derivative of sigmoid is σ(x)*(1 - σ(x)).
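A hedged sketch of feedForward, reusing the dict-based net from the init_network sketch above; it assumes sigmoid hidden layers, leaves the last layer as raw scores for the softmax in the next step, and accepts but ignores keep_prob (dropout):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedForward(net, batch_data, keep_prob=1.0):
    # batch_data: one example per row. keep_prob is unused in this sketch.
    a = batch_data
    for layer in net[:-1]:                    # hidden layers: sigmoid
        z = a @ layer['w'] + layer['b']
        a = sigmoid(z)
        layer['a'] = a                        # store activations
        layer['local_grad'] = a * (1.0 - a)   # store sigmoid'(z)
    last = net[-1]
    last['a'] = a @ last['w'] + last['b']     # raw scores; softmax comes next
    return net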
5. Backpropagation Step
As the first step here, you will apply the soft-max on the output layer. Now for each input example
you have 10 output probabilities and you also have its label vector of the same length. You have to
implement softmax at the last layer and calculate the cross-entropy loss.
Once you have calculated the loss, you can compute the gradients dw at any layer using the
equations we discussed in class. Please note that you will essentially be running in reverse order.
Store dw and db for each layer.
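A hedged sketch of this step, continuing the conventions of the sketches above (the last layer's 'a' holds raw scores, hidden layers cached their sigmoid derivatives in 'local_grad'):

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def backward(net, batch_data, one_hot_y):
    n = batch_data.shape[0]
    probs = softmax(net[-1]['a'])             # softmax on the output scores
    loss = -np.sum(one_hot_y * np.log(probs + 1e-12)) / n
    delta = (probs - one_hot_y) / n           # softmax + cross-entropy gradient
    dw, db = [None] * len(net), [None] * len(net)
    for i in reversed(range(len(net))):       # walk the layers in reverse
        a_prev = net[i - 1]['a'] if i > 0 else batch_data
        dw[i] = a_prev.T @ delta
        db[i] = delta.sum(axis=0)
        if i > 0:                             # propagate through sigmoid'(z)
            delta = (delta @ net[i]['w'].T) * net[i - 1]['local_grad']
    return loss, dw, db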
6. Update Step
[net] = sgd (net, dw, db, learning_rate)
Again, here you will run a for loop iterating over the layers from first to last and update the weights
using the update rule.
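A matching sketch of the update, assuming the dw and db lists returned by the backward sketch above:

def sgd(net, dw, db, learning_rate):
    # Plain gradient-descent step on every layer's weights and biases.
    for layer, dwi, dbi in zip(net, dw, db):
        layer['w'] -= learning_rate * dwi
        layer['b'] -= learning_rate * dbi
    return net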
Note: You are not restricted to implementing the assignment exactly as explained above;
you can break down or merge the several functions, but you are required to implement it in a
modular way.
Report
For this assignment, and all other assignments and projects, you must write a report. In the report
you will describe the critical decisions you made, important things you learned, and why you wrote
your algorithm a particular way. You are required to report the accuracy you achieved.
For each experiment, you are required to provide analysis of various hyperparameters.
1. Plot loss and accuracy curves for both training and testing data with mean image subtraction
and without mean image subtraction. Report the difference in their accuracy and loss
curves.
2. Visualize data points using t-SNE technique
• Having a shallow network of 2 hidden layers, take the outputs of the 1st and 2nd hidden
layers, plot t-SNE for each, and place those images in your report. If you have more layers,
plot t-SNE after each layer for your best-accuracy model.
• Add more hidden layers to your network and report how this affects the t-SNE
plots (you can take a small subset of, e.g., 500 images for comparison).
3. Show and analyse the confusion matrix of the 10-class classification problem (see the sketch
after this list).
4. Plot loss and accuracy curves for different configurations of the architecture.
5. Report the accuracy by changing the number of neurons in the hidden layers.
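For report item 3, a minimal sketch with stand-in labels and predictions (your own predict output would replace them):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

y_true = np.random.randint(0, 10, 1000)   # stand-in for the actual labels
y_pred = np.random.randint(0, 10, 1000)   # stand-in for your model's predictions
cm = confusion_matrix(y_true, y_pred)
plt.imshow(cm, cmap='Blues')              # rows: true class, columns: predicted
plt.xlabel('Predicted class')
plt.ylabel('True class')
plt.colorbar()
plt.show()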
Marks Division:
1. The marks division is as below:
a. Working code [50 points]
i. Task 2
1. Feedforward [10 points]
2. FeedBackward [10 points]
3. Activations [10 points]
4. Loss [05 points]
ii. Task 3
1. Feedforward [10 points]
2. FeedBackward [10 points]
3. Activations [10 points]
4. Loss [05 points]
5. Generic [10 points]
6. Mean Subtraction [05 points]
7. t-SNE plots [15 points]
b. Report [40 points]
i. Task 1
1. Computation Graph [10 points]
2. Mathematical Derivation [10 points]
ii. Task 2
1. Loss and accuracy curves [10 points]
2. Test Accuracy [05 points]
3. Analysis (in different experiments) [10 points]
iii. Task 3
1. Loss and accuracy curves with and without mean subtraction
[10 points]
2. Test Accuracy [05 points]
3. Confusion Matrix for Training, Validation and Test set [10 points]
4. t-SNE plot [10 points]
5. Analysis (in different experiments) [10 points]
Conclusion [10 points]