
Report – Handwritten Digit Recognition using a Feedforward Neural Network

Introduction
Handwritten digit recognition is a classic problem in computer vision and machine learning, and it is often used as an introductory exercise in neural network modeling. The MNIST dataset contains 70,000 grey-scale images of handwritten digits from 0 to 9 and has long served as a standard benchmark for evaluating and comparing classification algorithms. Neural networks, especially deep learning architectures, have become the most common way to achieve high accuracy on this task.

This project builds a feedforward neural network (FNN) in PyTorch to classify the digits in the MNIST dataset. Training and evaluating the model makes it possible to understand the network's performance and to fine-tune its parameters for the highest accuracy achievable. The key objectives are to design a robust model architecture, to optimize the training process, and to analyze the results in terms of the network's ability to recognize handwritten digits.

Dataset
The MNIST dataset contains 28x28-pixel grey-scale images of handwritten digits, each annotated with the digit (0–9) it represents. It provides 60,000 images for training and 10,000 images for testing.

In this project, the original training set is divided into two subsets: a training subset of about 50,000 images and a validation subset of about 10,000 images. This partition allows the model to be evaluated during training and improved before it is tested on unseen data. Pixel values are normalized to the range [-1, 1] to speed up convergence during training.
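As a concrete illustration, the data pipeline described above could be set up roughly as follows. This is a minimal sketch using torchvision; the variable names and the exact normalization constants are illustrative choices, not taken from the original code.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Scale pixels to [0, 1], then normalize to roughly [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

full_train = datasets.MNIST(root="data", train=True, download=True, transform=transform)
test_set   = datasets.MNIST(root="data", train=False, download=True, transform=transform)

# Split the 60,000 training images into ~50,000 for training and ~10,000 for validation
train_set, val_set = random_split(full_train, [50_000, 10_000])

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader   = DataLoader(val_set, batch_size=64)
test_loader  = DataLoader(test_set, batch_size=64)
```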
Model Architecture
A feedforward neural network is a simple yet powerful architecture for classification tasks, in which data moves in one direction from input to output. In this project, an FNN is designed for MNIST image classification. The model architecture is as follows:

 Input Layer: This layer receives the flattened 28x28-pixel images, giving 784 input units. Each pixel is treated as a feature and passed to the network for classification.
 Hidden Layers: There are two hidden layers:
o First hidden layer: 128 units with the Rectified Linear Unit (ReLU) activation function. ReLU introduces non-linearity into the network, which allows the model to learn more complex patterns in the data.
o Second hidden layer: 64 units, also with ReLU activation, which further refine the features learned by the first layer.
 Output Layer: The output layer consists of 10 units, one for each possible digit (0–9). Each unit outputs a score indicating how likely the input image is to belong to that class.

ReLU is used in the hidden layers to reduce the risk of vanishing gradients, making learning more efficient. Softmax is not applied explicitly in the output layer because it is handled inside the loss function.
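A minimal PyTorch sketch of this architecture could look like the following; the class and variable names are illustrative, and the layer sizes are taken from the description above.

```python
import torch.nn as nn

class FNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),         # 28x28 image -> 784 input features
            nn.Linear(784, 128),  # first hidden layer
            nn.ReLU(),
            nn.Linear(128, 64),   # second hidden layer
            nn.ReLU(),
            nn.Linear(64, 10),    # 10 output scores (logits); softmax is applied inside the loss
        )

    def forward(self, x):
        return self.net(x)

model = FNN()
```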

Figure 1 Visual Representation of Model Architecture


Loss Function and Optimizer
In multi-class classification problems such as digit recognition, the Cross-Entropy Loss function is a natural choice. It measures the difference between the predicted class probabilities and the actual labels, and it penalizes confident incorrect predictions harshly. In this project, PyTorch's `CrossEntropyLoss()` is used, which combines the softmax function with the loss calculation.

The Adam optimizer is used to update the model's parameters. Adam is chosen over alternatives such as stochastic gradient descent (SGD) because of its adaptive nature: it adjusts the learning rate using running estimates of the moments of the gradients computed so far, which typically makes training converge faster and be more robust to noise. A learning rate of 0.001 was used to keep training smooth and efficient.
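In code, the loss function and optimizer described above could be defined along these lines, assuming the `model` object from the architecture sketch:

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()                     # applies softmax internally
optimizer = optim.Adam(model.parameters(), lr=0.001)  # adaptive learning rate
```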

Training Process
The model was trained by iterating repeatedly over the whole training set for a number of epochs; during one epoch, every image in the training set is shown to the model once. Images are fed to the model in batches of 64 to keep memory usage manageable, and the model's parameters are updated after every batch.

 Epochs: The model is trained for 20 epochs. Each epoch consists of a forward pass, in which the model computes a predicted class score for every batch, followed by a backpropagation step, in which the model's errors are used to update the weights of the network.
 Hyperparameters:
o Batch size: 64
o Learning rate: 0.001
o Epochs: 20

During training, both the training and validation losses were monitored closely to track the model's learning progress. The training loss decreased from epoch to epoch, while the validation loss was used to check whether the FNN was generalizing well to unseen data from the MNIST dataset.
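A simplified version of this training loop, assuming the `model`, `criterion`, `optimizer`, and data loaders from the earlier sketches, might look as follows:

```python
import torch

for epoch in range(20):
    # Training pass: forward, loss, backpropagation, parameter update
    model.train()
    train_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    # Validation pass: no gradient updates, only loss tracking
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for images, labels in val_loader:
            val_loss += criterion(model(images), labels).item()

    print(f"Epoch {epoch + 1}: "
          f"train loss {train_loss / len(train_loader):.4f}, "
          f"val loss {val_loss / len(val_loader):.4f}")
```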
Figure 2 Plot of training and validation loss over epochs

The training and validation loss over the epochs are shown in the graph above. This visualization helps in observing the learning dynamics of the model and indicates whether it is overfitting or generalizing well.

Evaluation
To evaluate the model's performance, standard metrics from the torchmetrics library for PyTorch are used, including accuracy, precision, recall, and F1-score. These metrics show how well the model classifies digits: how many predictions are correct overall (accuracy) and how reliably it identifies each class without making false predictions (precision and recall).
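As an illustration, these metrics could be computed with a recent version of torchmetrics roughly as follows. This is a sketch assuming the `model` and `test_loader` from the earlier code; macro averaging is an illustrative choice, not necessarily the setting used in the original project.

```python
import torch
from torchmetrics import Accuracy, Precision, Recall, F1Score

accuracy  = Accuracy(task="multiclass", num_classes=10)
precision = Precision(task="multiclass", num_classes=10, average="macro")
recall    = Recall(task="multiclass", num_classes=10, average="macro")
f1        = F1Score(task="multiclass", num_classes=10, average="macro")

model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        for metric in (accuracy, precision, recall, f1):
            metric.update(preds, labels)

print(f"accuracy={accuracy.compute().item():.4f}, "
      f"precision={precision.compute().item():.4f}, "
      f"recall={recall.compute().item():.4f}, "
      f"f1={f1.compute().item():.4f}")
```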

The model was tested on the 10,000 images in the test set, achieving a high accuracy, which is typical for
FNNs applied to MNIST.

 Accuracy: The overall accuracy on the test set was approximately 97%, indicating that the model
performed well on unseen data.
To further evaluate the model's performance, a confusion matrix was generated (Figure 3). This matrix
provides insight into the model’s predictions for each digit class, illustrating how many instances of each
digit were correctly or incorrectly classified.
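A confusion matrix of this kind can also be produced with torchmetrics, for example as sketched below (again assuming the `model` and `test_loader` from the earlier snippets):

```python
import torch
from torchmetrics import ConfusionMatrix

confmat = ConfusionMatrix(task="multiclass", num_classes=10)

model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        confmat.update(model(images).argmax(dim=1), labels)

print(confmat.compute())  # 10x10 matrix of true vs. predicted digit counts
```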

Figure 3 Confusion Matrix

Results and Discussion


Training showed steady improvement, with both training and validation losses decreasing over the epochs. The model's performance leveled off toward the end of training, which suggests that a near-optimal configuration of the architecture and hyperparameters had been reached.

Although the model achieved high accuracy, the design has some limitations:

 The architecture is relatively simple, with only two hidden layers, whereas deeper architectures such as convolutional neural networks (CNNs) have achieved higher performance on the MNIST dataset.
 Hyperparameter tuning was limited to the learning rate and batch size. More comprehensive tuning, such as adjusting the number of hidden units, the number of epochs, and the regularization methods, could further improve the results.

In future work, dropout layers could therefore be added to prevent overfitting, and learning rate schedules could be used to further fine-tune the learning process, as sketched below.
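As a rough sketch of these two ideas, a variant of the model with dropout between the hidden layers and a simple step learning rate schedule might look like this; the dropout probability and the schedule parameters are illustrative values, not tuned settings from the project.

```python
import torch.nn as nn
import torch.optim as optim

# Variant of the FNN with dropout after each hidden layer
model_with_dropout = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 128), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(128, 64),  nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 10),
)

optimizer = optim.Adam(model_with_dropout.parameters(), lr=0.001)
# Halve the learning rate every 5 epochs; scheduler.step() is called once per epoch
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
```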

Figure 4 Training and Validation Losses for every epoch

In the left graph, the blue dots represent the training loss for each epoch, which decreases over time, indicating that the model is learning. The blue line represents the validation loss for each epoch, an indicator of how well the model generalizes to unseen data. In the right graph, the blue dots show the training accuracy for each epoch, and the blue line shows the validation accuracy for each epoch.

Figure 5 Training and Validation Losses over Epochs


The above graph shows the training and validation losses of the model over multiple epochs.

Conclusion
This project successfully implemented an FNN to recognize the handwritten digits in the MNIST dataset. The network achieved high accuracy on both the training and test sets, demonstrating the effectiveness of FNNs for this classification task. Despite its simplicity, the model performed well, which highlights the potential of neural networks in image recognition tasks.

The ability of the model to accurately classify digits has broad applications, such as automated data entry
systems and recognition of digitized documents. Future work could involve exploring more complex
architectures and optimization techniques to further enhance performance.
References

1. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). "Gradient-based learning applied to document
recognition." Proceedings of the IEEE, 86(11), 2278-2324.

2. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.

3. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., ... & Chintala, S. (2019). "PyTorch: An Imperative Style, High-Performance Deep Learning Library." Advances in Neural Information Processing Systems 32 (NeurIPS).

4. Kingma, D. P., & Ba, J. (2015). "Adam: A Method for Stochastic Optimization." 3rd International
Conference on Learning Representations (ICLR).

5. Hinton, G. E., Srivastava, N., & Krizhevsky, A. (2012). "Improving neural networks by preventing co-
adaptation of feature detectors." arXiv preprint arXiv:1207.0580.

6. Deng, L. (2012). "The MNIST database of handwritten digit images for machine learning research."
IEEE Signal Processing Magazine, 29(6), 141–142.

7. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, É. (2011). "Scikit-learn: Machine Learning in Python." Journal of Machine Learning Research, 12, 2825–2830.

8. TorchMetrics Documentation (2023). TorchMetrics: A Metrics Collection for PyTorch.
