
Journal of Applied Technology and Innovation (e-ISSN: 2600-7304), vol. 8, no. 1 (2024), pp. 15-20

Enhancing Neural Network Models for MNIST Digit Recognition

Vinnie Teh, Edward Ding Hong Wai, Chew Jin Cheng, Liew Jie Yang, Zailan Arabee bin Abdul Salam, Jason Chin Yun Loong
School of Computing, Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia

Abstract— Using the MNIST dataset, a standard benchmark in computer vision, this study aims to improve neural networks' digit recognition ability. Focusing on elements such as neural network architecture, hyperparameters (dropout rate and training epochs), and their effect on digit identification, it examines a variety of methodologies and strategies. The study identifies hyperparameter settings that significantly increase accuracy. Results indicate that the model with the highest accuracy, ranging from 80.96% to 98.67%, used the Adam optimizer, four hidden layers with Dropout, a 0.1 learning rate, and 23 epochs. These findings improve MNIST digit recognition and have wider ramifications, including for document analysis and financial transactions.

Keywords— Multilayer Perceptron, Training Epochs, Dropout Rate, Overfitting, Underfitting

I. INTRODUCTION

Recognizing handwritten digits remains a canonical problem and a crucial milestone in the constantly changing fields of deep learning and machine vision. The MNIST dataset, a large collection of 28x28 pixel grayscale images of handwritten numbers, has long been used to test the effectiveness of different machine-learning approaches. This paper's main goal is to explore, innovate, and improve the performance of neural network models for MNIST digit recognition by utilizing a comprehensive combination of methods and methodologies.

The MNIST dataset, often referred to as the "Hello World" of deep learning, has come to symbolize digit recognition problems in the field of machine vision. This dataset, which consists of 70,000 compiled samples of handwritten digits from 0 to 9, has evolved into the benchmark of choice for assessing the performance of different machine learning techniques, particularly neural networks.

The following factors merit careful consideration in pursuit of this goal: the architectural complexities, including fine-grained control over learning dynamics through the careful selection of dropout rates and training epochs. The Multilayer Perceptron (MLP), a feedforward neural network with multiple hidden layers, serves as the fundamental model of this research. The depth of its architecture enables us to tap into neural networks' latent representational power for improved discriminative ability.

Through algorithmic implementation, this research has travelled through the landscape of several learning rules and hyperparameters guiding the neural network. While holding parameters such as the learning rate and the number of hidden layers constant across all trials, we methodically modified the dropout rate and the number of training epochs to investigate their effect on the neural network's performance. The article is, in essence, a comprehensive investigation of the MNIST digit recognition challenge. The coordination of hyperparameter optimization, training techniques, and neural network architecture forms the core of our methodology. With implications across a wide range of areas, from automated document analysis to financial transactions, our findings, which have been thoroughly documented and analytically tested, aim to advance the state of the art in MNIST digit recognition.

II. LITERATURE REVIEW

In the context of digit recognition, the first reviewed paper proposes a Random Forest Classifier (RFC) for digit classification. It focuses on raising the accuracy of results with different hyperparameters, evaluated with stratified k-fold cross-validation. The study covered three key aspects: selection of materials, implementation of algorithms, and parameter modification.

The random forest classifier was implemented in Python on a machine running Windows 11 Pro with a 12th Gen Intel Core i5-12600 processor and 16 GB of RAM. Testing was performed on a handwritten digit image dataset containing 1,797 images of 8x8 pixel grayscale digits. Two algorithms were employed: decision tree and random forest. Decision trees act as the building blocks of more complex classifiers, while a random forest uses the majority vote of many decision trees to make predictions and addresses overfitting more effectively than a single decision tree. The study by Ngan et al. (2023) used one fixed parameter, random_state, and three hyperparameters, n_estimators, max_depth and max_features, testing four different values of each to optimize the random forest classifier's performance in digit classification. random_state affects the reproducibility of the model, which is useful for comparing the effects of the other attributes. n_estimators affects the diversity and accuracy of the model: more trees can
capture more patterns in the data, but can also lead to overfitting if the value is set too high. max_depth affects the generalization of the model: a higher value brings more accuracy and a deeper decision tree, but also raises the risk of overfitting. max_features controls how many features are considered when building each decision tree; a lower value can reduce overfitting but also reduces accuracy. Stratified k-fold cross-validation, which divides the dataset into more balanced subsets, is used as the performance criterion to ensure a fair evaluation of the model and to address potential class imbalance.

Testing the three hyperparameters with different values showed the importance of hyperparameter adjustment; in particular, setting n_estimators to 125, max_depth to 'None' and max_features to 4 individually gave the highest accuracy. It is worth noting that these values were neither the highest nor the lowest tested, highlighting that simply using a higher or lower value does not guarantee improved accuracy. Interestingly, even when the values that achieved the highest accuracy individually were combined, they did not yield the highest overall accuracy, showing that each attribute has a unique impact on the model's performance. In summary, hyperparameter adjustment significantly enhances digit recognition accuracy with a Random Forest Classifier. This approach helps a model achieve good results without being too time-consuming or costly, making it suitable for newcomers to machine learning.
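The exact code from Ngan et al. (2023) is not reproduced here. The following Python sketch only illustrates the kind of sweep described above, assuming scikit-learn and an illustrative grid of n_estimators values; the fixed max_depth=None and max_features=4 settings match the values reported as best, while the random_state value is arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# The 1,797-image, 8x8-pixel digits dataset mentioned above ships with scikit-learn.
X, y = load_digits(return_X_y=True)

# Fixed random_state for reproducibility, stratified folds for balanced subsets.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for n_estimators in (25, 75, 125, 175):          # illustrative grid, not the paper's exact values
    clf = RandomForestClassifier(
        n_estimators=n_estimators,  # number of trees: more diversity, but overfitting risk if too high
        max_depth=None,             # deeper trees fit more detail but may overfit
        max_features=4,             # features considered per split; lower values reduce overfitting
        random_state=42,
    )
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"n_estimators={n_estimators}: mean accuracy={scores.mean():.4f}")
```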
In parallel, the review by Lead et al. (2021) focuses on finding the most accurate architecture for the task by investigating handwritten digit recognition with various pre-trained deep learning models. Five pre-trained models from PyTorch were applied in the study: GoogLeNet, MobileNet v2, ResNet-50, ResNeXt-50 and Wide ResNet-50, using the MNIST dataset.

After pre-processing the data, Lead et al. (2021) used the 70,000 28x28 pixel grayscale images, labelled 0 to 9, splitting them into 60,000 images for training and 10,000 for testing digit classification. During the training process, a CNN (ResNet-18) was used as the training architecture and algorithm: it processes input images, predicts the outcomes, and then learns to improve from feedback by comparing the result with the actual label. The evaluation used confusion matrices to analyze the top 9 loss images and found that the confusion patterns were quite similar across digits. Both accuracy and training time were considered when evaluating these models, and Wide ResNet-50 achieved the lowest error percentages, with a Top-1 error of 0.5278% and a Top-5 error of 0.0079%. Notably, MobileNet v2 also achieved a commendable Top-1 error rate of 0.5754% and a Top-5 error of 0.0079% in just 498 seconds. The transition to the CIFAR-10 dataset with the same configuration revealed that ResNeXt-50 achieved the lowest error rates, with a Top-1 error of 14.0460% and a Top-5 error of 0.5300%. Notably, MobileNet v2 showed its versatility by achieving a Top-1 error rate of 15.2780% and a Top-5 error rate of 0.5380% with a smaller model size and faster training.

Based on these outcomes, the paper emphasizes how neural network architectures for digit identification keep evolving. The objective of our study, to enhance the neural network for MNIST digit recognition, aligns with MobileNet v2's consistent performance across datasets, which demonstrated its potential for further investigation of digit recognition.

III. MATERIALS AND METHODS

A. Selection of Materials
1) Source code: The Python programming language, renowned for deep learning research, has been used for this study. Python has a large selection of tools and frameworks made particularly for developing neural networks and machine learning algorithms. The Pandas library handled data manipulation and was used to load and handle datasets, making it simpler to deal with the MNIST dataset and carry out data preprocessing, while NumPy enabled the numerical computations used for normalization. Keras, which has TensorFlow as its backend and benefits from TensorFlow's fast computation, allowed for the flexible development of neural network models to evaluate architectural alternatives for digit classification.
2) Machine: A computer running the Windows 11 (Version 22H2) operating system, equipped with an AMD Ryzen 7 Pro 3700U processor and 16 GB of installed RAM, was used to carry out this study.
3) Dataset: The MNIST dataset has been used in this study because it serves as a model for several image classification tasks, especially handwritten digit recognition. According to Daniel (2022), MNIST was created from an even bigger dataset, the NIST Special Database 19, which comprises handwritten uppercase and lowercase characters in addition to numbers. MNIST comprises 60,000 handwritten digits for training the machine learning model and 10,000 handwritten digits for model testing. Each digit in MNIST is stored as a 28x28 pixel grayscale image, so each sample has 784 features.

B. Selection of Methods
1) Data Loading: The training and testing datasets (train.csv and test.csv) are loaded using Pandas.
2) Data Preprocessing and Transformation: The label is separated from the feature data in the training dataset. 20% of the training data is randomly selected as a validation set to implement cross-validation and avoid overfitting; the remaining 80% of the data is used to train the neural network model.
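As a concrete illustration of steps 1) and 2), the sketch below shows one way the CSV files could be loaded and split using only Pandas and NumPy. The column name "label", the 0-255 pixel range, and the fixed random seed are assumptions based on the Kaggle Digit Recognizer layout; the paper itself does not show this code.

```python
import numpy as np
import pandas as pd

# 1) Data loading with Pandas.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# 2) Separate the label column from the 784 pixel features and normalize with NumPy
#    (assuming pixel values in the 0-255 range).
y = train["label"].to_numpy()
X = train.drop(columns=["label"]).to_numpy(dtype=np.float32) / 255.0
X_test = test.to_numpy(dtype=np.float32) / 255.0

# Randomly hold out 20% of the training data for validation; the remaining 80% trains the model.
rng = np.random.default_rng(seed=42)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, y_train = X[idx[:split]], y[idx[:split]]
X_val, y_val = X[idx[split:]], y[idx[split:]]
```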
3) Neural Network Architecture Implementation: The neural network's input layer includes 784 units (the pixels of a 28x28 image). For multi-class classification, the output layer has ten units, one for each digit 0-9.
4) Model Training: Several parameters are modified to train the Neural Network model and improve its performance. To reduce the chosen loss function, the internal model parameters, the weights and biases, must be adjusted repeatedly during the training phase. The model is fitted on the training data and evaluated against the validation data.
5) Model Evaluation: The accuracy, loss, validation loss, validation accuracy and time per epoch are observed.
Accuracy: The percentage of accurately predicted labels in the test dataset.
Loss: The difference between the predicted labels and the actual labels in the test dataset.
Validation Accuracy: The percentage of predicted labels that match the target labels in the validation dataset.
Validation Loss: The difference between the predicted labels and the target labels in the validation dataset.
Times per Epoch: The amount of time required to complete each epoch, in seconds.
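To make steps 3) to 5) concrete, the sketch below (continuing the data-loading sketch above) builds and trains such a network in Keras. The constants follow the values listed later in Fig. 7 (four hidden layers, ReLU hidden activations, SoftMax output, Adam, 0.1 learning rate, batch size 100); the hidden-layer width of 256 units and the sparse categorical cross-entropy loss are assumptions, since the paper does not spell them out.

```python
import time
from tensorflow import keras
from tensorflow.keras import layers

def build_model(dropout_rate: float = 0.1) -> keras.Model:
    """784-unit input, four ReLU hidden layers with Dropout, 10-unit SoftMax output."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(784,)))
    for _ in range(4):
        model.add(layers.Dense(256, activation="relu"))   # hidden width of 256 is an assumption
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.1),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

class EpochTimer(keras.callbacks.Callback):
    """Records the time taken per epoch, one of the observed quantities in step 5)."""
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()
    def on_epoch_end(self, epoch, logs=None):
        print(f"epoch {epoch + 1}: {time.time() - self._start:.1f}s "
              f"acc={logs['accuracy']:.4f} loss={logs['loss']:.4f} "
              f"val_acc={logs['val_accuracy']:.4f} val_loss={logs['val_loss']:.4f}")

model = build_model(dropout_rate=0.1)
history = model.fit(X_train, y_train,                     # arrays from the data-loading sketch above
                    validation_data=(X_val, y_val),
                    epochs=23, batch_size=100,
                    callbacks=[EpochTimer()])
```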
6) Prediction: Predictions are made on the test dataset using the model that performs best. The trained neural network is used to make predictions on the separate test dataset, and the predicted labels and image IDs are saved.
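A minimal sketch of this prediction step is shown below; the "ImageId"/"Label" column names and the output file name are assumptions following the Kaggle submission convention.

```python
import numpy as np
import pandas as pd

# Predict digit labels for the held-out test images with the best-performing model.
probs = model.predict(X_test)
pred_labels = np.argmax(probs, axis=1)

# Save image IDs together with the predicted labels.
submission = pd.DataFrame({"ImageId": np.arange(1, len(pred_labels) + 1),
                           "Label": pred_labels})
submission.to_csv("predictions.csv", index=False)
```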
C. Algorithm Implementation
Neural networks are systems of interlinked neurons that resemble the layers of the human brain. Computers can use them to build an adaptive framework that constantly learns from its errors.
1) Feedforward Neural Network: A Feedforward Neural Network is an Artificial Neural Network without looping nodes. Input data is fed into the network during the forward pass, and computations pass through the hidden layers to produce an output at the output nodes. The network's prediction or categorization is represented in this output. Feedforward neural networks are trained using supervised learning. The network performs several steps to compute the data. Firstly, each input is multiplied by its weight value, for example x1*w1 = 2*3 = 6; the weights establish the signal intensity of the neuron, so the weight value determines how much an input influences the result. Secondly, the bias value is added to the product from the prior step, for example 6+b1 = 6+1 = 7. Thirdly, the weighted sum over all inputs is calculated. Fourthly, the weighted sum is passed to an activation function, which converts it into the output (Vihar, 2022). Fig. 1 illustrates how a Feedforward Neural Network following these steps can learn to categorize the handwritten digits in the MNIST dataset.

Fig. 1. Recognizing Digits using Feedforward Neural Network (Rubentak, 2023).
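The tiny sketch below simply reproduces the arithmetic of the worked example above (x1 = 2, w1 = 3, b1 = 1); the ReLU activation in the final step is assumed purely for illustration.

```python
import numpy as np

# Values from the worked example above.
x1, w1, b1 = 2.0, 3.0, 1.0

product = x1 * w1              # step 1: input times weight -> 2 * 3 = 6
weighted_sum = product + b1    # steps 2-3: add the bias -> 6 + 1 = 7

# Step 4: pass the weighted sum through an activation function (ReLU assumed here).
output = np.maximum(0.0, weighted_sum)
print(product, weighted_sum, output)   # 6.0 7.0 7.0
```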
A feedforward neural network's mean square error cost function, shown in Fig. 2, is a smooth metric used to adjust weights and biases, allowing incremental improvements in performance with minimal effect on already-classified data points.

Fig. 2. Mean Square Error Cost Function (Turing, 2023).

Cross Entropy Loss, shown in Fig. 3 and Fig. 4, is used to compute the loss function in neural networks, which determines whether learning-process adjustments are necessary.

Fig. 3. Cross Entropy Loss (Turing, 2023).
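The formula images in Fig. 2, Fig. 3 and Fig. 5 are not reproduced in this text version. For reference, the standard forms of these quantities, for targets y, predictions ŷ, n samples and C classes, are:

```latex
% Mean squared error over n samples (Fig. 2) and its gradient w.r.t. a prediction (Fig. 5)
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
\frac{\partial\,\mathrm{MSE}}{\partial \hat{y}_i} = -\frac{2}{n}\left(y_i - \hat{y}_i\right)

% Cross-entropy loss for C classes with one-hot targets (Fig. 3 and Fig. 4)
\mathcal{L}_{\mathrm{CE}} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C} y_{i,c}\,\log \hat{y}_{i,c}
```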

2) Multilayer Perceptron: A multilayer perceptron is an Artificial Neural Network that comprises an input layer, an output layer, and multiple hidden layers with numerous neurons to learn more complex patterns. "Perceptron" refers to the capacity to see and comprehend images, which mimics human perception (Carolina, 2021). However, the single-neuron perceptron cannot analyze non-linear data; this issue was resolved with the introduction of the Multilayer Perceptron. Multilayer perceptron neurons can employ any activation function, as opposed to Perceptron neurons, which demand an activation function that imposes a threshold. The Multilayer Perceptron continually adjusts the network weights and lowers the cost function using backpropagation as its learning approach. After computing the weighted sums and applying them through all layers, the Multilayer Perceptron determines the gradient of the Mean Squared Error, as seen in Fig. 5, for every input and output pair in every cycle. The hidden-layer weights are then modified using this gradient result, which is propagated back toward the neural network's starting point.

Fig. 4. Cross Entropy Loss (Carolina, 2021).

Fig. 5. Gradient of Mean Squared Error (Carolina, 2021).

3) Optimizer Algorithm: Adam Optimizer. Adam is an approach for calculating adaptive learning rates that applies individual learning rates to different parameters. This is accomplished by estimating the gradient's first and second moments, which are then used to modify the learning rate for each weight.

Fig. 6. Formula to estimate the moments adjusted for bias (Vitaly, 2018).
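The formula image in Fig. 6 is likewise not reproduced here. The standard formulation of Adam's moment estimates, bias corrections and parameter update, for gradient g_t, decay rates beta_1 and beta_2, learning rate alpha and a small epsilon, is:

```latex
% First and second moment estimates of the gradient
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}

% Bias-corrected moments and the per-parameter update
\hat{m}_t = \frac{m_t}{1-\beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^{t}}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```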

D. Purpose
The main purpose of this study is to improve the Neural Network Model so that it accurately identifies handwritten digits. Several neural network models are trained with various modified parameters in the Neural Network algorithm to evaluate the effectiveness of the models and determine which parameter values are best for digit recognition.

E. Parameters
The following parameters are kept constant throughout the study to provide a fair evaluation of each hyperparameter, as listed in Fig. 7:

Parameter             Value
Activation Function   ReLU (Rectified Linear Unit) - hidden layers; SoftMax - output layer
Hidden layers         4
Learning rate         0.1
Batch size            100
Optimizer             Adam Optimizer

Fig. 7. Table of Constant Parameters.

The following parameters are examined and modified within this study.

1) Number of Training Epochs: The number of training epochs is the number of passes over the entire training dataset. It is important to strike a balance between enabling the model to converge to an accurate solution and avoiding underfitting or overfitting. Cross-validation and tracking of validation accuracy should be performed to determine the optimal number of training epochs.
2) Dropout Rate: Dropout is an approach applied to minimize overfitting during the training of a neural network. It drives the network to acquire more robust features instead of depending solely on a particular group of neurons. The dropout rate is the hyperparameter specifying the probability that a neuron is deactivated during training. It ranges between 0 and 1, with 0 denoting no dropout (every neuron stays active) and 1 denoting complete dropout (all neurons are deactivated).
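As a small illustration of how this hyperparameter is expressed in Keras (the framework named in Section III), a Dropout layer takes the rate directly; the layer width of 256 here is purely illustrative.

```python
from tensorflow.keras import layers

hidden = layers.Dense(256, activation="relu")   # illustrative hidden layer
drop = layers.Dropout(rate=0.1)                 # 0.1 => roughly 10% of units dropped per training step
```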
IV. RESULT AND DISCUSSION

A. Discussion on Implementation
Generalization describes a model's ability to adapt and respond appropriately to previously unobserved, new data. To avoid overfitting and underfitting, it is important to achieve the ideal balance between the complexity of the model and its adaptability (Evelyn, 2023). The project aimed to configure a model that strikes the right balance between overfitting and underfitting, so that the model can be trained to identify handwritten digits more accurately using the MNIST dataset. The original code is from Kaggle. Blocks of code that adjust the parameters were developed to experiment with one parameter while keeping the others constant, so that the effect of each specific parameter can be seen clearly. The use of Google Colab makes this process easier: the code can be modified by multiple people at the same time, and any changes are easily saved and run in real time.

Fig. 8. Code to modify parameters.

Fig. 8 is a sample of the code used to modify the two chosen parameters. The first highlighted block is where the dropout rate is changed, while the second highlighted block is where the number of epochs is changed. Although the dropout rate can be modified separately for each layer, the same dropout rate is applied to all layers in order to get more consistent results for comparison and optimization. The rest of the code contains the fixed parameters, such as the number of neurons in the input layer, hidden layers and output layer, as well as the activation function used for each layer, the learning rate, batch size, and optimizer. Listing these fixed parameters line by line gives a picture that defines everything clearly.
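Fig. 8 itself is a code screenshot that is not reproduced in this text version. The fragment below, reusing the build_model helper sketched in Section III, is only an approximation of what the figure describes: the two modified parameters are made explicit, while the Fig. 7 constants stay fixed.

```python
# The two parameters varied in this study; everything else stays as listed in Fig. 7.
DROPOUT_RATE = 0.1    # corresponds to the first highlighted block in Fig. 8
EPOCHS = 23           # corresponds to the second highlighted block in Fig. 8

model = build_model(DROPOUT_RATE)          # same dropout rate applied to every hidden layer
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=100,        # constant (Fig. 7)
                    epochs=EPOCHS)
```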
B. Results
To find the optimal solution, parameters such as the number of epochs and the dropout rate were experimented with in this project. The main results compared across the trained models are the accuracy and the validation accuracy after training is complete. Both results are used to ensure the data is sound and to obtain a more detailed analysis. The average time taken per epoch is roughly the same across models, showing a very consistent time that is likely attributable to the performance of Google Colab.

Number of Epochs

Fig. 9. Table of Results for Model trained with different number of epochs.

In training neural network models, the number of epochs is a crucial modified parameter. The experiments show that when trained for 23 epochs, the model achieves its best average accuracy of 98.77%. Fig. 9 shows that this outcome is better than the default value of 20 epochs and the other epoch counts, demonstrating that more training can improve the model's overall accuracy. Underfitting and overfitting principles apply here: the model does not learn complicated patterns in the data when the number of epochs is too low (for example, 14 or 17 epochs), while it is more likely to overfit if the number of epochs is too high (for example, 26 epochs). When a model learns the training data excessively or insufficiently, accuracy suffers, and the model begins to pick up unimportant features that cause it to perform poorly at recognizing digits.
Dropout Rate

Fig. 10. Table of Results for Model trained with different dropout rates.

Fig. 11. Line Chart (Final Accuracy vs Dropout Rate).

The dropout rate determines the likelihood of neurons becoming inactive during each training iteration. Fig. 10 and Fig. 11 show that higher accuracy and validation accuracy are produced by lower dropout rates, such as 0.1 and 0.2, than by greater dropout rates, such as 0.4 and 0.5. Based on average accuracy, the ideal dropout rate is 0.1, which results in an average accuracy of 98.80%, compared to the default rate of 0.3, which has an average accuracy of 98.25%. The lowest dropout rate of 0.1 also yields the maximum validation accuracy of 98.15%, again a significant improvement. Dropout deactivates neurons during training: lower dropout rates such as 0.1 and 0.2 keep more neurons active, allowing the model to retain and utilize a broader range of learned features, while higher dropout rates (0.4 and 0.5) deactivate a significant portion of neurons. The dropout rate must be balanced properly for a model to generalize. The best option is a dropout rate of 0.1, which avoids overfitting and enhances the network's capability to generalize to new data. In comparison to the default rate of 0.3, this rate results in a significant improvement in both average accuracy and validation accuracy.
Cross Tuning

After obtaining the best values for the number of epochs and the dropout rate separately, the best number of epochs is tested with multiple different dropout rates, and vice versa, to confirm that the chosen values for the two modified parameters match one another well. This is done because over- or underfitting problems can arise when the modified parameters are only considered individually, and cross tuning helps prevent them.
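A minimal sketch of such a cross-tuning loop is shown below, reusing build_model and the data arrays from the Section III sketches; the value grids mirror the ranges reported in Fig. 12 and Fig. 13, but the code itself is not from the paper.

```python
results = {}

# Hold the best epoch count (23) and sweep the dropout rate.
for dropout_rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    model = build_model(dropout_rate)
    hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                     epochs=23, batch_size=100, verbose=0)
    results[("dropout", dropout_rate)] = hist.history["val_accuracy"][-1]

# Hold the best dropout rate (0.1) and sweep the number of epochs.
for epochs in (14, 17, 20, 23, 26):
    model = build_model(0.1)
    hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                     epochs=epochs, batch_size=100, verbose=0)
    results[("epochs", epochs)] = hist.history["val_accuracy"][-1]

best = max(results, key=results.get)
print("best setting:", best, "val_accuracy:", results[best])
```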
23 epochs with a 0.1-0.5 dropout rate

Fig. 12. Table of Results for Model trained with different dropout rates with 23 epochs.

Fig. 12 shows a significant amount of fluctuation in average accuracy when the dropout rate is changed. This indicates that choosing the right dropout rate can greatly affect the effectiveness of the model and its results. Based on average accuracy, it is confirmed that a 0.1 dropout rate is optimal when paired with 23 epochs, showing that the greatest generalization performance is achieved with this configuration: it adapts adequately to the training dataset while avoiding overfitting. This optimal model is trained efficiently across a reasonable number of epochs, while avoiding a high dropout rate that might hinder learning.

0.1 dropout rate with 14-26 epochs

Fig. 13. Table of Results for Model trained with different number of epochs with 0.1 dropout rate.

Fig. 13 shows that the average accuracy is consistent across the 5 different training epoch counts. Accuracy and validation scores are high for all epoch counts, showing that the model is effective overall and indicating the robustness and efficiency of the model with a 0.1 dropout rate. However, 20 epochs and 26 epochs both achieve slightly higher accuracy, so 23 epochs is taken as the optimum solution because it lies between them. This strikes a balance that prevents the overfitting that could occur with more epochs, while ensuring the model has enough training iterations to achieve a good result.

From both results obtained from cross tuning, the second model shows that when a dropout rate of 0.1 is used, the model's performance is largely unaffected by the number of epochs. This indicates that the number of epochs does not affect the results of the model as significantly as the dropout rate, and it shows that the dropout rate is a more significant modified parameter than the number of epochs in terms of impact on the performance and accuracy of the model. The capacity of the model is directly affected by the dropout rate. With a greater dropout rate, which introduces more randomness and regularization, the model may be unable to retain the training data (and excessive training data can also raise the risk of overfitting). On the other hand, a model can retain more data and complexity when the dropout rate is lower.

All in all, overfitting or underfitting can be successfully regulated by the dropout rate, irrespective of the number of epochs.

Both tables confirm that a 0.1 dropout rate is the most optimal solution when paired with 23 epochs. With this in consideration, model K is chosen as it has the highest average accuracy. The Digit Recognizer will be able to recognize handwritten digits more accurately, and possibly faster too.

V. CONCLUSION

In this paper, it is demonstrated how the performance of neural network models for digit recognition on the MNIST dataset can be improved by utilizing a comprehensive combination of methods and methodologies. From the experimental results, it is found that model K, which uses the Adam Optimizer, 4 hidden layers with Dropout layers, a 0.1 learning rate, and 23 epochs, achieves an average accuracy of 0.98715; compared with the other results, this is the highest accuracy. The average time taken per epoch for model K is 3.3 seconds, which is very short, indicating a model that can be trained very quickly and has high performance. This is especially crucial, as performance speed is highly valued in this era. The result of model K shows that the performance of neural network models in recognizing MNIST digits has been improved by changing the parameters of the model. Finally, it is possible to implement training with different optimizers, such as Adamax and SMORMS3, to further improve the accuracy of digit recognition (Amananandrai, 2023).
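As a pointer for that follow-up, swapping the optimizer in the Keras sketches above is a one-line change; Adamax ships with Keras, while SMORMS3 would require a third-party or custom implementation.

```python
from tensorflow import keras

# Recompile the same model with Adamax instead of Adam (learning rate kept at 0.1 for comparison).
model.compile(optimizer=keras.optimizers.Adamax(learning_rate=0.1),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```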
REFERENCES

Ngan, J.F., Keong, Y.Q., Raymond, J.M.C., Wong, K.W., & Gan, J.X. (2023). Digital Classification using Random Forest Classifier. Journal of Applied Technology and Innovation, 7(3), 1-6. http://jati.sites.apiit.edu.my/files/2023/07/Volume7_Issue3_Paper11_2023.pdf

Lead, M.S., Brennan, B.C.C., Gwo, Y.T., & Hui, T.C. (2021). MNIST handwritten digit recognition with different CNN architectures. Journal of Applied Technology and Innovation, 5(1), 1-4. https://dif7uuh3zqcps.cloudfront.net/wp-content/uploads/sites/11/2021/01/17192613/MNIST-Handwritten-Digit-Recognition-with-Different-CNN-Architectures.pdf

Ng, B.L. (2017). MNIST Dataset: Digit Recognizer. https://www.kaggle.com/code/ngbolin/mnist-dataset-digit-recognizer/notebook

Daniel, E. (2022). MNIST — Dataset of Handwritten Digits. https://medium.com/mlearning-ai/mnist-dataset-of-handwritten-digits-f8cf28edafe

Carolina, B. (2021). Multilayer Perceptron Explained with a Real-Life Example and Python Code: Sentiment Analysis. https://towardsdatascience.com/multilayer-perceptron-explained-with-a-real-life-example-and-python-code-sentiment-analysis-cb408ee93141

Turing. (2023). Mathematical Formulation of Feed-Forward Neural Network. https://www.turing.com/kb/mathematical-formulation-of-feed-forward-neural-network

Vitaly, B. (2018). Adam — latest trends in deep learning optimization. https://towardsdatascience.com/adam-latest-trends-in-deep-learning-optimization-6be9a291375c

Vihar. (2023). Feedforward Neural Networks: A Quick Primer for Deep Learning. https://builtin.com/data-science/feedforward-neural-network-intro

Evelyn, M. (2023). What Is Generalization In Machine Learning? https://magnimindacademy.com/blog/what-is-generalization-in-machine-learning/

Amananandrai. (2023). 10 famous Machine Learning Optimizers. https://dev.to/amananandrai/10-famous-machine-learning-optimizers-1e22

Rubentak. (2023). Understanding Feed Forward Neural Networks with MNIST Dataset. https://magnimindacademy.com/blog/what-is-