
Journal of Applied Technology and Innovation (e-ISSN: 2600-7304), vol. 8, no. 1 (2024), pp. 15-20

Enhancing Neural Network Models for MNIST Digit Recognition

Vinnie Teh, Edward Ding Hong Wai, Chew Jin Cheng, Liew Jie Yang, Zailan Arabee bin Abdul Salam, Jason Chin Yun Loong
School of Computing, Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia

Abstract— Using the MNIST dataset, a standard benchmark in computer vision, this study aims to improve neural networks' digit recognition ability. Focusing on elements such as neural network architecture, hyperparameters (dropout rate and training epochs), and their effect on digit identification, it examines a variety of methodologies and strategies. The study identifies hyperparameter settings that significantly increase accuracy. Results indicate that the model with the highest accuracy, ranging from 80.96% to 98.67%, used the Adam optimizer, four hidden layers with Dropout, a 0.1 learning rate, and 23 epochs. These findings improve MNIST digit recognition and have wider ramifications, including for document analysis and financial transactions.

Keywords— Multilayer Perceptron, Training Epochs, Dropout Rate, Overfitting, Underfitting

I. INTRODUCTION

Recognizing handwritten digits remains a canonical problem and a crucial milestone in the constantly changing fields of deep learning and machine vision. The MNIST dataset, a large collection of 28x28 pixel grayscale images of handwritten numbers, has long been used to test the effectiveness of different machine-learning approaches. This paper's main goal is to explore, innovate, and improve the performance of neural network models for MNIST digit recognition by utilizing a comprehensive combination of methods and methodologies.

The MNIST dataset, often referred to as the "Hello World" of deep learning, has come to symbolize digit recognition problems in the field of machine vision. This dataset, which consists of 70,000 compiled samples of handwritten digits from 0 to 9, has evolved into the benchmark of choice for assessing the performance of different machine learning techniques, particularly neural networks.

The following factors merit careful consideration in pursuit of this goal: the architectural complexities, including fine-grained control over learning dynamics through the careful selection of dropout rates and training epochs. The Multilayer Perceptron (MLP), a feedforward neural network with multiple hidden layers, serves as the fundamental model of this research. The depth of its architecture enables us to tap into neural networks' latent representational power for improved discriminative ability.

Through algorithmic implementation, this research has travelled through the landscape of several learning rules and hyperparameters guiding the neural network. While holding parameters such as the learning rate and the number of hidden layers constant across all trials, we methodically modified the dropout rate and the number of training epochs to investigate their effect on the neural network's performance. The article is, in essence, a comprehensive investigation of the MNIST digit recognition challenge. The coordination of hyperparameter optimization, training techniques, and neural network architecture forms the core of our methodology. With implications across a wide range of areas, from automated document analysis to financial transactions, our findings, which have been thoroughly documented and analytically tested, aim to advance the state of the art in MNIST digit recognition.

II. LITERATURE REVIEW

In the context of digit recognition, the first reviewed paper proposes a Random Forest Classifier (RFC) for digit classification. It focuses on raising the accuracy of results with different hyperparameters, evaluated with stratified k-fold cross-validation. The study covered three key aspects: selection of materials, implementation of algorithms, and parameter modification.

The random forest classifier was implemented in Python on a machine running Windows 11 Pro with a 12th Gen Intel Core i5-12600 processor and 16 GB of RAM. Testing was performed on a handwritten digit image dataset containing 1,797 images of 8x8 pixel grayscale digits. Two algorithms were employed: decision tree and random forest. Decision trees act as the building blocks of more complex classifiers, while a random forest uses the majority vote of many decision trees to make predictions and addresses overfitting more effectively than a single decision tree. The study by Ngan et al. (2023) used one fixed parameter, random_state, and three hyperparameters, n_estimators, max_depth and max_features, testing four different values of each to optimize the random forest classifier's performance in digit classification. random_state affects the reproducibility of the model, which is useful for comparing the effects of the other attributes. n_estimators affects the diversity and accuracy of the model: more trees can
capture more patterns in the data, but can also lead to overfitting if the value is set too high. max_depth affects the generalization of the model: a higher value brings more accuracy and a deeper decision tree, but also raises the risk of overfitting. max_features controls how many features are considered when building each decision tree; a lower value can reduce overfitting but also reduces accuracy. Stratified k-fold cross-validation, which divides the dataset into more balanced subsets, is used as the performance criterion to ensure a fair evaluation of the model and to address potential class imbalance.

Testing the three hyperparameters with different values showed the importance of hyperparameter adjustment; in particular, setting n_estimators to 125, max_depth to 'None' and max_features to 4 individually gave the highest accuracy. It is worth noting that these values were neither the highest nor the lowest tested, highlighting that simply using a higher or lower value does not guarantee improved accuracy. Interestingly, even when the values that achieved the highest accuracy individually were combined, they did not yield the highest overall accuracy, showing that each attribute has a unique impact on the model's performance. In summary, hyperparameter adjustment significantly enhances digit recognition accuracy with a Random Forest Classifier. This approach helps a model achieve good results without being too time-consuming or costly, making it suitable for newcomers to machine learning.
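The exact code from Ngan et al. (2023) is not reproduced here. The following Python sketch only illustrates the kind of sweep described above, assuming scikit-learn and an illustrative grid of n_estimators values; the fixed max_depth=None and max_features=4 settings match the values reported as best, while the random_state value is arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# The 1,797-image, 8x8-pixel digits dataset mentioned above ships with scikit-learn.
X, y = load_digits(return_X_y=True)

# Fixed random_state for reproducibility, stratified folds for balanced subsets.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for n_estimators in (25, 75, 125, 175):          # illustrative grid, not the paper's exact values
    clf = RandomForestClassifier(
        n_estimators=n_estimators,  # number of trees: more diversity, but overfitting risk if too high
        max_depth=None,             # deeper trees fit more detail but may overfit
        max_features=4,             # features considered per split; lower values reduce overfitting
        random_state=42,
    )
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"n_estimators={n_estimators}: mean accuracy={scores.mean():.4f}")
```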
In parallel, the review by Lead et al. (2021) focuses on finding the most accurate architecture for the task by investigating handwritten digit recognition with various pre-trained deep learning models. Five pre-trained models from PyTorch were applied in the study: GoogLeNet, MobileNet v2, ResNet-50, ResNeXt-50 and Wide ResNet-50, using the MNIST dataset.

After pre-processing the data, Lead et al. (2021) used the 70,000 28x28 pixel grayscale images, labelled 0 to 9, splitting them into 60,000 images for training and 10,000 for testing digit classification. During the training process, a CNN (ResNet-18) was used as the training architecture and algorithm: it processes input images, predicts the outcomes, and then learns to improve from feedback by comparing the result with the actual label. The evaluation used confusion matrices to analyze the top 9 loss images and found that the confusion patterns were quite similar across digits. Both accuracy and training time were considered when evaluating these models, and Wide ResNet-50 achieved the lowest error percentages, with a Top-1 error of 0.5278% and a Top-5 error of 0.0079%. Notably, MobileNet v2 also achieved a commendable Top-1 error rate of 0.5754% and a Top-5 error of 0.0079% in just 498 seconds. The transition to the CIFAR-10 dataset with the same configuration revealed that ResNeXt-50 achieved the lowest error rates, with a Top-1 error of 14.0460% and a Top-5 error of 0.5300%. Notably, MobileNet v2 showed its versatility by achieving a Top-1 error rate of 15.2780% and a Top-5 error rate of 0.5380% with a smaller model size and faster training.

Based on these outcomes, the paper emphasizes how neural network architectures for digit identification keep evolving. The objective of our study, to enhance the neural network for MNIST digit recognition, aligns with MobileNet v2's consistent performance across datasets, which demonstrated its potential for further investigation of digit recognition.

III. MATERIALS AND METHODS

A. Selection of Materials
1) Source code: The Python programming language, renowned for deep learning research, has been used for this study. Python has a large selection of tools and frameworks made particularly for developing neural networks and machine learning algorithms. The Pandas library handled data manipulation and was used to load and handle datasets, making it simpler to deal with the MNIST dataset and carry out data preprocessing, while NumPy enabled the numerical computations used for normalization. Keras, which has TensorFlow as its backend and benefits from TensorFlow's fast computation, allowed for the flexible development of neural network models to evaluate architectural alternatives for digit classification.
2) Machine: A computer running the Windows 11 (Version 22H2) operating system, equipped with an AMD Ryzen 7 Pro 3700U processor and 16 GB of installed RAM, was used to carry out this study.
3) Dataset: The MNIST dataset has been used in this study because it serves as a model for several image classification tasks, especially handwritten digit recognition. According to Daniel (2022), MNIST was created from an even bigger dataset, the NIST Special Database 19, which comprises handwritten uppercase and lowercase characters in addition to numbers. MNIST comprises 60,000 handwritten digits for training the machine learning model and 10,000 handwritten digits for model testing. Each digit in MNIST is stored as a 28x28 pixel grayscale image, so each sample has 784 features.

B. Selection of Methods
1) Data Loading: The training and testing datasets (train.csv and test.csv) are loaded using Pandas.
2) Data Preprocessing and Transformation: The label is separated from the feature data in the training dataset. 20% of the training data is randomly selected as a validation set to implement cross-validation and avoid overfitting; the remaining 80% of the data is used to train the neural network model.
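As a concrete illustration of steps 1) and 2), the sketch below shows one way the CSV files could be loaded and split using only Pandas and NumPy. The column name "label", the 0-255 pixel range, and the fixed random seed are assumptions based on the Kaggle Digit Recognizer layout; the paper itself does not show this code.

```python
import numpy as np
import pandas as pd

# 1) Data loading with Pandas.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# 2) Separate the label column from the 784 pixel features and normalize with NumPy
#    (assuming pixel values in the 0-255 range).
y = train["label"].to_numpy()
X = train.drop(columns=["label"]).to_numpy(dtype=np.float32) / 255.0
X_test = test.to_numpy(dtype=np.float32) / 255.0

# Randomly hold out 20% of the training data for validation; the remaining 80% trains the model.
rng = np.random.default_rng(seed=42)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, y_train = X[idx[:split]], y[idx[:split]]
X_val, y_val = X[idx[split:]], y[idx[split:]]
```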
3) Neural Network Architecture Implementation: The neural network's input layer includes 784 units (the pixels of a 28x28 image). For multi-class classification, the output layer has ten units, one for each digit 0-9.
4) Model Training: Several parameters are modified to train the Neural Network model and improve its performance. To reduce the chosen loss function, the internal model parameters, the weights and biases, must be adjusted repeatedly during the training phase. The model is fitted on the training data and evaluated against the validation data.
5) Model Evaluation: The accuracy, loss, validation loss, validation accuracy and time per epoch are observed.
Accuracy: The percentage of accurately predicted labels in the test dataset.
Loss: The difference between the predicted labels and the actual labels in the test dataset.
Validation Accuracy: The percentage of predicted labels that match the target labels in the validation dataset.
Validation Loss: The difference between the predicted labels and the target labels in the validation dataset.
Times per Epoch: The amount of time required to complete each epoch, in seconds.
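To make steps 3) to 5) concrete, the sketch below (continuing the data-loading sketch above) builds and trains such a network in Keras. The constants follow the values listed later in Fig. 7 (four hidden layers, ReLU hidden activations, SoftMax output, Adam, 0.1 learning rate, batch size 100); the hidden-layer width of 256 units and the sparse categorical cross-entropy loss are assumptions, since the paper does not spell them out.

```python
import time
from tensorflow import keras
from tensorflow.keras import layers

def build_model(dropout_rate: float = 0.1) -> keras.Model:
    """784-unit input, four ReLU hidden layers with Dropout, 10-unit SoftMax output."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(784,)))
    for _ in range(4):
        model.add(layers.Dense(256, activation="relu"))   # hidden width of 256 is an assumption
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.1),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

class EpochTimer(keras.callbacks.Callback):
    """Records the time taken per epoch, one of the observed quantities in step 5)."""
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()
    def on_epoch_end(self, epoch, logs=None):
        print(f"epoch {epoch + 1}: {time.time() - self._start:.1f}s "
              f"acc={logs['accuracy']:.4f} loss={logs['loss']:.4f} "
              f"val_acc={logs['val_accuracy']:.4f} val_loss={logs['val_loss']:.4f}")

model = build_model(dropout_rate=0.1)
history = model.fit(X_train, y_train,                     # arrays from the data-loading sketch above
                    validation_data=(X_val, y_val),
                    epochs=23, batch_size=100,
                    callbacks=[EpochTimer()])
```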
6) Prediction: Predictions are made on the test dataset using the model that performs best. The trained neural network is used to make predictions on the separate test dataset, and the predicted labels and image IDs are saved.
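A minimal sketch of this prediction step is shown below; the "ImageId"/"Label" column names and the output file name are assumptions following the Kaggle submission convention.

```python
import numpy as np
import pandas as pd

# Predict digit labels for the held-out test images with the best-performing model.
probs = model.predict(X_test)
pred_labels = np.argmax(probs, axis=1)

# Save image IDs together with the predicted labels.
submission = pd.DataFrame({"ImageId": np.arange(1, len(pred_labels) + 1),
                           "Label": pred_labels})
submission.to_csv("predictions.csv", index=False)
```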
C. Algorithm Implementation
Neural networks are systems of interlinked neurons that resemble the layers of the human brain. Computers can use them to build an adaptive framework that constantly learns from its errors.
1) Feedforward Neural Network: A Feedforward Neural Network is an Artificial Neural Network without looping nodes. Input data is fed into the network during the forward pass, and computations pass through the hidden layers to produce an output at the output nodes. The network's prediction or categorization is represented in this output. Feedforward neural networks are trained using supervised learning. The network performs several steps to compute the data. Firstly, each input is multiplied by its weight value, for example x1*w1 = 2*3 = 6; the weights establish the signal intensity of the neuron, so the weight value determines how much an input influences the result. Secondly, the bias value is added to the product from the prior step, for example 6+b1 = 6+1 = 7. Thirdly, the weighted sum over all inputs is calculated. Fourthly, the weighted sum is passed to an activation function, which converts it into the output (Vihar, 2022). Fig. 1 illustrates how a Feedforward Neural Network following these steps can learn to categorize the handwritten digits in the MNIST dataset.

Fig. 1. Recognizing Digits using Feedforward Neural Network (Rubentak, 2023).
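The tiny sketch below simply reproduces the arithmetic of the worked example above (x1 = 2, w1 = 3, b1 = 1); the ReLU activation in the final step is assumed purely for illustration.

```python
import numpy as np

# Values from the worked example above.
x1, w1, b1 = 2.0, 3.0, 1.0

product = x1 * w1              # step 1: input times weight -> 2 * 3 = 6
weighted_sum = product + b1    # steps 2-3: add the bias -> 6 + 1 = 7

# Step 4: pass the weighted sum through an activation function (ReLU assumed here).
output = np.maximum(0.0, weighted_sum)
print(product, weighted_sum, output)   # 6.0 7.0 7.0
```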
A feedforward neural network's mean square error cost function, shown in Fig. 2, is a smooth metric used to adjust weights and biases, allowing incremental improvements in performance with minimal effect on already-classified data points.

Fig. 2. Mean Square Error Cost Function (Turing, 2023).

Cross Entropy Loss, shown in Fig. 3 and Fig. 4, is used to compute the loss function in neural networks, which determines whether learning-process adjustments are necessary.

Fig. 3. Cross Entropy Loss (Turing, 2023).
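The formula images in Fig. 2, Fig. 3 and Fig. 5 are not reproduced in this text version. For reference, the standard forms of these quantities, for targets y, predictions ŷ, n samples and C classes, are:

```latex
% Mean squared error over n samples (Fig. 2) and its gradient w.r.t. a prediction (Fig. 5)
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
\frac{\partial\,\mathrm{MSE}}{\partial \hat{y}_i} = -\frac{2}{n}\left(y_i - \hat{y}_i\right)

% Cross-entropy loss for C classes with one-hot targets (Fig. 3 and Fig. 4)
\mathcal{L}_{\mathrm{CE}} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C} y_{i,c}\,\log \hat{y}_{i,c}
```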

2) Multilayer Perceptron: A multilayer perceptron is an Artificial Neural Network that comprises an input layer, an output layer, and multiple hidden layers with numerous neurons to learn more complex patterns. "Perceptron" refers to the capacity to see and comprehend images, which mimics human perception (Carolina, 2021). However, the single-neuron perceptron cannot analyze non-linear data; this issue was resolved with the introduction of the Multilayer Perceptron. Multilayer perceptron neurons can employ any activation function, as opposed to Perceptron neurons, which demand an activation function that imposes a threshold. The Multilayer Perceptron continually adjusts the network weights and lowers the cost function using backpropagation as its learning approach. After computing the weighted sums and applying them through all layers, the Multilayer Perceptron determines the gradient of the Mean Squared Error, as seen in Fig. 5, for every input and output pair in every cycle. The hidden-layer weights are then modified using this gradient result, which is propagated back toward the neural network's starting point.

Fig. 4. Cross Entropy Loss (Carolina, 2021).

Fig. 5. Gradient of Mean Squared Error (Carolina, 2021).

3) Optimizer Algorithm: Adam Optimizer. Adam is an approach for calculating adaptive learning rates that applies individual learning rates to different parameters. This is accomplished by estimating the gradient's first and second moments, which are then used to modify the learning rate for each weight.

Fig. 6. Formula to estimate the moments adjusted for bias (Vitaly, 2018).
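The formula image in Fig. 6 is likewise not reproduced here. The standard formulation of Adam's moment estimates, bias corrections and parameter update, for gradient g_t, decay rates beta_1 and beta_2, learning rate alpha and a small epsilon, is:

```latex
% First and second moment estimates of the gradient
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}

% Bias-corrected moments and the per-parameter update
\hat{m}_t = \frac{m_t}{1-\beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^{t}}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```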

D. Purpose
The main purpose of this study is to improve the Neural Network Model so that it accurately identifies handwritten digits. Several neural network models are trained with various modified parameters in the Neural Network algorithm to evaluate the effectiveness of the models and determine which parameter values are best for digit recognition.

E. Parameters
The following parameters are kept constant throughout the study to provide a fair evaluation of each hyperparameter, as listed in Fig. 7:

Parameter             Value
Activation Function   ReLU (Rectified Linear Unit) - hidden layers; SoftMax - output layer
Hidden layers         4
Learning rate         0.1
Batch size            100
Optimizer             Adam Optimizer

Fig. 7. Table of Constant Parameters.

The following parameters are examined and modified within this study.

1) Number of Training Epochs: The number of training epochs is the number of passes over the entire training dataset. It is important to strike a balance between enabling the model to converge to an accurate solution and avoiding underfitting or overfitting. Cross-validation and tracking of validation accuracy should be performed to determine the optimal number of training epochs.
2) Dropout Rate: Dropout is an approach applied to minimize overfitting during the training of a neural network. It drives the network to acquire more robust features instead of depending solely on a particular group of neurons. The dropout rate is the hyperparameter specifying the probability that a neuron is deactivated during training. It ranges between 0 and 1, with 0 denoting no dropout (every neuron stays active) and 1 denoting complete dropout (all neurons are deactivated).
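As a small illustration of how this hyperparameter is expressed in Keras (the framework named in Section III), a Dropout layer takes the rate directly; the layer width of 256 here is purely illustrative.

```python
from tensorflow.keras import layers

hidden = layers.Dense(256, activation="relu")   # illustrative hidden layer
drop = layers.Dropout(rate=0.1)                 # 0.1 => roughly 10% of units dropped per training step
```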
IV. RESULT AND DISCUSSION

A. Discussion on Implementation
Generalization describes a model's ability to adapt and respond appropriately to previously unobserved, new data. To avoid overfitting and underfitting, it is important to achieve the ideal balance between the complexity of the model and its adaptability (Evelyn, 2023). The project aimed to configure a model that strikes the right balance between overfitting and underfitting, so that the model can be trained to identify handwritten digits more accurately using the MNIST dataset. The original code is from Kaggle. Blocks of code that adjust the parameters were developed to experiment with one parameter while keeping the others constant, so that the effect of each specific parameter can be seen clearly. The use of Google Colab makes this process easier: the code can be modified by multiple people at the same time, and any changes are easily saved and run in real time.

Fig. 8. Code to modify parameters.

Fig. 8 is a sample of the code used to modify the two chosen parameters. The first highlighted block is where the dropout rate is changed, while the second highlighted block is where the number of epochs is changed. Although the dropout rate can be modified separately for each layer, the same dropout rate is applied to all layers in order to get more consistent results for comparison and optimization. The rest of the code contains the fixed parameters, such as the number of neurons in the input layer, hidden layers and output layer, as well as the activation function used for each layer, the learning rate, batch size, and optimizer. Listing these fixed parameters line by line gives a picture that defines everything clearly.
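Fig. 8 itself is a code screenshot that is not reproduced in this text version. The fragment below, reusing the build_model helper sketched in Section III, is only an approximation of what the figure describes: the two modified parameters are made explicit, while the Fig. 7 constants stay fixed.

```python
# The two parameters varied in this study; everything else stays as listed in Fig. 7.
DROPOUT_RATE = 0.1    # corresponds to the first highlighted block in Fig. 8
EPOCHS = 23           # corresponds to the second highlighted block in Fig. 8

model = build_model(DROPOUT_RATE)          # same dropout rate applied to every hidden layer
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=100,        # constant (Fig. 7)
                    epochs=EPOCHS)
```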
B. Results
To find the optimal solution, parameters such as the number of epochs and the dropout rate were experimented with in this project. The main results compared across the trained models are the accuracy and the validation accuracy after training is complete. Both results are used to ensure the data is sound and to obtain a more detailed analysis. The average time taken per epoch is roughly the same across models, showing a very consistent time that is likely attributable to the performance of Google Colab.

Number of Epochs

Fig. 9. Table of Results for Model trained with different number of epochs.

In training neural network models, the number of epochs is a crucial modified parameter. The experiments show that when trained for 23 epochs, the model achieves its best average accuracy of 98.77%. Fig. 9 shows that this outcome is better than the default value of 20 epochs and the other epoch counts, demonstrating that more training can improve the model's overall accuracy. Underfitting and overfitting principles apply here: the model does not learn complicated patterns in the data when the number of epochs is too low (for example, 14 or 17 epochs), while it is more likely to overfit if the number of epochs is too high (for example, 26 epochs). When a model learns the training data excessively or insufficiently, accuracy suffers, and the model begins to pick up unimportant features that cause it to perform poorly at recognizing digits.
Dropout Rate

Fig. 10. Table of Results for Model trained with different dropout rates.

Fig. 11. Line Chart (Final Accuracy vs Dropout Rate).

The dropout rate determines the likelihood of neurons becoming inactive during each training iteration. Fig. 10 and Fig. 11 show that higher accuracy and validation accuracy are produced by lower dropout rates, such as 0.1 and 0.2, than by greater dropout rates, such as 0.4 and 0.5. Based on average accuracy, the ideal dropout rate is 0.1, which results in an average accuracy of 98.80%, compared to the default rate of 0.3, which has an average accuracy of 98.25%. The lowest dropout rate of 0.1 also yields the maximum validation accuracy of 98.15%, again a significant improvement. Dropout deactivates neurons during training: lower dropout rates such as 0.1 and 0.2 keep more neurons active, allowing the model to retain and utilize a broader range of learned features, while higher dropout rates (0.4 and 0.5) deactivate a significant portion of neurons. The dropout rate must be balanced properly for a model to generalize. The best option is a dropout rate of 0.1, which avoids overfitting and enhances the network's capability to generalize to new data. In comparison to the default rate of 0.3, this rate results in a significant improvement in both average accuracy and validation accuracy.
Cross Tuning

After obtaining the best values for the number of epochs and the dropout rate separately, the best number of epochs is tested with multiple different dropout rates, and vice versa, to confirm that the chosen values for the two modified parameters match one another well. This is done because over- or underfitting problems can arise when the modified parameters are only considered individually, and cross tuning helps prevent them.
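A minimal sketch of such a cross-tuning loop is shown below, reusing build_model and the data arrays from the Section III sketches; the value grids mirror the ranges reported in Fig. 12 and Fig. 13, but the code itself is not from the paper.

```python
results = {}

# Hold the best epoch count (23) and sweep the dropout rate.
for dropout_rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    model = build_model(dropout_rate)
    hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                     epochs=23, batch_size=100, verbose=0)
    results[("dropout", dropout_rate)] = hist.history["val_accuracy"][-1]

# Hold the best dropout rate (0.1) and sweep the number of epochs.
for epochs in (14, 17, 20, 23, 26):
    model = build_model(0.1)
    hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                     epochs=epochs, batch_size=100, verbose=0)
    results[("epochs", epochs)] = hist.history["val_accuracy"][-1]

best = max(results, key=results.get)
print("best setting:", best, "val_accuracy:", results[best])
```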
23 epochs with a 0.1-0.5 dropout rate

Fig. 12. Table of Results for Model trained with different dropout rates with 23 epochs.

Fig. 12 shows a significant amount of fluctuation in average accuracy when the dropout rate is changed. This indicates that choosing the right dropout rate can greatly affect the effectiveness of the model and its results. Based on average accuracy, it is confirmed that a 0.1 dropout rate is optimal when paired with 23 epochs, showing that the greatest generalization performance is achieved with this configuration: it adapts adequately to the training dataset while avoiding overfitting. This optimal model is trained efficiently across a reasonable number of epochs, while avoiding a high dropout rate that might hinder learning.

0.1 dropout rate with 14-26 epochs

Fig. 13. Table of Results for Model trained with different number of epochs with 0.1 dropout rate.

Fig. 13 shows that the average accuracy is consistent across the 5 different training epoch counts. Accuracy and validation scores are high for all epoch counts, showing that the model is effective overall and indicating the robustness and efficiency of the model with a 0.1 dropout rate. However, 20 epochs and 26 epochs both achieve slightly higher accuracy, so 23 epochs is taken as the optimum solution because it lies between them. This strikes a balance that prevents the overfitting that could occur with more epochs, while ensuring the model has enough training iterations to achieve a good result.

From both results obtained from cross tuning, the second model shows that when a dropout rate of 0.1 is used, the model's performance is largely unaffected by the number of epochs. This indicates that the number of epochs does not affect the results of the model as significantly as the dropout rate, and it shows that the dropout rate is a more significant modified parameter than the number of epochs in terms of impact on the performance and accuracy of the model. The capacity of the model is directly affected by the dropout rate. With a greater dropout rate, which introduces more randomness and regularization, the model may be unable to retain the training data (and excessive training data can also raise the risk of overfitting). On the other hand, a model can retain more data and complexity when the dropout rate is lower.

All in all, overfitting or underfitting can be successfully regulated by the dropout rate, irrespective of the number of epochs.

Both tables confirm that a 0.1 dropout rate is the most optimal solution when paired with 23 epochs. With this in consideration, model K is chosen as it has the highest average accuracy. The Digit Recognizer will be able to recognize handwritten digits more accurately, and possibly faster too.

V. CONCLUSION

In this paper, it is demonstrated how the performance of neural network models for digit recognition on the MNIST dataset can be improved by utilizing a comprehensive combination of methods and methodologies. From the experimental results, it is found that model K, which uses the Adam Optimizer, 4 hidden layers with Dropout layers, a 0.1 learning rate, and 23 epochs, achieves an average accuracy of 0.98715; compared with the other results, this is the highest accuracy. The average time taken per epoch for model K is 3.3 seconds, which is very short, indicating a model that can be trained very quickly and has high performance. This is especially crucial, as performance speed is highly valued in this era. The result of model K shows that the performance of neural network models in recognizing MNIST digits has been improved by changing the parameters of the model. Finally, it is possible to implement training with different optimizers, such as Adamax and SMORMS3, to further improve the accuracy of digit recognition (Amananandrai, 2023).
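As a pointer for that follow-up, swapping the optimizer in the Keras sketches above is a one-line change; Adamax ships with Keras, while SMORMS3 would require a third-party or custom implementation.

```python
from tensorflow import keras

# Recompile the same model with Adamax instead of Adam (learning rate kept at 0.1 for comparison).
model.compile(optimizer=keras.optimizers.Adamax(learning_rate=0.1),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```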
REFERENCES

Ngan, J.F., Keong, Y.Q., Raymond, J.M.C., Wong, K.W., & Gan, J.X. (2023). Digital Classification using Random Forest Classifier. Journal of Applied Technology and Innovation, 7(3), 1-6. http://jati.sites.apiit.edu.my/files/2023/07/Volume7_Issue3_Paper11_2023.pdf

Lead, M.S., Brennan, B.C.C., Gwo, Y.T., & Hui, T.C. (2021). MNIST handwritten digit recognition with different CNN architectures. Journal of Applied Technology and Innovation, 5(1), 1-4. https://dif7uuh3zqcps.cloudfront.net/wp-content/uploads/sites/11/2021/01/17192613/MNIST-Handwritten-Digit-Recognition-with-Different-CNN-Architectures.pdf

Ng, B.L. (2017). MNIST Dataset: Digit Recognizer. https://www.kaggle.com/code/ngbolin/mnist-dataset-digit-recognizer/notebook

Daniel, E. (2022). MNIST — Dataset of Handwritten Digits. https://medium.com/mlearning-ai/mnist-dataset-of-handwritten-digits-f8cf28edafe

Carolina, B. (2021). Multilayer Perceptron Explained with a Real-Life Example and Python Code: Sentiment Analysis. https://towardsdatascience.com/multilayer-perceptron-explained-with-a-real-life-example-and-python-code-sentiment-analysis-cb408ee93141

Turing. (2023). Mathematical Formulation of Feed-Forward Neural Network. https://www.turing.com/kb/mathematical-formulation-of-feed-forward-neural-network

Vitaly, B. (2018). Adam — latest trends in deep learning optimization. https://towardsdatascience.com/adam-latest-trends-in-deep-learning-optimization-6be9a291375c

Vihar. (2023). Feedforward Neural Networks: A Quick Primer for Deep Learning. https://builtin.com/data-science/feedforward-neural-network-intro

Evelyn, M. (2023). What Is Generalization In Machine Learning? https://magnimindacademy.com/blog/what-is-generalization-in-machine-learning/

Amananandrai. (2023). 10 famous Machine Learning Optimizers. https://dev.to/amananandrai/10-famous-machine-learning-optimizers-1e22

Rubentak. (2023). Understanding Feed Forward Neural Networks with MNIST Dataset. https://magnimindacademy.com/blog/what-is-