
Journal of Applied Technology and Innovation (e-ISSN: 2600-7304), vol. 5, no. 1, 2021

MNIST handwritten digit recognition with different CNN architectures

Lead Ming Seng, Brennan Bang Chen Chiang, Zailan Arabee Abdul Salam, Gwo Yih Tan, Hui Tong Chai
School of Computing, Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia
[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—Handwritten digit recognition has long been a popular research topic in computer vision and pattern recognition. Recognizing handwritten digits used to be challenging, but thanks to modern machine learning techniques it is no longer a major obstacle. In this research, we looked into the MNIST database using fast.ai and trained the CNN ResNet-18 model to recognize handwritten digits. We then modified the architecture with different pre-trained models. For this work, we implemented five of PyTorch's pre-trained models: GoogLeNet, MobileNet v2, ResNet-50, ResNeXt-50 and Wide ResNet-50. The purpose of this paper is to reveal the most accurate architecture for handwritten digit recognition. We also provide comparisons of training time, top-1 error, top-5 error and model size for all five models.

Keywords—Convolutional Neural Networks (CNN), CNN Architectures, Image Classification, Handwritten Digit Recognition

I. INTRODUCTION

Handwritten recognition is the ability of machines to recognize input handwritten by humans. The variety of handwriting styles, spacing variations and handwriting inconsistencies all make it a much more challenging task for the machine. Nevertheless, machine learning models have evolved significantly in recent years and are still improving. Many state-of-the-art models are able to achieve high performance and very high accuracy. With this success, the technology is now used in many applications, such as reading postal addresses, bank cheque processing and form data entry.

Convolutional Neural Networks (CNNs) are widely and conveniently used for these image recognition and classification tasks. A CNN is a special type of neural network capable of taking in an input image, assigning importance to its various aspects, and distinguishing one image from another. Recent research has applied convolutional neural networks to facial recognition, document analysis, speech detection and license plate recognition [1].

Different models are built and trained using the convolution operation, but achieving high accuracy depends on many factors, such as the dataset used or the network architecture. Our experiment set out to see how different architectures affect the accuracy of handwritten digit recognition on the same dataset, namely MNIST. MNIST is a large database of handwritten digits that contains 70,000 grayscale images, each of 28×28 pixels. Altogether there are 10 classes, representing the numbers 0 to 9. The digit images are size-normalized and centred, which makes the database an excellent benchmark for evaluation. The train-test split differs for this project: 42,000 images are used for training and 28,000 images for testing.

We first implemented the ResNet-18 architecture and trained the model on the training dataset. After obtaining the results, we modified the architecture using predefined architectures from PyTorch. The models implemented are GoogLeNet, MobileNet v2, ResNet-50, ResNeXt-50 and Wide ResNet-50. All the models are trained on the MNIST and CIFAR-10 datasets to see which is the most accurate of all.

II. LITERATURE REVIEW

A. Similar projects

Much research has been conducted on handwritten digit classification with different algorithms and classifiers. For convolutional neural networks, many models are available for training to achieve better results. Some of these models include ResNeXt-50, ResNet-50 and GoogLeNet.
Xie et al. [2] presented a simpler neural network, called ResNeXt, that focuses on aggregating transformations of the same topology and is based on VGG and ResNet. Cardinality, the size of the set of transformations, is increased and compared experimentally against depth and width. ResNeXt-50 and ResNeXt-101, compared with the ResNet-50 and ResNet-101 models, successfully reduced error rates by 3.2% and 2.3% respectively. This shows that complex models that are deeper and wider are not always better, as they may take more time yet return similar results. The problem with ResNet is diminishing feature reuse: nothing forces the signal through the residual block, so a block can avoid learning, and most blocks then contribute little or nothing to the final goal. Besides that, when comparing width to depth, the complexity of width is higher than that of depth, so ResNet is made as thin as possible to increase depth while keeping fewer parameters.

Zagoruyko and Komodakis [3] proposed decreasing the depth and increasing the width of residual networks. The parameters are tested to determine how deep and how wide a ResNet should be in order to optimize it. Wide ResNet-40-4, which has fewer parameters, achieves a lower error rate than the 1001-layer pre-activation ResNet. Wide ResNet-16-8 and Wide ResNet-28-10 achieve even lower error rates while being shallower and wider than Wide ResNet-40-4. Training the shallower networks is faster because GPUs perform computations in parallel. With dropout, the models also obtain a consistent gain and reduced overfitting.

Basri et al. [4] observed and compared the performance of different networks; the four models discussed in their paper are AlexNet, MobileNet, GoogLeNet and CapsuleNet. On the original data under the same conditions, the error rates of the models are GoogLeNet (Inception V3) 7%, AlexNet 8%, CapsuleNet 8.7% and MobileNet 20.3%. After adding the augmented dataset to the training process, the error rates were reduced to AlexNet 0.99%, GoogLeNet 1.49%, CapsuleNet 7.76% and MobileNet 16.42%. For computation time, AlexNet was the fastest at 1.14 s, followed by CapsuleNet at 3.86 s, MobileNet at 12.52 s and GoogLeNet at 22.53 s. This implies that the structure of a model leads to different performance and computation time, and it also emphasizes the importance of augmented datasets.

B. Methodology / Approach

• Dataset

The dataset used in this paper is the MNIST database of handwritten digits. The dataset contains a total of 70,000 grayscale images, each 28×28 pixels in size. Altogether there are 10 different classes, depicting the numbers 0 to 9. Normally the dataset is split into 60,000 training images and 10,000 test images. The training dataset, which includes the labels, teaches the model what every digit looks like. The test dataset is then used to evaluate the model by feeding it only the images, so that it predicts data it has never seen before.

• CNN

Convolutional neural networks combine artificial neural networks with recent methods of deep learning. They have been used for years in image recognition tasks such as the handwritten digit recognition addressed in this paper. CNNs are considered the first robust deep learning approach to succeed with multilayer hierarchical network structures. CNNs reduce the number of trainable network parameters, mitigating the back-propagation deficiency of plain feed-forward networks. They are particularly suitable for image processing and understanding because of the close link and spatial structure between their levels, and they can extract rich correlative characteristics from images [1].

• Fast.Ai

Fast.Ai, a deep learning library, provides high-level components that can deliver state-of-the-art results quickly in standard deep learning domains, as well as low-level components that can be mixed and matched to build new approaches. Fast.Ai leverages the dynamism of the Python language and the flexibility of the PyTorch library. It offers a new type of dispatch system for Python with a semantic type hierarchy for tensors, a GPU-optimized computer vision library, a new data block API, and more [5]. Overall, Fast.Ai is easily approachable and rapidly productive.

C. Conclusion / Recommendation

In a nutshell, CNNs are widely used for image classification problems. With the ever-growing advancement of technology and the complexity of datasets, more new and efficient CNN architectures are being developed. Spoiled by the abundance of CNN architectures to choose from, we need to pick the right architecture for the right problem. Therefore, comparisons of different architectures on datasets are carried out to evaluate which one suits a problem best.

III. ALGORITHM IMPLEMENTATION

A. Data

The database utilized in this project is the MNIST database. It is separated into a training dataset and a testing dataset, saved in train.csv and test.csv respectively. Both datasets contain grayscale images of the digits zero to nine. The images in the training dataset are labelled according to their classes, while the images in the testing dataset are not labelled.
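As a concrete illustration of this layout, the sketch below loads the two CSV files with pandas and reshapes each pixel row back into a 28×28 array. It assumes the Kaggle-style format in which train.csv carries a label column followed by 784 pixel columns and test.csv carries only the pixel columns; the variable names are illustrative, not the authors' code.

```python
import numpy as np
import pandas as pd

# train.csv: "label" column + 784 pixel columns; test.csv: pixel columns only.
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

# Separate the labels and reshape each 784-value row into a 28x28 image.
train_labels = train_df["label"].to_numpy()
train_images = (train_df.drop(columns=["label"])
                .to_numpy(dtype=np.uint8)
                .reshape(-1, 28, 28))
test_images = test_df.to_numpy(dtype=np.uint8).reshape(-1, 28, 28)

print(train_images.shape, train_labels.shape, test_images.shape)
# Expected: (42000, 28, 28) (42000,) (28000, 28, 28)
```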
B. Preparation

Fig. 1. Importing libraries

Fig. 2. Data transformation

Before starting the session, some preparation is required to make the process go smoothly. First of all, the programmer sets up the environment by importing the required libraries for later use of their functions (Fig. 1). Because fast.ai only accepts image input, the data is stored in variables and reshaped to the desired image dimensions of 28×28. Before being loaded into the data bunch, the data is transformed and normalized with the aid of the mnist_stats statistics (Fig. 2). In the transformation, the flip action is disabled to prevent confusion between digits.
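A minimal sketch of the preparation step described above (cf. Fig. 1 and Fig. 2), written against the fastai v1-style API that the text names (data bunch, mnist_stats). It assumes the reshaped digits have already been written out as image files into per-class folders under a hypothetical mnist_png directory; the exact loading route in the original notebook may differ.

```python
from fastai.vision import *  # fastai v1-style star import (the library's own idiom)

# Restrict flipping: a mirrored digit (e.g. 2 vs 5, 6 vs 9) would only
# confuse the model, so do_flip is turned off.
tfms = get_transforms(do_flip=False)

# Build the data bunch from per-class image folders
# ("mnist_png" is a hypothetical path: mnist_png/train/0 ... /9),
# resize to 28x28 and normalize with the MNIST statistics.
data = (ImageDataBunch
        .from_folder("mnist_png", train="train", valid_pct=0.2,
                     ds_tfms=tfms, size=28)
        .normalize(mnist_stats))

print(data.classes)  # ['0', '1', ..., '9']
```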

C. Training

The implemented algorithm and architecture is the CNN ResNet-18. The image data is the input and the model outputs the prediction result. The predicted result is then verified against the actual result, and the error is fed back to the model for improvement; this process is called learning.

Fig. 3. Setting up the configurations of the architecture

The learner function cnn_learner is set up according to the environment, and the function fit_one_cycle is used to train it. It reports the condition of the training, including loss, accuracy and time consumed.
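The training step can be sketched in two fastai v1-style calls, matching the cnn_learner and fit_one_cycle functions named above (cf. Fig. 3). The epoch count here is illustrative; Section IV reports 10 epochs for the comparison runs.

```python
from fastai.vision import *  # fastai v1-style star import

# Learner with a pre-trained ResNet-18 backbone; each epoch of one-cycle
# training reports the loss, the accuracy metric and the elapsed time.
learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit_one_cycle(10)
```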
D. Evaluation

Fig. 4. Visualizing top losses

A ClassificationInterpretation object is created for evaluation, and the images with the top 9 losses are plotted. A confusion matrix is one of the most suitable ways to present the results because it is easy to understand. From the figure and the confusion matrix, it is clear that confusion between digits is usually caused by digits with similar patterns.

Fig. 5. Top-9 top losses

Fig. 6. Confusion matrix
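A sketch of the evaluation step, assuming the fastai v1 interpretation API: the learner from the training step is wrapped in a ClassificationInterpretation object, the nine highest-loss images are plotted (cf. Fig. 5) and the confusion matrix is drawn (cf. Fig. 6).

```python
from fastai.vision import *  # fastai v1-style star import

# Inspect where the trained learner goes wrong.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(7, 7))  # nine highest-loss digits (cf. Fig. 5)
interp.plot_confusion_matrix()             # digit-vs-digit confusions (cf. Fig. 6)
```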
E. Prediction

Fig. 7. Evaluating performance

The testing data is fed to the trained model to evaluate its performance. The model rates the probability of each possible class, and the class with the highest probability is the result for that image. It was found that with ResNet-18, the accuracy on the MNIST dataset averages around 96% with a training time of 874 seconds. The original code is from Kaggle [6].
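The prediction step described above amounts to taking the arg-max over the per-class probabilities. A sketch under the fastai v1 API, assuming the unlabelled test images were attached to the data bunch (e.g. via add_test) so that the Test dataset type is available:

```python
import torch
from fastai.vision import *  # fastai v1-style star import (provides DatasetType)

# Class probabilities for every test image over the 10 digit classes;
# assumes the unlabelled test images were attached to the data bunch.
preds, _ = learn.get_preds(ds_type=DatasetType.Test)

# The predicted digit is the class with the highest probability.
predicted_digits = torch.argmax(preds, dim=1)
print(predicted_digits[:10])
```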

IV. RESULT AND DISCUSSION

A. Discussion on implementation

The aim is to propose a more accurate and faster architecture for solving the MNIST handwritten digit image classification problem. We trained different architectures on the same dataset to compare and evaluate the time and accuracy. Using Fast.Ai's default settings and some of the more popular PyTorch pre-trained models, we train with a one-cycle policy for 10 epochs on each architecture. The pre-trained models that we used are GoogLeNet, MobileNet v2, ResNet-50, ResNeXt-50 and Wide ResNet-50.
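One way to reproduce this comparison, sketched below, is to wrap a torchvision pre-trained model in a fastai Learner and reuse the same one-cycle schedule; this is an illustration with MobileNet v2, not the authors' exact code. The classifier head is replaced so the network outputs 10 classes, and top-1 error is tracked with error_rate, while top-5 error can be derived from top_k_accuracy.

```python
from functools import partial

import torch.nn as nn
import torchvision.models as tvm
from fastai.vision import *  # fastai v1-style star import

# ImageNet-pre-trained MobileNet v2 with the final linear layer replaced
# so that it predicts the 10 digit classes.
model = tvm.mobilenet_v2(pretrained=True)
model.classifier[1] = nn.Linear(model.last_channel, 10)

# error_rate tracks the top-1 error; top-5 error is 1 minus top-5 accuracy.
top5_acc = partial(top_k_accuracy, k=5)
learn = Learner(data, model, metrics=[error_rate, top5_acc])
learn.fit_one_cycle(10)
```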

B. Experiments on MNIST Dataset

TABLE I. COMPARISON OF DIFFERENT ARCHITECTURES ON THE MNIST DATASET. TOP-1 AND TOP-5 ERROR ARE AVERAGED OVER 3 RUNS.

Model            Top-1 error (%)   Top-5 error (%)   Training Time (s)   Model Size (MB)
GoogLeNet        0.5317            0.0397            512                 49.7
MobileNet v2     0.5754            0.0079            498                 13.6
ResNet-50        0.6190            0.0159            510                 97.8
ResNeXt-50       0.5794            0.0119            549                 95.8
Wide ResNet-50   0.5278            0.0079            540                 132.0

Table I shows that the performance of the models on the MNIST dataset appears to saturate. We argue that this is because of the low complexity of the dataset, which is regarded as one of the simplest and is mostly used as a baseline for image recognition. Nevertheless, it is still possible to see which model is best suited for a smaller and simpler dataset such as MNIST.

Fig. 8. Bubble chart for the MNIST dataset comparing the Top-1 error, training time and the size of the model. The model size is represented by the size of the bubble.

The model with the lowest Top-1 error is Wide ResNet-50 at 0.5278%, with a Top-5 error of 0.0079%. We also note that MobileNet v2 achieved the third-best Top-1 error at 0.5754% and, alongside Wide ResNet-50, the best Top-5 error at 0.0079%, despite being roughly 10 times smaller than Wide ResNet-50 at 13.6 MB. It is also noted that MobileNet v2 has the fastest training time among the models at 498 seconds.
C. Experiment on the CIFAR-10 Dataset

Due to the saturated results from the MNIST experiment, we conducted further experiments on a more complex dataset, the CIFAR-10 dataset, using the same configuration as the previous experiment.
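For reference, a hedged sketch of how the same fastai v1 configuration could be pointed at CIFAR-10, using the library's built-in download of the dataset; the folder layout assumed here (train/test with one sub-folder per class) is what fastai's copy of CIFAR-10 provides.

```python
from fastai.vision import *  # fastai v1-style star import

# Download fastai's copy of CIFAR-10 and build a data bunch with the same
# kind of configuration as the MNIST experiment (no flipping, normalized).
path = untar_data(URLs.CIFAR)
cifar_data = (ImageDataBunch
              .from_folder(path, train="train", valid="test",
                           ds_tfms=get_transforms(do_flip=False), size=32)
              .normalize(cifar_stats))

# The learner construction and one-cycle training from the previous
# sections can then be reused unchanged on cifar_data.
```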
TABLE II. COMPARISON OF DIFFERENT ARCHITECTURES ON THE CIFAR-10 DATASET. TOP-1 AND TOP-5 ERROR ARE AVERAGED OVER 5 RUNS.

Model            Top-1 error (%)   Top-5 error (%)   Training Time (s)
GoogLeNet        15.4500           0.7800            843
MobileNet v2     15.2780           0.5380            826
ResNet-50        19.0580           0.9440            850
ResNeXt-50       14.0460           0.5300            901
Wide ResNet-50   20.3620           1.0720            952

From Table II, we found that the best-performing model is ResNeXt-50, with a Top-1 error of 14.0460% and a Top-5 error of 0.5300%. The model that performed exceptionally well is MobileNet v2, with a Top-1 error of 15.2780% and a Top-5 error of 0.5380%, while being 7 times smaller in size and 75 seconds faster in training than ResNeXt-50, and slightly outperforming GoogLeNet. It is also noted that MobileNet v2 has one of the fastest training times among the models.

Fig. 9. Bubble chart for the CIFAR-10 dataset comparing the Top-1 error, training time and the size of the model. The model size is represented by the size of the bubble.

V. CONCLUSION

The task of image recognition is still growing and developing as researchers build progressively more sophisticated neural networks. After conducting this research, it became clear that the performance of a model differs depending on the task being performed. Accuracy and error rate, while important factors in determining the suitability of a model for a task, are not the only factors we look at. Training time plays an important role alongside accuracy and error rate: as the complexity and size of a dataset grow, training time becomes more crucial. The decision on picking the right model for the task at hand is best determined by these three factors, and we have thus concluded that, based on our research, MobileNet v2 is the best among these five models for the MNIST dataset problem.

REFERENCES

[1] X. Han and Y. Li, "The Application of Convolution Neural Networks in Handwritten Numeral Recognition," International Journal of Database Theory and Application, vol. 8, no. 3, pp. 367-376, 2015.
[2] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proc. IEEE CVPR, 2017.
[3] S. Zagoruyko and N. Komodakis, "Wide residual networks," arXiv preprint arXiv:1605.07146, 2016.
[4] R. Basri and M. Akter, "Bangla Handwritten Digit Recognition Using Deep Convolutional Neural Network," in Proceedings of the International Conference on Computing Advancements, 2020. Available: https://doi.org/10.1145/3377049.3377077
[5] fast.ai, "Welcome to fastai," 2020. [Online]. Available: https://docs.fast.ai/ [Accessed 31 August 2020].
[6] kaggle.com, "Beginners guide to MNIST with fast.ai," n.d. [Online]. Available: https://www.kaggle.com/christianwallenwein/beginners-guide-to-mnist-with-fast-ai
