Automated Digitization of Student's Marks From The Answer Book
https://doi.org/10.1007/s42979-024-02693-9
ORIGINAL RESEARCH
Abstract
Preparing a digital marksheet from images of student answer-books is a promising application in academic institutions. Automatically segmenting the assigned marks from answer-book images is extremely challenging, and the segmented digits also require pre-processing before the recognition stage. In addition, recognizing handwritten digits is difficult because writing styles differ. Existing research reports the superior performance of deep learning-based models in handwritten digit recognition (HDR) on popular datasets. However, their application to real-time data in an experimental setup needs more attention. This paper presents an experimental setup that uses student answer-book images to record students' marks digitally. We propose a lightweight convolutional neural network (CNN) model for HDR. We also introduce a contour-based segmentation process for automatically extracting student details from answer-book images. The results show state-of-the-art performance of the proposed CNN model on real-time images. Further, adding a pre-processing step before recognition significantly improves the accuracy of the HDR experimental setup.
Keywords Convolutional neural network (CNN) · Handwritten digit recognition (HDR) · Handwritten digit segmentation ·
Image classification
SN Computer Science
350 Page 2 of 9 SN Computer Science (2024) 5:350
Existing work in the literature assumes pre-processed digit images that are ready to feed into a CNN model. However, real-time images have blur, shadows, and illumination issues, and segmenting digit images in such cases is challenging. Further, handwriting can vary in size, shape, background, and slant. Even individual letters can vary from person to person, depending on their writing styles. Additionally, the presence of noise, such as smudging or non-standardized paper, can complicate the task of character recognition. To address these challenges, we propose an experimental setup that generates a digital record of student marks through automatic segmentation and recognition.

The key contributions of this paper are listed as follows:

1. A lightweight and accurate CNN model is proposed for handwritten digit recognition. With few trainable parameters, the proposed model achieves state-of-the-art accuracy.
2. The inclusion of a pre-processing stage enhances the accuracy of handwritten digit recognition. The pre-processing stage ensures that real-time images are consistent with the MNIST dataset [17].
3. A contour-based segmentation technique automatically detects the desired region of interest from the student's answer-book.
4. An experimental setup automatically captures student information and generates a digital record.

[1] proposed a two-layer NN and compared its performance for two widely used loss functions, mean square error (MSE) loss and cross-entropy loss.

Guo et al. [22] integrated a convolutional neural network with a hidden Markov model (CNN-HMM) for recognizing house numbers from street view images. They showed that CNN features are more powerful than hand-crafted features. Ghosh et al. [23] compared three neural network approaches, deep neural network (DNN), deep belief network (DBN), and convolutional neural network (CNN), for handwritten digit recognition on the MNIST dataset. Shima et al. [24] used the AlexNet-CNN model to extract feature maps and then classified them with an SVM model. James et al. [25] proposed pairing a CNN with an XGBoost predictor to improve accuracy on the NIST dataset, which consists of 810,000 isolated characters covering lowercase letters, uppercase letters, and digits in English. Son et al. [26] used the VGG-CNN model for number recognition when reading the numbers from a gas meter.

Neto et al. [27] presented a new Gated-CNN-BGRU architecture for handwritten digit string recognition (HDSR) systems. They demonstrated the robustness of the model by achieving an average precision of 96.50%. Ahlawat et al. [28] improved handwritten digit recognition accuracy using a pure CNN architecture without an ensemble. Mukhoti et al. [29] used the LeNet-CNN architecture for handwritten digit classification in Bangla and Hindi.
bounding box. Next, we extract the roll-number contours whose y-coordinate is within $y_l$ and whose contour height $h$ is within the range of $h_l$ to $h_u$. Similarly, we extract the marks contours whose y-coordinate is higher than $y_h$ and whose contour height $h$ is within the range of $h_l$ to $h_u$. As a result of the segmentation process, we obtain the region of interest (ROI) as sets of individual digit images of the roll number and marks, denoted Roll_no and Marks. A summary of our proposed segmentation process is given in Algorithm 1.

Algorithm 1 Proposed Segmentation Algorithm

Pre-processing

The segmentation process yields the sets of digit images Roll_no and Marks for the students' roll numbers and marks, respectively. To enhance accuracy, we employ a pre-processing stage that converts the handwritten digit images so that they closely match the MNIST dataset. Since the collected images have a white background and black-colored digits, we invert these images to match the MNIST dataset. The generated image is then converted to binary by Otsu's threshold. Next, we use the erosion operator to thin the digit image boundaries. Let an image in the sets Roll_no and Marks, after inversion and thresholding, be denoted by $I$. Using a structuring element $S$, we get the eroded image $I_E$ as

$$I_E = I \ominus S = \{\, z \mid (S)_z \subseteq I \,\} \tag{1}$$

This makes all digit images consistent with the MNIST dataset. The aforesaid pre-processing steps are illustrated in Fig. 1.

Handwritten Digit Recognition

In this phase, the individual digits segmented from the students' answer-books, comprising their roll numbers and marks, are passed through a pre-trained CNN model for recognition. The proposed CNN model includes three convolutional layers and three fully connected layers. Each
convolutional layer consists of convolutional units followed by batch normalization and a max-pooling operation.

At any instance, each CNN layer accepts an output feature $x_i^{(L-1)}$ from the previous layer $(L-1)$ and computes $N^{(L)}$ feature maps $z_j^{(L)}$ using kernels $w_j^{(L)}$ as

$$z_j^{(L)} = x_i^{(L-1)} * w_j^{(L)}, \quad 1 \le j \le N^{(L)} \tag{2}$$

Here, $*$ refers to the convolution operation between $x_i^{(L-1)}$ and $w_j^{(L)}$. These feature maps are further passed through ReLU activation to discard negative values from the features,

$$x_j^{(L)} = \max\left(0, z_j^{(L)}\right) \tag{3}$$

As a result, for each previous-layer feature $x_i^{(L-1)}$, the CNN layer computes $N^{(L)}$ features, each denoted by $x_j^{(L)}$. Thus, we receive $N^{(L-1)} \times N^{(L)}$ feature maps at the execution of the $L$th CNN layer. Next, batch normalization [31] normalizes each training mini-batch. Toward the end, a max-pooling layer abstracts the information in the feature maps and reduces the feature dimensions for the next stage. A dropout layer and a flatten layer sit between the last convolutional layer and the fully connected layers. The proposed lightweight CNN architecture is shown in Fig. 2.

As shown in Fig. 2, the first convolutional layer has 64 kernels, each of size 5 × 5, with ReLU activation. The second convolutional layer has 32 kernels, each of size 5 × 5, with ReLU activation. The third convolutional layer has 16 kernels, each of size 3 × 3, with ReLU activation. Before flattening, a dropout layer is introduced to prevent overfitting of the CNN model. This improves the generalization ability of the CNN when predicting unseen data. After flattening, we deploy three fully connected layers (FCNs) to map the convolutional layer features to the final output layer for predicting class probabilities. These FCNs have 128, 50, and 10 neurons, respectively. Further, the first two FCNs have ReLU activation functions, and the final FCN has a softmax activation function. An ADAM optimizer is utilized during the training phase to minimize the cross-entropy loss ($L_{CE}$),

$$L_{CE} = -\frac{1}{N} \sum_{i=1}^{N} y_i \log(\hat{y}_i) \tag{4}$$

Here, $y_i$ and $\hat{y}_i$ are the true labels and predicted probabilities, respectively, for each of the $N$ classes.

Experimental Setup

The proposed HDR system includes three key stages: segmentation, pre-processing, and recognition. This three-stage process, including sub-processes, is shown in Fig. 3.

As shown in Fig. 3, individual digit images are segmented through the selection of extracted contours that map to the roll number and marks. These segmented digit images are pre-processed before CNN-based HDR for better
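
The pre-processing applied to each segmented digit (inversion, Otsu thresholding, and the erosion of Eq. 1) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the 3 × 3 square structuring element is an assumption (the text does not give the size of S), and resizing to the 28 × 28 MNIST format is left out.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold (0-255) of a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; set to 0 where one class is empty.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

def erode(binary, size=3):
    """Binary erosion: keep z only if the size x size window at z lies inside I."""
    pad = size // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=False)
    rows, cols = binary.shape
    out = np.ones((rows, cols), dtype=bool)
    for dy in range(size):
        for dx in range(size):
            out &= padded[dy:dy + rows, dx:dx + cols]
    return out

def preprocess(gray):
    """Invert (dark ink becomes bright), binarize with Otsu, then erode."""
    inverted = 255 - gray
    binary = inverted > otsu_threshold(inverted)
    return erode(binary)

# Synthetic example: a dark stroke, 8 rows by 5 columns, on a white background.
gray = np.full((12, 12), 255, dtype=np.uint8)
gray[2:10, 4:9] = 0
digit = preprocess(gray)
print(digit.sum(), "foreground pixels remain after erosion")
```

Erosion removes one pixel from every side of the stroke, which is the thinning effect the pre-processing stage relies on. In practice, a library such as OpenCV offers equivalent operations (`cv2.threshold` with the `cv2.THRESH_OTSU` flag, and `cv2.erode`).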
Dataset

The proposed model utilizes the MNIST dataset, which contains 60,000 training and 10,000 testing images. These images cover the digits 0–9 with variations such as rotation, illumination, and scale for diversity. A few sample digit images from the dataset are shown in Fig. 5.

Our experimental setup captures real-time images of the student answer-book for preparing the digital record. For that purpose, we created a custom dataset of 441 digit images extracted from real-time images for tuning our model. A few samples of such answer-book images are shown in Fig. 6.

Evaluation of Proposed CNN Model

Compared to state-of-the-art models, we designed a simple and effective CNN model for HDR. Our CNN model was trained on a Jetson Nano device using the 60,000 training images of the MNIST dataset. The model was trained for 100 epochs with a learning rate of 0.001. The network parameters are optimized with Adam [32], and the loss function is cross-entropy. With this configuration, the proposed CNN model has 67,104 parameters in total, of which 66,880 are trainable. The model accuracy and MSE over 100 epochs are shown in Fig. 7a and b, respectively.

As shown in Fig. 7a and b, the proposed lightweight CNN model achieves an accuracy of 0.9937 and an MSE of $1.417 \times 10^{-4}$.

Model Performance over Real-Time Images

Existing HDR methods report performance on the MNIST dataset. However, their performance under real-time conditions has yet to be evaluated. We evaluated our model on real-time images of the custom dataset with four different approaches. Approach 1 is trained and validated on the MNIST dataset. Approach 2 is trained and validated on our own custom dataset, which is smaller. Approach 3 is trained and validated on the merged MNIST and custom datasets. Approach 4 is the same as Approach 3, but the custom dataset images are eroded before training and validation. Our dataset includes 441 images of handwritten digits segmented from students' answer-books. To analyze the performance of the proposed model on real-time images, we computed the training and validation cross-entropy loss for all the approaches, as shown in Fig. 8.

As shown in Fig. 8, Approach 1 results in a stable model with few variations. Approach 2 has instability in training that results in lower accuracy. Approaches 3 and 4 show stable model behavior with higher accuracy than the previous approaches. The learning curves in Fig. 8 indicate that, with the smaller dataset in Approach 2, the cross-entropy loss is relatively higher than in Approaches 3 and 4. Also, the cross-entropy loss is more stable and has fewer variations in Approaches 3 and 4 than in Approach 2. The validation and testing performance for all these approaches on real-time data is shown in Table 1.

It is evident from Table 1 that Approach 4 achieves the highest test accuracy of 96.26% due to an additional pre-processing step before recognition. The test accuracy obtained with Approach 4 is about 13.5 percentage points higher than that achieved using Approach 1 (96.26% versus 82.76%).

Performance Comparison with State-of-the-Art HDR Methods

To validate the performance of our proposed model, we compared it with state-of-the-art HDR methods. For comparison, we considered total parameters, trainable parameters, and accuracy. The comparison results are shown in Table 2.

As shown in Table 2, Zhao et al. [36] proposed an ensemble learning approach that achieves 98.10% accuracy with the fewest total and trainable parameters, 12,857. In contrast, Albahli et al. [37] proposed a faster regional convolutional neural network (FRCNN) that achieves the maximum accuracy of 99.70%. However, this approach uses a dense CNN with total and trainable parameters of 6,031,422 and 5,921,356,
respectively, shown in red. Similar accuracy has been observed for other models with total and trainable parameters above one lakh (100,000). Such dense networks make the training process complex and time-intensive. Our CNN-based model achieves an accuracy of 99.37% with total and trainable parameters of 67,104 and 66,880, respectively, shown in blue. Given the accuracy obtained with fewer total and trainable parameters, our proposed lightweight CNN model achieves state-of-the-art performance.

Fig. 8 Learning curves for a custom dataset. Left to right, top to bottom: Approaches 1, 2, 3, and 4

Table 1 Validation and test performance of the proposed approaches

              Validation accuracy (%)   Test accuracy (%)
Approach 1    99.28                     82.76
Approach 2    95.45                     94.4
Approach 3    99.29                     95.13
Approach 4    99.15                     96.26
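
To make the comparison in Table 1 concrete, the short Python snippet below recomputes two quantities from the tabulated values: the gain in test accuracy of each approach over Approach 1, in percentage points, and the gap between validation and test accuracy. The numbers are copied from Table 1; the variable names are illustrative only.

```python
# (validation %, test %) per approach, copied from Table 1.
results = {
    "Approach 1": (99.28, 82.76),
    "Approach 2": (95.45, 94.4),
    "Approach 3": (99.29, 95.13),
    "Approach 4": (99.15, 96.26),
}

baseline_test = results["Approach 1"][1]
gain = {}  # test-accuracy gain over Approach 1, in percentage points
gap = {}   # validation accuracy minus test accuracy, in percentage points
for name, (val_acc, test_acc) in results.items():
    gain[name] = round(test_acc - baseline_test, 2)
    gap[name] = round(val_acc - test_acc, 2)
    print(f"{name}: gain={gain[name]:+.2f} pp, val-test gap={gap[name]:.2f} pp")
```

Approach 4 gains 13.50 percentage points in test accuracy over Approach 1, and its validation-to-test gap shrinks from 16.52 to 2.89 points. Both figures are consistent with the observation that including pre-processed real-time images in training improves generalization to answer-book digits.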
Table 2 Model performance comparison with state-of-the-art HDR methods

Reference              Model architecture   Total parameters   Trainable parameters   Accuracy (%)
Yang et al. [33]       DFC                  430,500            430,500                99.13
Enriquez et al. [34]   CNN+ML               210,740            210,740                98.00
Saqib et al. [35]      CNN+DL4J             1,111,946          1,111,946              99.21
Zhao et al. [36]       CNN+KNN+RF           12,857             12,857                 98.10
Albahli et al. [37]    FRCNN                6,031,422          5,921,356              99.70
Proposed               Lightweight CNN      67,104             66,880                 99.37
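
The parameter totals reported for the proposed model can be cross-checked against the layer description given earlier (three convolutional layers with 64, 32, and 16 kernels, each followed by batch normalization and max-pooling, then fully connected layers of 128, 50, and 10 neurons). The sketch below is a minimal tally in Python, not the authors' code. It assumes details the text does not state: a 28 × 28 single-channel input as in MNIST, unpadded ("valid") convolutions with stride 1, 2 × 2 max-pooling with stride 2, bias terms in every layer, and batch normalization that keeps two trainable and two non-trainable values per channel.

```python
# Hypothetical helper names; layer sizes come from the text, while the input
# size, padding, and pooling settings are assumptions (see the lead-in).

def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

size, channels = 28, 1               # assumed MNIST-style input
total = trainable = 0

for kernel, filters in [(5, 64), (5, 32), (3, 16)]:
    conv = conv_params(kernel, channels, filters)
    bn_trainable = 2 * filters       # scale and shift
    bn_frozen = 2 * filters          # moving mean and variance
    total += conv + bn_trainable + bn_frozen
    trainable += conv + bn_trainable
    size = (size - kernel + 1) // 2  # valid convolution, then 2 x 2 pooling
    channels = filters

features = size * size * channels    # flatten (dropout adds no parameters)
for units in [128, 50, 10]:
    dense = dense_params(features, units)
    total += dense
    trainable += dense
    features = units

print(f"total={total:,} trainable={trainable:,}")
```

Under these assumptions the tally gives 67,104 total and 66,880 trainable parameters, matching the values in Table 2. The flattened feature vector then has only 16 values, so the first fully connected layer contributes 2,176 parameters, while the second convolutional layer alone accounts for 51,232.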
25. Joseph James S, Lakshmi C, UdayKiran P. Parthiban: An efficient offline hand written character recognition using cnn and xgboost. Int J Innov Technol Explor Eng. 2019;8(6):115–8.
26. Son C, Park S, Lee J, Paik J. Deep learning-based number detection and recognition for gas meter reading. IEIE Trans Smart Process Comput. 2019;8(5):367–72. https://doi.org/10.5573/IEIESPC.2019.8.5.367.
27. De Sousa Neto AF, Bezerra BLD, Lima EB, Toselli AH. Hdsr-flor: A robust end-to-end system to solve the handwritten digit string recognition problem in real complex scenarios. IEEE Access. 2020;8:208543–53. https://doi.org/10.1109/ACCESS.2020.3039003.
28. Ahlawat S, Choudhary A, Nayyar A, Singh S, Yoon B. Improved handwritten digit recognition using convolutional neural networks (cnn). Sensors (Switzerland). 2020;20(12):1–18. https://doi.org/10.3390/s20123344.
29. Mukhoti J, Dutta S, Sarkar R. Handwritten digit classification in bangla and hindi using deep learning. Appl Artif Intell. 2020;34(14):1074–99. https://doi.org/10.1080/08839514.2020.1804228.
30. Chang F, Chen C-J, Lu C-J. A linear-time component-labeling algorithm using contour tracing technique. Comput Vis Image Underst. 2004;93(2):206–20. https://doi.org/10.1016/j.cviu.2003.09.002.
31. Ioffe S, Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. 2015;1:448–56. https://doi.org/10.5555/3045118.3045167.
32. Kingma DP, Ba JL. Adam: A Method for Stochastic Optimization. 2015. https://doi.org/10.48550/arXiv.1412.6980.
33. Yang Z, Moczulski M, Denil M, Freitas ND, Smola A, Song L, Wang Z. Deep Fried Convnets. Int Conf Comput Vis, ICCV. 2015;2015:1476–83. https://doi.org/10.1109/ICCV.2015.173.
34. Enriquez EA, Gordillo N, Bergasa LM, Romera E, Huélamo CG. Convolutional neural network vs traditional methods for offline recognition of handwritten digits. Adv Intell Syst Comput. 2019;855:87–99. https://doi.org/10.1007/978-3-319-99885-5_7.
35. Ali S, Shaukat Z, Azeem M, Sakhawat Z, Mahmood T, ur Rehman K. An efficient and improved scheme for handwritten digit recognition based on convolutional neural network. SN Applied Sciences. 2019;1(9). https://doi.org/10.1007/s42452-019-1161-5.
36. Zhao H-H, Liu H. Multiple classifiers fusion and cnn feature extraction for handwritten digits recognition. Granular Comput. 2020;5(3):411–8. https://doi.org/10.1007/s41066-019-00158-6.
37. Albahli S, Nawaz M, Javed A, Irtaza A. An improved faster-rcnn model for handwritten character recognition. Arab J Sci Eng. 2021;46(9):8509–23. https://doi.org/10.1007/s13369-021-05471-4.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.