VashaNet: An Automated System For Recognizing Handwritten Bangla Basic Characters Using Deep Convolutional Neural Network
Automated character recognition is currently highly popular due to its wide range of applications. Bengali handwritten character recognition (BHCR) is an extremely difficult problem because of the nature of the script. Very few handwritten character recognition (HCR) models are capable of accurately classifying all the different sorts of Bangla characters. Recently, image recognition, video analytics, and natural language processing have all found great success using convolutional neural networks (CNNs) due to their ability to extract and classify features in novel ways. In this paper, we introduce the VashaNet model for recognizing Bangla handwritten basic characters. The suggested VashaNet model employs a 26-layer deep convolutional neural network (DCNN) architecture consisting of nine convolutional layers, six max pooling layers, two dropout layers, five batch normalization layers, one flattening layer, two dense layers, and one output layer. The experiment was performed over two datasets, a primary dataset of 5750 images and CMATERdb 3.1.2, for the purpose of training and evaluating the model. The suggested character recognition model worked very well, with test accuracy rates of 94.60% for the primary dataset and 94.43% for the CMATERdb 3.1.2 dataset. These outcomes demonstrate that the proposed VashaNet outperforms other existing methods and offers improved suitability in different character recognition tasks. The proposed approach is a viable candidate for the development of a highly efficient automatic BHCR system for use in practical settings.

Keywords: Artificial intelligence; Character recognition; Computer vision; Deep convolutional neural network; Image processing
∗ Corresponding author.
Authors: M. Raquib, M.A. Hossain, M.K. Islam, M.S. Miah.
https://doi.org/10.1016/j.mlwa.2024.100568
Received 27 December 2023; Received in revised form 11 June 2024; Accepted 22 June 2024
Available online 26 June 2024
2666-8270/© 2024 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
M. Raquib et al. Machine Learning with Applications 17 (2024) 100568
alphabets from students at a local high school. Recognizing handwritten Bengali is more challenging than reading printed Bengali for various reasons. Firstly, the Bangla alphabet contains a wide range of morphologically complicated characters. Secondly, different writers have distinct writing styles, which results in variations in the size, form, and curvature of the same character. Finally, the recognition problem gets worse with the physical similarity of certain characters. Even the most fundamental Bangla characters are inherently complicated. The use of ‘‘matra’’, which refers to a line positioned above characters, can cause considerable confusion, even among the most basic characters. The ‘‘fota’’, which refers to a dot positioned beneath the letters, is an additional source of ambiguity.

In another research (LeCun et al., 1989), the authors provided a backpropagation network application for handwritten digit recognition. Their method did not cover handwritten characters. On the other hand, some works merely used the standard alphabet of 50 letters from an available dataset (Begum, Islam, Eva, Emon, & Siddique, 2023; Chowdhury, Hossain, ul Islam, Andersson, & Hossain, 2019). They did not use advanced pretrained models or extra datasets for evaluation. A study may lack comparative analysis and benchmarks if pretrained models are not used or if it is not evaluated against external datasets, which makes it difficult to assess the model’s effectiveness on other datasets or in relation to cutting-edge techniques. In another study, Jubaer, Tabassum, Rahman, and Islam (2023) employed a dataset consisting of 786 Bangla handwritten document images in order to solve the segmentation problem. Their approach, introduced together with a novel dataset, integrates the Hough and affine transformations for skew correction with a deep learning-based object detection framework called YOLO. To support further handwritten image recognition systems, they broadened their study by incorporating supervised word recognition. In the automatic extraction of crucial information from images, CNN outperforms the multilayer perceptron (MLP), according to a study by Choudhary, Ahlawat, and Rishi (2014). In their research, they propose a DCNN-based HCR system to enhance accuracy. They demonstrated MLP and RBF classifiers, but it is also important to look at additional classifiers such as HMM and SVM.

Building effective models for BHCR by collecting a primary dataset is necessary in order to raise the reliability and precision of recognition systems. Through the acquisition of a primary dataset, researchers can ensure a thorough representation of the variation in handwritten Bangla characters, tackling issues such as writing styles and stroke thickness. With the help of such a carefully developed dataset, more reliable models that can recognize a larger variety of handwritten Bangla writing may be trained. High-quality annotated samples are acquired through control over the data collection process, providing a strong basis for deep learning model training and validation. Effective models created from original datasets will improve the state of the art in BHCR and help a variety of applications that depend on precise character recognition.

Among the available techniques, CNNs are the most effective option for identifying images of handwritten characters in Bangla. Bangla letters are complex and have subtle variances. This makes it difficult for traditional machine learning algorithms that rely on handcrafted features to capture this complexity. Similarly, the wide diversity of handwriting styles found in Bangla characters restricts the use of template matching algorithms. Even if ensemble and deep learning techniques for feature engineering provide some advantages, they frequently require a large amount of computing resources and manual modification. As an alternative, CNNs offer end-to-end learning from raw pixel data, translation invariance, parameter efficiency, automatic learning of hierarchical features, and state-of-the-art performance. CNNs are recommended because of this functionality and their capacity to recognize intricate patterns.

In our research, we designed a novel VashaNet model consisting of nine convolutional layers, six max pooling layers, two dropout layers, five batch normalization layers, one flattening layer, two dense layers, and one output layer, which aims to improve accuracy with lower computing cost. The suggested VashaNet model uses max pooling layers and dropout layers in order to mitigate overfitting and improve the model’s capacity for prediction. Batch normalization is used for the effective training of the DCNN for image classification tasks since it improves stability, convergence speed, and generalization capabilities.

The present study on Bangla handwritten character recognition utilizes two distinct datasets: a primary dataset comprising 5750 images obtained from school students, and the CMATERdb dataset (Sarkar et al., 2012), which consists of 15000 images, of which 5750 are used. The primary dataset is partitioned in an 80:20 ratio, with 80% of the data allocated for training and 20% for validation. The validation subset is then further divided evenly to provide a separate test set. The 5750 images chosen from the 15000 CMATERdb images are divided in the same way: 80% for training and the remaining 20% for validation, with the validation subset again divided evenly to provide a test set. The VashaNet model design is employed consistently for the recognition of the 50 basic Bangla characters throughout the entire process. The scanned images obtained during the primary data collection phase go through several pre-processing operations to ensure their suitability as a test set. After determining the optimal hyper-parameter configuration, the proposed VashaNet model undergoes multiple training iterations, each utilizing varying values for batch size and learning rate.

1.1. Contributions

The major technical contributions of this research are summarized as follows:
• We developed a unique VashaNet model for recognizing Bangla handwritten basic characters efficiently. Since our primary dataset does not come with any existing source code, the general structure of our method for improving BHCR differs from all other models, which highlights its novelty.
• We provided a primary dataset of 5750 images that were taken directly from writers who write in different styles. This distinctive primary dataset allows the model to learn from real-world handwriting differences, making it more successful at recognizing Bangla characters.
• A further significant addition consists of experimenting with 5750 images chosen from the 15000 images of the CMATERdb dataset for training, validation, and testing. This diversification of the training process allows the model to identify a greater variety of Bangla characters.

1.2. Organization

The rest of this article is organized as follows. The related works are summarized in Section 2. In Section 3, we describe the proposed methodology in detail, which consists of the dataset collection, processing, and the training, validation, and test split, and we propose a neural network architecture. In Section 4, the experimental recognition results are presented with a detailed explanation of the evaluation criteria of the proposed model. Finally, our conclusion is given in Section 5.

2. Related work

In this section, previous studies on HCR in Bangla are summarized. These studies laid the foundation for subsequent research in the field. Rahman, Akhand, Islam, Shill, Rahman, et al. (2015) suggested normalizing written character images prior to using a CNN to classify them. However, their approach does not use feature extraction. The proposed BHCR-CNN model demonstrated an accuracy rate of 85.36% when tested on a dataset consisting of 20,000 handwritten characters exhibiting diverse forms and variations. In another
research (Sarkhel, Das, Saha, & Nasipuri, 2016), the authors presented a cost-effective HCR method that establishes a balance between cost, quality, and recognition while searching for a solution across all alternatives. This method obtained 87.28% recognition accuracy on the CMATERdb dataset, which has 50 Bangla handwriting styles. Purkaystha, Datta, and Islam (2017) proposed a DCNN model to decode Bengali handwriting. It used kernels and local receptive fields to gather useful features before the discriminating task, followed by fully connected dense layers. The researchers employed the BanglaLekha-Isolated dataset to evaluate their approach, which achieved a letter identification accuracy of 91.23% over 50 character classes. They scaled all the images to the same size (28 × 28). The dataset was randomly shuffled and split into two parts: 85% was allocated for training, while the remaining 15% was reserved for assessing accuracy. Shaikh, Tabedzki, Chaki, and Saeed (2013) proposed a system that recognizes Bengali characters based on their visual features. The classification of feature vectors is performed by a k-nearest neighbors (K-NN) classifier using dynamic time warping (DTW) as a distance metric. The recognition accuracy of the system on its datasets is 76.80%. The HCR strategy of Chowdhury et al. (2019) was developed using data augmentation techniques. Only BanglaLekha datasets were utilized in this analysis.

Fig. 1. Flowchart of the study.
Table 1
Tabular form of literature review.

Approach: Rahman et al. (2015)
Proposed idea: Suggested normalizing written character images prior to using a CNN to classify them.
Research gap: It does not use feature extraction.

Approach: Sarkhel et al. (2016)
Proposed idea: The authors presented a cost-effective HCR method that establishes a balance between cost, quality, and recognition while searching for a solution across all alternatives. This method obtained 87.28% recognition accuracy on the CMATERdb dataset, which has 50 Bangla handwriting styles.
Research gap: There is a research gap in comparing the model to more advanced methods.

Approach: Purkaystha et al. (2017)
Proposed idea: Proposed a DCNN model to decode Bengali handwriting. It used kernels and local receptive fields to gather useful features before the discriminating task, followed by fully connected dense layers. The researchers employed the BanglaLekha-Isolated dataset to evaluate their approach, which achieved a letter identification accuracy of 91.23% over 50 character classes. They scaled all the images to the same size (28 × 28). The dataset was randomly shuffled and split into two parts: 85% was allocated for training, while the remaining 15% was reserved for assessing accuracy.
Research gap: Errors in the recognition task arise from the close similarity in form between certain letters. Furthermore, a considerable percentage of mistakes are attributable to mislabeled, irreversibly distorted, and invalid data instances in the collection.

Approach: Shaikh et al. (2013)
Proposed idea: Proposed a system that recognizes Bengali characters based on their visual features. The classification of feature vectors is performed by a k-nearest neighbors (K-NN) classifier using dynamic time warping (DTW) as a distance metric. The recognition accuracy of the system on its datasets is 76.80%.
Research gap: The segmentation methods used may not produce appropriate results for every character. The presented technique produces good results for printed Bengali characters but may have difficulty generalizing to handwritten characters. While the suggested technique has a recognition accuracy of 76.8%, there is still room for optimization and development.

Approach: Chowdhury et al. (2019)
Proposed idea: The HCR strategy was developed using data augmentation techniques. Only BanglaLekha datasets were utilized in this analysis. A CNN demonstrated a high level of accuracy, achieving a rate of 91.81% on the training dataset consisting of 50 character classes. In order to achieve a weighted recall of 95%, the recall values for each class were combined. The level of accuracy was 95%. The mean weighted F1 score was also 95%.
Research gap: There is a research gap for sequence-based character recognition. Furthermore, the article does not address the possibility of transfer learning, in which pre-trained models on similar tasks or datasets are fine-tuned for Bangla handwriting recognition. Further research might concentrate on improving the model architecture and deployment methodologies for low-latency inference, allowing for real-time recognition applications on resource-constrained devices.

Approach: Maitra et al. (2015)
Proposed idea: Implemented a CNN model using a standardized 50-class Bangla basic character database. The researchers extracted features from this database to address five distinct numeral identification tasks. These tasks involved recognizing numerals from English, Devanagari, Bangla, and Oriya scripts, all of which hold official status in India.
Research gap: More efficient feature extraction algorithms are needed, particularly for character recognition applications with little training data. Further research is needed to determine the usefulness and limits of their strategy. Correctly assessing the performance of recognition systems, including CNN-based techniques, is required.
row is composed of 5 grid boxes, and each grid box has dimensions of 1.5 inches by 1.5 inches. The number of grid boxes on a page is 50, with each box accommodating a single basic character. Through this careful organization, 50 characters are able to locate their proper positions, thereby exhibiting a wide range of writing styles and subtleties. The conversion from a physical to a digital medium was executed with a high degree of finesse and skill. Every page of handwriting was scanned at 800 dpi, which means that every stroke and curve was caught with the highest level of detail possible, as depicted in Fig. 2.

The dataset was then carefully segmented. Following the grid lines, each character was taken out, turning the handwritten pages into an enormous collection of individual images. There are a total of 5750 images stored in the 50 folders that make up the collection, with each folder holding 115 images. Additionally, the experiment took place using the CMATERdb dataset (5750 images taken). Samples from the different datasets are shown in Figs. 3 and 4.

We used a comprehensive methodology to evaluate our model’s performance in recognizing Bangla handwritten characters on our primary dataset. This approach incorporated conventional evaluation methods as well as particular factors that were pertinent to our task and dataset. Initially, we meticulously preprocessed the dataset, ensuring that we normalized the data and resized it to improve consistency and diversity. Partitioning the dataset into distinct subsets for training, validation, and testing facilitated the development and assessment of the model, hence confirming its performance. We employed performance metrics such as F1-score, accuracy, precision, and recall in conjunction with confusion matrix analysis to evaluate the efficacy of the model. The comparison with baseline approaches provided valuable insights into the model’s enhancement. Furthermore, we took into account the balance of the data, quality management, and the special features of Bangla handwritten characters, ensuring that the dataset is both representative and applicable to many contexts. Ethical considerations, of utmost importance, guided the responsible management of sensitive data. We maintained thorough documentation and metadata to promote transparency and reproducibility, thereby simplifying future research endeavors. By combining these methods, we performed a comprehensive assessment of our main dataset, thereby improving the trustworthiness and dependability of our research findings in Bangla handwritten character recognition.
Table 2
Tabular form of literature review.

Approach: Roy et al. (2017)
Proposed idea: Created another web-based Bangla handwriting recognition system. This study employs the quadratic classifier to recognize online handwriting data collected by mouse and touch screen input. This study examined 12500 Bangla characters and 2500 Bangla numbers. This method had a 98.42% accuracy rate on 10 number classes and 91.13% accuracy on 50 character classes.
Research gap: While supervised layer-DCNN (SL-DCNN) has potential in machine learning, its use in related fields of study remains unexplored. More research is necessary to determine how SL-DCNN may be used in other fields. Enhancing SL-DCNN’s capabilities could involve utilizing current developments in deep learning, such as Maxout units, Dropout, and Transfer Learning. Research aimed at incorporating these methods into SL-DCNN designs is necessary.

Approach: Rabby et al. (2020)
Proposed idea: Introduced Borno, which is the first multiclass convolutional neural network model for grapheme-based handwritten letters in Bangla, and gathered 1,069,132 pictures from Ekush, MatriVasha, CMATERdb, and BanglaLekha-Isolated to train a model. The trained Borno model recognizes characters with 92.61% accuracy on the validation set.
Research gap: There may be a lack of research in the area of language-specific optimization strategies designed to improve performance on Bangla character recognition tasks.

Approach: Das et al. (2021)
Proposed idea: Made an extended CNN model to read Bangla handwriting. It uses the BanglaLekha-Isolated dataset, which has 10 numbers, 11 vowels, and 39 consonants, to test their CNN model. The model recognizes Bangla digits at 99.50%, vowels at 93.18%, consonants at 92.25%, and mixed classes at 92.25%.
Research gap: There is a research gap in comparing the model to more advanced methods.

Approach: Jadhav et al. (2022)
Proposed idea: Used CNN to develop a low-cost architecture for Bengali character recognition using datasets such as CMATERdb, BanglaLekha-Isolated, and Ekush. The suggested approach uses a CNN on both simulated and actual data. CMATERdb, BanglaLekha-Isolated, and Ekush recognized 87%, 89.60%, and 83.10% of the terms, respectively.
Research gap: There is still room for optimization and improvement of accuracy.

Approach: M. Raquib (Proposed)
Proposed idea: Proposed a novel DCNN architecture with a novel dataset.
Research gap: The model may be made more flexible and useful by adding compound Bangla characters and numbers to the dataset. Combining BHCR with other areas of computer vision, such as segmentation and object identification, provides new perspectives on how to increase recognition precision in intricate layouts.
the input data at each position. The resulting products are then summed together to produce a single value within the output feature map. The activation function is ReLU, which provides the network with non-linearity, allowing it to model complex relationships in the data. We set padding = ‘‘same’’ to ensure that the spatial size of the output equals that of the input. The output of this layer is (224 × 224 × 16), where 16 is the number of filters.
• Convolution Layer 2: This layer is similar to Convolution Layer 1 but contains 32 filters. With the same activation function and a larger filter count, it can identify more intricate patterns in the feature map. This layer generates a (224 × 224 × 32) output.
• Convolution Layer 3: It uses a 2D convolutional operation with a collection of 64 different filters, building on the framework established by the earlier layers. The increased number of filters improves its ability to distinguish more complex features, making it a key participant in advancing the network’s abstraction. The output feature map from this layer is (224 × 224 × 64); the spatial dimensions are unchanged from its predecessor, and only the number of channels grows.
• Max Pooling Layer: The output of the preceding layer is sent into a max-pooling layer with a pool size of (2, 2), which denotes the use of a 2 × 2 window for the downsampling process. Max-pooling keeps the most important component from each 2 × 2 region while cutting the spatial dimensions in half, reducing the shape to (112 × 112 × 64).
• Sequential Layer 1: The sequential layer is made up of two convolutional layers with 32 filters, each employing 3 × 3 kernels, ReLU activation, and ‘same’ padding. It has batch normalization for standardization and a 2D max-pooling layer for downsampling. The output shape of this layer is (56 × 56 × 32).
• Max Pooling Layer: This max-pooling layer uses the same configuration as the first max-pooling layer and reduces the dimensions further to (28 × 28 × 32).
• Sequential Layer 2: This sequential layer is a copy of Sequential Layer 1, except that it uses 64 filters and its output shape is (14 × 14 × 64).
• Max Pooling Layer: Except for the output being (7 × 7 × 64), this layer is identical to the other standalone max-pooling layers.
• Sequential Layer 3: With the exception of the number of filters, which is 128, and the output shape, which is (3 × 3 × 128), this sequential layer is an exact duplicate of Sequential Layer 2.
• Flatten Layer: This layer takes the output of all the earlier operations and turns it into a flattened one-dimensional array.
• Sequential Layer 4: The sequential layer is made up of a fully connected dense layer with 512 hidden units and ReLU activation, one batch normalization layer, and a 30% dropout layer.
• Sequential Layer 5: This sequential layer is the same as Sequential Layer 4 except that it has 128 hidden units, consistent with the output shape reported in Table 3.
• Output Layer: Since the multiclass classifier uses 50 classes, the final layer is a fully connected dense layer with 50 units and softmax as the activation function.
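The layer sizes listed above can be cross-checked against the parameter counts reported in Table 3 with a few lines of arithmetic. This is a minimal sketch, not the authors' code. It assumes an RGB input (3 channels), 5 × 5 kernels in the first three convolutions (the kernel size that reproduces the reported counts of 1216, 12 832 and 51 264), 3 × 3 kernels in the sequential blocks as stated in the text, 128 units in the second dense layer as reported in Table 3, and four parameters per batch-normalized channel (the way Keras-style summaries count them). The helper names are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out


def bn_params(channels):
    """Gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def seq_conv_block(c_in, c_out):
    """Two 3x3 convolutions plus batch normalization (pooling adds no parameters)."""
    return (conv_params(3, c_in, c_out)
            + conv_params(3, c_out, c_out)
            + bn_params(c_out))


counts = {
    "Convolution2D 1": conv_params(5, 3, 16),    # assumed 5x5 kernel, RGB input
    "Convolution2D 2": conv_params(5, 16, 32),   # assumed 5x5 kernel
    "Convolution2D 3": conv_params(5, 32, 64),   # assumed 5x5 kernel
    "Sequential 1": seq_conv_block(64, 32),
    "Sequential 2": seq_conv_block(32, 64),
    "Sequential 3": seq_conv_block(64, 128),
    "Sequential 4": dense_params(3 * 3 * 128, 512) + bn_params(512),
    "Sequential 5": dense_params(512, 128) + bn_params(128),
    "Output Layer": dense_params(128, 50),
}

for name, value in counts.items():
    print(f"{name:16s} {value:>8d}")
print("total", sum(counts.values()))
```

Under these assumptions every computed value agrees with the corresponding row of Table 3, which is why the 5 × 5 kernel and 128-unit choices are used here.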
3.5. Training the model

To start, we used a batch size of 24 for training our VashaNet model on the dataset. After training for 50 iterations with a learning rate of 0.008 with the Adam optimizer, we achieved respectable results on our dataset. A summary of our VashaNet model is given in Table 3.

Table 3
VashaNet model architecture summary.
Layer | Output shape | No. of parameters
Convolution2D 1 | (None, 224, 224, 16) | 1216
Convolution2D 2 | (None, 224, 224, 32) | 12 832
Convolution2D 3 | (None, 224, 224, 64) | 51 264
Max Pooling2D | (None, 112, 112, 64) | 0
Sequential 1 (Convolution2D 4, Convolution2D 5, Batch Normalization 1, Max Pooling2D) | (None, 56, 56, 32) | 27 840
Max Pooling2D | (None, 28, 28, 32) | 0
Sequential 2 (Convolution2D 6, Convolution2D 7, Batch Normalization 2, Max Pooling2D) | (None, 14, 14, 64) | 55 680
Max Pooling2D | (None, 7, 7, 64) | 0
Sequential 3 (Convolution2D 8, Convolution2D 9, Batch Normalization 3, Max Pooling2D) | (None, 3, 3, 128) | 221 952
Flatten | (None, 1152) | 0
Sequential 4 (Dense Layer 1, Batch Normalization 4, Dropout 30%) | (None, 512) | 592 384
Sequential 5 (Dense Layer 2, Batch Normalization 5, Dropout 30%) | (None, 128) | 66 176
Output Layer | (None, 50) | 6450

3.6. Dropout

The dropout layer is used during training to randomly ignore a specified proportion of neurons. The contributions of those neurons to downstream activations are temporarily removed in the forward pass, and their weights and biases are not updated in the backward pass. In this way, dropout regularizes the network. With the addition of this layer, convolutional neural networks may be trained with a reduced risk of overfitting.

Selecting the best 26-layer deep convolutional neural network for the Bangla handwritten character image classification task on our primary dataset depends on a number of factors, including the size of our dataset, the computational resources available, a review of existing architectures, the transfer learning potential, experiments, fine-tuning and optimization, iteration and refinement, performance evaluation, and the specific requirements of our application. Prior research and experiments in the field of image classification, particularly for comparable tasks involving character recognition, are likely to have influenced the architectural decisions, including the number of layers, layer varieties, and layer configurations. We selected our VashaNet model following significant investigation and experimentation with 36 different architectural variants, customized to our basic collection of 5750 images. We designed a 26-layer DCNN architecture, which is depicted in Fig. 8 and Table 3.

The first step in choosing the parameters for a CNN architecture is to comprehend the features of the dataset, such as its complexity and size. Increase the number of filters at deeper convolutional layers, and use typical kernel sizes such as 3 × 3 or 5 × 5. In order to reduce spatial dimensions, use max-pooling layers, which typically have sizes of 2 × 2. Consider dropout regularization to avoid overfitting, and apply batch normalization after convolutional layers for stability. Before sending the feature maps to the dense (fully connected) layers, flatten them, and adjust the number of neurons according to the difficulty of the task. Use ReLU activation across the network, except for the output layer, where softmax is suitable for classification tasks. Select a suitable optimizer, such as Adam, adjust the learning rate appropriately, and consider learning rate schedules for optimization. The overall structure was constructed through a series of studies and experiments with different architectures, depths, and widths. Performance measures such as accuracy, precision, recall, and F1-score were monitored on validation data, and adjustments were made as needed through iteration and refinement.

4. Experimental results over the primary dataset

The analysis of the results is divided into groups based on batch sizes, epochs, the ‘‘Adam’’ optimizer’s different learning rates, and the ratio in which the dataset is split. True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) are the four categories used to classify predictions.

4.1. Accuracy

The most commonly used performance metric for deep learning is accuracy. The accuracy of a classification system is measured by the fraction of samples it classifies correctly. It works better when used on a symmetric dataset, where the likelihood of false positives and false negatives is equal. The basic equation for calculating the accuracy is as follows (Hossain et al., 2021):

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

Using this strategy, after 50 iterations with a batch size of 24 and the Adam optimizer, our classifier model achieves an accuracy of 99.70% in training, 94.78% in validation, and 94.60% in testing on the primary dataset. Fig. 9 shows that the training and validation accuracy initially track one another closely. When looking at the situation as a whole, however, neither of them has very good accuracy in the beginning stage. The validation accuracy stays lower than the training accuracy, and we take the limited gap between them as an indication that our model does not substantially overfit the data we used to train it. Fig. 10 compares the loss incurred during training with the loss incurred during validation. Both the training loss and the validation loss were much higher at the beginning of the process, and both go down with each successive epoch. The validation loss remains higher than the training loss, as can be seen from the graph. We take this as an indication that the model is not overfit to our training data.

4.1.1. Optimizer (Adam)
With Adam, several learning rates were investigated for training. Among them, a learning rate of 0.0008 provided the highest accuracy. Table 4 summarizes the training and validation losses and accuracies obtained with the Adam optimizer for the various learning rates. A learning rate of 0.0008 achieved the maximum test accuracy of 94.60%.

4.1.2. Batch sizes
The proposed VashaNet model was trained using batch sizes of 16, 24, and 32. Fig. 11 displays the optimal outcome, obtained for a batch size of 24. However, training takes more time with bigger batches, and the training time per epoch goes up by a large margin.
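To make the batch-size comparison concrete, the sketch below derives the split sizes implied by the 80:20 partition of the 5750-image primary dataset (with the held-out 20% halved into validation and test sets) and the resulting number of optimizer steps per epoch for the three batch sizes tried. It assumes the final partial batch is kept, which is common framework behaviour but is not stated in the paper; `split_sizes` and `steps_per_epoch` are illustrative names.

```python
import math

TOTAL_IMAGES = 5750


def split_sizes(total, train_percent=80):
    """80:20 split, with the held-out part divided evenly into validation and test."""
    train = total * train_percent // 100
    held_out = total - train
    validation = held_out // 2
    test = held_out - validation
    return train, validation, test


def steps_per_epoch(n_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)


train, validation, test = split_sizes(TOTAL_IMAGES)
print("train/validation/test:", train, validation, test)
for batch_size in (16, 24, 32):
    print("batch size", batch_size, "->",
          steps_per_epoch(train, batch_size), "steps per epoch")
```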
Table 5
Precision score on primary dataset.
Dataset name Precision score (%)
Test Data 94.60
score, and accuracy. Our VashaNet model’s precision score of 94.60% reflects its success in recognizing positive occurrences out of all predicted positive instances.

4.1.6. Recall
Recall is a metric used to assess a model’s ability to identify positive instances. Whereas the accuracy score is the share of correct predictions over all classes, recall is the proportion of samples correctly classified as positive relative to all actual positive samples, that is, the true positive rate. The recall score is computed as follows (Ahmed et al., 2022):

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]

where TP is the number of true positives and FN is the number of false negatives. We also calculated the recall score on the test data using the above equation. The result is given in Table 6.

Table 6
Recall score on primary dataset.
Dataset name Recall score (%)
Test Data 95.25

Our VashaNet model’s recall score of 95.25% means that the model correctly identified 95.25% of all actual positive instances. A higher recall score is preferable when the effects of false negatives are substantial. In other contexts, where false positives are expensive, high precision scores are more crucial (see Table 6).

4.1.7. F1 score
During the experimentation phase, we evaluated the model on the test portion of the dataset. Our VashaNet model attained an F1 score of 94.62%. This shows that the model has a good balance between precision (how many of the predicted characters are right) and recall (how many of the actual characters are correctly identified by the model). The F1 score is calculated with the following equation (Ahmed et al., 2022):

\[ \mathrm{F1\text{-}Score} = \frac{2 \times (\mathrm{Recall} \times \mathrm{Precision})}{\mathrm{Precision} + \mathrm{Recall}} \tag{4} \]

The F1 score is given in Table 7.
An F1 score of 94.62% on handwritten Bangla character recognition is a good sign that the model can be used effectively in real-world settings. Although the F1 score is informative, it is necessary to take into consideration several other factors when selecting an appropriate
Table 7
F1 score on primary dataset.
Dataset name F1 score (%)
Test Data 94.62
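As an illustration of Eqs. (3) and (4) in a multi-class setting, the sketch below computes per-class precision, recall, and F1 from a confusion matrix and then averages them over classes (macro averaging). The source does not state which averaging scheme was used, so macro averaging is an assumption here, and the 3-class confusion matrix is invented purely for demonstration. One point the sketch makes concrete: the mean of per-class F1 scores generally differs from the harmonic mean of the averaged precision and recall. This is one possible reading of why the reported F1 (94.62%) is not the harmonic mean of the reported precision and recall (about 94.92%).

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def macro_scores(confusion):
    """Macro-averaged precision, recall, and F1 for a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as
    class j (rows are true classes, columns are predicted classes).
    """
    num_classes = len(confusion)
    precisions, recalls, f1_scores = [], [], []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # true c, predicted other
        fp = sum(row[c] for row in confusion) - tp   # predicted c, true other
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)               # Eq. (3)
        f1 = safe_div(2 * recall * precision, precision + recall)  # Eq. (4)
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return (
        sum(precisions) / num_classes,
        sum(recalls) / num_classes,
        sum(f1_scores) / num_classes,
    )


# Invented 3-class example, not data from the paper.
EXAMPLE_CONFUSION = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 0, 10],
]

macro_precision, macro_recall, macro_f1 = macro_scores(EXAMPLE_CONFUSION)
harmonic_of_macros = safe_div(
    2 * macro_precision * macro_recall, macro_precision + macro_recall
)
print(f"macro precision = {macro_precision:.4f}")
print(f"macro recall    = {macro_recall:.4f}")
print(f"macro F1        = {macro_f1:.4f}")
print(f"harmonic mean of macro precision and recall = {harmonic_of_macros:.4f}")
```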
Fig. 15. Result analysis according to training accuracy, validation accuracy, test
accuracy, precision, recall, F1, and specificity.
Table 8
Specificity score on primary dataset.
Dataset name Specificity score (%)
Test Data 99.89
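The specificity in Table 8 is much higher than the other metrics, which is expected in a many-class problem: for each class, nearly all samples of the other classes are true negatives. The sketch below computes one-vs-rest specificity, TN / (TN + FP), per class and averages it. The setup is an assumption made for illustration: 50 classes (the number of Bangla basic characters; the class count is not restated in this section), a balanced test set, and a synthetic confusion matrix with 94.6% accuracy. Under those assumptions the average specificity is 1 - (1 - accuracy) / (K - 1), which for K = 50 gives about 99.89%, consistent with Table 8.

```python
def macro_specificity(confusion):
    """Average one-vs-rest specificity, TN / (TN + FP), over all classes.

    confusion[i][j] is the number of samples of true class i predicted as
    class j.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    specificities = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(row[c] for row in confusion) - tp
        fn = sum(confusion[c]) - tp
        tn = total - tp - fp - fn
        specificities.append(tn / (tn + fp))
    return sum(specificities) / num_classes


def synthetic_confusion(num_classes, samples_per_class, correct_per_class):
    """Build a synthetic balanced confusion matrix (NOT real results).

    Every class has the same number of correct predictions; all of its
    errors are assigned to the next class, wrapping around at the end.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for c in range(num_classes):
        matrix[c][c] = correct_per_class
        matrix[c][(c + 1) % num_classes] = samples_per_class - correct_per_class
    return matrix


# Assumed setup: 50 classes, 1000 samples each, 946 correct (94.6% accuracy).
confusion = synthetic_confusion(50, 1000, 946)
specificity = macro_specificity(confusion)
print(f"macro specificity = {specificity * 100:.2f}%")
```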
Table 9
Summary of VashaNet model performances on different datasets.
Dataset     Training loss   Validation loss   Training accuracy (%)   Validation accuracy (%)
Primary     0.0106          0.1904            99.70                   94.78
CMATERdb    0.0132          0.3170            99.70                   92.70
Fig. 17. Accuracy and losses of primary dataset.
Fig. 19. Precision score analysis among the three datasets.
Fig. 21. F1 score analysis among the three datasets.
Fig. 22. Specificity analysis between the two datasets.
4.5. Comparison with existing works

The present study conducts experiments using a recently developed primary dataset. The incorporation of this dataset, which has not been explored in prior research, adds a new dimension to the study. The model applied in this study is also novel, as its architecture has not been employed in any other research. The model shows high accuracy, particularly when used with our developed dataset and the CMATERdb dataset, highlighting its competence on previously unexplored data. Although there are many CNN architectural designs (Ahmed et al., 2022; Chowdhury et al., 2019; Purkaystha et al., 2017) in the field, none of them corresponds to the architecture of our proposed model. We have also developed a methodology for primary dataset acquisition, which complements our DCNN architecture. A comparative analysis of the results of the proposed model against similar works on different Bangla character datasets is given in Table 10.

According to Table 10, in comparison to past efforts in the field of Bangla handwritten character identification, which relied mostly on classical machine learning approaches (Sarkhel et al., 2016), the use of deep learning models (Das et al., 2021; Purkaystha et al., 2017; Roy et al., 2017) has proven to be substantially more effective. To this end, our research presents a comparison of CNN, DCNN, Hidden Markov Model (HMM), and other methods for Bangla handwritten character recognition. Additionally, the research stands out for its careful attention to data.

Rather than relying only on the conventional practice of using pre-existing datasets, this study conducts experiments with a rigorously curated dataset. This primary dataset reflects the diversity and variability observed in handwritten Bangla characters in real-world settings. In this realistic and intricate setting, the VashaNet model achieved a training accuracy of 99.70%, a validation accuracy of 94.78%, and a test accuracy of 94.60% on the primary dataset. In addition, the model achieved a training accuracy of 99.70%, a validation accuracy of 92.70%, and a test accuracy of 94.43% when evaluated on the CMATERdb dataset. No source code was found for the related studies that could be run on our developed dataset. Even so, the accuracy of our VashaNet model exceeds the accuracies reported in previous attempts (Biswas, Bhattacharya, & Parui, 2012; Rahman et al., 2015; Shaikh et al., 2013), indicating state-of-the-art accuracy for this primary dataset specifically. In this respect, our VashaNet model has achieved more precise performance through the use of a novel DCNN architecture. These results highlight both the strength of the proposed DCNN architecture and the potential of deep learning to improve precision and applicability in this area.
Table 10
Comparison with existing works.
Author                      Methods                Datasets               Accuracy (%)
Biswas et al. (2012)        HMM                    Primary                91.85
Rahman et al. (2015)        CNN                    BanglaLekha            89.3
Sarkhel et al. (2016)       SVM                    CMATERdb               87.28
Roy et al. (2017)           DCNN with inception    CMATERdb               90.33
Purkaystha et al. (2017)    DCNN                   BanglaLekha-Isolated   91.23
Shaikh et al. (2013)        KNN                    Primary                76.80
Pramanik and Bag (2018)     Shape Reduction        ICDAR                  88.74
Chowdhury et al. (2019)     CNN                    BanglaLekha-Isolated   91.81
Rabby et al. (2020)         Multiclass CNN         Assembled              92.61
Das et al. (2021)           CNN                    BanglaLekha-Isolated   92.25
Jadhav et al. (2022)        CNN                    Ekush                  83.10
Ahmed et al. (2023)         CNN                    Primary                88.48
Our Proposed Model          VashaNet (DCNN)        Primary, CMATERdb      94.60, 94.43

Fig. 24. Absence of ‘matra’.
Fig. 25. Overwriting or dissimilar shape.
CRediT authorship contribution statement

Mirza Raquib: Conceptualization, Methodology, Software, Validation, Writing – original draft. Mohammad Amzad Hossain: Supervision, Writing – review & editing. Md Khairul Islam: Writing – review & editing. Md Sipon Miah: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgment

We would like to acknowledge the support provided by the Department of Information and Communication Engineering, and Research cell of Noakhali Science and Technology University, Noakhali-3814, Bangladesh.

References

Ahmed, S. S., Akter Mim, A., Saha, D., Nahar, K., & Aynul Hasan Nahid, M. (2023). A CNN-based novel approach for the detection of compound bangla handwritten characters. In 2023 11th international symposium on digital forensics and security (pp. 1–6). http://dx.doi.org/10.1109/ISDFS58141.2023.10131677.
Ahmed, T., Uddin, M., Khan, M. A. R., & Hasan, A. R. M. (2022). Offline handwritten character recognition including compound character from scanned document. Asian Journal of Research in Computer Science, 119–129. http://dx.doi.org/10.9734/ajrcos/2022/v14i4297.
Begum, H., Islam, M. M., Eva, H. S., Emon, N. H., & Siddique, F. A. (2023). Deep learning networks for handwritten bangla character recognition. International Journal of Applied Mathematics, 53.
Biswas, C., Bhattacharya, U., & Parui, S. K. (2012). HMM based online handwritten bangla character recognition using Dirichlet distributions. In 2012 international conference on frontiers in handwriting recognition (pp. 600–605). IEEE.
Chakraborty, P., Roy, S., Sumaiya, S. N., & Sarker, A. (2023). Handwritten character recognition from image using CNN. In Micro-electronics and telecommunication engineering: proceedings of 6th ICMETE 2022 (pp. 37–47). Springer.
Choudhary, A., Ahlawat, S., & Rishi, R. (2014). A binarization feature extraction approach to OCR: MLP vs. RBF. Distributed Computing and Internet Technology, 341.
Chowdhury, R. R., Hossain, M. S., ul Islam, R., Andersson, K., & Hossain, S. (2019). Bangla handwritten character recognition using convolutional neural network with data augmentation. In 2019 joint 8th international conference on informatics, electronics & vision (ICIEV) and 2019 3rd international conference on imaging, vision & pattern recognition (pp. 318–323). IEEE.
Chu, K. (1999). An introduction to sensitivity, specificity, predictive values and likelihood ratios. Emergency Medicine, 11(3), 175–181.
Das, T. R., Hasan, S., Jani, M. R., Tabassum, F., & Islam, M. I. (2021). Bangla handwritten character recognition using extended convolutional neural network. Journal of Computer and Communications, 9(03), 158–171.
Dey, R., & Balabantaray, R. C. (2023). Development of a benchmark odia handwritten character database for an efficient offline handwritten character recognition with a chronological survey. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(6), 1–28.
Gupta, D., & Bag, S. (2021). CNN-based multilingual handwritten numeral recognition: A fusion-free approach. Expert Systems with Applications, 165, Article 113784.
Hossain, M. M., Asadullah, M., Rahaman, A., Miah, M. S., Hasan, M. Z., Paul, T., et al. (2021). Prediction on domestic violence in bangladesh during the covid-19 outbreak using machine learning methods. Applied System Innovation, 4(4), 77.
Islam, M. S., Rahman, M. M., Rahman, M. H., Rivolta, M. W., & Aktaruzzaman, M. (2022). Ratnet: A deep learning model for bengali handwritten characters recognition. Multimedia Tools and Applications, [ISSN: 15737721] 81, http://dx.doi.org/10.1007/s11042-022-12070-4.
Jadhav, R., Gadge, S., Kharde, K., Bhere, S., & Dokare, I. (2022). Recognition of handwritten bengali characters using low cost convolutional neural network. In 2022 interdisciplinary research in technology and management (pp. 1–6). IEEE.
Jubaer, S. M., Tabassum, N., Rahman, M. A., & Islam, M. K. (2023). BN-DRISHTI: Bangla document recognition through instance-level segmentation of handwritten text images. arXiv preprint arXiv:2306.09351.
Kaur, P., Kumar, Y., Ahmed, S., Alhumam, A., Singla, R., & Ijaz, M. F. (2022). Automatic license plate recognition system for vehicles using a cnn. Computers, Materials & Continua, 71(1).
Kumari, T., Vardan, Y., Shambharkar, P. G., & Gandhi, Y. (2022). Comparative study on handwritten digit recognition classifier using cnn and machine learning algorithms. In 2022 6th international conference on computing methodologies and communication (pp. 882–888). IEEE.
LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., et al. (1989). Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems, 2.
Liu, W., Wei, J., & Meng, Q. (2020). Comparisions on KNN, SVM, BP and the CNN for handwritten digit recognition. In 2020 IEEE international conference on advances in electrical engineering and computer applications (pp. 587–590). IEEE.
Maitra, D. S., Bhattacharya, U., & Parui, S. K. (2015). CNN based common approach to handwritten character recognition of multiple scripts. In 2015 13th international conference on document analysis and recognition (pp. 1021–1025). IEEE.
Narayan, A., & Muthalagu, R. (2021). Image character recognition using convolutional neural networks. In 2021 seventh international conference on bio signals, images, and instrumentation (pp. 1–5). IEEE.
Pramanik, R., & Bag, S. (2018). Shape decomposition-based handwritten compound character recognition for bangla OCR. Journal of Visual Communication and Image Representation, [ISSN: 10959076] 50, http://dx.doi.org/10.1016/j.jvcir.2017.11.016.
Purkaystha, B., Datta, T., & Islam, M. S. (2017). Bengali handwritten character recognition using deep convolutional neural network. In 2017 20th international conference of computer and information technology (pp. 1–5). IEEE.
Rabby, A. S. A., Islam, M. M., Hasan, N., Nahar, J., & Rahman, F. (2020). Borno: Bangla handwritten character recognition using a multiclass convolutional neural network. In Proceedings of the future technologies conference (pp. 457–472). Springer.
Rahman, M. M., Akhand, M., Islam, S., Shill, P. C., Rahman, M., et al. (2015). Bangla handwritten character recognition using convolutional neural network. International Journal of Image, Graphics and Signal Processing (IJIGSP), 7(8), 42–49.
Rakshit, P., Chatterjee, S., Halder, C., Sen, S., Obaidullah, S. M., & Roy, K. (2023). Comparative study on the performance of the state-of-the-art CNN models for handwritten bangla character recognition. Multimedia Tools and Applications, 82(11), 16929–16950.
Rasheed, A., Ali, N., Zafar, B., Shabbir, A., Sajid, M., & Mahmood, M. T. (2022). Handwritten urdu characters and digits recognition using transfer learning and augmentation with AlexNet. IEEE Access, 10, 102629–102645. http://dx.doi.org/10.1109/ACCESS.2022.3208959.
Roy, S., Das, N., Kundu, M., & Nasipuri, M. (2017). Handwritten isolated bangla compound character recognition: A new benchmark using a novel deep learning approach. Pattern Recognition Letters, [ISSN: 01678655] 90, http://dx.doi.org/10.1016/j.patrec.2017.03.004.
Sarkar, R., Das, N., Basu, S., Kundu, M., Nasipuri, M., & Basu, D. K. (2012). CMATERdb1: a database of unconstrained handwritten bangla and bangla–english mixed script document image. International Journal on Document Analysis and Recognition (IJDAR), 15, 71–83.
Sarkhel, R., Das, N., Saha, A. K., & Nasipuri, M. (2016). A multi-objective approach towards cost effective isolated handwritten bangla character and digit recognition. Pattern Recognition, [ISSN: 00313203] 58, http://dx.doi.org/10.1016/j.patcog.2016.04.010.
Shaikh, S. H., Tabedzki, M., Chaki, N., & Saeed, K. (2013). Bengali printed character recognition–a new approach. Information Systems and Industrial Management, 129.
Wang, P., Fan, E., & Wang, P. (2021). Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognition Letters, 141, 61–67.
Zhang, C., Liu, B., Chen, Z., Yan, J., Liu, F., Wang, Y., et al. (2023). A machine vision-based character recognition system for suspension insulator iron caps. IEEE Transactions on Instrumentation and Measurement, 72, 1–13. http://dx.doi.org/10.1109/TIM.2023.3300474.