
Machine Learning with Applications 17 (2024) 100568

Contents lists available at ScienceDirect

Machine Learning with Applications

Journal homepage: www.elsevier.com/locate/mlwa

VashaNet: An automated system for recognizing handwritten Bangla basic characters using deep convolutional neural network

Mirza Raquib a, Mohammad Amzad Hossain a,e,∗, Md Khairul Islam b, Md Sipon Miah c,d,e

a Department of Information and Communication Engineering, Noakhali Science and Technology University, Noakhali 3814, Bangladesh
b Department of Biomedical Engineering, Islamic University, Kushtia 7003, Bangladesh
c Department of Information and Communication Technology, Islamic University, Kushtia 7003, Bangladesh
d Department of Signal Theory and Communications, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain
e School of Computer Science, University of Galway, Galway H91TK33, Ireland

∗ Corresponding author.
E-mail addresses: [email protected] (M. Raquib), [email protected] (M.A. Hossain), [email protected] (M.K. Islam), [email protected] (M.S. Miah).

https://doi.org/10.1016/j.mlwa.2024.100568
Received 27 December 2023; Received in revised form 11 June 2024; Accepted 22 June 2024; Available online 26 June 2024
2666-8270/© 2024 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

ARTICLE INFO

Keywords: Artificial intelligence, Character recognition, Computer vision, Deep convolutional neural network, Image processing

ABSTRACT

Automated character recognition is currently highly popular due to its wide range of applications. Bengali handwritten character recognition (BHCR) is an extremely difficult problem because of the nature of the script. Very few handwritten character recognition (HCR) models are capable of accurately classifying all the different sorts of Bangla characters. Recently, image recognition, video analytics, and natural language processing have all found great success using the convolutional neural network (CNN), owing to its ability to extract and classify features in novel ways. In this paper, we introduce the VashaNet model for recognizing Bangla handwritten basic characters. The proposed VashaNet model employs a 26-layer deep convolutional neural network (DCNN) architecture consisting of nine convolutional layers, six max pooling layers, two dropout layers, five batch normalization layers, one flattening layer, two dense layers, and one output layer. The experiment was performed over two datasets, a primary dataset of 5750 images and CMATERdb 3.1.2, for the purpose of training and evaluating the model. The proposed character recognition model performed very well, with test accuracy rates of 94.60% for the primary dataset and 94.43% for the CMATERdb 3.1.2 dataset. These outcomes demonstrate that the proposed VashaNet outperforms other existing methods and offers improved suitability for different character recognition tasks. The proposed approach is therefore a strong candidate for the development of a highly efficient, practical automatic BHCR system.

1. Introduction

HCR system is a software or hardware with the capacity to read and interpret handwriting in any language (Rasheed et al., 2022). Due to a wide range of potential uses, handwritten character recognition has been a hot topic among computer vision experts in recent years (Chakraborty, Roy, Sumaiya, & Sarker, 2023; Dey & Balabantaray, 2023; Rakshit et al., 2023). In comparison to other pattern recognition techniques, the convolutional neural network (CNN) has been found to perform better (Wang, Fan, & Wang, 2021; Liu, Wei, & Meng, 2020; Kumari, Vardan, Shambharkar, & Gandhi, 2022; Rasheed et al., 2022). Postal automation, mail sorting, ATM check processing, and license plate identification are just a few of the numerous applications for HCR systems (Gupta & Bag, 2021; Islam, Rahman, Rahman, Rivolta, & Aktaruzzaman, 2022; Kaur et al., 2022). There have been several successful implementations of deep learning approaches for recognizing English handwritten characters (Narayan & Muthalagu, 2021). However, handwriting recognition technology for Bangla is still in its early stages, despite the fact that over 228 million people are native speakers of the language (Ahmed, Uddin, Khan, & Hasan, 2022). The goal of BHCR research employing robust CNN is to digitize handwritten Bangla text in an accurate and efficient manner. Developing BHCR systems with CNN can help with document digitalization, effective information retrieval, and literacy enhancement because Bangla is a language that millions of people speak and has great cultural significance. CNN is highly effective at identifying complicated patterns in pictures, which makes it ideal for handling the complex and varied forms of handwritten Bangla characters. In this research, we investigate the problem of recognizing handwritten Bangla characters and quantify the recognition accuracy using a dataset of handwritten Bangla alphabets from students at a local high school.


Recognizing handwritten versions is more challenging than reading printed Bengali for various reasons. Firstly, the Bangla alphabet contains a wide range of morphologically complicated characters. Secondly, different writers have distinct writing styles, which results in variations in the size, form, and curvature of the same character. Finally, the recognition problem gets worse with the physical similarity of certain characters. Even the most fundamental Bangla characters are inherently complicated. The use of "matra", which refers to a line positioned above characters, can cause considerable confusion, even among the most basic characters. The "fota", which refers to a dot positioned beneath the letters, represents an additional factor contributing to ambiguity. In another research (LeCun et al., 1989), the authors provided a backpropagation network application for handwritten digit recognition; their method did not cover handwritten characters. On the other hand, some works merely used the standard alphabet, which consists of 50 letters, with an available dataset (Begum, Islam, Eva, Emon, & Siddique, 2023; Chowdhury, Hossain, ul Islam, Andersson, & Hossain, 2019). They did not use advanced pretrained models or extra datasets for evaluation. Furthermore, a study may lack comparative analysis and benchmarks if pretrained models are not used or evaluated against external datasets; this makes it difficult to assess a model's effectiveness against other datasets or in relation to cutting-edge techniques. In another study, Jubaer, Tabassum, Rahman, and Islam (2023) employed a dataset consisting of 786 Bangla handwritten document images in order to solve the segmentation problem. An innovative approach was implemented along with a novel dataset to accomplish this; it integrates the Hough and Affine transformations for skew correction with a deep learning-based object detection framework called YOLO. In order to develop more handwritten image recognition systems, they broadened their area of study by incorporating supervised word recognition. In the automatic extraction of crucial information from images, CNN outperforms the multilayer perceptron (MLP), according to a study by Choudhary, Ahlawat, and Rishi (2014). In their research, they propose a DCNN-based HCR system to enhance accuracy. They demonstrated MLP and RBF classifiers, but it is also important to look at additional classifiers like HMM, SVM, and so forth. Building effective models for BHCR by obtaining a primary dataset is necessary in order to raise the reliability and precision of recognition systems. Through the acquisition of a primary dataset, researchers may guarantee a thorough representation of the variation of handwritten Bangla characters, tackling issues like writing styles and stroke thickness. With the help of such a carefully developed dataset, more reliable models that can recognize a larger variety of handwritten Bangla writing may be trained. High-quality annotated samples are acquired through control over the data collection process, providing a strong basis for deep learning model training and validation. Effective models created from original datasets will improve the state of the art in BHCR and help a variety of applications that depend on precise character recognition. Among the available techniques, CNNs are the most effective option for identifying images of handwritten Bangla characters. Bangla letters are complex and have subtle variances, which makes it difficult for traditional machine learning algorithms that rely on handmade features to capture this complexity. Comparably, the wide diversity of handwriting styles found in Bangla characters restricts the use of template matching algorithms. Even if ensemble and deep learning techniques for feature engineering provide some advantages, they frequently require a large amount of computing resources and manual modification. As an alternative, CNN specializes in end-to-end learning, translation invariance, parameter efficiency, automatically learning hierarchical features, and achieving cutting-edge performance from raw pixel data. CNNs are recommended because of their cutting-edge functionality and capacity to recognize intricate patterns. In our research, we designed a novel VashaNet model consisting of nine convolutional layers, six max pooling layers, two dropout layers, five batch normalization layers, one flattening layer, two dense layers, and one output layer, which aims to improve accuracy with lower computing cost. The proposed VashaNet model uses max pooling layers and dropout layers in order to mitigate overfitting and improve the model's capacity for prediction. Batch normalization is used for the effective training of the DCNN for image classification tasks since it improves stability, convergence speed, and generalization capabilities. The present study on Bangla handwritten character recognition utilizes two distinct datasets: a primary dataset comprising 5750 images obtained from school students, and the CMATERdb (Sarkar et al., 2012) dataset of 15000 images, from which 5750 images were taken. The primary dataset is partitioned into an 80:20 ratio, with 80% of the data allocated for training and 20% for validation; the validation subset is then further divided evenly to facilitate additional testing. The 5750 images chosen from the CMATERdb dataset for training, validation, and testing are split in the same manner. The VashaNet model design is employed consistently for the recognition of 50 fundamental Bangla characters throughout the entire process. The scanned images obtained during the primary data collection phase go through several pre-processing operations to ensure their suitability as a test set. After determining the optimal hyper-parameter configuration, the proposed VashaNet model undergoes multiple training iterations, each utilizing varying values for batch size and learning rate.

1.1. Contributions

The major technical contributions of this research are summarized as follows:

• We developed a unique VashaNet model for recognizing Bangla handwritten basic characters efficiently. Since our primary dataset is not tied to any existing source code, the general structure of our method for improving BHCR is different from all other models, which highlights its novelty.
• We provide a primary dataset of 5750 images that were collected directly from writers with different styles. This distinctive primary dataset allows the model to learn from real-world handwriting differences, making it more successful at recognizing Bangla characters.
• A further significant contribution consists of experimenting with 5750 images chosen from the 15000 images of the CMATERdb dataset for training, validation, and testing. The model can now identify a greater variety of Bangla characters owing to the diversification of the training process.

1.2. Organization

The rest of this article is organized as follows: the related works are summarized in Section 2. In Section 3, we describe the proposed methodology in detail, which consists of the dataset collection, processing, and the train-validation-test split, and we propose a neural network architecture. In Section 4, the recognition experimental results are presented with a detailed explanation of the evaluation criteria of the proposed model. Finally, our conclusion is given in Section 5.

2. Related work

In this section, previous studies on HCR in Bangla are summarized. These studies laid the foundation for subsequent research in the field. Rahman, Akhand, Islam, Shill, Rahman, et al. (2015) suggested applying a normalization process to written character images prior to utilizing CNN for classifying them; however, it does not use feature extraction. The proposed BHCR-CNN model demonstrated an accuracy rate of 85.36% when tested on a dataset consisting of 20,000 handwritten characters exhibiting diverse forms and variations.


In another research (Sarkhel, Das, Saha, & Nasipuri, 2016), the authors presented a cost-effective HCR method that establishes a balance between cost, quality, and recognition while searching for a solution across all alternatives. Moreover, this method obtained 87.28% recognition accuracy for the CMATERdb dataset, which has 50 Bangla handwriting styles. Purkaystha, Datta, and Islam (2017) proposed a DCNN model to decode Bengali handwriting. It used kernels and local receptive fields to gather useful features before starting the discrimination task, followed by fully connected dense layers. The researchers employed the BanglaLekha-Isolated dataset to evaluate their approach, which achieved a letter identification accuracy of 91.23% over 50 character classes. They scaled all the images to the same size (28 × 28). The dataset was randomly shuffled and split: a portion of 85% was allocated for the training process, while the remaining 15% was reserved for assessing accuracy. Shaikh, Tabedzki, Chaki, and Saeed (2013) put forward a proposal for a system that recognizes Bengali characters based on their visual features. The categorization of feature vectors is performed by a k-nearest neighbors (K-NN) classifier using dynamic time warping (DTW) as a distance metric. The recognition accuracy on the system's datasets is 76.80%. An HCR strategy was developed by Chowdhury et al. (2019) through the utilization of data augmentation techniques; only BanglaLekha datasets were utilized in their analysis. Their CNN demonstrated a high level of accuracy, achieving a rate of 91.81% on the training dataset consisting of 50 character classes. In order to achieve a weighted recall of 95%, the recall values for each class were combined; the level of accuracy was 95%, and the mean weighted F1 score was also 95%. In their study, Maitra, Bhattacharya, and Parui (2015) implemented a CNN model utilizing a standardized 50-class Bangla basic character database. The researchers extracted features from this database to address five distinct numeral identification tasks, each comprising 10 classes. These tasks involved recognizing numerals from English, Devanagari, Bangla, Telugu, and Oriya scripts, all of which hold official status in India. Roy, Das, Kundu, and Nasipuri (2017) created another web-based Bangla handwriting recognition system. This study employs a quadratic classifier to recognize online handwriting data collected by mouse and touch screen input. It examined 12500 Bangla characters and 2500 Bangla numbers; the method had a 98.42% accuracy rate on 10 number classes and 91.13% accuracy on 50 character classes. Rabby, Islam, Hasan, Nahar, and Rahman (2020) introduced Borno, the first multiclass convolutional neural network model for grapheme-based handwritten letters in Bangla, and gathered 1,069,132 pictures from Ekush, MatriVasha, CMATERdb, and BanglaLekha-Isolated to train a model. The trained Borno model recognizes characters with 92.61% accuracy on the validation set. Das, Hasan, Jani, Tabassum, and Islam (2021) built an extended CNN model to read Bangla handwriting. It uses the BanglaLekha-Isolated dataset, which has 10 numbers, 11 vowels, and 39 consonants, to test their CNN model. The model recognizes Bangla digits at 99.50%, vowels at 93.18%, consonants at 92.25%, and mixed classes at 92.25%. The study by Zhang et al. (2023) addresses the challenge of accurately identifying curved characters that include intricate 3-D structures and are subject to significant surface reflections. The authors suggest a character identification system based on machine vision techniques specifically designed for the iron caps of suspension insulators. Jadhav, Gadge, Kharde, Bhere, and Dokare (2022) used CNN to develop a low-cost architecture for Bengali character recognition using datasets such as CMATERdb, BanglaLekha-Isolated, and Ekush. The suggested approach uses a CNN on both simulated and actual data. CMATERdb, BanglaLekha-Isolated, and Ekush recognized 87%, 89.60%, and 83.10% of the terms, respectively. In this research, we introduce a VashaNet model consisting of nine convolutional layers, six max pooling layers, two dropout layers, five batch normalization layers, one flattening layer, two fully connected dense layers, and one output layer. The purpose of this model is to recognize handwritten Bangla characters using our novel primary dataset.

A summary of the existing literature on the character recognition task is given in Tables 1 and 2.

Fig. 1. Flowchart of the study.

3. Methodology

In this section, the methodologies of this research are presented with appropriate explanation. The flowchart of this study is given in Fig. 1 and describes each step sequentially.

3.1. Dataset collection

Despite the existence of datasets within the research community, gathering a new dataset for Bangla handwritten character recognition is crucial for a number of reasons. First, it provides more character complexity, style, and variation to the research community. Researchers can train their models more successfully to identify various writing styles and variants by giving them access to a more robust and clean dataset that has a greater variety of handwritten characters; this will ultimately increase the models' accuracy and performance. Furthermore, an entirely novel dataset may present additional difficulties and unpredictability, enabling researchers to push the limits of existing approaches and algorithms for handwriting recognition. This expanded dataset can be extremely useful for tasks such as script comprehension, signature recognition, summarization, and other practical applications that demand precise and diverse character recognition capabilities. Furthermore, building a new dataset helps address potential concerns with data quality and labeling accuracy in currently available datasets. By curating a clean and precisely labeled dataset, researchers may prevent biases, inconsistencies, and inaccuracies that could impair the performance and generalizability of models trained on such data. In addition, a new dataset can incorporate the most current developments in data collection techniques, ensuring that it correctly and thoroughly represents real-world circumstances. By offering a more recent dataset, researchers can remain up to date with changes in the field and encourage innovation in Bangla handwritten character recognition research.

The dataset was collected from the students at Harinarayanpur High School in Noakhali. This collection, which consists of 5750 primary images, was carefully compiled and organized, and it possesses a wealth of diversity. The students, including individuals who were right-handed as well as left-handed, made distinct contributions through their varied handwriting styles. These contributions were made on standard A4-sized papers with a properly designed grid system. The layout of each page is organized into a grid structure consisting of 10 rows and 5 columns.


Table 1
Tabular form of literature review.

Rahman et al. (2015)
Proposed idea: Suggested applying a normalization process to written character images prior to utilizing CNN for classifying them.
Research gap: It does not use feature extraction.

Sarkhel et al. (2016)
Proposed idea: The authors presented a cost-effective HCR method that establishes a balance between cost, quality, and recognition while searching for a solution across all alternatives. Moreover, this method obtained 87.28% recognition accuracy for the CMATERdb dataset, which has 50 Bangla handwriting styles.
Research gap: There is a research gap in comparing the model to more advanced methods.

Purkaystha et al. (2017)
Proposed idea: Proposed a DCNN model to decode Bengali handwriting. It used kernels and local receptive fields to gather useful features before starting the discrimination task, followed by fully connected dense layers. The researchers employed the BanglaLekha-Isolated dataset to evaluate their approach, which achieved a letter identification accuracy of 91.23% over 50 character classes. They scaled all the images to the same size (28 × 28). The dataset was randomly shuffled and split; a portion of 85% was allocated for the training process, while the remaining 15% was reserved for assessing accuracy.
Research gap: Errors in the recognition task arise from the close similarity of letter forms to one another. Furthermore, a considerable percentage of mistakes are attributable to mislabeled, irreversibly distorted, and invalid data instances in the collection.

Shaikh et al. (2013)
Proposed idea: Put forward a proposal for a system that recognizes Bengali characters based on their visual features. The categorization of feature vectors is performed by a k-nearest neighbors (K-NN) classifier using dynamic time warping (DTW) as a distance metric. The recognition accuracy on the system's datasets is 76.80%.
Research gap: The segmentation methods used may not produce appropriate results for every character. The presented technique produces good results for printed Bengali characters but may have difficulty generalizing to handwritten characters. While the suggested technique has a recognition accuracy of 76.8%, there is still room for optimization and development.

Chowdhury et al. (2019)
Proposed idea: The HCR strategy was developed through the utilization of data augmentation techniques; only BanglaLekha datasets were utilized in this analysis. A CNN demonstrated a high level of accuracy, achieving a rate of 91.81% on the training dataset consisting of 50 character classes. In order to achieve a weighted recall of 95%, the recall values for each class were combined; the level of accuracy was 95%, and the mean weighted F1 score was also 95%.
Research gap: There is a research gap for sequence-based character recognition. Furthermore, the article does not address the possibility of transfer learning, in which models pre-trained on similar tasks or datasets are fine-tuned for Bangla handwriting recognition. Further research might concentrate on improving the model architecture and deployment methodologies for low-latency inference, allowing real-time recognition applications on resource-constrained devices.

Maitra et al. (2015)
Proposed idea: Implemented a CNN model using a standardized 50-class Bangla basic character database. The researchers extracted features from this database to address five distinct numeral identification tasks. These tasks involved recognizing numerals from English, Devanagari, Bangla, and Oriya scripts, all of which hold official status in India.
Research gap: More efficient feature extraction algorithms are needed, particularly for character recognition applications with little training data. Further research is needed to determine the usefulness and limits of their strategy and to correctly assess the performance of recognition systems, including CNN-based techniques.

Each row is composed of 5 grid boxes, and each grid box has dimensions of 1.5 inches by 1.5 inches. The number of grid boxes on a page is therefore 50, with each box accommodating a single basic character. Through this careful organization, 50 characters are able to occupy their proper positions, thereby exhibiting a wide range of writing styles and subtleties. The conversion from a physical to a digital medium was executed carefully: every page of handwriting was scanned at 800 dpi, which means that every stroke and curve was captured with the highest level of detail possible, as depicted in Fig. 2.

The careful segmentation of the dataset refines it further. Following the grid lines, each character was extracted, turning the handwritten pages into a large collection of individual images. There are a total of 5750 images stored in the 50 folders that make up the collection, with each folder holding 115 images. Additionally, the experiment used the CMATERdb dataset (5750 images taken). Samples from the different datasets are shown in Figs. 3 and 4.

We used a comprehensive methodology to evaluate our model's performance in recognizing Bangla handwritten characters on our primary dataset. This approach incorporated conventional evaluation methods as well as particular factors that were pertinent to our task and dataset. Initially, we meticulously preprocessed the dataset, ensuring that we normalized the data and resized it to improve consistency and diversity. Partitioning the dataset into distinct subsets for training, validation, and testing facilitated the development and assessment of the model, hence confirming its performance. We employed performance metrics such as F1-score, accuracy, precision, and recall in conjunction with confusion matrix analysis to evaluate the efficacy of the model. The comparison with baseline approaches provided valuable insights into the model's enhancement. Furthermore, we took into account the balance of the data, quality management, and the special features of Bangla handwritten characters, ensuring that the dataset is both representative and applicable to many contexts. Ethical considerations, of utmost importance, guided the responsible management of sensitive data. We maintained thorough documentation and metadata to promote transparency and reproducibility, thereby simplifying future research endeavors. By combining these methods, we performed a comprehensive assessment of our main dataset, thereby improving the trustworthiness and dependability of our research findings in Bangla handwritten character recognition.


Table 2
Tabular form of literature review.

Roy et al. (2017)
Proposed idea: Created another web-based Bangla handwriting recognition system. This study employs a quadratic classifier to recognize online handwriting data collected by mouse and touch screen input. It examined 12500 Bangla characters and 2500 Bangla numbers. This method had a 98.42% accuracy rate on 10 number classes and 91.13% accuracy on 50 character classes.
Research gap: While the Supervised layer-DCNN (SL-DCNN) has potential in machine learning, its use in related fields of study remains unexplored; more research is necessary to determine how SL-DCNN may be used in other fields. Enhancing SL-DCNN's capabilities could involve utilizing current developments in deep learning, such as Maxout units, Dropout, and Transfer Learning. Research aimed at incorporating these methods into SL-DCNN designs is needed.

Rabby et al. (2020)
Proposed idea: Introduced Borno, the first multiclass convolutional neural network model for grapheme-based handwritten letters in Bangla, and gathered 1,069,132 pictures from Ekush, MatriVasha, CMATERdb, and BanglaLekha-Isolated to train a model. The trained Borno model recognizes characters with 92.61% accuracy on the validation set.
Research gap: There may be a lack of research on language-specific optimization strategies designed for Bangla character recognition tasks that could improve performance.

Das et al. (2021)
Proposed idea: Built an extended CNN model to read Bangla handwriting. It uses the BanglaLekha-Isolated dataset, which has 10 numbers, 11 vowels, and 39 consonants, to test their CNN model. The model recognizes Bangla digits at 99.50%, vowels at 93.18%, consonants at 92.25%, and mixed classes at 92.25%.
Research gap: There is a research gap in comparing the model to more advanced methods.

Jadhav et al. (2022)
Proposed idea: Used CNN to develop a low-cost architecture for Bengali character recognition using datasets such as CMATERdb, BanglaLekha-Isolated, and Ekush. The suggested approach uses a CNN on both simulated and actual data. CMATERdb, BanglaLekha-Isolated, and Ekush recognized 87%, 89.60%, and 83.10% of the terms, respectively.
Research gap: There is still room for optimization and improvement in accuracy.

M. Raquib (Proposed)
Proposed idea: Proposed a novel DCNN architecture with a novel dataset.
Research gap: The model may be made more flexible and useful by adding compound Bangla characters and numbers to the dataset. Combining BHCR with other areas of computer vision, such as segmentation and object identification, provides new perspectives on how to increase recognition precision in intricate layouts.

Fig. 2. Example of primary Bangla handwritten characters dataset.

Fig. 3. Samples from primary dataset.

3.2. Dataset preprocessing

Preprocessing is one of the most critical aspects of HCR, since the accuracy of the HCR system is strongly dependent on the quality of preprocessing and segmentation. The preprocessing phase's major goal is to make it as simple as possible for the handwritten character recognition system to differentiate a character from the background. Images are preprocessed to eliminate distractions, improve readability by adjusting contrast and brightness, and standardize character size and alignment. The Gaussian filter is used to reduce noise and soften edge sharpness. Gaussian filters are applied for the purpose of image smoothing, allowing the reduction of minor fluctuations in pixel values. This step is important for handwritten character recognition because it lessens the effect of small variations in writing style.

Fig. 4. Samples from CMATERdb dataset.

A scanned paper may occasionally be somewhat skewed, meaning the image is aligned with the horizontal at an angle. While extracting data from a scanned image, skew must be found and corrected, which is done using the Hough Transform algorithm. The application of the Hough Transform enables the identification of lines inside an image, including lines that are inclined or distorted. The use of dilation allows the consistency of the stroke width to be adjusted, because different authors have their own individual writing styles and the width of their strokes varies. The grid lines were used to segment the pages into distinct character images. The segmented character images were subsequently organized into folders based on their respective image classes: there were 50 folders, each containing images of a distinct class. To precisely segment the character region of each image, the bounding box method is used. As a result, the dataset's size was decreased and the focus was kept on the character. The primary dataset that we collected, as well as the CMATERdb dataset, follow the same preprocessing technique. The remaining preprocessing stages are described below.

• RGB to Gray: Image conversion from red-green-blue (RGB) to grayscale is an important part of this procedure. An RGB image contains a lot of information that might not be needed for processing; many details that are unnecessary for processing are removed when converting an RGB image to grayscale. All of the incoming images are loaded as grayscale images and then turned into 3D tensors. This lets the model see finer details in the incoming images.
• Gray to 3D and 3D to 4D Conversion: The 3D tensors (224 × 224 × 3) are subsequently transformed into 4D tensors (1 × 224 × 224 × 3). Several channels, such as the red, green, and blue components of a color image, can be processed with relative ease due to the channel dimension of the 4D tensor. With this technique, a more comprehensive depiction of Bangla handwriting can be acquired. The input process for the DCNN in Bangla handwritten character recognition is made simpler by converting images from 3D to 4D tensors.
• Rescale: At the last stage, the images are rescaled by dividing each pixel value by 255, so that all pixel values fall between 0 and 1. Rescaling also decreases lighting or brightness disparities between images. Fig. 5 shows an original scanned image and Fig. 6 shows the result after splitting the image into characters.

Fig. 5. Original scanned image.

Fig. 6. Image after segmentation.
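As a rough illustration of the preprocessing chain described above (Gaussian smoothing, Hough-based skew correction, dilation, bounding-box cropping, grayscale conversion, rescaling to [0, 1], and expansion to a 4D tensor), the following OpenCV/NumPy sketch can be used. The function names, thresholds, and kernel sizes are illustrative assumptions rather than the authors' exact values.

```python
# Illustrative preprocessing sketch; parameter values are assumptions.
import cv2
import numpy as np

def deskew(gray):
    # Estimate page skew from straight lines found by the Hough transform
    # and rotate the image back to the horizontal.
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=200)
    if lines is None:
        return gray
    angles = [theta - np.pi / 2 for rho, theta in lines[:, 0]]
    angle_deg = np.degrees(np.median(angles))
    h, w = gray.shape
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(gray, rot, (w, h), borderValue=255)

def preprocess_character(img_bgr, size=224):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)      # RGB/BGR -> gray
    gray = cv2.GaussianBlur(gray, (3, 3), 0)               # smooth small variations
    gray = deskew(gray)
    # Normalize stroke width slightly with dilation on the inverted image.
    binary = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    binary = cv2.dilate(binary, np.ones((2, 2), np.uint8), iterations=1)
    # Crop to the bounding box of the character region.
    x, y, w, h = cv2.boundingRect(cv2.findNonZero(binary))
    crop = gray[y:y + h, x:x + w]
    # Resize, rescale to [0, 1], and add channel/batch dimensions (4D tensor).
    resized = cv2.resize(crop, (size, size)).astype(np.float32) / 255.0
    three_channel = np.repeat(resized[..., np.newaxis], 3, axis=-1)  # 224 x 224 x 3
    return three_channel[np.newaxis, ...]                            # 1 x 224 x 224 x 3
```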
3.3. Train-validation-test set

The present work uses two distinct datasets: a collection of 5750 images obtained from our primary dataset, and a set of 5750 images (out of 15000) sourced from CMATERdb. Each dataset is partitioned into an 80:20 ratio, allocating 80% of the data for training purposes and reserving the remaining 20% for validation. The validation subset is then separated into equal parts for subsequent testing. Fig. 7 summarizes the splitting of the two datasets into train, validation, and test sets.
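One way to realize this 80:20 split, with the hold-out portion then divided evenly into validation and test subsets, is sketched below with scikit-learn. The directory layout and helper names are assumptions for illustration, not part of the published pipeline.

```python
# Hypothetical split of image paths/labels: 80% train, 10% validation, 10% test.
from pathlib import Path
from sklearn.model_selection import train_test_split

def split_dataset(root="primary_dataset"):
    paths, labels = [], []
    for class_dir in sorted(Path(root).iterdir()):        # 50 class folders
        for img in class_dir.glob("*.png"):
            paths.append(str(img))
            labels.append(class_dir.name)

    # 80% train, 20% hold-out.
    train_p, hold_p, train_y, hold_y = train_test_split(
        paths, labels, test_size=0.20, stratify=labels, random_state=42)
    # Hold-out divided evenly into validation and test.
    val_p, test_p, val_y, test_y = train_test_split(
        hold_p, hold_y, test_size=0.50, stratify=hold_y, random_state=42)
    return (train_p, train_y), (val_p, val_y), (test_p, test_y)
```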


Fig. 7. Summary of the dataset splitting.

Fig. 8. Proposed VashaNet architecture.

3.4. Proposed VashaNet architecture

DCNN is one of the most efficient techniques for classifying images. DCNN was conceived with the visual cortex of the brain in mind: the human brain has many layers, each of which may pick up on increasingly nuanced details in order to identify an object. In the digital realm, a picture can be sent to a DCNN image classifier, which will then analyze it and place it into one of several predetermined categories. Keras and TensorFlow were utilized to build the model that was used to categorize the Bangla handwritten characters. To train the neural network model, the Bangla handwritten letters are divided into 50 categories and a multilayer convolutional neural network is employed. The proposed VashaNet model is a sequential classifier, which we define in terms of the DCNN architecture presented in Fig. 8. The two datasets used in our research were subjected to the same architectural framework. Each layer of our proposed VashaNet architecture is described briefly in the following:

• Convolution Layer 1: Layer 1 of the classifier is a 2D convolutional layer with 16 filters, used for capturing local patterns and features, and a kernel size of (5, 5); the kernel traverses the input in stride increments. The elements inside the kernel are multiplied individually with the corresponding elements of the input data at each position, and the resulting products are summed to produce a single value within the output feature map. The activation function is ReLU, which provides the network with non-linearity, allowing it to learn complex data correlations. We set padding = "same" to ensure that the size of the output and input are equal. The output of this layer is (224 × 224 × 16), where 16 is the number of filters.
• Convolution Layer 2: This layer is similar to Layer 1, but not identical: it contains 32 filters. This layer can identify more intricate patterns in the feature map thanks to its larger filter count (32), with the same activation function. This layer generates a (224 × 224 × 32) output.
• Convolution Layer 3: It uses a 2D convolutional operation with a collection of 64 different filters, building on the framework established by the earlier layers. The increased number of filters improves its ability to distinguish more complex features, making it a key participant in advancing the network's abstraction. The spatial dimensions of the output feature map from this layer are (224 × 224 × 64).
• Max Pooling Layer: The output of the preceding layer is sent into a max-pooling layer, which reduces the current shape to (112 × 112 × 64). It uses a pool size of (2, 2), which denotes a 2 × 2 window for the downsampling process. Max-pooling keeps the most important components from each 2 × 2 region while cutting the spatial dimensions in half.
• Sequential Layer 1: This sequential block is made up of two convolutional layers with 32 filters, each employing 3 × 3 kernels, ReLU activation, and 'same' padding, followed by batch normalization for standardization and a 2D max-pooling layer for downsampling. The output shape from this layer is (56 × 56 × 32).
• Max Pooling Layer: This max-pooling layer uses the same configuration as the first max-pooling layer and reduces the dimensions further to (28 × 28 × 32).
• Sequential Layer 2: This sequential block is a copy of the previous Sequential Layer 1, except for the filter count of 64 and the output shape, which is (14 × 14 × 64).
• Max Pooling Layer: Except for the output being (7 × 7 × 64), this layer is identical to the other max pooling layers.
• Sequential Layer 3: With the exception of the output shape, which is (3 × 3 × 128), and the filter count of 128, this sequential block is an exact duplicate of the prior Sequential Layer 2.
• Flatten Layer: This takes the result of all the earlier operations and applies the flatten layer, turning it into a flattened array.
• Sequential Layer 4: This sequential block is made up of a fully connected dense layer with 512 hidden units and ReLU activation, one batch normalization layer, and a 30% dropout layer.
• Sequential Layer 5: This sequential block is exactly the same as the previous Sequential Layer 4 except for having 256 hidden units.
• Output Layer: Since the multiclass classifier uses 50 classes, the final layer is a fully connected dense layer with 50 units and softmax as the activation function.
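To make the layer-by-layer description above concrete, here is a minimal Keras sketch of the architecture as summarized in Table 3 (nine convolutional layers, six max pooling layers, five batch normalization layers, two dropout layers, one flatten layer, two dense layers, and a 50-way softmax output). Details not fully specified in the text, such as strides and initializers, are left at library defaults and should be treated as assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(filters):
    # "Sequential" block used three times: two 3x3 convolutions,
    # batch normalization, and 2x2 max pooling (see Table 3).
    return models.Sequential([
        layers.Conv2D(filters, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(filters, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
    ])

def dense_block(units):
    # "Sequential" block used twice: dense layer, batch normalization,
    # and 30% dropout.
    return models.Sequential([
        layers.Dense(units, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
    ])

def build_vashanet(input_shape=(224, 224, 3), num_classes=50):
    return models.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(16, (5, 5), padding="same", activation="relu"),
        layers.Conv2D(32, (5, 5), padding="same", activation="relu"),
        layers.Conv2D(64, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),   # 224 -> 112
        conv_block(32),                # 112 -> 56
        layers.MaxPooling2D((2, 2)),   # 56 -> 28
        conv_block(64),                # 28 -> 14
        layers.MaxPooling2D((2, 2)),   # 14 -> 7
        conv_block(128),               # 7 -> 3
        layers.Flatten(),              # 3 * 3 * 128 = 1152
        dense_block(512),
        dense_block(128),              # note: Table 3 lists 128 units here
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_vashanet()
model.summary()  # parameter counts should track those reported in Table 3
```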


Table 3
VashaNet model architecture summary.

Convolution2D 1: output shape (None, 224, 224, 16), 1,216 parameters
Convolution2D 2: output shape (None, 224, 224, 32), 12,832 parameters
Convolution2D 3: output shape (None, 224, 224, 64), 51,264 parameters
Max Pooling2D: output shape (None, 112, 112, 64), 0 parameters
Sequential 1 (Convolution2D 4, Convolution2D 5, Batch Normalization 1, Max Pooling2D): output shape (None, 56, 56, 32), 27,840 parameters
Max Pooling2D: output shape (None, 28, 28, 32), 0 parameters
Sequential 2 (Convolution2D 6, Convolution2D 7, Batch Normalization 2, Max Pooling2D): output shape (None, 14, 14, 64), 55,680 parameters
Max Pooling2D: output shape (None, 7, 7, 64), 0 parameters
Sequential 3 (Convolution2D 8, Convolution2D 9, Batch Normalization 3, Max Pooling2D): output shape (None, 3, 3, 128), 221,952 parameters
Flatten: output shape (None, 1152), 0 parameters
Sequential 4 (Dense Layer 1, Batch Normalization 4, Dropout 30%): output shape (None, 512), 592,384 parameters
Sequential 5 (Dense Layer 2, Batch Normalization 5, Dropout 30%): output shape (None, 128), 66,176 parameters
Output Layer: output shape (None, 50), 6,450 parameters

3.5. Training the model

To start, we used a batch size of 24 for training our VashaNet model on the dataset. After training for 50 iterations with a learning rate of 0.008 with the Adam optimizer, we achieved respectable results on our dataset. A summary of our VashaNet model is given in Table 3.
3.6. Dropout whole, it can be seen that neither of them has very good accuracy in the
beginning stage. On the other hand, the accuracy of validation is lower
The dropout layer is utilized during training to arbitrarily ignore than the accuracy of training, means my model does not overfit the
a specific proportion of neurons. This indicates that the weight and data we used to train it. Fig. 10 represents a comparison between the
bias of those neurons are not changed in the backward pass, so their loss incurred during training and the loss incurred during validation.
contributions to activation in the forward pass are slowly lost over time. This indicates that the training loss and the validation loss were both
By doing so, it brings the neural network into order. With the addition much higher at the beginning of the process. The validation loss and the
training loss both go down with each successive epoch. The validation
of this layer, convolutional neural networks may be trained without
loss is still bigger than the training loss, which can be determined
risking overfitting.
simply looking at the graph. This is as a result of the model in use not
So, selecting the best deep convolutional neural network with 26
being overfit to our training data.
layer for bangla handwritten character image classification task for our
primary dataset is dependent on a number of factors, including the
4.1.1. Optimizer (adam)
size of our dataset, the computational resources available, reviewing
In Adam, many learning rates were investigated for training. Among
the existing architecture, the transfer learning potential, experiments, different learning rate on Adam, 0.0008 learning rate provide the
fine-tuning and optimization, iterating and refining, performance eval- higher degree of accuracy. This is a summary of Adam optimizer with
uation, and the specific requirements of our application. Prior research different learning rates in Table 4:
and experiments in the field of image classification, particularly for Table 4 illustrates the training and validation losses and accuracies
comparable tasks involving character recognition, are likely to have produced using the Adam optimizer for various learning rates. Among
influenced the architectural decisions, including the number of layers, the different learning rates, 0.0008 learning rate achieved a maximum
layer varieties, and layer configurations. We select our VashaNet model 94.60% test accuracy.
following significant investigation and experimenting with 36 different
architectural variants which was customized to our basic collection 4.1.2. Batch sizes
of 5750 images. We design a 26-layer DCNN architecture which is The proposed Vashanet model was trained using batches of 16, 24,
depicted in Fig. 8 and Table 3. The first step in choosing the parameters and 32 respectively. Fig. 11 displays the optimal outcome for a batch
for a CNN architecture is to comprehend the features of our dataset, size of 24. However, training takes more time with bigger batches, and
such as its complexity and size. To increase the number of filters at training per epoch has gone up by a large margin.


Fig. 9. Comparison between training and validation accuracy.

Fig. 10. Comparison between training and validation loss.

Fig. 11. Performance analysis for different batch sizes.

Fig. 12. Performance analysis for different train–test split ratio.

4.1.3. Train–test split ratio

During model training, we used a range of four possible split ratios. Eighty-to-twenty is the most precise ratio, as shown in Fig. 12. The validation split is then partitioned into a test dataset and a validation dataset. Fig. 12 displays a summary of the performance of the different splitting ratios.

Table 4
Effect of different learning rates.
Learning rate  Training loss  Training accuracy (%)  Validation loss  Validation accuracy (%)
0.0001  0.0075  99.83  0.0075  96.87
0.0002  0.0362  98.24  0.2283  93.39
0.0003  0.0124  99.70  0.1949  94.43
0.0004  0.0165  99.52  0.3316  93.39
0.0005  0.0207  99.33  0.4039  90.78
0.0006  0.0413  98.61  0.1897  93.74
0.0007  0.0103  99.76  0.2142  94.26
0.0008  0.0106  99.70  0.1904  94.78
0.0009  0.0320  98.98  0.3578  94.26
0.001   0.0216  99.28  0.3580  90.78

4.1.4. Confusion matrix

The confusion matrix provides a summary of the prediction results: the expected classification outcome is briefly described in the confusion matrix table. It provides useful information for understanding the model by indicating the proportion of correct and wrong predictions. We create the confusion matrix on the validation data split used in our train–test configuration, which is shown in Fig. 13. The confusion matrix reveals that 'do', 'bha', and 'ddra' were misidentified the most. The structural similarities between 'do', 'bha', and 'ddra' may be the root cause of this mistake.

Table 5
Precision score on primary dataset.
Dataset name: Test Data, Precision score (%): 94.60

4.1.5. Precision

Precision yields an optimistic estimated value: the percentage of positive predictions made by a model that are accurate serves as evidence of its predictive efficacy. The formula for determining the precision rating is as follows (Ahmed et al., 2022):

$Precision = \frac{TP}{TP + FP}$  (2)

We calculated the precision score for the test data; the score is given in Table 5. Our VashaNet model's precision score of 94.60% shows how well it can pick out positive events from the total number of positive events that have been predicted. With such a high precision rating, we can be sure that the model knows how to read handwritten Bangla characters correctly. The precision score is useful, but it does not tell the whole story of the model's efficacy. For a more complete picture, it is also crucial to look at metrics like recall, F1 score, and accuracy. Our VashaNet model's precision score of 94.60% reflects its success in recognizing positive occurrences among the total predicted positive instances.


Fig. 13. Confusion matrix for 50 classes on validation data.

4.1.6. Recall

The recall score is a metric used to assess a model's ability to predict positive classifications accurately. It represents the value of a positive prediction over all classes that were successfully predicted: the proportion of samples correctly classified as positive in relation to all real positive samples. Below is the formula for determining the recall score (Ahmed et al., 2022):

$Recall = \frac{TP}{TP + FN}$  (3)

We also calculated the recall score for the test data images using the equation and methods described above. The score is given in Table 6.

Table 6
Recall score on primary dataset.
Dataset name: Test Data, Recall score (%): 95.25

Our VashaNet model's recall score of 95.25% suggests that it can reliably recognize positive events: the model accurately identified 95.25% of all true positives. A greater recall score is recommended when the effects of false negatives are substantial; however, in other contexts, where false positives are expensive, high precision scores are more crucial (see Table 6).

4.1.7. F1 score

During the experimentation phase, we conducted tests using the testing portion of the dataset. Using our VashaNet model, we were able to attain an F1 score of 94.62%. This shows that the model has a good balance between precision (how many of the predicted characters are right) and recall (how many of the actual characters are correctly identified by the model). The F1 score is calculated using the following equation (Ahmed et al., 2022):

$F1\text{-}Score = \frac{2 \times (Recall \times Precision)}{Precision + Recall}$  (4)

The F1 score is given in Table 7.

Our VashaNet model, trained on Bangla handwritten character recognition, achieved an F1 score of 94.62%, which is a good sign that the model can be used effectively in real-world applications. The F1 score is obviously beneficial; however, it is necessary to take into consideration several other factors when selecting an appropriate model for a specific assignment. Additional factors that require consideration include measures related to computing efficiency, data quality, and practical applicability.

Table 7
F1 score on primary dataset.
Dataset name: Test Data, F1 score (%): 94.62

Fig. 14. Result analysis according to precision, recall, and F1 score.

A summary of the experimental results for each class with respect to precision, recall, and F1 score is shown in Fig. 14.

4.1.8. Specificity

Our VashaNet model's specificity score of 99.89% for Bangla handwritten character identification indicates that the model properly recognizes 99.89% of the negative cases, that is, situations in which the model correctly detects characters that are not present in the input. The specificity score can be calculated using the following equation (Chu, 1999):

$Specificity = \frac{TN}{TN + FP}$  (5)

The specificity score is given in Table 8.

Table 8
Specificity score on primary dataset.
Dataset name: Test Data, Specificity score (%): 99.89

This high specificity score indicates that the model performs well in distinguishing between various Bangla characters, with few misclassifications of unrelated characters (see Table 8).

4.2. Performance analysis over pre-trained models

Our proposed VashaNet technique correctly identifies handwritten Bengali characters. To underline its significance, our VashaNet technique is evaluated against four pre-trained models: VGG-16, MobileNet, DenseNet, and ResNet50. Our model outperforms them all. With a remarkable test accuracy of 94.60%, VashaNet demonstrates its efficacy in accurately determining the various character types. This superior performance extends across multiple evaluation metrics, including precision (94.60%), F1-score (94.62%), recall (95.25%), and specificity (99.89%). Such comprehensive superiority underscores the robustness and reliability of our proposed method for Bangla handwritten character image analysis.

Fig. 15. Result analysis according to training accuracy, validation accuracy, test accuracy, precision, recall, F1, and specificity.

A summary of the experimental results for VashaNet, VGG-16, MobileNet, ResNet50, and DenseNet with respect to training accuracy, validation accuracy, test accuracy, precision, recall, F1 score, and specificity is shown in Fig. 15.

4.3. Model performance over other datasets

This study examines how well the proposed DCNN model, VashaNet, can recognize handwritten characters in Bangla using two datasets: the primary dataset that we provide and the benchmark dataset CMATERdb. Across the two datasets, the study compares the model's performance in terms of precision score, recall score, F1 score, specificity, training accuracy, validation accuracy, training losses, and validation losses. The outcomes demonstrate how well the model performs on both datasets for handwritten character recognition in Bangla.

4.3.1. Accuracy and losses

After running 50 epochs with a batch size of 24 and utilizing the Adam optimizer with a 0.0008 learning rate, our VashaNet model achieved training accuracies of 99.70% and 99.70% on the primary dataset and the CMATERdb dataset, respectively. Furthermore, the VashaNet model obtains a validation accuracy of 94.78% for the primary dataset and 92.70% for the CMATERdb dataset. Similarly, our model achieves 94.60% and 94.43% test accuracy for the primary and CMATERdb datasets, respectively.

Fig. 16. Training accuracy, validation accuracy and test accuracy of the overall datasets.

The experiment shows that our VashaNet model performs comparably on both datasets. The training accuracy, validation accuracy, and test accuracy over the datasets are shown in Fig. 16.


Table 9
Summary of VashaNet model performances on different datasets.
Dataset Training Validation Training Validation
loss loss accuracy (%) accuracy (%)
Primary 0.0106 0.1904 99.70 94.78
CMATERdb 0.0132 0.3170 99.70 92.70

Fig. 17. Accuracy and losses of the primary dataset.

Fig. 18. Accuracy and losses of the CMATERdb dataset.

A summary of the overall performance of the VashaNet model is given in Table 9.

Comparing the left-hand portions of the curves in Figs. 17 and 18, the training accuracy and the validation accuracy initially track each other closely. The gap between the losses incurred during training and those incurred during validation can be seen in the right-hand portions of Figs. 17 and 18: both losses start at comparatively high values in the early epochs and then decline with each successive epoch. The plots also show that the validation loss consistently stays above the training loss and, correspondingly, that the validation accuracy remains slightly below the training accuracy. Because this gap stays small throughout training, the model does not appear to overfit the data it was trained on for either dataset.
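The accuracy and loss curves of the kind shown in Figs. 17 and 18 can be drawn directly from the History object that Keras returns after training; the short sketch below is our own illustration (it assumes a history variable such as the one produced in the training sketch above).

```python
# Plot training/validation accuracy and loss side by side, as in Figs. 17 and 18.
import matplotlib.pyplot as plt

def plot_history(history):
    """history: a keras.callbacks.History containing per-epoch accuracy and loss."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

    ax_acc.plot(history.history["accuracy"], label="training accuracy")
    ax_acc.plot(history.history["val_accuracy"], label="validation accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    ax_loss.plot(history.history["loss"], label="training loss")
    ax_loss.plot(history.history["val_loss"], label="validation loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    fig.tight_layout()
    plt.show()

plot_history(history)
```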
4.3.2. Precision
The VashaNet model demonstrates precision scores of 94.60% and 94.32% for the primary dataset and the CMATERdb dataset, respectively. A summary of the precision scores over the two datasets is shown in Fig. 19.

Fig. 19. Precision score analysis between the two datasets.

4.3.3. Recall
The VashaNet model displays recall scores of 95.25% for the primary dataset and 93.74% for the CMATERdb dataset. Fig. 20 gives an overview of the recall scores on the two datasets.

Fig. 20. Recall score analysis between the two datasets.

4.3.4. F1 score
The VashaNet model obtains F1 scores of 94.62% and 93.74% on the primary dataset and the CMATERdb dataset, respectively. Fig. 21 displays the F1-score summary for both datasets.

Fig. 21. F1 score analysis between the two datasets.

4.3.5. Specificity
The VashaNet model obtains specificity scores of 99.89% and 99.87% on the primary dataset and the CMATERdb dataset, respectively. Fig. 22 displays the specificity scores of the two datasets.

Fig. 22. Specificity analysis between the two datasets.

4.4. Performance analysis over pre-trained models on the CMATERdb dataset

To emphasize its capabilities further, we compare the handwritten character image classification capacity of our proposed model with the same pre-trained models on the CMATERdb dataset. For each of the 50 character types, Fig. 23 shows the accuracy, precision, F1-score, sensitivity, specificity, and average outcomes of these models compared with our VashaNet model. The average accuracies of VGG-16, ResNet50, MobileNet, DenseNet, and VashaNet are 88.17%, 87.47%, 86.08%, 90%, and 94.43%, respectively, so our proposed model achieves the highest accuracy among them. Our technique also performs better than the others on the remaining assessment metrics, namely precision (94.32%), recall (93.74%), F1-score (93.74%), and specificity (99.87%) for the CMATERdb dataset. The consistent performance of our VashaNet model on two different datasets in training accuracy, validation accuracy, test accuracy, precision, recall, F1 score, and specificity is an indication of its stability and capacity for generalization. It shows that, instead of overfitting to particular properties of a single dataset, the model efficiently acquires significant features and patterns that are consistent across different types of data. This also indicates a high likelihood of successful transfer learning, since the model learns characteristics that are relevant to a variety of domains. Moreover, it shows that the model adequately captures the complex patterns found in both datasets, demonstrating its versatility and efficiency when working with a variety of data sources.

A summary of the experimental results for VashaNet, VGG-16, MobileNet, ResNet50, and DenseNet with respect to training accuracy, validation accuracy, test accuracy, precision, recall, F1 score, and specificity on the CMATERdb dataset is shown in Fig. 23.

Fig. 23. Result analysis according to training accuracy, validation accuracy, test accuracy, precision, recall, F1, and specificity for the CMATERdb dataset.
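For readers who wish to reproduce the kind of baseline comparison summarized in Fig. 23, the sketch below outlines one plausible setup: ImageNet-weighted backbones from keras.applications with a new 50-class head, evaluated with the metrics reported in this section (accuracy, precision, recall, F1, and specificity as in Eq. (5)) via scikit-learn. This is a hedged reconstruction under our own assumptions, not the authors' code; the input size and the variables x_test and y_test are placeholders, and DenseNet121 is used as a concrete stand-in for the DenseNet family. Freezing the backbone keeps the comparison lightweight; unfreezing the top blocks for fine-tuning is a common variation.

```python
# Hedged sketch: pre-trained baselines for the 50-class comparison and the
# evaluation metrics used in Fig. 23 (accuracy, precision, recall, F1, specificity).
import numpy as np
from tensorflow.keras import layers, models, applications
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, multilabel_confusion_matrix)

NUM_CLASSES = 50
INPUT_SHAPE = (128, 128, 3)  # assumed; grayscale characters would be stacked to 3 channels

def build_baseline(name="VGG16"):
    """Frozen ImageNet backbone with a new softmax head for Bangla characters."""
    backbones = {
        "VGG16": applications.VGG16,
        "MobileNet": applications.MobileNet,
        "DenseNet121": applications.DenseNet121,
        "ResNet50": applications.ResNet50,
    }
    base = backbones[name](include_top=False, weights="imagenet",
                           input_shape=INPUT_SHAPE, pooling="avg")
    base.trainable = False  # simple feature-extraction baseline
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    return models.Model(base.input, outputs)

def macro_specificity(y_true, y_pred):
    """Macro-averaged specificity TN / (TN + FP), following Eq. (5)."""
    cms = multilabel_confusion_matrix(y_true, y_pred)  # one 2x2 matrix per class
    tn, fp = cms[:, 0, 0], cms[:, 0, 1]
    return float(np.mean(tn / (tn + fp)))

def evaluate(model, x_test, y_test):
    """Return the metric dictionary used to compare models in Fig. 23."""
    y_pred = np.argmax(model.predict(x_test), axis=1)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="macro"),
        "recall": recall_score(y_test, y_pred, average="macro"),
        "f1": f1_score(y_test, y_pred, average="macro"),
        "specificity": macro_specificity(y_test, y_pred),
    }
```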

4.5. Comparison with existing works

The present study adopts an innovative methodology by conducting experiments on a recently developed primary dataset. The incorporation of this dataset, previously unexplored in the research literature, introduces a new dimension to the study. The model applied in this study is also noteworthy, since its architecture has not been employed in any other research. The model shows exceptional accuracy, particularly when used with our developed dataset and the CMATERdb dataset, highlighting its competence in previously unexplored domains. Although there are many CNN architectural designs in the field (Ahmed et al., 2022; Chowdhury et al., 2019; Purkaystha et al., 2017), none of them corresponds to the architecture of our proposed model. We have developed a pioneering methodology for primary dataset acquisition, which complements our innovative DCNN architecture. Comparative analyses of the results of the proposed model against similar works on different Bangla character datasets are presented in Table 10.

According to Table 10, in comparison with past efforts in the field of Bangla handwritten character identification, which relied mostly on classical machine learning approaches (Sarkhel et al., 2016), the use of deep learning models (Das et al., 2021; Purkaystha et al., 2017; Roy et al., 2017) has proven to be substantially more effective. To this end, our research presents a complete comparison of CNN, DCNN, Hidden Markov Model (HMM), and other methods for Bangla handwritten character recognition. Additionally, the research stands out due to its careful attention to data.

Rather than following the conventional practice of using only pre-existing datasets, this study conducts experiments with a rigorously curated dataset. This primary dataset reflects the authentic diversity and variability observed in handwritten Bangla characters in real-world settings. The VashaNet model demonstrated remarkable performance in this realistic and intricate setting, achieving a training accuracy of 99.70%, a validation accuracy of 94.78%, and a test accuracy of 94.60% on the primary dataset. In addition, the model achieves a training accuracy of 99.70%, a validation accuracy of 92.70%, and a test accuracy of 94.43% when evaluated on the CMATERdb dataset. Although no source code from the related studies was available to run on our developed dataset, our VashaNet model consistently outperformed previous attempts (Biswas, Bhattacharya, & Parui, 2012; Rahman et al., 2015; Shaikh et al., 2013), establishing state-of-the-art accuracy for this primary dataset specifically. In this respect, our VashaNet model achieves more precise performance through the use of a novel DCNN architecture. These results not only highlight the power of the DCNN architecture we provide, but also the transformative potential of deep learning in enhancing precision and applicability in this important area.


Table 10
Comparison with existing works.

Author                                                        Methods               Datasets               Results accuracy (%)
Biswas et al. (2012)                                          HMM                   Primary                91.85
Rahman et al. (2015)                                          CNN                   BanglaLekha            89.3
Sarkhel et al. (2016)                                         SVM                   CMATERdb               87.28
Roy et al. (2017)                                             DCNN with inception   CMATERdb               90.33
Purkaystha et al. (2017)                                      DCNN                  BanglaLekha-Isolated   91.23
Shaikh et al. (2013)                                          KNN                   Primary                76.80
Pramanik and Bag (2018)                                       Shape Reduction       ICDAR                  88.74
Chowdhury et al. (2019)                                       CNN                   BanglaLekha-Isolated   91.81
Rabby et al. (2020)                                           Multiclass CNN        Assembled              92.61
Das et al. (2021)                                             CNN                   BanglaLekha-Isolated   92.25
Jadhav et al. (2022)                                          CNN                   Ekush                  83.10
Ahmed, Akter Mim, Saha, Nahar, and Aynul Hasan Nahid (2023)   CNN                   Primary                88.48
Our Proposed Model                                            VashaNet (DCNN)       Primary, CMATERdb      94.60, 94.43

Fig. 24. Absence of ‘matra’.

Fig. 25. Overwriting or dissimilar shape.

Fig. 26. Unnecessary ‘matra’.

4.6. Error analysis

There are some inaccuracies in our model, as shown in Figs. 24–26, but overall it provides a good measure of performance. These inaccuracies can be attributed to a number of factors, including the presence of an extraneous ‘matra’, the absence of an essential ‘matra’, a highly dissimilar character shape, or overwriting.

5. Conclusion and future work

This study brings innovation to the field of Bangla character recognition, presenting an automated system driven by the VashaNet model. Our proposed VashaNet model performed very well, with test accuracies of 94.60% and 94.43% over our primary and CMATERdb datasets, respectively. The VashaNet model demonstrates efficacy in addressing the challenges associated with processing authentic handwritten characters. Even though this method can recognize isolated Bangla alphabet letters, it cannot recognize a string of characters. The model may be made more flexible and useful by adding compound Bangla characters and numerals to the dataset. Combining BHCR with other areas of computer vision, such as segmentation and object identification, offers new perspectives on how to increase recognition precision in intricate layouts. Additionally, document management and retrieval systems may be made more efficient by incorporating BHCR models into textual categorization tasks. Investigating multi-modal strategies that integrate textual and visual characteristics from handwritten text can open up new avenues for increasing classification accuracy. All things considered, multidisciplinary BHCR research projects hold promise for improving recognition technology.

The research introduces a novel DCNN model called VashaNet, paving the way for a revolutionary method of handwritten character identification in Bangla. VashaNet achieves state-of-the-art accuracy after a thorough experimental process over a primary dataset consisting of 5750 images and a comparison with advanced pretrained models including VGG-16, MobileNet, ResNet-50, and DenseNet. Additionally, VashaNet shows its resilience and adaptability across several datasets by not only outperforming these pretrained models on the primary dataset but also delivering comparable accuracy on the CMATERdb dataset. The absence of prior use of, or open-source code for, the primary dataset highlights the originality of the architecture, cementing VashaNet's credibility as a pioneering solution in Bangla handwritten character identification. The inclusion of both a unique dataset and an innovative model architecture significantly enhances the novelty of this research within the field of BHCR. The accuracy of character recognition systems in practical applications such as optical character recognition, language translation, and archival digitization will be directly affected by this development, especially for scripts with complex syllabaries like Bangla.

Abbreviations table

Table 11
Abbreviations table.

Short form   Abbreviation
DCNN         Deep Convolutional Neural Network
HCR          Handwritten Character Recognition
CNN          Convolutional Neural Network
BHCR         Bengali Handwritten Character Recognition
MLP          Multilayer Perceptron
K-NN         k-nearest Neighbors
DTW          Dynamic Temporal Warping


CRediT authorship contribution statement

Mirza Raquib: Conceptualization, Methodology, Software, Validation, Writing – original draft. Mohammad Amzad Hossain: Supervision, Writing – review & editing. Md Khairul Islam: Writing – review & editing. Md Sipon Miah: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgment

We would like to acknowledge the support provided by the Department of Information and Communication Engineering, and the Research Cell of Noakhali Science and Technology University, Noakhali-3814, Bangladesh.

References

Ahmed, S. S., Akter Mim, A., Saha, D., Nahar, K., & Aynul Hasan Nahid, M. (2023). A CNN-based novel approach for the detection of compound bangla handwritten characters. In 2023 11th international symposium on digital forensics and security (pp. 1–6). http://dx.doi.org/10.1109/ISDFS58141.2023.10131677.
Ahmed, T., Uddin, M., Khan, M. A. R., & Hasan, A. R. M. (2022). Offline handwritten character recognition including compound character from scanned document. Asian Journal of Research in Computer Science, 119–129. http://dx.doi.org/10.9734/ajrcos/2022/v14i4297.
Begum, H., Islam, M. M., Eva, H. S., Emon, N. H., & Siddique, F. A. (2023). Deep learning networks for handwritten bangla character recognition. International Journal of Applied Mathematics, 53.
Biswas, C., Bhattacharya, U., & Parui, S. K. (2012). HMM based online handwritten bangla character recognition using Dirichlet distributions. In 2012 international conference on frontiers in handwriting recognition (pp. 600–605). IEEE.
Chakraborty, P., Roy, S., Sumaiya, S. N., & Sarker, A. (2023). Handwritten character recognition from image using CNN. In Micro-electronics and telecommunication engineering: proceedings of 6th ICMETE 2022 (pp. 37–47). Springer.
Choudhary, A., Ahlawat, S., & Rishi, R. (2014). A binarization feature extraction approach to OCR: MLP vs. RBF. Distributed Computing and Internet Technology, 341.
Chowdhury, R. R., Hossain, M. S., ul Islam, R., Andersson, K., & Hossain, S. (2019). Bangla handwritten character recognition using convolutional neural network with data augmentation. In 2019 joint 8th international conference on informatics, electronics & vision (ICIEV) and 2019 3rd international conference on imaging, vision & pattern recognition (pp. 318–323). IEEE.
Chu, K. (1999). An introduction to sensitivity, specificity, predictive values and likelihood ratios. Emergency Medicine, 11(3), 175–181.
Das, T. R., Hasan, S., Jani, M. R., Tabassum, F., & Islam, M. I. (2021). Bangla handwritten character recognition using extended convolutional neural network. Journal of Computer and Communications, 9(03), 158–171.
Dey, R., & Balabantaray, R. C. (2023). Development of a benchmark odia handwritten character database for an efficient offline handwritten character recognition with a chronological survey. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(6), 1–28.
Gupta, D., & Bag, S. (2021). CNN-based multilingual handwritten numeral recognition: A fusion-free approach. Expert Systems with Applications, 165, Article 113784.
Hossain, M. M., Asadullah, M., Rahaman, A., Miah, M. S., Hasan, M. Z., Paul, T., et al. (2021). Prediction on domestic violence in bangladesh during the covid-19 outbreak using machine learning methods. Applied System Innovation, 4(4), 77.
Islam, M. S., Rahman, M. M., Rahman, M. H., Rivolta, M. W., & Aktaruzzaman, M. (2022). Ratnet: A deep learning model for bengali handwritten characters recognition. Multimedia Tools and Applications, [ISSN: 15737721] 81, http://dx.doi.org/10.1007/s11042-022-12070-4.
Jadhav, R., Gadge, S., Kharde, K., Bhere, S., & Dokare, I. (2022). Recognition of handwritten bengali characters using low cost convolutional neural network. In 2022 interdisciplinary research in technology and management (pp. 1–6). IEEE.
Jubaer, S. M., Tabassum, N., Rahman, M. A., & Islam, M. K. (2023). BN-DRISHTI: Bangla document recognition through instance-level segmentation of handwritten text images. arXiv preprint arXiv:2306.09351.
Kaur, P., Kumar, Y., Ahmed, S., Alhumam, A., Singla, R., & Ijaz, M. F. (2022). Automatic license plate recognition system for vehicles using a cnn. Computers, Materials & Continua, 71(1).
Kumari, T., Vardan, Y., Shambharkar, P. G., & Gandhi, Y. (2022). Comparative study on handwritten digit recognition classifier using cnn and machine learning algorithms. In 2022 6th international conference on computing methodologies and communication (pp. 882–888). IEEE.
LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., et al. (1989). Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems, 2.
Liu, W., Wei, J., & Meng, Q. (2020). Comparisions on KNN, SVM, BP and the CNN for handwritten digit recognition. In 2020 IEEE international conference on advances in electrical engineering and computer applications (pp. 587–590). IEEE.
Maitra, D. S., Bhattacharya, U., & Parui, S. K. (2015). CNN based common approach to handwritten character recognition of multiple scripts. In 2015 13th international conference on document analysis and recognition (pp. 1021–1025). IEEE.
Narayan, A., & Muthalagu, R. (2021). Image character recognition using convolutional neural networks. In 2021 seventh international conference on bio signals, images, and instrumentation (pp. 1–5). IEEE.
Pramanik, R., & Bag, S. (2018). Shape decomposition-based handwritten compound character recognition for bangla OCR. Journal of Visual Communication and Image Representation, [ISSN: 10959076] 50, http://dx.doi.org/10.1016/j.jvcir.2017.11.016.
Purkaystha, B., Datta, T., & Islam, M. S. (2017). Bengali handwritten character recognition using deep convolutional neural network. In 2017 20th international conference of computer and information technology (pp. 1–5). IEEE.
Rabby, A. S. A., Islam, M. M., Hasan, N., Nahar, J., & Rahman, F. (2020). Borno: Bangla handwritten character recognition using a multiclass convolutional neural network. In Proceedings of the future technologies conference (pp. 457–472). Springer.
Rahman, M. M., Akhand, M., Islam, S., Shill, P. C., Rahman, M., et al. (2015). Bangla handwritten character recognition using convolutional neural network. International Journal of Image, Graphics and Signal Processing (IJIGSP), 7(8), 42–49.
Rakshit, P., Chatterjee, S., Halder, C., Sen, S., Obaidullah, S. M., & Roy, K. (2023). Comparative study on the performance of the state-of-the-art CNN models for handwritten bangla character recognition. Multimedia Tools and Applications, 82(11), 16929–16950.
Rasheed, A., Ali, N., Zafar, B., Shabbir, A., Sajid, M., & Mahmood, M. T. (2022). Handwritten urdu characters and digits recognition using transfer learning and augmentation with AlexNet. IEEE Access, 10, 102629–102645. http://dx.doi.org/10.1109/ACCESS.2022.3208959.
Roy, S., Das, N., Kundu, M., & Nasipuri, M. (2017). Handwritten isolated bangla compound character recognition: A new benchmark using a novel deep learning approach. Pattern Recognition Letters, [ISSN: 01678655] 90, http://dx.doi.org/10.1016/j.patrec.2017.03.004.
Sarkar, R., Das, N., Basu, S., Kundu, M., Nasipuri, M., & Basu, D. K. (2012). CMATERdb1: a database of unconstrained handwritten bangla and bangla–english mixed script document image. International Journal on Document Analysis and Recognition (IJDAR), 15, 71–83.
Sarkhel, R., Das, N., Saha, A. K., & Nasipuri, M. (2016). A multi-objective approach towards cost effective isolated handwritten bangla character and digit recognition. Pattern Recognition, [ISSN: 00313203] 58, http://dx.doi.org/10.1016/j.patcog.2016.04.010.
Shaikh, S. H., Tabedzki, M., Chaki, N., & Saeed, K. (2013). Bengali printed character recognition–a new approach. Information Systems and Industrial Management, 129.
Wang, P., Fan, E., & Wang, P. (2021). Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognition Letters, 141, 61–67.
Zhang, C., Liu, B., Chen, Z., Yan, J., Liu, F., Wang, Y., et al. (2023). A machine vision-based character recognition system for suspension insulator iron caps. IEEE Transactions on Instrumentation and Measurement, 72, 1–13. http://dx.doi.org/10.1109/TIM.2023.3300474.
