Bangla Continuous Handwriting Character and Digit Recognition Using CNN Id 295
5 authors, including:
All content following this page was uploaded by Shifat Nayme on 19 June 2020.
1 Introduction
Every region of the world has its own languages. Bangla is one of the most widely spoken
languages on the planet: approximately 220 million people use it for speaking and writing.
This paper works on the Bangla language, so the recognition of continuous handwritten
Bangla characters is of great significance. It describes a system that can recognize
continuous handwritten Bangla text and digits. Optical character recognition (OCR) is a
system that optically reads a document and converts it from human-readable form into
machine-understandable form. OCR is widely used in practical applications such as language
processing, library automation, reading aids for the blind, post offices, banks, and the
digitization of government documents. The proposed system consists of the following
modules, applied in sequence: preprocessing, line segmentation, word segmentation,
character segmentation, and character recognition. A Convolutional Neural Network (CNN)
is used as the classifier in the character recognition module.
2 Literature review
In 1870, Carey invented the retina scanner, an image transmission system that is regarded
as the first character recognition system [1]. Bangla text exists in two forms: machine
printed and handwritten. In the past few years, much research has been done on handwritten
character recognition in Bangla, and this work has achieved considerable success. Several
works are also available on Bangla printed character recognition. Significant earlier work
includes “A complete Bangla OCR System for printed Characters” [2], “A complete OCR System
for Continuous Bengali Characters” [3], “An end to end System for Bangla Online Handwriting
Recognition” [4], “A hierarchical approach to recognition of handwritten Bangla characters”
[5], and “A complete printed Bangla OCR System” [6]. These papers introduce a variety of
methodologies. Most of this work addresses printed continuous character recognition, while
very little has been done towards a complete OCR for continuous handwritten characters. For
that reason, this paper focuses on continuous handwritten character recognition. Bangla
continuous characters are segmented using some conventional approaches as well as some new
techniques.
3 Proposed methodology
This paper presents a new character segmentation method for recognizing continuous
handwritten characters in sentences. The main part of the work is character segmentation,
which has several phases, described below.
3.1 Preprocessing
In this step the input image is preprocessed. First, the original image is converted to a
grayscale image. Next, noise is removed from the image. The image is then converted to a
binary image to find the foreground area, eliminating as much unnecessary information as
possible.
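The preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper does not name its noise filter or binarization method, so a 3×3 median filter and Otsu's global threshold are assumed here, along with ITU-R BT.601 weights for the grayscale conversion. Following the convention used for line segmentation, text pixels become 1 (white) and background pixels 0 (black).

```python
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """H x W x 3 RGB array -> H x W float grayscale (BT.601 weights, assumed)."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


def median_filter3(gray: np.ndarray) -> np.ndarray:
    """3 x 3 median filter with edge padding (assumed noise-removal step)."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    # Stack the nine shifted neighbourhoods and take the per-pixel median.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0)


def otsu_threshold(gray: np.ndarray) -> float:
    """Global threshold maximizing between-class variance (Otsu's method, assumed)."""
    hist, edges = np.histogram(gray, bins=256, range=(0, 256))
    centers = (edges[:-1] + edges[1:]) / 2
    total, total_sum = hist.sum(), (hist * centers).sum()
    w0, sum0, best_t, best_var = 0, 0.0, 0.0, -1.0
    for i in range(256):
        w0 += hist[i]                  # pixels in the "dark" class so far
        sum0 += hist[i] * centers[i]
        w1 = total - w0                # pixels in the "light" class
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (total_sum - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, edges[i + 1]
    return float(best_t)


def preprocess(rgb: np.ndarray) -> np.ndarray:
    """Grayscale -> denoise -> binarize. Returns uint8: 1 = text (white), 0 = background."""
    gray = median_filter3(to_grayscale(rgb))
    # Dark ink on light paper: pixels below the threshold are treated as text.
    return (gray < otsu_threshold(gray)).astype(np.uint8)
```

The median filter is a common choice for isolated "salt-and-pepper" specks, which it removes without blurring stroke edges the way a mean filter would; Otsu's method picks the threshold from the image histogram, so no fixed gray level has to be tuned per scan.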
3.2 Line segmentation
Text lines are detected by locating the gaps between consecutive lines. Each row of the
binary image is scanned horizontally: white pixels are treated as text, so a row that
contains white pixels belongs to a text line, while a row that is entirely black denotes a
gap between two lines. This is how each line is detected. Many earlier studies have
addressed handwritten line segmentation in other languages, such as English [7] and
Hindi [8], with considerable success.
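The row-scanning rule above amounts to a horizontal projection profile. A minimal sketch, assuming the binary image from preprocessing is a NumPy array with 1 for white (text) and 0 for black (background); the function name `segment_lines` is hypothetical.

```python
import numpy as np


def segment_lines(binary: np.ndarray) -> list[tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    A row with any white pixel (1) belongs to a text line; an all-black row is a gap.
    """
    has_text = binary.any(axis=1)      # horizontal scan of every row
    lines, start = [], None
    for row, text in enumerate(has_text):
        if text and start is None:
            start = row                # first text row of a new line
        elif not text and start is not None:
            lines.append((start, row)) # an all-black row closes the line
            start = None
    if start is not None:              # a line touching the bottom edge
        lines.append((start, len(has_text)))
    return lines
```

Each returned pair can be used to crop a line as `binary[start:end]` before word segmentation.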
word can have both modifiers, as shown in Fig. 7. To segment the characters of a word, each
word image is resized to h×w. Three types of word image can be found: (i) words with no
modifier, (ii) words with an upper-part or lower-part modifier, and (iii) words with both
upper- and lower-part modifiers. A flag indicator is set to identify which type a word
belongs to, and a different character segmentation method is used for each type of
modifier. The types are described below.
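The resizing step can be sketched with nearest-neighbour sampling, which keeps a binary image binary. The paper does not give w or the interpolation method: h = 100 is taken from Section 3.4.1, while the default w = 300 and the nearest-neighbour choice are assumptions, and `resize_nearest` is a hypothetical name.

```python
import numpy as np


def resize_nearest(word: np.ndarray, h: int = 100, w: int = 300) -> np.ndarray:
    """Resize a 2-D word image to h x w by nearest-neighbour sampling.

    h = 100 follows the text; w = 300 is an assumed default.
    """
    # Map each output row/column back to the source pixel it falls on.
    rows = (np.arange(h) * word.shape[0]) // h
    cols = (np.arange(w) * word.shape[1]) // w
    return word[rows[:, None], cols[None, :]]
```

Because every output pixel copies an existing input pixel, no intermediate gray levels appear and the 0/1 convention used by the later steps is preserved.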
Fig. 7. Word with both upper-part and lower-part modifiers
3.4.1 Word with no modifier
Because Bangla script has a “matra” (the horizontal headline that joins the characters of a
word), the upper part of the word is removed from the image. The main body of most Bangla
characters lies in the middle zone, so the principal body of the word is taken from the
lower part of the image. To eliminate the matra, the main body is taken as the bottom
(Height − 25) rows of the image, where Height is 100; that is, the top 25 rows are
discarded. In the resulting image, each group of connected white pixels along the y-axis is
considered an individual character, as shown in Fig. 8. After each character has been
identified, the unwanted vertical gaps are removed, which finally separates the characters,
as shown in Fig. 9.
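The no-modifier case can be sketched as follows, assuming a binary word image already resized to a height of 100 (1 = white text). The top 25 rows (the matra zone) are ignored, so the body zone keeps Height − 25 = 75 rows, and each run of adjacent columns containing white pixels in that zone is read as one character; the blank columns between runs are the vertical gaps that get removed. Treating "connected along the y-axis" as "adjacent non-empty columns", and cropping each character over the full word height, are assumptions; the function names are hypothetical.

```python
import numpy as np


def character_spans(word: np.ndarray, matra_rows: int = 25) -> list[tuple[int, int]]:
    """Column spans (start, end exclusive) of characters in a no-modifier word image.

    Rows [0, matra_rows) hold the matra and are ignored, so the connecting
    headline does not merge all characters into one span.
    """
    body = word[matra_rows:]           # main body: the lower Height - 25 rows
    has_ink = body.any(axis=0)         # True for columns with any white pixel
    spans, start = [], None
    for col, ink in enumerate(has_ink):
        if ink and start is None:
            start = col                # a character begins
        elif not ink and start is not None:
            spans.append((start, col)) # a vertical gap ends the character
            start = None
    if start is not None:
        spans.append((start, len(has_ink)))
    return spans


def segment_characters(word: np.ndarray, matra_rows: int = 25) -> list[np.ndarray]:
    """Crop each character over the full word height, dropping the gap columns."""
    return [word[:, a:b] for a, b in character_spans(word, matra_rows)]
```

Setting `matra_rows=0` shows why the matra is discarded first: the headline inks every column, so the whole word collapses into a single span.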
Fig. 10. Modifier above the matra
Fig. 11. Upper-part modifier detection
Fig. 12(a). Lower
Fig. 12(b). Upper part
Fig. 13. Segmented character with upper part
The lower-part modifier is usually connected to the word. To remove it from the word, the
starting and ending points of its white pixels are calculated vertically, scanning from the
lower part of the image. After each character has been segmented, the modifier is placed
directly after the character to which it belongs.
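One possible reading of the lower-part-modifier step is sketched below; it is an interpretation, not the authors' code. Scanning upward from the bottom row inside an assumed lower zone (the bottom third, echoing the 1/3 portions of Section 3.4.3), the ending and starting rows of the first run of white pixels give the modifier's vertical extent. The modifier is assigned to the character whose column span covers most of the modifier's lowest row, and its label is placed right after that character. The names `locate_lower_modifier` and `order_tokens` and the `zone_frac` parameter are hypothetical.

```python
import numpy as np


def locate_lower_modifier(word, char_spans, zone_frac=1 / 3):
    """Return ((start_row, end_row), owner_index) for a lower-part modifier, or None.

    word: binary word image, 1 = white text. char_spans: (start, end) column spans.
    zone_frac: assumed height fraction of the lower zone that is scanned.
    """
    h = word.shape[0]
    zone_top = h - int(h * zone_frac)
    has_ink = word.any(axis=1)
    start = end = None
    for row in range(h - 1, zone_top - 1, -1):  # scan bottom -> top
        if has_ink[row]:
            if end is None:
                end = row + 1                     # ending point (exclusive)
            start = row                           # starting point moves upward
        elif end is not None:
            break                                 # blank row: the run is over
    if end is None:
        return None                               # no ink in the lower zone
    # The lowest ink row belongs to the modifier alone, even when it touches the body.
    deepest_cols = np.flatnonzero(word[end - 1])
    overlaps = [int(np.count_nonzero((deepest_cols >= a) & (deepest_cols < b)))
                for a, b in char_spans]
    return (start, end), int(np.argmax(overlaps))


def order_tokens(char_labels, owner, modifier_label):
    """Place the modifier label directly after the character it belongs to."""
    return char_labels[:owner + 1] + [modifier_label] + char_labels[owner + 1:]
```

When the modifier touches the character body, the ink run continues upward and the extent is simply clipped at the zone boundary, so a fuller implementation would need a better estimate of where the body ends. The function is meant to be called only for words flagged as having a lower-part modifier; otherwise any body stroke reaching into the lower zone would be mistaken for one.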
Fig. 14. Lower part
Fig. 15. Upper part: no modifier detected
Fig. 16. Lower-part modifier detected
3.4.3 Word with both upper- and lower-part modifiers
Following the two subsections above, a word is considered to have both upper- and
lower-part modifiers when at least one portion (1/3) of both its upper part and its lower
part has at least one zero portion, as shown in Fig. 17. In that case, characters are
segmented with the same processes described in Sections 3.4.1 and 3.4.2.
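The flag indicator and the reuse of the earlier procedures can be sketched with Python's `enum.IntFlag`, so that a word with both modifiers is simply the combination of the two single-modifier flags. How the two booleans are computed (the 1/3 zero-portion test above) is left out, and the class, function, and step names are hypothetical labels for the procedures of Sections 3.4.1 and 3.4.2.

```python
from enum import IntFlag


class Modifier(IntFlag):
    """Hypothetical flag indicator for the word type."""
    NONE = 0
    UPPER = 1
    LOWER = 2
    BOTH = UPPER | LOWER


def modifier_flag(has_upper: bool, has_lower: bool) -> Modifier:
    """Combine the two detection results into one flag."""
    flag = Modifier.NONE
    if has_upper:
        flag |= Modifier.UPPER
    if has_lower:
        flag |= Modifier.LOWER
    return flag


def segmentation_steps(flag: Modifier) -> list[str]:
    """Procedures to apply: body segmentation always, plus each detected modifier."""
    steps = ["body (3.4.1)"]
    if flag & Modifier.UPPER:
        steps.append("upper-part modifier (3.4.2)")
    if flag & Modifier.LOWER:
        steps.append("lower-part modifier (3.4.2)")
    return steps
```

Using bit flags means the "both" case needs no separate branch: it runs the upper and lower procedures in turn, matching the statement that the same processes as in Sections 3.4.1 and 3.4.2 are reused.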
Fig. 17. Word with both lower- and upper-part modifiers
Fig. 18. Segmented character and upper-part modifiers
5 Experimental Results
Fig. 20 shows a few test samples of handwritten word images that EkushNet [9] recognized
correctly. Of the thousands of characters segmented by this method, about 70 percent were
recognized correctly. Fig. 21 shows some word images that EkushNet recognized incorrectly.
Fig. 20. Word images recognized correctly
Fig. 21. Word images recognized falsely
References
1. J. Mantas, “An overview of character recognition methodologies,” Pattern Recognition,
vol. 19, pp. 425–430, 1986.
2. I. S. Jacobs and C. P. Bean, “Fine particles, thin films and exchange anisotropy,” in
Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271–
350.
3. Jalal Uddin Mahmud, Mohammed Feroz Raihan and Chowdhury Mofizur Rahman, “A
Complete OCR System for Continuous Bangla Characters,” IEEE TENCON-2003:
Proceedings of the Conference on Convergent Technologies for the Asia Pacific, 2003.
4. S. Bhattacharya, D. S. Maitra, U. Bhattacharya, S. K. Parui, "An end-to-end system for
Bangla online handwriting recognition", 15th Int. Conf. on Frontiers in Handwriting
Recognition, pp. 373-378, 2016.
5. S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, and D. K. Basu, “A hierarchical
approach to recognition of handwritten Bangla characters,” Pattern Recognit., vol. 42, no. 7,
pp. 1467–1484, Jul. 2009.
6. B. B. Chaudhuri, U. Pal, “A complete printed Bangla OCR system,” Pattern Recognition,
vol. 31, pp. 531–549, 1998.
7. G. Louloudis, B. Gatos, I. Pratikakis, C. Halatsis, “Text line and word segmentation of
handwritten documents,” 2009.
8. G. S. Sindhushree, R. Amarnath and P. Nagabhushan, “Entropy-Based Approach for
Enabling Text Line Segmentation in Handwritten Documents,” 2019.
9. AKM Shahariar Azad Rabby, Sadeka Haque, Sheikh Abujar, Syed Akhter Hossain,
EkushNet: Using Convolutional Neural Network for Bangla Handwritten Recognition,
Procedia Computer Science, Volume 143, 2018, Pages 603-610, ISSN 1877-0509
10. Ekush: A multipurpose and multitype comprehensive database for Online Off-line
Bangla Handwritten Characters, Website: https://github.com/shahariarrabby/Ekush. Last
accessed: 20 Jun. 18.
11. R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri, and D. K. Basu, “Cmaterdb1: a
database of unconstrained handwritten Bangla and Bangla–English mixed script document
image,” International Journal on Document Analysis and Recognition (IJDAR), vol. 15, no.
1, pp. 71–83, 2012.