International Conference on Electrical, Electronics, Signals, Communication and Optimization (EESCO) - 2015

Handwritten Character Recognition using Machine Learning Approach - A Survey

Shivangkumar R Patel
L.J.I.E.T. PG Department
Ahmedabad, Gujarat - India

Ms. Jasmine Jha
L.J.I.E.T. PG Department
Ahmedabad, Gujarat - India

Abstract—Handwritten character recognition is very popular because of its wide range of applications. Processing application forms, digitizing ancient articles, postal address processing, bank cheque processing and many others are growing fields in the area of handwritten character processing. Handwritten character recognition has attracted researchers for the last three decades, and many approaches have been proposed for effective recognition. In this paper, we present a detailed survey of handwritten character recognition using neural networks as a machine learning approach.

Keywords—handwritten character recognition; feature extraction; classification; machine learning; ANN; SVM

I. INTRODUCTION
Handwritten Character Recognition (HCR) is a classical application of pattern recognition. In 1981, Bezdek et al. [1] defined pattern recognition as the process of identifying structure in data by comparison with known structure, where the known structure is developed through methods of classification. In general terms, handwritten character recognition is the process of classifying the characters in input handwritten text into predefined character classes.

Applications of HCR span a wide domain, including identification of characters; digitization of handwritten records; application form reading and data entry; translation systems, which recognize an unknown language and translate it into a known one; reading aids for the blind [2], [3]; bank cheque processing; signature verification; vehicle number plate reading [2], [3]; and automatic pin code reading for postal mail [2], [3].

In daily life we perform character recognition all the time. While we read notes, sign boards or a novel, our brain continuously performs HCR: we match what we see against past experience and memory, and based on that we react, take an action, or infer something new. This is natural character recognition.

Character recognition was first attempted by Tyurin, who tried to develop an aid for the visually handicapped [4]. The first character recognizers appeared in the 1940s. Before that, almost all work dealt with machine-printed text or with a small set of handwritten symbols or texts [5].

Fig. 1 shows the types of character recognition systems.

Fig. 1. Types of Character Recognition Systems

From 1980 to 1990, HCR rapidly gained research attention with both online and offline approaches [6], [7]. After 1990, image processing and pattern recognition merged through the use of artificial intelligence. Very efficient and powerful computers and devices such as scanners, cameras and other special-purpose hardware then became available. Handwritten character recognition now covers a large application area. Even after all this research, however, no single system completely fulfills the goal of handwritten character recognition [8].

An offline handwritten character recognition system acquires static inputs, that is, digitized text documents or scanned image copies of handwritten text [8]. An online handwritten character recognition system acquires live handwriting for recognition. A person writes on a digital device with a special pen, and that data is used as a live feed for the system. The main difference between the two is that an online system records one extra parameter with the data: time [8]. It also records the strokes, the writing speed, and pen-up and pen-down information as parameters [8].
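The minimal Python sketch below makes those extra parameters concrete. It shows one possible way to store online pen input and to derive strokes and writing speed from it. The `PenSample` record, its fields, the use of seconds as the time unit, and the helper names are hypothetical choices made for illustration. They are not taken from the systems surveyed here.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class PenSample:
    """One digitizer reading (hypothetical layout): position, time, pen state."""
    x: float
    y: float
    t: float        # timestamp, assumed to be in seconds
    pen_down: bool  # True while the pen touches the writing surface


def split_strokes(samples):
    """Group consecutive pen-down samples into strokes; a pen-up sample ends a stroke."""
    strokes, current = [], []
    for s in samples:
        if s.pen_down:
            current.append(s)
        elif current:
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


def mean_speed(stroke):
    """Average writing speed of one stroke: path length divided by elapsed time."""
    if len(stroke) < 2:
        return 0.0
    length = sum(hypot(b.x - a.x, b.y - a.y) for a, b in zip(stroke, stroke[1:]))
    duration = stroke[-1].t - stroke[0].t
    return length / duration if duration > 0 else 0.0
```

An offline system receives only the final image of the ink, so none of the time, pen-state or speed information above is available to it.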

978-1-4799-7678-2/15/$31.00 ©2015 IEEE


The state-of-the-art framework of a handwritten character recognition system is shown in Fig. 2. Essentially every handwritten character recognition system contains image acquisition, preprocessing, segmentation, feature extraction and classification phases.

Fig. 2. Framework of Handwritten Character Recognition System

The rest of the paper is organized as follows. Section II reviews the literature on the off-line handwritten character recognition systems proposed by various researchers. Section III presents a comparison table of these systems. Finally, Section IV concludes the survey.

II. LITERATURE SURVEY

This literature review explores every phase of a handwritten character recognition system.

A. Image Acquisition
Image acquisition is the process of acquiring handwritten input data for the character recognition system. Online and offline systems were developed depending on how the image or data is acquired. Bluche et al. used the Rimes dataset, which is in English [9]. For numeric data, MNIST is a very popular dataset and is used in [10]. Other datasets include Chars74K (English characters in natural images) [19], CEDAR (paid) [20], the Semeion handwritten digit dataset [21], and the Pen-Based Recognition of Handwritten Digits dataset [21]. When no standard dataset is available, researchers build their own dataset for the recognition system [10].

B. Preprocessing
Preprocessing is performed on the acquired input data. It enhances the quality of the input data and makes it more suitable for the next phase of the recognition system. Typical techniques performed in this phase include grayscale conversion, binary conversion and noise removal. Fig. 3 shows the basic preprocessing operations on an image.

(a) RGB Image - 230x755x3 uint8  (b) Gray Scale Image - 230x755 uint8  (c) Binary Image - 230x755 logical
Fig. 3. Preprocessed Images

Bluche et al. applied grayscale conversion and binary conversion to the input data and then a noise removal technique [11], [9]. In [10], the researcher used edge detection for segmentation after grayscale and binary conversion. Otsu's algorithm is widely used to convert grayscale images to binary images.

C. Segmentation
Segmentation is the process of splitting the input text image into lines and then into individual characters. It also removes unwanted parts of the image. There are two types of segmentation: external and internal. External segmentation divides the text into paragraphs, lines and words. Internal segmentation isolates individual characters in the input text [12], [13].

Various algorithms are available for segmentation. Histogram profiles and connected component analysis are among the methods used for line segmentation in [14], [15]. Fig. 4 and Fig. 5 show what lines and characters look like after histogram-based segmentation.

Fig. 4. Segmented Line based on Histogram

(a) (b) (c) (d) (e)
Fig. 5. Segmented Characters based on Histogram

In [16], [17], spatial space detection is used for words and the histogram method is used for characters and other symbols. In [2], the authors use a bounding box technique for character segmentation. After successful segmentation, every segmented image is resized to a uniform size.
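The pure-Python sketch below covers the preprocessing and segmentation steps described above: grayscale conversion, Otsu binarization, histogram (projection-profile) line and character segmentation, and resizing to a uniform size. It is a minimal illustration on a list-of-lists image, not the implementation used in any surveyed paper. The function names, the BT.601 grayscale weights, and the assumption that ink is darker than the paper are choices made for this example.

```python
def to_gray(rgb):
    """RGB image (rows of (r, g, b) tuples, 0-255) to 8-bit grayscale.
    Uses ITU-R BT.601 luma weights, an assumed choice; other weightings exist."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def otsu_threshold(gray):
    """Otsu's method: choose t so that splitting pixels into <= t and > t
    maximises the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0               # pixel count and intensity sum of the class <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2   # proportional to between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray, t):
    """1 = ink (assumed darker than the paper, i.e. value <= t), 0 = background."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def runs(profile):
    """(start, end) pairs, end exclusive, of consecutive non-zero profile entries."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(binary):
    """Horizontal projection histogram: consecutive rows containing ink form a text line."""
    return runs([sum(row) for row in binary])


def segment_chars(binary, top, bottom):
    """Vertical projection inside one line: consecutive ink columns form a character."""
    return runs([sum(binary[r][c] for r in range(top, bottom))
                 for c in range(len(binary[0]))])


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize, giving every segmented character the same size."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]
```

For example, `segment_lines(binarize(gray, otsu_threshold(gray)))` returns the row span of each text line. `segment_chars` is then applied inside each span. Plain projection profiles work well for straight, well-separated lines and characters but are less reliable for skewed or touching handwriting. That limitation is one reason methods such as connected component analysis are also used.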

D. Feature Extraction
Feature extraction is the process of collecting distinctive and useful information about an object or a group of objects. New, unknown objects can then be classified by matching them against this collected information. A feature is a robust representation of the raw data.

Fig. 6. Feature Extraction Techniques [22]

Fig. 6 shows various feature extraction methods. Zone-based, statistical, structural, chain code histogram, sliding window, gradient and hybrid features [3], [10], [14] are among the most useful feature extraction techniques. In 1961, Freeman introduced a chain code method known as the Freeman Chain Code. Chain codes come in two main variants, based on 4-neighborhood and 8-neighborhood directions.

Xuewen Wang and Kai Ding et al. compared the Gabor feature with the chain code (contour direction) feature. Their results showed that the Gabor feature performs much better than the chain code feature [18], [19]. Kai Ding and Cheng-Lin Liu et al. showed that the gradient feature and the Gabor feature share some common properties: both are applicable to binary and grayscale images, and both are robust to image noise [19], [20]. Their performance is also almost the same, but the Gabor feature is more suitable for large-scale texture data. Imani et al. [21] used chain code histogram features and the distribution of foreground density across zones.

E. Classification
Classification, or recognition, is the decision-making process. It determines which class a new character belongs to or most closely resembles. In the classification phase, characters are therefore identified and assigned labels. Classification performance depends on good feature extraction and selection. Various classification techniques are available, and all of them are ultimately based on image processing and artificial intelligence.

Fig. 7 shows various classification techniques [22]. Template matching, statistical techniques and structural techniques are the classical techniques, and they are mainly based on image processing. Neural networks, fuzzy logic and genetic techniques are based on soft computing. Jayshree et al. [23] proposed a hybrid soft computing method. They used a neuro-fuzzy classifier with an adaptive network; neuro-fuzzy is the hybridization of fuzzy logic and neural networks.

Since artificial intelligence became involved with machine learning, machine learning has spread to almost all research areas and has achieved very good results. In the field of handwritten character recognition, machine learning has used various methods such as artificial neural networks, support vector machines, naive Bayes, nearest neighbor algorithms, decision trees and neuro-fuzzy systems.

Fig. 7. Classification Techniques

Bluche et al. [9] used an HMM with convolutional neural networks and built an explicit feature extraction system. The system was tested on the Rimes dataset. Character recognition was fast, but accuracy was low. Imani et al. and Bingyu Chi et al. used Hidden Markov Models. Their results show that the effectiveness of HMM classification depends heavily on the feature extraction method [21], [24].

Rahaman et al. [25] proposed a system for Bangla characters. They used a backpropagation neural network and achieved 94.3% recognition accuracy. Amal Ramzi et al. [8] also used a backpropagation neural network as the classifier. Their system for Arabic handwritten characters combines online and offline features. It reached 99.5% training accuracy, but the maximum recognition rate achieved in testing was 78.8%.

Rajashekararadhya et al. compared the Nearest Neighbor Classifier (NNC) with the Support Vector Machine (SVM). Their results showed that SVM performs slightly better than NNC [2]. Nasien et al. [26] used an SVM with Freeman chain code (FCC) features on the English handwritten NIST dataset. Recognition accuracy was 86% for lowercase characters, 88.46% for uppercase characters and 73.45% for combined lowercase and uppercase characters [26].

Nisha Sharma et al. [27] proposed a recognition system for hand-printed English characters, numerals and special symbols. The system used a multilayer perceptron neural network with backpropagation and an SVM classifier. Its datasets consisted of hand-printed English uppercase and lowercase characters, numerals and special symbols. The recognition rate was 98% for numerals, 96.5% for special characters, 95.35% for uppercase characters and 92% for lowercase characters. Nisha Sharma et al. explained their choice of a neural network and an SVM as follows: "Neural network have been preferred to be used due to their high noise tolerance and SVM, for its high flexibility, scalability and speed".
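Several of the feature extraction and classification ideas reviewed above are illustrated in the short pure-Python sketch below. It covers a Freeman chain code and its direction histogram, a zone-density feature, and a 1-nearest-neighbor classifier of the kind compared with SVM in [2]. The direction numbering, the function names and the use of plain Euclidean distance are assumptions made for this example. The surveyed systems may define these details differently.

```python
from math import dist

# Freeman 8-direction chain code: 0 = east, then counter-clockwise in 45-degree
# steps. Assumes y grows upward; for image row indices (y downward) negate dy.
DIRECTION_8 = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
               (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def freeman_chain_code(points):
    """Encode an ordered list of 8-connected boundary points as direction codes."""
    return [DIRECTION_8[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def chain_code_histogram(codes):
    """Normalised 8-bin histogram of the chain-code directions."""
    hist = [0.0] * 8
    for c in codes:
        hist[c] += 1
    n = len(codes) or 1
    return [h / n for h in hist]


def zone_density(binary, zones_y, zones_x):
    """Zone-based feature: fraction of ink pixels in each cell of a zones_y x zones_x grid."""
    h, w = len(binary), len(binary[0])
    feats = []
    for zy in range(zones_y):
        for zx in range(zones_x):
            rows = range(zy * h // zones_y, (zy + 1) * h // zones_y)
            cols = range(zx * w // zones_x, (zx + 1) * w // zones_x)
            cells = [binary[r][c] for r in rows for c in cols]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def nearest_neighbor(train, query):
    """1-NN classifier: label of the training vector with the smallest Euclidean
    distance to `query`. `train` is a list of (feature_vector, label) pairs."""
    return min(train, key=lambda pair: dist(pair[0], query))[1]
```

In a full pipeline, the vectors produced by `chain_code_histogram` and `zone_density` would be concatenated for each character. The result would then be passed to `nearest_neighbor`, or to a neural network or SVM, as the classifier.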


In the next section, the various research works in handwritten character recognition are compared.

III. COMPARISON TABLE
Table 1 compares the models proposed by various researchers.

IV. DISCUSSION AND CONCLUSION
Handwritten character recognition is very useful in our daily lives because it covers a large area of useful applications. Various researchers have proposed work in this area and achieved good accuracy rates, but very few focused on managing time complexity. After reviewing many research works, we found that no single technique or system completely fulfills the requirements of handwritten character recognition. Off-line handwritten character recognition therefore remains an open area of research for identifying its various complexities and resolving them.

Acknowledgment
I want to thank my parents, my two lovely sisters, dear friends, classmates, and some former and present college professors for their great support in keeping my confidence up.

TABLE 1: COMPARISON TABLE

No. | Paper | Feature Extraction | Language | Dataset | Classifier | Results and Comments
1 | [2] | Simple and efficient zone-based hybrid feature extraction algorithm | Kannada and Tamil numerals | Own created numerical datasets | NNC and SVM | Kannada (97.75% NNC, 98.2% SVM); Tamil (93.9% NNC, 94.9% SVM)
2 | [3] | Not specified | Alphanumerical + symbols | Not specified | RNN | Good recognition
3 | [9] | Grapheme segmentation and sliding window | English | Rimes database | MLP | Very fast but low accuracy
4 | [11] | DWT with multiresolution technique | English characters | Own character dataset | NN with Euclidean distance metric | Good accuracy, up to 99.23%, but takes more time
5 | [21] | Chain code histogram features, distribution of foreground density across zones | Farsi | Own database, 198 word classes | HMM | 89.00%
6 | [24] | Gradient feature | Chinese alphanumeric | 10,000 single character images and 4709 legal amount text line images extracted from real-life Chinese bank checks | HMM | Average 97.13%
7 | [25] | 4x8 and 8x4 matrix for each character; segmentation of rows and columns | Bangla | Not specified | NN | Very simple; 94.30%
8 | [27] | Multi-zoning; geometrical features (distance and angle); topological features (end point, transition); directional feature (chain code histogram) | English characters and symbols | Own dataset | BPNN and SVM | BPNN: 98% for English numerals, 96.5% for special characters, 95.35% for uppercase and 92% for lowercase English characters; SVM: 92.167% for uppercase and lowercase characters
9 | [28] | DCT (Discrete Cosine Transform) | Arabic numbers | ADBase database | DBN (Dynamic Bayesian Network) | Average 85% on corrupted data; slow recognition
10 | [10] | 7 feature extraction methods, followed by ranking of the feature vectors to build 3 new feature vectors | Numeric | MNIST | ANFIS and IBA ANFIS | 99.52%; recognition speed of 24 digits/sec

References
[1] Timothy J Ross. Fuzzy logic with engineering applications. John Wiley & Sons, 2009.
[2] SV Rajashekararadhya and P Vanaja Ranjan. Zone-based hybrid feature extraction algorithm for handwritten numeral recognition of two popular Indian scripts. In: Nature & Biologically Inspired Computing, 2009. NaBIC 2009. World Congress on. IEEE. 2009, pp. 526-530.
[3] Laurence Likforman-Sulem. Recent Approaches in Handwriting Recognition with Markovian Modelling and Recurrent Neural Networks. In: Recent Advances of Neural Network Models and Applications. Springer, 2014, pp. 261-267.
[4] J Mantas. An overview of character recognition methodologies. In: Pattern Recognition 19.6 (1986), pp. 425-430.


[5] Talaat S El-Sheikh and Ramez M Guindi. Computer recognition of Arabic cursive scripts. In: Pattern Recognition 21.4 (1988), pp. 293-302.
[6] Qi Tian et al. Survey: Omni font-printed character recognition. In: Visual Communications '91, Boston, MA. International Society for Optics and Photonics. 1991, pp. 260-268.
[7] Shunji Mori, Ching Y Suen, and Kazuhiko Yamamoto. Historical review of OCR research and development. In: Proceedings of the IEEE 80.7 (1992), pp. 1029-1058.
[8] Amal Ramzi and Ammar Zahary. Online Arabic handwritten character recognition using online-offline feature extraction and back-propagation neural network. In: Advanced Technologies for Signal and Image Processing (ATSIP), 2014 1st International Conference on. IEEE. 2014, pp. 350-355.
[9] Theodore Bluche, Hermann Ney, and Christopher Kermorvant. Feature extraction with convolutional neural networks for handwritten word recognition. In: Document Analysis and Recognition (ICDAR), 2013 12th International Conference on. IEEE. 2013, pp. 285-289.
[10] Amir Bahador Bayat. Recognition of Handwritten Digits Using Optimized Adaptive Neuro-Fuzzy Inference Systems and Effective Features. In: Journal of Pattern Recognition and Intelligent Systems.
[11] DK Patel, T Som, and Manoj Kumar Singh. Multiresolution technique to handwritten English character recognition using learning rule and Euclidean distance metric. In: Signal Processing and Communication (ICSC), 2013 International Conference on. IEEE. 2013, pp. 207-212.
[12] Simone Marinai, Marco Gori, and Giovanni Soda. Artificial neural networks for document analysis and recognition. In: Pattern Analysis and Machine Intelligence, IEEE Transactions on 27.1 (2005), pp. 23-35.
[13] Vassilis Papavassiliou et al. Handwritten document image segmentation into text lines and words. In: Pattern Recognition 43.1 (2010), pp. 369-377.
[14] SA Angadi and MM Kodabagi. A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images. In: Signal and Image Processing (ICSIP), 2014 Fifth International Conference on. IEEE. 2014, pp. 42-49.
[15] R Indra Gandhi and K Iyakutti. An attempt to recognize handwritten Tamil character using Kohonen SOM. In: Int. J. Advanced Network. Appl 1.3 (2009), pp. 188-192.
[16] J Venkatesh and C Sureshkumar. Tamil Handwritten Character Recognition Using Kohonen's Self Organizing Map. In: the proceedings of IJCSNS International Journal of Computer Science and Network Security 9.12 (2009), pp. 156-161.
[17] C Suresh Kumar and T Ravichandran. Handwritten Tamil character recognition using RCS algorithms. In: Int. J. of Computer Applications (0975-8887) 8.8 (2010).
[18] Xuewen Wang, Xiaoqing Ding, and Changsong Liu. Optimized Gabor filter based feature extraction for character recognition. In: Pattern Recognition, 2002. Proceedings. 16th International Conference on. Vol. 4. IEEE. 2002, pp. 223-226.
[19] Kai Ding et al. A comparative study of Gabor feature and gradient feature for handwritten Chinese character recognition. In: Wavelet Analysis and Pattern Recognition, 2007. ICWAPR '07. International Conference on. Vol. 3. IEEE. 2007, pp. 1182-1186.
[20] Cheng-Lin Liu, Masashi Koga, and Hiromichi Fujisawa. Gabor feature extraction for character recognition: comparison with gradient feature. In: Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on. IEEE. 2005, pp. 121-125.
[21] Zahra Imani et al. Offline handwritten Farsi cursive text recognition using hidden Markov models. In: Machine Vision and Image Processing (MVIP), 2013 8th Iranian Conference on. IEEE. 2013, pp. 75-79.
[22] Richa Goswami and OP Sharma. A Review on Character Recognition Techniques. In: International Journal of Computer Applications 83.7 (2013), pp. 18-23.
[23] Jayshree Rajesh Prasad and Uday V Kulkarni. Gujrati Character Recognition Using Adaptive Neuro Fuzzy Classifier. In: Electronic Systems, Signal Processing and Computing Technologies (ICESC), 2014 International Conference on. IEEE. 2014, pp. 402-407.
[24] Bingyu Chi and Youbin Chen. Chinese Handwritten Legal Amount Recognition with HMM-Based Approach. In: Document Analysis and Recognition (ICDAR), 2013 12th International Conference on. IEEE. 2013, pp. 778-782.
[25] Arifur Rahaman et al. Analysis on handwritten Bangla character recognition using ANN. In: Informatics, Electronics & Vision (ICIEV), 2014 International Conference on. IEEE. 2014, pp. 1-5.
[26] Dewi Nasien, Habibollah Haron, and Siti Sophiayati Yuhaniz. Support Vector Machine (SVM) for English Handwritten Character Recognition. In: Computer Engineering and Applications (ICCEA), 2010 Second International Conference on. Vol. 1. IEEE. 2010, pp. 249-252.
[27] Nisha Sharma, Bhupendra Kumar, and Vandita Singh. Recognition of off-line hand printed English Characters, Numerals and Special Symbols. In: Confluence The Next Generation Information Technology Summit (Confluence), 2014 5th International Conference. IEEE. 2014, pp. 640-645.
[28] Jawad H AlKhateeb and Marwan Alseid. DBN-Based learning for Arabic handwritten digit recognition using DCT features. In: Computer Science and Information Technology (CSIT), 2014 6th International Conference on. IEEE. 2014, pp. 222-226.