Handwritten Character Recognition Using Machine Learning Approach - A Survey
I. INTRODUCTION
Handwritten Character Recognition (HCR) is a classical application of pattern recognition. In 1981, Bezdek et al. [1] defined pattern recognition as a process of identifying structure in data by comparison with known structure, where the known structure is developed through methods of classification. In general terms, handwritten character recognition is the process of classifying characters from input handwritten text according to predefined character classes. Applications of HCR span a wide domain: identification of characters, digitization of handwritten records, application form reading and data entry, translation systems that recognize an unknown language and translate it into a known language, reading aids for the blind [2], [3], bank cheque processing, signature verification, vehicle number plate recognition [2], [3], automatic PIN code reading for postal mail [2], [3], etc.

In our daily life, we perform character recognition all the time. While reading notes, a signboard, or a novel, our brain continuously performs HCR. We match what we see with our past experience and memory, and based on that we react, take an action, or infer something new. This is our natural character recognition.

Character recognition was first attempted by Tyurin, who tried to develop an aid for the visually handicapped [4]. The first character recognizers appeared in the 1940s. Early work mostly dealt with machine-printed text or a small set of handwritten symbols or texts [5].

Fig. 1 shows the types of character recognition systems.

Fig. 1. Types of Character Recognition Systems

From 1980 to 1990, HCR rapidly gained research attention, with both online and offline approaches [6], [7]. After 1990, image processing and pattern recognition merged with the use of Artificial Intelligence, and subsequently very efficient and powerful computers and devices such as scanners, cameras, and other special devices were developed. Handwritten character recognition covers a large application area. Even after all this research, to date no single system exists that completely fulfills the goal of handwritten character recognition [8].

An offline handwritten character recognition system acquires static input, that is, digitized text documents or scanned images of handwritten text [8]. An online handwritten character recognition system acquires live handwriting for recognition: a person writes on a digital device with a special pen, and that data is used as a live feed for the system. The main difference between the two is that an online system has one extra parameter, time, along with the data [8]. It also captures strokes, speed, and pen-up and pen-down information as parameters [8].

The state-of-the-art framework of a handwritten character recognition system is shown in Fig. 2. Basically, every handwritten character recognition system contains image acquisition, preprocessing, segmentation, feature extraction, and classification phases.

Fig. 2. Framework of Handwritten Character Recognition System

The rest of the paper is organized as follows. Section II is a literature review of various offline handwritten character recognition systems by various researchers. Section III shows the comparison table of the various systems. Finally, Section IV concludes the survey.

A. Image Acquisition
Image acquisition is the process of acquiring handwritten input data for the character recognition system. Based on the mode of image or data acquisition, online and offline systems have been developed. Bluche et al. used the Rimes dataset, which is in French [11]. For numeric data, MNIST is a very popular dataset and is used in [9]. Some other datasets are Chars74K (English characters in natural images) [19], CEDAR (paid) [20], the Semeion handwritten digit dataset [21], the Pen-Based Recognition of Handwritten Digits dataset [21], etc. When no standard dataset is available, researchers use their own dataset for the recognition system [10].

B. Preprocessing
The preprocessing phase prepares the image for the next phase of the recognition system. Gray-scale conversion, binary conversion, noise removal, etc. are the various techniques performed in this phase. Fig. 3 shows the basic preprocessing operations on the image.

[Fig. 3 panels: (a) RGB Image - 230x755x3 uint8; (b) Gray Scale Image - 230x755 uint8]

C. Segmentation
Segmentation is the process of splitting the input text image into lines and then into individual characters. It removes the unwanted parts of the image. There are two types of segmentation, external and internal. External segmentation separates paragraphs, lines, and words, whereas internal segmentation separates individual characters from the input text [12], [13].

Various algorithms are available for segmentation. Histogram profiles and connected component analysis are some of the methods for line segmentation, used in [14], [15]. Fig. 4 and Fig. 5 show what the lines and characters look like after histogram-based segmentation.

Fig. 5. Segmented Characters based on Histogram: panels (a)-(e)

Spatial space detection for words and the histogram method for characters and other symbols are used in [16], [17]. In [2], the authors use the bounding box technique for character segmentation. After successful segmentation, a resize operation is performed on all segmented images to give them a uniform size.
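
As a rough illustration of the preprocessing and histogram-based segmentation steps described in this paper, the following minimal sketch uses NumPy only. It is not the method of any of the cited works. The function names, the luminance weights, the fixed threshold of 128, the assumption of dark ink on a light background, and the 28x28 output size are all illustrative assumptions.

```python
import numpy as np


def to_gray(rgb):
    """Convert an H x W x 3 uint8 RGB image to an H x W uint8 gray-scale image."""
    weights = np.array([0.299, 0.587, 0.114])  # common luminance weights (assumption)
    return (rgb[..., :3].astype(np.float64) @ weights).round().astype(np.uint8)


def binarize(gray, threshold=128):
    """Return a 0/1 array; 1 marks ink. Assumes dark ink on a light background."""
    return (gray < threshold).astype(np.uint8)


def runs(profile):
    """Return (start, stop) pairs, stop exclusive, for each run of non-zero values."""
    mask = np.concatenate(([0], (np.asarray(profile) > 0).astype(np.int8), [0]))
    edges = np.diff(mask)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), stops.tolist()))


def segment_lines(binary):
    """Split a page into text lines using the horizontal (row-sum) histogram."""
    return [binary[r0:r1] for r0, r1 in runs(binary.sum(axis=1))]


def segment_characters(line):
    """Split a line into characters using the vertical (column-sum) histogram."""
    chars = []
    for c0, c1 in runs(line.sum(axis=0)):
        glyph = line[:, c0:c1]
        rows = np.flatnonzero(glyph.sum(axis=1))
        chars.append(glyph[rows[0]:rows[-1] + 1])  # crop to a tight bounding box
    return chars


def resize_nearest(img, size=(28, 28)):
    """Resize a 2-D array to a uniform size with nearest-neighbor sampling."""
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[np.ix_(rows, cols)]


# Usage on a synthetic page: white background with three dark blocks as "characters".
page = np.full((20, 30, 3), 255, dtype=np.uint8)
page[2:7, 3:8] = 0      # first line, first character
page[3:6, 12:16] = 0    # first line, second character
page[11:17, 5:10] = 0   # second line, single character

binary = binarize(to_gray(page))
lines = segment_lines(binary)
characters = [resize_nearest(c) for line in lines for c in segment_characters(line)]
```

A projection histogram only separates characters that have a blank row or column between them; touching or overlapping characters would need a different technique, such as the connected component analysis mentioned above.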
[...] based on that collected information, we can classify new, unknown objects by matching against it. A feature is a robust representation of the raw data.

In the field of handwritten character recognition, machine learning uses various methods such as artificial neural networks, support vector machines, naive Bayes, nearest neighbor algorithms, decision trees, neuro-fuzzy systems, etc.
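
To make the idea of classifying an unknown object by matching it against stored features concrete, here is a minimal sketch of one of the listed methods, a 1-nearest-neighbor classifier. The raw-pixel feature vector, the Euclidean distance, the function names, and the tiny 3x3 templates are illustrative assumptions, not taken from any of the surveyed systems.

```python
import numpy as np


def extract_features(glyph):
    """Flatten a fixed-size binary glyph into a feature vector (raw pixels; an assumption)."""
    return np.asarray(glyph, dtype=np.float64).ravel()


def nearest_neighbor_predict(train_features, train_labels, query):
    """Return the label of the training vector closest to the query (Euclidean distance)."""
    distances = np.linalg.norm(train_features - query, axis=1)
    return train_labels[int(np.argmin(distances))]


# Two hypothetical 3x3 training templates: a vertical bar and a horizontal bar.
vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
train_features = np.stack([extract_features(vertical), extract_features(horizontal)])
train_labels = ["I", "-"]

# An unknown glyph: a vertical bar with one missing pixel.
unknown = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
predicted = nearest_neighbor_predict(train_features, train_labels, extract_features(unknown))
```

In practice the feature vector would usually be something more robust than raw pixels, and the training set far larger than two templates.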
In the next section, a comparison of various research works in handwritten character recognition is shown.

III. COMPARISON TABLE
A comparison between the models proposed by various researchers is shown in Table 1.

[...] and managed the time complexity. After reviewing many research works, we found that there is not even a single technique or system that can completely fulfill the requirements of handwritten character recognition. So, offline handwritten character recognition is still an open area of research for identifying its various complexities and resolving them.