

International Journal of Advanced Science and Technology
Vol. 29, No.4, (2020), pp. 4042 – 4054

Recognition of English Handwriting and Typed from Images using Tesseract on Android Platform

Shubhendu Banerjee1, Sumit Kumar Singh*2, Atanu Das3, Rajib Bag4

1 Department of CSE, Narula Institute of Technology, 700109, India
2 Department of CSE, Netaji Subhash Engineering College, 700152, India
3 Department of CSE, Supreme Knowledge Foundation Group of Institutions, 712139, India
[email protected], [email protected], [email protected], [email protected]

Abstract
With the advent of digitalization in most spheres of human pursuit, the conversion of text in images to digital form has gained considerable momentum over the years, even though the concept of image character recognition essentially dates back to the period before the invention of the computer. This paper presents the experimental workflow of recognizing text from images using Google's open-source Optical Character Recognition (OCR) engine, Tesseract. Here, Tesseract has been trained to recognize handwritten and typed text in English script and to produce output at various observed levels of accuracy. The method is supported by an added stage of image quality augmentation prior to text extraction. The paper is predominantly aimed at building a resourceful Android application that enables the user to digitize text from images even on the small screen. The research analysis reports a precision of up to 93% for handwritten text and 98% for typed characters, which is an attempt at advancing over existing methods. Apart from image-to-text conversion, the application also includes a text-to-speech feature, which makes it particularly significant for visually impaired users.

Keywords: Optical character recognition, handwriting recognition, speech, Android, pattern recognition.

1. Introduction
Imitation of human functions with the aid of machines has long been a shared vision of mankind. Researchers have, from time to time, turned to artificial intelligence and machine learning to develop complex software and applications that substitute machine operation for human labor. Over the years, machine reading has evolved from dream to reality through the advancement of sophisticated and robust Optical Character Recognition (OCR) systems [1]. OCR involves the recognition of alphanumeric and character patterns and their subsequent mechanical or electronic conversion into editable and searchable data. The earliest precursor of OCR dates back to 1870, with the invention of the retina scanner by Carey [2-3]. Extensive development and use of OCR technologies began in the 1960s and 1970s with the creation of simplified fonts, such as OCR-7B, that were easier to convert to digitally readable text and are still in use as the font imprinted on credit and debit cards [4]. Gradually, OCR technology was commercialized worldwide in postal services to greatly speed up mail sorting. The most widely used commercial OCR systems of this period include the IBM 1418, designed to read the IBM 407 font, OCR-A and OCR-B [5-7]. By the year 2000, OCR had succeeded in deciphering even inferior-quality printed and handwritten text to a certain extent.


OCR technology was subsequently used to develop CAPTCHA programs to thwart bots and spammers [8-9].
As one can observe, over the decades OCR has grown more accurate and sophisticated owing to progress in related technology areas, for instance Artificial Intelligence, Machine Learning and Computer Vision [10-13]. Earlier works were executed on platforms such as neural networks, image histograms, clustering and block segmentation, to name a few. Nowadays, OCR software makes use of techniques such as feature detection, pattern recognition and text mining for the purpose of transcribing manuscripts faster and more accurately than ever before [14-18].
The challenge of applying OCR to extract text from images in various languages apart from English has considerably subsided with time, as numerous works have been accomplished in this area, resulting in software of considerable quality and quantity.
The development of such software paved the way for systems that port the concept of OCR to Android mobile applications, making it easier for the public to avail of this facility and implement it in day-to-day work. This paper aims primarily at building such an application so as to make optimum use of OCR through mobile applications, taking into consideration the widespread accessibility of smartphones in today's digital age. The paper draws on Tesseract, the same open-source OCR engine that Google uses for language recognition and image translation.
Tesseract initially began as a PhD research project at Hewlett-Packard Laboratories, Bristol, and was developed between 1985 and 1994 [19]. It was rated one of the top three engines in the 1995 UNLV Accuracy Test as 'HP Labs OCR'. With partial modifications, Tesseract was released as open source by HP in 2005 and was in turn re-released to the open-source community by Google in 2006 [20]. The most recent (LSTM-based) stable version, 4.1.1, released on December 26, 2019, is available on GitHub. Tesseract possesses Unicode (UTF-8) support and can recognize more than 100 languages. It can generate output in various formats such as invisible-text-only PDF, plain text, hOCR (HTML) and TSV.
The paper attempts to take a step beyond the existing feature of image-to-text extraction by training the Tesseract OCR engine to analyze and decipher digital handwriting in addition to typed characters. The focus is primarily on the precise conversion of English handwritten images into text, followed by vocal translation of the generated output for the assistance of the visually impaired. The text conversion methodology integrates image enhancement through realignment, rescaling, resizing and noise removal applied to the given image by incorporation of suitable algorithms. Moreover, training Tesseract with a self-adapted dataset of handwritten characters plays a significant role in the final output. The paper is an attempt at achieving noteworthy improvement in the generated output by embracing these additional features within its ambit.
Each step in the course of the Android application development proposed in this paper is discussed at length in the forthcoming sections. The manner in which Tesseract has been trained, specifically to meet the requirements of this paper involving detection of English typed and handwritten text, is elaborated in Section 2.1. The comprehensive methodology for obtaining text and speech output from an image input is demonstrated through flowcharts and algorithms with illustrations in Section 2.2. Sections 3 and 4 focus on the analysis of the overall work, weigh the possibilities of superior output and conclude by drawing on its limitations and future scope.


2. Materials & Experimental Procedures


2.1. Training Process of Tesseract
To work on a particular language for text extraction, it is first and foremost necessary to train Tesseract to recognize that language. In this paper, Tesseract has been trained in such a manner as to recognize English typography as well as English handwritten text from images and to provide output with an average accuracy of 93%. This training process comprises several steps, which are depicted in Algorithm 1. While training Tesseract, it is imperative to keep track of the engine's speed of image recognition without compromising on the accuracy level.
Algorithm 1. Tesseract Training Procedure
1: procedure: MYPROCEDURE
2: Start with training English typed language and English manuscripts
3: input: dataset of both typed and handwritten fonts
4: top: preprocessing of data: arrange the dataset
5: loop:
6: $tesseract <image file> <box file> batch.nochop makebox : prepare a separate box file for each character
7: $tesseract <image file> <box file> nobatch box.train : train on each image and box file pair
8: $unicharset_extractor <box file> : take the box file as input for the character set file
9: $mftraining <filename.tr> -F font_properties -U unicharset : create clusters and prototypes
10: output: generation of pffmtable file
11: output: generation of inttemp file
12: output: generation of Microfeat file
13: $wordlist2dawg <dictionary file> <dawg file> : match against the dictionary file
14: end loop
15: $combine_tessdata <3 letter language code> : combination of the generated files
16: output: trained successfully
For text extraction in a particular language, the Tesseract data subdirectory (tessdata) should include eight data files, as follows:
tessdata/xxx.inttemp
tessdata/xxx.normproto
tessdata/xxx.pffmtable
tessdata/xxx.unicharset
tessdata/xxx.freq-dawg
tessdata/xxx.word-dawg
tessdata/xxx.user-words
tessdata/xxx.DangAmbigs
In order for the engine to recognize characters, it has to be supplied with a complete set of character data. For this purpose, a text or word-processor file encompassing all possible characters is initially set up. The samples of English typographic and handwritten characters for this paper have been acquired from a GitHub dataset consisting of over 4,66,544 words (url: https://github.com/dwyl/english-words) and from Abstractfonts, comprising 13,873 font types (url: http://www.abstractfonts.com/), respectively. A customized dataset has been devised out of Abstractfonts for the identification of handwritten text. Tesseract has been trained for character identification using sample texts from these character datasets. Following character training, the trained texts are

ISSN: 2005-4238 IJAST 4044


Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology
Vol. 29, No.4, (2020), pp. 4042 – 4054

saved as a UTF-8 text file for further programming.


Next, a box file is prepared (Figure 1), which in essence is a text file that lists the characters in a given training image in order, one per line. It consists of six columns: the first column contains the saved UTF-8 character codes, the next four columns give the coordinates of the bounding box within the region of the image, and the final column gives the page number of the character in the image file. The purpose of preparing a box file is to generate training files from training images using the following command: $tesseract <image file> <box file> batch.nochop makebox. If this results in multiple pages, an individual box file needs to be generated for each page.
A 102 106 41 29 0
B 149 107 35 29 0
C 195 108 32 28 0
D 233 109 33 27 0
E 275 108 34 27 0

Figure 1. An example of box file
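As an illustration of this layout, the sketch below (a hypothetical helper, not part of the Tesseract toolchain) parses one line of a box file into its character, its four bounding-box values and its page number; the geometric meaning of the four coordinate columns follows Tesseract's box-file convention.

// Minimal sketch of a box-file record: one character, four bounding-box
// values and a page number per line, e.g. "A 102 106 41 29 0".
public final class BoxEntry {
    public final String glyph;
    public final int[] box = new int[4]; // the four coordinate columns
    public final int page;

    private BoxEntry(String glyph, int[] coords, int page) {
        this.glyph = glyph;
        System.arraycopy(coords, 0, this.box, 0, 4);
        this.page = page;
    }

    // Parses one whitespace-separated line of a box file.
    public static BoxEntry parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 6) {
            throw new IllegalArgumentException("Expected 6 columns: " + line);
        }
        int[] coords = new int[4];
        for (int i = 0; i < 4; i++) {
            coords[i] = Integer.parseInt(f[i + 1]);
        }
        return new BoxEntry(f[0], coords, Integer.parseInt(f[5]));
    }
}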

Since the box file is generated with the nature of English typographic characters in mind, it largely fails to generate suitable box information for English handwritten characters, thereby making manual editing inevitable. Editing the box file calls for an editor that recognizes UTF-8, such as an HTML editor, Notepad++ or Gedit; in this paper, Gedit is used for editing purposes. In instances where a single character of the training image is split into two lines within the box file, the bounding boxes are merged manually.
Upon completion of the box files for the corresponding training images, Tesseract is run in training mode with the command $tesseract eng.exp2.tif eng.exp2 nobatch box.train, which in turn produces two files: eng.exp2.tr, containing the features of each character on the training page, and eng.exp2.txt.
Next, to create a character set file, Tesseract's unicharset file, which contains information on each symbol, is used. To generate the unicharset data file, the program named unicharset_extractor is run on the already generated box files using the following command: $unicharset_extractor <box file>
Example: $unicharset_extractor eng.exp0.box
Each line of this character set file corresponds to one character in UTF-8 format, followed by a hexadecimal number embodying a binary mask that encodes its properties.
Once the character features of all the training pages have been extracted, the character shape features are clustered to create prototypes. Here, the clustering is done by two programs, namely mftraining and cntraining. The mftraining program is run using the command: $mftraining <filename.tr> -F font_properties -U unicharset
Example: $mftraining eng.segoescriptb.exp2.tr -F font_properties -U unicharset, where -U points to the unicharset previously generated by unicharset_extractor and -F includes the font_properties file.
This again outputs two data files, pffmtable and inttemp: pffmtable includes the count of anticipated features of each character, and inttemp comprises the shape prototypes (it cannot be opened directly). Besides these, a third file called Microfeat is also created by this program but is not used further in this paper.
Next, the cntraining program is run using the following command: $cntraining <filename.tr>


Example: $cntraining eng.segoescriptb.exp0.tr
This command produces the normproto data file, which subsequently performs the character normalization training for Tesseract, as shown below.
a1
significant elliptical 1
0.367188 0.403906 0.230469 0.242188
0.000400 0.000400 0.000400 0.000400
b1
significant elliptical 1
0.328125 0.326562 0.234375 0.175781
0.000400 0.000400 0.000400 0.000400
c1
significant elliptical 1
0.347656 0.386719 0.253906 0.195312
0.000400 0.000400 0.000400 0.000400
Tesseract uses as many as eight dictionary files for a language. Here, three files, namely frequent_word_list, words_list and user_char, are used for reference. Among the three, one file is a simple UTF-8 text file, and the rest are coded as Directed Acyclic Word Graphs (DAWG). To create the DAWG files, a word list of the English language is first built, formatted as a UTF-8 text file with one word per line. The word list is then split into two sets, frequent_word_list and words_list, to obtain two UTF-8 text files.
The resultant files are converted into their corresponding DAWG files using the command: $wordlist2dawg <dictionary file> <dawg file>
Example: $wordlist2dawg words_list word-dawg
By this time the training procedure is just about complete. The remaining task is to rename the files with the desired language code and then fuse the files generated during the different steps described in the earlier sections.

Figure 2. Modification of file names after training

At the outset, the files are renamed by prefixing each file name with a three-letter language code (lang.). Here, for English, lang. = "eng." has been prefixed to the file names, as shown in the two-column table of Figure 2.
This results in the generation of an output file, eng.traineddata, which is copied to the tessdata directory (usually /usr/local/share/tessdata). This is the concluding stage of the training process, and it is quite likely that Tesseract will now be able to identify and distinguish any image file comprising basic characters of the English script.
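For the Android side of this work, the trained eng.traineddata is consumed through the Tess-Two wrapper; the sketch below assumes the file has already been copied into a tessdata folder under the app's private storage (the helper class and file location are assumptions of this sketch, not prescriptions from the paper).

import android.content.Context;
import android.graphics.Bitmap;
import com.googlecode.tesseract.android.TessBaseAPI;
import java.io.File;

// Sketch: initialise Tess-Two against a data directory that contains
// tessdata/eng.traineddata (the trained file produced above).
public final class OcrHelper {
    public static String recognize(Context context, Bitmap bitmap) {
        // Assumed location; a real app would first copy eng.traineddata
        // from its assets into <filesDir>/tessdata/.
        File dataDir = context.getFilesDir();

        TessBaseAPI tess = new TessBaseAPI();
        try {
            // init() expects the parent of the "tessdata" folder and a language code.
            if (!tess.init(dataDir.getAbsolutePath(), "eng")) {
                return "";
            }
            tess.setImage(bitmap);       // hand over the (pre-processed) image
            return tess.getUTF8Text();   // recognized text in UTF-8
        } finally {
            tess.end();                  // release native resources
        }
    }
}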
2.2. Proposed Method
This paper draws on the Tess-Two library and has been tested on Android version 6.0 and higher. The functional methodology is adopted by the engine for projection of visual


and audio output after it has been made ready to identify text from scripts.
Since each Android application is permitted to run solely within its self-restricted sandbox, the app has to seek appropriate permission to access resources or information beyond its sandbox. An app permission declaration is hence registered in the app manifest, and the user's consent is sought during runtime.
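A minimal sketch of such a runtime request, here for the CAMERA permission declared in the manifest with <uses-permission android:name="android.permission.CAMERA" /> (the helper class and request code are illustrative assumptions):

import android.Manifest;
import android.app.Activity;
import android.content.pm.PackageManager;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

public final class PermissionHelper {
    // Arbitrary request code used to match the permission callback.
    public static final int REQUEST_CAMERA = 1001;

    // Asks for the CAMERA permission at runtime if it has not been granted yet.
    public static void ensureCameraPermission(Activity activity) {
        if (ContextCompat.checkSelfPermission(activity, Manifest.permission.CAMERA)
                != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(
                    activity,
                    new String[] { Manifest.permission.CAMERA },
                    REQUEST_CAMERA);
        }
    }
}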
The image is then captured with auto-focus and a 2.0 frames-per-second shutter speed of the camera. The captured image is saved in the phone's internal storage in .JPG format, and the automatically created storage path is preserved for future reference and accessibility. There is also provision for the user to select images already saved in the phone gallery without actually capturing them from the app camera. The workflow of the proposed method is described in Figure 3.
[Flowchart: permission check; choice of image source (camera or local storage); image file creation; dynamic crop with rotation; pre-processing; preparation of the test data; OCR; retrieval of the recognized text; optional voice conversion with modulation of speed and pitch; visual and audio output.]

Figure 3. Workflow of the proposed methodology
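The two image sources in the workflow, camera capture and an already saved gallery image, can be reached through the standard Android intents sketched below (the helper class and request codes are illustrative assumptions):

import android.app.Activity;
import android.content.Intent;
import android.provider.MediaStore;

public final class ImageSourceHelper {
    // Illustrative request codes for matching onActivityResult() callbacks.
    public static final int REQUEST_CAPTURE = 2001;
    public static final int REQUEST_PICK = 2002;

    // Launches the device camera; the captured image can then be saved
    // as .JPG in internal storage, as described in the workflow.
    public static void captureFromCamera(Activity activity) {
        Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
        activity.startActivityForResult(intent, REQUEST_CAPTURE);
    }

    // Lets the user pick an already saved image from the gallery instead.
    public static void pickFromGallery(Activity activity) {
        Intent intent = new Intent(Intent.ACTION_PICK,
                MediaStore.Images.Media.EXTERNAL_CONTENT_URI);
        activity.startActivityForResult(intent, REQUEST_PICK);
    }
}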

These images may be resized as per the user's convenience by dynamic cropping, which removes the irrelevant regions while retaining the primary text region. The user can also adjust skewed images by rotating or straightening the captured image so that the baselines of the text are parallel to the horizontal plane of the image (Figure 4).


Figure 4. (a) Proposed application home screen (b) Input image (c) Cropped image (d) De-skewed image (e) Add more images or proceed (f) Visual and audio output
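Straightening a skewed capture, as in Figure 4(d), can be sketched as a plain bitmap rotation; the angle would come from the user's adjustment, and the helper below is an illustration rather than the paper's exact implementation.

import android.graphics.Bitmap;
import android.graphics.Matrix;

public final class DeskewHelper {
    // Rotates the bitmap by the given angle (degrees) so that text baselines
    // end up roughly parallel to the horizontal edge of the image.
    public static Bitmap rotate(Bitmap source, float degrees) {
        Matrix matrix = new Matrix();
        matrix.postRotate(degrees);
        return Bitmap.createBitmap(source, 0, 0,
                source.getWidth(), source.getHeight(), matrix, true);
    }
}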

This image, however, may be distorted owing to various degrading factors such as thermal noise, poor lighting, dust particles, temperature shifts and other environmental aspects. Such distortion of image quality may adversely affect the OCR engine's detection capability and give rise to errors. To overcome this setback, the image in question undergoes a pre-processing stage of rescaling, contrast adjustment and noise removal for superior visual interpretation. This paper employs Algorithm 2 for the desired enhancement.
Algorithm 2. Pre-processing of the Input Image
1: procedure: MYPROCEDURE
// Rescaling
2: input: cropped image
3: if (image >= 300 dpi) then
4: retain the image specification
5: else
6: convert the image into grayscale and save it in a byte array
7: setDpi(): byte-wise operation to raise the image dpi up to 300
8: end if
9: output: image of 300 dpi or more
// Image Enhancement
10: input: image of 300 dpi or more
11: let x(i,j) be the intensity value of the M x N input image at position (i,j), where (i,j) belongs to A = {1,2,3,...,M} x {1,2,3,...,N}
12: if (x(i,j) < 160) then
13: setPixel = 0 // close to black, so convert it into black
14: else
15: setPixel = 255
16: end if
17: output: enhanced image
// Noise Removal
18: input: enhanced image
19: generate a zero-valued flag image F of the same size (M x N), where f(i,j) is the pixel value at location (i,j)
20: loop:
21: consider a 5 x 5 matrix centred on x(i,j)
22: construct sets S1, S2 from the elements of the matrix, i.e. i varies from i-2 to i+2 and j varies from j-2 to j+2:
S1 = { x(i,j) : 0 <= x(i,j) <= 160 }
S2 = { x(i,j) : 161 <= x(i,j) <= 255 }
23: if (n(S1) > n(S2)) then
24: N1 = 0
25: else
26: N2 = 1
27: end if
28: construct a 3 x 3 matrix centred on the same x(i,j)
29: construct sets S3, S4 from the elements of the matrix, i.e. i varies from i-1 to i+1 and j varies from j-1 to j+1:
S3 = { x(i,j) : 0 <= x(i,j) <= 160 }
S4 = { x(i,j) : 161 <= x(i,j) <= 255 }
30: if (n(S3) > n(S4)) then
31: N3 = 0
32: else
33: N4 = 1
34: end if
35: if (N1 == N3 && x(i,j) == 0) || (N1 == N3 && x(i,j) == 255) then
36: f(i,j) = 0
37: if (N2 == N4 && x(i,j) == 255) || (N2 == N4 && x(i,j) == 0) then
38: f(i,j) = 1
39: end if
40: end loop
41: loop:
42: if (f(i,j) == 0) then
43: x(i,j) = 0
44: else
45: x(i,j) = 255
46: end if
47: end loop
48: output: enhanced and noise-free image
For better quality output, it is suggested that the resolution of the cropped image be at least 300 dpi. If the resolution is detected to be lower than desired, the image needs to be rescaled; a case in point is presented in Figure 5. Further, the contrast of the image is altered to transform the grayscale image into black and white: parts of the image with pixel values below 160 are converted to 0 and represented as black, while the remaining pixel values are converted to 255.
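A compact sketch of this grayscale-and-threshold step (the 160 cut-off follows Algorithm 2; the pixel-by-pixel loop is a simplified illustration and leaves out the dpi rescaling):

import android.graphics.Bitmap;
import android.graphics.Color;

public final class BinarizeHelper {
    // Converts the image to grayscale and thresholds it at 160:
    // darker pixels become black (0), the rest become white (255).
    public static Bitmap binarize(Bitmap source) {
        Bitmap out = source.copy(Bitmap.Config.ARGB_8888, true);
        for (int y = 0; y < out.getHeight(); y++) {
            for (int x = 0; x < out.getWidth(); x++) {
                int p = out.getPixel(x, y);
                // Simple luminance approximation for the grayscale value.
                int gray = (Color.red(p) + Color.green(p) + Color.blue(p)) / 3;
                int v = (gray < 160) ? 0 : 255;
                out.setPixel(x, y, Color.rgb(v, v, v));
            }
        }
        return out;
    }
}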


Figure 5. (a) Input image less than 300dpi (b) Rescaled image with
300dpi


Lastly, the image is filtered to remove noise that might have crept in owing to several undesirable factors. The noise removal technique described in Algorithm 2 enables segregation of noise from the actual text matter. Samples of the input image and the output image after filtering are given in Figure 6. Tesseract is better able to distinguish characters from the rest of the content when the image is of superior quality, which largely affects the accuracy level of the engine.

Figure 6. (a) Captured image (b) Image intensified and free of noise
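The window-based noise removal of Algorithm 2 can be read as a two-scale majority vote; the sketch below is one plausible rendering of that idea (it keeps a pixel black or white only when the 5x5 and 3x3 neighbourhoods agree on the majority class), not a line-by-line transcription of the algorithm.

import android.graphics.Bitmap;
import android.graphics.Color;

public final class NoiseFilterHelper {
    // Two-scale majority vote over a binarized image: a pixel is forced to
    // black or white only when both window sizes agree on the majority class.
    public static Bitmap denoise(Bitmap binary) {
        Bitmap out = binary.copy(Bitmap.Config.ARGB_8888, true);
        int w = binary.getWidth(), h = binary.getHeight();
        for (int y = 2; y < h - 2; y++) {
            for (int x = 2; x < w - 2; x++) {
                boolean dark5 = darkMajority(binary, x, y, 2); // 5x5 window
                boolean dark3 = darkMajority(binary, x, y, 1); // 3x3 window
                if (dark5 == dark3) {
                    int v = dark5 ? 0 : 255;
                    out.setPixel(x, y, Color.rgb(v, v, v));
                }
            }
        }
        return out;
    }

    // Returns true when dark pixels (gray <= 160) outnumber light ones
    // inside the (2r+1) x (2r+1) window centred on (x, y).
    private static boolean darkMajority(Bitmap img, int x, int y, int r) {
        int dark = 0, light = 0;
        for (int dy = -r; dy <= r; dy++) {
            for (int dx = -r; dx <= r; dx++) {
                int p = img.getPixel(x + dx, y + dy);
                int gray = (Color.red(p) + Color.green(p) + Color.blue(p)) / 3;
                if (gray <= 160) dark++; else light++;
            }
        }
        return dark > light;
    }
}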

3. Results and Discussion

For development of the proposed application, the paper puts together Android Studio and jTessBoxEditor for app building and dataset training respectively. The machine used is backed by 16 GB of RAM and an 8th-generation Intel Core i5 processor.
The experimental findings of the proposed app have been compared with the results of Google Translate, which, at the time of writing, appears to be the most advanced among similar software such as the ABBYY OCR engine and Microsoft Oxford.
On scrutiny of the first illustration, projected through the images in Figure 7, it can be observed from Table 1 that the accuracy levels of the two applications for typed English word detection barely present any disparity.



Figure 7. (a) Input image (b) (c) Google Translate output (d) (e) Proposed
application visual and audio output

Table 1. Comparison of results of Figure 7


Applications No of Words No of Errors Accuracy
Google Translate 2 98.65%
Proposed Application 149 1 99.32%

Table 2. Comparison of results of Figure 8


Applications No. of Words No. of Errors Accuracy
Google Translate 85 5 94.11%
Proposed Application 85 3 96.47%
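Most of the accuracy values reported in Tables 1 to 6 are consistent with a simple word-level measure (this reading is inferred from the reported numbers, not stated explicitly in the paper):

Accuracy (%) = (W - E) / W x 100,

where W is the number of words in the image and E is the number of misrecognized words. For instance, (85 - 3) / 85 x 100 = 96.47% matches the proposed application's entry in Table 2.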


Figure 8. (a) Input image (b) Google Translate output (c) Proposed
application visual and audio output
The outcome of the second illustration on typed characters (Figure 8), as stated in Table 2, depicts a negligible percentage of variation, with the proposed application scoring slightly over Google Translate. This may be because the latter could not recognize the words accurately owing to the meandering character of the input image.


Figure 9. (a) Input image (b) Google Translate output (c) Proposed
application visual and audio output


However, as the illustrations shift to handwritten text, the analysis of Figure 9 shown in Table 3 reveals a significant rise in the number of words detected by the proposed application. The proposed application shows an accuracy level of about 96%, roughly 30 percentage points higher than that of Google Translate, which achieves an accuracy of about 65%.
Table 3. Comparison of results of Figure 9
Applications No of Words No of Errors Accuracy
Google Translate 82 29 64.63%
Proposed Application 82 3 96.34%

Figure 10. (a) Input image (b) Google Translate output (c) Proposed
application visual and audio output
Based on another illustration of handwritten text (Figure 10) and its outcome in Table 4, it is established that the proposed application yields better output on handwritten script than Google Translate. The difference, even in this case, is above 30 percentage points, with Google Translate achieving no more than 60.63% accuracy.
Table 4. Comparison of results of Figure 10
Applications No of Words No of Errors Accuracy
Google Translate 30 12 60.63%
Proposed Application 30 2 93.34%

Table 5. Analysis of Accuracy for English Typed Documents


Applications No of Words No of Errors Accuracy
Google Translate 50 1 98.00%
Proposed Application 50 1 98.00%
Google Translate 100 2 98.00%
Proposed Application 100 1 99.00%
Google Translate 150 2 98.66%
Proposed Application 150 1 99.33%
Google Translate 200 3 98.50%
Proposed Application 200 3 98.50%
Google Translate 250 3 98.80%
Proposed Application 250 2 99.20%
Google Translate 300 4 98.66%
Proposed Application 300 3 99.00%


Table 6. Analysis of Accuracy for English Manuscript


Applications No of Words No of Errors Accuracy
Google Translate 50 17 66.00%
Proposed Application 50 4 92.00%
Google Translate 100 31 69.00%
Proposed Application 100 7 93.00%
Google Translate 150 52 65.33%
Proposed Application 150 9 94.00%
Google Translate 200 71 64.50%
Proposed Application 200 13 93.50%
Similar trials of text detection and output analysis have been carried out on images containing different numbers of words, ranging from 50 to 300 for typed and from 50 to 200 for handwritten English text. As can be inferred from Table 5 and Table 6, the results for typed documents are closely comparable on all occasions. Conversely, when the same analysis is performed on handwritten illustrations, a notable increase in output accuracy is observed for the proposed application. Resting on the above statistics, it may prudently be inferred that the proposed application generates significantly superior output for handwritten text compared with existing applications of recent times, which is the utmost aim of this application development.

4. Conclusion
The paper presents an elaborate overview of the process of digitization of text, laying emphasis on English handwritten text over typographic text. Handwritten text being the focal area, this paper was expected to deal with a practically unlimited sample set with diverse characteristics. For optimization of the data input, the paper has sought to construct and develop a personalized dataset for handwritten text. Google's open-source OCR engine Tesseract has been made to undergo different phases of training with the purpose of identifying words and characters from these chosen datasets. Apart from the customary training process of Tesseract, this paper has strived to introduce certain distinctive attributes of its own, such as image augmentation for better-quality image input, image resizing for preferred sectional input, and audio output whose speed and pitch can be controlled as desired. Also, its ability to perform on the Android platform promotes its applicability and accessibility to a wider range of users. Although the resultant output of the total process illustrates considerable progress in the field of handwritten manuscript detection, there remains much space for achieving better accuracy. Potential research options in this area include imbibing advanced machine learning methods for acquiring maximum accuracy, as well as digital translation of regional and multilingual scripts in both typed and handwritten forms.

References
[1] Mantas, J. An Overview of Character Recognition Methodologies. Pattern Recognition. 1986, 19, 425–430.
[2] Fragoso, V.; Gauglitz, S.; Zamora, S.; Kleban, J.; Turk, M. TranslatAR: A Mobile Augmented Reality Translator. IEEE Workshop on Applications of Computer Vision (WACV). 2011, 497–502.
[3] Nagy, G. At the Frontiers of OCR. Proceedings of the IEEE. 1992, 80, 1093–1100.
[4] Shinde, A.A. Text Pre-processing and Text Segmentation for OCR. International Journal of Computer Science Engineering and Technology. 810–812.
[5] Pal, U.; Roy, R.K.; Kimura, F. Multi-lingual City Name Recognition for Indian Postal Automation. International Conference on Frontiers in Handwriting Recognition. 2012.
[6] Jiao, J.; Ye, Q.; Huang, Q. A Configurable Method for Multi-Style License Plate Recognition. Pattern Recognition. 2009, 42(3), 358–369.
[7] Graef, R.; Morsy, M.M.N. A Novel Hybrid Optical Character Recognition Approach for Digitizing Text in Forms. Extending the Boundaries of Design Science Theory and Practice. 2019, 11491, 206–220.
[8] Marosi, I. Industrial OCR Approaches: Architecture, Algorithms and Adaptation Techniques. Document Recognition and Retrieval XIV, SPIE. 2007, 6500-01.
[9] Smith, R. An Overview of the Tesseract OCR Engine. International Conference on Document Analysis and Recognition. 2007.
[10] Rice, S.V.; Nagy, G.; Nartker, T.A. Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, USA. 1999, 57–60.
[11] Rehman, A.; Naz, S.; Razzak, M.I. Writer Identification Using Machine Learning Approaches: A Comprehensive Review. Multimedia Tools and Applications. 2018, 78, 10889–10931.
[12] Lu, S.; Liu, L.; Lu, Y.; Wang, P.S.P. Cost-Sensitive Neural Network Classifiers for Postcode Recognition. International Journal of Pattern Recognition and Artificial Intelligence. 2012, 26, 1–14.
[13] Gheorghita, S.; Munteanu, R.; Graur, A. An Effect of Noise in Printed Character Recognition System Using Neural Network. Advances in Electrical and Computer Engineering. 2013, 13, 65–68.
[14] Unnikrishnan, R.; Smith, R. Combined Script and Page Orientation Estimation Using the Tesseract OCR Engine. Proceedings of the International Workshop on Multilingual OCR, ACM. 2009, p. 6.
[15] Huang, W.; He, D.; Yang, X.; Zhou, Z.; Kifer, D.; Giles, C.L. Detecting Arbitrary Oriented Text in the Wild with a Visual Attention Model. Proceedings of the 24th ACM International Conference on Multimedia. 2016, 551–555.
[16] Singh, S.; Sharma, A.; Chhabra, I. A Dominant Points-Based Feature Extraction Approach to Recognize Online Handwritten Strokes. International Journal on Document Analysis and Recognition. 2017, 20, 37–58.
[17] Liang, J.; DeMenthon, D.; Doermann, D. Geometric Rectification of Camera-Captured Document Images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2008, 30(4), 591–605.
[18] Pawar, N.; Shaikh, Z.; Shinde, P.; Warke, Y. Image to Text Conversion Using Tesseract. International Research Journal of Engineering and Technology. 2019, 6(2), 516–519.
[19] Smith, R.; Antonova, D.; Lee, D.S. Adapting the Tesseract Open Source OCR Engine for Multilingual OCR. Proceedings of the International Workshop on Multilingual OCR, ACM. 2009, p. 1.
[20] Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics. 1975, 9, 62–66.


