BHARGAV / GOOGLE VISION API IMPLEMENTATION FOR HANDWRITTEN TEXT DIGITIZATION SYSTEM
ABSTRACT
Image processing, together with pattern recognition, has been a heavily exploited area of study in recent years, achieving advances such as object classification, face recognition and text recognition. In particular, the latter allows handwritten, typewritten or printed texts to be converted into editable digital texts. This paper describes the design and operation of a handwritten text digitization system for Android devices, built through the implementation of the Google Vision API. The objective of the research is to verify whether its use increases efficiency when recognizing autograph text, given the poor performance of Optical Character Recognition (OCR) systems when processing texts of this type. The developed system consists of three modules: 1) Image acquisition, 2) API consumption request and 3) Digitization of the generated OCR. To evaluate its performance, eleven document formats corresponding to the areas of education, health and industry were used under four different image conditions (with respect to quality adjustment and cropping of the area of interest), together with a comparison against some of the existing applications on the market. Based on the above, the average recognition of handwritten words was calculated with respect to those contained in each format used, and a 67% efficiency of the system was determined.
Keywords— Android, Google Vision API, optical character recognition, handwritten text.
DOI Number: 10.14704/nq.2022.20.10.NQ55457 NeuroQuantology 2022; 20(10): 4766-4778
www.neuroquantology.com
eISSN 1303-5150
NEUROQUANTOLOGY | AUGUST 2022 | VOLUME 20 | ISSUE 10 | PAGE 4766-4778| DOI: 10.14704/NQ.2022.20.10.NQ55457
that is, in videos, files with *.pdf extension, or images obtained from physical documents. This allows the manipulation of character strings, facilitating the extraction and capture of information that could not otherwise be obtained when, for example, the files are not editable or only a handwritten version is available. Indeed, one of the main problems within OCR generation is the recognition of handwritten text, which results in lower performance in its digitization, since the result obtained is far from the expected one (Cirillo & Rementeria, 2022).

Due to the usefulness of implementing OCR as a tool to assist in carrying out daily tasks, it has been incorporated into the mobile application market, since the use of smartphones has reached scopes that had not previously been developed, even going so far as to replace special-purpose devices because of the practicality and quality they provide in carrying out various processes (Costa et al., 2021). Examples of mobile applications commonly used for the recognition and digitization of printed and handwritten text are: a) Cam Scanner, from INTSIG (Donthu et al., 2021), mainly characterized for being one of the most used applications for the scanning and digitization of documents; b) Google Lens, from Google Inc. (Feng & Shah, 2022), which aims to serve as a translator, search engine and identifier of text in images in real time; and c) Office Lens, from Microsoft (Li et al., 2019), which also serves as a document scanner and digitizer, but with a simpler interface than the other two.

As an alternative for creating process automation solutions, companies such as Google, Microsoft and Amazon have released Application Programming Interfaces (APIs) ((Liao & Xu, 2020); (Osco et al., 2022); (Pande et al., 2022)), with the aim of making it possible to develop quality technology that contributes to the resolution of various problems through the implementation of Machine Learning techniques. An example is the Google Vision API, a tool that uses previously trained Machine Learning models to extract information from images, add labels to classify them, detect objects and faces, or identify printed and handwritten text contained in them (Paramita et al., 2021).

This paper describes the operation of a system of our own design, which consists of three modules: 1) Image acquisition, 2) API consumption request, and 3) Digitization of the generated OCR, for the digitization of handwritten text by implementing the Google Vision API, in order to verify its efficiency in autograph text recognition. The performance of the system was evaluated by establishing different imaging conditions with light variation and cropping, analysing the correctly recognized words in comparison with those existing within each document, and contrasting the results with some of the applications existing in the market.

SYSTEM DEVELOPMENT
The system was developed for use on Android devices and is composed of three main modules: 1) Image acquisition module, 2) API consumption request module and 3) Digitization of the generated OCR. Figure 1 graphically represents the general structure of the system operation.

In the image acquisition module, the user manually adjusts the image containing the handwritten text to be digitized. This image is captured from the device camera or selected from its gallery. Also, within the same interface, it can be rotated vertically or horizontally and cropped, adjusting the edges of the area to be processed. Figure 2 presents each of the actions that the user must carry out manually for the operation of the first module and thus acquire the desired image.
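The adjustments described for the first module can be sketched independently of the Android UI. The sketch below models a grayscale bitmap as a list of rows and implements the flip and region-of-interest crop operations; all function names are illustrative and not taken from the paper's implementation.

```python
# Sketch of the manual adjustments in the image acquisition module
# (flipping and edge cropping). The real system performs these in an
# Android interface; here a bitmap is modeled as a list of pixel rows.

def flip_horizontal(img):
    """Mirror each row (horizontal flip)."""
    return [list(reversed(row)) for row in img]

def flip_vertical(img):
    """Reverse the row order (vertical flip)."""
    return list(reversed(img))

def crop(img, top, left, bottom, right):
    """Keep only the region of interest, e.g. the handwritten area."""
    return [row[left:right] for row in img[top:bottom]]

if __name__ == "__main__":
    bitmap = [[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]]
    roi = crop(flip_vertical(bitmap), 0, 1, 2, 3)
    print(roi)  # [[9, 10], [5, 6]]
```

The cropped result is what the second module would encode and send to the API.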
The second module consists of the request for the consumption of the Google Vision API (Paramita et al., 2021); it is therefore the module that allows requesting and receiving the response from the OCR generation for digitizing the handwritten text obtained from the image acquired in the previous module (see Figure 3).

To make the service consumption request, it is necessary to create an account on the Google Cloud platform ((Plattfaut et al., 2022); (Watanabe et al., 2022)) and generate the project credentials for authentication; otherwise, communication with the API cannot be established. Furthermore, the device in question must have an internet connection because it is a cloud computing service. The image is converted into a byte array using the Apache Commons package and sent as a parameter of the consumption request, specifying the type of processing requested from the Google Vision API; in this case, since it is OCR, the text detection type (DOCUMENT_TEXT_DETECTION) must be specified [15]. Once the consumption request is made to the API, it returns as a response a list of generated OCR objects, whose value is assigned to a TextAnnotation constant that allows access to the data.

Finally, the third module, digitization of the OCR (also included in the process shown in Figure 3), is the one that works with the list contained in the declared constant where the obtained OCR is stored as the API response. This list is traversed, and each value contained in its indexes is written to a file and subsequently printed, in such a way that it can be viewed within the interface and saved to a *.txt file, so that the digitized text is available to the user.
Figure 2: Manual process that the user performs in the image acquisition module.
Figure 4 presents two examples of activities (windows or interfaces in the Android operating system (Wiepking et al., 2021)) corresponding to the design of the developed system. The first is the main activity of the system, containing a menu with three floating action buttons (Floating Action Button, FAB) that allow the acquisition of the image; the second is the activity where the acquired and processed image is shown, together with the OCR obtained and a TOAST (pop-up message) indicating that the *.txt file has been generated. It is worth mentioning that SOMTM is the name given to the development project in question.
Figure 3: Algorithm of the operation of the Google Vision API consumption request module and the digitization of the generated OCR.
Figure 4: Examples of activities corresponding to the design of the system graphic interface: a) Image acquisition. b) Visualization of the acquired and processed image, as well as the OCR obtained and the *.txt file generation pop-up message.
In turn, A, B, C and D represent the number of words correctly recognized by the system under the imaging conditions, in the following order: (A) image with natural light and cropping at the edge of the sheet, (B) image with white enhancement and cropping at the edge of the sheet, (C) image with natural light and cropping leaving only the area where the handwritten text is, and (D) image with white enhancement and cropping leaving only the area where the handwritten text is.

After carrying out the analysis of each format under the four imaging conditions using the integrated system, the same procedure was repeated using Cam Scanner, Google Lens and Office Lens.
Table 1. List of formats used with the assigned ID, number of handwritten words contained in
each format and number of words recognized by each image condition.
            Number        Imaging conditions
ID format   of words    (A)    (B)    (C)    (D)
I              91        42     42     41     43
II            586       502    522    534    538
III            44        26     26     28     30
IV             35        19     21     22     24
V             150       120    120    128    129
VI            126       110    110    119    120
VII            60        30     30     31     31
VIII           41        26     28     28     30
IX             32        20     20     22     24
X              49        22     22     21     22
XI            161       123    123    130    132
In the same way, the count of recognized words under each of the imaging conditions was made for each application used, in order to calculate the arithmetic mean, as presented in equation (1) (Xin et al., 2021), and thus estimate the average efficiency of handwritten word recognition for each application.
x̄ = Σxᵢ / n = (x₁ + x₂ + ⋯ + xₙ) / n    (1)

Where,
x̄ is the arithmetic mean,
x₁ is the percentage of recognition of handwritten words per format,
n is the number of sample formats.
To obtain the value of x₁, both the number of recognized handwritten words and the total number of handwritten words contained per format were involved, as shown in equation (2).
x₁ = (Recognized handwritten words × 100) / (Handwritten words contained by format)    (2)
When calculating the recognition percentages of handwritten words per format under each image condition, it was observed that condition (A) yields the lowest recognition and, on the contrary, condition (D) presents the highest recognition of the handwritten words contained in the documents (Figures 6, 7, 8 and 9). Likewise, recognition is influenced by factors external to the system, coming mainly from the image to be processed or from the structure of the document from which the text is to be extracted.
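Equations (1) and (2) can be applied directly to the counts in Table 1 to reproduce the per-condition averages reported for the system. The data below are copied from Table 1; the only assumption is that the reported whole-number percentages are truncated rather than rounded.

```python
# Word counts from Table 1: total handwritten words per format,
# then words recognized under imaging conditions (A)-(D).
TABLE_1 = {
    "I":    (91,  [42, 42, 41, 43]),
    "II":   (586, [502, 522, 534, 538]),
    "III":  (44,  [26, 26, 28, 30]),
    "IV":   (35,  [19, 21, 22, 24]),
    "V":    (150, [120, 120, 128, 129]),
    "VI":   (126, [110, 110, 119, 120]),
    "VII":  (60,  [30, 30, 31, 31]),
    "VIII": (41,  [26, 28, 28, 30]),
    "IX":   (32,  [20, 20, 22, 24]),
    "X":    (49,  [22, 22, 21, 22]),
    "XI":   (161, [123, 123, 130, 132]),
}

def condition_mean(cond):
    """Equation (1): arithmetic mean over the 11 formats of the
    per-format percentages of equation (2), for one condition (0=A..3=D)."""
    pcts = [rec[cond] * 100 / total for total, rec in TABLE_1.values()]
    return sum(pcts) / len(pcts)

if __name__ == "__main__":
    means = [condition_mean(c) for c in range(4)]
    print([int(m) for m in means])   # [64, 65, 68, 71]
    print(int(sum(means) / 4))       # 67
```

Truncating the four condition means gives 64%, 65%, 68% and 71%, and their average truncates to the 67% overall system efficiency stated in the paper.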
Figure 6: Percentage of recognition by format in the image condition with natural light and cropping at the edge of the page.
For example, the attendance and delivery list format contains very small cells compared to the reporting format, which only contains lines to write on; the tutorial format contains a table with specific sections intended for writing the handwritten text, but with different cell dimensions; and the recipe format has a small blank space without printed lines, which means that the writer can use an indefinite text layout.
Other factors that affected the generation of the OCR were the distance between characters and their position within the format: if they intersected some edge of a table contained in the format, or were very close together or overlapping, the system did not recognize the character in question, omitting its processing or changing it to another it considered similar, reducing the amount of handwritten text correctly identified. Furthermore, when displaying the generated OCR result, the digitized characters from some formats that contained tables presented an order of appearance, for certain characters, different from the original format.
Figure 6 shows the percentage of recognition by format in the image condition with natural
light and cropping at the edge of the page.
Figure 7: Percentage of recognition by format in the image condition with white enhancement and cropping at the edge of the sheet.
Figure 8: Percentage of recognition by format in the image condition with natural light and cropping leaving the area where the handwritten text is.
Figure 9: Percentage of recognition by format in the image condition with white enhancement and cropping leaving the area where the handwritten text is.
It was seen that, of the applications available in the market that were selected, Office Lens had the lowest performance because, when processing the image, it only identified the printed text that existed within the format but did not identify the handwritten text, which meant that in some of the formats employed it could not digitize a single character. Google Lens, despite using the same API to generate the OCR, had some variations: in some formats its recognition was lower than the system's, but it mostly had favourable recognition; however, this application does not allow digitizing a document, that is, the identification of the characters is done in real time, with labels superimposed on the image focused by the device camera. Lastly, Cam Scanner was the app with the highest performance in handwritten character recognition; however, the main disadvantage of using this app is that OCR is not a free feature, requiring a monthly or annual payment, otherwise only the digitization of the document is performed, without text identification.

Based on the handwritten text recognition percentage per format, the arithmetic mean was obtained in relation to each of the imaging conditions, showing clearly that the developed system has comparable performance with Google Lens and Cam Scanner, as shown in Figure 10, where it is observed that the system obtains 64% under image condition (A), 65% under image condition (B), 68% under image condition (C) and 71% under image condition (D).

These results were used to calculate the average recognition percentage and determine the efficiency of each application: in this way, 71% for Cam Scanner, 68% for Google Lens, 4% for Office Lens and 67% for the system.
Figure 10. Average recognition for each image condition in relation to the application used.
In order to have a more detailed overview of the performance of the system, and considering that it is a Machine Learning problem, the following confusion matrix metrics were obtained (Zeng & Zhang, 2020): accuracy (E), error rate (Te), sensitivity (known as recall or true positive rate, TPR), specificity (related to the false positive rate, FPR) and precision (P). These metrics were calculated using equations (3 – 7).
E = (VP + VN) / (VP + VN + FP + FN)    (3)

Te = (FP + FN) / (VP + VN + FP + FN)    (4)

TPR = VP / (VP + FN)    (5)

FPR = VN / (VN + FP)    (6)

P = VP / (VP + FP)    (7)
Where, VP = True Positives, VN = True Negatives, FP = False Positives, FN = False Negatives.
Table 2 shows a comparison of the metrics obtained in the evaluation of the performance of the system against the selected applications. VP are the correctly recognized words; FP are the incorrectly recognized words, for example, the word "Design" in which, due to some factor unrelated to the algorithm (such as calligraphy), the ñ was changed to an n, giving a wrong recognition result; and FN are the handwritten words present in the format that were not extracted from the image, being discarded from the sample for OCR generation. Regarding VN, it is considered a null value because the recognition analysis focuses on handwritten word identification tests, not the other way around. That is, no round of tests was conducted to verify that the system effectively did not recognize non-handwritten elements, because of the importance given to the correct identification of each word. Due to the above, every app scores 0% on the specificity assessment.
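Equations (3) – (7) can be sketched as a small function under the convention described above (VN = 0). The counts used below are hypothetical, chosen only to illustrate the computation; they are not the paper's raw evaluation data.

```python
def confusion_metrics(vp, vn, fp, fn):
    """Equations (3)-(7): accuracy E, error rate Te, sensitivity TPR,
    the paper's FPR expression, and precision P, as percentages."""
    total = vp + vn + fp + fn
    e   = 100 * (vp + vn) / total
    te  = 100 * (fp + fn) / total
    tpr = 100 * vp / (vp + fn)
    # With VN = 0 the paper's FPR expression is 0 whenever FP > 0.
    fpr = 100 * vn / (vn + fp) if (vn + fp) else 0.0
    p   = 100 * vp / (vp + fp)
    return e, te, tpr, fpr, p

if __name__ == "__main__":
    # Hypothetical counts with VN = 0, as in the evaluation described above.
    e, te, tpr, fpr, p = confusion_metrics(vp=92, vn=0, fp=37, fn=8)
    print(round(e), round(te), round(tpr), round(fpr), round(p))
```

Note that with VN fixed at zero, E and Te depend only on the word counts, which is why the paper proposes obtaining VN values in future tests to evaluate the metrics with greater precision.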
Table 2. Confusion matrix metrics obtained for the system performance evaluation.
Metrics    System    CamScanner    Office Lens    Google Lens
E          68 %      72 %          21 %           68 %
Te         33 %      28 %          96 %           32 %
TPR        92 %      93 %          4 %            93 %
P          71 %      75 %          46 %           72 %
CONCLUSIONS AND RECOMMENDATIONS
When performing the analysis of word recognition for each of the formats used as a sample and the four established image conditions, it is observed that OCR quality is determined by the characteristics of the image to be processed, from quality and lighting to the position that the text occupies within the format, or whether there are tables, lines and reduced spaces available for writing. Similarly, the legibility of the written calligraphy is a determining factor, since, if the letters are small or have reduced space between them, the word is not recognized, the characters are changed to similar symbols, or their digitization is omitted if no similarity is found.

On the other hand, the use of the Google Vision API allows the generation of OCR with a minimum level of device resources because it is a cloud computing service; however, there is the disadvantage of the difficulty of understanding more thoroughly the operation of the Machine Learning algorithms implemented in its development, because being a client limits the available functions and does not grant access to the code of the recognition algorithm.

Likewise, when making the comparison with some of the existing applications on the market, the developed system is considered a functional tool for OCR generation, with a 67% average recognition efficiency, falling 4% below the highest performing application in the tests.

Finally, by obtaining the confusion matrix metrics, with the purpose of a deeper analysis of the performance, it was observed that the system has an accuracy of 68%, an error rate of 33%, a sensitivity of 92% and a precision of 71%, confirming that, in accordance with the image conditions evaluated and the formats used, it is a system able to correctly recognize a considerable amount of handwritten words.

FUTURE WORKS
The system implementing the Google Vision API had comparable performance with the market applications selected for digitizing handwritten text. However, as future work to improve its operation, an image pre-processing module is proposed for the removal of line edges and to increase character recognition, as well as the implementation of OCR-based optimization algorithms, such as KNN (K-Nearest Neighbours), decision trees or artificial neural networks, in order to reduce the limitations identified as an API client. In addition, it is proposed to carry out tests that allow obtaining VN values and thus evaluating with greater precision the metrics corresponding to the system performance.

BIBLIOGRAPHY
[1]. Akhoondzadeh, M. (2022). Advances in Seismo-LAI anomalies detection within Google Earth Engine (GEE) cloud platform. Advances in Space Research, 69(12), 4351–4357. https://doi.org/10.1016/j.asr.2022.03.033
[2]. Antico, M., Balletti, N., Laudato, G., Lazich, A., Notarantonio, M., Oliveto, R., Ricciardi, S., Scalabrino, S., & Simeone, J. (2021). Postural control assessment via Microsoft Azure Kinect DK: An evaluation study. Computer Methods and Programs in Biomedicine, 209, 106324. https://doi.org/10.1016/j.cmpb.2021.106324
[3]. Bor, H. (2022). Absolute weighted arithmetic mean summability of factored infinite series and Fourier series. Bulletin Des Sciences Mathématiques, 176, 103116. https://doi.org/10.1016/j.bulsci.2022.103116
[5]. Carrascosa, M., & Bellalta, B. (2022). Cloud-gaming: Analysis of Google Stadia traffic. Computer Communications, 188,