Official Document Text Extraction using Templates and Optical Character Recognition

Florin Harbuzariu, Cosmin Irimia, Adrian Iftene

Faculty of Computer Science
“Alexandru Ioan-Cuza” University
700259, Iasi, Romania
{harbuzariualexandruflorin, irimia.cosmin, adiftene}@gmail.com

Abstract—Documents have been used throughout history, ever since civilized societies first appeared. Today they are part of our daily activities and have been shaped by technological leaps: from documents written on paper, we have switched to digital documents. One of the technological fields that deals with documents is Computer Vision, specifically OCR, or optical character recognition. OCR is the process in which an image containing text is converted into digital text format [1]. Because computers are used everywhere nowadays, systems have already been designed for working with documents. In many of these systems, however, there is still a need for manual work. This paper proposes a way in which OCR can be applied to official documents for the extraction of their text.

Index Terms—OCR, image processing, document processing

I. INTRODUCTION

The main problem addressed in this paper is extracting text from documents. From now on, whenever the word document is used, it refers to official documents such as identity cards, driving licenses, and others. These documents are scanned or photographed and stored as digital images. Generally, two solutions are used in practice: a manual approach or automatic systems using OCR. In the manual approach, a person receives the documents and types all textual information into the computer by hand. Before the technological advances in the research field of OCR, this was the preferred method for decades. Now that there is an alternative, however, certain problems with the traditional method must be addressed.

Among those problems is time. A lot of time is wasted on manually typing textual information, time that could be spent on other, more important activities that aren't tedious. Another point is the financial benefit that the system may bring. In the long run, it may save more money to design and integrate a system than to keep the traditional way. The hours that are freed up for more important tasks greatly influence the financial profits as well. Such statements may seem bold at first, but this paper substantiates them at the end, when the proposed system's performance is evaluated and compared with the manual solution. Based on these arguments, one can already see how an automated system would greatly benefit the company integrating it.

II. SIMILAR SOLUTIONS

This section presents a couple of existing solutions in the industry that deal with the problem of extracting text from official documents using OCR techniques. Considering that OCR as a technology has been around for some time, there are already a couple of well-developed solutions to this problem [2]. However, each system has a couple of disadvantages, which are presented below. Those disadvantages then serve as reference points for the development of our system.

A. OmniPage

OmniPage (https://www.kofax.com/products/omnipage) is software designed for OCR and text extraction for official documents. The program was created in the late 1980s and is currently being developed by the Kofax company [3]. Despite being old, the program is still getting updates and new versions to this day. OmniPage has a template editor that can be used to define new document templates from scratch for later text extraction. The user uploads a document into the editor and manually selects the fields of interest. However, OmniPage still has some disadvantages:
• OmniPage may need a large amount of computing power due to the complex operations it performs. OmniPage uses advanced machine-learning techniques to extract text [4].
• It may prove to be a challenge for casual users due to the large amount of configuration it offers.

B. IRISXtract

IRISXtract (https://irisdatacapture.com/software/irisxtract/) is software aimed at OCR and text extraction for official documents. The software was developed in 2013 by the company IRIS [5]. The program is getting updates even today. IRISXtract has a desktop version, and one major difference between this software and the other is that it has no support for official document templates. The software uses machine learning to automatically detect document fields. The user cannot define new document templates of their own. This may sound good at first because it cuts down on the amount of work the users need to do to extract text, but in the long run, it brings more problems. By not relying on predefined templates, the software relies more on its machine-learning techniques. This can lower the
accuracy of the software. Lower accuracy means the software is more prone to errors in the text it extracts. In addition, not having a template may increase the processing time.

III. PROPOSED SOLUTION

A. System Architecture

The solution presented is a lightweight system composed of two major components: the backend and the storage [10]. The backend is composed of two inner components: the template mechanism and the document text extraction mechanism. Those inner components work together to accomplish the system's goals. The template mechanism deals with the uploaded template documents. It preprocesses these templates before saving them locally. Those templates are later queried and checked by the component to find a match whenever a user wishes to extract text from an uploaded document. Finally, the document text extraction component deals with the task of extracting text from uploaded documents. The component searches for a match, and after it finds one, it applies OCR techniques to a list of coordinates defined by the template.

Fig. 1. The system's architecture

B. Data Structures

The structures of interest in this system are categories, templates, pages, and fields. All of those structures, except categories, are represented as tables in a relational database.

Templates. Any template contains a generated id and a name given by the user who creates it. Each template is made up of multiple pages.

Pages. A template is considered a list of pages because, in practice, documents may contain multiple pages as well. In addition to this, a page contains a generated id, a name given by the creator, and an image path that represents the local path to the uploaded file.

Fields. Each field is represented primarily by a rectangular area that shows the field's position in the uploaded template. In addition, each field has a boolean sensitive flag and a category property. The sensitive property is meant to warn the user whether the extracted text is sensitive or not. The category property tells what type of field it is. This is used when applying regexes for text extraction.

Categories. There is no list of predefined documents, due to the templatization mechanism. However, there is a list of predefined field categories. The currently allowed categories are name, address, id, id series, id number, nationality, issuer, date, driving license number, and driving license categories.

C. Template Mechanism

The focus of this paper is this mechanism. The template mechanism helps the system define specific templates for uploaded documents. Those templates are used later to recognize input documents from users. The mechanism consists of three main steps: defining the template, processing the template, and applying the template. The first step is implemented by the interface used by the entrypoints, and the system only receives the defined template through an API call. The last step is implemented by the OCR mechanism whenever it looks for a match for an uploaded document and also when it applies OCR to the defined template's fields. The second step is about processing the uploaded template. This is done because, if the uploaded template is left unaltered, certain problems will appear later in the OCR mechanism.

1) Processing Stage: The processing stage consists of two steps: detecting and removing any faces from the uploaded template and removing any relevant declared fields.

Fig. 2. The processing steps of the template mechanism

Removing means replacing that element with a blank white rectangle. This is done because faces and text fields may vary from document to document, and this greatly affects the matching algorithm. The final processed image is saved on the server. The faces are detected using OpenCV Haar Cascades (https://docs.opencv.org/3.4/db/d28/tutorial_cascade_classifier.html), particularly the frontal face Haar cascade. Haar cascades are algorithms that can detect objects of interest in images [6]. The advantage of this approach is that the algorithms offer decent performance for this specific context [7]. The Haar cascade is already trained and has its weights stored in a local XML file.

2) Field Categories: The field categories are meant to describe the selected fields for each template. Each category has a list of regexes that will be applied to extract the text of that field. The categories are meant to map regexes to fields.

D. OCR Mechanism

The mechanism consists of two main steps: processing the document and applying the regexes for text extraction. The processing has to be done because, if the document is left unaltered, the regexes cannot be properly applied. Each regex is mapped to a specific set of coordinates. This section presents more details about the processing stage.
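
As an illustration of how the structures from Section III-B and the category regexes from Section III-C could fit together, the following Python sketch models a field as a rectangle with a category and a sensitive flag, and applies the category's regexes to text that an OCR engine has already returned for that rectangle. The class names, the regex patterns, the file path, and the sample OCR strings are assumptions made for this example; they are not taken from the implemented system, and the OCR call itself is left out.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical regexes per field category; the paper does not list the real patterns.
CATEGORY_REGEXES: Dict[str, List[str]] = {
    "id series": [r"\b[A-Z]{2}\b"],
    "id number": [r"\b\d{6}\b"],
    "date": [r"\b\d{2}\.\d{2}\.\d{4}\b", r"\b\d{2}/\d{2}/\d{4}\b"],
}


@dataclass
class Field:
    rect: Tuple[int, int, int, int]  # x, y, width, height in template pixels
    category: str                    # one of the predefined field categories
    sensitive: bool = False          # warns the user about sensitive text


@dataclass
class Page:
    id: int
    name: str
    image_path: str                  # local path of the uploaded page image
    fields: List[Field] = field(default_factory=list)


@dataclass
class Template:
    id: int
    name: str
    pages: List[Page] = field(default_factory=list)


def extract_field_text(fld: Field, ocr_text: str) -> Optional[str]:
    """Return the first regex match for the field's category, or None."""
    for pattern in CATEGORY_REGEXES.get(fld.category, []):
        match = re.search(pattern, ocr_text)
        if match:
            return match.group(0)
    return None


if __name__ == "__main__":
    series = Field(rect=(40, 120, 60, 30), category="id series", sensitive=True)
    issued = Field(rect=(300, 400, 160, 30), category="date")
    page = Page(id=1, name="front", image_path="templates/id_front.png",
                fields=[series, issued])
    template = Template(id=1, name="identity card", pages=[page])

    # Stand-in for the raw text an OCR engine might return for each rectangle.
    raw_ocr = ["SERIA XY nr", "Valid: 03.08.2021-03.08.2031"]
    for fld, text in zip(template.pages[0].fields, raw_ocr):
        print(fld.category, "->", extract_field_text(fld, text))
```

Returning None when no regex matches gives the caller a clear signal that the field's region needs another attempt, which is the situation in which the enhancement methods of Section III-E are applied.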
The processing stage consists of two steps: aligning the document and finding a match. Each template from the list of defined templates is compared with the uploaded document to find a match. The comparison is done by using algorithms that compute a similarity score. Structural Similarity Index Measure, or SSIM for short, and Relative Average Spectral Error, or RASE, have been used to compute the similarity score. The alignment of the document is made in three steps:

1) Obtaining an array of contours: This is done by using the method findContours from OpenCV (https://opencv.org/). A variable threshold is used to obtain the array of contours. Therefore, these steps need to be repeated for each threshold value in a specific interval. Before applying the method findContours, the image needs to be altered. First, it is converted to grayscale using the method cvtColor with the parameter COLOR_BGR2GRAY. The resulting image is then changed using the method GaussianBlur. After that, it is changed again using the method Canny. This method is where the selected threshold value is relevant. Finally, the image is changed using the method erode. From that image, the contours are extracted using findContours.

Fig. 3. The steps in finding the contour areas of interest

2) Selecting the contour with the largest area: This is done because it is presumed that if a user sent an image with a document, then that document is the object of interest in the picture. Because of this, the object of interest will have the largest contour area. The area of the contour is calculated using the method contourArea from OpenCV. Only the contour areas that contain four corners are considered possible candidates because most documents contain four corners.

3) Warping the selected contour so that it's aligned with the compared template: The selected contour is warped using the method warpPerspective from OpenCV. It is warped instantly by using perspective transformation matrices. The matrix is obtained using getPerspectiveTransform from OpenCV. After warping it, RASE is used to calculate the similarity score and see if it's the maximum score.

Fig. 4. The steps in aligning the document by warping

E. Enhancement Methods

Whenever a valid text cannot be extracted using a regex, different enhancement techniques are applied to help the OCR engine extract the text. The techniques used are:
• brightness enhancement - The overall lightness of the image is increased.
• sharpness enhancement - Sharpness is the amount of detail in an image. It reflects the edges between zones of different colors [8].
• contrast enhancement - The visibility of elements is improved by changing their relative brightness and darkness.
• binarization - The pixels in the image are mapped into two sets, white and black. By doing this, the image is divided into foreground text and background [9].
• noise removal - This process removes or reduces the noise in the image.

Those techniques do not guarantee that the text is extracted, but they increase the rate of success.

IV. EVALUATION

The original system took around sixty seconds to extract text from a document. Such performance was undesirable considering the context in which the system was used. The system went through multiple iterations before the template mechanism was implemented in its current state. Those iterations improved the performance of the OCR mechanism.

TABLE I
PERFORMANCE OF EACH ITERATION OF THE SYSTEM

Iteration     Improvement                     Total Time (s)
Iteration 1   Original Approach               66.88
Iteration 2   800×600 Image Resize            38.07
Iteration 3   200×100 Image Resize            30.30
Iteration 4   TesseractAPI Transition         7.12
Iteration 5   Common Words List               5.92
Iteration 6   Priority Enhancement List       5.19
Iteration 7   Removed Useless Enhancements    3.45

The performance brought by these changes is described in more detail in the paper that presents the original system, composed of only the OCR mechanism [11]. Those changes were made before adding the template mechanism. In the original system, the templates were simply defined as classes. The template mechanism changed that approach, so the old components need to be tested again. To test the system's performance, twenty documents were used to calculate the average time it takes for their templatization and for their text to be extracted. After testing, the templatization takes an average of 0.22 seconds and the OCR mechanism takes an average of 3.03 seconds. After testing the performance, it is time to test the
accuracy as well. We test the accuracy for a set of twenty documents with various image resolutions.

TABLE II
THE ACCURACY OF THE SYSTEM FOR DIFFERENT IMAGE WIDTHS

Width (pixels)   Accuracy (%)
Original Size    90.48
1,800            89.52
800              83.81
200              80.90

The previous table shows different accuracy percentages for various widths. The height of the image is changed accordingly to keep the aspect ratio. We can see that for the width of 200 pixels, we have an accuracy of around 80%, which is good enough for the amount of performance this change of resolution brings. Thus, we can say the proposed system is not only fast but also accurate enough. One last question regarding performance is whether or not it justifies replacing a decades-old solution with this system. The next table justifies the benefits of replacing the classic manual solution with an automated one. Twenty documents were used as testing material and five persons helped in calculating the average time for the manual approach. To simulate the tiredness one gets from a day at work doing the same task over and over again, three periods in the day were chosen: morning, noon, and evening.

TABLE III
MANUAL VS. AUTOMATED SOLUTION (TIME)

Time of Day   Manual (s)   Automated (s)
Morning       46.3         3.01
Noon          51.4         3.30
Evening       57.4         3.20

As the previous table shows, the automated solution greatly beats the manual solution when it comes to the average time it takes to extract textual information from documents. It can be seen that the more tired a person becomes, the higher the average time. The automated approach time, on the other hand, stays roughly constant since a machine cannot get tired. Table IV shows that the accuracy follows a similar path. The more tired a person gets, the more the accuracy drops. Although at the start the manual approach's accuracy is higher than the automated one's, it is affected later and drops in quality. The quality of the automated approach stays roughly constant.

TABLE IV
MANUAL VS. AUTOMATED SOLUTION (ACCURACY)

Time of Day   Manual (%)   Automated (%)
Morning       91.15        81.12
Noon          86.77        80.62
Evening       79.59        81.13

One problem regarding the drop in accuracy when it comes to the manual approach is that the quality of the offered service also drops. This can affect the financial benefits of the offered service in the long run due to the drop in quality. Even though the maximum accuracy of the manual approach is higher than the maximum accuracy of the automated solution, the latter has the benefit of keeping the accuracy constant. Thus, it can be said that the automated solution may financially benefit the company that decides to implement it. All of the documents used in the evaluation of the proposed system are Romanian documents. The documents had dimensions of around 900 and 3,000 pixels and were around 1 MB and 5 MB in size.

V. CONCLUSION

In conclusion, the system is capable of defining templates for specific official documents thanks to the template mechanism, the focus of this paper. Later, those templates are used to efficiently recognize input official documents. Finally, the text is extracted from the identified documents according to the OCR mechanism and the template definition that was previously created. As such, the goals that were detailed at the beginning of this paper were achieved, thanks to the implemented system.

ACKNOWLEDGEMENTS

Data processing and analysis in this paper were supported by the Research center with integrated techniques for the investigation of atmospheric aerosols in Romania, under project SMIS 127324 - RECENT AIR (RA).

REFERENCES

[1] J. Memon, M. Sami, and R. A. Khan, "Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)," arXiv preprint arXiv:2001.00139 [cs.CV], Jan. 2020.
[2] M. A. Awel and A. I. Abidi, "Review on optical character recognition," International Research Journal of Engineering and Technology (IRJET), vol. 6, issue 6, pp. 3666–3669, 2019.
[3] P. Bernzott, J. Dilworth, D. George, B. Higgins, J. Knight, et al., "Optical character recognition method and apparatus," U.S. Patent US5278920A, issued Jul. 15, 1992.
[4] D. Marcondes, A. Simonis, and J. Barrera, "The role of prior information and computational power in Machine Learning," arXiv preprint arXiv:2211.01972 [cs.LG], Oct. 2022.
[5] H. Schild and A. Jantzen, "IRISXtract for Documents Version 4.1 Installation Step-by-Step Tutorial," Version 1.3, February 3, 2017.
[6] A. Priadana and M. Habibi, "Face Detection using Haar Cascades to Filter Selfie Face Image on Instagram," International Conference of Artificial Intelligence and Information Technology (ICAIIT), Yogyakarta, Indonesia, pp. 6-9, doi: 10.1109/ICAIIT.2019.8834526, Mar. 2019.
[7] A. Schmidt and A. Kasiński, "The Performance of the Haar Cascade Classifiers Applied to the Face and Eyes Detection," doi: 10.1007/978-3-540-75175-5_101, Oct. 2007.
[8] J. Caviedes and S. Gurbuz, "No-reference sharpness metric based on local edge kurtosis," in Proceedings. International Conference on Image Processing, vol. 3, pp. III-III, IEEE, Sep. 2002.
[9] S. Uchida, "Image processing and recognition for biological images," Development, Growth & Differentiation, vol. 55, no. 4, pp. 523–49, 2013.
[10] M. Baboi, A. Iftene, and D. Gîfu, "Dynamic Microservices to Create Scalable and Fault Tolerance Architecture," in 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Procedia Computer Science, vol. 159, pp. 1035–1044, 2019.
[11] C. Irimia, F. Harbuzariu, I. Hazi, and A. Iftene, "Official Document Identification and Data Extraction using Templates and OCR," in Proceedings of 26th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, 7-9 September 2022, Verona, Italy, Procedia Computer Science, vol. 207, pp. 1571–1580, 2022.
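
As a worked check of the figures reported in Table III and Table IV, the short Python sketch below computes the speedup of the automated solution for each time of day and the spread (maximum minus minimum) of each accuracy column. The numbers are copied from the two tables; the variable and function names are only illustrative and do not come from the evaluated system.

```python
from typing import Dict, Iterable, Tuple

# (manual, automated) average extraction time in seconds, from Table III.
TIMES: Dict[str, Tuple[float, float]] = {
    "Morning": (46.3, 3.01),
    "Noon": (51.4, 3.30),
    "Evening": (57.4, 3.20),
}

# (manual, automated) accuracy in percent, from Table IV.
ACCURACY: Dict[str, Tuple[float, float]] = {
    "Morning": (91.15, 81.12),
    "Noon": (86.77, 80.62),
    "Evening": (79.59, 81.13),
}


def speedups(times: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Manual time divided by automated time for each period."""
    return {period: manual / auto for period, (manual, auto) in times.items()}


def spread(values: Iterable[float]) -> float:
    """Difference between the largest and the smallest value."""
    values = list(values)
    return max(values) - min(values)


if __name__ == "__main__":
    for period, factor in speedups(TIMES).items():
        print(f"{period}: automated is {factor:.1f}x faster")
    manual_acc = [m for m, _ in ACCURACY.values()]
    auto_acc = [a for _, a in ACCURACY.values()]
    print(f"manual accuracy spread: {spread(manual_acc):.2f} points")
    print(f"automated accuracy spread: {spread(auto_acc):.2f} points")
```

With these table values, the automated solution comes out roughly 15 to 18 times faster than the manual one, and its accuracy varies by about half a percentage point over the day, while the manual accuracy falls by more than eleven points between morning and evening.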

Authorized licensed use limited to: Universitas Indonesia. Downloaded on November 20,2024 at 08:48:57 UTC from IEEE Xplore. Restrictions apply.
