
Application of Image Recognition and Machine Learning Technologies for Payment Data Processing
Review and Challenges

Artjoms Suponenkovs, Aleksandrs Sisojevs
Department of Image Processing and Computer Graphics, Riga Technical University, Riga, Latvia
[email protected]

Guntis Mosāns, Jānis Kampars, Krišjānis Pinka, Jānis Grabis
Department of Management Information Technology, Riga Technical University, Riga, Latvia
[email protected]

Audris Locmelis, Romāns Taranovs
Accenture, Riga, Latvia
[email protected], [email protected]

Abstract - The automatic receipt analysis problem is highly relevant due to the high cost of manual document processing. The presented paper therefore investigates the problems of receipt image analysis. It describes approaches for receipt image pre-processing, receipt text detection, receipt text recognition and receipt text analysis. These approaches make a receipt analysis system adaptable to a real-life environment and convert the input information into a format usable for analysing the information in the receipts. A pipeline for payment data processing, starting with image capture and ending with payment data posting, is defined, and appropriate technologies for every stage of the process are proposed. Advantages and limitations of these technologies are reviewed and open research challenges are identified. Payment data processing is analysed as an enabler of the digital transformation of expense reporting processes.

Keywords - computer vision, image pre-processing, receipt analysis, text recognition, natural language processing, document classification.

I. INTRODUCTION

The receipt analysis problem is highly relevant due to the high cost of manual document processing. To address this problem, a number of processing stages have to be considered. There are four important stages: receipt image pre-processing, detection of the image area containing receipt text, receipt text recognition and receipt text analysis.

Receipt image pre-processing is a necessary stage: it decreases the influence of the real-life environment and of noise on the input information. At this stage it is possible to solve brightness and contrast problems. The image pre-processing methods perform contrast stretching and brightness level measurement. Brightness, contrast, noise and colour correction work more precisely if additional information about the receipt type is available. This information is useful for noise reduction and for receipt text normalization, which helps when the arrangement of the receipt text is not quite straight. As a result, image pre-processing methods reduce unnecessary information, making the receipt analysis procedure faster, more precise and simpler.

After receipt image pre-processing, it is possible to detect the area containing receipt text. The receipt text detection procedure tries to find text margins. Receipt text detection is a very complicated task: text styles, text sizes and texts in different languages can cause many problems. To solve this problem, it is necessary to find unique features of text margins, and receipt text features have to be analysed. Different types of texture analysis can be used: spatial frequency analysis, structural element analysis, statistical characteristics analysis, etc. Spatial frequency analysis helps detect high frequencies, and the presence of high frequencies is a very useful text feature. Structural element analysis may be useful for bounding box detection. Statistical characteristics analysis provides additional information about the receipt texture. There are many statistical characteristics: contrast, dissimilarity, homogeneity, energy, entropy, etc. For their calculation, special matrices can be used. After detecting the unique features of text margins, the receipt image can be analysed to find separate text margins. As a result, receipt text detection methods reduce unnecessary information, so the further analysis procedure becomes faster, more precise and simpler.

After receipt text detection, we have separate text margins - bounding boxes. Accordingly, each text margin (bounding box) can be analysed separately. If needed, image pre-processing can be applied to each bounding box, which is useful for bounding boxes with different text sizes and styles. It normalizes the textual information and prepares it for further analysis.

The next task is receipt text recognition: recognizing all characters in a bounding box. For this purpose, a specially trained artificial neural network may be built. Given the input information, the artificial neural network can detect characters. Unfortunately, artificial neural networks usually require a large amount of training data, which is not always possible to collect. In this case, other text recognition technologies can be used. Text recognition

Accenture is the sponsor of this research.

978-1-5090-1201-5/15/$31.00 ©2017 European Union


yields separate characters or the semantic information held in the receipt.

The next task is receipt text analysis: understanding the sequence of characters and finding the important fields of the receipt. For this purpose, semantic analysis can be used. The unit of semantic information is a word, so it is very important to detect receipt keywords. Using these keywords, it is possible to classify receipts and find informative fields. Another approach uses an artificial neural network to analyse the receipt text. As a result of receipt text analysis, we obtain the necessary receipt data, which can then support receipt data analysis, automatic labelling, entity extraction, receipt analysis system correction, etc.

The presented paper investigates the above-mentioned challenges.

II. PROBLEM STATEMENT

In this work, several approaches and methods are described which are used for receipt image pre-processing, receipt text detection, receipt text recognition and receipt text analysis. These approaches and methods make it possible to perform the following tasks:

• noise reduction and contrast stretching;
• brightness adaptation and colour correction;
• binarization;
• detection of unique textual information features;
• receipt text detection;
• text image rotation;
• receipt text recognition;
• receipt text analysis.

III. MATERIALS AND METHODS

The proposed approach consists of 7 different parts - 4 basic parts and 3 additional parts (Fig. 1).

Fig. 1. Proposed approach flowchart

A. Receipt Image Pre-Processing

Usually, a receipt image contains some noise, which causes problems for receipt text analysis. For that reason it may be worth using Gaussian filtering or Perona-Malik filtering to reduce noise. There is a small difference between these two filtering methods: the first one (Gaussian filtering) smoothes all edges, while the second one smoothes only weak edges. When the input receipt image contains many objects, receipt edge information can be useful, so we can preserve it by using Perona-Malik filtering [1,2].

Usually, a receipt image also has a low brightness level, and some characters are not visible. Therefore, it is important to use histogram equalization, which performs contrast stretching and brightness level measurement [3,4].

B. Binarization

The input colour image of a receipt carries a lot of unnecessary information. It contains four colour channels (red, green, blue and alpha), so each image pixel consists of four bytes. This means that each pixel has 2^32 (4 294 967 296) possible values. The binarization methods reduce the amount of unnecessary information: each pixel has only 2 possible values:

"1" - text pixel;
"0" - receipt pixel.

A popular binarization method is Otsu thresholding [5,6]. This method converts a greyscale image into a monochrome one. The Otsu method finds the binarization threshold that maximizes the inter-class (between-class) variance, dividing the receipt image into two pixel regions (two classes). Otsu thresholding provides good results in many cases, but problems occur when the brightness level becomes heterogeneous. To meet these challenges, it is possible to use two local binarization methods.

1) Local thresholding method using moving averages

This method [7] calculates the mean intensity at each step as the image is scanned pixel by pixel. The mean intensity is calculated as shown in (1):

m(k) = (1/n) * Σ z_i,  i = k+1-n, ..., k    (1)

where k - pixel index (or step number);
z - pixel intensity;
n - the number of points used in calculating the mean intensity.

In the next step, the binarization threshold is calculated as follows (2):

T_k = b * m(k)    (2)

where b - a constant;
m - the mean intensity from (1).

As a result, a threshold is calculated for each pixel.

2) Local thresholding method using region brightness level

At the beginning, the proposed method calculates region brightness levels (Fig. 2).

Fig. 2. Average brightness level of region

The average brightness level of a region is calculated as follows (3):

V(xr, yr) = ( Σ Σ V(x, y) ) / n²,  x = x0, ..., x0+n-1;  y = y0, ..., y0+n-1    (3)

where V - pixel intensity;
x and y - pixel coordinates;
xr and yr - region coordinates;
n - region size;
x0 and y0 - coordinates of the top-left pixel of the region.

In the next step, the binarization threshold is calculated as follows (4):

T_xy = b * V(x/n, y/n)    (4)

where b - a constant;
V - the mean region intensity from (3).

These two binarization methods give good results, but it is possible to improve them further using colour corrections.

3) Colour corrections

This method uses additional information about the receipt colours. It is possible to calculate an additional pixel parameter - saturation. For this purpose, the Hue Saturation Value (HSV) colour model can be used [8]. In this case, the value of a monochrome image pixel depends on two parameters, saturation and value, which are calculated as follows (5):

V = max(R, G, B)
S = 255 * (V - min(R, G, B)) / V    (5)

where B - blue, R - red and G - green are the colour coordinates of image pixels (range from 0 to 255);
S - saturation (range from 0 to 255);
V - value (range from 0 to 255);
note: if V = 0 then S = 0.

It is accepted that a receipt has only two colours, black and white. In this case, it is possible to define the black and white colour ranges as follows:

• the white colour saturation range is from 0 to 60;
• the white colour value range is from the mean intensity (1), (3) to 255;
• the black colour value range is from 0 to the mean intensity;
• the black colour saturation range is from 0 to the maximum saturation (6).

Max_Saturation = 255 - V/3    (6)

where V - the mean intensity from (1) or (3).

The colour correction method improves the binarization result.

C. Morphological Image Processing

The morphological methods [9] use the inverted monochrome image. After binarization, character contours may have gaps. To meet this challenge, it is possible to use morphological dilation (Fig. 3).

Fig. 3. Morphological dilation

The morphological dilation method is very important: it brings text characters together into a set of pixel regions, which is necessary in order to perform receipt text detection.
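The moving-average local thresholding of Section B, equations (1)-(2), can be sketched in numpy as follows. This is only an illustrative sketch, not the authors' implementation: the boustrophedon (zigzag) scan order, the function name and the default values of n and b are assumptions.

```python
import numpy as np

def moving_average_threshold(gray, n=16, b=0.8):
    """Local thresholding with moving averages, after (1)-(2):
    m(k) is the mean of the previous n pixel intensities along the
    scan path, and T_k = b * m(k).  A sketch; the zigzag scan and
    the defaults for n and b are illustrative choices."""
    h, w = gray.shape
    # Scan rows boustrophedon-style (every other row reversed) so the
    # running mean follows one continuous path over the image.
    zigzag = gray.astype(np.float64).copy()
    zigzag[1::2] = zigzag[1::2, ::-1]
    z = zigzag.ravel()
    # Eq. (1): running mean of the last n samples (partial at the start).
    m = np.convolve(z, np.ones(n) / n, mode="full")[: z.size]
    t = b * m                              # Eq. (2): per-pixel threshold
    binary = (z <= t).astype(np.uint8)     # "1" = dark (text) pixel
    binary = binary.reshape(h, w)
    binary[1::2] = binary[1::2, ::-1]      # undo the zigzag reordering
    return binary
```

Dark text on a bright background then maps to "1", matching the "1 - text pixel" convention above, while the per-pixel threshold adapts to a heterogeneous brightness level.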
D. Receipt Text Detection

After morphological dilation, there are connected pixel regions. Therefore, it is possible to perform segmentation and obtain separate segments. These segments should contain textual information, but in some cases they may contain noise. For that reason, it is important to check the contents of the segments. The content verification can be performed by texture analysis. The main goal of textural analysis is to detect unique textual information features. There are three important types of textural analysis: spatial frequency analysis, statistical characteristics analysis and structural element analysis. Statistical characteristics analysis provides additional information about the receipt texture. There are many statistical characteristics: contrast, dissimilarity, homogeneity, energy, entropy and others. For their calculation, the Gray Level Co-Occurrence Matrix (GLCM) and the Gray Level Size Zone Matrix (GLSZM) can be used [10, 11, 12]. After receipt text detection, we have separate text margins - bounding boxes.

E. Text Image Normalization

Receipt text normalization can be useful when the arrangement of the receipt text is not quite straight. In this case, it is possible to rotate a bounding box using a rotation matrix [13]. The new bounding box coordinates are calculated as follows (7):

[x, y] * | cos(θ)   sin(θ) |  =  [x*cos(θ) - y*sin(θ),  x*sin(θ) + y*cos(θ)]  =  [xnew, ynew]    (7)
         | -sin(θ)  cos(θ) |

where x and y - old pixel coordinates;
xnew and ynew - new pixel coordinates;
θ - the rotation angle, calculated as follows (8):

θ = arctan( -(Y1 - Y2) / (X1 - X2) )    (8)

where X1, Y1 and X2, Y2 are the coordinates of point1 and point2 (Fig. 4).

Fig. 4. Text image normalization

F. Receipt Text Recognition

The next task is receipt text recognition: recognizing all characters in a bounding box. For this purpose, it is possible to use an artificial neural network (ANN) or an optical character recognition (OCR) engine [14, 15]. Several OCR engine options may be considered for receipt text recognition: Cuneiform, Tesseract and FineReader Engine. The second and the third engine have good performance; unfortunately, the third engine is not free. Therefore, the Tesseract engine was used in the proposed approach.

G. Receipt Text Analysis

In the end, it is possible to process the sequence of characters and find the important information. A receipt can contain a lot of important fields: purchase price, addresses, time stamps, claimed amounts, expense category, etc. Depending on the assigned tasks, it is possible to teach the computer system to recognize the proper receipt fields. For this purpose, we can build a machine learning engine using the TensorFlow library.

IV. EXPERIMENTAL RESULTS

There are two experiments: binarization and receipt texture analysis.

A. Binarization

The point of this experiment is to compare 3 binarization methods:

• Otsu thresholding;
• local thresholding method using moving averages;
• local thresholding method using region brightness level.

The methods were evaluated by the following criteria:

• number of images with very bad receipt content visibility (VBV) - more than 30 percent of receipt characters are invisible;
• number of images with bad receipt content visibility (BV) - more than 10 percent of receipt characters are invisible;
• number of images with middle receipt content visibility (MV) - receipt text is visible, but it is not possible to clearly see all characters;
• number of images with good receipt content visibility (GV).

In this experiment, 68 receipt images were processed, providing the estimation results shown in Fig. 5 and Fig. 6. Fig. 5 shows the estimation results of Otsu binarization: there are many images with bad (BV) and very bad (VBV) visibility.
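The bounding-box rotation of Section E, equations (7) and (8), can be sketched in numpy as follows. The function name and the two baseline points are illustrative assumptions; the matrix and angle follow (7)-(8) directly.

```python
import numpy as np

def rotate_box(points, p1, p2):
    """Rotate bounding-box coordinates with the matrix from (7), using
    the angle from (8), so the baseline through p1 and p2 becomes
    horizontal.  A sketch under the stated assumptions."""
    (x1, y1), (x2, y2) = p1, p2
    theta = np.arctan(-(y1 - y2) / (x1 - x2))         # Eq. (8)
    rot = np.array([[np.cos(theta),  np.sin(theta)],  # Eq. (7)
                    [-np.sin(theta), np.cos(theta)]])
    # Row vectors [x, y] multiplied by the matrix, as written in (7).
    return np.asarray(points, dtype=float) @ rot
```

For a text line tilted along the baseline from (0, 0) to (10, 10), the rotated endpoint lands on the x-axis, i.e. its new y-coordinate is 0.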


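The GLCM-based statistical characteristics used in the texture analysis (contrast, dissimilarity, homogeneity) can be sketched in numpy as follows. The quantization to four grey levels and the horizontal pixel offset are illustrative assumptions, not the feature set of the experiment.

```python
import numpy as np

def glcm(gray, levels=4):
    """Horizontal Gray Level Co-Occurrence Matrix: counts how often
    grey level i appears immediately to the left of grey level j."""
    q = (gray.astype(np.int64) * levels) // 256          # quantize intensities
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    return m / m.sum()                                   # co-occurrence probabilities

def glcm_features(p):
    """Contrast, dissimilarity and homogeneity of a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    dissimilarity = np.sum(p * np.abs(i - j))
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    return contrast, dissimilarity, homogeneity
```

On a flat background patch the contrast is 0 and the homogeneity is 1, while a striped text-like patch has high contrast and dissimilarity and low homogeneity, which is consistent with the feature ranking reported in Table 1.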
Fig. 5. Otsu binarization estimation

The binarization results of the local thresholding methods are much better than the results of the first method (Fig. 6). Both local thresholding methods give relatively good results. Therefore, these methods can be useful for receipt text binarization.

Fig. 6. Local thresholding estimation

B. Receipt Texture Analysis

The point of this experiment is to find unique textual information features. For this purpose, statistical characteristics analysis can be used. Statistical textural features can be calculated from two gray level matrices: sixty-four statistical features were used in this experiment, forty-eight calculated from the GLCM and sixteen from the GLSZM. The experiment compares the features of the texture of textual information with the features of the texture of the background, where the background is unimportant information (paper, table, etc.). The experiment consists of the following steps:

• selection of pixels of textual information and background;
• calculation of the 64 features of textual information and background;
• comparison of the 64 features of textual information and background;
• recording of the comparison results.

The above-mentioned actions were performed 297 times, providing the important features shown in Table 1. These features allow distinguishing textual information from the background. It is not difficult to see that the strongest textual information features are dissimilarity and contrast, while the background features are large zone low gray level emphasis (LZLGE), homogeneity, etc. (Table 1). Some features are easy to understand, like dissimilarity and contrast, but some features that are invisible and imperceptible to the naked eye are hard to understand, like LZLGE and the barycenter on sizes. Therefore, statistical characteristics analysis can be helpful for finding textual information features and for receipt text detection.

Table 1 also shows the feature confidence levels. The confidence level defines the importance of a feature: a higher confidence level means a more important feature. The confidence range is from 0 to 1.

TABLE I. FEATURES COMPARISON BETWEEN TEXTUAL INFORMATION AND BACKGROUND

Feature number/s                                            | Feature                                 | Feature is greater for | Confidence
55 - LZLGE (GLSZM)                                          | Large Zone Low Gray Level Emphasis      | Background             | 0.987
61 - BARYS (GLSZM)                                          | The barycenter on sizes                 | Background             | 0.980
14 - GLCM (Vertical)                                        | Dissimilarity                           | Text                   | 0.980
59 - ZPC (GLSZM)                                            | Zone Percentage (equality of iso-sizes) | Text                   | 0.980
58 - SZNU (GLSZM)                                           | Size Zone Non-Uniformity                | Text                   | 0.976
63 - VARS (GLSZM)                                           | The variance on sizes                   | Background             | 0.973
13 - GLCM (Vertical)                                        | Contrast                                | Text                   | 0.973
2, 26, 38 - GLCM (Horizontal, Diagonal 46°, Diagonal 136°)  | Dissimilarity                           | Text                   | 0.970
15 - GLCM (Vertical)                                        | Homogeneity                             | Background             | 0.963

V. CONCLUSIONS

The results of this research show that there are many opportunities to improve the quality of automatic receipt analysis. These improvements are possible thanks to the above-mentioned methods and approaches for receipt image pre-processing, receipt text detection, receipt text recognition and receipt text analysis.

The results of the first experiment show that the local thresholding methods give relatively good results. These binarization methods perform brightness adaptation, which is very important for character recognition. Further research shows that it is possible to improve the binarization results using colour correction and morphological dilation.

The results of the second experiment show that it is possible to distinguish textual information from the background using statistical characteristics. Therefore, statistical characteristics analysis can be helpful for finding textual information features that are invisible to the naked eye, and for receipt text detection.

The investigation presented in this paper shows that new information technologies, like Tesseract and TensorFlow, can be helpful for receipt analysis. These technologies can accelerate the development of an automatic receipt analysis system. The methods and algorithms proposed in this work make a receipt analysis system adaptable to a real-life environment and help to save the customer's time.

REFERENCES

[1] P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion", IEEE Trans. Pattern Anal. Mach. Intell., 12:629-639, 1990.
[2] J. Weickert, "Anisotropic Diffusion in Image Processing", Teubner, Stuttgart, 1998.
[3] Yeong-Taeg Kim, "Contrast enhancement using brightness preserving bi-histogram equalization", IEEE Transactions on Consumer Electronics, 1997.
[4] J. Alex Stark, "Adaptive Image Contrast Enhancement Using Generalizations of Histogram Equalization", IEEE Transactions on Image Processing, 2000.
[5] N. Bhargava, A. Kumawat, R. Bhargava, "Threshold and binarization for document image analysis using Otsu's algorithm", International Journal of Computer Trends and Technology (IJCTT), vol. 17, no. 5, Nov 2014.
[6] Linda G. Shapiro, George C. Stockman, "Computer Vision", Prentice Hall, 2006.
[7] R. C. Gonzalez, R. E. Woods, "Digital Image Processing", Third Edition, Pearson Education, Inc., 2008.
[8] Noor A. Ibraheem, Mokhtar M. Hasan, Rafiqul Z. Khan, Pramod K. Mishra, "Understanding Color Models: A Review", ARPN Journal of Science and Technology, 2011-2012.
[9] P. Joshi, D. M. Escrivá, V. Godoy, "OpenCV By Example", Packt Publishing, 2016.
[10] F. Albregtsen, "Statistical Texture Measures Computed from Gray Level Coocurrence Matrices", Oslo, 2008.
[11] A. Suresh, K. L. Shunmuganathan, "Image Texture Classification using Gray Level Co-Occurrence Matrix Based Statistical Features", European Journal of Scientific Research, 2012.
[12] G. Thibault, J. Angulo, and F. Meyer, "Advanced statistical matrices for texture characterization: Application to DNA chromatin and microtubule network classification", IEEE International Conference on Image Processing (ICIP), 2011.
[13] D. F. Rogers and J. A. Adams, "Mathematical Elements for Computer Graphics", 2001.
[14] S. F. Rashid, "Optical Character Recognition - A Combined ANN/HMM Approach", Technical University of Kaiserslautern, 2014.
[15] C. Misra, P. K. Swain, J. K. Mantri, "Text Extraction and Recognition from Image using Neural Network", International Journal of Computer Applications, 2012.
