Application of Image Recognition and Machine Learning Technologies For Payment Data Processing
Abstract— The automatic receipt analysis problem is highly relevant due to the high cost of manual document processing. Therefore, the presented paper investigates the problems of receipt image analysis. It describes approaches for receipt image pre-processing, receipt text detection, receipt text recognition and receipt text analysis. These approaches make the receipt analysis system adaptable to a real-life environment and convert the input information into a format usable for analysing the information in the receipts. A pipeline for payment data processing, starting with image capture and ending with payment data posting, is defined, and appropriate technologies for every stage of the process are proposed. Advantages and limitations of these technologies are reviewed and open research challenges are identified. Payment data processing is analyzed as an enabler of the digital transformation of expense reporting processes.

Keywords — computer vision, image pre-processing, receipt analysis, text recognition, natural language processing, document classification.

I. INTRODUCTION

The receipt analysis problem is highly relevant due to the high cost of manual document processing. To address this problem, a number of processing stages have to be considered. There are four important stages: receipt image pre-processing, detection of the image area containing receipt text, receipt text recognition and receipt text analysis.

Receipt image pre-processing is a necessary stage. It decreases the influence of the real-life environment and noise on the input information. At this stage, it is possible to solve the brightness level and contrast problems. The image pre-processing methods perform contrast stretching and brightness level measurement. Image brightness, contrast, noise and colour correction work more precisely if additional information about the receipt type is available. This information about the receipt is useful for noise reduction and receipt text normalization. Receipt text normalization can be useful when the arrangement of the receipt text is not quite straight. As a result, image pre-processing methods reduce unnecessary information. Therefore, the receipt analysis procedure becomes faster, more precise and simpler.

After receipt image pre-processing, it is possible to detect the area containing receipt text. The receipt text detection procedure tries to find text margins. Receipt text detection is a very complicated task: text styles, text sizes and texts in different languages can cause many problems. To solve this problem, it is necessary to find unique features of text margins, and the receipt text features have to be analyzed. Different types of texture analysis can be used: spatial frequency analysis, structural element analysis, statistical characteristics analysis, etc. Spatial frequency analysis is helpful for detecting high frequencies, and high-frequency content is a very useful text feature. Structural element analysis may be useful for bounding box detection. Statistical characteristics analysis provides additional information about the receipt texture. There are many statistical characteristics: contrast, dissimilarity, homogeneity, energy, entropy, etc. For the calculation of statistical characteristics, a special matrix can be used. After the detection of unique features of text margins, we can analyse a receipt image and find separate text margins. As a result, receipt text detection methods reduce unnecessary information. Therefore, the further analysis procedure becomes faster, more precise and simpler.

After receipt text detection, we have separate text margins - bounding boxes. Accordingly, we can analyse each text margin (bounding box) separately. If the need arises, it is possible to apply image pre-processing to each bounding box. This can be useful for bounding boxes with different text sizes and styles. It allows the textual information to be normalized and prepared for further analysis.

The next task is receipt text recognition. This task is to recognize all characters in a bounding box. For this purpose, a specially trained artificial neural network may be built. Given the input information, the artificial neural network can detect characters. Unfortunately, artificial neural networks usually require a large amount of training data, and it is not always possible to collect such training data. In this case, it is possible to use other text recognition technologies.
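The statistical characteristics listed above are commonly derived from a co-occurrence-type matrix. The following sketch is only a minimal illustration of that idea: it assumes a gray-level co-occurrence matrix computed on a 0-255 greyscale patch, 32 quantisation levels and a horizontal pixel offset, none of which are prescribed by the paper.

```python
import numpy as np

def glcm(patch, levels=32, dx=1, dy=0):
    """Gray-level co-occurrence matrix of a greyscale patch (values 0..255),
    quantised to `levels` gray levels, for the pixel offset (dy, dx)."""
    q = (patch.astype(np.uint32) * levels) // 256            # quantise to [0, levels)
    h, w = q.shape
    a = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)                 # count co-occurrences
    return m / m.sum()                                        # normalise to probabilities

def texture_features(patch):
    """Statistical characteristics of a patch: contrast, dissimilarity,
    homogeneity, energy and entropy computed from the co-occurrence matrix."""
    p = glcm(patch)
    i, j = np.indices(p.shape)
    eps = 1e-12
    return {
        "contrast":      float(np.sum(p * (i - j) ** 2)),
        "dissimilarity": float(np.sum(p * np.abs(i - j))),
        "homogeneity":   float(np.sum(p / (1.0 + (i - j) ** 2))),
        "energy":        float(np.sqrt(np.sum(p ** 2))),
        "entropy":       float(-np.sum(p * np.log2(p + eps))),
    }

# Text patches of a receipt typically show higher contrast and dissimilarity
# than smooth background patches, which is what makes such features useful.
patch = (np.random.rand(64, 64) * 255).astype(np.uint8)       # stand-in for a receipt patch
print(texture_features(patch))
```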
• detection of unique textual information features;
• receipt text detection;
• text image rotation;
• receipt text recognition;
• receipt text analysis.

III. MATERIALS AND METHODS

The proposed approach consists of 7 different parts - 4 basic parts and 3 additional parts (Fig. 1).

Fig. 1. Proposed approach flowchart

A popular binarization method is Otsu thresholding [5,6]. This method converts a greyscale image into a monochrome one. The Otsu method finds the binarization threshold that minimizes the intra-class variance (equivalently, maximizes the inter-class variance). This method divides the receipt image into two pixel regions (two classes). Otsu thresholding provides good results in many cases; problems occur when the brightness level becomes heterogeneous. To meet these challenges, it is possible to use the two binarization methods described below.
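For reference, a minimal Otsu binarization sketch is shown below. The use of OpenCV, the input file name and the 0/255 output convention are illustrative assumptions rather than details taken from the paper.

```python
import cv2

# Read the receipt image as greyscale (the path is a hypothetical example).
gray = cv2.imread("receipt.jpg", cv2.IMREAD_GRAYSCALE)

# Otsu thresholding: the threshold argument (0) is ignored and chosen
# automatically so that the intra-class variance is minimized.
otsu_threshold, monochrome = cv2.threshold(
    gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

print("Otsu threshold:", otsu_threshold)
cv2.imwrite("receipt_otsu.png", monochrome)
```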
1) Local thresholding method using moving averages

This method [7] calculates the mean value of the intensity at each step. The mean intensity calculation is shown in (1):

m(k) = \frac{1}{n} \sum_{i=k+1-n}^{k} z_i    (1)

T_k = b \cdot m(k)    (2)

where b - a constant;
m(k) - the mean intensity from (1);
z_i - the intensity of the i-th scanned pixel;
n - the number of pixels used in the moving average.

As a result, a threshold is calculated for each pixel.
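A minimal sketch of the moving-average thresholding in (1)-(2) is given below; the row-by-row scan order, the window size n and the constant b are illustrative assumptions.

```python
import numpy as np

def moving_average_threshold(gray, n=20, b=0.8):
    """Binarize a greyscale image with the moving-average rule (1)-(2):
    each pixel is compared with b times the mean of the last n scanned pixels."""
    h, w = gray.shape
    # Scan pixels row by row (an assumption; a serpentine scan is also common).
    z = gray.astype(np.float64).reshape(-1)
    # m(k): mean of the n most recent intensities, via a cumulative sum.
    c = np.cumsum(np.concatenate(([0.0], z)))
    k = np.arange(1, z.size + 1)
    start = np.maximum(0, k - n)
    m = (c[k] - c[start]) / (k - start)
    t = b * m                                    # T_k = b * m(k), eq. (2)
    binary = (z > t).astype(np.uint8) * 255      # bright background -> white, text -> black
    return binary.reshape(h, w)

# Example usage on a synthetic gradient image with a dark "text" stroke.
img = np.tile(np.linspace(60, 220, 256), (128, 1)).astype(np.uint8)
img[40:45, :] = 10                               # dark horizontal stroke
out = moving_average_threshold(img)
```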
2) Local thresholding method using region brightness level

At the beginning, the proposed method calculates the region brightness levels (Fig. 2).

Fig. 2. Average brightness level of region

The average brightness level of a region can be calculated as follows (3):

V(x_r, y_r) = \frac{1}{n^2} \sum_{x=x_0}^{x_0+n-1} \sum_{y=y_0}^{y_0+n-1} V(x, y)    (3)

where V - pixel intensity;
x and y - pixel coordinates;
x_r and y_r - region coordinates;
n - region size;
x_0 and y_0 - coordinates of the top-left pixel of the region.

In the next step, the binarization threshold can be calculated as follows (4):

T_{xy} = b \cdot V(x/n, y/n)    (4)
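A compact sketch of this region-based thresholding is shown below; it assumes square n x n regions, integer division for the region lookup in (4), and a manually chosen constant b.

```python
import numpy as np

def region_brightness_threshold(gray, n=32, b=0.85):
    """Binarization with a per-region threshold: the image is split into n x n
    regions, each region's mean intensity V(x_r, y_r) is computed (eq. 3), and
    every pixel is compared with T_xy = b * V(x // n, y // n) (eq. 4)."""
    h, w = gray.shape
    gray = gray.astype(np.float64)
    # Mean intensity of every n x n region (only full regions are used).
    regions = gray[:h - h % n, :w - w % n].reshape(h // n, n, w // n, n)
    region_mean = regions.mean(axis=(1, 3))
    # Threshold map: each pixel takes b times the mean of its region.
    th = b * np.kron(region_mean, np.ones((n, n)))
    out = np.full((h, w), 255, dtype=np.uint8)   # remainder border stays white
    crop = gray[:th.shape[0], :th.shape[1]]
    out[:th.shape[0], :th.shape[1]] = np.where(crop > th, 255, 0).astype(np.uint8)
    return out
```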
For colour correction, the saturation S and value V of each pixel are obtained from its RGB coordinates as follows (5):

V = \max(R, G, B)

S = \frac{255 \cdot (V - \min(R, G, B))}{V}    (5)

where B - blue, R - red and G - green colour coordinates of the image pixel (the range is from 0 to 255);
S - saturation (the range is from 0 to 255);
V - value (the range is from 0 to 255);
note: if V = 0, then S = 0.

It is accepted that a receipt has only two colours: black and white. In this case, it is possible to define the black and white colour ranges as follows:

• the white colour saturation range is from 0 to 60;
• the white colour value range is from the mean intensity (from (1) or (3)) to 255;
• the black colour value range is from 0 to the mean intensity;
• the black colour saturation range is from 0 to the maximum saturation defined in (6):

Max_Saturation = 255 - \frac{V}{3}    (6)

where V - the mean intensity from (1) or (3).

The colour correction method allows the binarization result to be improved.
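The sketch below applies these saturation/value ranges to label pixels as white, black or neither; the vectorised NumPy conversion and the label encoding are assumptions made for illustration.

```python
import numpy as np

def saturation_value(rgb):
    """Per-pixel value V and saturation S on a 0-255 scale, as in (5)."""
    rgb = rgb.astype(np.float64)
    v = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    s = np.where(v > 0, 255.0 * (v - mn) / np.maximum(v, 1e-9), 0.0)
    return s, v

def classify_black_white(rgb, mean_intensity):
    """Label each pixel using the black/white ranges around the mean intensity
    from (1) or (3): 1 = white, 0 = black, -1 = outside both ranges."""
    s, v = saturation_value(rgb)
    max_saturation = 255.0 - mean_intensity / 3.0   # eq. (6)
    white = (s <= 60) & (v >= mean_intensity)
    black = (v <= mean_intensity) & (s <= max_saturation)
    labels = np.full(s.shape, -1, dtype=np.int8)
    labels[white] = 1
    labels[black] = 0
    return labels
```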
C. Morphological Image Processing

The morphological methods [9] use the inverted monochrome image. After binarization, characters may have gaps in their contours. To meet this challenge, it is possible to use morphological dilation (Fig. 3).
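As an illustration, dilating the inverted binary image with a small structuring element can close such contour gaps; the 3 x 3 kernel, the single iteration and the file names below are assumptions, not details from the paper.

```python
import cv2
import numpy as np

# Invert the monochrome image so that characters become white foreground.
binary = cv2.imread("receipt_binary.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
inverted = cv2.bitwise_not(binary)

# Dilation with a 3 x 3 rectangular structuring element closes small
# gaps in character contours left by binarization.
kernel = np.ones((3, 3), np.uint8)
dilated = cv2.dilate(inverted, kernel, iterations=1)

cv2.imwrite("receipt_dilated.png", cv2.bitwise_not(dilated))
```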
The new coordinates of each pixel after rotation are obtained as follows (7):

[x, y] \cdot \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = [x \cdot \cos\theta - y \cdot \sin\theta, \; x \cdot \sin\theta + y \cdot \cos\theta] = [x_{new}, y_{new}]    (7)

where x and y - the old coordinates of the pixel;
x_new and y_new - the new coordinates of the pixel;
θ - the rotation angle, which can be calculated as follows (8):

θ = \mathrm{arctg}\left(-\frac{Y_1 - Y_2}{X_1 - X_2}\right)    (8)

where X1, Y1 and X2, Y2 are the coordinates of point 1 and point 2 (Fig. 4).
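A short sketch of this rotation step is given below. It assumes that OpenCV performs the actual resampling, that the two reference points come from an earlier text-line detection step, and that rotation around the image centre is acceptable.

```python
import numpy as np
import cv2

def deskew(image, p1, p2):
    """Rotate the image so that the line through p1 = (X1, Y1) and
    p2 = (X2, Y2) becomes horizontal, with the angle taken from eq. (8)."""
    (x1, y1), (x2, y2) = p1, p2
    # Eq. (8); arctan2 is used instead of arctg for numerical robustness.
    theta = np.degrees(np.arctan2(-(y1 - y2), (x1 - x2)))
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    # Build a cos/sin rotation matrix (as in eq. (7)) around the image centre.
    rotation = cv2.getRotationMatrix2D(center, theta, 1.0)
    return cv2.warpAffine(image, rotation, (w, h),
                          flags=cv2.INTER_LINEAR,
                          borderValue=255)       # pad uncovered pixels with white
```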
A. Binarization

The point of this experiment is to compare 3 binarization methods:

• Otsu thresholding;
• the local thresholding method using moving averages;
• the local thresholding method using region brightness level.

The methods were estimated by the following criteria:

• the number of images with very bad receipt content visibility (VBV) - more than 30 percent of the receipt characters are invisible;
• the number of images with bad receipt content visibility (BV) - more than 10 percent of the receipt characters are invisible;
• the number of images with middle receipt content visibility (MV) - the receipt text is visible, but it is not possible to clearly see all characters;
• the number of images with good receipt content visibility (GV).
In this experiment, 68 receipt images were processed. These provide important estimation results, which are shown in Fig. 5 and Fig. 6. Fig. 5 shows the estimation results of Otsu binarization: there are many images with bad (BV) and very bad (VBV) visibility.
Fig. 6. Local thresholding estimation

Feature number/s        Feature                                  Feature is greater for   Confidence
55 - LZLGE (GLZM)       Large Zone Low Gray level Emphasis       Background               0.987
61 - BARYS (GLZM)       The barycenter on sizes                  Background               0.980
14 - GLCM (VERTICAL)    Dissimilarity                            Text                     0.980
59 - ZPC (GLZM)         Zone Percentage. Equality of iso-sizes   Text                     0.980