2016 15th International Conference on Frontiers in Handwriting Recognition

libcrn, an Open-Source Document Image Processing Library

Yann LEYDIER, Jean DUONG
CoReNum
F-69006, France

Stéphane BRÈS, Véronique ÉGLIN, Frank LeBOURGEOIS, Martial TOLA
Université de Lyon, CNRS
INSA-Lyon, LIRIS, UMR5205,
F-69621, France

Abstract—In this paper we introduce libcrn, a multiplatform open-source document image processing library aimed at researchers and companies. It is written in C++11 and has a non-contaminating license that makes it available for use in any project without legal constraints.
The features include low-level image processing (color format conversion, binarization, convolution, PDE...), document image specific tools (connected components extraction, recursive block description, PDF export...), maths (matrix arithmetic, linear algebra, GMMs, equation solvers...), and classification and clustering (kNN, k-means, HMMs...).
The API is comprehensively documented, and libcrn's architecture follows modern C++ guidelines to facilitate the handling of the library and enforce its safe usage.
A sample OCR, which is only 30 lines long, is described to illustrate libcrn's scope of possibilities.

Keywords-document image processing; open source; library; toolbox

I. INTRODUCTION

Implementing the most basic document image processing algorithms may be a good exercise for students. However, when focusing on high-level processing chains or complex methods, researchers as well as manufacturers rely on toolboxes and software libraries for low-level tasks.

Commercial software libraries (Intel IPP, Lead Tools...) are available and used in industry. They generally offer the basic tools needed to perform simple image manipulation and are well suited for non-specialist engineers. Specialists often prefer Open Source libraries, because these make it possible to check the details of the algorithms and to modify and fine-tune them when needed.

The most widely used image processing library is OpenCV1. While it contains a great number of algorithms, it was not originally designed for document images and lacks features and services specific to them, which makes it inconvenient for this purpose.
Qgar2 [1] is an Open Source document image processing library created in the early 2000s. It features the most elementary tools to create document analysis software, but it also lacks some crucial features such as RGB images. Although Qgar can easily be extended, its development has stalled since 2008.

We introduce libcrn3, licensed under the LGPL (a non-contaminating Open Source license). Its aim is to allow both researchers and engineers to implement document image processing chains and algorithms. libcrn is available for Windows (Visual C++ 2015), Linux, MacOS and Android. It is written in C++11 following the "modern" guidelines issued by the C++ committee, so that users can use it easily and safely (e.g., no memory management is required from the users and no leak can happen). We implemented many image processing algorithms, and also the mathematical tools needed to process the data that can be extracted from images.

II. FEATURES

A. General points

In order to facilitate the storage of data, most of the objects in libcrn can be serialized in XML files. Multiplatform utilities are packaged so that no extra work falls on the user to make applications run on any OS (e.g., automatic file path format conversion, file manipulation, character set conversion, dynamically loaded modules...).

B. Image

1) Formats: We provide built-in support for numerous pixel types (see tab. I). Any other type of pixel format (such as matrices!) is supported as long as it implements the basic arithmetic operators.

Table I
PIXEL TYPES SUPPORTED BY libcrn.

Category        Subcategory        Types
Color           RGB-based          RGB, HSV, YUV (television)
                Perception-based   XYZ, L*a*b*, L*u*v*
Monochromatic   Grayscale          double, int, byte
                Binary             bool
Other           2D vectors         Cartesian and polar coords
                Angles             radian, degree and byte
                Custom             any type with arithmetic ops

All the color types are trivially convertible, and multiple binarization methods are offered: Niblack, Sauvola [2], local min or max, Fisher, entropy and Otsu [3].
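Otsu's method [3], the last binarization method listed above, selects the gray level that maximizes the between-class variance of the image histogram. The sketch below is a stand-alone illustration for 8-bit grayscale pixels stored in a std::vector. The function names are hypothetical and are not part of libcrn's API. Marking dark pixels as ink assumes dark text on a light background.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not libcrn's API): Otsu's threshold selection.
// Returns t such that levels <= t form one class and levels > t the other,
// choosing the t that maximizes the between-class variance.
int otsu_threshold(const std::vector<std::uint8_t>& pixels)
{
    std::array<double, 256> hist{};  // gray-level histogram, zero-initialized
    for (auto p : pixels)
        hist[p] += 1.0;

    const double total = static_cast<double>(pixels.size());
    double sum_all = 0.0;  // sum over all pixels of their gray level
    for (int g = 0; g < 256; ++g)
        sum_all += g * hist[g];

    double w0 = 0.0;    // number of pixels with level <= t
    double sum0 = 0.0;  // sum of their levels
    double best = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t)
    {
        w0 += hist[t];
        sum0 += t * hist[t];
        const double w1 = total - w0;
        if (w0 == 0.0 || w1 == 0.0)
            continue;  // one class is empty: no split at this level
        const double mu0 = sum0 / w0;
        const double mu1 = (sum_all - sum0) / w1;
        // Proportional to the between-class variance (scaled by total^2).
        const double between = w0 * w1 * (mu0 - mu1) * (mu0 - mu1);
        if (between > best)
        {
            best = between;
            best_t = t;
        }
    }
    return best_t;
}

// Hypothetical helper: applies the threshold; true marks dark (ink) pixels.
std::vector<bool> binarize_otsu(const std::vector<std::uint8_t>& pixels)
{
    const int t = otsu_threshold(pixels);
    std::vector<bool> ink(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
        ink[i] = pixels[i] <= t;
    return ink;
}
```

Every candidate threshold is evaluated from running sums over the 256-bin histogram. The cost is therefore one pass over the pixels plus a constant 256 iterations.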
1 http://opencv.org/
2 http://www.qgar.org/
3 https://github.com/Liris-Pleiad/libcrn

2167-6445/16 $31.00 © 2016 IEEE
DOI 10.1109/ICFHR.2016.45
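Table I lists HSV among the RGB-based pixel types. The following sketch illustrates what a conversion between two of these types involves. It implements the standard hexcone RGB-to-HSV formula for components normalized to [0, 1]. The RGB and HSV structs and the function name are hypothetical, and libcrn's own pixel classes and value ranges may differ.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical pixel types (not libcrn's): components normalized to [0, 1],
// hue expressed in degrees in [0, 360).
struct RGB { double r, g, b; };
struct HSV { double h, s, v; };

// Standard hexcone RGB -> HSV conversion.
HSV rgb_to_hsv(const RGB& c)
{
    const double maxc = std::max({c.r, c.g, c.b});
    const double minc = std::min({c.r, c.g, c.b});
    const double delta = maxc - minc;
    double h = 0.0;  // hue is undefined for grays (delta == 0); 0 by convention
    if (delta > 0.0)
    {
        if (maxc == c.r)
            h = 60.0 * std::fmod((c.g - c.b) / delta, 6.0);
        else if (maxc == c.g)
            h = 60.0 * ((c.b - c.r) / delta + 2.0);
        else
            h = 60.0 * ((c.r - c.g) / delta + 4.0);
        if (h < 0.0)
            h += 360.0;  // fmod keeps the sign, so wrap negative hues
    }
    const double s = maxc > 0.0 ? delta / maxc : 0.0;  // black has no saturation
    return {h, s, maxc};
}
```

For example, pure blue maps to a hue of 240 degrees and magenta to 300 degrees. Any gray maps to zero saturation, with a value equal to its intensity.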
2) Algorithms: Classical basic transformation algorithms are provided, such as rotation by shear, mathematical morphology, convolution, distance transform...

[Figure 3 panels: noisy image, angles, κ1, κ2 (first line); divergence, Hessian corner, κ1, Lvv (second line)]
Figure 3. Sample of differential features (σ = 1.5 on the first line, σ = 4.5 on the second).

Feature extractors can be combined to allow the comparison of shapes. Sample feature extractors are provided, including profile projections and gradient histograms (see fig. 1).
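Profile projections are among the sample feature extractors mentioned above. They describe a shape by how far each side of its bounding box lies from the first ink pixel. A minimal sketch of the left profile follows, assuming a binary glyph image stored row-major. The BinaryImage type, the normalization by the width, and the averaging into a fixed number of bands are illustrative assumptions, not libcrn's exact definition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binary image type (not libcrn's): row-major pixels, true = ink.
struct BinaryImage
{
    std::size_t width;
    std::size_t height;
    std::vector<bool> pixels;
    bool at(std::size_t x, std::size_t y) const { return pixels[y * width + x]; }
};

// Left profile: for each row, the distance from the left border to the first ink
// pixel (the full width if the row is empty), normalized by the width and averaged
// over `bins` horizontal bands, so that glyphs of different heights yield vectors of
// the same length. The right, top and bottom profiles follow by symmetry.
std::vector<double> left_profile(const BinaryImage& img, std::size_t bins)
{
    assert(img.width > 0 && img.height > 0 && bins > 0);
    std::vector<double> feature(bins, 0.0);
    std::vector<std::size_t> rows_in_bin(bins, 0);
    for (std::size_t y = 0; y < img.height; ++y)
    {
        std::size_t x = 0;
        while (x < img.width && !img.at(x, y))
            ++x;                                        // scan until the first ink pixel
        const std::size_t bin = y * bins / img.height;  // band containing row y
        feature[bin] += static_cast<double>(x) / img.width;
        ++rows_in_bin[bin];
    }
    for (std::size_t b = 0; b < bins; ++b)
        if (rows_in_bin[b] > 0)
            feature[b] /= rows_in_bin[b];  // mean over the rows of the band
    return feature;
}
```

Resampling to a fixed number of bands is what makes two glyphs of different sizes comparable with a simple vector distance.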

Figure 1. a,b) Angle histograms computed inside zones (rectangular or radial). c) 9 angle histograms computed with all pixels, weighted by their distance to automatically centered anchors.

Shapes can also be compared directly with an FFT-based cross-correlation or with gradient matching. The gradient matching is a double inclusion measure. A mask of significant gradients is automatically computed, and the angles inside a shape's mask are compared to the angles on the other shape. If the location of the pixel in the other shape is outside its dilated mask, a penalty is applied. The sum of both inclusion measures is the dissimilarity between the two shapes (see fig. 2).

[Figure 2 panels: rows A and B; columns: images, A⊂B (mask, angles), B⊂A (mask, angles)]
Figure 2. Two-way inclusion measurement of two shapes.

3) PDE: Gradients can be obtained from both grayscale and color [4] images by different means (half derivatives, Gaussian kernel...) and can be isotropically diffused. From the gradients, a large number of transforms and descriptors are computed (see fig. 3):
• first and second derivatives along Cartesian axes, isophotes and flowlines;
• edge and corner;
• Hessian eigenvalues κ1 and κ2, Hessian corner;
• flowline, isophote, Gaussian and gradient curvature;
• divergence, Laplacian.

C. Document image processing

libcrn provides a way to describe a document's layout structure with nested lists of rectangular blocks. Each block refers to a part of the original image that is instantiated only when needed. This structure is, of course, serializable.
The fast connected components extraction [5] directly feeds the aforementioned block structure.
Simple line and word segmentation algorithms are implemented, and more can easily be added to feed the block layout structure.
Other useful algorithms specific to document images, such as stroke thickness estimation, text line height estimation and skew estimation, are offered.

D. Maths

1) Algebra: libcrn's math toolbox includes matrices of integers, reals and complex numbers and provides all arithmetic operations. Two system solvers (Cramer and Gauss-Jordan) are included. Fundamental linear algebra methods such as diagonalization and inversion are implemented. Effective computation procedures will be added4 for tensor computation [6], [7]: higher-order structure tensors [8] have been proposed as extensions of the classical structure tensor used in image processing [9]. Since they are tricky to compare (e.g., for clustering, feature extraction, etc.), their glyphs are to be used for analysis [10]. Thus, Zernike moments have been implemented and should soon be added to the image analysis toolbox.

4 Code is already available on demand. Full integration will be done for the next release of libcrn.

2) Data analysis: Multidimensional data may be processed using Principal Component Analysis (PCA). Furthermore, Gaussian mixture models are supported for both univariate and multivariate samples. In particular, the Expectation-Maximization (EM) fitting procedure is implemented.

3) Interpolation and regression: The math module contains tools for linear interpolation, cubic spline interpolation and polynomial regression.
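Polynomial regression reduces to a small linear system. The least-squares coefficients a solve the normal equations (VᵀV)a = Vᵀy, where V is the Vandermonde matrix of the sample abscissas. The sketch below builds these equations and solves them with Gauss-Jordan elimination, one of the two system solvers mentioned under Algebra. It is an illustrative stand-alone implementation, not libcrn's code, and the function names are hypothetical. For high degrees, a QR-based least-squares solver would be numerically safer than the normal equations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: solves A x = b by Gauss-Jordan elimination with partial
// pivoting. A must be square and non-singular; A and b are copies modified in place.
std::vector<double> gauss_jordan(Matrix A, std::vector<double> b)
{
    const std::size_t n = A.size();
    for (std::size_t col = 0; col < n; ++col)
    {
        // Choose the row with the largest pivot to limit rounding errors.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col]))
                pivot = r;
        assert(A[pivot][col] != 0.0 && "singular matrix");
        std::swap(A[col], A[pivot]);
        std::swap(b[col], b[pivot]);

        // Normalize the pivot row, then eliminate this column from all other rows.
        const double p = A[col][col];
        for (std::size_t c = col; c < n; ++c)
            A[col][c] /= p;
        b[col] /= p;
        for (std::size_t r = 0; r < n; ++r)
        {
            if (r == col)
                continue;
            const double f = A[r][col];
            for (std::size_t c = col; c < n; ++c)
                A[r][c] -= f * A[col][c];
            b[r] -= f * b[col];
        }
    }
    return b;  // A has been reduced to the identity, so b holds the solution
}

// Hypothetical helper: least-squares fit y ~ a0 + a1 x + ... + ad x^d
// through the normal equations.
std::vector<double> polyfit(const std::vector<double>& x, const std::vector<double>& y,
                            std::size_t degree)
{
    const std::size_t m = degree + 1;
    Matrix vtv(m, std::vector<double>(m, 0.0));  // accumulates V^T V
    std::vector<double> vty(m, 0.0);             // accumulates V^T y
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        std::vector<double> powers(m, 1.0);  // 1, x_i, x_i^2, ...
        for (std::size_t k = 1; k < m; ++k)
            powers[k] = powers[k - 1] * x[i];
        for (std::size_t r = 0; r < m; ++r)
        {
            vty[r] += powers[r] * y[i];
            for (std::size_t c = 0; c < m; ++c)
                vtv[r][c] += powers[r] * powers[c];
        }
    }
    return gauss_jordan(vtv, vty);
}
```

For instance, fitting a degree-2 polynomial to the points (0, 1), (1, 6), (2, 17), (3, 34) recovers the coefficients 1, 2, 3 of y = 1 + 2x + 3x², up to rounding.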

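The Principal Component Analysis listed under Data analysis has a closed form for two-dimensional samples. The principal axis is the eigenvector of the 2×2 covariance matrix associated with its largest eigenvalue. The sketch below illustrates only this special case; libcrn's PCA is described as handling multidimensional data. The Point2 and PrincipalAxes types and the function name are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical types for illustration (not libcrn's).
struct Point2 { double x; double y; };

struct PrincipalAxes
{
    double mean_x, mean_y;
    double lambda1, lambda2;  // covariance eigenvalues, lambda1 >= lambda2
    double angle;             // orientation of the first principal axis, in radians
};

// PCA of 2D points: closed-form eigen-decomposition of the (population)
// covariance matrix [[sxx, sxy], [sxy, syy]].
PrincipalAxes pca2d(const std::vector<Point2>& pts)
{
    assert(!pts.empty());
    const double n = static_cast<double>(pts.size());
    double mx = 0.0, my = 0.0;
    for (const auto& p : pts) { mx += p.x; my += p.y; }
    mx /= n;
    my /= n;

    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (const auto& p : pts)
    {
        const double dx = p.x - mx, dy = p.y - my;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    sxx /= n;
    syy /= n;
    sxy /= n;

    // Eigenvalues of a symmetric 2x2 matrix: half the trace plus or minus a radius.
    const double half_trace = 0.5 * (sxx + syy);
    const double radius = std::sqrt(0.25 * (sxx - syy) * (sxx - syy) + sxy * sxy);
    // The principal axis angle theta satisfies tan(2 theta) = 2 sxy / (sxx - syy).
    const double angle = 0.5 * std::atan2(2.0 * sxy, sxx - syy);
    return {mx, my, half_trace + radius, half_trace - radius, angle};
}
```

For points lying on the diagonal y = x, all the variance falls on the first axis (λ2 = 0) and the axis angle is π/4.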
4) Geometry: libcrn provides angle arithmetic and tools such as circular mean and variance, but also circular histogram utilities: circular earth mover's distance [11], kurtosis, trigonometric moments...
5) Signal processing: FFT can be applied on complex matrices and vectors.

E. Pattern analysis

1) Clustering: libcrn provides clustering algorithms for both vectorial and metric data:
• k-means
• k-medoids (PAM and fast [12])
• Outliers: LOF and LoOP [13]
• Spectral Clustering (all formulas from [14]–[17])
• Affinity Propagation [18]
2) Classification: Many data classification problems can be addressed, using a wide variety of tools, from a highly generic kNN implementation to discrete or Gaussian semi-continuous HMMs.
3) Other: Other "combinatorics-oriented" algorithms are also available in libcrn, such as Kuhn and Munkres' "Hungarian" bipartite matching algorithm, the A* path finder or the disjoint set forest distribution.

F. GUI

Custom widgets are provided to create applications with Gtkmm (Gtk's C++ wrapper), versions 2 and 3. This includes displaying and browsing the block structure of an image, image overlays and the automated generation of configuration panels. A Qt widget library will be available soon.
A demonstration application is included in libcrn. It allows users to quickly test some features of the library on any image (see fig. 4). This tool is very handy for running simple algorithms over a given image without writing a new program.

III. 30 LINES FOR AN OCR

To illustrate libcrn's ease of use, we present a very simple OCR engine that is only 30 lines long. It works on a medieval manuscript excerpt written in capital letters with no spacing between words (see fig. 5). An occurrence of each letter in the alphabet was manually extracted and put in a folder named data, where each file name corresponds to the image's label. The source code is displayed in fig. 6.

Figure 5. Sample medieval manuscript.

The first step (fig. 6.1) is to create a feature extractor that will be used to compare each character to the database. To do that, we use a FeatureSet, which can contain multiple elementary feature extractors. The FeatureSet concatenates the feature vectors extracted by each FeatureExtractor. For simplicity, we use the four profile projections and the horizontal and vertical black pixel projections.
In step 2, we open each pre-labelled character image. As
it is often impossible to know whether an image file contains
RGB, grayscale or binary pixels, we directly store the
image in a Block object. A Block automatically converts
the input image to the format desired by the user (the
default grayscaling and binarization methods can be changed
programmatically at any time). It also contains named lists of
sub-blocks that will be used later. The extracted features are
stored in a list of shared pointers5 to the base class Object.
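Stripped of image handling, steps 1 and 2 amount to two operations. First, the outputs of several extractors are concatenated into a single vector. Second, one labelled vector is stored per reference glyph. Recognition (step 5) then returns the label of the nearest stored vector. The following library-independent sketch mirrors that logic with plain std::vector<double> features. The Extractor and Sample types, the flat pixel-vector input and the squared Euclidean distance are all assumptions for illustration, not libcrn's FeatureSet or BasicClassify API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

using Feature = std::vector<double>;
// Hypothetical elementary extractor: maps a glyph (here, a flat vector of pixel
// values) to a feature vector.
using Extractor = std::function<Feature(const std::vector<double>&)>;

// Feature-set behaviour: run every extractor and concatenate their outputs.
Feature extract_all(const std::vector<Extractor>& extractors, const std::vector<double>& glyph)
{
    Feature out;
    for (const auto& extract : extractors)
    {
        const Feature f = extract(glyph);
        out.insert(out.end(), f.begin(), f.end());
    }
    return out;
}

// One labelled reference vector of the database.
struct Sample
{
    Feature feature;
    char32_t label;
};

// 1-nearest-neighbour search using a squared Euclidean distance (an assumption;
// the metric used by libcrn's classifier is not specified in this paper).
char32_t nearest_label(const std::vector<Sample>& database, const Feature& query)
{
    assert(!database.empty());
    double best = std::numeric_limits<double>::infinity();
    char32_t label = database.front().label;
    for (const auto& sample : database)
    {
        assert(sample.feature.size() == query.size());
        double d = 0.0;
        for (std::size_t i = 0; i < query.size(); ++i)
            d += (query[i] - sample.feature[i]) * (query[i] - sample.feature[i]);
        if (d < best)
        {
            best = d;
            label = sample.label;
        }
    }
    return label;
}
```

Because each extractor contributes a fixed-length block, every glyph yields a vector of the same layout. That shared layout is what allows the element-wise distance in the nearest-neighbour search.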
The line segmentation is performed in step 3 (fig. 6.3). The document
image is opened in the same way as the database images.
We create a temporary BlockTreeExtractor that will
extract the text lines and store them in a sub-block list named
Lines.
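The name of the extractor used in fig. 6.3, BlockTreeExtractorTextLinesFromProjection, suggests a projection-based method, but its exact algorithm is not described here. As a hedged illustration of the general idea only, the sketch below works on a binarized page. It counts ink pixels per row (the horizontal projection) and reports each maximal run of rows whose count exceeds a threshold as one text line. The LineSpan type, the min_ink parameter and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: a text line as the half-open row range [top, bottom).
struct LineSpan
{
    std::size_t top;
    std::size_t bottom;
};

// Projection-based line segmentation on a binary image (row-major, true = ink).
// Rows whose ink count exceeds `min_ink` are text rows; each maximal run of
// consecutive text rows is reported as one line.
std::vector<LineSpan> lines_from_projection(const std::vector<bool>& pixels,
                                            std::size_t width, std::size_t height,
                                            std::size_t min_ink)
{
    assert(pixels.size() == width * height);
    std::vector<LineSpan> lines;
    bool in_line = false;
    std::size_t start = 0;
    // One extra iteration (y == height) acts as an empty row that closes a line
    // touching the bottom of the page.
    for (std::size_t y = 0; y <= height; ++y)
    {
        std::size_t ink = 0;  // horizontal projection of row y
        if (y < height)
            for (std::size_t x = 0; x < width; ++x)
                ink += pixels[y * width + x] ? 1 : 0;
        const bool text_row = ink > min_ink;
        if (text_row && !in_line)
        {
            in_line = true;
            start = y;
        }
        else if (!text_row && in_line)
        {
            in_line = false;
            lines.push_back({start, y});
        }
    }
    return lines;
}
```

A real page would also need skew correction and handling of touching lines. The threshold trades missed faint lines against merged noisy ones.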
Figure 4. Titus, libcrn's quick testing tool.

5 Shared pointers are memory management objects that delete the object they point to when it is no longer referenced.

Just before we actually extract the characters (fig. 6.4), we get an estimation of the mean stroke width. This will help us filter the noise. Connected components are extracted
within each line. After that, each Block in the Lines sub-block list will contain a sub-block list named Characters. The recursive sub-block lists can be used for more complex purposes and can even match the structure of an ALTO XML document. Finally, connected components smaller than the mean stroke width are removed and the remaining ones are sorted from left to right.
The 5th and last step is the actual recognition. Each Block remaining in the Characters sub-sub-list represents a letter in the text. Its feature vector is extracted using the same FeatureSet as the database. We search for its nearest neighbor in the database and retrieve a class number that can be used to compute the character's transcription.
Now that our homemade OCR is fully described, we shall not discuss its performance: profile projections are not known to be the best features for medieval manuscript recognition! The purpose here was to illustrate how easy it is to design applications with libcrn. A variety of document analysis problems can be addressed, not restricted to ancient scripts: printed pages or manuscripts may be considered, and historic or business documents may be processed. Even more borderline tasks are feasible (e.g., text extraction from scenes, plate recognition, mobile applications, etc.).

IV. CONCLUSION

In this paper we introduced libcrn, a multiplatform, (non-contaminating) open-source document image processing library written in C++11, aimed at researchers and companies, and available for Windows (Visual C++ 2015), Linux, MacOS and Android. Its API is comprehensively documented, and libcrn's architecture follows modern C++ guidelines to facilitate the handling of the library and enforce its safe usage (e.g., no memory management is required from the users and no leak can happen).
libcrn includes low-level image processing (expandable pixel formats, binarization, PDE...), document image specific tools (connected components extraction, recursive block description...), maths helpers (algebra, data analysis, geometry, signal processing...) and pattern analysis algorithms (classification, clustering...).
We described a short code example that provides OCR capabilities in only 30 lines. This illustrated the use of feature extractors, segmentation providers and classification tools. Many other applications may be designed for research or industrial needs. libcrn has been useful in several projects and is constantly being improved.

REFERENCES

[1] K. Tombre, C. Ah-Soon, P. Dosch, A. Habed, and G. Masini, “Stable, robust and off-the-shelf methods for graphics recognition,” in Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on, vol. 1. IEEE, 1998, pp. 406–408.

[2] J. Sauvola, T. Seppänen, S. Haapakoski, and M. Pietikäinen, “Adaptive document binarization,” in International Conference on Document Analysis and Recognition (ICDAR), vol. 1, Ulm, Germany, 1997, pp. 147–152.

[3] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, 1979.

[4] F. LeBourgeois, “Content based image retrieval using gradient color fields,” in International Conference on Pattern Recognition (ICPR), Barcelona, Spain, 2000, pp. 1027–1030.

[5] L. He, Y. Chao, K. Suzuki, and K. Wu, “Fast connected-component labeling,” Pattern Recognition, vol. 42, no. 9, pp. 1977–1987, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320308004573

[6] E. Nelson, Tensor analysis. Princeton University Press, 1967.

[7] J. G. Simmonds, A brief on tensor analysis. Springer-Verlag, 1994.

[8] T. Schultz, J. Weickert, and H.-P. Seidel, “A higher-order structure tensor,” July 2007.

[9] S. Di Zenzo, “A note on the gradient of a multi-image,” Computer Vision, Graphics, and Image Processing, vol. 33, pp. 116–125, 1986.

[10] T. Schultz and G. Kindlmann, “A maximum enhancing higher-order tensor glyph,” in Eurographics/IEEE-VGTC Symposium on Visualization, vol. 29, no. 3, 2010.

[11] J. Rabin, J. Delon, and Y. Gou, “Circular earth mover’s distance for the comparison of local features,” in Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, 2008, pp. 1–4.

[12] H.-S. Park and C.-H. Jun, “A simple and fast algorithm for k-medoids clustering,” Expert Systems with Applications, vol. 36, no. 2, Part 2, pp. 3336–3341, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S095741740800081X

[13] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek, “LoOP: Local outlier probabilities,” in Proceedings of the 18th ACM Conference on Information and Knowledge Management, ser. CIKM ’09. New York, NY, USA: ACM, 2009, pp. 1649–1652. [Online]. Available: http://doi.acm.org/10.1145/1645953.1646195

[14] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Advances in Neural Information Processing Systems. MIT Press, 2001, pp. 849–856.

[15] M. Meila and J. Shi, “A random walks view of spectral segmentation,” 2001.

[16] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000. [Online]. Available: http://dx.doi.org/10.1109/34.868688

// 1. Feature extractor
auto featureExtractor = crn::FeatureSet{};
featureExtractor.PushBack(std::make_shared<crn::FeatureExtractorProfile>(
    crn::Direction::LEFT | crn::Direction::RIGHT | crn::Direction::TOP |
    crn::Direction::BOTTOM, 10, 100));
featureExtractor.PushBack(std::make_shared<crn::FeatureExtractorProjection>(
    crn::Orientation::HORIZONTAL | crn::Orientation::VERTICAL, 10, 100));
// 2. Database creation: one labelled image per letter, data/A.png to data/Z.png
auto database = std::vector<crn::SObject>{};
for (auto c = 'A'; c <= 'Z'; ++c)
{
    const auto charFileName = "data"_p / c + ".png"_p;
    auto charblock = crn::Block::New(crn::NewImageFromFile(charFileName));
    database.push_back(featureExtractor.Extract(*charblock));
}
// 3. Line segmentation (imageFileName: path of the page to transcribe, defined elsewhere)
auto pageblock = crn::Block::New(crn::NewImageFromFile(imageFileName));
crn::BlockTreeExtractorTextLinesFromProjection{U"Lines"}.Extract(*pageblock);
// 4. Character segmentation
const auto sw = crn::StrokesWidth(*pageblock->GetGray());
auto s = crn::String{};
for (auto nline = size_t{0}; nline < pageblock->GetNbChildren(U"Lines"); ++nline)
{
    auto line = pageblock->GetChild(U"Lines", nline);
    line->ExtractCC(U"Characters");                       // connected components
    line->FilterMinOr(U"Characters", sw, sw);             // drop components smaller than the stroke width
    line->SortTree(U"Characters", crn::Direction::LEFT);  // left-to-right order
    for (auto nchar = size_t{0}; nchar < line->GetNbChildren(U"Characters");
        ++nchar)
    {
        auto character = line->GetChild(U"Characters", nchar);
        // 5. Recognition
        auto features = featureExtractor.Extract(*character);
        auto res = crn::BasicClassify::NearestNeighbor(features,
            database.begin(), database.end());
        s += char32_t(U'A' + res.class_id);
    }
    s += U'\n';
}
CRNVerbose(s); // display the result

Figure 6. Minimalistic code for an OCR.

[17] L. Zelnik-Manor and P. Perona, “Self-tuning spectral clustering,” in Advances in Neural Information Processing Systems, 2004, pp. 1601–1608.

[18] B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, no. 5814, pp. 972–976, 2007.
