An Overview of Advances of Pattern Recognition Systems in Computer Vision
1. Introduction
First of all, let's give a tentative answer to the following question: what is pattern
recognition (PR)? Among the many possible answers, the one best suited to the situation and
to the concern of this chapter is: "pattern recognition is
the scientific discipline of machine learning (or artificial intelligence) that aims at classifying
data (patterns) into a number of categories or classes". But what is a pattern?
In 1985, Satoshi Watanabe (Watanabe, 1985) defined a pattern as "the opposite of chaos; it is
an entity, vaguely defined, that could be given a name." In other words, a pattern can be any
entity of interest that one needs to recognise and/or identify: it is of such interest that one
would like to know its name (its identity). Examples of patterns are: a pixel in an image, a
2D or 3D shape, a typewritten or handwritten character, the gait of an individual, a gesture,
a fingerprint, a footprint, a human face, the voice of an individual, a speech signal, ECG
time series, a building, a shape of an animal.
A pattern recognition system (PRS) is an automatic system that aims at classifying the input
pattern into a specific class. It proceeds in two successive tasks: (1) the analysis (or
description) that extracts the characteristics from the pattern being studied and (2) the
classification (or recognition) that enables us to recognise an object (or a pattern) by using
some characteristics derived from the first task.
The classification scheme is usually based on the availability of a training set, that is, a set
of patterns that have already been classified. This learning strategy is termed supervised
learning, in opposition to unsupervised learning. A learning strategy is said to be
unsupervised if the system is not given a priori information about classes; it
establishes the classes itself, based on regularities in the features. Features are those
measurements which are extracted from a pattern to represent it in the features space. In
other words, pattern analysis enables us to use some features to describe and represent it
instead of using the pattern itself. Also called characteristics, attributes or signatures the
recognition efficiency and reliability are dependent on their choice.
Pattern recognition constitutes an important tool in various application domains, but
unfortunately, it is not always an easy task to carry out. One commonly encounters
four major methodologies in PRSs: statistical approach, syntactic approach,
template matching, neural networks. In this chapter, our remarks and details will be
directed mainly towards systems based on the statistical approach, since it is the most
commonly used in practice.
1.1 Statistical approach
Typically, statistical PRSs are based on statistics and probabilities. In these systems, features
are converted to numbers which are placed into a vector to represent the pattern. This
approach is most intensively used in practice because it is the simplest to handle.
In this approach, patterns to be classified are represented by a set of features defining a
specific multidimensional vector: by doing so, each pattern is represented by a point in the
multidimensional features space. To compare patterns, this approach uses similarity
measures based on distances between points in this statistical space. For more details and
deeper considerations on this approach, one can refer to (Jain et al., 2000), which presents a review of
statistical pattern recognition approaches.
1.2 Syntactic approach
Also called structural PRSs, these systems are based on the relation between features. In this
approach, patterns are represented by structures which can take into account more complex
relations between features than numerical feature vectors used in statistical PRSs
(Venguerov & Cunningham, 1998). Patterns are described in hierarchical structure
composed of sub-structures composed themselves of smaller sub-structures.
As explained in (Sonka et al., 1993), the shape is represented with a set of predefined
primitives: the set is called the codebook and the primitives are called codewords. For
example, given the codewords on the left of figure 1, the shape on the right of the figure can
be represented by the following string S, starting from the pointed codeword on the figure:
$$S = dbabcbabdbabcbab \qquad (1)$$
The system parses the set of extracted features using a kind of predefined grammar. If the
whole set of features extracted from a pattern can be parsed by the grammar, then the
system has recognised the pattern. Unfortunately, grammar-based syntactic pattern
recognition is generally very difficult to handle.
Fig. 1. Codewords a, b, c, d (left) and the example shape (right) represented by the string S; the starting codeword is indicated on the figure
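To make the grammar-based parsing concrete, here is a toy sketch (ours, not from the chapter): the shape of figure 1 is encoded as the codeword string S of equation (1), and a regular grammar, written here as a regular expression for simplicity, decides whether the string belongs to the shape class. The grammar rule itself is an illustrative assumption.

```python
import re

# Toy syntactic recogniser: the shape class is described by a regular
# grammar over the codewords 'a'..'d' of figure 1. The rule below (two
# repetitions of the sub-structure "dbabcbab") is an illustrative guess.
SHAPE_GRAMMAR = re.compile(r"(dbabcbab){2}")

def recognise(codeword_string: str) -> bool:
    """A pattern is recognised if the whole string is derivable from the grammar."""
    return SHAPE_GRAMMAR.fullmatch(codeword_string) is not None

S = "dbabcbabdbabcbab"   # the string S of equation (1)
print(recognise(S))      # True: the shape is recognised
```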
1.3 Template matching
In this approach, the pattern to be recognised is compared with a stored reference pattern, the
template (or model). In visual pattern recognition, one compares the template function to
the input image by maximising the spatial cross-correlation or by minimising a distance:
that provides the matching rate.
The strategy of this approach is as follows: for each possible position in the image, each
possible rotation, or any other geometric transformation of the template, compare each
pixel's neighbourhood to the template. After computing the matching rate for each
possibility, select the largest one that exceeds a predefined threshold. This is a very
expensive operation when dealing with big templates and/or large sets of images (Brunelli
& Poggio, 1997 ; Roberts & Everson, 2001 ; Cole et al., 2004). Figure 2 illustrates pattern
recognition based on the template matching approach. Figure 2.a is the input image I, Fig. 2.b represents two
templates (K representing letter 'K' and P letter 'P'). Figures 2.c and 2.d represent,
respectively, the normalized cross-correlation of I with K and the normalized cross-correlation
of I with P. On these two images, the cross-correlation peaks surrounded by a
circle indicate the location of the best matching letter in the input image. On figure 2.e, we
have superposed the templates on the input image, according to the coordinates of the
corresponding correlation peaks. For this study, we did not take rotation and scaling
into account: from the result, it clearly appears that this approach retrieves only the shape
that matches perfectly the model (size and rotation). This explains why only one 'K' (the
rotated one) and only one 'P' (the down-scaled one) are recognised.
Fig. 2. a) input image I - b) templates K and P - c) and d) normalized cross-correlations of I with K and of I with P - e) templates superposed on the input image at the locations of the correlation peaks
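The following sketch (our illustration, not code from the chapter) implements this strategy for the translation-only case: it computes a normalized cross-correlation map and keeps the best peak above a threshold; the 0.8 threshold is an arbitrary illustrative choice.

```python
import numpy as np

def ncc_map(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Normalized cross-correlation of the template with every position
    of the image (translations only; rotation/scale would require
    repeating the search over transformed templates)."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            w = image[i:i + th, j:j + tw] - image[i:i + th, j:j + tw].mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            out[i, j] = (w * t).sum() / denom if denom > 0 else 0.0
    return out

# Usage: keep the largest peak if it exceeds a predefined threshold.
# ncc = ncc_map(I, K)
# i, j = np.unravel_index(ncc.argmax(), ncc.shape)
# matched = ncc[i, j] > 0.8   # 0.8 is an arbitrary illustrative threshold
```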
1.4 Neural networks
In a neural network, each neurone combines its weighted inputs, and its activation results
in the information being passed to the following neurone. Figure 3 shows a simple neural
network representing the Perceptron as defined by Frank Rosenblatt in 1957. In this
example, the output Outj (j=1 or 2) is defined by a weighted combination of the inputs. In
the reference (Abdi, 1994), the author presents a nice introduction to ANNs.
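A minimal sketch of such a Perceptron layer (the weight values are illustrative, not taken from figure 3):

```python
import numpy as np

# Three inputs and two outputs, as in figure 3: each output Out_j is a
# weighted combination of the inputs, here passed through a step function.
W = np.array([[0.5, -0.2, 0.1],    # weights feeding Out1
              [0.3,  0.8, -0.4]])  # weights feeding Out2

def perceptron(inputs: np.ndarray) -> np.ndarray:
    """Out_j = step(sum_i w_ij * In_i)."""
    return (W @ inputs > 0).astype(int)

print(perceptron(np.array([1.0, 0.5, -1.0])))  # -> [1 1]
```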
Besides these approaches, one can encounter other methodologies like those based on fuzzy-set
theory or genetic algorithms. In some applications, hybrid methodologies combine
different aspects of these approaches to design more complex PRSs. In (Liu et al., 2006), the
authors present an overview of pattern recognition approaches and the classification of their
associated applications.
Fig. 3. A simple neural network (the Perceptron): the inputs In1, In2 and In3 of the input layer are connected to the outputs Out1 and Out2 of the output layer through the weights wij
During feature selection, only the features that
improve discrimination are retained and the others are discarded. During this processing,
higher level features can be derived by combining and/or transforming low level features,
e.g. by applying the so called independent component analysis (ICA) (Roberts & Everson,
2001): this operation thus leads to the reduction of the dimension of the feature space.
These features must be as discriminative as possible to reduce false alarms due to
misclassification during the second task. Efficient features must also present some essential
properties, such as:
- translation invariance: whatever the location of the pattern, it must give exactly the same features,
- rotation invariance: extracted features must not vary with the rotation of the pattern,
- scale invariance: scale changes must not affect the extracted features,
- noise resistance: features must be as robust as possible against noise, i.e. they must remain the same whatever the strength of the noise that affects the pattern,
- compactness: the number of retained features must not be too large; features must also be fast to extract and to match,
- reliability: as long as one deals with the same pattern, the extracted features must remain the same.
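As a concrete illustration of translation, scale and rotation invariance, the sketch below computes the first two of Hu's moment invariants of a binary shape; Hu's invariants are a classical example of such features, brought in here as an assumption since the chapter itself does not discuss them:

```python
import numpy as np

def hu_invariants(binary: np.ndarray) -> np.ndarray:
    """First two Hu moment invariants of a binary shape: unchanged by
    translation (central moments), scaling (normalisation) and rotation
    (the particular combinations phi1 and phi2)."""
    ys, xs = np.nonzero(binary)
    m00 = float(len(xs))
    xc, yc = xs.mean(), ys.mean()            # centroid
    def mu(p, q):                            # central moments
        return (((xs - xc) ** p) * ((ys - yc) ** q)).sum()
    def eta(p, q):                           # scale-normalised moments
        return mu(p, q) / m00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([phi1, phi2])
```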
Fig. 4. General architecture of a PRS: the image sensor feeds the analysis/description stage (features extraction and features selection, supported by off-line learning from an images database into models and a features database), followed by the classification/recognition stage (similarity measure/matching and interpretation)
Several similarity measures can be used to compare features vectors (Veltkamp
& Hagedoorn, 2001 ; Zhang, 2002), such as the city block distance and the Euclidean
distance, which are particular Minkowski distances. The following paragraphs illustrate the formalism of some of
them.
Let $V_A = (a_1, a_2, \ldots, a_N)$ and $V_B = (b_1, b_2, \ldots, b_N)$ be the features vectors representing patterns A and
B in an N-dimensional features space; examples of distances are defined by the following
expressions.
City block distance ($d_1$):

$$d_1(V_A, V_B) = \sum_{i=1}^{N} |a_i - b_i| \qquad (2)$$

Euclidean distance ($d_2$):

$$d_2(V_A, V_B) = \sqrt{\sum_{i=1}^{N} (a_i - b_i)^2} \qquad (3)$$

Cosine distance ($d_3$):

$$d_3(V_A, V_B) = 1 - \cos(\theta) = 1 - \frac{V_A V_B^T}{\|V_A\| \, \|V_B\|} = 1 - \frac{\sum_{i=1}^{N} a_i b_i}{\sqrt{\sum_{i=1}^{N} a_i^2} \, \sqrt{\sum_{i=1}^{N} b_i^2}} \qquad (4)$$
[Figure 5 shows the vectors U = (4.5, 6.0)^T, V = (6.0, 8.0)^T and W = (5.0, 2.0)^T in the plane, with their pairwise d1, d2 and d3 distances; e.g. d1(U, V) = 3.50, d2(U, V) = 2.50 and d3(U, V) = 0.]
Fig. 5. Examples of similarity measures between two vectors depending on the chosen metric
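A direct transcription of equations (2)-(4) (our sketch); applied to the vectors U and V of figure 5, it reproduces the values d1 = 3.50, d2 = 2.50 and d3 = 0 given there:

```python
import numpy as np

def d1(va, vb):
    """City block distance, equation (2)."""
    return float(np.abs(va - vb).sum())

def d2(va, vb):
    """Euclidean distance, equation (3)."""
    return float(np.sqrt(((va - vb) ** 2).sum()))

def d3(va, vb):
    """Cosine distance, equation (4)."""
    return float(1 - (va @ vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

U = np.array([4.5, 6.0])                 # vectors from figure 5
V = np.array([6.0, 8.0])
print(d1(U, V), d2(U, V), d3(U, V))      # 3.5 2.5 0.0
```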
Chaumette (Chaumette, 1994) has addressed the problem and proposed some solutions in a
closed loop system based on vision-based task. In (Chaumette, 2004), he proposes various
visual features based on the image moments to characterise planar objects in visual servoing
schemes.
3.2 Pattern recognition in biometrics
Biometric authentication plays an increasing role in various applications, ranging from
personal applications like access control to governmental applications like the biometric
passport and the fight against terrorism. In this application domain, one measures and
analyses human physical (or physiological, or biometric) and behavioural characteristics for
authentication (or recognition) purposes. Examples of biometric characteristics include
fingerprints, eye retinas and irises, facial patterns and hand geometry measurement, DNA
(Deoxyribonucleic acid). Examples of biometric behavioural characteristics include
signature, gait and typing patterns. This helps to identify individual people in forensics
applications.
Reference (Jain et al., 2004a) is an interesting starting point to pattern recognition
approaches and systems in biometrics. This paper gives a brief overview of the field of
biometrics and summarizes some of its advantages, disadvantages, strengths, limitations,
and related privacy concerns. In (Jain et al., 2004b), the authors also address the problem of
the accuracy of the authentication and that of the individual's right to security, to
privacy and to anonymity.
The reader is encouraged to have a look at the article presented in (Jain & Pankanti, 2006).
The authors of this article address the problem of identity stealing through a true story and
then present some current or forthcoming systems based on biometric PRSs that will
help prevent identity theft.
3.3 Content-based image retrieval
Content-based image retrieval systems aim at automatically describing images by using
their own content: the colour, the texture and the shape or their combination. As explained
in (Sikora, 2001; Bober, 2001), image retrieval has been an active research and
development domain since the early 1970s. During the last decade, research on image
retrieval became highly important. The most frequent and common means of image
retrieval is to index images with text keywords. While this technique seems simple, it
rapidly becomes laborious and fastidious when facing large volumes of images. On the
other hand, images are rich in content; so, to overcome the difficulties due to the huge data
volume, content-based image retrieval emerged as a promising means of retrieving
images and browsing large image databases.
With the rapid growth of computer systems and the huge and growing availability
of digital data, such pattern recognition systems become increasingly necessary to help
browse databases and find the desired information within a reasonable time limit.
Following this observation, systems like CBIR (Content-Based Image Retrieval), QBIC
(Query By Image Content) and QBE (Query By Example) receive more and more attention
from researchers (Mokhtarian et al., 1996 ; Trimeche et al.,
2000 ; Veltkamp & Tanase, 2001 ; Veltkamp & Hagedoorn, 2001). With query by example,
the user supplies a query image and the PRS finds images of the database that are most
similar to it based on various low-level features like colour, texture or shape. With query by
sketch, the user roughly draws the image he is looking for and the PRS locates the images of
the database that best match the sketch. Various CBIR systems are reported in (Veltkamp &
Tanase, 2001): after a brief description of CBIR systems, the authors present the
different kinds of existing systems along with the features involved.
In the context of image indexing, CBIR systems use content information as summarised in
figure 6. An image can then be described by using features derived from colour, texture,
shape or a combination of those features.
Fig. 6. Content information used to describe an input image: colour, texture and shape
The texture of an image can be characterised by measures computed from its L x L grey-level
co-occurrence matrix M (Haralick et al., 1973), such as the angular second moment (ASM),
the contrast (C), the inverse difference moment (IDM), the homogeneity (H) and the entropy (E):

$$ASM = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} M(i, j)^2$$

$$C = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} (i - j)^2 \, M(i, j)$$

$$IDM = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \frac{M(i, j)}{1 + (i - j)^2}$$

$$H = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \frac{M(i, j)}{1 + |i - j|}$$

$$E = -\sum_{i=0}^{L-1} \sum_{j=0}^{L-1} M(i, j) \log M(i, j)$$
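A compact transcription of these measures (our sketch, assuming M is normalised so that its entries sum to 1):

```python
import numpy as np

def texture_features(M: np.ndarray) -> dict:
    """Texture measures from an L x L grey-level co-occurrence matrix M."""
    L = M.shape[0]
    i, j = np.indices((L, L))
    nz = M[M > 0]                                 # avoid log(0) in the entropy
    return {
        "ASM": (M ** 2).sum(),                    # angular second moment
        "C":   (((i - j) ** 2) * M).sum(),        # contrast
        "IDM": (M / (1 + (i - j) ** 2)).sum(),    # inverse difference moment
        "H":   (M / (1 + np.abs(i - j))).sum(),   # homogeneity
        "E":   -(nz * np.log(nz)).sum(),          # entropy
    }
```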
The elongation (EL) is defined by:

$$EL = 100 \, \frac{\lambda_m}{\lambda_M} \qquad (5)$$

$\lambda_m$ and $\lambda_M$ being, respectively, the smallest and the largest eigenvalues of the inertia matrix
of the shape. Also called elongation factor or elongation coefficient, this parameter varies
from 0% for long, thin shapes to 100% for isotropic shapes (see Fig. 7.e and Fig. 7.f).
The compactness (CO) measures how branchy or how tortuous the shape is. For a given 2D
shape, let A be the enclosed area and P the perimeter; the compactness is defined by:

$$CO = 100 \, \frac{4 \pi A}{P^2} \qquad (6)$$
The compactness varies from 0% for very branchy or very tortuous shapes to 100% for
compact shapes like a circle (see Fig. 7.a and Fig. 7.c).
The mass deficit coefficient (MD) measures the area variation between the shape and the
minimum enclosing circle centred on the centre of gravity of the shape. For a shape with
area A, let SC be the area of the circumscribed circle, then the mass deficit area is defined as
follows:
$$MD = 100 \, \frac{S_C - A}{S_C} \qquad (7)$$
The mass excess coefficient (ME) measures the area variation between the shape and the
maximum enclosed circle centred on the centre of gravity of the shape. For a shape with
area A, let SI be the area of the inscribed circle; then the mass excess area is defined as
follows:

$$ME = 100 \, \frac{A - S_I}{A} \qquad (8)$$
The two previous parameters give another estimation of the compactness: they vary from
0% for compact shapes (e.g. a circle) to 100% for spread-out, tortuous patterns (see Fig. 7.a
and Fig. 7.d).
The isotropic factor (IF) tells how isotropic the pattern is: it indicates how regular the
shape is around its centre of gravity. For a given 2D shape, let Rm be its minimal radius and
RM its maximal radius; then the IF parameter is defined by:

$$IF = 100 \, \frac{R_m}{R_M} \qquad (9)$$
The isotropic factor varies from 0% for anisotropic shapes to 100% for isotropic shapes like a
circle (see Fig. 7.a and Fig. 7.d).
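The sketch below (ours) gives rough pixel-level estimates of these shape parameters from a binary mask; the perimeter and radii estimates are crude approximations, so the values are indicative only:

```python
import numpy as np

def shape_parameters(binary: np.ndarray) -> dict:
    """Compactness (6), mass deficit (7), mass excess (8) and isotropic
    factor (9) estimated from a binary shape mask."""
    mask = np.pad(binary.astype(bool), 1)
    inner = mask[1:-1, 1:-1]
    A = float(inner.sum())                        # enclosed area (pixel count)
    # boundary pixels: shape pixels with at least one background 4-neighbour
    full = (np.roll(mask, 1, 0) & np.roll(mask, -1, 0) &
            np.roll(mask, 1, 1) & np.roll(mask, -1, 1))[1:-1, 1:-1]
    boundary = inner & ~full
    P = float(boundary.sum())                     # crude perimeter estimate
    ys, xs = np.nonzero(inner)
    xc, yc = xs.mean(), ys.mean()                 # centre of gravity
    by, bx = np.nonzero(boundary)
    r = np.hypot(bx - xc, by - yc)                # boundary radii
    Rm, RM = r.min(), r.max()
    SC, SI = np.pi * RM ** 2, np.pi * Rm ** 2     # circumscribed/inscribed circles
    return {"CO": 100 * 4 * np.pi * A / P ** 2,
            "MD": 100 * (SC - A) / SC,
            "ME": 100 * (A - SI) / A,
            "IF": 100 * Rm / RM}
```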
Fig. 7. Examples of shapes and their corresponding parameter values:

Shape | EL | CO | MD | ME | IF
a) | 100.0% | 100.0% | 0.0% | 0.0% | 100.0%
b) | 100.0% | 59.6% | 11.6% | 4.3% | 92.0%
c) | 100.0% | 64.5% | 3.8% | 10.9% | 92.6%
d) | 100.0% | 9.8% | 50.2% | 77.3% | 33.7%
e) | 44.4% | 75.4% | 41.2% | 47.6% | 55.5%
f) | 100.0% | 78.5% | 36.3% | 21.4% | 70.7%
Fig. 8. Classification of shape description techniques:
- Region-based features:
  - Global: area, compactness, eccentricity, Euler number, geometric moments, Legendre moments, Zernike moments, shape matrix
  - Structural: convex hull, media axis, core
- Contour-based features:
  - Global: compactness, eccentricity, circularity, elongation, elastic matching, Fourier descriptors, scale space descriptors, wavelet descriptors
  - Structural: chain code, B-spline, polygons, invariants
Contour-based approach
For contour-based shape description, the MPEG-7 working group has selected the curvature
scale space (CSS) descriptor (Mokhtarian & Bober, 2003), which is computed by iteratively smoothing the
contour data with Gaussian filters of increasing width. This representation carries a number
of important properties, such as:
- it reflects properties of the perception of the human visual system and offers good generalisation,
- it is robust to perspective transformations, which result from changes of the camera parameters and are common in images and video,
- it is compact.
Some of the above properties of this descriptor are illustrated in figure 11, each frame
containing very similar images according to CSS, based on the actual retrieval results from
the MPEG-7 shape database. In figure 9, we represent two shapes and their corresponding
CSS images. On the CSS images (bottom row), we have superposed the peak points that are
used to generate the features (Mokhtarian & Bober, 2003).
Fig. 9. Example of contours (top row) and the corresponding CSS images with the peak
points (bottom row)
Region-based approach
In region-based approaches, all the pixels enclosed by the shape boundary are taken into
account to generate the shape descriptor. As in the case of contour-based approaches, we
encounter two different ways of region-based shape description: global and
structural. In the structural approach, the shape is decomposed into sub-regions to
generate a tree representing the shape. In the global way, one computes some characteristic
features to generate a vector to represent the shape. Common global features derived from a
region-based approach are: geometrical moment invariants, shape matrix, area,
compactness, eccentricity, Euler number, geometric moments, Legendre moments, Zernike
moments... For region-based shape description, the MPEG-7 working group (Bober, 2001 ;
Martinez, 2004) has selected the angular radial transform (ART). It is a moment-based
approach for 2D region-based shape description. In (Ricard et al., 2005), the authors
proposed a generalization of the ART approach to describe 2D and 3D shapes for content-based image retrieval purposes.
Contour-based approaches are often more appealing than region-based approaches because
they involve less computational complexity while offering enough discriminating
efficiency. It has also been demonstrated that the characteristic information of a
shape lies essentially in its contour features. The main drawback of contour-based
descriptors is that they are more subject to noise and variations than region-based ones.
Figure 10 shows examples of shapes and illustrates situations for which the contour-based
or the region-based descriptors are most suitable.
A shape may consist of just one single region (see Fig.10.a-c) or of several regions, as
well as regions with some holes inside them, as illustrated in figures 10.d-f. Since region-based
descriptors make use of all the pixels constituting the shape, they can describe any kind
of shape. They are more suitable than contour-based descriptors for handling complex
shapes consisting of holes or of several disjoint regions (see Fig.10.d-f) in a single
descriptor. Indeed, for contour-based descriptors, these shapes consist not of a single
contour but of multiple contours, leading, thus, to multiple descriptors.
Fig. 10. Examples of shapes: a)-c) single-region shapes - d)-f) shapes made of several regions or containing holes - g)-i) shapes illustrating the behaviour of the two families of descriptors
For the shapes of figures 10.g, 10.h and 10.i:
- the region-based shape descriptor will consider 10.g and 10.h similar but different from 10.i,
- the contour-based shape descriptor will consider 10.h and 10.i similar but different from 10.g.
As illustrated by MPEG-7 (Martinez, 2004), a challenge for a pattern descriptor is to enable
the recognition of a pattern even if it has undergone various deformations, namely partial
occlusion (Fig.11.a) and non-rigid deformation (Fig. 11.b).
Figure 11.a, according to (Martinez, 2004), illustrates the robustness to partial occlusion:
indeed, in this figure, one can note that the tails or the legs of the horses are sometimes
occluded but they are recognised as being from the same class. As presented in (Mokhtarian,
1997 ; Petrakis, 2002), this is possible because of the ability of the descriptor to handle local
properties. Figure 11.c represents various shapes that are classified in the same class
based on visual perceptual similarity.
Fig. 11. a) robustness to partial occlusion - b) robustness to non-rigid deformation - c) shapes classified in the same class based on visual perceptual similarity
"what am I looking at? Tell me about objects visible in this image by transferring
annotations from similar images"
To do this, they used the SIFT (Scale Invariant Feature Transform) keypoint detector, which
was shown to be transformation invariant (Lowe, 2004).
Among the various forthcoming systems, we can mention MPEG-7. Formally named
"Multimedia Content Description Interface", MPEG-7 aims at managing data in such a way
that content information can be retrieved easily. It is under development by the Moving
Picture Experts Group (MPEG).
MPEG-2 and MPEG-4 also make content available, but MPEG-7 enables users to find the desired
content. MPEG-7 visual description tools consist of basic structures and descriptors that
cover basic visual features: colour, texture, shape, motion, localization. Each category
consists of elementary and sophisticated descriptors (Sikora, 2001; Bober, 2001). One must
note that MPEG-7 addresses many different applications in various environments; thus, it
needs to provide a standard, flexible and extensible framework for describing audio-visual
data.
4. Application example based on the MSGPR method
In (Kpalma & Ronsin, 2006) we have presented an original pattern description approach
based on the multi-scale analysis of the contour of planar objects. This proposed approach
summarises the different presented considerations in this chapter. It is well known that
some objects, especially natural ones, exist over a more or less large range of scales, and that
the aspect of an object can change from one scale to another. Without a priori information
about the distance of observation inside a given scene, an interesting challenge is to
find an object without any precision about its scale of observation. Faced with this situation,
it is very difficult to significantly describe a pattern using only one meaningful scale. To
overcome this problem, increasingly more pattern description techniques are based on
multi-scale or multiresolution representation methods (Lindeberg, 1998). Within this
context, methods based on the pattern itself (Torres-Méndez et al., 2000 ; Kadyrov & Petrou,
2001 ; Belongie et al., 2002 ; Grigorescu & Petkov, 2003) exist as well as methods based on
pattern contour behaviour (Matusiak & Daoudi, 1998 ; Roh & Kweon, 1998 ; Wang et al.,
1999 ; Latecki et al., 2000).
This study deals exclusively with methods based on the pattern contour. Called MSGPR (A
Multi-Scale curve smoothing for Generalised Pattern Recognition), this scale-space
(Mokhtarian et al., 1996 ; Matusiak & Daoudi, 1998 ; Wang et al., 1999 ; Mokhtarian & Bober,
2003) method is based on multi-scale smoothing of a planar pattern contour. This method is
totally insensitive to translation and rotation and, as shown in the initial studies, it is also
robust against scale change over a large range of scaling and resistant to additive noise.
4.1 Description of the MSGPR method
The framework of the MSGPR can be broken down into four main stages as follows (see
Fig.12):
1. the input contour is separated into two parameterised functions,
2. both functions are low-pass filtered (smoothed),
3. scale adjustment is then applied to both filtered functions so that the corresponding
smoothed contour has the same scale as the input one,
4. finally, the intersection points map (IPM) is generated by detecting the intersection
points of the input contour and the smoothed scale-adjusted one.
Fig. 12. Block diagram of the MSGPR method: the input contour is separated into x(u) and y(u), filtered by g(σ,u) to give X(σ,u) and Y(σ,u), scale-adjusted to give XGC(σ,u) and YGC(σ,u), and used to generate the intersection points map function
The Gaussian filter is given by:

$$g(\sigma, u) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{u^2}{2\sigma^2}} \qquad (10)$$

The filtered functions are then given by $X(\sigma, u) = g(\sigma, u) * x(u)$ and
$Y(\sigma, u) = g(\sigma, u) * y(u)$, so that each $(x(u), y(u))$ point on the input contour leads to the
$(X(\sigma, u), Y(\sigma, u))$ point on the output smoothed contour.
Since the bandwidth is inversely proportional to σ, it is clear that the bandwidth decreases
as σ increases. Thus the filter cuts increasingly lower frequencies, so that the output
functions move towards their mean values when σ tends towards infinity.
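A minimal sketch of this filtering step (ours, not the authors' code): the parameterised functions of a closed contour are smoothed by circular convolution with the kernel of equation (10). The scale adjustment step that follows in MSGPR is not shown here.

```python
import numpy as np

def smooth_contour(x: np.ndarray, y: np.ndarray, sigma: float):
    """Gaussian low-pass filtering of the contour functions x(u), y(u);
    circular convolution is used because the contour is closed (periodic)."""
    n = len(x)
    u = np.arange(n) - n // 2
    g = np.exp(-u ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    g /= g.sum()                       # normalise: the mean position is kept
    G = np.fft.fft(np.fft.ifftshift(g))
    X = np.real(np.fft.ifft(np.fft.fft(x) * G))
    Y = np.real(np.fft.ifft(np.fft.fft(y) * G))
    return X, Y
```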
Fig. 13. Example of a contour C0 and two smoothed scale-adjusted contours CGC(σ=30) and CGC(σ=180)
4.1.4 Definition of the IPM function
By increasing σ, the output contour moves towards a convex curve that has some intersection
points with the input contour. By marking these intersection points for each σ, we obtain the
intersection points map (IPM) function, defined below, which characterises the pattern.
After the scale adjustment system, the IPM function is generated as follows. For each σ
value, we define a function which is an image in the scale-space (u, σ) plane, so that (see
Fig.14) IPM(u, σ) = 1 if the point (XGC(σ, u), YGC(σ, u)) of the smoothed scale-adjusted
contour is an intersection point with the input contour, and IPM(u, σ) = 0 otherwise.
Figure 14 shows examples of contours (left column) and the corresponding IPM functions
(right column). In this figure, intersection points are indicated by (1) through (6) for the
contour in Fig.14.a and by (1) through (8) for that in Fig.14.c. In the right column, one can see
the marks corresponding to those intersection points in the IPM representation. As can be
seen on this figure, the IPM function is characteristic of the contour it is derived from.
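One plausible discretisation of the IPM construction (our sketch, not the authors' code) marks, for each scale, the contour samples where the smoothed scale-adjusted contour meets the input contour; the pixel-level tolerance is an illustrative choice:

```python
import numpy as np

def ipm_row(x, y, X, Y, tol=0.5):
    """One row of the IPM image at a given sigma: 1 where the smoothed
    scale-adjusted contour (X, Y) comes within `tol` of the input contour
    point (x, y) at the same parameter u, 0 elsewhere."""
    return (np.hypot(x - X, y - Y) < tol).astype(np.uint8)

# Stacking ipm_row(...) for increasing sigma values builds the IPM image
# in the (u, sigma) plane that characterises the pattern.
```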
Fig. 14. Examples of contours with their intersection points, numbered (1) to (6) in a) and (1) to (8) in c) (left column), and the corresponding IPM functions (right column)
To generate the features vector at a given scale, the following procedure is applied:
- we consider the IPM points at the scale σ0 and select the two consecutive points pa and pb
which are, circularly, the furthest apart in the IPM function, as illustrated in figure 15,
- we determine the circular distance between both points to produce the first component d1
of the V0 features vector,
- the remaining components are produced in the same way from the following consecutive
points, yielding:

$$V_0 = (d_1, d_2, \ldots, d_n) \qquad (11)$$

To benefit from the multi-scale information of the IPM function, we can define a set of M
values of σ (σ0, σ1, ..., σM-1) and determine the Vi feature vectors (i = 0, 1, 2, ..., M-1)
corresponding to the σi scales. The global features vector V is then produced by
concatenation of the individual Vi scale vectors as follows:

$$V = (V_0, V_1, \ldots, V_{M-1}) \qquad (12)$$
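A rough sketch of one possible reading of this construction (the helper names and details are ours):

```python
import numpy as np

def scale_vector(u_points: np.ndarray, n: int) -> np.ndarray:
    """V_i for one scale: circular distances between consecutive IPM
    points (positions u on a closed contour of length n), starting with
    the two consecutive points that are circularly furthest apart."""
    p = np.sort(u_points)
    d = np.diff(np.append(p, p[0] + n))   # circular gaps between points
    return np.roll(d, -int(d.argmax()))   # d1 = largest gap, then the rest

def global_vector(points_per_scale, n):
    """Concatenation V = (V_0, V_1, ..., V_M-1) of equation (12)."""
    return np.concatenate([scale_vector(p, n) for p in points_per_scale])
```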
Fig. 15. IPM points (p1 = pa, p2 = pb, p3, ...) and the circular distances d1, d2, ... between consecutive points at a given scale (σ = 30 in the example)
The similarity between two features vectors VA and VB is then measured by the following
matching function:

$$SimScore(V_A, V_B) = 100 \, \cos(\theta) \, \frac{\mathrm{Min}(\|V_A\|, \|V_B\|)}{\mathrm{Max}(\|V_A\|, \|V_B\|)} \qquad (13)$$

where θ is the angle between both vectors and where ‖.‖ indicates the module of a vector.
This function ranges from 0% for very different vectors to 100% for perfectly matching
vectors.
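A transcription of this matching function as reconstructed above (our sketch; the exact form should be checked against (Kpalma & Ronsin, 2006)):

```python
import numpy as np

def sim_score(va: np.ndarray, vb: np.ndarray) -> float:
    """SimScore of equation (13): 100 * cos(theta) * min/max of the modules."""
    na, nb = np.linalg.norm(va), np.linalg.norm(vb)
    cos_theta = float(va @ vb) / (na * nb)
    return 100.0 * cos_theta * min(na, nb) / max(na, nb)

v = np.array([3.0, 1.0, 2.0])
print(sim_score(v, v))        # 100.0 for perfectly matching vectors
print(sim_score(v, 2 * v))    # 50.0: same direction, different module
```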
Fig. 17. Character recognition pipeline: a) and b) input images - (I) edge detection - (II) contours extraction - (III) MSGPR description and similarity computation against the IPM-based features database built by off-line learning, producing e.g. '3' (79%)
The stages of this process are the following:
(I) edge (or contour) detection, which enables us to obtain the contours delimiting each
character in the image (Fig. 18.a and Fig. 18.b). One must note that this stage is very
important in our process because the effectiveness of the character recognition will
depend on it.
(II) contour extraction: in this stage, one considers only the external (or outer)
boundaries (Fig. 18.c), because only these contours are taken into account. As for
stage (I), one must take particular care in the extraction of the characters so that their
contours are continuous and closed, without self-intersection.
(III) character recognition: in this last stage, we apply our IPM-based description
approach to extract the features and integrate them into the identification process
to measure the similarity score between each extracted character and the models of
the database. In this application, the similarity measure is based on the SimScore
function defined by equation (13).
4.4.2 Experimental results
Figures 18.a and 18.b represent the output images of the edge detection applied to the
images corresponding to figure 16. Figure 18.c presents the set of characters extracted
from figures 18.a and 18.b. Figure 18.d presents a sample set of characters from
the database: this base consists of the character set "bold.chr" from Borland.
Fig. 18. a) and b) detected edges - c) extracted contours from a) and b) - d) examples of the
content of the database.
It must be noted that in this study, the database is composed of only one font while the
query characters come from two different fonts. In order to improve the identification
results, a possible solution would be to integrate into the database all the possible fonts used
to create car plates. Figure 19 shows some results of character recognition obtained from the
input images presented in figure 16: on each panel, the contour in the upper left corner is the
query contour, and the following contours, in left-to-right and top-to-bottom scanning order,
are the eight retrieved contours with the highest similarity scores.
As can be seen in these figures, the identification of the different characters is effective:
for each query, the identified character (the most similar: the character next to the query in
figures 19.a-d) is exactly the expected character. Thus, for the query '3', we identify the
character '3' with a similarity score of 79%. Table 2 summarises the three highest similarity
scores for the contours presented in figure 19. For the contour '9' as a query, we retrieve the
digit '9' with a similarity score of 96%, followed by the digit '6' with a similarity score of
79%. One can notice that the contour '6' of the used font is none other than the contour '9'
rotated by 180°: this explains why the digit '6' occupies the second position
during the retrieval process. In the same way, the topological similarity between the digit '5'
and the letter 'S', or between the digit '8' and the letter 'B', results in the appearance of 'S' and
'B', respectively, in the second position of the retrieval ranking. In spite of this topological
similarity, the specific properties of each character lead to sufficiently large variations of the
similarity scores to avoid mistakes.
Fig. 19. a)-d) Character recognition results: in each panel, the query contour (upper left) is followed by the eight best-matching contours

Query | 1st | 2nd | 3rd
'3' | '3' (79%) | 'C' (62%) | 'E' (56%)
'5' | '5' (72%) | 'S' (58%) | '6' (55%)
'8' | '8' (91%) | 'B' (63%) | '1' (61%)
'9' | '9' (96%) | '6' (79%) | 'K' (76%)

Table 2. The three highest similarity scores for the queries of figure 19
5. Conclusion
As mentioned before, pattern recognition is not a new problem. Many studies have been
performed in this scientific field and many works are currently under development.
Pattern recognition is a wide topic in machine learning. It aims to classify a pattern into one
of a number of classes. It appears in various fields like psychology, agriculture, computer
vision, robotics, biometrics... With technological improvements and the growing performance
of computer science, its application field has no real limitation. In this context, a challenge
consists of finding some suitable description features since commonly, the pattern to be
classified must be represented by a set of features characterising it. These features must have
discriminative properties: efficient features must be insensitive to affine transformations.
They must also be robust against noise and against elastic deformations due, e.g., to
movement in pictures.
Through the application example based on our MSGPR method, we have illustrated various
aspects of a PRS. With this example, we have illustrated the description task, which enabled
us to extract multi-scale features from the generated IPM function. By using these features in
the classification task, we identified the characters on a car number plate and thus
automatically retrieved the licence number of a vehicle.
The research topic of pattern recognition is under continuous development and in perpetual
progress. With the large volumes of digital images, the challenge for pattern recognition in
computer vision is now the development of CBIR-like systems: systems that are able to
retrieve useful information by using only the content of the input image. With the huge and
growing availability of digital images, pattern recognition takes an increasing place in our
daily life, helping us to find the desired information within a reasonable time limit while
browsing large databases.
Pattern recognition is integrated into the forthcoming MPEG-7 standard via indexing
approaches. Such standardization does not restrict the domain: it creates synergy among the
best actors, mixing challenge and cooperation. Moreover, international standardization
arises as a requirement from different applications, so it meets all the conditions for wide
diffusion. Standards exploit the possibilities of the latest technological developments, drive
strong investments and focus research on the concerned domain. As has been observed, for
example, for coding when it was integrated into the different MPEG standards, the
integration of pattern recognition inside MPEG-7 will boost its latest developments.
6. References
Abdi, H. (1994). A neural network primer. Journal of Biological Systems, Vol. 2, No. 3, pp.
247-281
Belongie, S., Malik, J., and Puzicha, J. (2002). Shape matching and object recognition using
shape contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 24, No. 4, pp. 509-522
Bober, M. (2001). MPEG-7 Visual Shape Descriptors, IEEE Transactions on Circuits and
Systems for Video Technology, Vol. 11, No. 6, pp 716-718.
Bruckstein, A. M., Rivlin, E., and Weiss, I. (1996). Recognizing objects using scale space local
invariants, Proceedings of the 1996 International Conference on Pattern
Recognition (ICPR '96), August 25-29, pp. 760-764, Vienna, Austria.
Bruckstein, A., Katzir, N., Lindenbaum, M., and Porat, M. (1992). Similarity invariant
signatures for partially occluded planar shapes, IJCV, Vol. 7, No. 3, pp. 271-285.
Brunelli, R. and Poggio, T. (1997). Template Matching: Matched Spatial Filters And Beyond,
Pattern Recognition, Vol. 30, No 5, pp. 751-768
Chaumette, F. (2004), Image Moments: A General and Useful Set of Features for Visual
Servoing, IEEE Transactions on Robotics, Vol. 20, No. 4, pp. 713-723
Chaumette, F. (1994). Visual servoing using image features defined upon geometrical
primitives, International 33rd IEEE Conference on Decision and Control, Vol. 4, pp.
3782-3787, Orlando, Florida
Cole, L.; Austin, D. and Cole, L. (2004). Visual Object Recognition using Template Matching,
Australasian Conference on Robotics and Automation 2004
Coster, M. and Chermant, J.-L. (1985). Précis d'Analyse d'Images, Editions du CNRS, 15,
quai A. France, Paris, 1985
Frawley, W. J.; Piatetsky-Shapiro, G. & Matheus, C. J. (1992). Knowledge Discovery in
Databases: An Overview, AI Magazine 13(3), pp. 57-70
Grigorescu, C., and Petkov, N. (2003). Distance Sets for Shape Filters and Shape Recognition.
IEEE Trans. Image Processing 12(9).
Haralick, R.M. (1979), Statistical and structural approaches to texture, Proceedings of the
IEEE, No. 5, Vol. 67, pp. 786-804
Haralick, R.M., Shanmugam, K. and Dinstein, I. H. (1973). Textural features for image
classification, IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-3,
No. 6, pp. 610-621
Iqbal, Q. and Aggarwal, J. K. (2002). CIRES: A System for Content-based Retrieval in Digital
Image Libraries, Seventh International Conference on Control, Automation,
Robotics and Vision (ICARCV), Singapore, pp. 205-210, December 2-5, 2002
Jain, A. K. and Pankanti, S. (2006). A Touch of Money, IEEE Spectrum, Vol. 43, No. 7, pp. 22-27, July 2006.
Jain, A. K.; Ross, A. and Prabhakar, S. (2004a). An Introduction to Biometric Recognition,
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 14, No. 1,
January 2004
Jain, A. K., Pankanti, S., Prabhakar, S., Hong, L., Ross, A., and Wayman, J. L. (2004b).
Biometrics: A Grand Challenge, Proceedings of the 17th International Conference
on Pattern Recognition, Vol. 11, August 2004, pp. 935-942.
Jain, A. K.; Duin R. P.W. and Mao, J. (2000). Statistical Pattern Recognition: A Review, IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, pp. 4-37
Kpalma, K., and Ronsin, J. (2006). Multiscale contour description for pattern recognition,
Elsevier Science Inc, Pattern Recognition Letters, Vol.27, No.13, pp 1545-1559, 1
October 2006
Kpalma, K., and Ronsin, J. (2003). A Multi-Scale curve smoothing for Generalised Pattern
Recognition (MSGPR), Seventh International Symposium on Signal Processing and
its Applications (ISSPA), pp 427-430, Paris, France.
Kpalma, K. (1994). Caractérisation de textures par l'anisotropie de la dimension fractale,
Proceedings of the 2nd African Conference on Research in Computer Science
(CARI), October 1994, Ouagadougou, Burkina Faso.
Kadyrov A., Petrou, M. (2001). Object descriptors invariant to affine distortions. Proceedings
of the British Machine Vision Conference, BMVC'2001, Manchester, UK.
Kuncheva, L. I. (2004). Classifier Ensembles for Changing Environments, Proc. 5th
International Workshop on Multiple Classifier Systems, Cagliari, Italy, Springer-Verlag, LNCS, Vol. 3077, pp. 1-15
Latecki, L. J., Lakamper, R., and Eckhardt, U. (2000). Shape Descriptors for Non-rigid Shapes
with a Single Closed Contour, IEEE Conf. On Computer Vision and Pattern
Recognition (CVPR), pp. 424-429, 2000
Lindeberg, T. (1998). Principles for Automatic Scale Selection, Technical report ISRN KTH
NA/P--98/14--SE. Department of Numerical Analysis and Computing Science,
KTH (Royal Institute of Technology), S-100 44 Stockholm, Sweden.
Lindeberg, T. (1994). Scale-Space Theory in Computer Vision, Kluwer Academic Publishers,
Dordrecht, Netherlands.
Liu, J., Sun, J. and Wang, S. (2006). Pattern Recognition: An overview, International Journal
of Computer Science and Network Security (IJCSNS), Vol. 6, No.6, June 2006
Lowe, D. G. (2004). Distinctive image features from scale invariant keypoints, IJCV, Vol. 60,
No. 2, pp. 91-110, 2004.
Martinez, J. M., (editor), (2004), MPEG-7 Overview (version 10), ISO/IEC JTC1/SC29/WG11
N6828, Palma de Mallorca, October 2004
Martinez, J.M. (2002). Standards - MPEG-7 overview of MPEG-7 description tools, part 2,
IEEE Multimedia, Vol. 9, No. 3, July-Sept. 2002, pp. 83-93
Matusiak S., Daoudi M. (1998). Planar Closed Contour Representation by Invariant Under a
General Affine Transformation, IEEE International Conference on System, Man and
Cybernetics (IEEE-SMC'98), pp. 3251-3256, October 11-14, Hyatt Regency La Jolla,
San Diego, California, USA.
Mittal A. (2006). An Overview of Multimedia Content-Based Retrieval Strategies,
Informatica, International Journal of Computing and Informatics, Vol. 30, No. 3, pp.
347-356
Mokhtarian, F., and Bober, M. (2003). Curvature Scale Space Representation: Theory,
Applications and MPEG-7 Standardization. Kluwer Academic.
Mokhtarian, F. (1997). Silhouette-Based Occluded Object Recognition through Curvature
Scale Space, Machine Vision and Applications, Vol. 10, No. 3, pp. 87-97.
Mokhtarian, F., Abasi, S., and Kittler, J. (1996). Efficient and Robust Retrieval by Shape
Content through Curvature Scale Space, in Proceedings International Workshop on
Image Databases and MultiMedia Search, pp 35-42, Amsterdam, The Netherlands.
Mokhtarian, F., and Mackworth, A. K. (1992). A Theory of Multiscale, Curvature-Based
Shape Representation for Planar Curves, in IEEE Transactions on Pattern Analysis
and Machine Intelligence, Vol. PAMI-14, No. 8.
Munich, M. E.; Pirjanian, P.; Di Bernardo, E.; Goncalves, L.; Karlsson, N. and Lowe, D.
(2006). Application of Visual Pattern Recognition to Robotics and Automation,
IEEE Robotics & Automation Magazine, pp.72-77, September 2006
Pal, S.K. & Pal, A., (Editors). (2002). Pattern recognition: from classical to modern approaches,
World Scientific, ISBN No. 981-02-4684-6, Singapore
Petrakis, E. G.M.; Diplaros, A. and Milios, A. (2002). Matching and Retrieval of Distorted
and Occluded Shapes Using Dynamic Programming, IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 24, No. 11, pp. 1501-1516
Ricard, J., Coeurjolly, D. and Baskurt A. (2005). Generalizations of angular radial transform
for 2D and 3D shape retrieval, Elsevier Science Inc, Pattern Recognition Letters
Volume 26, Issue 14 , 15 October 2005, Pages 2174-2186
Roberts, S. and Everson, R. (2001). Independent Component Analysis- principles and practice,
Cambridge University Press, ISBN 0521792983
Roh, K.-S., Kweon, I.-S. (1998). 2-D object recognition using invariant contour descriptor and
projective refinement, Pattern Recognition, Vol. 31, No. 4, pp. 441-455.
Smith, J. and Chang, S. F. (1996). Tools and Techniques for Color Image Retrieval, in
IS&T/SPIE proceedings of Electronic Imaging: Science and Technology - Storage &
Retrieval for Image and Video Databases IV vol. 2670, pp. 1630-1639, San Jose, CA,
February 1996.
Snavely, N., Seitz, S. M. and Szeliski, R. (2006). Photo tourism: Exploring photo collections in
3D, ACM Transactions on Graphics (SIGGRAPH Proceedings), 25 (3), pp. 835-846.
Sonka, M.; Hlavac, V. and Boyle, R. (1993). Image Processing, Analysis and Machine Vision,
Chapman & Hall, London, UK, 1993, pp. 193242
Sossa, H. (2000). Object Recognition, Summer School on Image and Robotics, INRIA Rhône-Alpes, France.
Sun, K. B. and Super, B. J. (2005). Classification of Contour Shapes Using Class Segment Sets,
Proceedings of the 2005 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR'05), Vol. 2