
IJDAR (2003) 6: 55–74
Digital Object Identifier (DOI) 10.1007/s10032-003-0101-4

Handwriting style classification


Mandana Ebadian Dehkordi, Nasser Sherkat, Tony Allen
IRIS, School of Computing and Mathematics, Nottingham Trent University, Burton Street, Nottingham NG1 4BU, UK
Received: 6 September 2002 / Accepted: 19 September 2002 / Published online: 6 June 2003
© Springer-Verlag 2003

Abstract. This paper describes an independent handwriting style classifier that has been designed to select the best recognizer for a given style of writing. For this purpose a definition of handwriting legibility has been introduced and a method implemented that can predict this legibility. The technique consists of two phases. In the feature-extraction phase, a set of 36 features is extracted from the image contour. In the classification phase, two nonparametric classification techniques are applied to the extracted features in order to compare their effectiveness in classifying words into legible, illegible, and middle classes. In the first method, a multiple discriminant analysis (MDA) is used to transform the space of extracted features (36 dimensions) into an optimal discriminant space for a nearest-mean-based classifier. In the second method, a probabilistic neural network (PNN) based on the Bayes strategy and a nonparametric estimation of the probability density function is used. The experimental results show that the PNN method gives superior classification results when compared with the MDA method. For two-class classification the method provides 86.5% (legible/illegible), 65.5% (legible/middle), and 90.5% (middle/illegible) correct classification. For the three-class legibility classification the rate of correct classification is 67.33% using a PNN classifier.

Keywords: Style definition – Style classification – Handwriting legibility – Discriminant analysis – Probabilistic neural network

1 Introduction

Nowadays optical character recognition (OCR) provides good recognition results [20,26,37]. However, as the variation of document style and font increases, recognition accuracy drops [1]. One way of tackling this problem has been
Correspondence to: N. Sherkat (e-mail: [email protected])

to determine the font of the printed text before attempting to recognize it. In this way, it is hoped that the characters will be sent to the appropriate recognizer rather than normalizing all the fonts into a generic superset and hence risking the loss of some vital information during the normalization process. The difficulties of style characterization are even more challenging when handwriting is to be dealt with [18,30,33]. This is mostly due to the vast variability in human handwriting, both between different writers (interwriter) and within the same writer (intrawriter) [4,22,25]. Previous research has shown that writing style can vary significantly with geographical location, cultural background, age, sex, etc. [5,29]. Indeed, people often completely redefine their style of writing as they age. Cursive handwriting variability is due not only to a writer's style but also to geometric factors determined by the writing conditions. In the case of offline handwriting, there is little or no information on the type of instrument used. The artifacts of the complex interactions between instruments and subsequent operations such as scanning and binarization present additional challenges to algorithms for offline handwriting recognition. Following the OCR tradition, a preprocessing stage is normally used in current handwriting recognition systems. The aim is to reduce unwanted variation and present to the recognizer characters that are as close as possible to the generic templates. The main functions of such preprocessing steps are usually the correction of slant [7], the deskewing of handwritten words [3], normalization [27], etc. The use of these preprocessing steps has been shown to improve image quality and character string recognition. However, as part of this process some of the original information may be lost.
More importantly, the variation in handwriting is such that far more sophisticated operations would be required to more closely approximate the generic template [15,17,38]. Unfortunately, although these approaches yield limited success in improving recognition performance, they eventually fail when the handwriting becomes highly illegible as far as the recognizer is concerned.

M.E. Dehkordi et al.: Handwriting style classification

A more sophisticated approach to dealing with variability has been to devise multiple experts and use them independently to recognize the handwriting [40–43], the hypothesis generally being that, since the different recognizers would have specific expertise that complements that of the others, this would improve the chances of dealing effectively with variability. The results generated by each expert would normally be used in a voting process to determine the final outcome. Although with the availability of relatively low-cost computing it is possible to economically construct a system comprising many expert nodes, the problem of distilling the final outcome using a voting system remains a complicated one.

The approach presented in this paper is to glean some information about the style of the target handwriting before it is sent to the recognizer. In this way, the data can either be sent to the appropriate specific expert rather than to all experts or provide some weighting information that can be used at the voting stage, hence reducing the complexities associated with result combination or voting. In this work, the concept of style classification is introduced and the various aspects of its definition in quantitative terms are discussed. The idea of style classification is somewhat new, and to provide a starting point, style has been defined in terms of recognizer-specific legibility. In this way, the best recognizer can be selected for a given style of writing using a prediction of legibility based on a given recognizer's previous performance. This research therefore focuses on the problem of classifying word images as legible, illegible, or middle prior to the recognition stage. An independent handwriting style classifier has been designed that, in principle, can be used to select the best recognizer for a given style of writing. For this purpose a definition of recognizer-specific handwriting legibility has been introduced and a method implemented that can predict this legibility.
Multiple discriminant analysis (MDA) and probabilistic neural network (PNN) techniques, based on the Bayes strategy and nonparametric estimation of the probability density function, are proposed. Both methods are applied to the task of classifying handwritten words as legible, illegible, or middle prior to the recognition stage. A comparison between the two classification techniques is then given.

2 Definition of legibility

Up till now handwriting legibility has been defined purely in human terms. However, since the ability of a machine-based recognizer differs significantly from that of a human being [1], any definition of legibility should be based on the recognition system. Of course, like the definition of a human being, the definition of legibility is a debatable issue. However, at the time of writing no reference to a machine-based definition of legibility has been found in the literature, which is probably not surprising considering the novelty of this concept. Our definition of handwritten legibility has therefore been based on our existing recognizer's performance [34].

Fig. 1. All correct words regardless of rank using the HVBC recognizer (histogram of "Number of words" against "Rank of correct words")

This recognizer is a holistic word-level recognizer (HVBC) that uses three features, namely, holes, vertical bars, and cups. This definition of legibility can be extended to any available recognizer. Figure 1 shows that almost all correct words are located within the top ten positions. Thus legible words could be further defined as those that are likely to be placed in the top ten of the correct word list with a score of 75 or greater. Illegible words could be those that would produce a list containing the correct word anywhere in the word list with a score of less than 45. Middle words (those between legible and illegible) are then defined as those that would produce a list containing the correct word with a score of 45 to 75. These thresholds have been arrived at experimentally and merely provide a starting point. They can be changed depending on the context in which they are to be used. The following experiments serve to assess the validity of this approach by conducting a binary followed by a triple style classification.

3 Feature extraction

During the design process of this classification system, 36 features were extracted from the contours of a reasonably large number of handwritten word images. The data set is provided by 18 different writers (150 words each) [34]. The reasoning behind the choice of data sets and features is provided below and in further detail in Chien and Aggarwal [6], Dehkordi et al. [10], Jedrzejewski [19], and Loncaric [24]. Some sample images are available from http://www.doc.ntu.ac.uk/ns/c sample.html.

3.1 Contour-based features

As a starting point, based on human perception of style, it was assumed that the word contour, as defined by tracing around the outside of the whole word, could contain information about the underlying characters used in constructing the word [6].
We extend this to the hypothesis that the synergy within the word, resulting from the way in which neighboring characters follow/influence each other, is encapsulated in the word shape. A number of features were therefore introduced that are based on the contour of the handwritten word images.
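The rank-and-score legibility rule of Sect. 2 can be sketched as a small helper. This is a minimal sketch, assuming the recognizer reports the correct word's 1-based rank and a 0–100 confidence score; the function name and inputs are illustrative, not from the paper:

```python
# Hypothetical helper implementing the Sect. 2 thresholds: "legible" means the
# correct word appears in the top ten with a score of 75 or more, "illegible"
# means a score below 45, and everything in between is "middle".

def legibility_class(rank, score):
    """rank: 1-based position of the correct word in the recognizer's list;
    score: the recognizer's confidence for the correct word (0-100)."""
    if rank <= 10 and score >= 75:
        return "legible"
    if score < 45:
        return "illegible"
    return "middle"  # score between 45 and 75
```

The thresholds are the experimentally chosen starting points quoted above and would be retuned for another recognizer.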


Fig. 2. Eight primitive directions

Fig. 3. Angle a_l at point p_l

A handwritten word can be described as a sequence of disjointed loop contours:

WI = {C_i | C_i ∩ C_j = ∅, i ≠ j, j = 1, 2, . . . , N}   (1)

Each loop contour C_i is a sequence of consecutive points on the x-y plane:

C_i = {p_j | j = 1, 2, . . . , M_i, p_1 = p_{M_i}}   (2)

where p_1 and p_{M_i} are the end points of the ith loop contour. The contour-based features used in our system are mainly based on:

(a) The chain coding from the eight primitive directions given by Freeman encoding [12]. Figure 2 shows the eight primitive directions; the chain code represents the writing direction from a start point to an end point by following the upper contour of the word. Each loop contour C_i can be represented by a chain code sequence

D_i = {d_j | j = 1, 2, . . . , M_i − 1}   (3)

and

D = ∪_{i=1}^{N} D_i   (4)

(b) Consecutive exterior angles and contour angles formed by pairs of vectors along the word images. Figure 3 shows the exterior angle a_l at point p_l formed by the pair of vectors d_{l−1} and d_l; the angle is located on the left-hand side of the vectors. The value of a_l can be obtained easily from Table 1.

Table 1. a_l as a function of (d_{l−1} − d_l) mod 8

(d_{l−1} − d_l) mod 8:   0     1     2    3    5     6     7
a_l:                     180°  135°  90°  45°  315°  270°  225°

The sequence of exterior angles in a loop contour C_i is calculated as:

A_i = {a_j | j = 2, 3, . . . , M_i − 1}   (5)

(c) Dominant points. Dominant points refer to points of the following types: (1) end points of the segmented regions of each individual loop contour; (2) points corresponding to the local extremes of curvature of each individual loop contour; (3) midpoints between two consecutive points of type (1) or (2).

Using the above concepts, the following subsections define the selected features in detail.

3.2 Global features

In 2001, Madhvanath [25] showed how word shape contains sufficient information to classify words in certain lexicons. These characteristics of handwriting are different from one writer to another. A number of features based on the overall shape of a given word have been nominated, assuming N is the number of loop contours.

(1) An estimate of the number of sharp angles in the whole word: ratio of the number of original sharp angles to the total number of angles (ROSP):

ROSP = Σ_{i=1}^{N} card(A_i^{≤90}) / card(P)   (6)

where

A_i^{≤90} = {a_j ∈ A_i | a_j ≤ 90°, j = 2, . . . , M_i − 1}   (7)
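The chain coding of (a) and the exterior-angle construction of (b) can be sketched as follows. This is a minimal sketch, not the authors' code; the contour is assumed to be an 8-connected list of (x, y) points with the first point repeated at the end, and the helper names are illustrative:

```python
# Freeman directions: 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE
STEP_TO_DIR = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
               (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

# Table 1: exterior angle a_l (degrees) as a function of (d_{l-1} - d_l) mod 8
ANGLE = {0: 180, 1: 135, 2: 90, 3: 45, 5: 315, 6: 270, 7: 225}

def chain_code(points):
    """Eq. (3): Freeman chain code of one closed contour.
    points: 8-connected (x, y) contour points with points[0] == points[-1]."""
    return [STEP_TO_DIR[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def exterior_angles(code):
    """Eq. (5): exterior angle at each interior point of the contour,
    looked up from Table 1 via (d_{l-1} - d_l) mod 8."""
    return [ANGLE[(code[l - 1] - code[l]) % 8] for l in range(1, len(code))]
```

For a small counterclockwise square contour the four corners all come out as 270°, which is consistent with the convex-region condition a_l > 180° used later in Sect. 3.3.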

and

P = ∪_{i=1}^{N} C_i   (8)

card(P) = Σ_{i=1}^{N} card(C_i) = Σ_{i=1}^{N} M_i   (9)

where card stands for the number of members in a set, and sharp angles are angles less than or equal to 90°.

(2) Average component length (disjoint loop contours), or averaged component length (ACOL):

ACOL = card(P) / N   (10)

(3) Ratio of vertical directions (the two and six directions given by the Freeman code) to the total original chain code (RVO):

RVO = card(N^ver) / card(P)   (11)

where

N^ver = ∪_{i=1}^{N} N_i^ver   (12)

N_i^ver = {d_j ∈ D_i | d_j = 2 ∨ d_j = 6}   (13)

and

card(N^ver) = Σ_{i=1}^{N} card(N_i^ver)   (14)

as N_i^ver ∩ N_j^ver = ∅ for i ≠ j.

(4) Ratio of horizontal directions (the zero and four directions given by the Freeman code) to the total original chain code (RHO):

RHO = card(N^hor) / card(P)   (15)

where

N^hor = ∪_{i=1}^{N} N_i^hor   (16)

N_i^hor = {d_j ∈ D_i | d_j = 0 ∨ d_j = 4}   (17)

and

card(N^hor) = Σ_{i=1}^{N} card(N_i^hor)   (18)

as N_i^hor ∩ N_j^hor = ∅ for i ≠ j.

(5) Ratio of diagonal directions (the one, three, five, and seven directions given by the Freeman code) to the total original chain code (RDO):

RDO = card(N^dia) / card(P)   (19)

where

N^dia = ∪_{i=1}^{N} N_i^dia   (20)

N_i^dia = {d_j ∈ D_i | d_j = 1 ∨ d_j = 3 ∨ d_j = 5 ∨ d_j = 7}   (21)

and

card(N^dia) = Σ_{i=1}^{N} card(N_i^dia)   (22)

as N_i^dia ∩ N_j^dia = ∅ for i ≠ j   (23)

3.3 Region-based features

The region-based features were proposed in order to measure the plain, concave, and convex regions; this variability of writing could be used for the style or legibility of handwriting [23]. The region-based features used are the dominant points on the contours and the direction primitives between dominant points. Prior to the process of finding dominant points, a Gaussian average filter is used to reduce the influence of digitization noise. The filtered version of A_i is denoted as

Ã_i = {ã_j | j = 2, 3, . . . , M_i − 1}

After performing Gaussian average filtering on A_i, each contour C_i can be partitioned into a sequence of convex, concave, and plain regions:

C_i = ∪_{j=1}^{T_i} R_ij^k   (24)

where T_i is the number of disjointed regions of C_i. The R_ij^k, k ∈ {1, 2, 3}, are series of consecutive points on the contour C_i such that

R_ij^1 = {p_l ∈ C_i | p_l are consecutive points, a_l = 180°}   (plain region)   (25)

R_ij^2 = {p_l ∈ C_i | p_l are consecutive points, a_l < 180°}   (concave region)   (26)

R_ij^3 = {p_l ∈ C_i | p_l are consecutive points, a_l > 180°}   (convex region)   (27)

Figures 4, 5, 6, and 7 show an example of a typical word with its concave, convex, and plain regions.
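The direction-ratio features RVO, RHO, and RDO of Sect. 3.2 can be sketched directly on the pooled chain code D of a word. A minimal sketch, with the assumption that len(chain) stands in for card(P) (for a closed contour the code length and point count differ by one per loop):

```python
# Illustrative helper (not from the paper) computing eqs. (11), (15), and (19)
# from the pooled Freeman chain code D of one word.

def direction_ratios(chain):
    """chain: pooled Freeman code D of all loop contours of one word."""
    n = len(chain)
    rvo = sum(d in (2, 6) for d in chain) / n          # vertical: codes 2 and 6
    rho = sum(d in (0, 4) for d in chain) / n          # horizontal: codes 0 and 4
    rdo = sum(d in (1, 3, 5, 7) for d in chain) / n    # diagonal: odd codes
    return rvo, rho, rdo
```

The three ratios sum to one, since the eight Freeman directions partition into the vertical, horizontal, and diagonal groups.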


Fig. 4. A typical word

Fig. 5. Concave regions

Fig. 6. Convex regions

Fig. 7. Plain regions

The contour angle v_l at p_l is defined within a support region, and its value is estimated by averaging the angles a_{l,k}, where k = 1, 2, 3, . . . , K and a_{l,k} is formed by the pair of vectors d_{l−k} and d_{l+k−1}. Denoting the sequence of contour angles in the region as V_i = v_2 v_3 . . . v_{M_i−1}, one can easily obtain the maximum within a convex region and the minimum within a concave region. All such maxima and minima constitute the local extremes of the curvature (corner points) along a word. More details of the above technique can be found in Li and Hall [23]. Figure 8 shows the corner points that are detected on words after using Gaussian average filtering with two iterations and K = 3. It should be noted that the experiments show that as the number of iterations is increased, the filtering process will remove some of the dominant points as well as the noise. On the other hand, if the number of iterations is not sufficient, the system will detect some of the noise as dominant points.

Fig. 8. The detected dominant points on words

Denoting C_i^cr = {p_j^cr ∈ C_i | j = 1, 2, . . . , S_i} as the dominant or critical points of the ith contour and D_i^cr = {d_j^cr | j = 1, 2, . . . , S_i − 1} as the direction primitives between dominant points, the region-based features are defined as follows:

(1) Average region length (AREL):

AREL = Σ_{i=1}^{N} Σ_{j=1}^{T_i} Σ_{k∈{1,2,3}} card(R_ij^k) / card(P)   (28)

(2) Average plain region length (APRL):

APRL = Σ_{i=1}^{N} Σ_{j=1}^{T_i} card(R_ij^1) / card(P)   (29)

(3) Average concave region length (ACAL):

ACAL = Σ_{i=1}^{N} Σ_{j=1}^{T_i} card(R_ij^2) / card(P)   (30)

(4) Average convex region length (ACVL):

ACVL = Σ_{i=1}^{N} Σ_{j=1}^{T_i} card(R_ij^3) / card(P)   (31)

(5) Ratio of sharp angles at critical points to the total number of critical points (RSCR):

RSCR = Σ_{i=1}^{N} card(V_i^{cr,<90}) / Σ_{i=1}^{N} card(C_i^cr)   (32)

where

V_i^{cr,<90} = {v_j ∈ V_i | v_j < 90°, p_j ∈ C_i^cr, j = 2, 3, . . . , M_i − 1}   (33)

(6) Ratio of filtered sharp angles to the total number of points (RFSP):

RFSP = Σ_{i=1}^{N} card(Ã_i^{<90}) / card(P)   (34)

where

Ã_i^{<90} = {ã_j ∈ Ã_i | ã_j < 90°, j = 2, 3, . . . , M_i − 1}   (35)

(7) Ratio of critical vertical code to the total critical chain code (RVF):

RVF = card(N^ver) / Σ_{i=1}^{N} card(C_i^cr)   (36)

where

N^ver = ∪_{i=1}^{N} N_i^ver   (37)

N_i^ver = {d_j^cr ∈ D_i^cr | d_j^cr = 2 ∨ d_j^cr = 6}   (38)

and

card(N^ver) = Σ_{i=1}^{N} card(N_i^ver)   (39)

as N_i^ver ∩ N_j^ver = ∅ for i ≠ j   (40)

(8) Ratio of critical horizontal code to the total critical chain code (RHF):

RHF = card(N^hor) / Σ_{i=1}^{N} card(C_i^cr)   (41)

where

N^hor = ∪_{i=1}^{N} N_i^hor   (42)

N_i^hor = {d_j^cr ∈ D_i^cr | d_j^cr = 0 ∨ d_j^cr = 4}   (43)

and

card(N^hor) = Σ_{i=1}^{N} card(N_i^hor)   (44)

as N_i^hor ∩ N_j^hor = ∅ for i ≠ j   (45)

(9) Ratio of critical diagonal code to the total critical chain code (RDF):

RDF = card(N^dia) / Σ_{i=1}^{N} card(C_i^cr)   (46)

where

N^dia = ∪_{i=1}^{N} N_i^dia   (47)

N_i^dia = {d_j^cr ∈ D_i^cr | d_j^cr = 1 ∨ d_j^cr = 3 ∨ d_j^cr = 5 ∨ d_j^cr = 7}   (48)

and

card(N^dia) = Σ_{i=1}^{N} card(N_i^dia)   (49)

as N_i^dia ∩ N_j^dia = ∅ for i ≠ j   (50)

3.4 Windows-based features

Fig. 9. Three regions of interest within a window for different word case samples

Fig. 10. Three regions of interest within a window for different styles of handwriting (one specific word)

Fig. 11. Representation of the four directions (slopes)

Figure 10 shows how handwriting from one person to another can differ in each window. As this figure shows, the number of pixels and the value of the slope in each window will differ. Therefore, the following features were introduced to investigate this style characteristic. Four values of slope, corresponding to the angle of a direction with the horizontal, are extracted from the eight directions given by the Freeman code. The four values correspond to angles of 0°, 45°, 90°, and 135°, respectively, to the horizontal (Fig. 11). For a given window i and a given slope k, pointszone(i | k) is computed as follows:

pointszone(i | k) = [card(i | k) / Σ_k card(i | k)] / max_{i,k} [card(i | k) / Σ_k card(i | k)]   (51)

where card(i | k) is the number of contour points with a given slope k. The total number of local features extracted for a given window position is made up of three slope features for each of the three zones. These are defined as follows:
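The normalized slope histogram of eq. (51) can be sketched as below. A minimal sketch: the dict-of-counts input is an assumed simplification of the window extraction, with windows indexed 0 (lower), 1 (middle), 2 (upper) and slopes k = 0, 1, 2, 3 for 0°, 45°, 90°, 135°:

```python
# Illustrative pointszone(i | k) of eq. (51): the per-window slope ratio,
# normalized by the largest such ratio over all windows and slopes.
# counts[i][k] = number of contour points in window i with slope k (assumed input).

def pointszone(counts, i, k):
    ratio = lambda w, s: counts[w][s] / sum(counts[w].values())
    top = max(ratio(w, s) for w in counts for s in counts[w])
    return ratio(i, k) / top
```

The normalization by the maximum keeps every pointszone value in [0, 1] regardless of how many points fall in each window.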


(1) Ratio of vertical directions in the lower window (RVLZ):

RVLZ = pointszone(0 | 2)   (52)

(2) Ratio of horizontal directions in the lower window (RHLZ):

RHLZ = pointszone(0 | 0)   (53)

(3) Ratio of diagonal directions in the lower window (RDLZ):

RDLZ = pointszone(0 | 1) + pointszone(0 | 3)   (54)

(4) Ratio of vertical directions in the middle window (RVZM):

RVZM = pointszone(1 | 2)   (55)

(5) Ratio of horizontal directions in the middle window (RHZM):

RHZM = pointszone(1 | 0)   (56)

(6) Ratio of diagonal directions in the middle window (RDZM):

RDZM = pointszone(1 | 1) + pointszone(1 | 3)   (57)

(7) Ratio of vertical directions in the upper window (RVZU):

RVZU = pointszone(2 | 2)   (58)

(8) Ratio of horizontal directions in the upper window (RHZU):

RHZU = pointszone(2 | 0)   (59)

(9) Ratio of diagonal directions in the upper window (RDZU):

RDZU = pointszone(2 | 1) + pointszone(2 | 3)   (60)

In addition to the above features, the following feature is also defined:

(10) Ratio of the number of points in the middle area to the total number of points (RPCE):

RPCE = cardMid(P) / card(P)   (61)

where cardMid(P) is the number of points in the middle zone.

3.5 Feature-based moments

In addition to the slope features described above, an additional feature, NOM1, based on the first moment, is also extracted. The moment features capture global information about the word images, which could help the legibility classification of handwriting [24]:

M1 = (μ_20 − μ_02)² + 4μ_11²   (62)

where the coordinates of a contour pixel are given by the 2D binary image of the cursive word and the central moment is given by

μ_pq = Σ_{i=1}^{N} (x_i − x̄)^p (y_i − ȳ)^q   (63)

where p_i = (x_i, y_i) ∈ P and

x̄ = (1/N) Σ_{i=1}^{N} x_i ;  ȳ = (1/N) Σ_{i=1}^{N} y_i   (64)

and N is the total number of points in the contour word image.

3.6 Zero-crossing feature

Fig. 12. Horizontal lines are drawn from the center of each word

As shown in Fig. 12, the number of intersections of a horizontal line passing through the midline of a word differs from word to word. The following feature was therefore introduced to make use of this characteristic. A horizontal line is drawn through the center of the word:

Center of the word = (1/S) (Σ_{i=1}^{S} x_i, Σ_{i=1}^{S} y_i)   (65)

where S is the total number of points in the contour word image. The number of intersections of this line with the contoured word gives the number of zero crossings (NCRS) (Fig. 12).

In addition to the above features, group-based features and horizontal-based histogram features are used in this research. (For more details of these features refer to References 8, 9, and 10.) The features used in this research are listed in Table 2 for subsequent reference.

4 Classification techniques
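Returning briefly to the moment feature of Sect. 3.5, eqs. (62)–(64) can be sketched directly on the contour point set. A minimal sketch with illustrative helper names:

```python
# Illustrative computation of the central moments (eq. 63) and the first moment
# feature M1 (eq. 62) from a list of contour points.

def central_moment(points, p, q):
    n = len(points)
    xbar = sum(x for x, _ in points) / n   # eq. (64)
    ybar = sum(y for _, y in points) / n
    return sum((x - xbar) ** p * (y - ybar) ** q for x, y in points)

def m1(points):
    mu20 = central_moment(points, 2, 0)
    mu02 = central_moment(points, 0, 2)
    mu11 = central_moment(points, 1, 1)
    return (mu20 - mu02) ** 2 + 4 * mu11 ** 2   # eq. (62)
```

Because the moments are central (taken about the contour centroid), M1 is invariant to translation of the word image.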

Table 2. The 36 extracted features

1. Average region length
2. Average plain region length
3. Average concave region length
4. Average convex region length
5. Average number of sharp angles in each region
6. Average number of filtered sharp angles in the whole word
7. Ratio of critical vertical code to the total critical chain code
8. Ratio of critical horizontal code to the total critical chain code
9. Ratio of critical diagonal code to the total critical chain code
10. An estimate of the number of sharp angles in the whole word
11. An estimate of the component length (disjoint contours), or averaged component (C_i) length
12. Ratio of vertical directions to the total original chain code
13. Ratio of horizontal directions to the total original chain code
14. Ratio of diagonal directions to the total original chain code
15. Ratio of vertical directions in the lower window
16. Ratio of horizontal directions in the lower window
17. Ratio of diagonal directions in the lower window
18. Ratio of vertical directions in the middle window
19. Ratio of horizontal directions in the middle window
20. Ratio of diagonal directions in the middle window
21. Ratio of vertical directions in the upper window
22. Ratio of horizontal directions in the upper window
23. Ratio of diagonal directions in the upper window
24. Ratio of the number of points in the middle area to the total number of points
25. Zero crossing
26. First moment feature
27. Ratio of the number of points in the middle area to the total number of points
28. Ratio of the number of black pixels in the upper zone to the number of black pixels in all three zones of a word
29. Spread or first moment of the histograms
30. Average number of groups in each word
31. Ratio of the distance between the upper bounding box and the upper zone to the distance between the lower and upper zones for the first three groups of the word
32. Ratio of the distance between the upper bounding box and the upper zone to the distance between the lower and upper zones for the second three groups of the word
33. Ratio of the distance between the upper bounding box and the upper zone to the distance between the lower and upper zones for the third three groups of the word
34. Ratio of the distance between the lower bounding box and the lower zone to the distance between the lower and upper zones for the first groups of the word
35. Ratio of the distance between the lower bounding box and the lower zone to the distance between the lower and upper zones for the second groups of the word
36. Ratio of the distance between the lower bounding box and the lower zone to the distance between the lower and upper zones for the third groups of the word

4.1 Linear discriminant transformation (MDA)

A multiple discriminant analysis (MDA) is used to transform the feature space of 36 dimensions into an optimal discriminant space for a nearest mean classifier. A brief summary of the technique is given here for clarity; for more detail see Reference 39. The aim of MDA is to maximize the ratio of interclass variance to intraclass variance:

f(Φ) = |Φ^t W_b Φ| / |Φ^t W_w Φ|   (66)

In this equation, W_b is the interclass scatter matrix, W_w the intraclass scatter matrix, and Φ the transformation we are searching for to form the optimal discriminant space. We can define the following, with f^{i,j} = (f_1^{i,j}, . . . , f_p^{i,j}) being the p extracted features of word image i in the jth class and n_j being the number of word images in the jth class:

f̄^j = (1/n_j) Σ_{m=1}^{n_j} f^{m,j}   (mean of features in the jth class)   (67)

f̄ = (1/n) Σ_j n_j f̄^j   (mean of features over all classes)   (68)


where the sum runs over the classes j = 1, 2, . . . , C and n = Σ_j n_j is the total number of word images.


W^j = Σ_{i=1}^{n_j} (f^{i,j} − f̄^j)(f^{i,j} − f̄^j)^t   (covariance in the jth class)   (69)

W_w = Σ_j W^j   (intraclass covariance)   (70)

W_b = Σ_j n_j (f̄^j − f̄)(f̄^j − f̄)^t   (interclass covariance)   (71)

Both the intraclass scatter W_w and the interclass scatter W_b are analogous to their respective covariance matrices. In looking for Φ we can define

y = Φ^t F   (transform F by Φ^t)   (72)

y^j = {y | f ∈ jth class, y = Φ^t f},  ȳ^j = (1/n_j) Σ_{y∈y^j} y   (mean of transformed features in the jth class)   (73)

ȳ = (1/n) Σ_j n_j ȳ^j   (mean of transformed features over all classes)   (74)

W̃_w = Σ_j Σ_{y∈y^j} (y − ȳ^j)(y − ȳ^j)^t   (intraclass covariance of transformed features)   (75)

W̃_b = Σ_j n_j (ȳ^j − ȳ)(ȳ^j − ȳ)^t   (interclass covariance of transformed features)   (76)

From these it follows that

W̃_w = Φ^t W_w Φ   (77)

W̃_b = Φ^t W_b Φ   (78)

Taking the determinant of a scatter matrix is equivalent to finding the product of its eigenvalues, which, in turn, corresponds to the product of the variances. As may be seen with reference to Eq. 66, by maximizing this ratio we are looking for a transform that maximizes the interclass variance with respect to the intraclass variance. The solution of Eq. 66 can be shown to correspond to the generalized eigenvectors of the following equation [31,39]:

W_b φ_j = λ_j W_w φ_j   (79)

where the vectors φ_j then form the columns of the matrix Φ. In addition, the individual dimensions of the discriminant space created by each eigenvector φ_j are now ordered: the interclass variance in dimension j is proportional to the eigenvalue λ_j. Assuming a constant intraclass variance, the higher a dimension's interclass variance, the better the discriminant capacity of that dimension. One additional step can be taken to scale all of the intraclass variances to uniform size in the discriminant space. The variance in dimension j can be computed as φ_j^t W_w φ_j, and each dimension can be scaled by replacing φ_j with

φ̃_j = φ_j / sqrt(φ_j^t W_w φ_j)   (80)

giving each new dimension uniform variance. The decision as to whether a particular word image is allocated to one class or another is then based on measuring the Euclidean distance between its transform scores (created by the MDA) and the centroids of all the classes in the discriminant space (nearest mean classifier). The nearest mean classifier is very simple and robust. Each pattern class is represented by a single prototype, which is the mean vector of all training samples in that class. Furthermore, this classifier does not require any user-specific parameters.

4.2 Nonlinear classification: the PNN method

A statistical classification method based on a Bayesian decision [14] can also be used to classify the style of an unseen word. The basic idea behind the Bayesian decision rule is to calculate the probability density functions of the features of the word images in each of the classes ω_i (i = L (legible), I (illegible), and M (middle)). The probability that a particular set of features from a word image,

f = (f_1, . . . , f_36)   (81)

comes from class ω_i is denoted p(ω_i | f), where

p(ω_i | f) = p(f | ω_i) p(ω_i) / Σ_{j=1}^{C} p(f | ω_j) p(ω_j)   (82)

and C is the number of classes. This equation requires knowledge of the class-conditional density, which is described in the next section.
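Stepping back to Sect. 4.1: for two classes, the MDA criterion of eq. (66) reduces to Fisher's linear discriminant, whose optimal direction is w = W_w⁻¹ (f̄¹ − f̄²). A minimal two-feature sketch with a hand-inverted 2×2 scatter matrix follows (toy data and helper names are illustrative, not the paper's 36-dimensional pipeline):

```python
# Two-class special case of the MDA criterion: Fisher's discriminant direction
# w = Ww^{-1} (m1 - m2), written out for 2-D features.

def mean(xs):
    return [sum(c) / len(xs) for c in zip(*xs)]

def within_scatter(xs, m):
    # scatter of one class about its mean (eq. 69)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        d = [x[0] - m[0], x[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class1, class2):
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = within_scatter(class1, m1), within_scatter(class2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]  # eq. (70)
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]
```

A test word would then be projected onto w and assigned to the class whose projected centroid is nearer, which is the nearest mean rule described above.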



4.2.1 Parzen method. The accuracy of the Bayesian decision in Eq. 82 depends on the accuracy with which the underlying class-conditional density is estimated. A Parzen model [28] is a class of smooth and continuous probability density function (PDF) estimators that become progressively more representative of the true class-conditional density as the number of samples increases. The Parzen model uses a weight function W(d) that has a maximum value at d = 0 and decreases as the absolute value of d increases. A general formulation of the Parzen model is described by

g(f) = (1/n_j) (1/(λ_1 · · · λ_p)) Σ_{i=1}^{n_j} W((f_1 − f_1^i)/λ_1, . . . , (f_p − f_p^i)/λ_p)   (83)

where f^i = (f_1^i, . . . , f_p^i) and p are the sample points (extracted features) and the number of features in the training set, λ_k is the variance of the kth feature (k = 1, 2, . . . , p) of the points that surround each sample in the training set, n_j is the number of samples in class j, W is the weight function, and f_k^i is the kth feature extracted from the ith word image belonging to the jth class. In general, each Parzen method should have multiple λ_k values. However, to simplify the model, a special case can be assumed where σ = λ_1 = λ_2 = . . . = λ_p for all of the weights of the function W. A density estimator that assumes a Gaussian kernel distribution is used in this study; it is well behaved and easily computed. Thus, Eq. 83 becomes

g(f) = (1 / (n_j (2π)^{p/2} σ^p)) Σ_{i=1}^{n_j} exp(−(f − f^i)^t (f − f^i) / (2σ²))   (84)

Fig. 13. Organization of a probabilistic neural network for classification of patterns into categories (input layer f_1, f_2, . . . , f_p; pattern layer; summation layer with outputs p(ω_1 | f), p(ω_2 | f), p(ω_3 | f))

4.2.2 Optimizing σ. For each particular σ a set of Parzen density estimators based on the training data set is estimated. The number of correctly classified words produced by each σ value is then used to judge the efficiency of that particular value of σ. To estimate an unbiased correct classification rate for each σ, a leave-one-out method was used. In this method, all of the training data set belonging to each class except one datum is used to train the system, and the remaining datum is used for testing. This training and testing using the leave-one-out method is repeated until every datum element in the two or three different classes has been independently tested. The leave-one-out method thus gives close bounds on the true performance of the classifier [13]. The numbers of misclassified words for each σ are then counted as an error function. A final value of σ is then chosen that minimizes the error function (number of misclassifications). The minimization technique involves two stages. First, a global search over a reasonable range is used to find a rough minimum. The range can be determined iteratively such that the error rate is minimized. Then a golden section method [31] is used to refine the estimate. Details were extensively reported by Schomaker et al. [32] and Sargur et al. [36] and are therefore not repeated here.

As we do not know in advance which features are important and which are not, the presence of features whose variation is meaningless has a dilutive eect on the useful features. We want the variation of unimportant features to be small so that they exert minimal inuence on the distance measure computed between an unknown point (test word) and each member of the training case. The solution to this problem is to use a separate weight for each feature. Eq. 84 then changes to g(f ) =
p

1 2k

nj

eD(f ,f
i=1

(85)

k=1

where
p

D(f , f i ) =
k=1

i fk fk k

(86)

In this experiment, both approaches were tested in order to evaluate the eectiveness of each method. In characterizing the function represented by Eq. 84, the estimation of i is critical [28]. A good criterion for selecting appropriate values of i is the number of correctly classied cases that each value produces.

M.E. Dehkordi et al.: Handwriting style classication

65
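Under the common-σ Gaussian kernel of Eq. 84, the whole classifier fits in a few lines. The sketch below (toy 2-D features, not the paper's 36-feature word vectors) scores each class with its Parzen density and picks the maximum, assuming equal priors:

```python
import math

def parzen_density(f, samples, sigma):
    """Gaussian Parzen estimate with a common sigma (Eq. 84)."""
    p = len(f)
    norm = 1.0 / (len(samples) * (2 * math.pi) ** (p / 2) * sigma ** p)
    total = sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(f, fi)) / (2 * sigma ** 2))
        for fi in samples)
    return norm * total

def pnn_classify(f, train, sigma):
    """Assign f to the class with the largest estimated density (equal priors)."""
    return max(train, key=lambda c: parzen_density(f, train[c], sigma))

# Hypothetical toy training sets for two of the three style classes:
train = {"legible": [(0.0, 0.0), (0.2, 0.1)],
         "illegible": [(1.0, 1.0), (0.9, 1.1)]}
label = pnn_classify((0.1, 0.05), train, sigma=0.3)
```

With per-class priors, the density would simply be multiplied by p(ω_j) before taking the maximum, as in Eq. 82.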

4.3 Probabilistic neural network

The nonparametric classifier described in the previous section can be implemented as a probabilistic neural network (PNN) structure. Figure 13 shows a neural network organization for classification of an input pattern f = (f_1, …, f_p) (p indicates the number of features) into three classes. The input unit is simultaneously distributed to all neurons in the pattern layer. The network is trained by setting the W_p weight vector in one of the pattern units equal to each f = (f_1, …, f_p) pattern in the training set. The dot product of the input pattern vector f with a weight vector W_p is calculated, and a nonlinear operation is performed on Y_p = f · W_p [35]. The summation units simply sum the inputs from the pattern units that correspond to the class from which the training pattern was selected and then apply a Bayesian decision rule to calculate the probability density functions for each class.

Compared to traditional multilayer perceptron (MLP) networks, our kernel-based method has a simple architecture consisting of two layers of weights, in which the first layer contains the parameters of the kernel functions and the second layer forms linear combinations of the activations of the kernel functions to generate the outputs. An MLP network often has many layers of weights and a complex pattern of connectivity. All the parameters in an MLP network are usually determined at the same time as part of a single global training strategy involving supervised training. Our kernel-based method, however, is typically trained in two stages, with the kernel functions being determined first using unsupervised techniques on the input data alone and the second-layer weights subsequently being found by fast linear supervised methods.

4.4 Comparison of appropriate classification methods

Most of the standard statistical classification algorithms assume some knowledge of the distribution of the random variables used for classification purposes. Specifically, a multivariate normal distribution is frequently assumed, and the training set is used only to estimate the mean vectors and covariance matrix of the populations. This means that large deviations from normality usually cause a classifier to fail. Multimodal distributions cause even the most nonparametric methods to fail. An advantage of neural networks is that they can typically handle even the most complex distributions.

Multilayer feedforward networks (MLFNs) have been shown to be robust classifiers. On the other hand, there are two main problems with MLFNs. First, there is little knowledge about how they operate and, second, little is known about what behavior is theoretically expected of them. Another major problem with MLFNs is that their training speed can be very slow. The PNN, however, usually trains orders of magnitude faster than MLFNs and classifies as well as or better than they do. Its main drawback is that it is slow to classify. However, most important of all for many applications is that the PNN method can provide mathematically sound confidence levels for its decisions. This fact alone has made the PNN a favorite for our investigations.

Another major advantage of using a PNN is the way it handles outliers, points that are very different from the majority. In fact, outliers will have no real impact on decisions regarding the more frequent cases, yet they will be properly handled if the data are valid. The existence of outliers is an important issue for other neural network models and traditional statistical techniques, since they can totally devastate the outcome. As mentioned earlier, it should be emphasized that the outputs of our classifier also have a precise interpretation as the posterior probabilities of class membership. The ability to interpret outputs in this way is of central importance in the effective application of classifiers, as it may be used for rejecting a test pattern in case of doubt. Thus it has some performance gains over other methods such as k-nearest neighbor or the support vector machine. Finally, the PNN technique is strongly based on Bayes's method of classification. This means that, provided the true probability density function is known, there is a Bayes optimal decision rule that will minimize the expected cost of misclassification.
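The per-feature weighting of Eqs. 85 and 86 only changes the distance inside the kernel. A sketch (feature values and σ_k hypothetical; the 1/2 factor follows the Gaussian-product reading of Eq. 86) where a small σ_0 makes feature 0 decisive while a large σ_1 effectively switches feature 1 off:

```python
import math

def weighted_density(f, samples, sigmas):
    """Parzen estimate with a separate sigma_k per feature (Eqs. 85-86)."""
    norm = 1.0
    for s in sigmas:
        norm /= math.sqrt(2 * math.pi) * s
    total = 0.0
    for fi in samples:
        # D(f, f^i) of Eq. 86, then the kernel e^{-D} of Eq. 85
        d = 0.5 * sum(((a - b) / s) ** 2 for a, b, s in zip(f, fi, sigmas))
        total += math.exp(-d)
    return norm * total / len(samples)

sigmas = (0.1, 10.0)                   # feature 0 sharp, feature 1 diluted
cls_a = [(0.0, 0.0), (0.1, 5.0)]
cls_b = [(1.0, 0.0), (0.9, 5.0)]
f = (0.05, 4.0)
score_a = weighted_density(f, cls_a, sigmas)
score_b = weighted_density(f, cls_b, sigmas)
```

Both classes cover the same feature-1 values, yet the small σ_0 makes the class with nearby feature-0 values score far higher (score_a > score_b), which is exactly the dilution control described in Sect. 4.2.1.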

5 Experimental results and analysis

Previous work [19] has indicated the need for a careful choice of sample words to allow a good representation of a much larger vocabulary without becoming hopelessly unwieldy. Kassel [18] has discussed the design aspects of such data sets, and the sample words used in this research were chosen based on that work, written in free space (no guidelines); no baseline correction techniques have been applied. The style classification technique was applied to our existing data set, which consists of scanned images obtained from 18 writers, each containing 150 words at 200×100-dpi resolution. Initially the system was trained on the LegTRn (legible training words), ILegTRn (illegible training words), and MiddleTRn (middle training words) sets containing all 2456 words in the training set. The classification system was then tested with: (1) the same data set: LegTRn, ILegTRn, and MiddleTRn; and (2) a different data set: LegTEn (legible test words), ILegTEn (illegible test words), and MiddleTEn (middle test words). This latter set contained 518 words. Note that the n in the names of the data sets (LegTRn, ILegTRn, MiddleTRn, LegTEn, ILegTEn, and MiddleTEn) shows the number of features, and TR and TE indicate the training and test sets, respectively. Also note that the x-, y-, and z-axes in Figs. 14–17 indicate the number of segmented sigma ranges, the sigma range, and the estimated error in each region, respectively. The sigma ranges and error function are shown in the tables under each figure.

Fig. 14. Error estimation of the common σ for a classification between legible and illegible handwriting using 36 extracted features (σ = 5.47436)

Sigmas range:   0.01 0.02 0.03 0.04 0.07 0.11 0.18 0.30 0.48 0.78 1.27 2.07 3.36 5.46 8.86 14.4 23.4 37.9 61.6 100
Error function: 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.04 0.05 0.06 0.06 0.09 0.27 0.39
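The σ-selection procedure of Sect. 4.2.2, whose error curves Figs. 14–16 plot, can be sketched as a leave-one-out scan over a σ grid. The data points and grid values below are hypothetical toys, not the paper's word features:

```python
import math

def pnn_label(f, train, sigma):
    """Gaussian-kernel class score, common sigma (Eq. 84 up to a shared constant)."""
    def score(pts):
        return sum(math.exp(-sum((a - b) ** 2 for a, b in zip(f, fi))
                            / (2 * sigma ** 2)) for fi in pts) / len(pts)
    return max(train, key=lambda c: score(train[c]))

def loo_error(data, sigma):
    """Leave-one-out misclassification rate for one candidate sigma."""
    labeled = [(c, f) for c, pts in data.items() for f in pts]
    errors = 0
    for i, (ci, fi) in enumerate(labeled):
        held_out_train = {}
        for j, (cj, fj) in enumerate(labeled):
            if j != i:
                held_out_train.setdefault(cj, []).append(fj)
        errors += pnn_label(fi, held_out_train, sigma) != ci
    return errors / len(labeled)

data = {"legible": [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0)],
        "illegible": [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]}
errs = {s: loo_error(data, s) for s in (0.01, 0.1, 1.0, 10.0)}
best_sigma = min(errs, key=errs.get)   # rough minimum; refined later (Sect. 4.2.2)
```

Plotting errs against the σ grid reproduces the shape of the curves in Figs. 14–16, from which the flat low-error region is read off.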

5.1 PNN classifier using a common σ for binary classification

Tables 3–5 show the two-class (binary) classification results obtained when using the nonlinear classification (PNN) technique based on the selected values of the common σ. The first column in these tables shows the samples that were used as the training data set, while the second column shows the samples that were used as the test set. The third column shows the correct classification results obtained when using the PNN technique with a common σ and all of the 36 features. The fourth column shows the average of the correct classification results when the system was tested with seen or unseen data, and the average classification result for all data with the common σ is given in the last row. Figures 14–16 indicate the estimated error over the sigma range, from which the best value of σ is chosen.

Tables 3–5 show that the average classification result is 89.50, 82.50, and 87.75% when classifying between legible/illegible, legible/middle, and illegible/middle word images, respectively, using 36 extracted features with a common σ. The system can also achieve 99.50, 99.50, and 99.50% correct classification when the test set is the same as the training set and 79.50, 65.50, and 76.00% correct classification when the test set is different from the training set.

5.2 PNN classifier using different σi for binary classification

Tables 6–8 show the classification results obtained when using the nonlinear classification (PNN) technique with different values of σi (i = 1, 2, …, 36). The first column in these tables shows the samples that were used as the training data set, while the second column shows the correct classification results obtained when using the PNN technique with different σi values and all 36 features.

Tables 6–8 show that the overall classification results are 93.00, 82.50, and 95.00% correct classification when classifying legible/illegible, legible/middle, and illegible/middle handwriting word images, respectively. These can be broken down into 99.50, 99.50, and 99.50% correct classification when the test set is the same as the training set and 86.50, 65.50, and 90.50% correct classification when the test set is different from the training set.

5.3 Multilinear classification (MDA) for binary classification

Tables 9–11 show the experimental results obtained using all 36 extracted features to discriminate between legible/illegible, legible/middle, and illegible/middle word images when using the multiple discriminant analysis technique. The first column shows the samples that were used as the training data set, while the second column shows the samples that were used as the test set. The third column shows the correct classification result. The fourth column shows the average of the correct classification results when the system was tested with seen or unseen data. The last row shows the average classification results for all data. The training and test samples are the same as those used in the nonlinear classification experiment.

The overall binary classification using 36 features in the MDA technique is 65.50, 63.75, and 61.00% for classification between legible/illegible, legible/middle, and

Fig. 15. Error estimation of the common σ for a classification between middle and illegible handwriting using 36 extracted features (σ = 0.01368)

Sigmas range:   0.01 0.02 0.03 0.04 0.07 0.11 0.18 0.30 0.48 0.78 1.27 2.07 3.36 5.46 8.86 14.4 23.4 37.9 61.6 100
Error function: 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.11 0.15 0.14 0.17 0.34 0.46

Fig. 16. Error estimation of the common σ for a classification between middle and legible handwriting using 36 extracted features (σ = 7.11064)
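After the coarse grid scan, Sect. 4.2.2 refines σ with a golden section search [31]; a generic sketch of that second stage (the quadratic stand-in for the leave-one-out error curve is hypothetical):

```python
import math

def golden_section_min(f, a, b, tol=1e-6):
    """Shrink [a, b] around the minimum of a unimodal function f."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0        # 1/phi ~ 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                               # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                                     # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Hypothetical smooth stand-in for the error-versus-sigma curve:
sigma_star = golden_section_min(lambda s: (s - 5.47) ** 2, 0.01, 100.0)
```

Each iteration reuses one of the two interior function evaluations, so only one new error estimate per step is needed, which matters when every evaluation is a full leave-one-out pass.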

Table 3. Classification results using 36 extracted features to discriminate between legible and illegible handwriting using a common σ (σ = 5.47436)

Training set           Test set     % Correct (common σ)   % Correct average
LEGTR36, ILLEGTR36     LEGTR36      99.00                  99.50
LEGTR36, ILLEGTR36     ILLEGTR36    100.00
LEGTR36, ILLEGTR36     LEGTE36      69.00                  79.50
LEGTR36, ILLEGTR36     ILLEGTE36    90.00
                                    Overall                89.50

Table 4. Classification results using 36 extracted features to discriminate between legible and middle handwriting using a common σ (σ = 7.11064)

Training set            Test set     % Correct (common σ)   % Correct average
LEGTR36, MiddleTR36     LEGTR36      100.00                 99.50
LEGTR36, MiddleTR36     MiddleTR36   99.00
LEGTR36, MiddleTR36     LEGTE36      81.00                  65.50
LEGTR36, MiddleTR36     MiddleTE36   50.00
                                     Overall                82.50

Table 5. Classification results using 36 extracted features to discriminate between middle and illegible handwriting using a common σ (σ = 0.01386)

Training set             Test set     % Correct (common σ)   % Correct average
MiddleTR36, ILLEGTR36    MiddleTR36   99.00                  99.50
MiddleTR36, ILLEGTR36    ILLEGTR36    100.00
MiddleTR36, ILLEGTR36    MiddleTE36   52.00                  76.00
MiddleTR36, ILLEGTR36    ILLEGTE36    100.00
                                      Overall                87.75

Table 6. Classification results using 36 extracted features to discriminate between illegible and legible handwriting using different σi

Training set           Test set     % Correct (different σi)   % Correct average
LEGTR36, ILLEGTR36     LEGTR36      99.00                      99.50
LEGTR36, ILLEGTR36     ILLEGTR36    100.00
LEGTR36, ILLEGTR36     LEGTE36      90.00                      86.50
LEGTR36, ILLEGTR36     ILLEGTE36    83.00
                                    Overall                    93.00

Table 7. Classification results using 36 extracted features to discriminate between middle and legible handwriting using different σi

Training set            Test set     % Correct (different σi)   % Correct average
LEGTR36, MiddleTR36     LEGTR36      100.00                     99.50
LEGTR36, MiddleTR36     MiddleTR36   99.00
LEGTR36, MiddleTR36     LEGTE36      81.00                      65.50
LEGTR36, MiddleTR36     MiddleTE36   50.00
                                     Overall                    82.50

Table 8. Classification results using 36 extracted features to discriminate between middle and illegible handwriting using different σi

Training set             Test set     % Correct (different σi)   % Correct average
MiddleTR36, ILLEGTR36    MiddleTR36   99.00                      99.50
MiddleTR36, ILLEGTR36    ILLEGTR36    100.00
MiddleTR36, ILLEGTR36    MiddleTE36   98.00                      90.50
MiddleTR36, ILLEGTR36    ILLEGTE36    83.00
                                      Overall                    95.00

Table 9. Classification results using 36 features to discriminate between legible and illegible

Training set           Test set     % Correct (MDA)   % Correct average
LEGTR36, ILLEGTR36     LEGTR36      78.00             70.50
LEGTR36, ILLEGTR36     ILLEGTR36    63.00
LEGTR36, ILLEGTR36     LEGTE36      67.00             60.50
LEGTR36, ILLEGTR36     ILLEGTE36    54.00
                                    Overall           65.50

Table 10. Classification results using 36 features to discriminate between legible and middle

Training set            Test set     % Correct (MDA)   % Correct average
LEGTR36, MiddleTR36     LEGTR36      70.00             64.00
LEGTR36, MiddleTR36     MiddleTR36   58.00
LEGTR36, MiddleTR36     LEGTE36      57.00             63.50
LEGTR36, MiddleTR36     MiddleTE36   70.00
                                     Overall           63.75
illegible/middle words. This can be broken down into 70.50, 64.00, and 64.50% correct classification when the test set is the same as the training set and 60.50, 63.50, and 57.50% correct classification when the test set is different from the training set.

5.4 Comparison between the linear and nonlinear methods for binary classification

Tables 12 and 13 summarize the experimental results obtained when using all 36 extracted features with the PNN technique (common σ and different σi) and the MDA technique. The experimental results given in Tables 12 and 13 show that the PNN technique achieved an improvement of 26.00, 2.00, and 33.00% using different σi and an improvement of 19.00, 2.00, and 18.50% using a common σ when compared with the MDA technique for classification between legible/illegible, legible/middle, and illegible/middle words, respectively, where the test set is different from the training set. In the case where the training set is the same as the test set, the PNN technique achieved an improvement of 29.00, 35.50, and 35.00% using different σi and an improvement of 29.00, 35.50, and 35.00% using a common σ as compared with the MDA technique for classification between legible/illegible, legible/middle, and illegible/middle words, respectively.

The results given in Table 12 show that when the training set is the same as the test set, there is no difference in classification rate between using different σi values and a common σ value. However, Table 13 shows that while using different σi rather than a common σ has no effect on the classification between legible/middle, it does give an improvement of 7.00 and 14.50% for classification between legible/illegible and illegible/middle when the test set is different from the training set.

5.5 Triple classification using a common σ

Table 14 gives the results for the three-class data sets. The first column shows the samples used as the training data set, while the second column shows the samples used as the test set. The third column shows the correct classification results obtained when using the nonlinear classification technique with a common σ and all 36 features. The fourth, fifth, sixth, and seventh columns show the misclassification results in each category and the average classification results for seen and unseen data. The

Table 11. Classification results using 36 features to discriminate between middle and illegible

Training set             Test set     % Correct (MDA)   % Correct average
MiddleTR36, ILLEGTR36    MiddleTR36   66.00             64.50
MiddleTR36, ILLEGTR36    ILLEGTR36    63.00
MiddleTR36, ILLEGTR36    MiddleTE36   56.00             57.50
MiddleTR36, ILLEGTR36    ILLEGTE36    59.00
                                      Overall           61.00

Table 12. Comparison of classification results with (i) PNN using different σi, (ii) PNN using a common σ, and (iii) the MDA technique when the training set is the same as the test set (36 extracted features)

                     Diff. σi   Common σ   MDA
Legible/Illegible    99.50%     99.50%     70.50%
Illegible/Middle     99.50%     99.50%     64.50%
Middle/Legible       99.50%     99.50%     64.00%
Overall              99.50%     99.50%     66.33%

Table 13. Comparison of classification results with (i) PNN using different σi, (ii) PNN using a common σ, and (iii) the MDA technique when the training set is different from the test set (36 extracted features)

                     Diff. σi   Common σ   MDA
Legible/Illegible    86.50%     79.50%     60.50%
Illegible/Middle     90.50%     76.00%     57.50%
Middle/Legible       65.50%     65.50%     63.50%
Overall              80.83%     73.67%     60.50%

Table 14. Classification results using 36 features to discriminate between legible, illegible, and middle handwriting word images using a common σ (σ = 0.001). Training set for all rows: LEGTR36, ILLEGTR36, MiddleTR36

                             % Misclassified words
Test set     % Correct (PNN)   as legible   as illegible   as middle   % Correct average
LEGTR36      100.00            –            0              0           99.67
ILLEGTR36    100.00            0            –              0
MiddleTR36   99.00             1.00         0              –
LEGTE36      72.00             –            10.00          18.00       67.33
ILLEGTE36    83.00             17.00        –              0
MiddleTE36   47.00             51.00        2.00           –
                                            Overall                    83.50

last row shows the overall classification results for all data with the common σ. For the three-class style classification, the best common σ value is 0.001; the details are shown in Fig. 17.

Fig. 17. Error estimation of the common σ for a classification between legible, illegible, and middle handwriting using 36 extracted features (σ = 0.001)

The experimental results given in Table 14 show that a classifier based on the PNN using a common σ value of 0.001 can achieve an overall correct style classification of 67.33% when the test set is different from the training set. The system can also be seen to achieve a 99.67% correct classification when the test set is the same as the training set. This gives an overall correct classification of 83.50% for the three classes.

5.6 Triple classification using different σi

The best values of the different σi obtained for the legible, illegible, and middle classes, with an error rate of 0.21840, are 0.000889, 0.000931, and 0.001260, respectively. Experimental results using these different σi values are given in Table 15.

Table 15. Classification results using 36 extracted features to discriminate between legible, illegible, and middle handwriting word images using different σi. Training set for all rows: LEGTR36, ILLEGTR36, MiddleTR36

                             % Misclassified words
Test set     % Correct (PNN)   as legible   as illegible   as middle   % Correct average
LEGTR36      100.00            –            0              0           99.33
ILLEGTR36    99.00             2.00         –              0
MiddleTR36   99.00             0.60         0.40           –
LEGTE36      72.00             –            10.00          18.00       67.33
ILLEGTE36    83.00             17.00        –              0
MiddleTE36   47.00             51.00        2.00           –
                                            Overall                    83.33

Table 15 shows that the PNN classifier using different σi values achieves 67.33% correct classification when the test set is different from the training set and 99.33% correct classification when the test set is the same as the
training set. This gives an overall 83.33% correct classification.

6 Conclusion and future work

This paper has introduced a novel handwriting legibility classification system that can be used to predict the recognition performance of a recognizer for a given handwriting style in order to choose the best recognizer. Thirty-six features are extracted, two methods for style classification of the word images are described (MDA and PNN), and a comparison between these two methods is presented. Experimental results show that some of the features have a more significant influence on the classification results than others. However, experiments also show that all the features used in this research play some role and are deemed necessary for successful classification. Indeed, a significant reduction of the feature vector leads to a much less effective classification [11]. As the size and quality of writing are important in these experiments, some of the features are not extracted correctly, resulting in misclassification. It is therefore suggested that further examination of the selected features be considered. One possible candidate is fractals. Fractal features may provide useful information for discriminating between legible/illegible/middle handwriting word images; such features have been useful for classifying the regularity of handwriting as well as the size of writing [2].

Experimental results using the MDA and PNN techniques (with different σi values and a common σ value) show that in the case of legible/illegible and illegible/middle classification, the PNN technique using different σi values gives the superior result as compared with the PNN using a common σ and the MDA technique. In the case of middle/legible classification, the PNN techniques using a common σ and different σi values give the same classification result. Therefore, the PNN technique using different σi values is the best classifier. As the PNN gives superior results for classification between two classes in comparison with the MDA, for the time being we use the PNN for triple classification; no experiments were carried out for triple classification with the MDA technique. Experimental results show that those words that were correctly classified using the MDA technique were equally correctly classified using the PNN. However, those words that were misclassified or closely classified by the PNN were correctly classified using MDA.

The Parzen model, used for density estimation in the PNN system, has the same number of kernels as data points. This leads to models that can be slow to evaluate for new input vectors, especially when the number of training data points is very large. One way to tackle this problem is to use a clustering technique such as fuzzy clustering to reduce the number of data points prior to the PNN. The center of each cluster can then be used as the center of each kernel, thereby greatly increasing the classification speed.

Faced with significant style variation of handwriting, it is likely that style-specific classifiers will yield higher classification accuracy than generalized classifiers. Therefore, the next stage of our work will be to use the preclassifier to route a given data sample to a recognizer that is deemed more suitable to the style of the sample. Work in this area so far has concentrated on a small subset of style classification. The results of our initial experiments in applying the described techniques to determine a writing style have been encouraging. Given the strengths of this approach in matching a handwriting sample to a given recognizer, it would be possible to characterize each recognizer in a multiexpert system based on individual performance and then route a data sample to the appropriate recognizer. Further investigation to determine how effectively we can identify a writer will be needed. It is a fact that intrawriter style variation is also a problem [19] that leads to significant user frustration, affecting the success of today's online applications such as PDAs. It would be interesting to see whether there is any scope in treating intrawriter style variation in a similar way. These classification methods can also be applied to identifying symbol types, such as digits, punctuation, and lowercase and uppercase letters, in further work [16]. For example, separation of digits and uppercase and lowercase characters or words is an important task in document layout.

References

1. Coates AL, Baird HS, Fateman RJ (2001) Pessimal print: a reverse Turing test. Proceedings of ICDAR'01, pp 1154–1158
2. Bouletreau V, Vincent N, Sabourin R, Emptoz H (1997) Synthetic parameters for handwriting classification. Proceedings of ICDAR'97, pp 102–106
3. Bozinovic RM, Srihari SN (1989) Off-line cursive script word recognition. IEEE Transactions, PAMI 11(1):68–83
4. Camastra F, Vinciarelli A (2001) Cursive character recognition by learning vector quantization. Pattern Recogn Lett 22(6–7):625–629
5. Cha S-H, Srihari SN (2001) A priori algorithm for subcategory classification analysis of handwriting. Proceedings of ICDAR'01, pp 1022–1025
6. Chien CH, Aggarwal JK (1998) Model construction and shape recognition from occluding contours. IEEE Transactions, PAMI 11(4):372–389
7. Ding Y, Kimura F, Miyake Y, Shridhar M (1999) Evaluation and improvement of slant estimation for handwritten words. Proceedings of ICDAR'99, pp 753–756
8. Dehkordi ME, Sherkat N, Whitrow RJ (1999) A principal component approach to classification of handwritten words. Proceedings of ICDAR'99, pp 781–784
9. Dehkordi ME, Sherkat N, Whitrow RJ (1999) Classification of off-line handwritten words into upper and lower cases. IEE Colloquium on document image processing and multimedia, London, March 1999, pp 8/1–8/4
10. Dehkordi ME, Sherkat N, Allen T (2000) Case classification of off-line handwritten words prior to recognition. Fourth IAPR international workshop on document analysis systems (DAS2000), pp 325–334
11. Dehkordi ME, Sherkat N, Allen T (2001) Prediction of handwriting legibility. Proceedings of ICDAR'01, pp 997–1000
12. Freeman H (1961) On the encoding of arbitrary geometric configurations. IRE Transactions on Electronic Computers EC-10(2):260–268
13. Fukunaga K, Hayes R (1989) Estimation of classifier performance. IEEE Transactions, PAMI 11:1087–1097
14. Gonzalez RC, Woods RE (1993) Digital image processing. Addison-Wesley, Reading, MA
15. Hamanaka M, Yamada K (2000) On-line character recognition adaptively controlled by handwriting quality. Proceedings of the 7th international workshop on frontiers in handwriting recognition (IWFHR 2000), pp 33–42
16. Ho TK, Nagy G (2001) Exploration of contextual constraints for character pre-classification. Proceedings of ICDAR'01, pp 450–454
17. Hu J, Yu D, Yan H (1999) Construction of partitioning paths for touching handwritten characters. Pattern Recogn Lett 20(3):293–303
18. Impedovo S, Ottaviano L, Occhinero S (1991) Optical character recognition: a survey. Int J Pattern Recogn Artificial Intell 5(2):1–24
19. Jedrzejewski MS (1997) Automatic characterisation of handwriting style. MPhil thesis, Department of Computing, Nottingham Trent University
20. Jung M, Shin Y, Srihari SN (1999) Multifont classification using typographical attributes. Proceedings of ICDAR'99, pp 353–356
21. Kassel RH (1995) A comparison of approaches to on-line handwritten character recognition. Doctoral dissertation, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA
22. Leedham G (1994) Historical perspective of handwriting recognition systems. IEE Colloquium on handwriting and pen-based input (Digest No. 1994/065), London, pp 1/1–1/3
23. Li X, Hall NS (1993) Corner detection and shape classification of on-line hand printed Kanji strokes. Pattern Recogn 26(9):1315–1334
24. Loncaric S (1998) A survey of shape analysis techniques. Pattern Recogn 31(8):983–1001
25. Madhvanath S, Govindaraju V (2001) The role of holistic paradigms in handwritten word recognition. IEEE Transactions, PAMI 23(2):149–165
26. Mori S, Suen CY, Yamamoto K (1991) Historical review of OCR research and development. Proceedings of the IEEE 80(7):1029–1058
27. Nicchiotti G, Scagliola C (1999) Generalised projections: a tool for cursive handwriting normalisation. Proceedings of ICDAR'99, pp 729–732
28. Parzen E (1962) On estimation of a probability density function and mode. Ann Math Stat 33:1065–1076
29. Powalka RK, Sherkat N, Whitrow RJ (1995) Recognizer characterisation for combining handwriting recognition. Proceedings of ICDAR'95, 1:68–73
30. Powalka RK, Sherkat N, Whitrow RJ (1995) Multiple recognition combination topologies. Proceedings of the 7th biennial conference of the International Graphonomics Society, London, Ontario, Canada, 6–10 August 1995, pp 128–129
31. Ripley BD (1997) Pattern recognition and neural networks. Cambridge University Press, Cambridge, UK
32. Schioler H, Hartmann U (1992) Mapping neural network derived from the Parzen window estimator. Neural Netw 5(6):903–909
33. Schomaker L, Abbink G, Selen S (1994) Writer and writing-style classification in recognition of on-line handwriting. Proceedings of the European workshop on handwriting analysis and recognition, IEE (ISSN 0963-3308), London
34. Sherkat N, Allen TJ (1999) Whole word recognition in facsimile images. Proceedings of ICDAR'99, pp 547–550
35. Specht DF (1990) Probabilistic neural networks and the polynomial Adaline as complementary techniques for classification. IEEE Transactions on Neural Networks, pp 111–121
36. Specht DF, Shapiro PD (1991) Generalization accuracy of probabilistic neural networks compared with backpropagation networks. Lockheed Missiles & Space Co., Inc., independent research project RDD 360, I-887–I-892
37. Srihari SN, Sung HC, Arora H, Lee S (2001) Individuality of handwriting: a validation study. Proceedings of ICDAR'01, pp 1195–1204
38. Wang J, Yan H (1999) Mending broken handwriting with a macrostructure analysis method to improve recognition. Pattern Recogn Lett 20(8):855–864
39. Webb A (1999) Statistical pattern recognition. Arnold, London
40. Velek O, Jaeger S, Nakagawa M (2002) A new warping technique for normalizing likelihood of multiple classifiers and its effectiveness in combined on-line/off-line Japanese character recognition. Proceedings of IWFHR-8, pp 177–181
41. Günter S, Bunke H (2002) Creation of classifier ensembles for handwritten word recognition using feature selection algorithms. Proceedings of IWFHR-8, pp 183–187
42. Van Erp M, Vuurpijl L, Schomaker L (2002) An overview and comparison of voting methods for pattern recognition. Proceedings of IWFHR-8, pp 195–200
43. Vinciarelli A, Bengio S (2002) Writer adaptation techniques in off-line cursive word recognition. Proceedings of IWFHR-8, pp 287–291

Mandana Ebadian Dehkordi received her B.Sc. in applied mathematics from Isfahan University of Technology in Iran in 1988. She has recently completed a Ph.D. thesis in handwriting recognition at the Nottingham Trent University in England. Her current research interests include offline recognition and classification of unconstrained cursive script handwriting, shape description, feature extraction, data reduction, artificial neural networks, and pattern recognition. She also works as a part-time researcher at Medicsight Plc UK, where she is developing several classification techniques to detect abnormality in medical images. As a result of her research, she has published a number of scientific papers.

Nasser Sherkat received a B.Sc. honors degree in mechanical engineering from the University of Nottingham in 1985. In 1989, he received his Ph.D. in high-speed geometric processing for continuous path generation from the Nottingham Trent University. He is now a reader in computing in the School of Computing and Mathematics at the Nottingham Trent University. His interests include cursive handwriting and poor-quality optical character recognition and image-recognition algorithms. Dr. Sherkat is the leader of the Intelligent Recognition and Interactive Systems (IRIS) group of the School of Computing. He is also a director of Axiomatic Technology Limited.

Tony Allen received a first-class honors degree in physics/chemistry from the Open University in 1990. Subsequent to this, he studied part time at the University of Nottingham, where he received an MSc in electronic engineering in 1992 and a PhD in optoelectronic neural networks in 1997. Dr. Allen has worked as an engineer with British Telecom PLC and as a lecturer at the People's College, Nottingham. Currently he is a senior lecturer in the School of Computing and Mathematics at the Nottingham Trent University, where his research interests include collaborative and distributed neural network systems.
