
BMVC99 (doi:10.5244/C.13.42)

Structural Features of Cursive Arabic Script
Mohammad S Khorsheed William F Clocksin
[email protected] [email protected]

Computer Laboratory, University of Cambridge, New Museum Site, Cambridge CB2 3QG, United Kingdom

Abstract
We present a technique for extracting structural features from cursive
Arabic script. After preprocessing, the skeleton of the binary word image is
decomposed into a number of segments in a certain order. Each segment is
transformed into a feature vector. The target features are the curvature of
the segment, its length relative to the other segment lengths in the same word,
the position of the segment relative to the centroid of the skeleton, and a
detailed description of curved segments. The result of this method is used
to train a Hidden Markov Model to perform the recognition.

1 Introduction
The Arabic alphabet, as shown in Table 1, is commonly used for writing many
widespread languages (e.g. Arabic, Urdu, Persian), yet there is much less research
in progress on recognising Arabic text than there is for Roman and Chinese text.
One factor accounting for this is the set of characteristics of the Arabic alphabet
which oblige researchers to confront difficulties only recently being addressed by
researchers on other languages [1]. Among these characteristics is the cursiveness
of the script, even in machine-printed form.
To measure the performance of any Arabic text recognition system, we need
to assess how successful the system is in overcoming the obstacles of cursiveness
and context-sensitivity. The conventional approach is to segment the words into
either characters [3, 4, 7] or symbols [2, 9, 6]. The first type of segmentation is
a source of recognition errors, and hence of a low recognition rate. In the second
type, a sub-word, or primitive, is segmented into symbols, where each symbol
may represent a character, a ligature, or possibly a fraction of a character.
The advantage of the second type over the first is that it is easier to find a set of
potential connection points than to find the actual connection points directly.
In this paper, we present another approach in which the word is recognised as a
single unit. This depends highly on a predefined lexicon which acts as a look-up
dictionary. The procedure is to extract all segments from the skeleton of the word

Table 1: The Arabic alphabet, giving the isolated (Iso), initial (Ini), medial (Mid)
and final (End) form of each character. [The Arabic glyphs in this table did not
survive text extraction.]

to be recognised, then to transform each segment into a feature vector. Using
Vector Quantisation (VQ) [8], each feature vector is mapped to the closest symbol
in the codebook. This results in a sequence of observations that is presented to a
Hidden Markov Model (HMM) [11]. A number of reasons motivate the proposed
technique. First, we wish to dispense with segmenting words into characters or
other primitives. Next, extracting segments from the skeleton graph is more reliable
than finding the actual connection points in the word. Finally, the extracted
features are shape descriptors of the skeleton graph, so they provide a compromise
between powerful discrimination and efficient extraction.

2 Preprocessing
The image of the word to be recognised is introduced to the system as a matrix of
black pixels (foreground) and white pixels (background). Two preprocessing steps
are then performed in sequence: thinning and centroid calculation.
The thinning method is based on Stentiford's algorithm [10]. The aim is to
remove boundary pixels of the character that are neither essential for preserving
the connectivity of the pattern nor representative of any significant geometrical
feature of it. The process converges when the connected skeleton no longer
changes; the skeleton does not vanish even if the iteration continues.
The purpose of centroid calculation is to find a reference point relative to which
all segment locations are defined.
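The paper does not spell the centroid step out further; as a minimal sketch, assuming the skeleton is held as a binary NumPy array (the function name is ours):

```python
import numpy as np

def skeleton_centroid(skel):
    """Mean position of the foreground pixels of a skeleton image.

    `skel` is a 2-D array with 1 for (black) skeleton pixels and 0 for
    background; the returned (row, col) point is the reference against
    which all segment locations are later defined.
    """
    rows, cols = np.nonzero(skel)
    return rows.mean(), cols.mean()
```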

Figure 1: Segment and loop extraction. The segments b→c, b→e and c→e
are merged to form the complex loop b→e.

3 Feature Extraction
This stage transforms the word to be recognised into a sequence of feature vectors.
This is done in three consecutive steps: segment extraction, loop extraction and
segment transformation.
3.1 Segment Extraction
The skeleton graph of a word image consists of a number of segments, where each
segment starts and ends with a feature point. A feature point is a black pixel
for which the number of black-to-white transitions in the surrounding 3 × 3
window equals one (end point), three (branch point), or four (cross point).
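A sketch of this test, counting black-to-white transitions around each black pixel; the neighbour ordering and function name are ours, and the skeleton is assumed to be a 0/1 array padded with a background border:

```python
import numpy as np

# The 8 neighbours of a pixel, listed in circular order around it.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def feature_point_type(skel, r, c):
    """Classify black pixel (r, c) by its black-to-white transition count.

    Returns 'end', 'branch' or 'cross' for 1, 3 or 4 transitions
    respectively, and None for ordinary path pixels.
    """
    ring = [skel[r + dr, c + dc] for dr, dc in RING]
    t = sum(a == 1 and b == 0 for a, b in zip(ring, ring[1:] + ring[:1]))
    return {1: 'end', 3: 'branch', 4: 'cross'}.get(t)
```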
For the machine-printed fonts used in this paper, cross points appear in the
skeleton of a word only when the word contains the letter w in the middle or at
the end. Such a point actually consists of three branches connected to the
feature point: a loop and two other branches. This defines the cross point as a
branch point and consequently leads us to say that a segment may be incident
between two end points, an end point and a branch point, a branch point and an
end point, or two branch points.
An important requirement for segment extraction is to ensure that segments
are assigned a canonical order, so that the observation sequence for the HMM is
well defined. The rule is: all segments are listed in descending order of the
horizontal value of the starting feature point of the segment.
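A one-function sketch of this ordering rule; the `start_x` attribute name is our assumption:

```python
def canonical_order(segments):
    """Sort segments in descending order of their starting feature point's
    horizontal coordinate.  For right-to-left Arabic, descending x scans
    the word in reading order.
    """
    return sorted(segments, key=lambda seg: seg.start_x, reverse=True)
```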
3.2 Loop Extraction
During segment extraction, the skeleton is checked for loops, as seen in Fig. 1. A
number of Arabic letters have a loop-like shape. These loops can be divided into
three categories. A simple loop consists of a single segment that starts from a
feature point and returns to the same point. A complex loop either consists of two
segments, both having the same starting and finishing feature points, or of three
segments. A double loop consists of two connected simple and/or complex loops.
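A rough sketch of this categorisation; the segment attribute names are our assumptions:

```python
def loop_type(loop_segments):
    """Categorise one closed loop, following the three classes above.

    `loop_segments` is the list of skeleton segments forming the loop.
    A double loop is two simple or complex loops sharing a feature point,
    so it is detected by comparing classified loops pairwise rather than
    inside this function.
    """
    if len(loop_segments) == 1:
        seg = loop_segments[0]
        assert seg.start == seg.end   # single segment back to its start point
        return 'simple'
    return 'complex'                  # two or three segments, merged as in Fig. 1
```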

3.3 Segment Transformation
Having extracted segments and loops from the skeleton graph of the word image,
each segment is now transformed into an 8-dimensional feature vector. Each
feature has the following description:
1. Normalised length feature (f1): This feature gives the length of a segment
relative to the other segment lengths in the same word. It is calculated as
$$f_1 = \frac{ActualLength - EL_{min}}{EL_{max} - EL_{min}} \quad (1)$$
where ELmin and ELmax are the minimum and maximum segment lengths
for that word, respectively. This feature tolerates changes in font size and
rotation.
2. Curvature feature (f2): This feature measures the curvature of a segment:
the Euclidean distance between the two feature points of the segment divided
by its actual length. It equals zero when the segment is a loop, and one when
the segment is a straight line. Although this feature does not measure the
curvature precisely (differently shaped curves can share the same value), it is
useful when combined with other features.
3. Endpoint feature (f3): This feature defines the two endpoints of the segment.
It has one of the following values:

Value   Description
0       end point → end point
1       end point → branch point
2       branch point → end point
3       branch point → branch point

This feature can distinguish between similar segments belonging to different
characters: the curve in one letter and the last part of another may be almost
identical except that f3 equals zero and two, respectively.


4. Relative location feature (f4): This binary feature indicates whether the
starting feature point of a segment falls above (one) or below (zero) the
centroid of the skeleton. This feature helps in deciding whether a dot is above
or below the character, a crucial decision since several Arabic letters differ
only in the position of their dots.

5. Curved features (f5–f8): These features calculate the percentage of pixels
above the top feature point, below the bottom feature point, left of the left-most
feature point, and right of the right-most feature point of the segment.
The importance of these features is noticed when considering characters whose
strokes curve well beyond their feature points.
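Pulling the four feature groups together, a hedged sketch of the full vector; all attribute names (`pixels`, `start`, `end`, `start_kind`, `end_kind`, `length`) are our assumptions, with image rows increasing downwards:

```python
import numpy as np

def segment_features(seg, all_lengths, centroid_row):
    """Sketch of the 8-dimensional feature vector f1..f8.

    `seg` is assumed to carry its pixel coordinates (`pixels`, an (n, 2)
    array of (row, col)), its two feature points (`start`, `end`, each a
    (row, col) pair), their types (`start_kind`, `end_kind`) and its arc
    length (`length`).
    """
    lmin, lmax = min(all_lengths), max(all_lengths)
    f1 = (seg.length - lmin) / (lmax - lmin)          # Eq. (1): normalised length
    chord = np.hypot(seg.start[0] - seg.end[0], seg.start[1] - seg.end[1])
    f2 = chord / seg.length                           # curvature: 0 = loop, 1 = line
    f3 = {('end', 'end'): 0, ('end', 'branch'): 1,
          ('branch', 'end'): 2,
          ('branch', 'branch'): 3}[(seg.start_kind, seg.end_kind)]
    f4 = 1.0 if seg.start[0] < centroid_row else 0.0  # start above/below centroid
    rows, cols = seg.pixels[:, 0], seg.pixels[:, 1]
    top = min(seg.start[0], seg.end[0])               # highest feature point
    bottom = max(seg.start[0], seg.end[0])
    left = min(seg.start[1], seg.end[1])
    right = max(seg.start[1], seg.end[1])
    n = len(seg.pixels)
    f5 = np.sum(rows < top) / n                       # fraction above the top point
    f6 = np.sum(rows > bottom) / n                    # fraction below the bottom point
    f7 = np.sum(cols < left) / n                      # fraction left of the left-most
    f8 = np.sum(cols > right) / n                     # fraction right of the right-most
    return np.array([f1, f2, f3, f4, f5, f6, f7, f8])
```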


Figure 2: HMM topology.

4 Hidden Markov Models


4.1 Definition
The proposed method is based on a hidden Markov model [11]. An HMM may
be represented by the parameter set $\lambda = (\pi, A, B)$, where:
$\pi = \{\pi_i = P(q_i \text{ at } t = 1)\}$, the initial state probabilities;
$A = \{a_{ij} = P(q_j \text{ at } t+1 \mid q_i \text{ at } t)\}$, the state transition probabilities;
$B = \{b_i(k) = P(v_k \text{ at } t \mid q_i \text{ at } t)\}$, the observation symbol probabilities in state $i$;
$T$ = length of the observation sequence;
$N$ = number of states in the model;
$M$ = number of observation symbols;
$Q = \{q_i\},\ 1 \le i \le N$, the states;
$V = \{v_i\},\ 1 \le i \le M$, the discrete set of possible observation symbols.
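These parameters map directly onto a small container; a sketch, not the authors' code:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DiscreteHMM:
    """The parameter set lambda = (pi, A, B) defined above."""
    pi: np.ndarray  # shape (N,):   initial state probabilities
    A: np.ndarray   # shape (N, N): state transition probabilities
    B: np.ndarray   # shape (N, M): per-state observation symbol probabilities
```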

4.2 HMM Implementation


In this paper, we build only one model for all the words in the lexicon and use
different paths (state sequences) to distinguish one pattern from the others. A
pattern is classified as the word which has the maximum path probability over
all possible paths. This approach is called path-discriminant HMM [5]. A state
may signify only one segment, and this segment may represent a complete character,
a fraction of a character, or touching characters. Accordingly, we did not use
any optimisation criterion, such as maximum likelihood (ML) [11]. The reason
is that optimisation criteria produce a better model but do not preserve the
correspondence of the states to individual characters, which yields a lower
recognition rate.
The HMM is formed from ergodic, fully connected, elementary units, as shown
in Fig. 2. Each elementary unit represents a letter and is structured as a left-to-right
HMM. As previously mentioned, a letter is decomposed into a number of
segments. This number determines the number of states in an elementary unit.
An important step in putting the HMM into practice is to estimate the model's
parameters, but first we derive the dictionary statistics directly from the lexicon:
$$\mathrm{Initial}_{\alpha} = \frac{\text{number of words starting with letter } \alpha}{\text{total number of words in the lexicon}} \quad (2)$$
$$\mathrm{Trans}_{\alpha \to \beta} = \frac{\text{number of transitions from letter } \alpha \text{ to letter } \beta}{\text{total number of transitions from letter } \alpha} \quad (3)$$
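Equations (2) and (3) amount to simple counting over the lexicon; a sketch (function name ours):

```python
from collections import Counter

def lexicon_statistics(lexicon):
    """Estimate Initial (Eq. 2) and Trans (Eq. 3) from a word list.

    `lexicon` is a list of words, each a sequence of letters.
    """
    starts = Counter(word[0] for word in lexicon)
    trans, outgoing = Counter(), Counter()
    for word in lexicon:
        for a, b in zip(word, word[1:]):
            trans[(a, b)] += 1
            outgoing[a] += 1
    initial = {letter: c / len(lexicon) for letter, c in starts.items()}
    transition = {(a, b): c / outgoing[a] for (a, b), c in trans.items()}
    return initial, transition
```

For instance, with the toy lexicon ["bad", "bat", "tab"], Initial_b = 2/3 and Trans_{a→t} = 1/3.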
The initial state probability is computed as
$$\pi_i = \begin{cases} \mathrm{Initial}_{\alpha} & \text{if } i \text{ is the first state in the elementary unit of letter } \alpha \\ 0 & \text{otherwise} \end{cases}$$
The state transition probability is calculated as
$$a_{ij} = \begin{cases} \mathrm{Trans}_{\alpha \to \beta} & \text{if } i \text{ is the last state in letter } \alpha \text{ and } j \text{ is the first state in letter } \beta \\ 0 & \text{if } i \text{ and } j \text{ are middle states in different letters} \\ 0 & \text{if } i \text{ and } j \text{ are in the same letter and } i \ge j \\ P(q_j \text{ at } t{+}1 \mid q_i \text{ at } t) & \text{if } i \text{ and } j \text{ are in the same letter and } i < j \end{cases}$$
In our implementation, we use VQ to generate the symbol probabilities. VQ
partitions the training samples into several classes in Euclidean space using
the K-means clustering algorithm. The symbol probability can then be calculated as
$$b_i(k) = \frac{\text{number of times in state } i \text{ observing symbol } v_k}{\text{total number of times in state } i} \quad (4)$$
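One way to build such a codebook, here with scikit-learn's k-means (the paper does not name a library; the 76-symbol codebook size comes from Section 5, and the random features below stand in for real segment vectors):

```python
import numpy as np
from sklearn.cluster import KMeans  # any k-means implementation would do

rng = np.random.default_rng(0)
features = rng.random((1000, 8))             # stand-in for real segment vectors

codebook = KMeans(n_clusters=76, n_init=10).fit(features)  # 76 symbols (Sec. 5)
observations = codebook.predict(features)    # each vector -> nearest codebook symbol
```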
Each word is represented by at least one path through the HMM. There are
several possible ways to find the optimal state sequence associated with a given
observation sequence. We use a modified form of the Viterbi algorithm [11] for
finding the optimal path and some near-optimal paths. The procedure is stated
below:

1. Initialisation, $1 \le i \le N$:
$$\delta_1(i) = \pi_i \, b_i(o_1) \quad (5)$$
$$\psi_1(i) = 0 \quad (6)$$

2. Recursion, $1 \le i \le N$, $2 \le t \le T$:
$$\delta_t(i) = \max_{1 \le j \le N} [\delta_{t-1}(j) \, a_{ji}] \; b_i(o_t) \quad (7)$$
$$\psi_t(i) = \arg\max_{1 \le j \le N} [\delta_{t-1}(j) \, a_{ji}] \quad (8)$$

3. Termination:
$$P^* = \max_{1 \le j \le N} [\delta_T(j)] \quad (9)$$
$$q_T^* = \arg\max_{1 \le j \le N} [\delta_T(j)] \quad (10)$$

Font type            Recognised as 1st choice   Recognised among best 5 choices
Simplified Arabic    74%                        97%
Traditional Arabic   68%                        94%
Arabic Transparent   72%                        95%

Table 2: System recognition rate.

4. Path backtracking, $T-1 \ge t \ge 1$:
$$q_t^* = \psi_{t+1}(q_{t+1}^*) \quad (11)$$

For the word recognition application, a considerable improvement can be
made if more than just the first optimal path is recovered. The approach is to
extend $\delta$ and $\psi$ by another dimension which represents the choice $w$.
Assuming the model is in state $j$ at instant $t$, all the possible
$\delta_{t-1}(i, w)$ are considered, and the $W$ best paths are recorded in
$\psi_t(j, w)$ with their probabilities in $\delta_t(j, w)$, where $w = 1, 2, \ldots, W$.
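A NumPy sketch of steps 1–4, worked in log space for numerical stability (the paper reports raw probabilities); keeping the W best predecessors at each state and time, rather than only the argmax, yields the near-optimal paths described above:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Standard Viterbi decoding (Eqs. 5-11).

    pi: (N,), A: (N, N), B: (N, M), obs: length-T sequence of symbol indices.
    Returns the most probable state path and its log probability.
    """
    N, T = len(pi), len(obs)
    with np.errstate(divide='ignore'):                 # log(0) -> -inf is fine here
        logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    delta = np.empty((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = logpi + logB[:, obs[0]]                 # initialisation (5), (6)
    for t in range(1, T):                              # recursion (7), (8)
        scores = delta[t - 1][:, None] + logA          # scores[j, i] = delta[j] + log a_ji
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta[T - 1].argmax())]                # termination (9), (10)
    for t in range(T - 1, 0, -1):                      # backtracking (11)
        path.append(int(psi[t][path[-1]]))
    path.reverse()
    return path, float(delta[T - 1].max())
```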

5 Experimental Results
Images of the text were captured using a scanner with a resolution of 300 dpi. Each
image passed the four-step processing sequence to be transformed into a sequence
of observations. First, the thinning algorithm was applied to obtain the skeleton of
the word, and the centroid of the skeleton was calculated. Secondly, segments were
extracted from the skeleton graph in descending order relative to the horizontal
value of the starting feature point. Then, each segment was transformed into an
8-dimensional feature vector. Thirdly, if two or more segments formed a loop,
those segments were merged into a single feature vector and assigned a
curvature value (f2) of zero. Finally, the VQ algorithm was used to form a codebook.
This was done by partitioning the training samples into several classes. Each class
was then represented by its class centroid, and each codebook symbol represented
one class. The codebook included a total of 76 symbols. Fig. 3 shows an
original image and the results of the first three steps of the processing sequence.
Table 2 shows the recognition rate of the proposed system.
To assess the performance of the proposed method, an HMM was trained using a
294-word lexicon. The samples were printed using three different fonts: Simplified
Arabic, Traditional Arabic and Arabic Transparent. Samples used for training
were not used during recognition. Table 3 shows the system output for three
different samples representing the same word. The observation sequence for
each sample differs from the others. Where the system output shows the same
word more than once, the same word was recognised by a different
path through the HMM. The HMM was not always able to list the correct word
among the best five paths. An example of such a case may be seen in Table 4, in
which a dot is missing owing to a problem with thinning. Sometimes, the HMM

Figure 3: The processing sequence of a word image: (a) original, (b) thinned,
(c) segment extraction, (d) loop extraction. The above image is transferred
into the following observation sequence: 15, 24, 2, 4, 1, 4, 2, 10, 2, 1, 30, 47, 15,
31.
Font type            Obs. sequence                                      P(λ|O), five best outputs
Simplified Arabic    5, 41, 19, 22, 1, 36, 2, 41, 22, 22, 17, 1, 66     6.30 × 10^-13; 1.67 × 10^-15; 7.60 × 10^-19; 2.10 × 10^-19; 1.38 × 10^-19
Arabic Transparent   5, 42, 13, 22, 1, 36, 2, 19, 22, 22, 21, 1, 66     6.20 × 10^-12; 4.12 × 10^-15; 1.87 × 10^-18; 8.44 × 10^-19; 6.68 × 10^-19
Traditional Arabic   6, 36, 19, 22, 1, 19, 2, 19, 2, 48, 22, 6, 1, 68   8.00 × 10^-22; 1.74 × 10^-22; 7.30 × 10^-23; 6.76 × 10^-23; 1.94 × 10^-23

Table 3: System output of three different samples of the same word. Observation
sequences are ordered from left to right. [The recognised Arabic words in the
System Output column did not survive text extraction.]

Word: [Arabic glyphs lost in extraction]
System output (five best paths), P(λ|O): 1.99 × 10^-10; 1.25 × 10^-10;
3.12 × 10^-11; 1.96 × 10^-11; 4.34 × 10^-13.

Table 4: An example of a word which was not recognised correctly.

Word: [Arabic glyphs lost in extraction]
System output (five best paths), P(λ|O): 3.58 × 10^-18; 1.67 × 10^-18;
3.98 × 10^-19; 1.16 × 10^-19; 9.61 × 10^-21.

Table 5: System output of an example word.

throws up a sequence which is not included in the lexicon, as shown in Table 5.
The first path is not in the lexicon, and is not even an Arabic word, so the
second path, which is in the lexicon, was considered as the first option.
The last result concerns generalisation. Although it is difficult to predict in
advance which untrained words the HMM will recognise, we have found a number
of words that were recognised by the HMM but were neither in the training set
nor in the lexicon. An example is shown in Table 6, and several other such words
were found.

Word: [Arabic glyphs lost in extraction]
System output (five best paths), P(λ|O): 9.34 × 10^-11; 1.56 × 10^-11;
1.07 × 10^-12; 1.26 × 10^-13; 7.17 × 10^-14.

Table 6: An example of a word which was recognised without previous training.

6 Conclusion
We have proposed a technique for recognising Arabic words from digitised scans of
script. The technique does not rely on segmentation into characters, but instead
converts the skeletonised script into an observation sequence suitable for an HMM
recogniser. A word model was trained from a 294-word lexicon acquired from a
variety of script sources, and recognition rates of up to 97% were achieved. Future
work will focus on increasing the number of fonts that can be recognised by the
system.

References
[1] B. Al-Badr and S. Mahmoud. Survey and bibliography of Arabic optical text
recognition. Signal Processing, 41:49–77, 1995.
[2] H. Al-Muallim and S. Yamaguchi. A method of recognition of Arabic cursive
handwriting. IEEE Trans. on Pattern Analysis and Machine Intelligence,
9(5):715–722, 1987.
[3] H. Al-Yousefi and S. Upda. Recognition of Arabic characters. IEEE Trans.
on Pattern Analysis and Machine Intelligence, 14(8):853–857, 1992.
[4] A. Amin and J. Mari. Machine recognition and correction of printed Arabic
text. IEEE Trans. on Systems, Man, and Cybernetics, 19(5):1300–1306, 1989.
[5] M. Chen, A. Kundu, and J. Zhou. Off-line handwritten word recognition
using an HMM type stochastic network. IEEE Trans. on Pattern Analysis and
Machine Intelligence, 16(5):481–496, 1994.
[6] M. Fehri and M. Ben Ahmed. A new approach to Arabic character recognition
in multi-font documents. In The 4th International Conference and Exhibition
on Multi-Lingual Computing. Cambridge University Press, 1994.
[7] H. Goraine and M. Usher. Printed Arabic text recognition. In The 4th International
Conference and Exhibition on Multi-Lingual Computing. Cambridge
University Press, 1994.
[8] R. M. Gray. Vector quantization. IEEE ASSP Magazine, 1(2):4–29, 1984.
[9] K. Hassibi. Machine-printed Arabic OCR using neural networks. In The 4th
International Conference and Exhibition on Multi-Lingual Computing. Cambridge
University Press, 1994.
[10] J. R. Parker. Algorithms for Image Processing and Computer Vision. John
Wiley and Sons, Inc., 1997.
[11] L. Rabiner and B. Juang. An introduction to hidden Markov models. IEEE
ASSP Magazine, pages 4–16, January 1986.
