0% found this document useful (0 votes)
55 views4 pages

Thinning Algorithms Arabic OCR: Two Parallel Thinning Al

1. The document presents two parallel thinning algorithms for Arabic optical character recognition. 2. The first algorithm is based on local operations to detect edge, end, and break points using four sub-iterations that satisfy connectivity and shape preservation criteria. 3. The second algorithm uses eight templates to match pixels in a current image to a working image, removing cental pixels that match to iteratively thin the image.

Uploaded by

api-3754855
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
55 views4 pages

Thinning Algorithms Arabic OCR: Two Parallel Thinning Al

1. The document presents two parallel thinning algorithms for Arabic optical character recognition. 2. The first algorithm is based on local operations to detect edge, end, and break points using four sub-iterations that satisfy connectivity and shape preservation criteria. 3. The second algorithm uses eight templates to match pixels in a current image to a working image, removing cental pixels that match to iteratively thin the image.

Uploaded by

api-3754855
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Thinning Algorithms for Arabic OCR

M.Tellache', M.A.Sid-Ahmed' and B.Abaza'


First author was a student at the Univ. of Windsor, now in
Algeria I.
* Second author is with the University of Windsor.
Third author is with B t Y research lab.
Abstract A thinning algorithm is
Two parallel thinning al- considered useful if:
gorithms are presented. One is 1. It preserves the shape
based on local operations to of the original image.
detect edge points, end-points 2. It preserves the con-
and break-points. The other is nectivity.
a matching algorithm in which a 3. It does not leave ex-
set of eight templates and two traneous pixels (branches).
images (the current image and a
::.c?rk:nu ,maqa) arc. used in the 2. Procedure.
processiilg . Most skeletonization al-
gorithms consists of iterative-
1. Introduction. ly executing many passes over
L a b i f 'cysad text consists the pattern where iz eneL .-ass
of connected characters forming a few dark points are deleted.
portrons Df words and isolated In any pass, .=i dark point
characters. Thischaracteristic to be deleted from the pattern
makes recognition of typed text must satisfy the following in-
a more challenging one than tuitive criteria:
English text. The isolation of a. It is an edge point.
the characters from the connec- b. It is not an end point.
ted Arabic text is required a c. It is not a break point.
preprocessing step. Algorithms d. Its deletion must not cause
that do not require character excessive erosion.
isolation such as the vector- The processing of a thinn-
stroke approach (11 processes ing algorithm can either be
the thinned image of a scanned serial (sequential) or parall-
page. Such algorithms are more el. If g(i,j) and g'(i,j)l de-
suited for the cursive nature note respectively the input and
of the Arabic text. Thinning output of an elementary step in
plays a major role in such an point (i,j) then serial pro-
application, andsince recogni- cessing is defined by
tion is dependent in part on g' (i ,j)=F [: g' (i-1,j-1) ,gat( f -
the effectiveness of the thin- W + l ) '(i-1, j+l) , g f(I,]-
ning algorithm, attention is 1) ,s(i,;;J,g(i,jtl) tg(i+ltj-l)
given in this paper to the de- g(l+l/I),g(i+L1+1)3 *
velopment of effective thinning In parallel processing all
algorithms for the purpose of the points can be processed
developing an Arabic OCR. simultaneously, then
Thinning (or skeletoniza- g' (i,j) =F[g (i-1, j-1) ,g (i-
tion) of a binary pattern con- 1,j+1) ,g(i-1, j + l >,g(i,j-
sists of successive deletion of 1) ,g(i,j),g(i,j+l),g(i+l,j-l),
dark points (i.e. changing them g ( i + W ) ,g(i+l,j+1! 1
to white points) along the edg- In the following two stlz-
es of the pattern until it is tions two parallel algorithms
thinned to a line. are developed.

IEEE Pac Rim '93 -248 - 0 1993 IEEE


0-7803-0971-5/93/$3.00
3. PARALU3L-G .
3 P2xP4xP6==0
4. P2xP4xP8=0
AuJo9sx1Rf153,
3.2 wn8tion of the
3 neighbor-
hood of the input image:

because they are the same for


Psi Pl P4 all the sub-iterations, then
candit$ons 3 and 4 for each
sub-iteration is explained.
1. By co-ndition 1 the end-
The algorithm is as fol- points of the skeleton line are
lows: preserved, i.e. this verifies
In each pass, the input that PI is not am end-point.
pattern is scanned from the 2. Condition 2 verifies
upper left hand corner to the that PI to be deleted is not a
i.--v right- hand corner. In break-point, i.e. its deletiorr
each pass a dark point Pi is must not break the connected-
flagged if at least one of each
sub-iteration is satisfied. 3. &
. .
ness of the original pattern.
d 4 or
Nets +&et each of the sub- the first sub-iteretion.
icerations is divided into four

-.
By f % b e o o n d i t h s the
b4,ririditions poirrt P1 which has been removed
F i r s t sub i-tion- might be an east border point
1. 2SB(Pl)S6 or south border point or north-
2. A(P1)=1 west corner point [2],[3].
3. P2XP4XP6=0 The solution to the set of
4. PIXP6XP8=0 equations are:
where P4=0 or P6=0 or P2-0 and
A(P1) is the number of 01
pattern in the ordered P2, P3,
P8=0. . . s 3 and 4 for
.
P4,. 8P9 and P2 that are the
eight neighbors of P1.
4. -tion
the second. mub-iteratian.
By these conditions the
B(P1) is the number of point P1 which has been removed
nonhero neighbors of Pi. That might be a south border point
is or a west border point or a
B(Pl)=P2+P3+P4+P+P6+P+P8+P9 north-east corner point.
-
Second sub iteration. The solution to the set of
equations are:
1. 21B(P1)16 P6=0 or P8=0 or P2=0 and
2. A(P1)=1 P4=0.
3. P2XP6XP8=0 5. Con- 3 and 4 for
4. P4XP6XP8=0 the third sub iteration.
h

-
T h i r d s u b i t e r a t ion. By these conditions the
point P1 which has been removed
1. 2SB(P1)56 might he a north border point
.
2. A(Pl)=l
3 P2XP4XP8=0
4. P2XP6XP8=0
or a west border point or a
south-east corner point.
The solution to the set of
-
Fourth sub i t a r p t i on . equations are:
P2=0 or P8=0 or P4=0 and
1. 2SB(P1)56 P6=0.
2. A(P1)=1

-249 -
6. Conditions 3 and 4 for templates. In the templates
t a . zeros must match 0 valued pix-
By these conditions the els and ones must match 1 val-
point P1 which has been removed ued pixels and asterisks must
might be a north border point match either 1 or 0 valued pix-
or an east border point or a els in the current image.
south-west corner point. Initially the current im-
The solution to the set of age and the working image are
equations are: identical copies of the origi-
P2=0 or P4=0 or P6=0 and nal input image. Template A1 is
P8=0. compared with all pixels having
The above algorithm can a value 1 and their neighbors
also be implemented using tem- in the current image. If the
plates. match is obtained, then the
The process is repeated central pixel is removed in the
until no more pixels are delet- working image. After processing
ed from the pattern, thus the with template A l l the current
final skeleton is obtained. image is discarded and the wor-
Fig.1 shows the result of king image becomes the new cur-
applying the above algorithm. rent image, and the new working
image is obtained by copying
the new current image. The pro-
cess is repeated with templates
A2, A3 , ...,B4 forming a com-
plete cycle until no more pix-
els are removed (i.e. the skel-
eton is obtained.)
This method preserves the
shape of the original image and
maintains connectivity. The
algorithm leaves a few extran-
eous pixels only on the dots of
the characters as shown in
F i g u r e 1. Result of applying Fig.3. To avoid this problem
the parallel thinning al- the character and the dots are
gorithm.
o o * * o o * 1 * * 1 *
0 1 1 1 1 0 1 1 0 0 1 1
* 1 * * 1 * * o o o o *
4. MATCHING ALGO- A1 A2 A3 A4
RITHM. 0 0 0 1 * 0 * 1 1 o * *
4 . 1 Alcroritbm. * 1 * 1 1 0 * 1 * 0 1 1
Matching algorithm util- 1 1 * * * o 0 0 0 0 * 1
izes eight templates [4] B1 B2 B3 B4
(Fig.2) and two images. The two
images are the current image F i g u r e 2 . Templates f o r the
and a working image of the same matching algorithm. Note t I * t t
size which is updated when tem- can either match to 1 or 0.
plates are matched.
Pixels are removed by com-
paring each pixel having a val- initially separated.
ue 1 and its neighbors in the
current image with a set of

- 250 -
Acknowledgement.
The support provided by
the Defense Industrial Research
Program (DIRP), Department of
National Defense, Ottawa, is
greatly appreciated.

WO parallel thinning al-


gorich0 w a e pr-ted. Both
algorithur preserve very well
the shap of the original image
and yield results that can be
effectively used in Arabic OCR
algorithms.

1. H.Almuallim and S.Yamaguchi,


"A method of recognition of
Arabic Cursive Handwriting" ,
IEEE Trans on PAUI Sept. 1987 ,
Vol. PAnr-9 "her 5, pp.715-
723.
2. T.Y.Zhang and C.Y.Suen, "A
fast parallel algorithm for
thinning digital patterns",
CO". mn 27, 1988, pp. 236-
239.
3. Y.S.Chen and W.H.HSU, "A
modified fast parallel algo-
rithm for thinning digital pat-
terns", Pattern Recognition
Letters 7, 1988, pp.99-106.
4. C.Arcellf, L-Cordella and
S.Levialdi, "Parallel thinning
of binary pictures* , Electronic
Letters, Vol. 11, pp.148-149,
July 1975.

-251 -

You might also like