A Recognition-Based Arabic Optical Character Recognition System

This document proposes a recognition-based Arabic optical character recognition system. It consists of image acquisition, preprocessing, segmentation, character fragmentation, feature extraction, and classification. The system segments characters by combining character fragments using feedback to improve recognition accuracy. The implemented system achieves 90% recognition accuracy with a 20 character per second recognition rate.

A. Cheung, M. Bennamoun, N. W. Bergmann
Space Centre for Satellite Navigation
Queensland University of Technology
GPO Box 2434, Brisbane, Qld 4001, Australia

0-7803-4778-1/98 $10.00 © 1998 IEEE

ABSTRACT

Optical character recognition systems improve human-machine interaction and are widely used in many government and commercial departments. After forty years of intensive research, OCR systems for most scripts are well developed, but not for Arabic script. Since Arabic is a popular script, Arabic OCR systems should have great commercial value. Thus a recognition-based Arabic OCR system is proposed in this paper. It consists of image acquisition, preprocessing, segmentation, character fragmentation, combination of character fragments, feature extraction, and classification. A signal is fed back to improve and determine the segmentation/recognition result. The system has been implemented and achieves 90% recognition accuracy at a recognition rate of 20 chars/sec.

1. INTRODUCTION

Optical character recognition (OCR) is the process of converting a raster image representation of a document, e.g. a machine printed or handwritten text scanned by a document scanner, into a computer-processable format, such as ASCII code.

The origin of character recognition systems can be traced to 1870, as an aid for the visually handicapped. In the 1940s, digital computers were invented and since then many engineers and scientists have started their research on OCR. In the 1950s, the first commercial OCR system became available [1,2]. This subject has attracted immense research interest not only because of the very challenging nature of the problem, but also because it improves human-machine interaction in many applications. Example applications include office automation, cheque verification, and a large variety of banking, business and data entry applications. Thus, after forty years of intensive research, many techniques and methods have been developed for many scripts. Moreover, many OCR systems are commercially available nowadays.

The typed Latin, Chinese and Japanese scripts are widely used around the world. Their characters are separated from one another, which makes their OCR techniques easier to develop. These are the reasons why OCR systems for these three scripts are well developed and most commercially available OCR systems recognize one of these three scripts. Arabic is another popular script. It is estimated that there are over one billion Arabic script users. However, because of the technical difficulties induced by the cursive nature of the Arabic script, its OCR techniques have not been well developed yet. If OCR systems were available for Arabic characters, they would be very useful and have great commercial value. Therefore, a recognition-based Arabic Optical Character Recognition system is proposed in this paper.

Some background knowledge is given in Section 2 first. Then the detailed structure of the proposed method is described in Section 3. The system performance and discussions are then presented in Section 4. Finally, this paper is concluded in Section 5.

2. BACKGROUND

2.1 The Characteristics of Arabic Script

This section illustrates some problems which are faced when developing an Arabic OCR system.

As each Arabic character has two to four different forms, this extends the classes to be recognized from 28 to 100. Fig 1 shows the character set of Arabic script, which clearly illustrates that the appearance of an Arabic character varies according to its position in a word or sub-word [3,4].

Both typed and hand-written Arabic are cursive and are read from right to left. Fig 2 demonstrates the formation of an Arabic word and illustrates the variation of Arabic characters' shape in a word. Due to the cursive nature of the script, we can either recognize a word at a time or segment a word into characters and then recognize the characters. The first option seems infeasible due to the very large number of words in a language. However, if the second option is used, research has shown in practice that the segmentation of a cursive word is a very difficult problem. Nevertheless, segmentation is a crucial step in Arabic OCR systems [5].

We have also noticed that some Arabic words may horizontally overlap with others in a document. An example is given in Fig 3. This feature causes the traditional
segmentation method using projection profile to be inapplicable in this situation, which gives rise to the word segmentation problem.

Fig 1. Arabic characters in all forms. (EF end form, MF middle form, BF beginning form, and IF isolated form.)

Fig 2. An Arabic word.

Some other characteristics of Arabic script are summarized below.
• Most characters (17 out of 28) have a dot, two dots, or zigzags associated with the character, located either above, below, or inside the character.
• Most characters share a similar shape with others, e.g. BA, TA and THA; JIM, HA and KHA etc. The position or number of dots in the character makes the only difference.
• Some characters can only appear at the beginning or at the end of a word or sub-word. An Arabic word can have one or more sub-words. This is due to the fact that some characters are not connectable from the left side with the succeeding character.
• There are only three zigzags that represent vowels. Other vowels are represented by diacritics in the form of over-scores or under-scores. The use of diacritics is limited to cases where the word is foreign or where the pronunciation is stressed.
• There are no upper or lower cases in Arabic.

Fig 3. An example of overlapped Arabic words.

2.2 Dissection vs. Recognition-Based Segmentation

The segmentation of an object can be performed by dissection or recognition-based methods. Dissection means the decomposition of the image into a sequence of sub-images using general features [6]. It is an intelligent process in that an analysis of the image is carried out. OCR systems using this technique usually plot projection profiles of the image and then use a set of rules to segment the image. The dissection technique is widely used by Latin, Chinese and Japanese OCR systems, because characters of these scripts are isolated and hence character segmentation can be easily achieved by dissection techniques. Although Amin [7] developed a dissection technique for Arabic characters, it seems to be font dependent.

On the other hand, no feature-based dissection algorithm is employed in the recognition-based segmentation technique. It usually uses a mobile window of variable width to provide a sequence of tentative segmentations, which are then confirmed (or not) by the character recognition as a result of a coherent segmentation/classification result [6]. This technique is also called "segmentation-free" in other literature. The major advantage of this technique is that it bypasses the segmentation problem. Therefore it should be suitable for systems which involve serious segmentation problems.

3. THE ARABIC OCR SYSTEM

The implemented Arabic OCR system involves five image processing techniques: image acquisition, preprocessing, segmentation, feature extraction and classification. As the recognition-based technique is employed in the system, the feature extraction and classification are
grouped into one block. Fig 4 gives an overview of the proposed system.

A document is quantized by a flatbed scanner in space and amplitude (i.e. image sampling and gray-level quantization) to acquire a digitized representation. The digital image is then binarized by the Otsu method described in [8]. A simple smoothing method is used to minimize the noise in the image due to the shading effect or unevenness of the gray scale [9]. The image is then ready for segmentation. The projection profile method is employed to extract lines from the document. As mentioned earlier in Section 2.1, Arabic words may horizontally overlap with others; therefore a word segmentation method was developed to solve this problem. The algorithm is described in [10].

Fig 4. The recognition-based Arabic OCR system.

3.1 Character Fragmentation

The input to the recognition-based OCR system is a sequence of tentative character fragments. It can be produced by either pixel-based or feature-based fragmentation. In order to save processing time, feature-based fragmentation is chosen. It involves two steps.

The first step provides coarse fragmentation points. We simplified the dissection technique of Amin [7] by ignoring all the supplementary segmentation rules. In more detail, we plotted the vertical projection profile of a word and calculated the average value (AV) of the column sums, where

$$ AV = \frac{1}{N_c} \sum_{i=1}^{N_c} X_i \qquad (1) $$

and where $N_c$ is the number of columns and $X_i$ is the number of black pixels of the ith column. Hence each part which shows a sum value less than AV is a tentative fragmentation point [7].

Fig 5. Bennamoun's segmentation technique.

In the second step, we fine-tuned the fragmentation points by applying the object segmentation method of Bennamoun's vision system [11]. His segmentation method has been shown in practice to be a reliable technique for segmenting objects with convex dominant points (CDPs). Fig 5 illustrates this segmentation technique, which consists of two stages: the hybrid edge detection and the part segmentation.

Firstly, we used the hybrid edge detector, whose structure is shown in Fig 6, to detect the edges of a word. A hybrid edge detector is used because it can localize edges well and provide good immunity to noise simultaneously. We then extracted the contour of the word and fed it into the part segmentation stage.

Fig 6. The hybrid edge detector.

We detected the CDPs of a word in the part segmentation stage. The algorithm used in the extraction of the CDPs is illustrated in Fig 7. At first, the contour smoothing operation is carried out using a Gaussian kernel with $\sigma_1$, so that the problem of discontinuity in the calculation of the derivative of curvature can be avoided. Once a smoothed contour is produced, the curvature is computed using Equation (2).

$$ \kappa = \frac{\dot{\hat{x}}\,\ddot{\hat{y}} - \dot{\hat{y}}\,\ddot{\hat{x}}}{\left(\dot{\hat{x}}^2 + \dot{\hat{y}}^2\right)^{3/2}} \qquad (2) $$

where $\dot{\hat{x}} = d\hat{x}/dt$, $\dot{\hat{y}} = d\hat{y}/dt$, $\ddot{\hat{x}} = d^2\hat{x}/dt^2$, $\ddot{\hat{y}} = d^2\hat{y}/dt^2$, and $\hat{x}$ and $\hat{y}$ denote the smoothed version of the x and y coordinates of the contour respectively. The uppermost branch of the block diagram shown in Fig 7 extracts all the dominant points on the contour by convolving the curvature with the derivative of the Gaussian function with $\sigma_2$, followed by zero-crossing detection. A dominant point is defined as a point at which the derivative of the curvature equals zero. The lowermost branch is responsible for selecting the convex points, for which the smoothed curvature is greater than a certain threshold Th. Both branches are ANDed to produce the CDPs and each CDP is a tentative fragmentation point.

Fig 7. Extraction of the CDPs.
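The CDP extraction described in Section 3.1 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the contour is given as an ordered, closed list of coordinates, replaces the derivative-of-Gaussian filter with Gaussian smoothing followed by a central difference, and uses hypothetical parameter values (`sigma1`, `sigma2`, `threshold`). The sign of the curvature, and therefore which corners count as convex, depends on the traversal direction, which is also an assumption here.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-k * k / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_closed(values, sigma):
    """Circular convolution, so the contour is treated as a closed curve."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(values)
    return [sum(kernel[k + radius] * values[(i + k) % n] for k in range(-radius, radius + 1))
            for i in range(n)]

def derivative(values):
    """Central difference along a closed contour."""
    n = len(values)
    return [(values[(i + 1) % n] - values[i - 1]) / 2.0 for i in range(n)]

def convex_dominant_points(xs, ys, sigma1=2.0, sigma2=2.0, threshold=0.05):
    """Indices where the smoothed curvature has an extremum AND exceeds the threshold."""
    xs_s, ys_s = smooth_closed(xs, sigma1), smooth_closed(ys, sigma1)
    dx, dy = derivative(xs_s), derivative(ys_s)
    ddx, ddy = derivative(dx), derivative(dy)
    # Planar curvature of the smoothed contour; the tiny constant avoids division by zero.
    kappa = [(dx[i] * ddy[i] - dy[i] * ddx[i]) / ((dx[i] ** 2 + dy[i] ** 2) ** 1.5 + 1e-12)
             for i in range(len(xs))]
    kappa_s = smooth_closed(kappa, sigma2)
    dkappa = derivative(kappa_s)
    n = len(kappa_s)
    cdps = []
    for i in range(n):
        j = (i + 1) % n
        # Upper branch: zero crossing of the curvature derivative between i and j.
        crosses = (dkappa[i] > 0 >= dkappa[j]) or (dkappa[i] < 0 <= dkappa[j])
        if crosses:
            best = i if abs(dkappa[i]) <= abs(dkappa[j]) else j
            # Lower branch: keep only points whose smoothed curvature is above the threshold.
            if kappa_s[best] > threshold and best not in cdps:
                cdps.append(best)
    return cdps

def square_contour(side):
    """Counter-clockwise square outline, used only as a toy contour."""
    pts = [(t, 0) for t in range(side)]
    pts += [(side, t) for t in range(side)]
    pts += [(side - t, side) for t in range(side)]
    pts += [(0, side - t) for t in range(side)]
    return pts

points = square_contour(40)
cdps = convex_dominant_points([p[0] for p in points], [p[1] for p in points])
```

On this toy square the detected points fall at the four corners, which is the behaviour one would expect of convex dominant points on a simple shape. Real word contours would need thresholds tuned to the scan resolution.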

Table 1. The different possibilities of the 2x2 masking and their Freeman codes.

Finally, we combined the fragmentation points found in the first and second steps to produce the resultant sequence of fragmentation points. Fig 8 shows some character fragmentation results.
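The coarse first step and the final combination of fragmentation points can be sketched as follows. This is a minimal illustration under stated assumptions: the word is a binary matrix with 1 for black pixels, each run of below-average columns is reduced to its middle column, and the two point lists are merged by a simple union that drops points closer than a hypothetical `min_gap`. The paper does not specify how the two lists are combined, so `run_midpoints` and `combine_points` are illustrative helpers only.

```python
def vertical_projection(word):
    """Number of black pixels (1s) in each column of a binary word image."""
    return [sum(column) for column in zip(*word)]

def coarse_fragmentation_columns(word):
    """Columns whose black-pixel count is below the average value AV."""
    profile = vertical_projection(word)
    av = sum(profile) / len(profile)
    return [i for i, count in enumerate(profile) if count < av]

def run_midpoints(columns):
    """Collapse each run of consecutive columns into its middle column."""
    cuts, start, prev = [], None, None
    for c in columns:
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            cuts.append((start + prev) // 2)
            start = prev = c
    if start is not None:
        cuts.append((start + prev) // 2)
    return cuts

def combine_points(coarse, fine, min_gap=2):
    """Union of both lists, dropping points closer than min_gap to one already kept."""
    merged = []
    for p in sorted(set(coarse) | set(fine)):
        if not merged or p - merged[-1] >= min_gap:
            merged.append(p)
    return merged

# Toy word: two tall strokes joined by a thin baseline.
word = [
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
coarse = run_midpoints(coarse_fragmentation_columns(word))
combined = combine_points(coarse, [5, 2])  # [5, 2] stands in for CDP-derived columns
```

For the toy word, the thin baseline between the two strokes yields a single coarse cut at its centre, and the merge keeps CDP-derived columns only where they are not already covered by a nearby coarse cut.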

3.2 Feature Extraction

The end result of the image acquisition, preprocessing, segmentation and character fragmentation is a matrix of numbers that represents a character fragment in some way. In the general case, however, matching these numbers to a template may be too time consuming and not flexible enough. Therefore, feature extraction is needed. Structural features of each character fragment are extracted in this system.

Fig 8. Character fragmentation results.

By using the hybrid edge detector, the contour of a character fragment is extracted. Then, we started from the top right-hand black pixel of the character fragment contour and traced through its whole contour. The tracing process depends on a 2x2 window. When this window is imposed over a contour, it produces a vector such as those in Table 1. This feature extraction process is similar to the one described in [12]. However, as the input image is different, some modifications to the method have been made. The result of this process is a sequence of Freeman codes. Fig 9 shows the contour of the character ALIF. By applying this feature extraction method, the following sequence of Freeman codes is produced:
3,3,3,3,3,3,3,3,3,3,3,5,7,7,6,7,7,7,7,7,6,7,7,7,7,7,7,7,7,
7,7,7,1,3,2,3,3,3,3,3,3.
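To make the idea of a Freeman code sequence concrete, the sketch below encodes an ordered closed contour as direction codes. It is not the 2x2-window procedure of the paper, whose lookup table is not reproduced here; it uses the simpler step-between-neighbouring-pixels formulation. The numbering in `STEP_TO_CODE` (1 = east, then clockwise in image coordinates) is an assumption, chosen only because it is loosely consistent with the ALIF example, where a long run of 3s is followed by a run of 7s.

```python
# Assumed numbering (not taken from Table 1): codes 1..8 run clockwise in
# image coordinates (row grows downward), starting from "east".
STEP_TO_CODE = {
    (0, 1): 1, (1, 1): 2, (1, 0): 3, (1, -1): 4,
    (0, -1): 5, (-1, -1): 6, (-1, 0): 7, (-1, 1): 8,
}

def freeman_chain(contour):
    """Encode an ordered, closed, 8-connected contour of (row, col) pixels."""
    codes = []
    # Pair every pixel with its successor, wrapping around to close the contour.
    for (r0, c0), (r1, c1) in zip(contour, contour[1:] + contour[:1]):
        step = (r1 - r0, c1 - c0)
        if step not in STEP_TO_CODE:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-connected neighbours")
        codes.append(STEP_TO_CODE[step])
    return codes

# Toy vertical stroke, traced from its top right-hand pixel: down the right
# side, across the bottom, up the left side, and back across the top.
stroke = [(0, 1), (1, 1), (2, 1), (3, 1), (3, 0), (2, 0), (1, 0), (0, 0)]
chain = freeman_chain(stroke)
```

The toy stroke gives the chain 3,3,3,5,7,7,7,1, which has the same overall pattern as the ALIF sequence above: a run down one side, a turn, and a run back up the other side.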

We then apply the following four formulae to smooth the code chain.

$$ c_i c_j c_j \rightarrow c_i c_j c_j \qquad (3) $$
$$ c_i c_i c_j \rightarrow c_i c_i c_j \qquad (4) $$
$$ c_i c_j c_i \rightarrow c_i c_i c_i \qquad (5) $$
$$ c_i c_j c_k \rightarrow c_r c_r c_r \qquad (6) $$

where $c_i, c_j, c_k, c_r \in \{1,2,3,4,5,6,7,8\}$ and $c_r$ is the resultant direction of $c_i$, $c_j$ and $c_k$. By applying the above formulae, the above listed sequence becomes:
3,3,3,3,3,3,3,3,3,3,5,5,5,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,1,1,2,2,2,3,3,3,3,3.

The code chain is finally concentrated by dividing the run-length of a code by a threshold T1, provided that the run-length of that code exceeds a threshold T2. The purpose of T2 is to make the final code chain have a certain degree of robustness to noise. If T1 is set to 8 and T2 is set to 3 then the above code chain becomes the following sequence of codes:
3,3,5,7,7,2.
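The concentration step can be sketched as below. The text leaves several details open, so this is one possible reading rather than the authors' procedure. It assumes that the chain is circular (a trailing run of the same code as the first run is merged into it), that a run is kept when its length is at least T2, and that the number of output codes is the run length divided by T1, rounded to the nearest integer, with a minimum of one. These choices were made because they reproduce the example result above. The input chain is built from run lengths approximately matching the smoothed ALIF chain.

```python
def run_lengths(chain):
    """Run-length encode a chain as [code, length] pairs."""
    runs = []
    for code in chain:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return runs

def concentrate(chain, t1=8, t2=3):
    """Shorten a chain: drop runs below t2, emit about length/t1 codes per kept run."""
    runs = run_lengths(chain)
    # Assumption: the contour is closed, so a trailing run of the same code
    # as the first run is merged into it.
    if len(runs) > 1 and runs[0][0] == runs[-1][0]:
        runs[0][1] += runs[-1][1]
        runs.pop()
    out = []
    for code, length in runs:
        if length < t2:  # Assumption: runs shorter than t2 are treated as noise.
            continue
        copies = max(1, (length + t1 // 2) // t1)  # Rounded division, at least one code.
        out.extend([code] * copies)
    return out

# Run lengths approximately those of the smoothed ALIF chain above.
smoothed = [3] * 10 + [5] * 3 + [7] * 18 + [1] * 2 + [2] * 3 + [3] * 5
concentrated = concentrate(smoothed, t1=8, t2=3)
```

With T1 = 8 and T2 = 3 this yields 3,3,5,7,7,2. Other rounding or wrap-around conventions give different chains, so the conventions used here should be treated as a guess.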

3.3 Classification

The classification process is carried out at the final stage to recognize the character. It assigns an input character to one of many pre-specified classes, based on the extracted features and their analysis.

Fig 9. The contour of ALIF.

In this OCR system, each character fragment is numbered from right to left. During the recognition process, the first fragment is fed into the feature extraction process in order to determine the concentrated Freeman code chain. This code chain is then input to a structural classifier to find the best match. The structural classifier is a state diagram of Freeman codes of database samples. In order to minimize the confusion of character fragments with characters and to save search time, there are four databases. According to the position of a fragment or fragments in a word, we go to the corresponding database to search for the best match. For example, if there is a combination of the first and second fragments, we go to the database file for beginning characters. If the fragment cannot be recognized, a signal is fed back to the character fragment combination process to combine the first and second fragments (refer to Fig 4). Then the above processes are repeated until a character is recognized. If a character is recognized after combining the first n fragments, then this feedback loop starts again to recognize the next character from the (n+1)th fragment onwards.

The above feedback system has a potential problem: if a character in a word cannot be recognized for some reason, then the rest of the characters will not be recognized properly. In order to minimize this problem, we repeated the above feedback recognition process from left to right if the word was not wholly recognized in the right-to-left feedback loop. After that, we combined the results of these two feedback trials to form the recognized word.

4. RESULTS AND DISCUSSIONS

We have fully implemented the recognition-based Arabic OCR system described above. The system is written in the C/C++ programming language and runs on a Pentium 166 MHz personal computer. It was applied to Arabic documents, which means all four forms, as shown in Fig 1, are mixed together in the testing samples. Many tests were run on printed texts and a recognition accuracy of 90% was achieved. The worst result is shown in Fig 10. The system recognizes Arabic characters at around 20 chars/sec. In other words, it is a real time system.

The major error of this system occurs in the classification stage. Even though we have performed right-to-left and left-to-right feedback recognition, whenever there is a character in a word that cannot be recognized, the rest of the characters in the word are not recognized properly. It seems that this cannot be solved unless a more complex feedback control strategy is used.

If we compare the recognition accuracy of this system with the other two systems described in [5, 13], it is obvious that the recognition accuracy has increased. This is because of the use of the recognition-based segmentation/classification method. Therefore we believe that the recognition-based model is more suitable for the Arabic script and other cursive scripts, such as handwritten Latin.

5. CONCLUSION

We have presented a recognition-based approach for the recognition of printed Arabic text in this paper. This system consists of image acquisition, preprocessing, segmentation, feature extraction and classification. It is similar to usual OCR systems except that it has a feedback loop that controls the combination of character fragments to form a character for classification. Because of this feedback loop, the system bypasses the character segmentation process, which leads to the 90% recognition accuracy. Its recognition rate is about 20 chars/sec.

As mentioned earlier, the feedback loop that we used has a potential problem. If a character in a word cannot be recognized, the rest of the characters are not recognized properly. This affects the accuracy of the system. However, we believe that if a more intelligent feedback loop is developed for controlling the combination of character fragments to form characters, a higher recognition accuracy should be achieved.

6. REFERENCES

[1] V. K. Govindan and A. P. Shivaprasad, "Character Recognition - Review," Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.
[2] S. Mori, C. Y. Suen, and K. Yamamoto, "Historical Review of OCR Research and Development," in Proceedings of IEEE, vol. 80, pp. 1029-1058, July 1992.
[3] I. S. Abuhaiba, S. A. Mahmoud and R. J. Green, "Cluster Number Estimation and Skeleton Refining Algorithm for Arabic Characters," The Arabian Journal for Science and Engineering, vol. 16, no. 4B, pp. 519-530, October 1991.
[4] K. M. Jambi, "Arabic Character Recognition: Many Approaches and One Decade," The Arabian Journal for Science and Engineering, vol. 16, no. 4B, pp. 501-509, October 1991.
[5] A. Cheung, M. Bennamoun and N. W. Bergmann, "Implementation of A Statistical Based Arabic Character Recognition System," (Brisbane, Australia), TENCON'97, pp. 531-534, December 1997.
[6] R. G. Casey and E. Lecolinet, "A Survey of Methods and Strategies in Character Segmentation," IEEE Trans. on PAMI, vol. 18, no. 7, pp. 690-706, July 1996.
[7] A. Amin, "Recognition of Arabic Handprinted Mathematical Formulae," The Arabian Journal for Science and Engineering, vol. 16, no. 4B, pp. 532-542, October 1991.
[8] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Trans. on SMC, vol. 9, no. 1, pp. 62-66, January 1979.
[9] A. Amin and W. H. Wilson, "Hand-Printed Character Recognition System Using Artificial Neural Networks," pp. 943-945, July 1993.
[10] A. Cheung, M. Bennamoun and N. W. Bergmann, "A New Word Segmentation Algorithm for Arabic Script," (Auckland, New Zealand), DICTA'97, pp. 431-435, December 1997.
[11] M. Bennamoun and B. Boashash, "A Structural Description based Vision System for Automatic Object Recognition," IEEE Trans. on SMC, vol. 06, no. 27, pp. 893-906, 1997.
[12] A. Amin and J. F. Mari, "Machine Recognition and Correction of Printed Arabic Text," IEEE Trans. on SMC, vol. 19, no. 5, pp. 1300-1306, October 1989.
[13] A. Cheung, M. Bennamoun and N. W. Bergmann, "The Arabic Optical Character Recognition Systems: Statistical and Neural Network Approaches," IAIF'97, pp. 293-298, November 1997.

Fig 10. (a) The original document. (b) The recognized result.
