International Journal of Computer Applications (0975 – 8887)

International Conference on Advancements in Engineering and Technology (ICAET 2017)

Optical Character Recognition Techniques in Urdu - A Survey

Vippon Preet Kour
Department of Computer Science & Engineering, SMVDU, Katra

Naveen Kumar Gondhi
Department of Computer Science & Engineering, SMVDU, Katra

ABSTRACT
This survey of optical character readers for Urdu-like cursive languages is based on the various techniques and studies performed on the design and implementation of the optical character reader. As the Urdu language uses the Nastaliq font, different approaches have been applied to this font to get the desired result. The survey covers all the techniques, whether segmentation based, on-line or off-line, etc., and all the data gathered is represented in tabular form so that the concept can be understood at a glance. The non-existence of an Urdu OCR has limited the concept of a digital Urdu library, and this non-existence opens a pathway for immense research in this field.

General Terms
Pattern Recognition, Nastaliq font, Urdu language, Pushto language.

Keywords
Image Segmentation, Optical Character Reader, Feature Extraction, Classification.

1. INTRODUCTION
Character recognition, or optical character recognition (OCR), is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether the text is taken from a handwritten document or a photo of a document. It is a method of digitizing printed texts so that they can be edited, searched, stored more compactly and displayed online using machine processes. OCR is mostly applied for blind and visually impaired users, data entry for business documents, number plate recognition, and defeating CAPTCHA in anti-bot systems. An OCR works by implementing various techniques and tools, and on the basis of these an accuracy level is obtained and a result is generated.

A variety of languages are spoken all over the world. Many people are multilingual, but certain languages have had immense cultural influence since prehistoric times. The main languages of the era are Hindi, Sanskrit, Urdu, Punjabi, Sindhi, Pushto etc. Urdu is the national language of Pakistan and an official language of six states of India. It is also one of the official languages recognized in the Constitution of India. Urdu has some specialized vocabulary, and apart from it Urdu is mutually intelligible with standard Hindi. In the Asian continent, the Urdu language came under the influence of British rule, when the British replaced the Persian language with Urdu. The Urdu writing style basically comes under the 'Cursive Languages Writing Style'. Urdu is written in Nastaliq format. There is a lot of old, popular literature written in Urdu that exists in handwritten form, but none of it is digitized. So, due to the desire, demand and popularity of Urdu literature, there is a need to develop and design an Optical Character Reader for the Urdu language. The objective of character recognition is to imitate the human reading ability, with human accuracy but at far higher speed.

2. CHALLENGES IN URDU SCRIPT
There are various challenges in Urdu-like cursive scripts, and they are:

2.1 Bi-directionality
Urdu is written bidirectionally: the characters are written from right to left while the numerals are written left to right. This makes an OCR design much more chaotic and complex.

2.2 Non-Monotonicity
The writing pattern of Urdu is quite different from that of other cursive languages. In Urdu-like scripts, one frequently goes back to the already written character, as certain letters consist of a stroke that goes back and beyond the previous character, e.g., JEEM, HAAY.

2.3 Context sensitivity
Each character in Urdu changes its shape in accordance with the neighboring characters. Thus a character can have different shapes.

2.4 Complex dot placement
There is always some dot displacement in Urdu, depending upon the neighboring characters. Due to this, the dots are displaced from their standard positions.

2.5 Spacing
Unlike the English language, there is no definite procedure to understand the intra-word spacing in Urdu. This makes reading a somewhat difficult task.

2.6 Looping
In Urdu, certain characters have negative and positive loops. The negative loops form a shape resembling a circle or oval, or vice versa.

کہتے ہیں اگلے زمانہ میں

In this example there are both positive and negative loops, for example in meem and gaaf.

3. URDU WRITING STYLES
Various Urdu writing styles exist, based on the past styles followed by different writers and people around the globe, and they are:


3.1 Urdu Script
It is an extension of the Persian alphabet, written from right to left, which itself is an extension of the Arabic alphabet. This feature is known as Persian calligraphy, and the Urdu language follows the Nastaliq style of this calligraphy. It is the most popular and commonly used style of the Urdu language so far.

3.2 Kaithi Script
This script was used in the British administration courts of Bengal, Bihar, the North-West Provinces and Oudh. It is a highly Persianized and technical form of the Urdu language and was used prominently in the 19th century. Most of the legal operations and paperwork of British India were executed in this script.

3.3 Devanagari Script
The introduction of orthographic features by publishers into the Devanagari script solved the purpose of representing the Perso-Arabic etymology of Urdu words. This is the most popular script adopted for publishing journals and other technical tasks.

3.4 Roman script
Due to the ease and availability of Roman movable type for printing presses, Urdu was occasionally written in Roman script. It is also prominently used over the internet by youngsters.

Fig.1: The Urdu Nastaliq alphabets with their names in the Devanagari and Latin alphabets.

4. OPTICAL CHARACTER RECOGNITION TYPES
Based on the type of the input mode, OCR can be classified as on-line OCR and off-line OCR. The online character recognition system consists of five components.

4.1 Image acquisition
This step involves binarization, filtering, smoothing, slant correction, skew detection, thinning and baseline detection to improve the performance. This step affects the reliability and efficiency.

4.2 Preprocessing
This step involves tasks such as the separation of dots touching the base of the ligature.

4.3 Segmentation
In this step, the text of a paragraph is segmented into lines, then the lines into words, and the words into sub-words/ligatures. This step has two kinds of approaches, viz. "segmentation free or holistic methods" and "segmentation based or analytical methods".

4.4 Feature Extraction
This step involves the extraction of unique and salient patterns from the input image to enhance the discrimination power and reduce the data for classification. The extracted features can be classified as statistical features, global features and structural features.

4.5 Recognition/classification
Classification/recognition on the basis of the extracted features is the main decision making stage. The pattern is identified and recognized from the input features.

5. STEPS OF AN OCR PROCESS
The flow diagram shows the main steps taken into consideration while designing an OCR, as shown in the figure:

SCANNED INPUT -> PREPROCESSING -> SEGMENTATION -> FEATURE EXTRACTION -> RECOGNIZED TEXT

Fig. 2: Steps of OCR for cursive languages

A brief description of the diagram is given below:

5.1 Input or scanned pages
This is the data which needs to be recognized by the optical character reader. It can be in any form, i.e., either in text form or in the form of an image.

5.2 Pre-processing
This step is performed on raw data to prepare it for another processing procedure. It is the preliminary step that transforms the data into a format that can be processed more effectively and with ease. It is an important step and usually consists of binarization, filtering, smoothing, slant correction, skew detection, thinning, baseline detection etc. It requires finesse while carrying out the tasks, as errors here can severely and adversely affect the upcoming steps.

5.3 Segmentation
It is defined as the process by which given data is divided into sub data; or, we can say that in order to detect what is actually contained in the image or input, the division of the


given image or data is done into subparts; these subparts are then, after processing, merged together to determine what was actually in the input, or they can be fed to the next step for processing.

5.4 Feature extraction
This step starts from the initial set of measured data and extracts the derived or desired features or values, which are intended to be non-redundant and informative. The extracted or selected features are assumed to contain the desired information or data. The extracted features can be classified as structural features, statistical features and global transformations.

5.5 Recognized text
This is the final output or desired data obtained from the application of the previous steps.

6. APPROACHES
For the design of an optical character reader for the Urdu language, a wide variety of techniques/approaches have been used by different researchers.

6.1 Segmentation based approaches
Segmentation is the process in which the given data is divided, i.e., subparts of the data are formed. The data to be segmented can be in any form, e.g., a paragraph, a line, a word etc. Each subpart is made such that it can be processed easily in the upcoming steps. Segmentation is carried out sequentially: the paragraph is segmented into individual lines, each line is then segmented into words, and at last the words are segmented into characters or alphabets.

PARAGRAPH -> LINE -> WORD -> CHARACTER

Fig.3: Step wise representation of segmentation

The two common types of approaches in segmentation are the segmentation free or holistic approach and the segmentation based or analytical approach. The analytical approach is further divided into two methods, i.e., indirect and direct. In the direct method, the ligatures are not further segmented, while in the indirect method, a word is separated directly into letters using a number of heuristics that identify all of the segmentation points of a character. A ligature is resolved by splitting it into smaller elements that might be letters or less than letters, such as sub-letters or small strokes, which need further segmentation. Hence, segmentation proves to be an extensive and important approach in OCR.

Fig.4: Segmented Urdu word

6.2 Hidden Markov Models (HMM)
It is a statistical model in which the system being modeled is assumed to be a process with hidden states. An HMM can be represented as a dynamic Bayesian network in its simpler form. It is based in part on the forward-backward procedure of an optimal non-linear filtering problem. In this model the state is not directly visible to the observer, but the output dependent on the state is visible. Each state has a probability distribution over the possible output tokens, and the sequence of tokens generated by the model gives information about the sequence of states. Here 'hidden' refers to the sequence of states through which the model passes, not to the parameters of the model. HMMs are generally applied in temporal pattern recognition such as speech, handwriting, gesture recognition, musical score following etc. The diagrammatic representation of the model is shown as:

[Diagram labels: states X1, X2, X3; transition probabilities a12, a23, a22; output probabilities b11, b12, b13, b14; observations Y1, Y2, Y3, Y4]

Fig.5: Hidden Markov Model with X: states, Y: possible observations, a: state transition probabilities, b: output probabilities.

6.3 Template matching
In this approach we find the small parts of an image which match the template image. Matching can be feature based, i.e., strong features are taken into consideration while matching. If strong features are not present, the template based approach is followed, and it proves to be effective. It requires sampling a large number of points, and matching then starts from that sample. If the template does not provide a direct match, other methods like motion tracking and occlusion handling are performed. This approach has various applications such as face detection, visual object recognition, car plate recognition etc. It is a simple method used for classification and pattern recognition. A database of templates, called the training data, is used for matching. It can identify scanned or computer-written characters, numbers and the secondary characters known as diacritics. The template image is moved to all possible positions of the source image, and the match with the nearest representation is extracted and taken into consideration. In most cases all matching is done on a pixel-by-pixel basis.

6.4 Unicode Mapping
Unicode is a computing industry standard for the consistent encoding, representation and handling of text in most writing systems. At the time of writing, the latest version of Unicode contains a repository of more than 120,000 characters covering about 129 modern and historic scripts along with multiple symbol sets. Unicode can be implemented by different character encodings. All of the possible sequences of segments are generated and stored in a file. After the segments are generated, we produce the Unicode of each segment. One character can have multiple segments (over segmentation), while others can combine to


construct one segment (under segmentation). One character can have a different number/types of diacritics to distinguish it among characters. So, a state machine has been developed to cope with all the combinations. One sequence of states exhausts the inputs and returns the Unicode value of one character. When a ligature is tested, a long sequence for all segments is generated and then looked up in the state table. If any fulfilling sequence is found then the value is acquired; otherwise one state from the sequence is dropped and the table is searched again. So, it is the longest sequence matching algorithm.

[State transition table excerpt; the recoverable entries map state sequences to Unicode values such as U0653, U0670, U0647 and U0632.]

Fig.6: State Transition Table

Table 1: Comparison of various OCR approaches

Approach | Authors | Data Sets | Classification | Accuracy
Segmentation | Hussain et al. [1] | 200 ligatures | Connected component labelling and centroid to centroid distance | 100%
Segmentation | Pal et al. [2] | Small variety of characters | Horizontal and vertical profile, component labeling | 96.90%
Segmentation | Akram and Hussain [3] | 150 sentences composed of 6075 ligatures and 2156 words | Ligature used as a structural method, trigram trained on co-occurrence information of ligatures and words in the corpus | 99.40%
Isolated Character Recognition | Pal et al. [2] | 3050 characters | Topological, contour and water reservoir, template matching | 98%
Isolated Character Recognition | Zaman et al. [4] | 106 ligatures | Pixel values using row major and column major order, template matching | 95.00%
Machine Printed Ligature or Word Recognition | Hussain et al. [1] | 200 ligatures | FFNN Back, solidity, number of holes, axis eccentricity, moments | 100%
Handwritten Isolated Character, Ligature, Word and Numeral Recognition | Sagheer et al. [5] | 60329 | Numeral, gradient, SVM | 99%
Handwritten Isolated Character, Ligature, Word and Numeral Recognition | Basu et al. [6] | 3000 | Numeral, QTLR | 96.20%
Handwritten Isolated Character, Ligature, Word and Numeral Recognition | Sardar [7] | 1050 | Sliding window and HMM, KNN | 97%
Online Isolated Character, Ligature or Word and Numeral Recognition | Razzak et al. [8] | 900 images | Numeral, structural, rule based | 96.30%
Online Isolated Character, Ligature or Word and Numeral Recognition | Razzak et al. [9] | 900 ligatures | Numeral, fuzzy logic, fuzzy rule, hybrid and HMM | 97.80%

7. CONCLUSION
A reliable Urdu script OCR is still a far cry due to immense challenges. In particular, the Nastaliq style of writing and its geometrical difference from the Naskh style of writing make this more challenging. Researchers have tried both online and offline forms of handwritten text, but have not yet been very successful in either of them. Isolated and ligature based recognition for the Urdu script has attracted the most research interest so far. To date there is no multilingual OCR available, but there is a need to develop algorithms that can incorporate an unlimited database, as there is high similarity among Arabic script languages.

8. REFERENCES
[1] S.A. Husain, A multi-tier holistic approach for urdu Nastaliq recognition, in: Proceedings of the 6th International Multitopic IEEE Conference (INMIC'02), 2002.
[2] U. Pal, A. Sarkar, Recognition of printed Urdu script, in: Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003), 2003.
[3] M. Akram, S. Hussain, Word segmentation for urdu OCR system, in: Proceedings of the 8th Workshop on Asian Language Resources, Asian Federation for Natural Language Processing, Beijing, China, 2010.
[4] S. Zaman, W. Slany, F. Sahito, Recognition of segmented Arabic/Urdu characters using pixel values as their features, in: Proceedings of the 1st International


Conference on Computer and Information Technology (ICCIT'2012), 2012.
[5] M.W. Sagheer, C.L. He, N. Nobile, C.Y. Suen, A new large Urdu database for off-line handwriting recognition 5716 (2009).
[6] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, A novel framework for automatic sorting of postal documents with multi-script address blocks, Pattern Recognition 43 (10) (2010).
[7] S. Sardar, A. Wahab, Optical character recognition system for Urdu: online and offline OCR irrespective of fonts, in: Proceedings of the International Conference on Information and Emerging Technologies (ICIET), Karachi, Pakistan, 2010.
[8] M.I. Razzak, A. Belaïd, S.A. Hussain, Effect of ghost character theory on arabic script based languages character recognition, in: Proceedings of the WASE Global Conference on Image Processing and Analysis (GCIA'09), Taiwan, China, 2009.
[9] M.I. Razzak, F. Anwar, S.A. Husain, A. Belaïd, M. Sher, HMM and fuzzy logic: a hybrid approach for online urdu script-based languages' character recognition, Knowledge Based Systems 23 (8) (2010).
[10] S.T. Javed, Investigation into a segmentation based OCR for the Nastaleeq writing system (Master's thesis), National University of Computer & Emerging Sciences, Lahore, Pakistan, 2007.
[11] Z.A. Shah, Ligature based optical character recognition of Urdu-Nastaleeq font, in: Proceedings of the 6th International Multitopic IEEE Conference (INMIC'02), 2002.
[12] S.T. Javed, S. Hussain, Improving Nastalique-specific pre-recognition process for Urdu OCR, in: Proceedings of the 13th International Multitopic IEEE Conference (INMIC'09), 2009.
[13] S.F. Rashid, S.S. Bukhari, F. Shafait, T.M. Breuel, A discriminative learning approach for orientation detection of urdu document images, in: Proceedings of the 13th International Multitopic IEEE Conference (INMIC'09), 2009.
[14] M. Riley, Beyond quasi-stationarity: designing time-frequency representation for speech signals, in: Proceedings of the International Conference on Acoustics Speech and Signal Processing (ICASSP87), vol. 12, 1987, pp. 657-660.
[15] Nabeel Shahzad, Brandon Paulson and Tracy Hammond, Urdu Qaeda: Recognition System for Isolated Urdu Characters, IUI 2009 Workshop on Sketch Recognition, February 8, 2009, Sanibel Island, Florida. Chair: Tracy Hammond.
[16] Tabassam Nawaz, Syed Ammar Hassan Shah Naqvi, Habib ur Rehman & Anoshia Faiz, Optical Character Recognition System for Urdu (Naskh Font) Using Pattern Matching Technique, International Journal of Image Processing (IJIP), Volume (3) : Issue (3).
[17] Sohail Abdul Sattar, Shams-ul Haque, Mahmood Khan Pathan, "A Finite State Model for Urdu Nastalique Optical Character Recognition", IJCSNS International Journal of Computer Science and Network Security, Vol. 9 No. 9, September 2009.
[18] Faisal Shafait, Adnan-ul-Hasan, Daniel Keysers, and Thomas M. Breuel, "Layout Analysis of Urdu Document Images," Multitopic Conference, 2006. INMIC '06. IEEE, pp. 293-298.
[19] S.A. Hussain, Anwar F., Asma, "Online Urdu Character Recognition System," MVA2007 IAPR Conference on Machine Vision Applications.
[20] Liana M & Venu G. (2006). Offline Arabic Handwriting Recognition: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, No. 5, pp. 712-724.
[21] R. Safabakhsh and P. Adibi. (2005). Nastaaligh Handwritten Word Recognition Using a Continuous-Density Variable-Duration HMM. The Arabian J. Science and Eng., vol. 30, pp. 95-118.
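
As an illustration of the template matching approach of Section 6.3, the following is a minimal sketch, not the method of any surveyed paper. It assumes binary images stored as nested lists (1 = ink, 0 = background) and scores each position by the number of mismatching pixels; the names `match_template` and `classify` and the tiny templates are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def match_template(source: Image, template: Image) -> Tuple[int, int, int]:
    """Slide `template` over every position of `source`.

    Returns (row, col, mismatches) for the position with the fewest
    mismatching pixels, compared pixel by pixel.
    """
    src_h, src_w = len(source), len(source[0])
    tpl_h, tpl_w = len(template), len(template[0])
    if tpl_h > src_h or tpl_w > src_w:
        raise ValueError("template must not be larger than the source image")

    best = (0, 0, tpl_h * tpl_w + 1)  # worse than any real score
    for row in range(src_h - tpl_h + 1):
        for col in range(src_w - tpl_w + 1):
            mismatches = sum(
                source[row + r][col + c] != template[r][c]
                for r in range(tpl_h)
                for c in range(tpl_w)
            )
            if mismatches < best[2]:
                best = (row, col, mismatches)
    return best


def classify(glyph: Image, templates: Dict[str, Image]) -> str:
    """Return the label of the stored template nearest to `glyph`."""
    return min(
        templates,
        key=lambda label: match_template(glyph, templates[label])[2],
    )


if __name__ == "__main__":
    # Hypothetical 2x2 "training data"; a real system would store glyph bitmaps.
    templates = {"corner": [[1, 1], [1, 0]], "bar": [[1, 1], [0, 0]]}
    page = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
    print(match_template(page, templates["corner"]))
    print(classify([[1, 1], [1, 1]], templates))
```

Counting mismatches keeps the sketch dependency-free; practical systems more often use a normalized correlation score, which is less sensitive to stroke thickness and noise.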

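
The longest sequence matching of Section 6.4 can be sketched as below. This is one possible reading of the description, under stated assumptions: states are dropped from the end of the candidate sequence, and decoding continues after each match. The function name `decode_ligature`, the segment identifiers and the three-entry state table are all hypothetical and are not taken from Fig.6.

```python
from typing import Dict, List, Sequence, Tuple

StateTable = Dict[Tuple[int, ...], str]


def decode_ligature(segments: Sequence[int], table: StateTable) -> str:
    """Map a sequence of segment ids to characters by longest match."""
    decoded: List[str] = []
    start = 0
    while start < len(segments):
        end = len(segments)  # try the longest remaining sequence first
        while end > start and tuple(segments[start:end]) not in table:
            end -= 1  # drop one state and search the table again
        if end == start:
            raise KeyError(f"no table entry starts at segment index {start}")
        decoded.append(table[tuple(segments[start:end])])
        start = end  # continue after the matched sequence
    return "".join(decoded)


if __name__ == "__main__":
    # Hypothetical table: segment 1 alone is BEH (U+0628), segment 1 followed
    # by segment 2 is PEH (U+067E), and segment 3 is ALEF (U+0627).
    table: StateTable = {(1,): "\u0628", (1, 2): "\u067E", (3,): "\u0627"}
    print(decode_ligature([1, 2, 3], table))
    print(decode_ligature([1, 3], table))
```

Trying the longest sequence first matters because a shorter entry such as `(1,)` is a prefix of `(1, 2)`; matching the short entry first would leave segment 2 without a table entry.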