A C# Project in Optical Character Recognition (OCR) Using Chain Code - CodeProject
OCR is sometimes used in signature recognition, which is used in banks.
OCR is also known to be used in radar systems for reading speeders' license plates, among many other things.
A Detailed Look at the OCR Implementation and its Use in this Paper
The goal of Optical Character Recognition (OCR) is to classify optical patterns (often contained
Involves several steps including segmentation, feature extraction, and classification. Each of
Implementation of OCR.
One example of OCR is shown below. A portion of a scanned image of text, borrowed from the
Web, is shown along with the corresponding (human recognized) characters from that text.
https://www.codeproject.com/Articles/160868/A-C-Project-in-Optical-Character-Recognition-OCR-U 1/13
15/9/2019 A C# Project in Optical Character Recognition (OCR) Using Chain Code - CodeProject
Of descriptive bibliographies of authors and presses. His ubiquity in the broad field of bibliographical and textual study, his
seemingly complete possession of it, distinguished him from his illustrious predecessors and made him the personification of
bibliographical scholarship in his time.
A few examples of OCR applications are listed here. The most common use of OCR is the first
item: people often wish to convert text documents to some sort of digital representation.
1. People wish to scan in a document and have the text of that document available in a word processor.
2. Recognizing license plate numbers
3. Post Office needs to recognize zip codes
Friendly ship?
Training and testing. These steps can be broken down further into sub-steps.
1. Training
c. Model Estimation – from the finite set of feature vectors, need to estimate a model
2. Testing
a. Pre-processing
b. Feature extraction – (both same as above)
c. Classification – Compare feature vectors to the various models and find the
OCR – Pre-processing
These are the pre-processing steps often performed in OCR
Binarization – Usually presented with a grayscale image, binarization is then simply a matter of choosing a threshold value.
Morphological Operators – Remove isolated specks and holes in characters, can use the majority operator.
Segmentation – Check connectivity of shapes, label, and isolate.
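The binarization step in the list above can be illustrated with a minimal Python sketch. The function name `binarize` and the default threshold of 128 are assumptions made for illustration; the article does not say which threshold the project uses.

```python
def binarize(gray, threshold=128):
    """Return a binary image: 1 for dark (ink) pixels, 0 for background.

    `gray` is a list of rows of grayscale values in 0..255.
    The threshold value is an assumption, not taken from the article.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# A tiny 3x3 grayscale image: low values are ink, high values are paper.
gray = [
    [250, 240, 30],
    [245, 20, 25],
    [10, 235, 255],
]
binary = binarize(gray)
```

In practice the threshold is often chosen per image rather than fixed, but a constant is enough to show the idea.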
Segmentation is by far the most important aspect of the pre-processing stage. It allows the
recognizer to extract features from each individual character. In the more complicated case of
handwritten text, the segmentation problem becomes much more difficult as letters tend to be
Think of each character as a PDF. The 2-D moments of the character are:
4. Skewness
5. Kurtosis
6. Higher order moments
7. Hough and Chain code transform
8. Fourier transform and series
There are different methods for feature extraction, or finding an image descriptor; these methods fall into two categories
All the above methods use the contour of the object to collect the object's features.
ALGORITHM REQUIREMENTS
The algorithm we needed for this OCR had to satisfy requirements
So the method we decided to implement for this OCR in particular is the chain code.
Freeman's chain code is one of the best and easiest methods for texture recognition.
Freeman designed the chain code in 1964 (the chain code is also known to be a good method for image encoding, but here we are
using it as a method for feature extraction).
Although the chain code is a compact way to represent the contour of an object, it has some serious drawbacks when used as a
shape descriptor.
Chain-code Drawbacks
1. It works only on contoured shapes (which in our case means characters should be input in a bold font).
2. It is very sensitive to noise, as the errors are cumulative.
3. The starting point of the chain code, and the orientation and scale of a contour, affect the chain code.
Therefore, the chain correlation scheme, which can be used to match two chains, suffers from these drawbacks.
N.B.: However, the scaling drawback was partially overcome.
The only preprocessing step we will discuss in this paper is the edge detection or contouring.
I.e. if we can detect the filling and remove it, then what remains is the edges.
So all we need is to find a unique property of the filling and apply this
Filling property:
Pseudo-code:
Begin
  WHILE (! end of image)
    Search original image for the next black pixel
    IF (all eight surrounding pixels are black)
    THEN
      This pixel is filling and we should remove it
    END IF
    Go to next pixel
  WHILEEND
END
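The contouring idea above (a black pixel whose eight neighbours are all black is filling and is removed) can be sketched in Python as follows. The name `extract_contour`, the 0/1 image representation, and the choice to treat positions outside the image as background are assumptions for illustration. The result is written to a separate image so that every test is made against the original, as the pseudo-code suggests.

```python
def extract_contour(image):
    """Keep only object pixels that have at least one background 8-neighbour.

    `image` is a list of rows with 1 for object (black) pixels and 0 otherwise.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    def is_object(r, c):
        # Assumption: positions outside the image count as background.
        return 0 <= r < height and 0 <= c < width and image[r][c] == 1

    contour = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1:
                continue
            neighbours = [(r + dr, c + dc)
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            # A pixel is filling only if all eight neighbours are object pixels.
            is_filling = all(is_object(nr, nc) for nr, nc in neighbours)
            if not is_filling:
                contour[r][c] = 1
    return contour


# A filled 3x3 square: only its centre pixel is filling.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
contour = extract_contour(image)
```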
There are many methods to implement the chain code, and all of them lead to the idea of partitioning the target object.
The one we used is based on the idea of partitioning the object into tracks and sectors, then applying the chain code in each
sector to get the pixel relations and saving them to a file.
How do we achieve this level of partitioning on the contoured object and extract the feature vector from it?
Achieving a sector-and-track partitioned object and extracting the feature vector from it involves applying the following steps
A DETAILED LOOK AT HOW TO achieve this level of partitioning of the contoured object.
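The center of mass of any texture (in our case, a character) is given by the following equations, where f(x, y) is 1 for object pixels and 0 otherwise:
$$X_c = \frac{\sum_x \sum_y x \, f(x, y)}{\sum_x \sum_y f(x, y)}$$
$$Y_c = \frac{\sum_x \sum_y y \, f(x, y)}{\sum_x \sum_y f(x, y)}$$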
In English, this means that the X coordinate of the centroid is the sum of the x coordinates of all the pixels in the object divided by the
number of pixels in the object.
Likewise, the Y coordinate of the centroid is the sum of the y coordinates of all the pixels in the object divided by the
number of pixels in the object.
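As a minimal Python sketch of the centroid computation described above (the function name `centroid` and the 0/1 image representation are assumptions for illustration):

```python
def centroid(image):
    """Return (xc, yc): the mean x and mean y over all object pixels.

    `image` is a list of rows with 1 for object pixels and 0 otherwise;
    x is the column index and y is the row index.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value == 1:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("image contains no object pixels")
    return sum_x / count, sum_y / count


# Four object pixels at the corners of a 3x3 image.
image = [
    [1, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]
xc, yc = centroid(image)
```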
To get the longest radius, we calculate the distance between the centroid (center of mass) and every other pixel of the
contoured object and find the maximum; this is the longest radius.
The distance between any two points is given by the Pythagorean theorem, which states:
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
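A short Python sketch of the longest-radius search, assuming the contour pixels are available as a list of (x, y) pairs and the centroid is already known (the name `longest_radius` is hypothetical):

```python
import math


def longest_radius(pixels, center):
    """Return the largest distance from `center` to any pixel in `pixels`."""
    xc, yc = center
    # math.hypot(dx, dy) computes sqrt(dx*dx + dy*dy).
    return max(math.hypot(x - xc, y - yc) for x, y in pixels)


pixels = [(0, 0), (3, 4), (1, 1)]
radius = longest_radius(pixels, (0, 0))
```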
The track step is the distance between any two adjacent tracks; it is used to identify the pixel's position, i.e. in which track it lies.
The track step equals the max radius divided by the predefined number of tracks, i.e. (Track_Step = Max-radius / No. of tracks)
(which is based on the default number of tracks used, i.e. the same number of tracks must be used for both training and testing).
The sector step will be used to determine under which sector a pixel lies.
6- DIVIDE THOSE VIRTUAL TRACKS INTO EQUAL SECTORS USING THE SECTOR-STEP
(which is based on the default number of sectors used, i.e. the same number of sectors must be used for both training and testing).
In other words, using the sector step and the track step we can identify under which sector and which track a pixel lies.
First, we have to get θ, which is the angle between the pixel and the x-axis:
$$\theta_i = \tan^{-1}\left(\frac{y - Y_c}{x - X_c}\right)$$
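The track and sector lookup can be sketched in Python as below. Several details are assumptions, since the article does not spell them out: the sector step is taken to be the full circle divided by the number of sectors, indices are zero-based, and `math.atan2` is used in place of a plain arctangent so that all four quadrants and the case x = Xc are handled. The name `locate` is hypothetical.

```python
import math


def locate(pixel, center, max_radius, n_tracks, n_sectors):
    """Return the zero-based (track, sector) indices of a pixel."""
    x, y = pixel
    xc, yc = center
    if max_radius == 0:
        return 0, 0  # degenerate object: a single pixel at the centroid
    track_step = max_radius / n_tracks
    sector_step = 2 * math.pi / n_sectors  # assumed definition of the sector step
    radius = math.hypot(x - xc, y - yc)
    # atan2 returns an angle in [-pi, pi]; map it into [0, 2*pi).
    theta = math.atan2(y - yc, x - xc) % (2 * math.pi)
    # Clamp so that a pixel exactly on the outer boundary stays in the last cell.
    track = min(int(radius / track_step), n_tracks - 1)
    sector = min(int(theta / sector_step), n_sectors - 1)
    return track, sector


center = (0, 0)
a = locate((3, 0), center, max_radius=10, n_tracks=5, n_sectors=4)
b = locate((-5, 5), center, max_radius=10, n_tracks=5, n_sectors=4)
c = locate((1, -1), center, max_radius=10, n_tracks=5, n_sectors=4)
```

Because image rows grow downward, angles computed this way increase clockwise on screen; this does not matter as long as training and testing use the same convention.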
Finding the pixel relations is by far the most important and easiest step in feature extraction, since it somehow describes the
shape.
Pseudo-code
Begin
  For every pixel surrounding the target pixel,
  moving clockwise from the north direction
    If a contour pixel is present at this position
    Then
      Store its position
      Go to next target pixel
    End if
  Next
END
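The pseudo-code above can be sketched in Python as follows. The direction numbering (0 for north, increasing clockwise to 7 for north-west) is an assumption, since the article does not state which numbering it uses; the name `pixel_relation` is also hypothetical.

```python
# (dx, dy) offsets clockwise from north; y grows downward in image coordinates.
DIRECTIONS = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]


def pixel_relation(image, x, y):
    """Return the code of the first neighbouring contour pixel found when
    scanning clockwise from north, or None if the pixel is isolated."""
    height, width = len(image), len(image[0])
    for code, (dx, dy) in enumerate(DIRECTIONS):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and image[ny][nx] == 1:
            return code
    return None


# Two horizontally adjacent contour pixels.
image = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
]
left = pixel_relation(image, 1, 1)   # neighbour lies to the east
right = pixel_relation(image, 2, 1)  # neighbour lies to the west
```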
By now, you should be able to identify, for any pixel, the track and sector in which it lies and its relation to its neighboring pixels.
The feature vector is simply the collection of the features of every pixel, sorted by track, i.e. check the position of each pixel and
add its properties to the feature vector. E.g. if two pixels lie in sector 1 and track 1 with the same relation of 4, then in cell 4 of
the relations table the entry for t1 and s1 will be incremented twice, since there are TWO pixels having this property.
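One way to picture the feature vector is as a table of counters, one per (track, sector, relation) combination. The Python sketch below flattens that table into a single list; the flat layout, the zero-based indices, and the name `build_feature_vector` are assumptions for illustration and may differ from the file format the project actually writes.

```python
def build_feature_vector(pixel_features, n_tracks, n_sectors, n_relations=8):
    """Count how many pixels share each (track, sector, relation) triple.

    `pixel_features` is an iterable of zero-based (track, sector, relation) triples.
    """
    vector = [0] * (n_tracks * n_sectors * n_relations)
    for track, sector, relation in pixel_features:
        index = (track * n_sectors + sector) * n_relations + relation
        vector[index] += 1
    return vector


# Two pixels in the first track and first sector with relation 4,
# plus one pixel in track 1, sector 2 with relation 0.
features = [(0, 0, 4), (0, 0, 4), (1, 2, 0)]
vector = build_feature_vector(features, n_tracks=2, n_sectors=3)
```

Here the cell for relation 4 in the first track and sector ends up holding 2, matching the two-pixel example in the text.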
Actually, the only drawback that was solved was the scaling problem, and it was only partially solved (i.e. up-scaling only).
How do we overcome the up-scaling drawback of the chain code?
It has been solved by dividing the feature vector by the number of pixels in the object, so that the number of pixels (size of
character) has no effect on the feature vector.
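The normalization described above amounts to one division per cell. A minimal Python sketch (the name `normalize` is hypothetical):

```python
def normalize(vector, pixel_count):
    """Divide every count by the number of pixels in the object."""
    if pixel_count <= 0:
        raise ValueError("pixel_count must be positive")
    return [count / pixel_count for count in vector]


# The same shape at two sizes: the larger one has twice as many pixels per cell.
small = normalize([2, 0, 2], pixel_count=4)
large = normalize([4, 0, 4], pixel_count=8)
```

Both vectors come out identical, which is the sense in which the character's size no longer affects the feature vector.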
Classification
Classification is the process of identifying the unknown object
1. Neural networks
2. Support vector machines
3. K-nearest neighbor
4. Euclidean distance
In this article we discuss the Euclidean distance, which is a variation of k-NN.
Simply put, the Euclidean distance method calculates the distance between the relations
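A minimal Python sketch of classification by Euclidean distance, assuming each trained character is stored as one normalized feature vector in a dictionary keyed by its label. The names `euclidean_distance`, `classify`, and `models`, and the tiny three-cell vectors, are hypothetical and chosen only to keep the example readable.

```python
import math


def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def classify(unknown, models):
    """Return the label whose model vector is closest to `unknown`."""
    return min(models, key=lambda label: euclidean_distance(unknown, models[label]))


# Hypothetical trained models with made-up feature values.
models = {
    "A": [0.5, 0.5, 0.0],
    "B": [0.0, 0.5, 0.5],
}
label = classify([0.4, 0.6, 0.0], models)
```

This is nearest-neighbour matching with a single stored vector per class; both vectors must come from the same number of tracks and sectors, as the article notes for training and testing.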
However, for down-scaling the recognition rate is very poor.
When we tested our OCR on handwriting, there were two very important observations that affect the recognition rate:
1st, people tend to use different fonts than the one it was trained on.
2nd, objects with curves, ESPECIALLY characters like "C", "O" and "Q", were not recognized at all, perhaps because most people used
very bad handwriting.
The handwritten test data we used were from two different sources:
1. We asked people to draw the characters using Paintbrush, and we got very few volunteers; actually, one and
myself.
2. The 2nd source was handwritten letters on a piece of paper, and we had approximately 7 samples of 3 different handwritten
letters.
As for source number two, the recognition rate was poorer than expected; it was approximately 57% for most of the
samples.
The following is the training sample the engine was originally trained on:
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
Hussein El Saadi
Egypt