Creating Optical Character Recognition (OCR) Applications Using Neural Networks - CodeProject
How the use of a neural network can simplify the coding of OCR applications.
Introduction
Many people today are trying to write their own OCR (Optical Character Recognition) system or to improve the quality of an existing one.
This article shows how the use of an artificial neural network simplifies the development of an optical character recognition application while achieving high recognition quality and good performance.
Background
Developing a proprietary OCR system is a complicated task that requires a lot of effort. Such systems are usually complex and can hide a lot of logic behind the code. Using an artificial neural network in an OCR application can dramatically simplify the code and improve the quality of recognition while achieving good performance. Another benefit of using a neural network in OCR is extensibility: the ability of the system to recognize more character sets than initially defined. Most traditional OCR systems are not extensible enough. Why? Because a task such as working with tens of thousands of Chinese characters, for example, is not as easy as working with a 68-character English typed character set, and it can easily bring a traditional system to its knees!
Well, the Artificial Neural Network (ANN) is a wonderful tool that can help to resolve this kind of problem. The ANN is an information-processing paradigm inspired by the way the human brain processes information. Artificial neural networks are collections of mathematical models that represent some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of an ANN is its topology. The ANN consists of a large number of highly interconnected processing elements (nodes) that are tied together with weighted connections (links). Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true for an ANN as well. Learning typically occurs by example, through training or exposure to a set of input/output data (patterns), where the training algorithm adjusts the link weights. The link weights store the knowledge necessary to solve specific problems.
Although neural networks originated in the late 1950s, they did not gain much popularity until the 1980s, the era of the computer boom. Today ANNs are mostly used to solve complex real-world problems. They are often good at solving problems that are too complex for conventional technologies (e.g., problems that do not have an algorithmic solution or for which an algorithmic solution is too complex to be found) and are often well suited to problems that people are good at solving but traditional methods are not. They are good pattern recognition engines and robust classifiers, with the ability to generalize when making decisions based on imprecise input data. They offer good solutions to a variety of classification problems such as speech, character and signal recognition, as well as functional prediction and system modeling, where the physical processes are not understood or are highly complex. The advantage of ANNs lies in their resilience against distortions in the input data and their capability to learn.
https://www.codeproject.com/Articles/3907/Creating-Optical-Character-Recognition-OCR-appli 1/7
6/30/2017 Creating Optical Character Recognition (OCR) applications using Neural Networks - CodeProject
Let's assume that you have already gone through all the image pre-processing routines (resampling, deskewing, zoning, blocking, etc.) and that you already have images of the characters from your document. (In the example I simply generate those images.)
The nodes in the Backpropagation neural network are interconnected via weighted links, with each node usually connecting to the next layer up, until the output layer, which provides the output of the network. The input pattern values are presented and assigned to the input nodes of the input layer. The input values are initialized to values between -1 and 1. The nodes in the next layer receive the input values through links and compute output values of their own, which are then passed to the next layer. These values propagate forward through the layers until the output layer is reached or, put another way, until each output layer node has produced an output value for the network. The desired output for the input pattern is used to compute an error value for each node in the output layer. This error is then propagated backwards through the network (and here is where the network name comes from), and the delta rule is used to adjust the link weights so that the network output moves closer to the desired output. Once the error produced by the patterns in the training set is below a given tolerance, the training is complete. The network can then be presented with new input patterns and produce an output based on the experience it gained from the learning process.
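To make the weight adjustment concrete, here is a minimal sketch of the delta rule for a single linear output node. It is not the network used in this article: the class DeltaRuleSketch, its method names, and the learningRate parameter are assumptions chosen for illustration. A full backpropagation network would apply a similar update layer by layer, using the errors propagated back from the output layer.

```csharp
using System;

// Hypothetical illustration: one linear output node trained with the delta rule.
public static class DeltaRuleSketch
{
    // Forward pass: the node output is the weighted sum of its inputs.
    public static double Forward(double[] weights, double[] inputs)
    {
        double sum = 0.0;
        for (int i = 0; i < inputs.Length; i++)
        {
            sum += weights[i] * inputs[i];
        }
        return sum;
    }

    // One delta-rule step: each link weight moves in proportion to the
    // output error and to the input value that travelled over that link.
    // Returns the error measured before the weights were updated.
    public static double TrainStep(double[] weights, double[] inputs,
                                   double desired, double learningRate)
    {
        double error = desired - Forward(weights, inputs);
        for (int i = 0; i < inputs.Length; i++)
        {
            weights[i] += learningRate * error * inputs[i];
        }
        return error;
    }
}
```

With weights {0, 0}, inputs {1, -1}, a desired output of 1 and a learning rate of 0.25, the first call to TrainStep reports an error of 1 and the second an error of 0.5, so the error shrinks as the same pattern is presented repeatedly.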
I override the Train method of the base class to implement my own training method. Why? For one simple reason: the training progress of the network is measured by the quality of the produced result and the speed of training. You have to establish the criteria for when the quality of the network output is acceptable and you can stop the training process. The implementation I provide here has proven (in my experience) to be fast and accurate. I decided to stop the training process when the network is able to recognize all of the patterns without a single error. So, here is the implementation of my training method.
Also, I have implemented a BestNodeIndex property that returns the index of the output node that has the maximum value and, among equal values, the minimal error. An OutputPatternIndex method returns the index of the pattern output element that has a value of 1. If those indices match, the network has produced the correct result. Here is what the BestNodeIndex implementation looks like:
    public int BestNodeIndex
    {
        get
        {
            int result = -1;
            double aMaxNodeValue = 0;
            double aMinError = double.PositiveInfinity;
            for (int i = 0; i < this.OutputNodesCount; i++)
            {
                NeuroNode node = OutputNode(i);
                // Prefer a node with a larger value; when the value ties
                // with the current best, prefer the node with the smaller error.
                if ((node.Value > aMaxNodeValue) ||
                    ((node.Value >= aMaxNodeValue) && (node.Error < aMinError)))
                {
                    aMaxNodeValue = node.Value;
                    aMinError = node.Error;
                    result = i;
                }
            }
            return result; // -1 if no output node qualified
        }
    }
Creating an instance of the neural network is as simple as it gets. The network constructor takes one parameter: an integer array describing the number of nodes in each layer of the network. The first layer in the network is the input layer. The number of elements in this layer corresponds to the number of elements in the input pattern and is equal to the number of elements in the digitized image matrix (we will talk about it later). The network may have multiple middle layers with a different number of nodes in each layer. In this example I use only one middle layer and apply an unofficial rule of thumb to determine the number of nodes in this layer:
Note: You can experiment by adding more middle layers and using a different number of nodes in them, just to see how this affects the training speed and recognition quality of the network.
The last layer in the network is the output layer. This is the layer where we look for the results. I define the number of nodes in this layer to be equal to the number of characters that we are going to recognize.
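The three-layer layout described above can be sketched as an integer array of layer sizes. The class and method names below are hypothetical, and the middle-layer size uses the mean of the input and output counts purely as an assumed placeholder; it should not be read as the rule of thumb used in this article.

```csharp
using System;

// Hypothetical helper that builds the layer-size array for the constructor.
public static class LayerLayoutSketch
{
    // matrixDim: side length of the digitized image matrix.
    // charCount: number of characters the network should recognize.
    public static int[] BuildLayers(int matrixDim, int charCount)
    {
        int inputs = matrixDim * matrixDim;      // one input node per matrix element
        int middle = (inputs + charCount) / 2;   // ASSUMED rule: mean of inputs and outputs
        int outputs = charCount;                 // one output node per character
        return new[] { inputs, middle, outputs };
    }
}
```

For a 10 x 10 matrix and 26 characters this gives the layer sizes 100, 63 and 26.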
The Inputs array contains your input data. In our case it is a digitized representation of the character's image. By digitizing the image I mean creating a brightness map of the image (or a map of the absolute value of the color vector, whichever you choose). To create this map I split the image into squares and calculate the average value of each square. Then I store those values in the array.
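The split-and-average step can be sketched as follows. This is a minimal illustration, assuming the brightness values have already been read from the image into a two-dimensional array; the names DigitizeSketch, AverageBySquares and gridDim are hypothetical, and gridDim is assumed not to exceed the image width or height.

```csharp
using System;

// Hypothetical sketch of digitizing an image by averaging over a grid of squares.
public static class DigitizeSketch
{
    // brightness: pixel brightness values indexed as [row, column].
    // gridDim: number of squares along each side of the grid.
    // Returns gridDim * gridDim averages, stored row by row.
    public static double[] AverageBySquares(double[,] brightness, int gridDim)
    {
        int height = brightness.GetLength(0);
        int width = brightness.GetLength(1);
        double[] result = new double[gridDim * gridDim];

        for (int gy = 0; gy < gridDim; gy++)
        {
            for (int gx = 0; gx < gridDim; gx++)
            {
                // Pixel bounds of the current square.
                int y0 = gy * height / gridDim;
                int y1 = (gy + 1) * height / gridDim;
                int x0 = gx * width / gridDim;
                int x1 = (gx + 1) * width / gridDim;

                double sum = 0.0;
                int count = 0;
                for (int y = y0; y < y1; y++)
                {
                    for (int x = x0; x < x1; x++)
                    {
                        sum += brightness[y, x];
                        count++;
                    }
                }
                // An empty square (grid finer than the image) is stored as 0.
                result[gy * gridDim + gx] = count > 0 ? sum / count : 0.0;
            }
        }
        return result;
    }
}
```

The result is a flat array with one averaged value per square, which is the shape the Inputs array needs.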
I have implemented the CharToDoubleArray method of the network to digitize the image. There I use the absolute value of the color for each element of the matrix. (No doubt you can use other techniques there.) After the image is digitized, I have to scale down the results to fit them into the range -1..1 and comply with the input value range of the network. To do this I wrote a Scale method, where I look for the maximum element value of the matrix and then divide all elements of the matrix by it. So, the implementation of CharToDoubleArray looks like this:
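The scaling step on its own can be sketched as below. This is not the article's listing, only a minimal illustration of dividing by the maximum element. Two details are assumptions added here: the maximum is taken over absolute values so that negative inputs also land in -1..1, and an all-zero matrix is returned unchanged to avoid dividing by zero. The class name ScaleSketch is hypothetical.

```csharp
using System;

// Hypothetical sketch of scaling digitized values into the range -1..1.
public static class ScaleSketch
{
    public static double[] Scale(double[] values)
    {
        // Find the largest magnitude in the matrix (ASSUMPTION: absolute values).
        double max = 0.0;
        foreach (double v in values)
        {
            max = Math.Max(max, Math.Abs(v));
        }

        double[] scaled = new double[values.Length];
        if (max == 0.0)
        {
            return scaled; // ASSUMPTION: all-zero input stays all zero
        }

        // Divide every element by the maximum.
        for (int i = 0; i < values.Length; i++)
        {
            scaled[i] = values[i] / max;
        }
        return scaled;
    }
}
```

After scaling, the largest element has a magnitude of exactly 1 and every other element lies between -1 and 1.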
The Outputs array of the pattern represents the expected result: the result that the network will use during training. There are as many elements in this array as there are characters we are going to recognize. So, for instance, to teach the network to recognize English letters from A to Z we will need 26 elements in the Outputs array. Make it 52 if you decide to include lower case letters. Each element corresponds to a single letter. The Inputs of each pattern are set to the digitized image data, and the corresponding element in the Outputs array is set to 1, so the network will know which output (letter) corresponds to the input data. The method CreateTrainingPatterns does this job for me.
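The layout of the Outputs array can be sketched as follows. This is only an illustration of the one-element-per-letter encoding, not the CreateTrainingPatterns method itself; the class OutputPatternSketch and its methods OneHot and IndexOfOne are hypothetical names.

```csharp
using System;

// Hypothetical sketch of the expected-output encoding for one pattern.
public static class OutputPatternSketch
{
    // Builds an Outputs array with a single 1 at the position of the letter.
    // charIndex: zero-based position of the letter in the character set.
    // charCount: total number of characters to recognize.
    public static double[] OneHot(int charIndex, int charCount)
    {
        double[] outputs = new double[charCount]; // all elements start at 0
        outputs[charIndex] = 1.0;
        return outputs;
    }

    // Returns the index of the first element equal to 1, or -1 if there is none.
    public static int IndexOfOne(double[] outputs)
    {
        for (int i = 0; i < outputs.Length; i++)
        {
            if (outputs[i] == 1.0)
            {
                return i;
            }
        }
        return -1;
    }
}
```

For the letters A to Z, the pattern for 'C' would be OneHot('C' - 'A', 26), which has 26 elements with a 1 at index 2.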
Now that we have created the patterns, we can use them to train the neural network.
Normally, the execution flow will leave this method when training is complete, but in some cases it could stay there forever (!). The Train method is currently implemented relying on only one assumption: that network training will be completed sooner or later. Well, I admit that this is a wrong assumption and network training may never complete. The most common reasons for neural network training failure are:
2. The training patterns are not clear enough, are imprecise, or are too complicated for the network to differentiate them.
   Solution: Clean the patterns, or use a different type of network or training algorithm. Also, you cannot train the network to guess the next winning lottery numbers... :-)
3. Your training expectations are too high and/or not realistic.
   Solution: Lower your expectations. The network may never be 100% "sure".
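One way to keep a training loop from running forever is to cap the number of passes over the training set. The sketch below is an assumption-laden illustration, not the Train method from this article: the two callbacks stand in for "train on pattern i" and "does the network currently recognize pattern i", and the names TrainingLoopSketch, TrainUntilAllRecognized and maxPasses are hypothetical.

```csharp
using System;

// Hypothetical sketch of "train until every pattern is recognized", with a pass limit.
public static class TrainingLoopSketch
{
    // trainPattern(i): performs one training step on pattern i.
    // isRecognized(i): true if the network currently classifies pattern i correctly.
    // Returns the number of passes used, or -1 if maxPasses was reached first.
    public static int TrainUntilAllRecognized(int patternCount,
                                              Action<int> trainPattern,
                                              Func<int, bool> isRecognized,
                                              int maxPasses)
    {
        for (int pass = 1; pass <= maxPasses; pass++)
        {
            // Present every pattern once.
            for (int i = 0; i < patternCount; i++)
            {
                trainPattern(i);
            }

            // Count how many patterns are recognized after this pass.
            int good = 0;
            for (int i = 0; i < patternCount; i++)
            {
                if (isRecognized(i))
                {
                    good++;
                }
            }

            if (good == patternCount)
            {
                return pass; // every pattern recognized without a single error
            }
        }
        return -1; // gave up: training did not converge within maxPasses
    }
}
```

A return value of -1 tells the caller that training did not converge, which is the point at which cleaning the patterns or relaxing the stopping criterion becomes relevant.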
Most of those problems are easy to resolve, and that is a good subject for a future article. Meanwhile, we can enjoy the results.
To use the network, you have to load your data into the input layer. Then use the Run method to let the network process your data. Finally, get your results from the output nodes of the network and analyze them. (The BestNodeIndex property I created in the OCRNetwork class does this job for me.)
License
This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)
Article Copyright 2003 by Alex Cherkasov. Everything else Copyright CodeProject, 1999-2017. Last updated 2 Sep 2004.