Feature Extraction Phase
Feature extraction is the process of deriving the pertinent features from objects or characters in order to build feature vectors.
These feature vectors are then used by the classifier to map each input unit to the corresponding output unit. Well-chosen features make it considerably easier for the classifier to discriminate between dissimilar classes.
Several techniques have been proposed in the literature for extracting features from segmented characters. U. Pal et al. proposed directional chain-code features combined with zoning for handwritten numeral recognition, used a feature vector of length 100, and reported a high recognition accuracy. However, their feature extraction process is time-consuming and complex.
According to several authors, features fall into two major classes: statistical features and structural features.
Statistical features are obtained from the statistical distribution of the points in a character matrix; examples include zoning, moments, crossings, Fourier transforms and projection histograms. Statistical features are also known as global features, since they are usually averaged over and extracted from sub-images such as meshes. Statistical features were initially applied to the recognition of machine-printed characters.
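As an illustration, the following minimal sketch computes zoning features, i.e. the foreground-pixel density of each cell of a grid laid over the character matrix. The function name, the 4x4 grid and the random placeholder image are assumptions chosen for the example, not part of any specific system described here.

```python
import numpy as np

def zoning_features(char_img, zones=(4, 4)):
    """Zoning (pixel-density) features from a binary character matrix.

    char_img : 2-D array of 0/1 values (a hypothetical segmented character).
    zones    : grid used to partition the image; each zone yields one feature.
    """
    rows, cols = zones
    h, w = char_img.shape
    zh, zw = h // rows, w // cols
    features = []
    for r in range(rows):
        for c in range(cols):
            zone = char_img[r * zh:(r + 1) * zh, c * zw:(c + 1) * zw]
            # Density of foreground pixels in this zone (a statistical/global feature).
            features.append(zone.mean())
    return np.array(features)

# A 20x20 binary character split into a 4x4 grid gives a 16-element feature vector.
char = (np.random.rand(20, 20) > 0.5).astype(int)   # placeholder for a real character
print(zoning_features(char).shape)                   # (16,)
```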
Structural (or topological) features, on the other hand, relate to the geometry of the character set under consideration. Examples include convexities and concavities in the characters, the number of holes in the characters, the number of end points, and so on.
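One such structural feature, the number of holes, can be sketched as follows: a hole is counted as a background region that does not touch the image border (so 'O' has one hole, 'B' has two, 'L' has none). The function name is hypothetical and the approach is only one possible implementation.

```python
import numpy as np
from scipy import ndimage

def count_holes(char_img):
    """Count holes (enclosed background regions) in a binary character matrix."""
    background = char_img == 0
    labels, n = ndimage.label(background)
    # Background components that reach the border belong to the outside, not to holes.
    border_labels = (set(labels[0, :]) | set(labels[-1, :]) |
                     set(labels[:, 0]) | set(labels[:, -1]))
    holes = set(range(1, n + 1)) - border_labels
    return len(holes)
```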
Classification Phase
OCR systems make broad use of pattern-recognition methodology, which assigns each sample to a predefined class.
Classification is the procedure of assigning inputs, on the basis of the detected information, to their corresponding class, so that groups with homogeneous characteristics are formed while dissimilar inputs are separated into different classes. Classification is carried out on the basis of the stored features in the feature space, such as structural features, global features and so forth.
In effect, classification partitions the feature space into several classes according to a decision rule.
The choice of classifier depends on several factors, such as the number of free parameters and the available training set. Researchers have explored a variety of classification techniques for OCR; they can be categorized as template matching, statistical techniques, neural networks, kernel methods such as Support Vector Machine (SVM) algorithms, and combinations of classifiers.
Template Matching
This is the simplest method of character recognition, based on matching stored models against the word or character to be recognized. By comparing shapes, pixels, curvature and so forth, the matching operation determines the degree of similarity between two vectors. A grey-level or binary input character is compared with a standard set of stored models. The recognition rate of this approach is highly sensitive to noise and to deformation of the input.
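A minimal sketch of this idea, assuming size-normalised binary characters and a hypothetical dictionary of stored prototypes, could look as follows; the pixel-agreement score used here is only one possible similarity measure.

```python
import numpy as np

def match_template(char_img, templates):
    """Classify a binary character by matching it against stored templates.

    char_img  : 2-D 0/1 array, already size-normalised to the template shape.
    templates : dict mapping a class label to its stored 0/1 prototype
                (hypothetical models built beforehand, e.g. averaged samples).
    """
    best_label, best_score = None, -1.0
    for label, tmpl in templates.items():
        # Similarity = fraction of pixels that agree; this is what makes the
        # method sensitive to noise and deformation, as noted above.
        score = np.mean(char_img == tmpl)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```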
Statistical Techniques
Statistical decision theory deals with statistical decision functions and a set of optimality criteria that, for a given model of a particular class, maximize the probability of the observed pattern.
The main statistical methods applied in OCR are the Nearest Neighbour (NN) classifier, the likelihood or Bayes classifier, cluster analysis, Hidden Markov Modelling (HMM), fuzzy-set reasoning, and the quadratic classifier.
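As a simple example of the nearest-neighbour idea, the sketch below assigns an unknown character the label of the closest stored training vector under Euclidean distance; the function name and inputs (e.g. zoning feature vectors) are assumptions for illustration.

```python
import numpy as np

def nearest_neighbour(x, train_X, train_y):
    """1-NN classification: assign x the label of its closest training vector.

    x       : feature vector of the unknown character (e.g. zoning features).
    train_X : array of stored training feature vectors, one row per sample.
    train_y : array of class labels aligned with train_X.
    """
    distances = np.linalg.norm(train_X - x, axis=1)  # Euclidean distance to each sample
    return train_y[np.argmin(distances)]
```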
Neural Networks
The character classification problem is related to heuristic reasoning, since humans recognize characters and documents through learning and experience. Neural networks, which are largely heuristic in nature, are therefore well suited to this kind of problem.
A neural network is a computing architecture consisting of a massively parallel interconnection of adaptive node processors. The output of one node feeds into the next one in the network, and the final decision depends on the complex interaction of all nodes. Because of its parallel nature, a neural network can perform computations at a higher rate than traditional techniques.
Neural network architectures can be divided into feed-forward networks and feedback networks. The table compares and discusses some recently proposed OCR applications based on neural networks.
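The sketch below shows the forward pass of a small feed-forward network to make the "interconnection of adaptive nodes" concrete. The layer sizes (16 zoning features in, 32 hidden nodes, 26 output classes for A-Z) and random weights are assumptions; in practice the weights would be learned from labelled character samples, e.g. by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feed-forward network: 16 input features, 32 hidden nodes, 26 classes.
W1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 26)), np.zeros(26)

def forward(features):
    """Propagate a feature vector through the network; the final decision
    depends on the combined outputs (softmax) of all output nodes."""
    hidden = np.tanh(features @ W1 + b1)
    logits = hidden @ W2 + b2
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs.argmax(), probs   # predicted class index and class probabilities

# Random weights are used only to keep the sketch self-contained.
print(forward(rng.normal(size=16))[0])
```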
Kernel Methods
While the most important kernel method is the Support Vector Machine, techniques such as Kernel Fisher Discriminant Analysis (KFDA) and Kernel Principal Component Analysis (KPCA) also employ kernels.
Support Vector Machines (SVMs) are among the most widely used and most effective supervised learning techniques, and can be used for binary or multi-class classification.
In classification, the data set is conventionally partitioned into a training set and a test set. The objective of the SVM is to produce a model that predicts the output for the test set. The optimization criterion is the width of the margin between the classes, i.e. the empty region around the decision boundary defined by the distance to the nearest training examples.
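A minimal SVM sketch, using scikit-learn's bundled 8x8 digit images as a stand-in for segmented OCR characters, could look as follows; the RBF kernel and the specific hyperparameter values are assumptions chosen only for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Flattened pixel values of the bundled digit images act as the feature vectors.
digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# An RBF-kernel SVM: the margin around the decision boundary is maximised.
clf = svm.SVC(kernel="rbf", gamma=0.001, C=10.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```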
Combination Classifier
Different classification strategies have their own advantages and shortcomings, so several classifiers are often combined to solve a given classification problem, as in the sketch below.
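The following sketch combines a nearest-neighbour classifier, a small neural network and an SVM through majority voting, again using the bundled digit images as stand-in data; the choice of base classifiers and their parameters is an assumption for illustration only.

```python
from sklearn import datasets
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Hard (majority) voting over three dissimilar base classifiers.
ensemble = VotingClassifier(estimators=[
    ("knn", KNeighborsClassifier(n_neighbors=3)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    ("svm", SVC(kernel="rbf", gamma=0.001, C=10.0)),
])
ensemble.fit(X_train, y_train)
print("combined test accuracy:", ensemble.score(X_test, y_test))
```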