Table of Contents
Abstract
Introduction
Existing system
Proposed system
Problem statement
Algorithms and techniques
Implementation
Model evaluation and validation
Conclusion
References
ABSTRACT
The human visual system is one of the wonders of the world. Consider the following
sequence of handwritten digits:
Most people effortlessly recognize those digits as 5, 6, 8. That ease is deceptive. We carry in
our heads a supercomputer, tuned by evolution over hundreds of millions of years, and
superbly adapted to understand the visual world. Recognizing handwritten digits isn't easy.
Rather, we humans are stupendously, astoundingly good at making sense of what our eyes
show us. But nearly all that work is done unconsciously. And so we don't usually appreciate
how tough a problem our visual systems solve.
The difficulty of visual pattern recognition becomes apparent if you attempt to write a
computer program to recognize digits like those above. What seems easy when we do it
ourselves suddenly becomes extremely difficult. Simple intuitions about how we recognize
shapes - "a 9 has a loop at the top, and a vertical stroke in the bottom right" - turn out to be
not so simple to express algorithmically. When you try to make such rules precise, you
quickly get lost in a morass of exceptions and caveats and special cases. It seems hopeless.
INTRODUCTION
amount and convert them into machine-encoded form. Its application is found in optical
character recognition, transcription of handwritten documents into digital documents and more
advanced intelligent character recognition systems.
The algorithm takes an image of a handwritten digit as input and outputs the likelihood that
the image belongs to each class (the machine-encoded digits 0–9). In this report, I will
elaborate on my approach to solving this problem with a combination of machine learning
techniques.
Existing System:
The existing system breaks each image into pixels and identifies the numbers from their
sequences of pixels. This makes identification very difficult, and the results vary in real
time.
No proper visualization.
Parallel processing is not possible in clustering analysis.
Mining techniques do not work well on big data when the analysis spans multiple columns.
Proposed System:
We're focusing on handwriting recognition because it's an excellent prototype
problem for learning about neural networks in general. As a prototype it hits a sweet spot: it's
challenging - it's no small feat to recognize handwritten digits - but it's not so difficult as to
require an extremely complicated solution, or tremendous computational power.
Advantages:
We propose a system that applies statistical analysis to sampled data.
It provides data visualization, which is not done in the big-data analysis.
Python has good graphical libraries.
The output is presented more effectively using Python's graphical libraries.
Problem Statement
Handwritten digits are not always of the same size, width or orientation, nor are they always
justified to the margins in the same way, because handwriting differs from person to person.
The general problem therefore arises when classifying the digits, owing to the similarity
between digits such as 1 and 7, 5 and 6, 3 and 8, 2 and 5, and 2 and 7. The problem becomes
more pronounced when many people write the same digit in a variety of handwriting styles.
Lastly, the uniqueness of each individual's handwriting also influences the formation and
appearance of the digits. We now introduce the concepts and algorithms of deep learning and
machine learning.
Algorithms and Techniques
It has been shown that Support Vector Machines (SVMs) can be applied to image and handwritten
character recognition [4]. SVMs are effective in high-dimensional spaces, so it makes sense to
use them for this study given the high dimensionality of our input space, i.e. 784 features.
However, SVMs do not scale well to large datasets, because the training time can grow up to
cubically with the size of the dataset. This could be an issue, as our dataset contains 42,000
samples, which is quite large. To deal with this issue, we adopt a technique proposed in a
study conducted at the University of California, Berkeley, which trains a support vector
machine on the collection of nearest neighbours, a solution the authors called “SVM-KNN” [2].
Training an SVM on the entire dataset is slow, and the extension of SVM to multiple classes is
not as natural as it is for Nearest Neighbor (NN). However, in the neighbourhood of a small
number of examples and a small number of classes, SVMs often perform better than other
classification methods.
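To give a feel for the scaling argument, the following back-of-envelope sketch compares abstract
operation counts. It assumes the worst-case cubic growth mentioned above and a hypothetical
neighbourhood of 3 examples; the counts are unitless illustrations, not measured running times.

```python
# Back-of-envelope scaling only: operation counts are abstract and unitless.
N_SAMPLES = 42_000    # dataset size stated in the text
N_FEATURES = 784      # input dimensionality stated in the text
K_NEIGHBOURS = 3      # hypothetical neighbourhood size


def training_cost(n, exponent=3):
    """Assumed SVM training cost of n**exponent (cubic is the worst case)."""
    return n ** exponent


def svm_knn_query_cost(n, d, k):
    """One SVM-KNN query: n distance computations over d features, then a k-point SVM."""
    return n * d + training_cost(k)


full_cost = training_cost(N_SAMPLES)
query_cost = svm_knn_query_cost(N_SAMPLES, N_FEATURES, K_NEIGHBOURS)

print(f"SVM on the full dataset (cubic case): {full_cost:.2e}")
print(f"one SVM-KNN query                   : {query_cost:.2e}")
print(f"queries before the costs are equal  : {full_cost // query_cost:,}")
```

Under these assumptions the local approach pays a per-query price instead of a large up-front
training cost, which is the trade-off the SVM-KNN technique relies on.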
We use NN as an initial pruning stage and apply an SVM to the smaller but more relevant set of
examples that require careful discrimination. This approach reflects the way humans perform
coarse categorization: when presented with an image, human observers can answer coarse
queries, such as the presence or absence of an animal, in as little as 150 ms and, of course,
can tell what animal it is given enough time [6]. This process of quick categorization,
followed by successively finer but slower discrimination, was the inspiration behind the
“SVM-KNN” technique.
Implementation
Our simple implementation of SVM-KNN goes as follows: for a query, we compute the Euclidean
distances from the query to all training examples and pick the K nearest neighbours. If the K
neighbours all have the same label, the query is given that label and we exit. Otherwise, we
compute the pairwise distances between the K neighbours, convert the distance matrix to a
kernel matrix and apply a multiclass SVM. Finally, we use the resulting classifier to label
the query.
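The steps above can be sketched in Python as follows. This is a minimal illustration, not the
project's model.py: it assumes NumPy and scikit-learn are available, the function names are
hypothetical, and the distance-to-kernel conversion uses double centering, which is one
standard choice and may differ from the conversion used in [2]. The query is appended to the
neighbours when the kernel is built so that the trained SVM can score it.

```python
import numpy as np
from sklearn.svm import SVC


def distances_to_kernel(sq_dists):
    """Turn a squared-distance matrix into a kernel (Gram) matrix by double centering."""
    n = sq_dists.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * centering @ sq_dists @ centering


def svm_knn_predict(X_train, y_train, query, k=3, C=1.0):
    """Label a single query vector with the SVM-KNN steps described in the text."""
    # Squared Euclidean distances from the query to every training example.
    # (Squaring does not change which neighbours are nearest.)
    query_dists = np.sum((X_train - query) ** 2, axis=1)

    # Keep the k nearest neighbours.
    nearest = np.argsort(query_dists)[:k]
    labels = y_train[nearest]

    # Unanimous neighbours: label the query and exit early.
    if np.all(labels == labels[0]):
        return labels[0]

    # Pairwise squared distances among the neighbours, with the query appended
    # as the last row so that it can be scored by the trained SVM.
    points = np.vstack([X_train[nearest], query])
    diffs = points[:, None, :] - points[None, :, :]
    kernel = distances_to_kernel(np.sum(diffs ** 2, axis=2))

    # Multiclass SVM trained on the neighbours only; the last kernel row is the query.
    local_svm = SVC(kernel="precomputed", C=C)
    local_svm.fit(kernel[:k, :k], labels)
    return local_svm.predict(kernel[k:, :k])[0]


# Toy 2-D data standing in for flattened digit images.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(svm_knn_predict(X_toy, y_toy, np.array([0.2, 0.1]), k=3))  # unanimous neighbours
print(svm_knn_predict(X_toy, y_toy, np.array([1.5, 1.5]), k=4))  # falls through to the SVM
```

Because a new SVM is trained for each ambiguous query, nothing is fitted ahead of time; the
cost is paid at prediction, and only for queries whose neighbours disagree.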
Model Evaluation and Validation
In our initial implementation, we extract 60 principal components and use parameter values of
k=2 for KNN and C=1.0 for SVM. We split the dataset into training and test sets, and during
development a validation set was used to evaluate the model. The final hyperparameters were
chosen because they performed best among the combinations tried: k=3 and C=0.5 yielded the
best results. A low k value makes sense for our model because we are trying to find the few
samples where NN has a hard time establishing a decision boundary and apply SVM to perform a
finer-grained classification.
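The selection procedure can be sketched as below. This is an illustration under stated
assumptions rather than the evaluation actually run: it uses scikit-learn's bundled 8x8 digits
(64 features) as a stand-in for the 42,000-sample, 784-feature dataset, a hypothetical grid of
k and C values, hypothetical split sizes, and a compact SVM-KNN variant that fits a linear SVM
directly on the neighbours.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC


def svm_knn_accuracy(X_train, y_train, X_eval, y_eval, k, C):
    """Accuracy of a compact SVM-KNN variant: a linear SVM on the k neighbours."""
    finder = NearestNeighbors(n_neighbors=k).fit(X_train)
    neighbour_ids = finder.kneighbors(X_eval, return_distance=False)
    correct = 0
    for ids, x, target in zip(neighbour_ids, X_eval, y_eval):
        labels = y_train[ids]
        if np.all(labels == labels[0]):
            prediction = labels[0]  # neighbours agree: no SVM needed
        else:
            local_svm = SVC(kernel="linear", C=C).fit(X_train[ids], labels)
            prediction = local_svm.predict(x[None, :])[0]
        correct += int(prediction == target)
    return correct / len(y_eval)


# Stand-in data, not the dataset used in the text.
X, y = load_digits(return_X_y=True)
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=0.25, random_state=0, stratify=y_dev)

# Fit PCA on the training split only, then project every split.
pca = PCA(n_components=60).fit(X_train)
Z_train, Z_val, Z_test = (pca.transform(part) for part in (X_train, X_val, X_test))

# Hypothetical grid; the text only names k=2, C=1.0 and k=3, C=0.5.
grid = [(k, C) for k in (2, 3, 5) for C in (0.5, 1.0)]
scores = {(k, C): svm_knn_accuracy(Z_train, y_train, Z_val, y_val, k, C)
          for k, C in grid}
best_k, best_C = max(scores, key=scores.get)

# The test split is touched once, after the hyperparameters are fixed.
test_accuracy = svm_knn_accuracy(Z_train, y_train, Z_test, y_test, best_k, best_C)
print(f"best k={best_k}, C={best_C}, validation accuracy={scores[(best_k, best_C)]:.4f}")
print(f"held-out test accuracy={test_accuracy:.4f}")
```

Choosing k and C on the validation split and reporting accuracy on the untouched test split
keeps the final figure from being biased by the search itself.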
CONCLUSION
The classification accuracy of 0.9714 is better than that of the benchmark (0.93514).
We can therefore conclude that our model is adequate for classifying handwritten digits in the
MNIST dataset, as it categorizes them with an accuracy quite close to that of humans. However,
our model is useful only in a limited domain. Some changes would have to be made to solve the
bigger problem of recognizing multiple digits in an image, or arbitrary multi-digit text in
unconstrained natural images.
Source Code
https://github.com/briceicle/capstone/blob/master/model.py
https://github.com/briceicle/capstone/tree/master/data
References
[1] http://yann.lecun.com/exdb/publis/pdf/matan-90.pdf
[2] http://www.vision.caltech.edu/Image_Datasets/Caltech101/nhz_cvpr06.pdf
[3] http://www.johnwinn.org/Publications/papers/WinnCriminisi_cvpr2006_video.pdf
[4] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.441.6897&rep=rep1&type=pdf
[5] https://en.wikipedia.org/wiki/Support_vector_machine#Applications
[6] https://www.ncbi.nlm.nih.gov/pubmed/8632824
[7] https://github.com/chefarov/ocr_mnist/blob/master/papers/knn_MNIST.pdf