Synopsis Sample
On
SUBMITTED BY:
Table of Contents
Introduction to OCR
Objectives
Future scope
OPTICAL CHARACTER RECOGNITION (OCR)
Optical character recognition (OCR) is the mechanical or electronic conversion of
scanned images of handwritten, typewritten or printed text into machine-encoded text. It
is widely used as a form of data entry from some sort of original paper data source,
whether documents, sales receipts, mail, or any number of printed records. It is a common
method of digitizing printed texts so that they can be electronically searched, stored more
compactly, displayed on-line, and used in machine processes such as machine
translation, text-to-speech and text mining. OCR is a field of research in pattern
recognition, artificial intelligence and computer vision.
Early versions needed to be programmed with images of each character, and worked on
one font at a time. "Intelligent" systems with a high degree of recognition accuracy for
most fonts are now common. Some systems are capable of reproducing formatted output
that closely approximates the original scanned page, including images, columns and other
non-textual components.
Early optical character recognition can be traced to activity around two issues:
Expanding telegraphy
Creating reading devices for the blind
OCR Software
Desktop and server OCR software
WebOCR & OnlineOCR
Application-oriented OCR
Objectives
Problem identification:
Objectives:
Features and Advantages
The best part of the project is that it does not scan an image; rather, it takes as
input the sequence of mouse positions sampled as the text is being written.
The basic classification mechanism is generic: by constructing a suitable knowledge
base, we expect to recognize other scripts as well.
The project can be easily implemented using a basic DOS (.dat) file instead of a
heavier database, which reduces the size of the project considerably.
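As an illustration of the first point, the sketch below shows one way a sequence of
sampled mouse positions could be reduced to a small binary grid before classification.
It is only a sketch under stated assumptions: the class StrokeGrid, its Point type, the
toGrid method and the grid size are hypothetical names chosen for this example and are
not taken from the project itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns sampled pen positions into a small binary grid. */
class StrokeGrid {
    /** One sampled mouse position in pixel coordinates. */
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    /**
     * Scales the bounding box of the samples onto a rows x cols grid
     * and marks every cell that contains at least one sample.
     */
    static boolean[][] toGrid(List<Point> samples, int rows, int cols) {
        boolean[][] grid = new boolean[rows][cols];
        if (samples.isEmpty()) return grid;

        int minX = Integer.MAX_VALUE, maxX = Integer.MIN_VALUE;
        int minY = Integer.MAX_VALUE, maxY = Integer.MIN_VALUE;
        for (Point p : samples) {
            minX = Math.min(minX, p.x); maxX = Math.max(maxX, p.x);
            minY = Math.min(minY, p.y); maxY = Math.max(maxY, p.y);
        }
        // +1 so that a perfectly vertical or horizontal stroke does not divide by zero.
        int width = maxX - minX + 1;
        int height = maxY - minY + 1;

        for (Point p : samples) {
            int c = (p.x - minX) * cols / width;   // always in 0..cols-1
            int r = (p.y - minY) * rows / height;  // always in 0..rows-1
            grid[r][c] = true;
        }
        return grid;
    }

    public static void main(String[] args) {
        // A vertical stroke sampled at five positions, roughly like the letter "I".
        List<Point> stroke = new ArrayList<>();
        for (int y = 10; y <= 50; y += 10) stroke.add(new Point(30, y));
        for (boolean[] row : toGrid(stroke, 5, 3)) {
            StringBuilder sb = new StringBuilder();
            for (boolean cell : row) sb.append(cell ? '#' : '.');
            System.out.println(sb);
        }
    }
}
```

Because the grid is built from the bounding box of the samples, the same character
written at a different size or position on screen maps to a similar grid, which is one
reason such a representation may suit a generic classifier.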
Future Scope
Artificial intelligence, a very active topic of research nowadays, is a vast and
flexible field. Some future directions for the project are:
The project so far recognises only independent characters. A database of words can be
implemented as a dictionary that automatically finds the correct words.
Using dictionary words may improve the performance of the OCR system.
A version can also be implemented for classifying hand-written text.
OCR can also be implemented online.
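The dictionary idea above could be sketched as follows. This is a minimal illustration,
assuming that "finding the correct word" means picking the dictionary entry with the
smallest Levenshtein edit distance to the recognised string; the project does not
specify a matching method, and the class DictionaryCorrector and its methods are
hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical post-processing step: snap a recognised string to the nearest dictionary word. */
class DictionaryCorrector {
    /** Levenshtein edit distance (insertions, deletions, substitutions) between two strings. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;  // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Returns the dictionary word closest to the recognised string (the first word wins ties). */
    static String correct(String recognised, List<String> dictionary) {
        String best = recognised;  // returned unchanged if the dictionary is empty
        int bestDistance = Integer.MAX_VALUE;
        for (String word : dictionary) {
            int d = editDistance(recognised, word);
            if (d < bestDistance) { bestDistance = d; best = word; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("optical", "character", "recognition");
        // One misrecognised letter ("o" instead of "c") is repaired by the lookup.
        System.out.println(correct("charaoter", dictionary));  // prints "character"
    }
}
```

A linear scan like this is adequate for a small word list; a larger dictionary would
probably call for an index or a distance cut-off.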
Platform for development
The platforms used in the development of the project are:
JAVA: Java is an object-oriented language, which means that it is centered on creating,
manipulating and connecting objects, thus allowing modularization of the program. It
is easy to learn, incorporate and debug. It is also platform independent and can be
moved from one system to another without much difficulty. Java also simplifies
development of a graphical user interface (GUI). A GUI is essential because it
provides a user-friendly approach, whereas applications that require typed commands
are not easily accepted by the layman. Connectivity to a database is also very
straightforward, and the programs are robust and reliable.
As long as a computer has a Java VM (Virtual Machine), a Java program can run on
it, including machines running:
Windows 2000
Linux
Solaris
Mac OS
The Java platform differs from most other platforms in that it is a software-only
platform that runs on top of other hardware-based platforms. The Java platform has
two components:
Java Virtual Machine (Java VM)
Java Application Programming Interface (Java API)
Kohonen Neural Networks: The Kohonen neural network differs from a feedforward network
both in how it is trained and in how it recalls a pattern. The Kohonen neural network
does not use any sort of activation function, nor does it use any sort of bias weight.
Its output does not consist of the outputs of several neurons; instead, a single
output neuron is selected as the winner.
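The recall step described above can be sketched in a few lines of Java. This is a
simplified illustration rather than the project's implementation: it assumes the input
is normalised to unit length and that the winner is the output neuron with the largest
weighted sum, and the class KohonenRecall, its methods and the example weights are all
hypothetical. Training is not shown.

```java
/** Hypothetical sketch of Kohonen recall: one winning neuron, no bias, no activation function. */
class KohonenRecall {
    /** Scales a vector to unit length; a zero vector is returned unchanged. */
    static double[] normalize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) sumSq += x * x;
        double len = Math.sqrt(sumSq);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = len == 0.0 ? v[i] : v[i] / len;
        return out;
    }

    /** Index of the output neuron whose weight vector has the largest dot product with the input. */
    static int winner(double[][] weights, double[] input) {
        double[] x = normalize(input);
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int n = 0; n < weights.length; n++) {
            double score = 0.0;  // plain weighted sum of the inputs
            for (int i = 0; i < x.length; i++) score += weights[n][i] * x[i];
            if (score > bestScore) { bestScore = score; best = n; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two output neurons with made-up weights; each row is one neuron.
        double[][] weights = { {1.0, 0.0}, {0.0, 1.0} };
        System.out.println(winner(weights, new double[] {0.2, 0.9}));  // prints 1
    }
}
```

In a character recogniser built this way, each output neuron would stand for one
character of the knowledge base, so the index of the winner is the recognised character.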