1822 B.E. CSE Batch No. 4
Abstract:
• We present a camera-based label reader to help blind persons read the
names on product labels. The camera acts as the main vision sensor: it
captures the label image of the product or board, the image is processed
internally, and the label is separated from the image using the OpenCV
library; the product is then identified and the identified product name is
pronounced through voice output.
• The received label image is converted to text using the Tesseract library.
Once the identified label name has been converted to text, the converted
text is displayed on a display unit connected to the controller. The text is
then converted to voice using the Flite library, so that the label name can be
heard through earphones connected to the audio jack port.
Objective
• The objective is to develop a camera-based label reader that helps blind
persons read the names on product labels. The camera acts as the main
vision sensor: it captures the label image of the product or board, the image
is processed internally, and the label is separated from the image using the
OpenCV library; the product is then identified and its name is pronounced
through voice output.
• The received label image is converted to text using the Tesseract library,
the converted text is displayed on a display unit connected to the controller,
and the text is finally converted to voice using the Flite library so that the
label name can be heard through earphones connected to the audio jack
port.
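A minimal sketch of this pipeline is given below, assuming the pytesseract wrapper for the Tesseract engine and the pyttsx3 text-to-speech package in place of the Flite library; the file name label.jpg and all parameter values are placeholders rather than part of the actual system.

# Minimal sketch of the label-reading pipeline (assumptions noted above).
import cv2
import pytesseract
import pyttsx3

image = cv2.imread("label.jpg")                               # frame captured by the camera
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)                # simplify the image for OCR
gray = cv2.threshold(gray, 0, 255,
                     cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]  # separate label text from background

text = pytesseract.image_to_string(gray)                      # label image -> text
print(text)                                                   # text shown on the display unit

engine = pyttsx3.init()                                       # text -> voice through earphones
engine.say(text)
engine.runAndWait()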
Chapter 1
Introduction:
This project, ‘Handwritten Character Recognition’, is a software algorithm project
to recognize any handwritten character efficiently on a computer, with input
provided either as an old optical image or live through touch input, a mouse or a
pen. Character recognition, usually abbreviated to optical character recognition or
OCR, is the mechanical or electronic translation of images of handwritten,
typewritten or printed text (usually captured by a scanner) into machine-editable
text. It is a field of research in pattern recognition, artificial intelligence and
machine vision. Though academic research in the field continues, the focus of
character recognition has shifted to the implementation of proven techniques.
Optical character recognition is a scheme which enables a computer to learn,
understand, improvise and interpret written or printed characters in their own
language, and to present them as specified by the user. Optical character
recognition uses image processing techniques to identify any character, whether
computer/typewriter printed or handwritten. A lot of work has been done in this
field, but OCR techniques are being improved continuously, because an algorithm
must have higher recognition accuracy, higher persistency in the number of correct
predictions and lower execution time. The idea is to devise efficient algorithms
that take input in digital image format, process the image for better comparison,
compare the processed image with an already available set of font images, and
finally give a prediction of the character with a percentage accuracy.
Chapter 2
Literature survey:
1. Handwriting Recognition using Artificial Intelligence Neural Network and
Image Processing
Authors: Sara Aqab, Muhammad Usman Tariq
Abstract:
Due to the increased usage of digital technologies in all sectors and in almost all
day-to-day activities to store and pass information, handwriting character
recognition has become a popular subject of research. Handwriting remains
relevant, but people still want to have handwritten copies converted into electronic
copies that can be communicated and stored electronically. Handwriting character
recognition refers to the computer's ability to detect and interpret intelligible
handwriting input from sources such as touch screens, photographs, paper
documents, and other sources. Handwritten characters remain complex since
different individuals have different handwriting styles. This paper reports the
development of a handwriting character recognition system that will be used to
read students' and lecturers' handwritten notes. The development is based on an
artificial neural network, which is a field of study in artificial intelligence.
Different techniques and methods are used to develop handwriting character
recognition systems; however, few of them focus on neural networks. The use of
neural networks for recognizing handwritten characters is more efficient and
robust compared with other computing techniques. The paper also outlines the
methodology, design and architecture of the handwriting character recognition
system, along with the testing and results of the system development. The aim is to
demonstrate the effectiveness of neural networks for handwriting character
recognition.
2. HANDWRITTEN CHARACTER RECOGNITION
Author: CHANDAN KUMAR
Abstract: In today's world, advancement in sophisticated scientific techniques is
pushing further the limits of human outreach in various fields of technology. One
such field is character recognition, commonly known as OCR (Optical Character
Recognition). In this fast-paced world there is an immense urge for the
digitalization of printed documents and for documenting information directly in
digital form, and there is still some gap in this area even today. OCR techniques
and their continuous improvement over time are trying to fill this gap. This project
is about devising an algorithm for recognition of handwritten characters, also
known as HCR (Handwritten Character Recognition), leaving aside the types of
OCR that deal with recognition of computer- or typewriter-printed characters. A
novel technique for recognizing English-language characters using an artificial
neural network, including schemes for feature extraction of the characters, is
proposed and implemented. The persistency in recognition of characters by the
ANN was found to be more than 90%.
3. DIAGONAL BASED FEATURE EXTRACTION FOR HANDWRITTEN
ALPHABETS RECOGNITION SYSTEM USING NEURAL NETWORK
Authors: J. Pradeep, E. Srinivasan and S. Himavathi
Abstract: An off-line handwritten alphabetical character recognition system using
a multilayer feed-forward neural network is described in the paper. A new method,
called diagonal-based feature extraction, is introduced for extracting the features
of the handwritten alphabets. Fifty data sets, each containing 26 alphabets written
by various people, are used for training the neural network and 570 different
handwritten alphabetical characters are used for testing. The proposed recognition
system performs quite well yielding higher levels of recognition accuracy
compared to the systems employing the conventional horizontal and vertical
methods of feature extraction. This system will be suitable for converting
handwritten documents into structural text form and recognizing handwritten
names.
Chapter 3
Existing system
• Traditional methods such as Braille exist, with which blind people have to
trace and read text; this is very slow and not very practical.
• Existing OCR systems are not automatic and require full-fledged computers
to run, and hence are not effective.
• KReader Mobile runs on a cell phone and allows the user to read mail,
receipts, fliers and many other documents.
Proposed system
• A low-cost, automatic system for reading text books will be implemented
that not only converts printed books to digital text but also reads them out
as audio output.
• Our proposed algorithm can effectively handle complex backgrounds and
multiple patterns, and extract text information from both hand-held objects
and nearby signage.
Block diagram
Software required:
• Python
• OpenCV
OCR:
[Block diagram: illuminator → document detector → document image → OCR
hardware or software → document analysis → character recognition →
contextual processing → recognition results → to application/user]
In this section, we identify the audience who are interested in the product and are
involved in its implementation either directly or indirectly. From our research, the
OCR system is mainly useful in R&D at various scientific organizations, in
governmental institutes and in large business organizations, so we identify the
following as the various audiences interested in implementing an OCR system:
⮚ Scientists, research scholars and research fellows in telecommunication
institutions are interested in using an OCR system for processing the word
documents that contain the base papers for their research.
⮚ Librarians managing the information content of older books while building a
virtual digital library require the use of an OCR system.
⮚ Various sites that vend e-books have a huge requirement for an OCR system
in order to scan all their books into electronic format and thus make money.
Amazon's book business largely uses this concept to build its digital libraries.
Next, we present reading suggestions through which the user or client can better
understand the various phases of the product. These suggestions will be more
useful for beginners than for regular users of the product such as research scholars,
librarians and administrators of various web sites. With these suggestions, the user
need not waste time scrolling through documents, browsing the web or visiting
libraries in search of different books. The following reading suggestions will help
the user understand our product completely and save time:
⮚ It would help if you start with Wikipedia.com. It explains the basic concept
behind every keyword you require; first learn from it what OCR is and how it
works based on a grid infrastructure.
⮚ You can then proceed with the introduction to our product provided in our
documentation. From these two steps you get an in-depth idea of the use of our
product and the processes involved in it.
⮚ What remains is the implementation of the product. For this you can visit
FreeOCR.com, where you can view how a sample OCR works and try it yourself.
PROBLEM STATEMENT:
• Traditional methods such as Braille exist, with which blind people have to
trace and read text; this is very slow and not very practical.
• Existing OCR systems are not automatic and require full-fledged computers
to run, and hence are not effective.
• KReader Mobile runs on a cell phone and allows the user to read mail,
receipts, fliers and many other documents.
A feasibility study is a high-level capsule version of the entire system analysis and
design process. The study begins by clarifying the problem definition; feasibility is
about determining whether the system is worth doing. Once an acceptable problem
definition has been generated, the analyst develops a logical model of the system,
and a search for alternatives is analyzed carefully. There are three parts to a
feasibility study.
2.1 TECHNICAL FEASIBILITY
Evaluating technical feasibility is the trickiest part of a feasibility study, because at
this point in time not much detailed design of the system is available, making it
difficult to assess issues like performance and costs (on account of the kind of
technology to be deployed). A number of issues have to be considered while doing
a technical analysis. We must understand the different technologies involved in the
proposed system: before commencing the project we have to be very clear about
which technologies are required for the development of the new system. We must
also find out whether the organization currently possesses the required
technologies: is the required technology available within the organization?
⮚ Have the users been involved in the planning and development of the project?
Since the proposed system was to help reduce the hardships encountered in the
existing manual system, the new system was considered operationally feasible.
2.4 TRAINING
1. Unsupervised training
2. Supervised training
Supervised training provides the neural network with training sets and the
anticipated output. Unsupervised training supplies the neural network with
training sets, but there is no anticipated output provided.
The input patterns presented to the Kohonen neural network are the dot images
of the handwritten characters. We may then have 26 output neurons, which
correspond to the 26 letters of the English alphabet. The Kohonen neural
network should classify the input pattern into one of these 26 classes.
There are several popular training algorithms that make use of supervised
training. One of the most common is the back-propagation algorithm. It is also
possible to use an algorithm such as simulated annealing or a genetic algorithm to
implement supervised training.
The Kohonen neural network differs considerably from the feed-forward back
propagation neural network. The Kohonen neural network differs both in how it is
trained and how it recalls a pattern. The Kohonen neural network does not use
any sort of activation function. Further, the Kohonen neural network does not use
any sort of a bias weight.
Output from the Kohonen neural network does not consist of the output of
several neurons. When a pattern is presented to a Kohonen network one of the
output neurons is selected as a "winner". This "winning" neuron is the output
from the Kohonen network. Often these "winning" neurons represent groups in
the data that is presented to the Kohonen network. For example, in an OCR
program that uses 26 output neurons, the 26 output neurons map the input
patterns into the 26 letters of the Latin alphabet.
The most significant difference between the Kohonen neural network and the
feed forward back propagation neural network is that the Kohonen network
trained in an unsupervised mode. This means that the Kohonen network is
presented with data, but the correct output that corresponds to that data is not
specified. Using the Kohonen network this data can be classified into groups. We
will begin our review of the Kohonen network by examining the training process.
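The winner-take-all recall just described can be illustrated with a small sketch, assuming NumPy, a 35-pixel dot image as input and 26 output neurons; the random weights are placeholders standing in for weights obtained from unsupervised training.

# Kohonen-style recall (winner-take-all): no activation function and no bias,
# the winner is simply the output neuron whose weight vector is closest to
# the input pattern. Weights are random placeholders here.
import numpy as np

n_inputs, n_outputs = 35, 26
weights = np.random.rand(n_outputs, n_inputs)       # one weight vector per output neuron

def recall(pattern):
    distances = np.linalg.norm(weights - pattern, axis=1)
    return int(np.argmin(distances))                 # winning neuron index, 0..25

winner = recall(np.random.rand(n_inputs))            # hypothetical input pattern
print("winning neuron maps to letter:", chr(ord('A') + winner))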
A "feed forward" neural network is similar to the types of neural networks that
we are ready examined. Just like many other neural network types the feed
forward neural network begins with an input layer. This input layer must be
connected to a hidden layer. This hidden layer can then be connected to another
hidden layer or directly to the output layer. There can be any number of hidden
layers so long as at least one hidden layer is provided. In common use most neural
networks will have only one hidden layer. It is very rare for a neural network to
have more than two hidden layers. We will now examine, in detail, and the
structure of a "feed forward neural network".
A "feed forward" neural network differs from the neural networks previously
examined. Figure 2.1 shows a typical feed forward neural network with a single
hidden layer.
Figure 2 Feed Forward Neural Network
The input layer of the neural network is the conduit through which the external
environment presents a pattern to the neural network. Once a pattern is presented
to the input layer of the neural network, the output layer will produce another
pattern. In essence this is all the neural network does. The input layer should
represent the condition for which we are training the neural network. Every input
neuron should represent some independent variable that has an influence over the
output of the neural network.
It is important to remember that the inputs to the neural network are floating
point numbers. These values are expressed as the primitive Java data type
"double". This is not to say that you can only process numeric data with the
neural network. If you wish to process a form of data that is non-numeric you
must develop a process that normalizes this data to a numeric representation.
The output layer of the neural network is what actually presents a pattern to the
external environment. Whatever pattern is presented by the output layer can be
directly traced back to the input layer. The number of output neurons should be
directly related to the type of work that the neural network is to perform.
To choose the number of neurons to use in your output layer, you must consider
the intended use of the neural network. If the neural network is to be used to
classify items into groups, then it is often preferable to have one output neuron for
each group that an item may be assigned to. If the neural network is to perform
noise reduction on a signal, then it is likely that the number of input neurons will
match the number of output neurons. In this sort of neural network you would
want the patterns to leave the neural network in the same format as they entered.
For a specific example of how to choose the numbers of input and output neurons
consider a program that is used for optical character recognition, or OCR. To
determine the number of neurons used for the OCR example we will first consider
the input layer. The number of input neurons that we will use is the number of
pixels that might represent any given character. Characters processed by this
program are normalized to a universal size, represented by a 5×7 grid. A 5×7
grid contains a total of 35 pixels; the optical character recognition program
therefore has 35 input neurons.
The number of output neurons used by the OCR program will vary depending on
how many characters the program has been trained for. The default training file
that is provided with the optical character recognition program is trained to
recognize 26 characters. As a result using this file the neural network would have
26 output neurons. Presenting a pattern to the input neurons will fire the
appropriate output neuron that corresponds to the letter that the input pattern
corresponds to.
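The following sketch illustrates a forward pass through a network with the input and output sizes described above (35 input neurons for a 5×7 grid, 26 output neurons); the hidden-layer size, the sigmoid activation and the random weights are illustrative assumptions, not details taken from the program itself.

# Forward pass through a small feed-forward network: 35 inputs -> hidden -> 26 outputs.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hidden, n_out = 35, 50, 26
w1 = np.random.randn(n_in, n_hidden)       # input -> hidden weights (placeholder values)
w2 = np.random.randn(n_hidden, n_out)      # hidden -> output weights (placeholder values)

pixels = np.random.rand(n_in)              # a 5x7 character image flattened to 35 values
hidden = sigmoid(pixels @ w1)
output = sigmoid(hidden @ w2)
print("predicted letter:", chr(ord('A') + int(np.argmax(output))))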
2.6. VISION-BASED ASSISTIVE SYSTEM FOR LABEL DETECTION WITH VOICE
OUTPUT
A camera-based assistive text reading framework to help blind persons read text
labels and product packaging from hand-held objects in their daily lives is
proposed. To isolate the object from cluttered backgrounds or other surrounding
objects in the camera view, we propose an efficient and effective motion-based
method to define a region of interest (ROI) in the video by asking the user to
shake the object. In the extracted ROI, text localization and recognition are
conducted to acquire text information. To automatically localize the text regions
from the object ROI, we propose a novel text localization algorithm that learns
gradient features of stroke orientations and distributions of edge pixels in an
Adaboost model. Text characters in the localized text regions are then binarized
and recognized by off-the-shelf optical character recognition software. The
recognized text codes are output to blind users as speech.
I. INTRODUCTION
Of the 314 million visually impaired people worldwide, 45 million are blind.
Recent developments in computer vision, digital cameras and portable computers
make it feasible to assist these individuals by developing camera-based products
that combine computer vision technology with other existing commercial
products such as optical character recognition (OCR) systems. Reading is obviously
essential in today’s society. Printed text is everywhere in the form of reports,
receipts, bank statements, restaurant menus, classroom handouts, product
packages, instructions on medicine bottles, etc.
The ability of people who are blind or have significant visual impairments to read
printed labels and product packages will enhance independent living and foster
economic and social self-sufficiency. Today, there are already a few systems that
show some promise for portable use, but they cannot handle product labeling. The
framework has two main components: 1) selectively extracting the image of the
object held by the blind user from the cluttered background or other neutral
objects in the camera view; and 2) text localization to obtain image regions
containing text, and text recognition to transform image-based text information
into readable codes. We use a mini laptop as the processing device in our current
prototype system. The audio output component informs the blind user of the
recognized text codes.
The video is captured using a web-cam, and the frames from the video are
segregated and passed to pre-processing. First, objects are obtained continuously
from the camera and adapted for processing. Once the object of interest is
extracted from the camera image, it is converted into a gray image. A Haar
cascade classifier is used for recognizing the characters on the object. Working
with a cascade classifier involves two major stages: training and detection.
Training needs a set of samples, of which there are two types: positive and
negative.
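A minimal sketch of this detection stage is given below, assuming OpenCV's Python bindings; cascade.xml is a placeholder name for a cascade file produced by the training stage.

# Grab frames from the web-cam, convert each to gray, and run a trained
# Haar cascade classifier over the frame.
import cv2

cascade = cv2.CascadeClassifier("cascade.xml")      # trained from positive/negative samples
cap = cv2.VideoCapture(0)                           # default web-cam

while True:
    ok, frame = cap.read()                          # one frame from the video
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # gray image of the scene
    regions = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in regions:                    # mark each detected region
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()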
To extract the hand-held object of interest from other objects in the camera view,
ask users to shake the hand-held objects containing the text they wish to identify
and then employ a motion-based method to localize objects from cluttered
background.
In order to handle complex backgrounds, two novel feature maps extract text
features based on stroke orientations and edge distributions, respectively. Here, a
stroke is defined as a uniform region with bounded width and significant extent.
These feature maps are combined to build an Adaboost-based text classifier.
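As an illustration only, the sketch below uses scikit-learn's AdaBoostClassifier to stand in for the Cascade-Adaboost text classifier; the feature vectors and labels are random placeholders for the stroke-orientation and edge-distribution features described in the text.

# Train a boosted classifier to separate text patches from non-text patches.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

X = np.random.rand(200, 64)              # 200 image patches x 64 features (placeholder)
y = np.random.randint(0, 2, 200)         # 1 = text patch, 0 = non-text patch (placeholder)

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X, y)
print(clf.predict(X[:5]))                # classify the first few patches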
V. TEXT REGION LOCALIZATION
Text localization is then performed on the camera-based image. The Cascade-
Adaboost classifier confirms the existence of text information in an image patch,
but it cannot handle whole images, so heuristic layout analysis is performed to
extract candidate image patches prepared for text classification. Text information
in the image usually appears in the form of horizontal text strings containing no
fewer than three character members.
The recognized text codes are recorded in script files. Then, the Microsoft Speech
Software Development Kit is employed to load these files and produce the audio
output of the text information. Blind users can adjust speech rate, volume and tone
according to their preferences. The hardware components are designed to easily
interface with dedicated computer systems by using the same USB technology that
is found on most computers. Static random-access memory (SRAM) is a type of
semiconductor memory that uses bi-stable latching circuitry to store each bit.
ARM11 features include support for 4-64 KB cache sizes, the powerful ARMv6
instruction set architecture, SIMD (Single Instruction Multiple Data) media
processing extensions that deliver up to 2x performance for video processing, and
a high-performance 64-bit memory system that speeds data access for media
processing and networking applications. The LAN9512/LAN9512i contains an
integrated USB 2.0 hub, two integrated downstream USB 2.0 PHYs, an integrated
upstream USB 2.0 PHY, a 10/100 Ethernet PHY, a 10/100 Ethernet controller, a
TAP controller and an EEPROM controller. Flash memory is an electronic
non-volatile computer storage medium that can be electrically erased and
reprogrammed. Earphones either have wires for connection to a signal source such
as an audio amplifier, radio, CD player or portable media player, or have a wireless
receiver, which is used to pick up the signal without using a cable.
The proposed system reads printed text on hand-held objects to assist blind
persons. In order to solve the common aiming problem for blind users, a
motion-based method to detect the object of interest is used, in which the blind
user simply shakes the object for a couple of seconds. This method can effectively
distinguish the object of interest from the background or other objects in the
camera view. An Adaboost learning model is employed to localize text in
camera-based images. Off-the-shelf OCR is used to perform word recognition on
the localized text regions and transform it into audio output for blind users.
OCR
Optical character recognition is the translation of optically scanned bitmaps of
printed or written text into digitally editable data files. OCRs developed for many
world languages are already under efficient use.
There are five senses that provide information to humans for making everyday
decisions and out of these the senses of hearing and vision are the sharpest of all.
The auditory sense helps us recognize sounds and classify them. It is this sense
which tells us that the person on the phone is a friend because his voice is
recognizable. We can differentiate between an endless variety of sounds, voices,
utterances and put them in exactly the slots they belong to, animal sounds, musical
notes, wind swishing, the footsteps of a family member, all are within recognition
range of a person with a normal sense of hearing.
The other, more profound one is human vision, which allows us to identify a
known person in a crowd of unknowns merely by casting a cursory glance at them,
to pick an object that belongs to us from among a number of objects looking
exactly like ours, and to recognize a misspelt word in a sentence and unconsciously
correct it. The fact is that the human mind is capable of identifying an image based
on features spontaneously determined, not predefined or predetermined.
With the development of technology these human processes are imitated to create
intelligent machines, hence the immense growth of robotics and intelligent
decision-making systems; and yet the work done so far is not comparable to any
natural, involuntary human action or process. The hindrance is that it is not
practically possible to imitate all the functions of the human mind and make
computer vision as efficient and accurate as the human eye. Even though such a
possibility may be remote, efforts are consistently being made to bring machines
as close to it as possible.
In its own turn, document understanding is a vast and difficult area, for the focus
of research today lies in being able to make content-based searches which hope to
allow machines to look beyond the keywords, headings or mere topics to find a
piece of information. A far more streamlined field of document recognition and
understanding is optical character recognition, which attempts to identify a single
character from an optically read text image as part of a word that can then be used
to process further information. The area gains rising significance as, each day,
more and more information needs to be stored, processed and retrieved rather than
being keyed in from an already present printed or handwritten source.
[Figure: hierarchy of fields: Artificial Intelligence → Computer Vision →
Document Understanding & Recognition]
Character recognition is further classified into two types according to the manner
in which input is provided to the recognition engine. Considering figure 1.2, which
shows the classification hierarchy of character recognition, the two types are
magnetic ink character recognition (MICR) and optical character recognition
(OCR).
MICR is a unique technology that relies on recognizing text which has been
printed in special fonts with magnetic ink, usually containing iron oxide. As the
machine prepares to read the code, the printed characters become magnetized on
the paper with the north pole on the right of each MICR character, creating
recognizable waveforms and patterns that are captured and used for further
processing. The reading device is comparable to a tape recorder head that
recognizes the wave patterns of sound recorded on magnetic tape. The system has
been in efficient use for a long time in banks around the world to process checks,
as it gives high accuracy rates with relatively low chances of error. There are
special fonts for MICR, the most common being E-13B and CMC-7.
Many times we want to have an editable copy of text which we have in the form of
a hard copy, like a fax or pages from a book or a magazine. The system employs an
optical input device, usually a digital camera or a scanner, which passes the
captured images to a recognition system that, after passing them through a number
of processes, converts them to a soft copy such as an MS Word document.
When we scan a sheet of paper we reformat it from hard copy to a soft copy, which
we save as an image. The image can be handled as a whole but its text cannot be
manipulated separately. In order to be able to do so, we need to ask the computer
to recognize the text as such and to let us manipulate it as if it was a text in a word
document. The OCR application does that; it recognizes the characters and makes
the text editable and searchable, which is what we need. The technology has also
enabled such materials to be stored using much less storage space than the hard
copy materials. OCR technology has made a huge impact on the way information
is stored, shared and communicated.
OCRs meant for printed text recognition are generally more accurate and reliable
because the characters belong to standard font files and it is relatively easier to
match images with the ones present in the existing library. As far as hand writing
recognition is concerned the vast variety of human writing styles and customs
make the recognition task more challenging. Today we have OCRs for printed text
in Latin script as an everyday tool in offices while an OCR for hand writing is still
in the research and development stage to have more result accuracy.
Optical Character Recognition (OCR) is one of the most common and useful
applications of machine vision, which is a sub-class of artificial intelligence, and
has long been a topic of research, recently gaining even more popularity with the
development of prototype digital libraries which imply the electronic rendering of
paper or film based documents through an imaging process.
The history of OCR dates back to the early 1950s with the invention of
Gismo, a machine that could translate printed messages into machine codes for
computer processing. The product was the combined effort of David Shepard, a
cryptanalyst at the Armed Forces Security Agency (AFSA), and Harvey Cook. This
successful achievement was then followed by the construction of the world's first
OCR system, also by David Shepard, under his Intelligent Machines Research
Corporation. Shepard's customers included Reader's Digest, Standard Oil
Company of California for making credit card imprints, Ohio Bell Telephone
Company for a bill stub reader, and the U.S. Air Force for reading and transmitting
teletype messages.
The OCR process begins with the scanning and subsequent digital
reproduction of the text in the image. It involves the following discrete sub-
processes, as shown in figure 1.3.
4.7.7 Scanning
A flat-bed scanner is usually used at 300 dpi, which converts the printed
material on the page being scanned into a bitmap image.
The bitmap image of the text is analyzed for the presence of skew or
slant and consequently these are removed. Quite a lot of printed literature
has combinations of text and tables, graphs and other forms of illustrations. It
is therefore important that the text area is identified separately from the other
images and could be localized and extracted.
4.7.9 Pre-processing
In this phase several processes are applied to the text image, such as noise
and blur removal, binarization, thinning, skeletonization, edge detection and
some morphological processes, so as to get an OCR-ready image of the text
region which is free from noise and blur.
4.7.10 Segmentation
If the whole image consists of text only, the image is first segmented
into separate lines of text. These lines are then segmented into words and
finally words into individual letters. Once the individual letters are identified,
localized and segmented out in a text image it becomes a matter of choice of
recognition algorithm to get the text in the image into a text processor.
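A minimal sketch of the first segmentation step (splitting a text-only page into lines using a horizontal projection profile) is shown below, assuming OpenCV and NumPy; page.png is a hypothetical input, and word and character segmentation would repeat the same idea on vertical projections.

# Split a binarized page image into text lines using a horizontal projection profile.
import cv2
import numpy as np

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

row_ink = binary.sum(axis=1)             # amount of ink in each pixel row
has_text = row_ink > 0

lines, start = [], None
for y, ink in enumerate(has_text):
    if ink and start is None:
        start = y                        # first row of a new text line
    elif not ink and start is not None:
        lines.append(binary[start:y, :]) # rows start..y-1 form one text line
        start = None

print("found", len(lines), "text lines")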
4.7.11 Recognition
Fig. 1: Image of word (taken from IAM) and its transcription into digital text.
Get code and data
1. You need Python 3, TensorFlow 1.3, numpy and OpenCV installed
2. Get the implementation from GitHub: either take the code version this
article is based on, or take the newest code version if you can accept
some inconsistencies between article and code
Model Overview
We use a NN for our task. It consists of convolutional NN (CNN) layers, recurrent
NN (RNN) layers and a final Connectionist Temporal Classification (CTC) layer.
Fig. 2: Overview of the NN operations (green) and the data flow through the NN
(pink).
We can also view the NN in a more formal way as a function (see Eq. 1) which
maps an image (or matrix) M of size W×H to a character sequence (c1, c2, …)
with a length between 0 and L. As you can see, the text is recognized on character-
level, therefore words or texts not contained in the training data can be recognized
too (as long as the individual characters get correctly classified).
Operations
CNN: the input image is fed into the CNN layers. These layers are trained to
extract relevant features from the image. Each layer consists of three operations.
First, the convolution operation, which applies a filter kernel of size 5×5 in the first
two layers and 3×3 in the last three layers to the input. Then, the non-linear RELU
function is applied. Finally, a pooling layer summarizes image regions and outputs
a downsized version of the input. While the image height is downsized by 2 in
each layer, feature maps (channels) are added, so that the output feature map (or
sequence) has a size of 32×256.
RNN: the feature sequence contains 256 features per time-step, the RNN
propagates relevant information through this sequence. The popular Long Short-
Term Memory (LSTM) implementation of RNNs is used, as it is able to propagate
information through longer distances and provides more robust training-
characteristics than vanilla RNN. The RNN output sequence is mapped to a matrix
of size 32×80. The IAM dataset consists of 79 different characters, and one
additional character is needed for the CTC operation (the CTC blank label);
therefore there are 80 entries for each of the 32 time-steps.
CTC: while training the NN, the CTC is given the RNN output matrix and the
ground truth text and it computes the loss value. While inferring, the CTC is only
given the matrix and it decodes it into the final text. Both the ground truth text and
the recognized text can be at most 32 characters long.
Data
Input: it is a gray-value image of size 128×32. Usually, the images from the
dataset do not have exactly this size, therefore we resize it (without distortion) until
it either has a width of 128 or a height of 32. Then, we copy the image into a
(white) target image of size 128×32. This process is shown in Fig. 3. Finally, we
normalize the gray-values of the image which simplifies the task for the NN. Data
augmentation can easily be integrated by copying the image to random positions
instead of aligning it to the left or by randomly resizing the image.
Fig. 3: Left: an image from the dataset with an arbitrary size. It is scaled to fit the
target image of size 128×32, the empty part of the target image is filled with white
color.
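A sketch of this preprocessing step, assuming OpenCV and NumPy, is shown below; word.png is a hypothetical dataset image.

# Scale an image (keeping its aspect ratio) to fit into 128x32, paste it onto a
# white target image, then normalize the gray-values.
import cv2
import numpy as np

def preprocess(img, target_w=128, target_h=32):
    h, w = img.shape
    f = min(target_w / w, target_h / h)                  # scale factor without distortion
    new_w, new_h = max(1, int(w * f)), max(1, int(h * f))
    img = cv2.resize(img, (new_w, new_h))

    target = np.ones((target_h, target_w), dtype=np.uint8) * 255   # white canvas
    target[:new_h, :new_w] = img                          # copy, aligned to the left

    target = target.astype(np.float32)
    return (target - target.mean()) / (target.std() + 1e-8)        # normalize gray-values

sample = cv2.imread("word.png", cv2.IMREAD_GRAYSCALE)
net_input = preprocess(sample)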
CNN output: Fig. 4 shows the output of the CNN layers which is a sequence of
length 32. Each entry contains 256 features. Of course, these features are further
processed by the RNN layers, however, some features already show a high
correlation with certain high-level properties of the input image: there are features
which have a high correlation with characters (e.g. "e"), or with duplicate
characters (e.g. "tt"), or with character properties such as loops (as contained in
handwritten "l"s or "e"s).
Fig. 4: Top: 256 features per time-step are computed by the CNN layers. Middle:
input image. Bottom: plot of the 32nd feature, which has a high correlation with
the occurrence of the character "e" in the image.
RNN output: Fig. 5 shows a visualization of the RNN output matrix for an image
containing the text "little". The matrix shown in the top-most graph contains the
scores for the characters, including the CTC blank label as its last (80th) entry. The
other matrix entries, from top to bottom, correspond to the following characters:
" !"#&'()*+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
It can be seen that most of the time the characters are predicted exactly at the
position where they appear in the image (e.g. compare the position of the "i" in the
image and in the graph). Only the last character "e" is not aligned, but this is OK,
as the CTC operation is segmentation-free and does not care about absolute
positions. From the bottom-most graph showing the scores for the characters "l",
"i", "t", "e" and the CTC blank label, the text can easily be decoded: we just take
the most probable character from each time-step, which forms the so-called best
path, then we throw away repeated characters and finally all blanks: "l---ii--t-t--l-…-e"
→ "l---i--t-t--l-…-e" → "little".
Fig. 5: Top: output matrix of the RNN layers. Middle: input image. Bottom:
probabilities for the characters "l", "i", "t", "e" and the CTC blank label.
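The best-path decoding just described can be sketched as follows, assuming a NumPy matrix of shape 32×80 (32 time-steps, 79 characters plus the blank) and a string charList holding the 79 characters in matrix order.

# Greedy (best-path) CTC decoding: most probable label per time-step, then
# collapse repeated labels, then drop blanks.
import numpy as np

def best_path_decode(mat, charList):
    blank = len(charList)                    # the blank label is the last (80th) entry
    best = np.argmax(mat, axis=1)            # most probable label for each time-step
    out, prev = [], None
    for label in best:
        if label != prev and label != blank:
            out.append(charList[label])
        prev = label
    return "".join(out)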
Implementation using TF
The implementation consists of 4 modules:
We only look at Model.py, as the other source files are concerned with basic file
IO (DataLoader.py) and image processing (SamplePreprocessor.py).
CNN
For each CNN layer, create a kernel of size k×k to be used in the convolution
operation.
Then, feed the result of the convolution into the RELU operation and then again to
the pooling layer with size px×py and step-size sx×sy.
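A sketch of one such CNN layer in the TensorFlow 1.x API targeted by the article is shown below; the kernel size, feature-map counts and pooling parameters are illustrative values, not the exact ones used in the repository.

# One CNN layer: convolution with a k x k kernel, RELU, then pooling.
import tensorflow as tf

inputImgs = tf.placeholder(tf.float32, shape=(None, 128, 32, 1))   # batch of input images

def cnn_layer(pool_in, k, feat_in, feat_out, px, py, sx, sy):
    kernel = tf.Variable(tf.truncated_normal([k, k, feat_in, feat_out], stddev=0.1))
    conv = tf.nn.conv2d(pool_in, kernel, strides=(1, 1, 1, 1), padding='SAME')
    relu = tf.nn.relu(conv)
    return tf.nn.max_pool(relu, (1, px, py, 1), (1, sx, sy, 1), 'VALID')

# first layer: 5x5 kernel, 1 -> 32 feature maps, 2x2 pooling with step-size 2x2
pool1 = cnn_layer(inputImgs, k=5, feat_in=1, feat_out=32, px=2, py=2, sx=2, sy=2)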
RNN
Create and stack two RNN layers with 256 units each.
Then, create a bidirectional RNN from it, such that the input sequence is traversed
from front to back and the other way round. As a result, we get two output
sequences fw and bw of size 32×256, which we later concatenate along the feature-
axis to form a sequence of size 32×512. Finally, it is mapped to the output
sequence (or matrix) of size 32×80 which is fed into the CTC layer.
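The RNN part can be sketched as follows in TensorFlow 1.x; cnn_out is a placeholder standing in for the CNN output of shape (batch, 32, 256), and the final projection to 80 classes is done here with a dense layer for brevity.

# Two stacked LSTM layers with 256 units each, run bidirectionally, then
# mapped to 80 classes (79 characters + CTC blank) per time-step.
import tensorflow as tf

cnn_out = tf.placeholder(tf.float32, shape=(None, 32, 256))

cells = [tf.contrib.rnn.LSTMCell(num_units=256) for _ in range(2)]
stacked = tf.contrib.rnn.MultiRNNCell(cells)

(fw, bw), _ = tf.nn.bidirectional_dynamic_rnn(stacked, stacked, cnn_out,
                                              dtype=tf.float32)
rnn_out = tf.concat([fw, bw], axis=2)            # concatenate along the feature-axis: 32 x 512
logits = tf.layers.dense(rnn_out, 80)            # output sequence of size 32 x 80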
CTC
For loss calculation, we feed both the ground truth text and the matrix to the
operation. The ground truth text is encoded as a sparse tensor. The length of the
input sequences must be passed to both CTC operations.
We now have all the input data to create the loss operation and the decoding
operation.
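A sketch of the CTC loss and decoding operations in TensorFlow 1.x follows; the placeholders stand in for the RNN output, the sparse ground-truth text and the sequence lengths.

# CTC loss (training) and greedy decoding (inference) on the 32 x 80 matrix.
import tensorflow as tf

logits = tf.placeholder(tf.float32, shape=(None, 32, 80))   # RNN output (batch, time, classes)
ctc_in = tf.transpose(logits, [1, 0, 2])                    # CTC expects time-major input
gt_texts = tf.SparseTensor(tf.placeholder(tf.int64, shape=[None, 2]),  # ground truth as sparse tensor
                           tf.placeholder(tf.int32, [None]),
                           tf.placeholder(tf.int64, [2]))
seq_len = tf.placeholder(tf.int32, [None])                  # input sequence lengths (here always 32)

loss = tf.reduce_mean(tf.nn.ctc_loss(labels=gt_texts, inputs=ctc_in,
                                     sequence_length=seq_len,
                                     ctc_merge_repeated=True))
decoder = tf.nn.ctc_greedy_decoder(inputs=ctc_in, sequence_length=seq_len)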
Training
The mean of the loss values of the batch elements is used to train the NN: it is fed
into an optimizer such as RMSProp.
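Continuing the sketch above, the training step could look like this; the learning rate is a placeholder value.

# Minimize the mean batch loss with RMSProp.
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # sess.run(optimizer, feed_dict={...})   # feed images, ground-truth texts and sequence lengths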
Fig. 6: A complete text-line can be fed into the NN if its input size is increased
(image taken from IAM).
If you want to improve the recognition accuracy, further steps such as the data
augmentation mentioned above can be applied.
Chapter 4
History
Python was conceived in the late 1980s, and its implementation began in
December 1989 [28] by Guido van Rossum at Centrum Wiskunde & Informatica
(CWI) in the Netherlands as a successor to the ABC language (itself inspired by
SETL) capable of exception handling and interfacing with the operating system
Amoeba. Van Rossum is Python's principal author, and his continuing central role
in deciding the direction of Python is reflected in the title given to him by the
Python community, benevolent dictator for life (BDFL).
About the origin of Python, Van Rossum wrote in 1996: Over six years ago, in
December 1989, I was looking for a "hobby" programming project that would keep
me occupied during the week around Christmas. My office ... would be closed, but
I had a home computer, and not much else on my hands. I decided to write an
interpreter for the new scripting language I had been thinking about lately: a
descendant of ABC that would appeal to Unix/C hackers. I chose Python as a
working title for the project, being in a slightly irreverent mood (and a big fan of
Monty Python's Flying Circus).
Python 2.0 was released on 16 October 2000 and had many major new features,
including a cycle-detecting garbage collector and support for Unicode. With this
release the development process was changed and became more transparent and
community-backed. Python 3.0 (which early in its development was commonly
referred to as Python 3000 or py3k), a major, backwards-incompatible release, was
released on 3 December 2008 after a long period of testing. Many of its major
features have been backported to the backwards-compatible Python 2.6.x and 2.7.x
version series. The End Of Life date (EOL, sunset date) for Python 2.7 was
initially set at 2015, then postponed to 2020 out of concern that a large body of
existing code cannot easily be forward-ported to Python 3.
This design of a small core language with a large standard library and an easily
extensible interpreter was intended by Van Rossum from the start because of his
frustrations with ABC, which espoused the opposite mindset. While offering
choice in coding methodology, the Python philosophy rejects exuberant syntax,
such as in Perl, in favor of a sparser, less-cluttered grammar. As Alex Martelli put
it: "To describe something as clever is not considered a compliment in the Python
culture." Python's philosophy rejects the Perl "there is more than one way to do it"
approach to language design in favor of "there should be one—and preferably only
one—obvious way to do it". Python's developers strive to avoid premature
optimization, and moreover, reject patches to non-critical parts of CPython that
would offer a marginal increase in speed at the cost of clarity. When speed is
important, a Python programmer can move time-critical functions to extension
modules written in languages such as C, or try using PyPy, a just-in-time compiler.
Cython is also available, which translates a Python script into C and makes direct
C-level API calls into the Python interpreter. An important goal of Python's
developers is making it fun to use. This is reflected in the origin of the name,
which comes from Monty Python, and in an occasionally playful approach to
tutorials and reference materials, such as using examples that refer to spam and
eggs instead of the standard foo and bar. A common neologism in the Python
community is pythonic, which can have a wide range of meanings related to
program style.
To say that code is pythonic is to say that it uses Python idioms well, that it is
natural or shows fluency in the language, that it conforms with Python's minimalist
philosophy and emphasis on readability. In contrast, code that is difficult to
understand or reads like a rough transcription from another programming language
is called unpythonic. Users and admirers of Python, especially those considered
knowledgeable or experienced, are often referred to as Pythonists, Pythonistas, and
Pythoneers.
Syntax and semantics
Python is intended to be a highly readable language. It is designed to have an
uncluttered visual layout, often using English keywords where other languages use
punctuation. Python doesn't have semicolons and curly brackets "{}" which is
different compared to most of the programming language. Further, Python has
fewer syntactic exceptions and special cases than C or Pascal. Indentation Python
uses whitespace indentation to delimit blocks – rather than curly braces or
keywords. An increase in indentation comes after certain statements; a decrease in
indentation signifies the end of the current block. This feature is also sometimes
termed the off-side rule. Statements and control flow Python's statements include
(among others): The assignment statement (token '=', the equals sign). This
operates differently than in traditional imperative programming languages, and this
fundamental mechanism (including the nature of Python's version of variables)
illuminates many other features of the language.
However at a given time a name will be bound to some object, which will have a
type; thus there is dynamic typing. The if statement, which conditionally executes
a block of code, along with else and elif (a contraction of else-if). The for
statement, which iterates over an iterable object, capturing each element to a local
variable for use by the attached block. The while statement, which executes a
block of code as long as its condition is true. The try statement, which allows
exceptions raised in its attached code block to be caught and handled by except
clauses; it also ensures that clean-up code in a finally block will always be run
regardless of how the block exits. The class statement, which executes a block of
code and attaches its local namespace to a class, for use in object-oriented
programming. The def statement, which defines a function or method. The with
statement (from Python 2.5), which encloses a code block within a context
manager (for example, acquiring a lock before the block of code is run and
releasing the lock afterwards, or opening a file and then closing it), allowing
Resource Acquisition Is Initialization (RAII)-like behavior.
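A short, illustrative example showing several of these statements together (the function, file name and values are hypothetical):

def classify(scores):                            # def: define a function
    best = None
    for name, value in scores.items():           # for: iterate over an iterable
        if best is None or value > scores[best]: # if/else: conditional execution
            best = name
    return best

try:                                              # try/except: handle exceptions
    with open("results.txt", "w") as f:           # with: a context manager closes the file
        f.write(classify({"A": 0.2, "B": 0.7}))
except OSError as err:
    print("could not write file:", err)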
Before version 3.0, Python had two kinds of classes: old-style and new-style. The
syntax of both styles is the same, the difference being whether the class object is
inherited from, directly or indirectly (all new-style classes inherit from object and
are instances of type). In versions of Python 2 from Python 2.2 onwards, both
kinds of classes can be used. Old-style classes were eliminated in Python 3.0. The
long term plan is to support gradual typing and as of Python 3.5, the syntax of the
language allows specifying static types but they are not checked in the default
implementation, CPython. An experimental optional static type checker named
mypy supports compile-time type checking.
Other shells add abilities beyond those in the basic interpreter, including IDLE and
IPython. While generally following the visual style of the Python shell, they
implement features like auto-completion, session state retention, and syntax
highlighting. In addition to standard desktop integrated development environments
(Python IDEs), there are also web browser-based IDEs, SageMath (intended for
developing science and math-related Python programs), and a browser-based IDE
and hosting environment, PythonAnywhere. Additionally, the Canopy IDE is also
an option for writing Python programs. Implementations Reference
implementation The main Python implementation, named CPython, is written in C
meeting the C89 standard. It compiles Python programs into intermediate
bytecode, which is executed by the virtual machine. CPython is distributed with a
large standard library written in a mixture of C and Python. It is available in
versions for many platforms, including Windows and most modern Unix-like
systems. CPython was intended from almost its very conception to be cross-
platform. Other implementations PyPy is a fast, compliant interpreter of Python
2.7 and 3.5. Its just-in-time compiler brings a significant speed improvement over
CPython. A version taking advantage of multi-core processors using software
transactional memory is being created. Stackless Python is a significant fork of
CPython that implements microthreads; it does not use the C memory stack, thus
allowing massively concurrent programs.
The Nokia N900 also supports Python with GTK widget libraries, with the feature
that programs can be both written and run on the target device. [96]
Cross-compilers to other languages
There are several compilers to high-level object languages, with either unrestricted
Python, a restricted subset of Python, or a language similar to Python as the source
language: Jython compiles into Java byte code, which can then be executed by
every Java virtual machine implementation. This also enables the use of Java class
library functions from the Python program. IronPython follows a similar approach
in order to run Python programs on the .NET Common Language Runtime. The
RPython language can be compiled to C, Java bytecode, or Common Intermediate
Language, and is used to build the PyPy interpreter of Python. Pyjamas compiles
Python to JavaScript. Cython compiles Python to C and C++. Pythran compiles
Python to C++. The somewhat dated Pyrex (latest release in 2010) and Shed Skin
(latest release in 2013) compile to C and C++ respectively. Google's Grumpy
compiles Python to Go.
Performance
A performance comparison of various Python implementations on a non-numerical
(combinatorial) workload was presented at EuroSciPy '13.
Development
Python's development is conducted largely through the Python Enhancement Proposal
(PEP) process. The PEP process is the primary mechanism for proposing major
new features, for collecting community input on an issue, and for documenting the
design decisions that have gone into Python.
Outstanding PEPs are reviewed and commented upon by the Python community
and by Van Rossum, the Python project's benevolent dictator for life. [98]
Enhancement of the language goes along with development of the CPython
reference implementation. The mailing list python-dev is the primary forum for
discussion about the language's development; specific issues are discussed in the
Roundup bug tracker maintained at python.org. Development took place on a
self-hosted source code repository running Mercurial, until Python moved to
GitHub in January 2017. CPython's public releases come in three types,
distinguished by which part of the version number is incremented: Backwards-
incompatible versions, where code is expected to break and must be manually
ported. The first part of the version number is incremented. These releases happen
infrequently—for example, version 3.0 was released 8 years after 2.0. Major or
"feature" releases, which are largely compatible but introduce new features. The
second part of the version number is incremented. These releases are scheduled to
occur roughly every 18 months, and each major version is supported by bugfixes
for several years after its release. Bugfix releases, which introduce no new features
but fix bugs. The third and final part of the version number is incremented. These
releases are made whenever a sufficient number of bugs have been fixed upstream
since the last release, or roughly every 3 months. Security vulnerabilities are also
patched in bugfix releases. Many alpha,
beta, and release-candidates are also released as previews and for testing before
final releases. Although there is a rough schedule for each release, this is often
pushed back if the code is not ready.
The development team monitors the state of the code by running the large unit test
suite during development, and using the BuildBot continuous integration system.
The community of Python developers has also contributed over 86,000 software
modules (as of 20 August 2016) to the Python Package Index (PyPI), the official
repository of third-party libraries for Python. The major academic conference on
Python is named PyCon. There are special mentoring programmes like the
Pyladies.
Naming
Python's name is derived from the television series Monty
Python's Flying Circus, and it is common to use Monty Python references in
example code. For example, the metasyntactic variables often used in Python
literature are spam and eggs, instead of the traditional foo and bar. Also, the
official Python documentation and many code examples often contain various
obscure Monty Python references. The prefix Py- is used to show that something
is related to Python.
Examples of the use of this prefix in names of Python applications or libraries
include Pygame, a binding of SDL to Python (commonly used to create games);
PyS60, an implementation for the Symbian S60 operating system; PyQt and
PyGTK, which bind Qt and GTK to Python respectively; and PyPy, a Python
implementation originally written in Python.
Uses
Since 2003, Python has
consistently ranked in the top ten most popular programming languages as
measured by the TIOBE Programming Community Index. As of March 2017, it is
the fifth most popular language. It was ranked as Programming Language of the
Year for the year 2007 and 2010. It is the third most popular language whose
grammatical syntax is not predominantly based on C, e.g. C++, Objective-C (note,
C# and Java only have partial syntactic similarity to C, such as the use of curly
braces, and are closer in similarity to each other than C). An empirical study found
scripting languages (such as Python) more productive than conventional languages
(such as C and Java) for a programming problem involving string manipulation
and search in a dictionary. Memory consumption was often "better than Java and
not much worse than C or C++". Large organizations that make use of Python
include Wikipedia, Google, Yahoo!, CERN, NASA, and some smaller entities
like ILM, and ITA.
The social news networking site Reddit is written entirely in Python. Python can
serve as a scripting language for web applications, e.g., via mod_wsgi for the
Apache web server. With Web Server Gateway Interface, a standard API has
evolved to facilitate these applications. Web frameworks like Django, Pylons,
Pyramid, TurboGears, web2py, Tornado, Flask, Bottle and Zope support
developers in the design and maintenance of complex applications. Pyjamas and
IronPython can be used to develop the client-side of Ajax-based applications.
SQLAlchemy can be used as data mapper to a relational database. Twisted is a
framework to program communications between computers, and is used (for
example) by Dropbox. Libraries like NumPy, SciPy and Matplotlib allow the
effective use of Python in scientific computing, [120][121] with specialized
libraries such as Biopython and Astropy providing domain-specific functionality.
SageMath is a mathematical software with a "notebook" programmable in Python:
its library covers many aspects of
mathematics, including algebra, combinatorics, numerical mathematics, number
theory, and calculus. The Python language re-implemented in Java platform is used
for numeric and statistical calculations with 2D/3D visualization by the DMelt
project. Python has been successfully embedded in many software products as a
scripting language, including in finite element method software such as Abaqus,
3D parametric modeler like FreeCAD, 3D animation packages such as 3ds Max,
Blender, Cinema 4D, Lightwave, Houdini, Maya, modo, MotionBuilder,
Softimage, the visual effects compositor Nuke, 2D imaging programs like GIMP,
Inkscape, Scribus and Paint Shop Pro, and musical notation programs like
scorewriter and capella. GNU Debugger uses Python as a pretty printer to show
complex structures such as C++ containers. Esri promotes Python as the best
choice for writing scripts in ArcGIS. It has also been used in several video games,
and has been adopted as first of the three available programming languages in
Google App Engine, the other two being Java and Go.
Python is also used in algorithmic trading and quantitative finance. Python can also
be implemented in APIs of online brokerages that run on other languages by using
wrappers. Python has been used in artificial intelligence tasks. As a scripting
language with module architecture, simple syntax and rich text processing tools,
Python is often used for natural language processing tasks. Many operating
systems include Python as a standard component; the language ships with most
Linux distributions, AmigaOS 4, FreeBSD, NetBSD, OpenBSD and macOS, and
can be used from the terminal. Many Linux distributions use installers written in
Python: Ubuntu uses the Ubiquity installer, while Red Hat Linux and Fedora use
the Anaconda installer. Gentoo Linux uses Python in its package management
system, Portage. Python has also seen extensive use in the information security
industry, including in exploit development. Most of the Sugar software for the One
Laptop per Child XO, now developed at Sugar Labs, is written in Python. The
Raspberry Pi single-board computer project has adopted Python as its main user-
programming language. LibreOffice includes Python and intends to replace Java
with Python. Python Scripting Provider is a core feature since Version 4.0 from 7
February 2013.
OpenCV:
OpenCV (Open Source Computer Vision) is a library of programming functions
mainly aimed at real-time computer vision. Originally developed by Intel, it was
later supported by Willow Garage and is now maintained by Itseez. The library is
cross-platform and free for use under the open-source BSD license.
History
Officially launched in 1999, the OpenCV project was initially an Intel
Research initiative to advance CPU intensive applications, part of a series of
projects including real-time ray tracing and 3D display walls. The main
contributors to the project included a number of optimization experts in Intel
Russia, as well as Intel’s Performance Library Team. In the early days of OpenCV,
the goals of the project were described as:
● Advance vision research by providing not only open but also optimized code for
basic vision infrastructure. No more reinventing the wheel.
● Disseminate vision knowledge by providing a common infrastructure that
developers could build on, so that code would be more readily readable and
transferable.
● Advance vision-based commercial applications by making portable,
performance-optimized code available for free, with a license that did not require
code to be open or free itself.
The first alpha version of OpenCV was released to the public at the IEEE
Conference on Computer Vision and Pattern Recognition in 2000, and five betas
were released between 2001 and 2005. The first 1.0 version was released in 2006.
In mid-2008, OpenCV obtained corporate support from Willow Garage, and is now
again under active development. A version 1.1 "pre-release" was released in
October 2008. The second major release of OpenCV was in October 2009.
OpenCV 2 includes major changes to the C++ interface, aiming at easier, more
type-safe patterns, new functions, and better implementations for existing ones in
terms of performance (especially on multi-core systems). Official releases now
occur every six months and development is now done by an independent Russian
team supported by commercial corporations. In August 2012, support for OpenCV
was taken over by a non-profit foundation OpenCV.org, which maintains a
developer and user site.
Applications:
OpenCV's application areas include:
● 2D and 3D feature toolkits
● Egomotion estimation
● Facial recognition system
● Gesture recognition
● Human–computer interaction (HCI)
● Mobile robotics
● Motion understanding
● Object identification
● Segmentation and recognition
● Stereopsis stereo vision: depth perception from two cameras
● Structure from motion (SFM)
● Motion tracking
● Augmented reality
To support some of the above areas, OpenCV includes a statistical machine
learning library that contains:
● Boosting
● Decision tree learning
● Gradient boosting trees
● Expectation-maximization algorithm
● k-nearest neighbor algorithm
● Naive Bayes classifier
● Artificial neural networks
● Random forest
● Support vector machine (SVM)
Programming language:
OpenCV is written in C++ and its primary interface is in
C++, but it still retains a less comprehensive though extensive older C interface.
There are bindings in Python, Java and MATLAB/OCTAVE. The API for these
interfaces can be found in the online documentation. Wrappers in other languages
such as C#, Perl, Haskell and Ruby have been developed to encourage adoption
by a wider audience. All of the new developments and algorithms in OpenCV are
now developed in the C++ interface.
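As a brief illustration of how these bindings look in practice, the sketch below
trains a tiny SVM through OpenCV's Python interface (the cv2.ml module). The toy
samples, labels and parameter values are illustrative assumptions only, not part of
this project's pipeline.

import numpy as np
import cv2

# Toy training data: four 2-D points belonging to two classes (illustrative only)
samples = np.array([[1.0, 1.0], [1.5, 1.2], [8.0, 8.0], [8.5, 7.5]], dtype=np.float32)
labels = np.array([0, 0, 1, 1], dtype=np.int32)

# Create and configure an SVM through the cv2.ml module
svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)   # standard C-support vector classification
svm.setKernel(cv2.ml.SVM_RBF)   # radial basis function kernel
svm.setC(1.0)
svm.setGamma(0.5)

# Train on row-major samples, then predict the class of a new point
svm.train(samples, cv2.ml.ROW_SAMPLE, labels)
_, result = svm.predict(np.array([[8.2, 8.1]], dtype=np.float32))
print(result)   # expected to be class 1 for this toy data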
Hardware acceleration:
If the library finds Intel's Integrated Performance Primitives
on the system, it will use these proprietary optimized routines to accelerate itself. A
CUDA-based GPU interface has been in progress since September 2010. An
OpenCL-based GPU interface has been in progress since October 2012;
documentation for version 2.4.9.0 can be found at docs.opencv.org.
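As a rough, hedged sketch (this assumes a newer OpenCV build with the
transparent API, not the 2.4-era OpenCL module described above), Python code can
opt into OpenCL-accelerated paths by wrapping images in cv2.UMat:

import cv2
import numpy as np

# Report whether an OpenCL device is available and enable it if so
print(cv2.ocl.haveOpenCL())
cv2.ocl.setUseOpenCL(True)

# A synthetic image stands in for a real frame so the sketch is self-contained
img = np.zeros((128, 128, 3), dtype=np.uint8)
u = cv2.UMat(img)                        # wrap so operations may run via OpenCL
blurred = cv2.GaussianBlur(u, (7, 7), 1.5)
result = blurred.get()                   # copy back into an ordinary NumPy array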
OS support:
OpenCV runs on a variety of platforms. Desktop: Windows, Linux,
macOS, FreeBSD, NetBSD, OpenBSD; Mobile: Android, iOS, Maemo,
BlackBerry 10. The user can get official releases from SourceForge or take the
latest sources from GitHub. OpenCV uses CMake.
SOURCE CODE:
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import datasets
In [2]:
# Load the 8x8 handwritten digits dataset bundled with scikit-learn
digits = datasets.load_digits()
x = digits.data    # flattened 64-pixel feature vectors
y = digits.target  # digit labels 0-9
In [3]:
x[1]
Out[3]:
array([ 0., 0., 0., 12., 13., 5., 0., 0., 0., 0., 0., 11., 16.,
9., 0., 0., 0., 0., 3., 15., 16., 6., 0., 0., 0., 7.,
15., 16., 16., 2., 0., 0., 0., 0., 1., 16., 16., 3., 0.,
0., 0., 0., 1., 16., 16., 6., 0., 0., 0., 0., 1., 16.,
16., 6., 0., 0., 0., 0., 0., 11., 16., 10., 0., 0.])
In [4]:
# Hold out 25% of the samples as a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)
In [5]:
from sklearn import svm
# Train a support vector classifier with a polynomial kernel
clf = svm.SVC(kernel="poly", C=1, gamma=0.1)
clf.fit(x_train, y_train)
Out[5]:
SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.1, kernel='poly',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
In [6]:
# Predict labels for the held-out test set
pred = clf.predict(x_test)
In [8]:
from sklearn.metrics import accuracy_score
In [9]:
# Overall accuracy on the test set
accuracy_score(y_test, pred)
Out[9]:
0.9888888888888889
In [10]:
clf.predict(digits.data[[100]])
Out[10]:
array([4])
In [11]:
clf.predict(digits.data[[50]])
Out[11]:
array([2])
In [12]:
clf.predict(digits.data[[500]])
Out[12]:
array([8])
In [13]:
# Display the corresponding 8x8 grayscale image
plt.imshow(digits.images[100])
plt.show()
In [14]:
plt.imshow(digits.images[50])
plt.show()
In [15]:
plt.imshow(digits.images[500])
plt.show()
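A natural follow-up cell (not part of the run above, so no output is shown) is to
inspect which digits the classifier confuses with one another; the sketch below
assumes the y_test and pred variables defined in the earlier cells.

from sklearn.metrics import confusion_matrix

# Rows are true digits, columns are predicted digits; off-diagonal entries
# count the test samples the SVM assigned to the wrong class.
cm = confusion_matrix(y_test, pred)
print(cm)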
Experimental Results:
Test Application Analysis:
The test application accompanying the source code can perform the recognition of
handwritten digits. To do so, open the application (preferably outside Visual
Studio, for better performance). Click on the menu File and select Open. This will
load some entries from the Optdigits dataset into the application. To perform the
analysis, click the Run Analysis button. Please be aware that it may take some
time. After the analysis is complete, the other tabs in the sample application will be
populated with the analysis results. Experiments were performed on different
samples containing numerals from mixed scripts, using a single hidden layer.
[Table: Data Set Size, Training Set Size, Testing Set Size, Validation Set Size,
Training Set Accuracy, Test Set Accuracy, Validation Set Accuracy]
import io
import json
import os

import cv2
import requests
import xlsxwriter

# Prepare an Excel workbook to hold the recognised text
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet("My sheet")

# Read the handwritten input image
resim = "hand1.jpg"
img = cv2.imread(resim)
print("Picture is Detected")
api = img   # keep a handle on the frame for display later

# OCR: send the JPEG-encoded image to the OCR.space REST API
url_api = "https://api.ocr.space/parse/image"
compressedimage = cv2.imencode(".jpg", img)[1]
file_bytes = io.BytesIO(compressedimage)
result = requests.post(url_api,
                       files={resim: file_bytes},
                       data={"apikey": "helloworld",
                             "language": "eng"})

# Decode the JSON response and pull out the recognised text
result = result.content.decode()
print(result)
result = json.loads(result)
parsed_results = result.get("ParsedResults")[0]
text_detected = parsed_results.get("ParsedText")
print(text_detected)

# Append the detected text to a plain-text log file
print("Writing text to file...")
f = open("text_detected.txt", "a+")
f.write(text_detected)
f.close()

# Split the detected text into lines and write each line into the worksheet
text = text_detected
chunks = text.split('\n')
row1 = 0
column1 = 0
for item in chunks:
    worksheet.write(row1, column1, item)
    row1 += 1
workbook.close()
print("Operation is successful")

# voice output of the detected text could be added here (e.g. text-to-speech)

# Show the image, wait for a key press, then remove the temporary input file
cv2.imshow("roi", api)
cv2.imshow("Img", img)
cv2.waitKey(0)
os.remove(resim)
Input image:
Handwritten zip file:
[Chart: SVM accuracy (%) by data set size (382: 97, 390: 95, 376: 94, 387: 96);
accuracy axis 92–98%]
We can use NN as an initial pruning stage and perform SVM on the smaller but
more relevant set of examples that require careful discrimination. This approach
reflects the way humans perform coarse categorization: when presented with an
image, human observers can answer coarse queries such as the presence or absence
of an animal in as little as 150 ms, and of course, can tell what animal it is given
enough time. This process of quick categorization, followed by successively finer
but slower discrimination, was the inspiration behind the "SVM-KNN" technique,
sketched below.
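The following is a minimal sketch of that idea using scikit-learn, assuming the
digits variables (x_train, y_train, x_test) defined in the source code above: a cheap
nearest-neighbour search first keeps only the k most similar training samples, and
an SVM trained on those neighbours alone makes the final, finer decision. The
value of k and the kernel settings are illustrative assumptions.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def svm_knn_predict(x_train, y_train, x_query, k=25):
    # Coarse k-NN pruning followed by a fine SVM decision (illustrative sketch)
    nn = NearestNeighbors(n_neighbors=k).fit(x_train)
    predictions = []
    for sample in x_query:
        # Coarse stage: keep only the k nearest training samples
        _, idx = nn.kneighbors(sample.reshape(1, -1))
        neighbours_x, neighbours_y = x_train[idx[0]], y_train[idx[0]]
        # If every neighbour agrees, the coarse stage already decides
        if len(np.unique(neighbours_y)) == 1:
            predictions.append(neighbours_y[0])
            continue
        # Fine stage: train a small SVM only on the retained neighbours
        clf = SVC(kernel="poly", C=1, gamma=0.1).fit(neighbours_x, neighbours_y)
        predictions.append(clf.predict(sample.reshape(1, -1))[0])
    return np.array(predictions)

# Example use with the split from the notebook cells above:
# pred_svm_knn = svm_knn_predict(x_train, y_train, x_test)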
References:
[1] X. Chen and A. L. Yuille, "Detecting and reading text in natural scenes," in
Proc. Comput. Vision Pattern Recognit., 2004, vol. 2, pp. II-366–II-373.
[3] K. Kim, K. Jung, and J. Kim, "Texture-based approach for text detection in
images using support vector machines and continuously adaptive mean shift
algorithm," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1631–1639,
Dec. 2003.
[4] N. Giudice and G. Legge, "Blind navigation and the role of technology," in The
Engineering Handbook of Smart Technology for Aging, Disability, and
Independence, A. A. Helal, M. Mokhtari, and B. Abdulrazak, Eds. Hoboken, NJ,
USA: Wiley, 2008.
[5] World Health Organization. (2009). 10 facts about blindness and visual
impairment.
[6] Advance Data Reports from the National Health Interview Survey (2008).
[10] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with
stroke width transform," in Proc. Comput. Vision Pattern Recognit., 2010, pp.
2963–2970.
[13] An overview of the Tesseract OCR (optical character recognition) engine, and
its possible enhancement for use in Wales in a pre-competitive research stage.
Prepared by the Language Technologies Unit (Canolfan Bedwyr), Bangor
University, April 2008.
[14] A. Shahab, F. Shafait, and A. Dengel, "ICDAR 2011 Robust Reading
Competition Challenge 2: Reading text in scene images," in Proc. Int. Conf.
Document Anal. Recognit., 2011, pp. 1491–1496.
[15] KReader Mobile User Guide, knfb Reading Technology Inc. (2008).