
Published by: International Journal of Engineering Research & Technology (IJERT)
http://www.ijert.org ISSN: 2278-0181
Vol. 9 Issue 04, April-2020

Handwritten Text Detection using OpenCV and CNN

Dr. S Jessica Saritha, Assistant Professor, Dept of CSE, JNTUACE, Pulivendula, AP, India
G Hemanth Kumar, Student, Dept of CSE, JNTUACE, Pulivendula, AP, India
K R G Deepak Teja, Student, Dept of CSE, JNTUACE, Pulivendula, AP, India
S Jeelani Sharief, Student, Dept of CSE, JNTUACE, Pulivendula, AP, India

Abstract:- The main aim of the project is Handwritten Text Recognition (HTR). HTR is the task of transcribing images of handwritten text into digital text. In HTR, the text is written and captured by a scanner, and the resulting images are processed as input to return its text format. We need OpenCV and CNN for achieving this task. Our goal is to design a model that transcribes the images to text with great accuracy.

Keywords – CNN, Handwritten Text Recognition (HTR), OpenCV, Transcription

I. INTRODUCTION

Nowadays people have been using ebooks, which do not occupy any physical space and of which ample copies can be carried comfortably. So there is a need to make more ebooks available. There are many handwritten texts all over the world which need to be safeguarded; by transcribing them we can increase the availability of ebooks. Also, instead of striving hard to protect old handwritten texts, they can be digitized and stored as soft copies with ease.

We will apply machine learning techniques in order to find the digital form of handwritten text from its scanned images. We will take the help of ready-made datasets that contain pixel values of scanned images as the inputs, and we will be able to find the text in them. We can also extend this project to different languages and writing styles.

Problem Statement: To accurately predict the text from a scanned handwritten text image using Machine Learning algorithms.

For this we need to assume that all images that contain the same letter have the same features, so that we can conclude that an image having those features contains that letter. However, this hypothesis is ideal and may not always hold in practice.

II. DATASETS

Data plays a very important role in machine learning: past data is used to predict future outcomes. The relevant data can be downloaded from the internet.

The data related to our project, HTR, consists of pixel values. The format of the data files is CSV (Comma Separated Values). Each row represents an image and contains a label in the first column, followed by 784 pixel values for a 28 X 28 image.

The data used for our project has thousands of instances and is obtained from Kaggle. Two types of data will be taken:
✓ One for English alphabets - A_Z Handwritten Data [1], and
✓ The other for digits - the MNIST dataset [2]

DATA VISUALIZATION

For dataset_1 the shape will be (372450, 785) and for dataset_2 the shape will be (42000, 785).

Fig 1. Bar diagram showing label sizes of different labels

A_Z Handwritten_Data:
The dataset comprises multivariate data of English alphabets. The dataset has a label and pixel values which lie between 0 and 255. There are a total of 372450 instances, and the total number of attributes is 784 plus a label.

0_9 Handwritten_Data:
The dataset comprises multivariate data of digits from 0 to 9. The dataset has a label and pixel values which lie between 0 and 255. There are a total of 42000 instances, and the total number of attributes is 784 plus a label.
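As a minimal sketch, the two CSV files can be loaded and checked against the shapes reported above (the file names and read options below are assumptions; use the files as downloaded from Kaggle [1][2]):

import pandas as pd

# File names are assumptions; substitute the CSVs downloaded from Kaggle.
# The A_Z file may ship without a header row, in which case header=None is needed.
dataset_1 = pd.read_csv("A_Z Handwritten Data.csv")  # English alphabets
dataset_2 = pd.read_csv("train.csv")                 # digits (Kaggle digit-recognizer)

# Each row: one label column followed by 784 pixel columns (a 28 x 28 image).
print(dataset_1.shape)  # expected (372450, 785)
print(dataset_2.shape)  # expected (42000, 785)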


Table 1. First five instances of the dataset

III. ARCHITECTURE
The problem of converting handwritten text, which is in the form of pixels, into its digital form is approached in a data-driven way. The data which is already collected can be used for extracting the features of each letter. The availability of more powerful machine learning algorithms introduces an efficient and better approach to solving this problem.
The project is divided into two modules: a Segmentation module, in which an image is taken as input and letters are detected, bounded, cropped, resized and then segmented, and a Training module, where prediction occurs. The output of the segmentation module is the input of the training module.

Fig 2. Architecture

IV. METHODOLOGY
The research methodology in this project includes:
• Visualizing and understanding the data
• Choosing a suitable model
• Agreeing on a common evaluation metric
• Training and testing the models
• Implementing the final model
• Analyzing the result

Fig 3. Flow of training module

V. EXPERIMENTAL RESULTS

A. Segmentation Module
The segmentation module is very important for this project, as its output will be the input of the other module.

1) Read the image
We have many image libraries like Pillow, OpenCV etc. for performing operations on images. Here OpenCV is used to read and manipulate images. An image is read and then stored in multiple copies for performing different operations. After reading, the image is plotted to make sure it was read correctly. That image contains letters, each of which needs to be cropped into a 28 X 28 image by the end of the segmentation module. A sketch of this step follows the figure below.

Fig 4. An image of dimensions 351 X 232 pixels
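A minimal sketch of reading and checking the image, assuming a hypothetical input file 'page.png':

import cv2
import matplotlib.pyplot as plt

# 'page.png' is a hypothetical file name for the scanned input image.
img = cv2.imread("page.png")  # OpenCV reads images in BGR channel order
img_copy = img.copy()         # keep an untouched copy for cropping later
print(img.shape)              # e.g. (232, 351, 3): height, width, channels

plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # convert to RGB for display
plt.show()                    # visual check that the image was read correctly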

2) Detecting the letters
Object detection is a computer vision technique that detects certain components in an image or a video. It makes use of Machine Learning and Deep Learning algorithms to yield good results. Detecting the letters is the same as detecting objects; we need to apply some standard filters to the input image to achieve this task.

Step 1: Convert the BGR image to a Grayscale image.
An image with 3 channels is a BGR image, whereas a Grayscale image consists of a single channel. A channel is the third dimension of an image.
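Continuing the sketch above, Step 1 is a single OpenCV call:

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # 3-channel BGR -> single-channel grayscale
print(gray.shape)                             # e.g. (232, 351): the channel dimension is gone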


Fig 5. A Grayscale Image

Step 2: Applying Gaussian blur to the Grayscale image.
This step is done to remove noise and disturbances in the image. If the image is blurred, the colour intensities can be recognised more easily. The blurring is technically called Gaussian Blur in Computer Vision.

Step 3: Otsu Thresholding → a standard step for object detection.
It is the calculation of the measure of spread for the pixel levels on each side of the threshold. A sketch of these two steps is shown below.
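A sketch of Steps 2 and 3 with OpenCV (the 5 X 5 blur kernel size is an assumption):

blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # smooth out noise; kernel size is an assumption
# Otsu's method picks the threshold value automatically; THRESH_BINARY_INV keeps
# dark ink as white foreground on a black background.
ret, thresh = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)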

Fig 6. Image after applying Otsu thresholding

Step 4: Finding and Drawing contours
findContours() and drawContours() are the methods used for finding and drawing contours, which are generally the borders of a detected object in an image. drawContours() needs the image, the detected contours, the colour and the thickness of the border as its parameters.
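A sketch of Step 4 (note that OpenCV 4.x findContours returns two values; older versions return three):

contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
vis = img.copy()
cv2.drawContours(vis, contours, -1, (0, 255, 0), 2)  # all contours, green, 2 px thick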

Fig 7. Image with contours

Step 5: Storing co-ordinates for rectangular bounding
boundingRect() gives the x and y co-ordinates of the top-left point of a detected object along with its width and height, allowing us to process the images in the order of the detected objects. We need to sort by the x co-ordinate of the top-left corner to order them. All these lists are stored in a list and sorted with the list[i][0]th element as the key.

3) Bounding and Cropping
A piece of code is added before cropping in order to add spaces between letters. In the list that contains the bounding details, we add a space string " " if the distance between the corresponding x co-ordinates is greater than 50 pixels, indicating that the letters must be separated by a space. Bounding and cropping is then done for the non-space entries, storing them in a list 'img_lst'.
The elements in the list of detected objects may be either the boundingRect values of a detected object from the image or a space string. A detected object is considered a letter only if its height is greater than 20 pixels (our assumption). Then, for those which are considered letters, we use the boundingRect values to crop the letter from another copy of the original image stored in another variable, and we append each of the cropped images, which are in numpy array form, into a python list.
rectangle() draws a rectangular border around the detected letters; it takes the image, the bounding values, the colour and the width of the border.

Fig 8. Image with rectangular bounds around the detected letters

Fig 9. A cropped image of letter E is one of the results of cropping

4) Resizing
Each cropped image in the list is resized to 28 X 28 pixels. We do so because the output images of this module, which are going to be the input of the other module, must be of size 28 X 28 pixels, as the training data of that module is of that format. A sketch of Step 5 through resizing is shown below.
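Continuing the sketch, the following puts Step 5, the space insertion, the cropping and the resizing together (the 50-pixel and 20-pixel thresholds come from the paper; the exact bookkeeping is an interpretation):

# Bounding boxes (x, y, w, h), sorted left to right by x co-ordinate.
boxes = sorted([cv2.boundingRect(c) for c in contours], key=lambda b: b[0])

items = []  # bounding boxes, with space strings inserted between distant letters
for i, box in enumerate(boxes):
    if i > 0 and box[0] - boxes[i - 1][0] > 50:  # x co-ordinates more than 50 px apart
        items.append(" ")
    items.append(box)

img_lst = []  # cropped 28 x 28 letter images and space strings
for item in items:
    if item == " ":
        img_lst.append(" ")
        continue
    x, y, w, h = item
    if h > 20:  # assumption from the paper: a letter is taller than 20 pixels
        letter = img_copy[y:y + h, x:x + w]                # crop from the untouched copy
        letter = cv2.cvtColor(letter, cv2.COLOR_BGR2GRAY)  # grayscale, to match training data
        img_lst.append(cv2.resize(letter, (28, 28)))       # resize to 28 x 28 pixels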


Fig 10. A resized image

Each resized image must be converted to Grayscale to match the requirements of the training model, and is then sent as input to the training model.

The subplot of the output of the segmentation module looks like this:

B. Training Module
A model is trained by using past data and Machine Learning algorithms; it learns from the past data through feature extraction and patterns. In this project Convolutional Neural Networks are used. We split the data into training and testing sets in the ratio 80:20.

The dataframe with the attributes, the dataframe with only the label column, train_size or test_size, and shuffle are important parameters of train_test_split. shuffle is 'True' by default, which shuffles the data before splitting, so this method need not always return the same output.

The Scikit-learn library is used to change the form of the data. We need to convert the attribute values to the float datatype and the labels into categorical form to train on them.

The attribute values are converted to floating point numbers ranging from 0 to 1, where earlier they ranged from 0 to 255. A pixel value of 0 becomes 0.0000, a pixel value of 255 becomes 1.0000, and a pixel value of 128 becomes roughly 0.5000.

The categorical form results in a list whose size equals the number of all possible labels, in which an instance with label value i has the i-th element set to 1 and all other elements set to 0. Even these values must be floats. A sketch of this preprocessing is shown below.
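A minimal sketch of the preprocessing, assuming the two datasets have already been merged into a single dataframe 'data' with the labels mapped onto one 0-35 range (the variable names are assumptions):

from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# 'data' is assumed to hold both datasets with labels already merged into 0-35.
X = data.drop("label", axis=1).values.astype("float32") / 255.0  # scale pixels to [0, 1]
X = X.reshape(-1, 28, 28, 1)                                     # 28 x 28 single-channel images
y = to_categorical(data["label"].values, num_classes=36)         # one-hot: 26 letters + 10 digits

# 80:20 split; shuffle=True is the default, so repeated runs differ.
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2)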
Now we can proceed to train our data using standard Machine Learning algorithms. A sequence of hidden layers is created with some nodes in each of them. The first hidden layer is a 2-dimensional convolution layer with kernel size 5 X 5 and 32 nodes; the activation function used is 'relu'. It is then max pooled with a 2 X 2 window, and overfitting is reduced using a dropout of 0.3. Then two more layers are added, with 128 and n nodes respectively, where n is the number of possible outputs (here n = 36).
Now we compile the model with the 'categorical_crossentropy' loss function and the 'adam' optimizer, with a metric of 'accuracy'. A sketch of this model is shown below.
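A sketch of the described model in Keras (details beyond the layers named above, such as the Flatten layer and the softmax activation, are assumptions needed to make the model well-formed):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (5, 5), activation="relu", input_shape=(28, 28, 1)),  # 32 nodes, 5 x 5 kernel
    MaxPooling2D((2, 2)),                                            # 2 x 2 max pooling
    Dropout(0.3),                                                    # reduce overfitting
    Flatten(),                                                       # assumption: flatten before Dense
    Dense(128, activation="relu"),                                   # 128 nodes
    Dense(36, activation="softmax"),                                 # n = 36 possible outputs
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()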

Fig 11. Summary of the training model

The fitting of the model is shown below. It is run for 1 epoch with a batch size of 200, and the training set contains 331560 instances.

model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=1, batch_size=200, verbose=2)

Train on 331560 samples, validate on 82890 samples
Epoch 1/1
- 350s – loss: 0.3029 – acc: 0.9184 – val_loss: 0.1381 – val_acc: 0.962

VI. RESULTS
The above model gives us a train accuracy of 0.9184 and a test accuracy of 0.9626.

Model            Loss     Accuracy
Neural Network   0.1381   0.9626

ARMY

Fig 12. Illustration of HTR

VII. CHALLENGES INVOLVED
The challenges we faced while modelling the handwritten text recogniser are:
(i) Letters like 'i' and 'j', which have a break in them, cannot be detected as a single letter.


Figure: An image with rectangular border around detected contours

Figure: A subplot for cropped images of the above image

(ii) If two letters touch each other (like in cursive writing), they are recognized as a single letter.

VIII. CONCLUSION
A Convolutional Neural Network learns from real data, simplifies the model by reducing the number of parameters, and hence gives considerable accuracy.

Future Enhancements
We can increase the accuracy:
→ By taking huge datasets
→ By adopting more suitable algorithms
→ By compiling the model for a larger number of epochs
→ By hyper-parameter tuning (there are a lot of parameters that we can play with)
→ By using deeper architectures

This application can be taken to the next level by:
→ Extending its scope to different writing styles
→ Extending its scope to different languages

ACKNOWLEDGEMENT
We would like to acknowledge the help of JNTUA College of Engineering, Pulivendula, for the kind support provided, and our faculty and friends for the helpful discussions. We would also like to thank Kaggle for providing the datasets.

REFERENCES
[1] https://www.kaggle.com/sachinpatel21/az-handwritten-alphabets-in-csv-format
[2] https://www.kaggle.com/c/digit-recognizer/data
[3] Character Recognition in Natural Images, by Teófilo E. de Campos, Bodla Rakesh Babu, Manik Varma. https://www.researchgate.net/publication/221416071_Character_Recognition_in_Natural_Images/link/5dd6e92892851c1feda56fc1/download
[4] Text detection and recognition in raw image dataset and seven segment digital energy meter display, by Karthick Kanagarathinam, Kavaskar Sekar. https://reader.elsevier.com/reader/sd/pii/S235248471930174X?token=FFC0111CC7487898FEFE8637DDA6CE1692B76C48DBB26C375D1CD755667BBC2109D8C5287A2205169F20461A43BDD304
[5] Scene Text Detection and Recognition: The Deep Learning Era, by Shangbang Long, Xin He, Cong Yao. https://arxiv.org/pdf/1811.04256.pdf
[6] Automatic Text Detection and Classification in Natural Images, by C.P. Chaithanya, N. Manohar, Ajay Bazil Issac. https://www.ijrte.org/wp-content/uploads/papers/v7i5s3/E11330275S19.pdf
[7] An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition, by Baoguang Shi, Xiang Bai and Cong Yao. https://arxiv.org/abs/1507.05717
[8] EMNIST: an extension of MNIST to handwritten letters, by Gregory Cohen, Saeed Afshar, Jonathan Tapson, and André van Schaik. https://arxiv.org/pdf/1702.05373.pdf
[9] Handwritten Text Recognition for Historical Documents, by Veronica Romero, Nicholas Serrano, Alejandro H. Toselli, Joan Andreu Sanchez and Enrique Vidal. https://www.aclweb.org/anthology/W11-4114.pdf
[10] Arabic Cursive Text Recognition from Natural Scene Images, by Saad Bin Ahmed, Saeeda Naz, Muhammad Imran Razzaq and Rubiyah Yusof. https://www.mdpi.com/2076-3417/9/2/236/pdf

(This work is licensed under a Creative Commons Attribution 4.0 International License.)
