DIGITALIZATION OF HANDWRITTEN TEXT USING NEURAL NETWORKS
An Industry Oriented Mini Project report
submitted in partial fulfillment of the requirements
for the award of the degree of
Bachelor of Technology
In
Information Technology
By
Gayatri Vidya Parishad College of Engineering (Autonomous)
Visakhapatnam
CERTIFICATE
This report on “DIGITALIZATION OF HANDWRITTEN TEXT
USING NEURAL NETWORKS” is a bonafide record of the mini
project work submitted
By
KANDULA ANUSHA (Reg No:17131A1251)
ALLAMSETTY SAILAJA (Reg No:17131A1206)
CHITLURI ANJANI NOOKAMBICA (Reg No:17131A1223)
AMULYA NANDANA (Reg No:17131A1207)
in their VII semester in partial fulfillment of the requirements for the Award of Degree of
Bachelor of Technology
In
Information Technology
During the academic year 2020-2021
DECLARATION
We hereby declare that this project work, submitted in partial fulfillment of the requirements for the degree of B.Tech, is our own and has not been submitted to any other university or published at any time before.
DATE : A. SAILAJA (17131A1206)
CH. ANJANI NOOKAMBIKA (17131A123)
N. AMULYA (17131A1207)
ACKNOWLEDGEMENT
Finally, we would like to thank all those who helped us in many ways to complete this project.
K. Anusha(17131A1251)
A. Sailaja(17131A1206)
N. Amulya (17131A1207)
ABSTRACT
CONTENTS:
1. INTRODUCTION
1.1 Objective
1.2 Theory
1.4 Purpose
1.5 Scope
2. SRS DOCUMENT
3. ALGORITHM ANALYSIS
4. SOFTWARE DESCRIPTION
4.3 PyCharm
4.5 NumPy
5. PROJECT DESCRIPTION
5.3.1. Flask Framework
5.3.2. Model
5.3.2.2. Pre-Processing
5.3.2.4. Post-Processing
6. SYSTEM DESIGN
6.2.1. Things
6.2.2. Relationships
6.2.3. Diagrams
6.3.1. Use Case Diagram
7. DEVELOPMENT
7.2.1. Main.py
7.2.2. App.py
7.2.3. Upload.html
8. SYSTEM MAINTENANCE
9. CONCLUSION
10. BIBLIOGRAPHY
1. INTRODUCTION
1.1. OBJECTIVE:
1.2. THEORY:
1.3.2. RECURRENT NEURAL NETWORK (RNN):
● Document Reading
1.5. SCOPE:
● The system should process the input given by the user only
if it is an image file.
Drawbacks:
● OCR works well only with printed text, not with
handwritten text; the computer must first learn the
handwriting.
● OCR systems are expensive.
● Images produced by a scanner consume a lot of memory space.
● Images lose some quality during the scanning and digitizing
process.
● Quality of the final image depends on the quality of the original
image.
● All the documents need to be checked over carefully and then
manually corrected.
● Direct use of OCR remains a difficult problem to resolve, as
it leads to low reading accuracy.
3.2. PROPOSED SYSTEM:
CNN: the input image is fed into the CNN layers, which are
trained to extract relevant features from the image. Each
layer consists of three operations. First, the convolution
operation applies a filter kernel of size 5×5 in the first
two layers and 3×3 in the last three layers to the input. Then,
the non-linear ReLU activation function is applied.
Finally, a pooling layer summarizes image regions and
outputs a downsized version of the input. While the image
height is downsized by 2 in each layer, feature maps
(channels) are added, so that the output feature map (or
sequence) has a size of 32×256.
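The layer-by-layer shape arithmetic above can be sketched in plain Python. The kernel sizes and channel counts come from the text; the pooling strides are an assumption (they are not stated here), chosen so that a 128×32 grayscale input ends up as the 32×256 feature sequence described:

```python
# Shape bookkeeping for the five CNN layers; kernel sizes and output
# channel counts follow the text, the pooling strides are assumed.
kernels = [5, 5, 3, 3, 3]                          # filter sizes per layer
channels = [32, 64, 128, 128, 256]                 # feature maps added layer by layer
pools = [(2, 2), (2, 2), (1, 2), (1, 2), (1, 2)]   # (width, height) pool strides

w, h, c = 128, 32, 1                               # grayscale input image, width x height
for k, ch, (pw, ph) in zip(kernels, channels, pools):
    w //= pw                                       # pooling downsizes the width...
    h //= ph                                       # ...and halves the height each layer
    c = ch                                         # convolution sets the channel count
print((w, h, c))                                   # (32, 1, 256): 32 time-steps of 256 features
```

The height collapses to 1 while channels grow to 256, which is why the output can be read as a sequence of 32 feature vectors.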
RNN: the feature sequence contains 256 features per time-step;
the RNN propagates relevant information through this
sequence. The popular Long Short-Term Memory (LSTM)
implementation of RNNs is used, as it can propagate
information over longer distances and provides more robust
training characteristics than a vanilla RNN. The RNN output
sequence is mapped to a matrix of size 32×80. The IAM dataset
consists of 79 different characters, and one additional
character is needed for the CTC operation (the CTC blank label),
so there are 80 entries for each of the 32 time-steps.
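The mapping from the RNN output to the 32×80 matrix can be sketched with NumPy. The bidirectional concatenation and the random stand-in values below are illustrative assumptions, not the project's actual variables:

```python
import numpy as np

rng = np.random.default_rng(0)
T, F, C = 32, 256, 80          # time-steps, features per step, 79 characters + CTC blank
fw = rng.normal(size=(T, F))   # forward LSTM states (stand-in values)
bw = rng.normal(size=(T, F))   # backward LSTM states (stand-in values)
rnn_out = np.concatenate([fw, bw], axis=1)   # bidirectional output: 32 x 512
W = rng.normal(size=(2 * F, C))              # learned projection to character scores
logits = rnn_out @ W
print(logits.shape)            # (32, 80): one score per character per time-step
```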
CTC: while training the NN, the CTC operation is given the RNN
output matrix and the ground-truth text, and it computes the loss
value. While inferring, the CTC is given only the matrix, and it
decodes it into the final text. Both the ground-truth text and the
recognized text can be at most 32 characters long.
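Best path decoding, the simplest way the output matrix is collapsed into text at inference time, can be sketched as follows; the toy two-character alphabet and the 4-step matrix are illustrative assumptions:

```python
import numpy as np

def best_path_decode(mat, chars, blank):
    """Take the most likely entry per time-step, merge repeats, drop blanks."""
    best = np.argmax(mat, axis=1)
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            out.append(chars[idx])
        prev = idx
    return ''.join(out)

# Toy matrix: 4 time-steps over the alphabet 'ab' plus a blank in the last
# column, mirroring the 79-characters-plus-blank layout used with IAM.
chars, blank = 'ab', 2
mat = np.array([
    [0.9, 0.0, 0.1],   # 'a'
    [0.8, 0.1, 0.1],   # 'a' repeated -> merged into one
    [0.1, 0.0, 0.9],   # blank -> dropped
    [0.0, 0.9, 0.1],   # 'b'
])
print(best_path_decode(mat, chars, blank))   # 'ab'
```

The blank label is what lets CTC represent repeated characters: a genuine double letter must be separated by a blank in the best path.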
2. Is it financially feasible?
3. Will the project’s time to market beat competition?
4.4. TENSORFLOW:
4.5. NUMPY:
5.2.PROJECT OVERVIEW:
DESCRIPTION:
5.3.1. FLASK FRAMEWORK:
BREAKDOWN MODEL
5.3.2.1.Image Acquisition:
6.2.2. RELATIONSHIPS:
It illustrates the meaningful connections between things. It
shows the association between the entities and defines the
functionality of an application.
6.2.3. DIAGRAMS:
Diagrams are the graphical representation of the models,
incorporating symbols and text. Each symbol has a specific
meaning in the context of a UML diagram. UML 2.0 defines
thirteen different types of diagrams, each with its own set of
symbols, and each diagram presents a different dimension,
perspective, and view of the system.
6.3. UML DIAGRAMS:
6.3.1. USE CASE DIAGRAM:
7.2.1. MAIN.PY:
import sys
import argparse
import cv2
import editdistance
from DataLoader import DataLoader, Batch
from Model import Model, DecoderType
from SamplePreprocessor import preprocess

class FilePaths:
    "filenames and paths to data"
    fnCharList = 'H:/IAM/HTR/model/charList.txt'
    fnAccuracy = 'H:/IAM/HTR/model/accuracy.txt'
    fnTrain = 'H:/IAM/HTR/data/'
    fnInfer = 'H:/IAM/HTR/data/test.png'
    fnCorpus = 'H:/IAM/HTR/data/corpus.txt'

def train(model, loader):
    "train NN"
    epoch = 0  # number of training epochs since start
    bestCharErrorRate = float('inf')  # best occurred character error rate
    noImprovementSince = 0  # number of epochs without improvement of character error rate

    # train
    print('Train NN')
    loader.trainSet()
    while loader.hasNext():
        iterInfo = loader.getIteratorInfo()
        batch = loader.getNext()
        loss = model.trainBatch(batch)
        print('Batch:', iterInfo[0], '/', iterInfo[1], 'Loss:', loss)

    # validate
    charErrorRate = validate(model, loader)
    if charErrorRate < bestCharErrorRate:
        bestCharErrorRate = charErrorRate
        noImprovementSince = 0
        model.save()
        open(FilePaths.fnAccuracy, 'w').write(
            'Validation character error rate of saved model: %f%%' % (charErrorRate * 100.0))
    else:
        print('Character error rate not improved')
        noImprovementSince += 1

def validate(model, loader):
    "validate NN"
    print('Validate NN')
    loader.validationSet()
    numCharErr = 0
    numCharTotal = 0
    numWordOK = 0
    numWordTotal = 0
    while loader.hasNext():
        iterInfo = loader.getIteratorInfo()
        print('Batch:', iterInfo[0], '/', iterInfo[1])
        batch = loader.getNext()
        (recognized, _) = model.inferBatch(batch)
        print('Ground truth -> Recognized')
        for i in range(len(recognized)):
            numWordOK += 1 if batch.gtTexts[i] == recognized[i] else 0
            numWordTotal += 1
            dist = editdistance.eval(recognized[i], batch.gtTexts[i])
            numCharErr += dist
            numCharTotal += len(batch.gtTexts[i])
            print('[OK]' if dist == 0 else '[ERR:%d]' % dist,
                  '"' + batch.gtTexts[i] + '"', '->', '"' + recognized[i] + '"')
    charErrorRate = numCharErr / numCharTotal
    return charErrorRate

def main(path):
    "main function"
    # optional command line args
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', help='train the NN', action='store_true')
    parser.add_argument('--validate', help='validate the NN', action='store_true')
    parser.add_argument('--beamsearch', help='use beam search instead of best path decoding', action='store_true')
    parser.add_argument('--wordbeamsearch', help='use word beam search instead of best path decoding', action='store_true')
    args = parser.parse_args()
    # args, unknown = parser.parse_known_args()
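The character error rate that validate() reports is built on edit distance. A self-contained sketch (a plain Levenshtein implementation standing in for the editdistance package) shows how one ground-truth/recognized pair contributes to the metric:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    dp = list(range(len(b) + 1))              # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete ca
                                     dp[j - 1] + 1,      # insert cb
                                     prev + (ca != cb))  # substitute (free if equal)
    return dp[-1]

gt, recognized = 'handwriting', 'handwritten'
dist = levenshtein(gt, recognized)
cer = dist / len(gt)                          # character error rate for this pair
print(dist, round(cer, 3))                    # 3 0.273
```

Summed over all validation samples (numCharErr / numCharTotal), this is exactly the quantity train() compares against the best rate seen so far.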
7.2.2. APP.PY:
import os
from flask import Flask, render_template, request
from main import main

UPLOAD_FOLDER = '/static/uploads/'
ALLOWED_EXTENSIONS = set(['png', 'jpg', 'jpeg', 'gif'])

app = Flask(__name__)

def allowed_file(filename):
    return '.' in filename and \
        filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

@app.route('/')
def home_page():
    return render_template('index.html')
    # return "Hi"

@app.route('/upload', methods=['GET', 'POST'])
def upload_page():
    if request.method == 'POST':
        file = request.files['image']  # form field name assumed
        file.save(os.path.join(os.getcwd() + UPLOAD_FOLDER, file.filename))
        extracted_text, probability = main(
            os.path.join(os.getcwd() + UPLOAD_FOLDER, file.filename)).split(',')
        return render_template('upload.html',
                               msg='Successfully processed',
                               extracted_text=extracted_text,
                               probability=probability,
                               img_src=UPLOAD_FOLDER + file.filename)
    elif request.method == 'GET':
        return render_template('upload.html')

if __name__ == '__main__':
    app.run()
7.2.3. UPLOAD.HTML:
<!DOCTYPE html>
<html>
<head>
<title>Upload Image</title>
</head>
<body>
{% if msg %}
{% endif %}
</form>
<h1>Result:</h1>
{% if img_src %}
{% endif %}
{% if extracted_text %}
<p> The extracted text from the image above is: <b>
{{ extracted_text
}} </b></p>
{% else %}
{% endif %}
{% if probability %}
{% else %}
{% endif %}
</body>
</html>
7.3. INPUT OUTPUT SCREENS:
8. SYSTEM MAINTENANCE
WEB REFERENCES:
1. https://towardsdatascience.com/2326a3487cd5
2. https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742
3. https://arxiv.org/pdf/1507.05717.pdf