Final Finaldoc
Neural Networks
An Industry Oriented Mini Project report submitted
in partial fulfillment of requirements
for the award of degree of
Bachelor of Technology
In
Information Technology
By
Gayatri Vidya Parishad College of Engineering (Autonomous)
Visakhapatnam
CERTIFICATE
This report on “DIGITALIZATION OF HANDWRITTEN TEXT
USING NEURAL NETWORKS” is a bonafide record of the mini
project work submitted
By
KANDULA ANUSHA (Reg No:17131A1251)
ALLAMSETTY SAILAJA (Reg No:17131A1206)
CHITLURI ANJANI NOOKAMBICA (Reg No:17131A1223)
AMULYA NANDANA (Reg No:17131A1207)
in their VII semester in partial fulfillment of the requirements for the Award of Degree of
Bachelor of Technology
In
Information Technology
During the academic year 2020-2021
DECLARATION
degree of B.Tech is of our own and it is not submitted to any other university or has
DATE : A.SAILAJA(17131A1206)
CH.ANJANI NOOKAMBIKA(17131A1223)
N.AMULYA(17131A1207)
ACKNOWLEDGEMENT
Finally, we would like to thank all those people who helped us in many ways in
completing this project.
K. Anusha(17131A1251)
A. Sailaja(17131A1206)
N. Amulya (17131A1207)
ABSTRACT
CONTENTS:
1. INTRODUCTION
1.1 Objective
1.2 Theory
1.4 Purpose
1.5 Scope
2. SRS DOCUMENT
3. ALGORITHM ANALYSIS
4. SOFTWARE DESCRIPTION
4.3 Pycharm
4.5 Numpy
5. PROJECT DESCRIPTION
5.3.2. Model
5.3.2.2. Pre-Processing
5.3.2.4. Post-Processing
6. SYSTEM DESIGN
6.2.1. Things
6.2.2. Relationships
6.2.3. Diagrams
6.3.1. Use Case Diagram
7. DEVELOPMENT
7.2.1. Main.py
7.2.2. App.py
7.2.3. Upload.html
8. SYSTEM MAINTENANCE
9. CONCLUSION
10. BIBLIOGRAPHY
1. INTRODUCTION
1.1. OBJECTIVE:
1.2. THEORY:
Fig.1.3.1. Convolutional Neural Network Diagram
1.2.3. WHAT IS A NEURAL NETWORK:
● Adaptive Learning: An ability to learn how to do tasks
based on the data given for training or initial experience.
● Self-Organization: An ANN can create its own
organization or representation of the information it
receives during learning time.
● Real Time Operation: ANN computations may be
carried out in parallel, and special hardware devices
are being designed and manufactured which take
advantage of this capability.
● Fault Tolerance via Redundant Information Coding:
Partial destruction of a network leads to a corresponding
degradation of performance. However, some network
capabilities may be retained even with major network
damage.
one or more convolutional layers and are used mainly for image
processing, classification, and segmentation. Each convolutional
layer contains a series of filters known as convolutional kernels. A
kernel is a small matrix of weights that is applied to a patch of
input pixel values of the same size as the kernel.
Each pixel in the patch is multiplied by the corresponding value in
the kernel, and the products are summed into a single value. That
value becomes one grid cell, like a pixel, in the output channel
(feature map).
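The multiply-and-sum step described above can be sketched in a few lines of NumPy. This is only an illustration, not the code used in the project: the function name `convolve2d_valid`, the 4×4 input, and the 2×2 kernel are made up for the example, and the sketch uses no padding and a stride of 1.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for r in range(out_h):
        for c in range(out_w):
            # Patch of the input with the same size as the kernel.
            patch = image[r:r + kh, c:c + kw]
            # Element-wise multiply, then sum into one output cell.
            out[r, c] = np.sum(patch * kernel)
    return out

# Hypothetical 4x4 "image" with pixel values 0..15 and a 2x2 kernel.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
```

Each output cell here equals the top-left pixel of the patch minus its bottom-right pixel, so a 4×4 input and a 2×2 kernel give a 3×3 feature map.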
1.3.3. CONNECTIONIST TEMPORAL CLASSIFICATION (CTC)
If you want a computer to recognize text, neural networks (NN)
are a good choice, as they currently outperform other approaches
on this task. The NN for such use cases usually consists of
convolutional layers (CNN) to extract a sequence of features and
recurrent layers (RNN) to propagate information through this
sequence. It outputs character scores for each sequence element,
which are represented as a matrix. Now, there are two
things we want to do with this matrix:
1.4. PURPOSE:
● Document Reading
1.5. SCOPE:
2. SRS DOCUMENT
2.1. FUNCTIONAL REQUIREMENTS:
● The system should process the input given by the user only if it is
an image file.
● The system should show an error message to the user when the input
given is not in the required format.
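The two requirements above could be checked with a small helper like the one below. It is a sketch under assumptions: the set of accepted extensions, the function name `check_upload`, and the wording of the messages are illustrative and are not taken from the project code.

```python
import os

# Assumed list of accepted image formats (not specified in the report).
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}

def check_upload(filename):
    """Return (ok, message) depending on whether the file looks like an image."""
    # Extension without the leading dot, compared case-insensitively.
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext in ALLOWED_EXTENSIONS:
        return True, "accepted"
    allowed = ", ".join(sorted(ALLOWED_EXTENSIONS))
    return False, "Error: please upload an image file (" + allowed + ")"

print(check_upload("word.PNG"))
print(check_upload("notes.txt"))
```

Checking only the extension is a lightweight first filter; it does not prove that the file content is a valid image.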
Drawbacks:
● OCR works well with printed text only and not with
handwritten text. Handwriting needs to be learnt by the
computer.
● OCR systems are expensive.
● Images produced by a scanner consume a lot of memory space.
● Images lose some quality during the scanning and digitizing
process.
● Quality of the final image depends on the quality of the original
image.
● All the documents need to be checked over carefully and then
manually corrected.
● Direct use of OCR remains a difficult problem to resolve, as it
leads to low reading accuracy.
3.2. PROPOSED SYSTEM:
CNN: The input image is fed into the CNN layers. These layers are
trained to extract relevant features from the image. Each layer
consists of three operations. First, the convolution operation
applies a filter kernel of size 5×5 in the first two layers and 3×3 in
the last three layers to the input. Then, the non-linear ReLU
function is applied.
Finally, a pooling layer summarizes image regions and outputs a
downsized version of the input. While the image height is
halved in each layer, feature maps (channels) are added, so
that the output feature map (or sequence) has a size of 32×256.
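To see how five layers can turn the 128×32 input image into a 32×256 sequence, the shapes can be traced by hand. The sketch below rests on assumptions the report does not state: the channel counts (32, 64, 128, 128, 256), the choice of halving the width only in the first two layers, and padding that keeps the convolution output the same size as its input. Only the kernel sizes, the height halving, and the 32×256 result come from the text.

```python
# One entry per CNN layer: (kernel size, output channels, (width pool, height pool)).
# Channel counts and width pooling are assumptions chosen to match 32x256.
LAYERS = [
    (5, 32, (2, 2)),
    (5, 64, (2, 2)),
    (3, 128, (1, 2)),
    (3, 128, (1, 2)),
    (3, 256, (1, 2)),
]

def trace_shapes(width, height, layers):
    """Return the (width, height, channels) shape after each layer."""
    shapes = []
    for _kernel, channels, (pool_w, pool_h) in layers:
        # Convolution is assumed to keep width and height; pooling shrinks them.
        width //= pool_w
        height //= pool_h
        shapes.append((width, height, channels))
    return shapes

shapes = trace_shapes(128, 32, LAYERS)
for shape in shapes:
    print(shape)
```

Under these assumptions the last layer yields a width of 32, a height of 1, and 256 channels, which is the 32 time-steps with 256 features each that the RNN receives.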
RNN: The feature sequence contains 256 features per time-step, and
the RNN propagates relevant information through this sequence. The
popular Long Short-Term Memory (LSTM) implementation of
RNNs is used, as it is able to propagate information over longer
distances and provides more robust training characteristics than a
vanilla RNN. The RNN output sequence is mapped to a matrix of
size 32×80. The IAM dataset contains 79 different characters,
and one additional character is needed for the CTC operation
(the CTC blank label), so there are 80 entries for each of the 32
time-steps.
CTC: While training the NN, the CTC is given the RNN output
matrix and the ground truth text, and it computes the loss value. While
inferring, the CTC is given only the matrix and decodes it into the
final text. Both the ground truth text and the recognized text can be at
most 32 characters long.
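The decoding step at inference time can be illustrated with best path decoding, the simplest CTC decoder: take the most likely label at each time-step, collapse repeated labels, and drop the blank. The sketch below is a toy example, not the project's decoder; the two-character alphabet, the 5×3 score matrix, and the convention that the blank is the last column are assumptions made for the illustration.

```python
def best_path_decode(matrix, chars):
    """Greedy CTC decoding; the blank label is assumed to be the last index."""
    blank = len(chars)
    # Step 1: most likely label at every time-step.
    best = [max(range(len(row)), key=row.__getitem__) for row in matrix]
    # Step 2 and 3: collapse repeated labels, then drop blanks.
    decoded = []
    previous = None
    for label in best:
        if label != previous and label != blank:
            decoded.append(chars[label])
        previous = label
    return "".join(decoded)

chars = "ab"  # toy alphabet; column 2 is the blank
matrix = [
    [0.8, 0.1, 0.1],  # 'a'
    [0.7, 0.2, 0.1],  # 'a' again, collapsed with the previous step
    [0.1, 0.1, 0.8],  # blank, separates the two 'a' characters
    [0.6, 0.3, 0.1],  # 'a'
    [0.1, 0.8, 0.1],  # 'b'
]
text = best_path_decode(matrix, chars)
print(text)  # aab
```

The blank is what allows a doubled letter to survive: without the blank at the third time-step, the three "a" predictions would collapse into one.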
2. Is it financially feasible?
3. Will the project’s time to market beat competition?
This study is carried out to check the economic impact that the
system will have on the organization. The amount of funds that the
company can put into the research and development of the system
is limited, so the expenditures must be justified. The developed
system is well within the budget, which was achieved because
most of the technologies used are freely available. Only the
customized products had to be purchased.
4. SOFTWARE DESCRIPTION
4.5. NUMPY:
Fig.5.2. Neural Network Diagram
application can be some web pages, a blog, a wiki or go as big as a
web-based calendar application or a commercial website. Flask
belongs to the category of micro-frameworks, which are
frameworks with little to no dependency on external
libraries. The framework is therefore light: there are few
dependencies to update and to monitor for security bugs.
5.3.2. MODEL:
BREAKDOWN MODEL
5.3.2.2. Pre-processing:
• Then, we copy the image into a (white) target image of size 128×32.
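The copy step can be sketched with NumPy as follows. This is an illustration under assumptions, not the project's pre-processing code: the function name `paste_on_white`, the nearest-neighbour scaling (a stand-in for a proper image-resize routine), the top-left placement, and the value 255 for white are all choices made for the example.

```python
import numpy as np

def paste_on_white(img, target_w=128, target_h=32):
    """Scale a grayscale image to fit, then copy it onto a white canvas."""
    h, w = img.shape
    # Scale factor that makes the image fit while keeping its aspect ratio.
    f = max(w / target_w, h / target_h)
    new_w = max(min(target_w, int(w / f)), 1)
    new_h = max(min(target_h, int(h / f)), 1)
    # Nearest-neighbour resize by sampling row and column indices.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    resized = img[rows][:, cols]
    # White target image; the resized input is copied into the top-left corner.
    target = np.full((target_h, target_w), 255, dtype=img.dtype)
    target[:new_h, :new_w] = resized
    return target

# Hypothetical input: a 64x64 all-black image.
img = np.zeros((64, 64), dtype=np.uint8)
out = paste_on_white(img)
print(out.shape)  # (32, 128)
```

Padding with white rather than stretching the image keeps the shape of the handwriting intact, at the cost of some unused area on the right or bottom.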
5.3.2.4. Post-Processing:
● proposed system is designed to recognize English alphabets and
digit
6. SYSTEM DESIGN
6.2.2. RELATIONSHIPS:
It illustrates the meaningful connections between things. It shows the
association between the entities and defines the functionality of an
application.
6.2.3. DIAGRAMS:
Diagrams are the graphical representation of the models and
incorporate symbols and text. Each symbol has a different meaning in
the context of the UML diagram. UML 2.0 defines thirteen types of
diagrams, each with its own set of symbols, and each diagram
presents a different dimension, perspective, and view of the
system.
6.3. UML DIAGRAMS:
6.3.2. SEQUENCE DIAGRAM:
6.3.3. ACTIVITY DIAGRAM:
7. DEVELOPMENT
7.2.1. MAIN.PY:
# validate
charErrorRate = validate(model, loader)
noImprovementSince = 0
model.save()
else:
open(FilePaths.fnAccuracy, 'w').write('Validation character error rate of saved model: %f%%' % (charErrorRate*100.0))
else:
print('Character error rate not improved')
noImprovementSince += 1
print('[OK]' if dist==0 else '[ERR:%d]' % dist, '"' + batch.gtTexts[i] + '"', '->', '"' + recognized[i] + '"')
def main(path):
    "main function"
    # optional command line args
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', help='train the NN', action='store_true')
    parser.add_argument('--validate', help='validate the NN', action='store_true')
    parser.add_argument('--beamsearch', help='use beam search instead of best path decoding', action='store_true')
    parser.add_argument('--wordbeamsearch', help='use word beam search instead of best path decoding', action='store_true')
    parser.add_argument('--dump', help='dump output of NN to CSV file(s)', action='store_true')
    args = parser.parse_args()
    #args, unknown = parser.parse_known_args()
    de…
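The validation fragments in Main.py refer to a per-word distance (dist) and an overall character error rate. One common way to obtain both is the Levenshtein edit distance, summed over all words and divided by the number of ground truth characters. The sketch below assumes that definition; the helper names `edit_distance` and `char_error_rate` and the sample words are illustrative and are not taken from the project code.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]

def char_error_rate(ground_truths, recognized):
    """Total edit distance divided by total ground truth characters."""
    errors = sum(edit_distance(r, g) for g, r in zip(ground_truths, recognized))
    total = sum(len(g) for g in ground_truths)
    return errors / total

gt = ["hello", "world"]
rec = ["hello", "wor1d"]  # one wrong character in the second word
cer = char_error_rate(gt, rec)
print("Character error rate: %f%%" % (cer * 100.0))
```

With one substituted character out of ten, the rate is 10%. A lower rate on the validation set is what the training loop treats as an improvement worth saving.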
7.2.2. APP.PY:
import os
from flask import Flask, request, render_template
UPLOAD_FOLDER = '/static/uploads/'
app = Flask(__name__)
filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
@app.route('/')
def home_page():
    return render_template('index.html') #return "Hi"
def upload_page():
    if request.method == 'POST':
        # check if the post request has the file part
        if 'file' not in request.files:
        file = request.files['file']
        if file.filename == '':
        file.save(os.path.join(os.getcwd() + UPLOAD_FOLDER, file.filename))
        extracted_text, probability = main(os.path.join(os.getcwd() + UPLOAD_FOLDER, file.filename)).split(',')
        return render_template('upload.html', msg='Successfully processed', extracted_text=extracted_text, probability=probability, img_src=UPLOAD_FOLDER +
    return render_template('upload.html')
if __name__ == '__main__':
    app.run()
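In App.py the result of main() is split on every comma to obtain the text and the probability. If the recognized text itself contains a comma, that split produces more than two parts and the unpacking fails. A possible safeguard, assuming main() returns a single string of the form "text,probability" as the listing suggests, is to split only at the last comma. The helper name `split_result` is hypothetical.

```python
def split_result(result):
    """Split 'text,probability' at the LAST comma so commas in the text survive."""
    text, probability = result.rsplit(",", 1)
    return text, float(probability)

# Hypothetical return values of main(): recognized text, then probability.
print(split_result("little,0.87"))
print(split_result("Hello, world,0.93"))
```

Returning the two values as a tuple from main() would avoid the string round trip altogether, but that would change the interface shown in the listing.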
<!DOCTYPE html>
<html>
<head>
<title>Upload Image</title>
</head>
<body>
{% if msg %}
{% endif %}
<p><input type=file name=file>
</form>
<h1>Result:</h1>
{% if img_src %}
{% endif %}
{% if extracted_text %}
<p> The extracted text from the image above is: <b> {{ extracted_text }} </b></p>
{% else %}
{% endif %}
{% if probability %}
{% else %}
{% endif %}
</body>
</html>
7.3. INPUT OUTPUT SCREENS:
8. SYSTEM MAINTENANCE
9. CONCLUSION
10. BIBLIOGRAPHY
WEB REFERENCES:
1. https://towardsdatascience.com/2326a3487cd5
2. https://repositum.tuwien.ac.at/obvutwhs/download/pdf/2874742
3. https://arxiv.org/pdf/1507.05717.pdf