Rajeswari
HAND WRITTEN LETTER RECOGNITION WITH CNN
USING DEEP LEARNING
Bachelor of Technology
in
Computer Science & Engineering
By
May, 2024
CERTIFICATE
It is certified that the work contained in the project report titled “HAND WRITTEN LETTER RECOGNITION WITH CNN USING DEEP LEARNING” by “ATCHI BALASRINIVASARAO (20UECS0077), YEGURU PENCHALA VINEELA (20UECS1041)” has been carried out under my supervision and that this work has not been submitted elsewhere for a degree.
DECLARATION
We declare that this written submission represents our ideas in our own words and where others’
ideas or words have been included, we have adequately cited and referenced the original sources. We
also declare that we have adhered to all principles of academic honesty and integrity and have not
misrepresented or fabricated or falsified any idea/data/fact/source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the Institute and can also
evoke penal action from the sources which have thus not been properly cited or from whom proper
permission has not been taken when needed.
ATCHI BALASRINIVASARAO
Date: / /
APPROVAL SHEET
This project report entitled HAND WRITTEN LETTER RECOGNITION WITH CNN USING DEEP
LEARNING by ATCHI BALASRINIVASARAO (20UECS0077), YEGURU PENCHALA VINEELA
(20UECS1041) is approved for the degree of B.Tech in Computer Science & Engineering.
Examiners Supervisor
Dr. D. Rajesh, M.E., Ph.D.,
Professor,
Date: / /
Place:
ACKNOWLEDGEMENT
We express our deepest gratitude to our respected Founder Chancellor and President Col. Prof. Dr. R. RANGARAJAN B.E. (EEE), B.E. (MECH), M.S. (AUTO), D.Sc., and to our Foundress President Dr. R. SAGUNTHALA RANGARAJAN M.B.B.S., Chairperson Managing Trustee and Vice President.
We are very much grateful to our beloved Vice Chancellor Prof. S. SALIVAHANAN, for providing us with an environment to complete our project successfully.
We record our indebtedness to our Professor & Dean, Department of Computer Science & Engineering, School of Computing, Dr. V. SRINIVASA RAO, M.Tech., Ph.D., for his immense care and encouragement towards us throughout the course of this project.
We are thankful to our Head, Department of Computer Science & Engineering, Dr. M. S. MURALI DHAR, M.E., Ph.D., for providing immense support in all our endeavors.
We also take this opportunity to express a deep sense of gratitude to our Internal Supervisor Dr. D. RAJESH, M.E., Ph.D., for his cordial support, valuable information, and guidance; he helped us in completing this project through its various stages.
A special thanks to our Project Coordinators Mr. V. ASHOK KUMAR, M.Tech., and Ms. C. SHYAMALA KUMARI, M.E., for their valuable guidance and support throughout the course of the project.
We thank our department faculty, supporting staff and friends for their help and guidance to complete this project.
ABSTRACT
LIST OF FIGURES
LIST OF ACRONYMS AND ABBREVIATIONS
TABLE OF CONTENTS
Page.No
ABSTRACT v
LIST OF FIGURES vi
1 INTRODUCTION 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aim of the Project . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Project Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Scope of the Project . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 LITERATURE REVIEW 4
3 PROJECT DESCRIPTION 7
3.1 Existing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Proposed System . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.3 Feasibility Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3.1 Economic Feasibility . . . . . . . . . . . . . . . . . . . . . 9
3.3.2 Technical Feasibility . . . . . . . . . . . . . . . . . . . . . 9
3.3.3 Social Feasibility . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 System Specification . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.4.1 Hardware Specification . . . . . . . . . . . . . . . . . . . . 10
3.4.2 Software Specification . . . . . . . . . . . . . . . . . . . . 10
3.4.3 Standards and Policies . . . . . . . . . . . . . . . . . . . . 10
4 METHODOLOGY 12
4.1 General Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.2 Design Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.2.1 Data Flow Diagram . . . . . . . . . . . . . . . . . . . . . . 14
4.2.2 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . 16
4.2.3 Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2.4 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . 18
4.2.5 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . 19
4.3 Algorithm & Pseudo Code . . . . . . . . . . . . . . . . . . . . . . 20
4.3.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3.2 Pseudo Code . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.4 Module Description . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.4.1 Import libraries and Dataset . . . . . . . . . . . . . . . . . 21
4.4.2 The Data Preprocessing . . . . . . . . . . . . . . . . . . . 21
4.4.3 Create the Model . . . . . . . . . . . . . . . . . . . . . . . 22
4.4.4 Train The Model . . . . . . . . . . . . . . . . . . . . . . . 22
4.5 Steps to execute/run/implement the project . . . . . . . . . . . . . . 22
4.5.1 Import required Libraries . . . . . . . . . . . . . . . . . . . 22
4.5.2 Training Model . . . . . . . . . . . . . . . . . . . . . . . . 23
4.5.3 Testing Accuracy . . . . . . . . . . . . . . . . . . . . . . . 23
8 PLAGIARISM REPORT 37
9 SOURCE CODE & POSTER PRESENTATION 38
9.1 Source Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
9.2 Poster Presentation . . . . . . . . . . . . . . . . . . . . . . . . . . 40
References 40
Chapter 1
INTRODUCTION
1.1 Introduction
The project involving the extraction of handwritten characters from images using a CNN algorithm represents a significant advancement in the field of computer vision and character recognition. The project's primary objective is to develop a robust and accurate system that can automatically identify and extract handwritten characters from a wide range of sources, such as scanned documents, handwritten notes, and images. Handwriting, being highly variable and unique, poses a complex challenge in OCR, making the utilization of CNNs and deep learning techniques pivotal. The system's workflow typically begins with the acquisition of images containing handwritten letters, followed by preprocessing steps to enhance the quality of these images. The CNN model, known for its exceptional feature extraction capabilities, plays a central role in the project. Through a process of training and fine-tuning on a diverse dataset of handwritten characters, the CNN learns to recognize and differentiate various handwritten characters and symbols. Once trained, the CNN model can be deployed to automatically process and extract characters from new, unseen images, achieving a level of accuracy that rivals or surpasses traditional OCR methods.
This technology holds immense potential in numerous applications, from digitizing historical handwritten documents and improving data entry processes to aiding individuals with disabilities by converting handwritten text into machine-readable form. Furthermore, as handwriting is prevalent in a multitude of languages and forms, the project's impact transcends borders and domains. Its successful implementation can significantly enhance our ability to efficiently convert handwritten content into digital data, making it accessible, searchable, and editable, thereby revolutionizing the way we interact with and manage handwritten information. In conclusion, the project's aim to employ CNN algorithms for handwritten character extraction is poised to contribute significantly to the advancement of OCR technology and its broad range of practical applications. The system can also detect a single letter and convert it to Telugu.
1.2 Aim of the Project
The project aims to implement a deep learning model specifically designed to rec-
ognize uppercase alphabets (A to Z) from handwritten images. High accuracy in
character recognition will be a primary focus, achieved through appropriate model
training on a suitable dataset. Overall, the objective is to create a practical and re-
liable solution that finds utility in digitizing handwritten content and assisting in
optical character recognition tasks with ease of use for all users.
ter is not only time-consuming when done manually but also prone to human errors.
The emergence of deep learning techniques, particularly CNN, has revolutionized
OCR by offering highly accurate pattern recognition capabilities. The potential of
these techniques to decipher handwritten characters holds immense promise, as they
can enhance the efficiency of letter extraction from images, documents, and forms.
Chapter 2
LITERATURE REVIEW
Li Yuan et al. [1] (2021) noted that Transformers, which are popular for language modeling, have recently been explored for solving vision tasks, e.g., the Vision Transformer (ViT) for image classification. The ViT model splits each image into a sequence of tokens of fixed length and then applies multiple Transformer layers to model their global relations for classification. However, ViT achieves inferior performance to CNNs when trained from scratch on a midsize dataset like ImageNet. They find this is because: 1) the simple tokenization of input images fails to model important local structure, such as edges and lines among neighboring pixels, leading to low training-sample efficiency; 2) the redundant attention backbone design of ViT leads to limited feature richness under fixed computation budgets and limited training samples.
Jiaji Yang et al. [3] (2021) observed that, at present, industrial robotics focuses more on motion control and vision, whereas Humanoid Service Robotics (HSRs) are increasingly being investigated by researchers and practitioners in the field of speech interaction. The quality of human-robot interaction (HRI) has become one of the hot topics in academia. This paper proposes a novel interactive framework suitable for HSRs, grounded on the integration of Trevarthen's Companionship Theory and a neural image generation algorithm from computer vision. By integrating image-to-natural-language generation, the robot can communicate with the environment and better interact with the stakeholder, thereby moving from interaction to a bionic companionship.
R. B. Arif et al. [5] (2021) proposed SVM-based handwritten letter recognition. The authors claim that the SVM outperforms the multilayer perceptron classifier. The experiment is carried out on the standard MNIST dataset. The advantage of the multilayer perceptron is that it is able to separate non-linearly separable classes. However, the multilayer perceptron can easily fall into a region of local minimum, where training stops on the assumption that an optimal point on the error surface has been reached.
R. B. Arif et al. [6] (2021) proposed SVM-based offline digit recognition. The authors claim that the SVM outperforms the multilayer perceptron classifier. The experiment is carried out on the standard MNIST dataset. The advantage of the multilayer perceptron is that it is able to separate non-linearly separable classes. However, the multilayer perceptron can easily fall into a region of local minimum, where training stops on the assumption that an optimal point on the error surface has been reached.
The membership function is modified by two structural parameters that are estimated by optimizing the entropy subject to the attainment of the membership function to unity. The overall recognition rate is found to be 95% for Hindi numerals and a comparable percentage for English numerals.
N. Alrobah et al. [9] (2021) proposed a hybrid deep model for recognizing Arabic handwritten characters. Handwriting recognition for computer systems has been researched for a long time, with different researchers having an extensive variety of methods at their disposal. The problem is that most of these experiments are done in English, as it is the most spoken language in the world; but other languages such as Arabic, Mandarin, Spanish, French, and Russian also need research done on them, since there are millions of people who speak them. In this work, recognition of Arabic handwritten characters is developed by cleaning the state-of-the-art Arabic dataset called Hijaa and building a Convolutional Neural Network (CNN) with a hybrid model using Support Vector Machine (SVM) and eXtreme Gradient Boosting (XGBoost) classifiers. The CNN is used for feature extraction from the Arabic character images, which are then passed on to the machine learning classifiers. A recognition rate of up to 96.3% across 29 classes is achieved, far surpassing the previous state-of-the-art results on the Hijaa dataset.
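The hybrid pipeline described here, a CNN extracting features that are then classified by SVM or XGBoost, can be sketched in a few lines with scikit-learn. This is only an illustration: a simple block-mean feature extractor stands in for the trained CNN, and the synthetic two-class data and all names are assumptions, not the cited paper's code.

```python
import numpy as np
from sklearn.svm import SVC

def extract_features(images):
    """Stand-in for CNN features: mean intensity of each 4x4 block."""
    n, h, w = images.shape
    blocks = images.reshape(n, h // 4, 4, w // 4, 4)
    return blocks.mean(axis=(2, 4)).reshape(n, -1)

# Two synthetic "character" classes, separable by brightness.
rng = np.random.default_rng(0)
class0 = rng.random((20, 16, 16)) * 0.4          # darker images
class1 = rng.random((20, 16, 16)) * 0.4 + 0.6    # brighter images
X = extract_features(np.concatenate([class0, class1]))
y = np.array([0] * 20 + [1] * 20)

# The SVM is trained on the extracted feature vectors, as in the
# hybrid CNN+SVM design described above.
clf = SVC(kernel="rbf").fit(X, y)
```

Swapping in features from a real trained CNN (e.g. the output of its penultimate layer) would follow the same fit/predict pattern.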
Chapter 3
PROJECT DESCRIPTION
3.1 Existing System
Disadvantages
• Depending on the complexity of the input handwriting and variations in writing
styles, the accuracy of character recognition may not always be reliable. The
system might produce incorrect or inaccurate results, impacting the overall use-
fulness of the application.
• The current code might be challenging to scale up for handling large-scale, batch
processing of multiple images simultaneously.
• It lacks the ability to recognize lowercase letters, numbers, or other symbols, limiting its overall usefulness for handling diverse types of handwritten content.
3.2 Proposed System
The proposed system for the “Handwritten Letter Recognition” project aims to overcome the limitations of the existing system and introduce significant improvements. The key enhancements include extending character recognition to upper-case characters, enabling the system to process diverse handwritten content effectively. To achieve higher accuracy, the proposed system will train the CNN deep learning model on a more extensive and diverse dataset, accommodating various handwriting styles. The proposed system will support multiple scripts, making it suitable for handling handwritten letters in different languages. An improved user interface will provide an intuitive and seamless experience, while cloud-based processing will offer scalability and efficient resource utilization. Adaptive preprocessing and a quality-check mechanism will optimize recognition results and prompt users to re-upload low-quality images. Overall, these proposed enhancements will transform “Handwritten Letter Recognition” into a versatile and powerful application capable of accurately recognizing and converting various handwritten letters, catering to a wider range of practical use cases and delivering an enhanced user experience.
Advantages
• By utilizing a larger and more diverse dataset for training the deep learning model, the proposed system enhances recognition accuracy, ensuring reliable and precise letter extraction from handwritten images.
• The proposed system achieves faster processing times and reduced resource con-
sumption, enhancing overall performance.
• The proposed system’s implementation of comprehensive error handling and
validation mechanisms ensures that it gracefully handles unexpected scenarios
and provides users with informative feedback, enhancing the overall user expe-
rience.
3.3 Feasibility Study
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.
3.3.1 Economic Feasibility
This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, which was achieved because most of the technologies used are freely available; only the customized products had to be purchased.
3.3.2 Technical Feasibility
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.
3.3.3 Social Feasibility
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make some constructive criticism, which is welcomed, as he is the final user of the system.
3.4 System Specification
The project requires suitable hardware and software to run on a computer or laptop PC. The toolkits and the hardware and software requirements for the user are listed below.
Anaconda Prompt
Anaconda Prompt is a command-line interface that deals explicitly with machine learning (ML) modules, and Anaconda Navigator is available on Windows, Linux, and macOS. Anaconda provides a number of IDEs that make coding easier, and the UI can also be implemented in Python.
Standard Used: ISO/IEC 27001
Jupyter Notebook
Jupyter Notebook is an open-source web application that allows us to create and share documents containing live code, equations, visualizations, and narrative text. It can be used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, and machine learning.
Standard Used: ISO/IEC 27001
Chapter 4
METHODOLOGY
4.1 General Architecture
The Figure 4.1 describes the architecture diagram of letter recognition using CNN. It shows the steps involved in recognizing the given input letter by using CNN.
Pre-processed letter images are segmented into subimages of individual letters, and a number is assigned to each letter. Each individual letter is resized into pixels. In this step, an edge-detection technique is used for the segmentation of the dataset images.
Convolutional Layer: This layer is the first layer used to extract the various features from the input images. In this layer, the mathematical operation of convolution is performed between the input image and a filter of a particular size MxM. By sliding the filter over the input image, the dot product is taken between the filter and the parts of the input image matching the size of the filter (MxM).
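The sliding-window dot product described above can be made concrete with a short NumPy sketch. The function name and the example kernel are illustrative, not part of the project's actual code.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide an MxM filter over the image and take the dot product
    at each position (stride 1, no padding: a 'valid' convolution)."""
    m = kernel.shape[0]
    h, w = image.shape
    out = np.empty((h - m + 1, w - m + 1))
    for i in range(h - m + 1):
        for j in range(w - m + 1):
            # elementwise product of the MxM window and the filter, summed
            out[i, j] = np.sum(image[i:i + m, j:j + m] * kernel)
    return out
```

For example, applying a 3x3 vertical-edge kernel such as `[[-1, 0, 1]]` repeated over three rows to a 4x4 image with a vertical step edge yields a 2x2 output map with strong responses at the edge.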
4.2 Design Phase
4.2.1 Data Flow Diagram
The Figure 4.2 describes the data flow diagram of handwritten letter recognition.
Dataset: The dataset serves as the foundation of the project, containing handwrit-
ten letters in various forms, styles, and contexts. These letters serve as the input data
for the recognition system.
Data Preprocessing: In this stage, the dataset undergoes several preprocessing steps to ensure its compatibility and suitability for the deep learning model. Preprocessing steps may include resizing images to a standard size, converting images to grayscale, normalization to adjust pixel values, and augmentation to increase the diversity of the dataset. These steps aim to enhance the model's ability to learn and generalize from the data.
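A minimal NumPy sketch of the grayscale, resize, and normalization steps follows. The function name, the nearest-neighbour resize, and the 28-pixel target size are illustrative assumptions; a real pipeline would typically use a library such as OpenCV or Pillow.

```python
import numpy as np

def preprocess(img, size=28):
    """Grayscale, resize (nearest neighbour), and scale pixels to [0, 1]."""
    if img.ndim == 3:                          # RGB -> grayscale
        img = img.mean(axis=2)
    h, w = img.shape
    rows = np.arange(size) * h // size         # nearest-neighbour row indices
    cols = np.arange(size) * w // size         # nearest-neighbour column indices
    img = img[rows][:, cols]                   # resize to (size, size)
    return img.astype("float32") / 255.0       # normalise to [0, 1]
```

Augmentation (small rotations, shifts, zooms) would be applied on top of this, e.g. via Keras's `ImageDataGenerator`.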
CNN Algorithm: The preprocessed data is fed into a CNN, a type of deep learning
algorithm specifically designed for analyzing visual data such as images. The CNN
consists of multiple layers, including convolutional layers, pooling layers, and fully
connected layers. These layers work together to extract relevant features from the
input images and learn patterns that distinguish different handwritten letters.
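Of the layer types just listed, the pooling step is the easiest to show directly. Below is a hedged NumPy sketch of 2x2 max pooling (the function name is illustrative):

```python
import numpy as np

def max_pool2d(x, size=2):
    """2x2 max pooling: keep the largest value in each block,
    halving the feature map's height and width."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]   # trim to a multiple of `size`
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))
```

For a 4x4 map containing 0..15 row by row, the pooled result is the 2x2 map of block maxima `[[5, 7], [13, 15]]`.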
Feature Extraction: Within the CNN, feature extraction occurs as the convolutional layers analyze the input images, detecting edges, shapes, and other visual patterns that are characteristic of handwritten letters. These features are then passed to subsequent layers for further processing.
Model Building: Using the extracted features, a neural network model is constructed. This may involve designing and configuring the architecture of the neural network, including the number and configuration of layers, activation functions, and other parameters. Additionally, techniques such as transfer learning may be employed, where a pre-trained neural network is adapted and fine-tuned for the specific task of handwritten letter recognition.
Test Images: Separate images that were not used during training are input into
the trained model for testing. These test images serve as a benchmark to evaluate
the performance and accuracy of the model in recognizing handwritten letters. It’s
crucial to use unseen data for testing to assess how well the model generalizes to
new, unseen examples.
Predict Output: The trained model predicts outputs or labels for the test images
based on the patterns and features learned during training. The output may include
the recognized letter corresponding to each input image, along with a confidence
score indicating the model’s certainty in its prediction. This output can be further
analyzed and used for various applications, such as optical character recognition
(OCR), document analysis, and text processing.
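Turning the model's output vector into a recognized letter plus a confidence score can be sketched as below; the helper name and the A-Z class ordering are assumptions for illustration.

```python
import numpy as np

def decode_prediction(probs):
    """Map a probability vector over the 26 classes A-Z to
    (letter, confidence), taking the highest-scoring class."""
    letters = [chr(ord("A") + i) for i in range(26)]
    idx = int(np.argmax(probs))
    return letters[idx], float(probs[idx])
```

In practice `probs` would be one row of `model.predict(...)` for a single test image.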
4.2.2 Use Case Diagram
The Figure 4.3 describes the use case diagram. A use case diagram at its simplest is a representation of a user's interaction with the system that shows the relationship between the user and the different use cases in which the user is involved. The actor can be a human or an external system that provides the input letter. The input given by the actor is sent to the letter recognition system, which contains various use cases. The user can upload the image of the letter he wants to detect. The given input images are pre-processed; pre-processing additionally defines a smaller representation of the sample, and binarization converts a grayscale image into a binary image. Once the pre-processing of the input images is completed, subimages of individual letters are formed from the sequence of images, i.e., the pre-processed letter images are segmented into subimages of individual letters. The given input passes through all the use cases, and finally the letter is recognised.
4.2.3 Class Diagram
The Figure 4.4 describes the class diagram of the model's class structure and contents using design elements such as classes (handwritten frame, image frame, main screen), packages, and objects. A class diagram describes three perspectives when designing a system: conceptual, specification, and implementation. Classes are composed of three things: a name, attributes, and operations. The handwritten frame class contains operations like trainAction() and recogniseAction(), and the mainframe class contains the operations to load and recognise data. Class diagrams also display relations such as containment, inheritance, and associations. The association relationship is the most common relationship in a class diagram; an association shows the relationship between instances of classes. The purpose of a class diagram is to model the static view of an application. Class diagrams are the only diagrams which can be directly mapped to object-oriented languages and are thus widely used at the time of construction. The data from the image class is loaded using the loadframe operation present in the mainframe class, and using the action methods present in the M class the letter is recognized.
4.2.4 Sequence Diagram
The Figure 4.5 represents the sequence diagram, a graphical view of a scenario that shows object interactions in a time-based sequence: what happens first, what happens next. Sequence diagrams establish the roles of objects and help provide essential information to determine class responsibilities and interfaces. This type of diagram is best used during the early analysis phase of design because it is simple and easy to comprehend. Sequence diagrams are normally associated with use cases.
4.2.5 Activity Diagram
The Figure 4.6 describes how activities are coordinated to provide a service which
can be at different levels of abstraction. Typically, an event needs to be achieved by
some operations, particularly where the operation is intended to achieve a number
of different things that require coordination, or how the events in a single use case
relate to one another, in particular, use cases where activities may overlap and require
coordination. It is also suitable for modeling how a collection of use cases coordinate
to represent business workflows.
4.3 Algorithm & Pseudo Code
4.3.1 Algorithm
Step 1: Choose a Dataset
Choose a dataset of your interest, or you can also create your own image dataset for solving your own image classification problem. An easy place to choose a dataset is kaggle.com. The dataset I am going with can be found here. This dataset contains 12,500 augmented images of blood cells (JPEG) with accompanying cell type labels (CSV). There are approximately 3,000 images for each of 4 different cell types grouped into 4 different folders (according to cell type). The cell types are Eosinophil, Lymphocyte, Monocyte, and Neutrophil. Here are all the libraries that we would require and the code for importing them.

Resizing images into 200 x 200

Step 3: Create Training Data
Training is an array that will contain the image pixel values and the index at which the image is in the CATEGORIES list.

Step 5: Assigning Labels and Features
The shape of both the lists will be used in classification using the neural networks.

Step 6: Normalising X and converting labels to categorical data

Step 7: Split X and Y for use in CNN

Step 9: Accuracy and Score of model
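The dataset-preparation steps above (resize, label, normalise/one-hot, split, score) can be sketched on synthetic data as follows. The array sizes, the random stand-in images, and the dummy baseline classifier are illustrative assumptions, not the project's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((40, 200, 200))          # stand-in for 40 images resized to 200x200
y = rng.integers(0, 4, size=40)         # label index per image (4 categories)
X = X.reshape(40, 200, 200, 1)          # add a channel axis for the CNN
Y = np.eye(4)[y]                        # one-hot (categorical) labels

split = int(0.8 * len(X))               # 80/20 train/test split
X_train, X_test = X[:split], X[split:]
Y_train, Y_test = Y[:split], Y[split:]

# Accuracy/score of a dummy "always predict class 0" baseline; a trained
# CNN's predictions would be scored the same way.
pred = np.zeros(len(Y_test), dtype=int)
accuracy = float(np.mean(pred == y[split:]))
```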
At the project beginning, we import all the needed modules for training our model. We can easily import the dataset and start working on it because the Keras library already contains many datasets, and MNIST is one of them. We call the mnist.load_data() function to get the training data with its labels and also the testing data with its labels.
The model cannot take the image data directly, so we need to perform some basic operations and process the data to make it ready for our neural network. The dimension of the training data is (60000, 28, 28). One more dimension is needed for the CNN model, so we reshape the matrix to shape (60000, 28, 28, 1). The role of the preprocessing step is to perform various tasks on the input image; it basically upgrades the image by making it suitable for segmentation. The fundamental motivation behind pre-processing is to separate the interesting pattern from the background.
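The reshape described above can be shown directly; here a small placeholder array of 1000 images stands in for the real 60000-image MNIST training set.

```python
import numpy as np

# MNIST-style data for the CNN: (N, 28, 28) -> (N, 28, 28, 1).
x_train = np.zeros((1000, 28, 28), dtype="uint8")       # placeholder data
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)  # add channel axis
x_train = x_train.astype("float32") / 255.0             # scale pixels to [0, 1]
```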
It is time to create the CNN model for this Python-based data science project. Convolutional layers and pooling layers are the two wheels of a CNN model. The reason behind the success of CNNs for image classification problems is their suitability for grid-structured data. We will use the Adadelta optimizer for the model compilation.
To start the training of the model we can simply call the model.fit() function of Keras. It takes the training data, validation data, epochs, and batch size as parameters. The training of the model takes some time. After successful model training, we can save the weights and model definition in the 'mnist.h5' file. Optimizers are algorithms or methods used to change the attributes of the neural network, such as the weights and learning rate, in order to reduce the losses. There are many types of optimizers, and each optimizer has its own method of dealing with the weights and bias inputs. In this project the backpropagation algorithm is paired with the Adam optimizer instead of the plain gradient descent optimizer. At first gradient descent was considered, but the Adam (stochastic gradient descent based) optimizer performs better here, as the inputs from the various pixels vary from point to point.
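A compact, hedged sketch of the create/compile/fit/save sequence described above, using Keras with random stand-in data in place of the real MNIST arrays. The text mentions both Adadelta and Adam; Adam is used here, and the tiny architecture is illustrative, not the project's exact model.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Build a small CNN for 28x28x1 inputs and 10 digit classes.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolutional layer
    layers.MaxPooling2D((2, 2)),                   # pooling layer
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),        # one score per class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Random arrays stand in for the real (x_train, y_train) from MNIST.
x = np.random.rand(32, 28, 28, 1).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 10, 32), 10)
model.fit(x, y, epochs=1, batch_size=8, verbose=0)  # brief training run
model.save("mnist.h5")  # save weights and model definition
```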
4.5.1 Import required Libraries
• At the project beginning, all the modules needed for training the model are imported.
• One can easily import the dataset and start working on that because the Keras
library already contains many datasets and MNIST is one of them.
• The mnist.load_data() function is called to get the training data with its labels and also the testing data with its labels.
4.5.2 Training Model
• After completing data preprocessing, the CNN model is created which consists
of various convolutional and pooling layers alongside a 3x3 sized kernel.
• The model will then be trained on the basis of the training and validation data with the help of several Python libraries such as TensorFlow, Pillow, OpenCV, Tkinter, and NumPy that were preloaded to perform these specific tasks.
• After the model is trained using the training dataset, the testing dataset is used to evaluate how well the model works.
• A particular part of the overall OCR dataset is used as the testing dataset on the
basis of which the accuracy is computed for the proposed model.
• By using the CNN algorithm, one can get accuracy of up to 99%.
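The accuracy reported on the held-out test split reduces to a one-line comparison between predicted and true labels; a small sketch (function name illustrative):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the truth."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))
```

For example, three correct predictions out of four test samples gives an accuracy of 0.75.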
Figure 4.8: Training Accuracy
The Figure 4.8 shows the accuracy obtained on the training dataset; the testing dataset is then used to evaluate how well the model works. It also shows how the model is trained using the sequential model.
Chapter 5
The Figure 5.1 shows the Home Page of the letter recognition system, in which one can give an input letter to recognise. The user can upload the image of the letter he wants to detect. After providing the letter, one should click the Predict button to get the result.
5.1.2 Output Design
The Figure 5.2 describes how the given input image is pre-processed. After pre-processing, the image is sent to the CNN filters, where the three-dimensional image is converted into a one-dimensional matrix; then, in the pooling layer, max pooling reduces the size of the data to nearly half of its original size. The convolved image is compared with the test images already present, and based on the accuracy the image is recognised.
5.2 Testing
Testing is defined as an activity to check whether the actual results match the expected results and to ensure that the software system is defect-free. It involves the execution of a software component or system component to evaluate one or more properties of interest. Software testing also helps to identify errors, gaps, or missing requirements relative to the actual requirements.
5.3 Types of Testing
Figure 5.3 represents unit testing, which involves testing individual components or modules of the system to ensure they are working correctly. In this project, unit testing could involve testing individual components such as the alphabets, data collection, and the deep learning model to ensure they are working as expected.
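A unit test of the kind described above might look like the following sketch; `preprocess` is a hypothetical helper, not a function from the project code.

```python
import numpy as np

def preprocess(img):
    # Hypothetical preprocessing helper: scale grayscale pixels to [0, 1].
    return img.astype("float32") / 255.0

def test_preprocess():
    img = np.random.randint(0, 256, (28, 28), dtype=np.uint8)
    out = preprocess(img)
    # The unit under test should preserve shape and map pixels into [0, 1].
    assert out.shape == (28, 28)
    assert out.min() >= 0.0 and out.max() <= 1.0

test_preprocess()
print("preprocess unit test passed")
```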
Test Result
System testing ensures that the entire integrated software system meets the requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.
5.3.4 Test Result
Figure 5.6 describes how system testing is applied: the model is trained on the training dataset, and the testing dataset is used to evaluate how well the model works. It also shows how the model is trained using the sequential model and explains the different packages used in this model.
Chapter 6
The key rationale behind Optical Character Recognition (OCR) from images involves a feature-extraction technique supported by a classification algorithm that recognises characters based on those features. Previously, several algorithms for feature classification and extraction were utilized for the purpose of character recognition. With the advent of CNNs in deep learning, however, no separate algorithms are required for this purpose. In the area of computer vision, deep learning is one of the outstanding performers for both feature extraction and classification. However, a CNN architecture consists of many nonlinear hidden layers with an enormous number of connections and parameters, so training the network with a very small number of samples is a very difficult task. In a CNN, only a small set of parameters is needed for training the system. CNN is therefore the key solution, capable of correctly mapping inputs to outputs with high accuracy by varying the trainable parameters and the number of hidden layers.
The new system we are proposing for Optical Character Recognition (OCR) is a big step forward. Instead of using many different algorithms to figure out what characters are in an image, we use Convolutional Neural Networks (CNNs). These networks are like smart filters that can learn to recognise patterns in images on their own. It is similar to how our brains see things: we do not need someone to explain every detail, we just know what we are looking at. With CNNs, the computer can do the same thing. CNNs can extract local features and learn complex representations from input images. Despite their complexity, CNNs require fewer trainable parameters compared with traditional methods, making them more efficient and scalable. Overall, the proposed system not only improves accuracy in OCR but also enhances efficiency by simplifying feature extraction and classification.
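The local feature extraction described above can be sketched in plain NumPy. This naive "valid" convolution (strictly, cross-correlation, as computed by CNN layers) uses a hand-chosen vertical-edge kernel purely as an illustration; it is not one of the trained filters of the proposed model.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' 2-D cross-correlation, as computed by a CNN layer."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

edge = np.array([[1.0, 0.0, -1.0]] * 3)   # responds to vertical intensity edges
img = np.zeros((5, 5))
img[:, :2] = 1.0                          # bright left half, dark right half
response = conv2d_valid(img, edge)
print(response)  # strong response where the edge is, zero elsewhere
```

In a real CNN, the kernel values are not hand-chosen; they are the trainable parameters the discussion above refers to.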
6.2 Comparison of Existing and Proposed System
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from imblearn.under_sampling import NearMiss
from keras.src.utils.np_utils import normalize
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import keras
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout, BatchNormalization
from keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout, BatchNormalization
import tensorflow as tf
import tensorflow_addons as tfa
from adabelief_tf import AdaBeliefOptimizer
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('A_Z Handwritten Data.csv')
y = df['0']
del df['0']
x = y.replace([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
              ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
               'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'])
nM = NearMiss()
X_data, y_data = nM.fit_resample(df, y)

from keras.utils import normalize, to_categorical

# Assuming y_data contains class labels
y = to_categorical(y_data)
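As an illustration of what the to_categorical call in the listing above produces, one-hot encoding can be reproduced with plain NumPy:

```python
import numpy as np

# One-hot encode integer class labels, as keras.utils.to_categorical does:
# label k becomes a vector with a 1 in position k and 0 elsewhere.
labels = np.array([0, 2, 1])
one_hot = np.eye(3)[labels]
print(one_hot)  # [[1. 0. 0.] [0. 0. 1.] [0. 1. 0.]]
```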
Figure 6.1 shows the output used to evaluate how the model works and how it is trained using the sequential model; it also shows the different packages used in this model.
Chapter 7
7.1 Conclusion
The performance of the CNN for letter recognition was significant. The proposed method obtained 98% accuracy and is able to identify real-world images as well; the loss in both training and evaluation is less than 0.1, which is negligible. The only challenging part is the noise present in real-world images, which needs to be looked after. The learning rate of the model depends greatly on the number of dense neurons and the cross-validation measure. Recognition of letters using a CNN with Rectified Linear Unit activation is implemented. The proposed CNN framework is well equipped with suitable parameters for high-accuracy OCR letter classification. The time factor is also considered when training the system. Afterward, for further verification of accuracy, the system is also checked by changing the number of CNN layers. It is worth mentioning here that the CNN architecture design consists of two convolutional layers. The experimental results demonstrate that the proposed CNN framework for the OCR dataset exhibits high performance in terms of time and accuracy compared with previously proposed systems. Consequently, letters are recognized with high accuracy (99.21%).
For future enhancements, the “Handwritten Letter Recognition” project can explore several avenues to further improve its capabilities. One potential direction involves extending the model’s recognition capabilities to include cursive handwriting and other unique writing styles, broadening its applicability. Additionally, incorporating real-time learning mechanisms can enhance adaptability to varying user inputs and evolving handwriting styles over time. Integration with emerging deep learning architectures or exploring transfer-learning techniques could further boost the model’s performance. Enhancements in multi-language support and the ability to recognize special characters would increase the project’s utility across diverse linguistic contexts. Collaborative efforts to create a continually expanding and diverse dataset could contribute to ongoing model training, ensuring that the system remains current and effective. Moreover, exploring edge computing and optimizing the application for mobile platforms could enhance accessibility and convenience. Continuous engagement with user feedback and a collaborative, open-source approach can foster a community-driven development process, allowing “Handwritten Letter Recognition” to evolve and stay at the forefront of advancements in CNNs.
Chapter 8
PLAGIARISM REPORT
Chapter 9
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load the MNIST dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Add a channel dimension to the images
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

# Convert the labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Define the CNN model architecture
model = keras.Sequential(
    [
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ]
)

# Compile the model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Train the model
model.fit(x_train, y_train, batch_size=128, epochs=10, validation_split=0.1)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy:", test_acc)

model.save("my_model.h5")
9.2 Poster Presentation