Digit Recognition Using Convolutional Neural Network

Ms. B. Veena
Department of ECE, Institute of Aeronautical Engineering, Hyderabad, India
[email protected]

Saketh Peetha
Department of ECE, Institute of Aeronautical Engineering, Hyderabad, India
[email protected]

Kesari Likitha
Department of ECE, Institute of Aeronautical Engineering, Hyderabad, India
[email protected]

Karanji Abhinav
Department of ECE, Institute of Aeronautical Engineering, Hyderabad, India
[email protected]

Abstract—Digit recognition using convolutional neural networks is an active area of computer vision and machine learning. Developed to process and analyze visual data, CNNs are specialized neural networks built from a number of adaptive layers, which lets them automatically learn to detect edges, textures, and other shapes in different images. For digit recognition, training involves the characterization and classification of handwritten or printed digits from images. The process starts with collecting and pre-processing a large dataset of digit images, for instance the well-known MNIST dataset. The model then goes through a training phase in which it learns, via many layers of convolution and pooling operations, the patterns and features that are unique to each digit. Using Python, the input was read and the digit recognized with a CNN. The CNN model was trained on the MNIST dataset containing 50,000 images of handwritten digits and reached an accuracy of 99.4% when predicting unseen digits; the predicted digit was then displayed using Python. Because the proposed algorithm uses a CNN for model training, it is fast in processing, whereas existing works in this field rely on image classification trees to recognize digits.

Index Terms—Convolutional Neural Network, digit detection, image processing, MNIST dataset.

I. INTRODUCTION

Digit recognition is a fundamental problem in the field of computer vision and pattern recognition whose goal is to recognize numerical digits from images automatically [3]. This technology has important real-life applications, such as automated check processing, reading of postal addresses, and digitization of handwritten notes.

The backbone of today's digit identification is the Convolutional Neural Network, a deep learning model designed primarily for the processing and analysis of visual data. CNNs are normally composed of multiple layers: convolutional, pooling, and fully connected, which collaborate to learn hierarchical representations of the input images automatically [1]. This enables the CNN to learn effectively the different shapes, curves, and patterns characterizing different digits, even in noisy and distorted instances.

Much of the recent popularity of Deep Learning Networks (DLNs) has been triggered by how effectively they classify the information in an image and detect an image's content with regard to certain object classes. DLNs realized as CNNs have been used in several computer vision tasks such as object tracking, pose estimation, action recognition, and object counting [6]. Examples range from yield prediction of fruits in vineyards to disease diagnosis, such as Parkinson's, using image data alone, as in Heinrich et al. The growth of these applications of DLNs results from enhanced computing power and the need for faster training and inference through extensive RAM and GPU usage, respectively [9].

Beyond computer vision, CNNs have also been applied in speech recognition and natural language processing, where they outperformed the preceding algorithms mainly based on Hidden Markov models and Gaussian mixture models. The requirements for executing a CNN architecture differ a lot depending on the domain of application; generally, the aim is to reduce inference times while improving prediction accuracy [8].

Currently, there is no standard guideline on how to effectively design CNN architectures for domain-specific problems, many of which return apparently random input-output outcomes, such as manufacturing fault detection through image classification. To some extent, this challenge arises from inadequate insight into the nature and performance implications of CNN architectures [4]. Moreover, in many cases DLNs are considered black-box models whose results are hard to predict, which makes benchmark results and evaluations very necessary. In this context, we focus on giving a state-of-the-art review of the evolution of DLN architectures for image classification, which delivers best-practice guidelines [9].

Since it is the most prevalent type of network used for this task, we particularly focus on CNNs. Our goals are to provide several evaluation results and metrics that outline the dynamics of CNN architecture characteristics, their prediction performance, and their computational requirements [13]. We also compare the different designs over time to underline the technological trends in CNN design. We use this knowledge to formulate five guidelines to enhance the methodical selection of CNN designs and architectures [10].

Overall, digit recognition problems constitute a window into the solution of more complex problems in handwritten pattern recognition and remain one of the vivid research areas that continue to drive improvement in both academia and industry.

II. LITERATURE

A. Convolutional Neural Network

A Convolutional Neural Network (CNN) is a Deep Learning model designed particularly to work with data having a grid topology, such as images [5]. The inherent strengths of CNNs in recognizing patterns and structures within input data make them very effective for tasks like image classification, object detection, and segmentation.

Convolutional layers convolve the input with filters to detect a set of local patterns, such as edges, textures, and shapes. Pooling layers reduce the spatial dimensions of the data and downsample it, reducing computational complexity and improving robustness to spatial variations. Fully connected layers are identical to traditional neural network layers, wherein every neuron is connected to all neurons in the previous layer [2]; they are normally used at the end of a network to make the final predictions. Activation functions such as ReLU introduce non-linearity into the model so that it can pick up more complex patterns.

Fig. 1. Overview of CNN Architecture.

Images are high-dimensional data: even a simple 256 x 256 RGB image has 196,608 values, or pixels. Traditional neural networks struggle with input sizes this large. CNNs mitigate this problem by using convolutions that are local to receptive fields, greatly reducing the number of parameters and the computational load [14]. CNNs can automatically learn to detect hierarchical features in images: early layers capture simple patterns, including edges and textures, while deeper layers extract more complex structures, including shapes and objects [15]. The hierarchical learning that occurs in this process plays a key role in image recognition, where it helps the network understand different levels of abstraction.

By the very nature of convolution and pooling operations, a degree of translational invariance relative to the input image is built into CNNs [4]. This means they are able to detect an object regardless of where it may be in the image, which makes them useful for object detection and image classification. Images have a spatial structure in which nearby pixels tend to be more related than distant ones; CNNs exploit this property through local connections and shared weights, allowing them to effectively capture and process spatial hierarchies within images [7].

Image processing requires the extraction of meaningful features relating to edges, textures, and shapes. The convolutional layers of a CNN extract these features automatically during training; there is no need to engineer features manually [14]. Because CNNs rely on weight sharing and local connectivity, they have fewer parameters than fully connected networks and require less computation. This makes them more efficient for large images and reduces the possibility of overfitting.
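To make the layer roles above concrete, the following sketch shows how a small CNN of this kind can be assembled in Keras. It is an illustrative example only, not the exact architecture used in this paper; the filter counts and layer sizes are assumptions.

    # Illustrative CNN for 28x28 grayscale digit images (hypothetical layer sizes).
    from tensorflow.keras import layers, models

    model = models.Sequential([
        # Convolutional layers learn local patterns such as edges and textures.
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        # Pooling layers downsample the feature maps, reducing computation.
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        # Fully connected layers at the end produce the final class predictions.
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(10, activation='softmax'),
    ])
    model.summary()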
B. Deep Neural Network

A deep neural network (DNN) is a type of Artificial Neural Network with multiple layers between the input and output layers. The multiple layers in DNNs empower them to model complex patterns in data by learning hierarchical representations [15].

Every layer in a DNN is composed of neurons, which are the basic processing units. The neurons in one layer are connected to the neurons in the next through weighted connections; these weights are adjusted during training with the aim of producing predictions with minimal error.

Fig. 2. Neural Network.

Activation functions introduce non-linearity into the network and thus allow it to learn more complex patterns. Common activation functions are the ReLU (rectified linear unit), the sigmoid, and tanh [2]. The traditional training of DNNs involves backpropagation, a means of propagating the error backwards through the network in order to update the weights, normally combined with optimization algorithms such as stochastic gradient descent [14]. DNNs are very effective in applications like speech recognition, natural language processing, and image recognition due to their capability of learning intricate patterns and representations in data [5].

Fig. 3. Overview of DNN Architecture.

Training of DNNs, however, is compute-intensive and calls for large amounts of labeled data. Techniques such as transfer learning, whereby a model trained on one task is fine-tuned for a new task, have shown great potential in reducing these challenges [10]. DNNs have also been successfully applied in various fields, including health care, for example to support diagnosis from data, and autonomous systems, where they enable tasks such as self-driving cars. To avoid overfitting and improve generalization on new data, DNNs are in most cases combined with regularization techniques, including dropout and batch normalization [14].
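As a small illustration of the weighted connections described above, the following NumPy sketch computes the forward pass of a single fully connected layer with a ReLU activation. The layer sizes and random weights are arbitrary and chosen only for demonstration.

    # Forward pass of one dense layer: y = relu(W x + b) (illustrative sizes).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(784)                          # a flattened 28x28 input image
    W = rng.standard_normal((128, 784)) * 0.01   # weights of the connections
    b = np.zeros(128)                            # biases, one per neuron

    z = W @ x + b                                # weighted sum received by each neuron
    y = np.maximum(z, 0.0)                       # ReLU activation introduces non-linearity
    print(y.shape)                               # (128,) - one output per neuron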
C. MNIST Dataset

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that, since NIST's training dataset was taken from American Census Bureau employees while the testing dataset was taken from American high school students, it was not well suited for machine learning experiments. Furthermore, the black-and-white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

The MNIST database contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test set were taken from NIST's training dataset, while the other halves were taken from NIST's testing dataset. The original creators of the database maintain a list of some of the methods tested on it.

Fig. 4. Sample images from the MNIST test dataset.
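The dataset described above ships with Keras, so it can be loaded in a couple of lines. This is a generic loading sketch, not necessarily the exact preprocessing used in this work.

    # Load MNIST and scale pixel values to [0, 1].
    from tensorflow.keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    print(x_train.shape, x_test.shape)   # (60000, 28, 28) (10000, 28, 28)

    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0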
III. METHODOLOGY AND TECHNOLOGIES USED

Since there is no readily available model for recognizing digits, we built our own digit recognition model using a CNN. The methodology is shown in the diagram below.

Fig. 5. Block diagram.

From the block diagram, it is clear that the first step is to load the training and testing datasets. After that, we build the model, the handwritten digit classifier, where the training dataset is used to teach the classifier to recognize handwritten digits. Later, we check the results and validate the model using the testing dataset. This is the overview of the methodology, which is explained in more detail as we proceed.
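In code, the "build, train, validate" steps of the block diagram map onto the compile, fit, and evaluate calls of Keras. The sketch below assumes the model and data variables from the earlier examples, and the hyperparameters (optimizer, batch size) are illustrative rather than those of the actual experiments.

    # Compile, train (with a validation split), then evaluate on the test set.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    history = model.fit(x_train[..., None], y_train,
                        epochs=10, batch_size=128,
                        validation_split=0.1)

    test_loss, test_acc = model.evaluate(x_test[..., None], y_test)
    print("test accuracy:", test_acc)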
A. Activation Functions

1) ReLU: ReLU is a piecewise linear function that returns 0 if the input is negative and returns the input itself if it is positive. It has become the standard activation function for many neural networks because it makes training easier and gives better results. The rectifier function is applied to increase the non-linearity of the image representation. Because images are generally non-linear, this is something we want to preserve: the rectifier breaks up any linearity imposed on an image during the convolution operation. Applied to a feature map, the rectifier removes all negative (dark) portions, while the positive values remain as grey and white. Once the image is re-adjusted in this way, the colors change abruptly rather than gradually, which means the linearity is gone.
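A minimal NumPy illustration of this behaviour (the values are illustrative only):

    # ReLU: negative values become 0, positive values pass through unchanged.
    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    feature_map = np.array([[-2.0, 0.5], [3.0, -0.1]])
    print(relu(feature_map))   # [[0.  0.5]
                               #  [3.  0. ]]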
2) Softmax Activation: The softmax is a multi-dimensional variant of the logistic function and is often employed as the last layer. The softmax activation function is used to work out relative likelihoods: the values of the output neurons Z21, Z22, and Z23 define the final probability values. Like the sigmoid activation function, the softmax returns the likelihood of each class. The equation for the softmax activation function is as follows:

    softmax(z_i) = e^(z_i) / sum_j e^(z_j)

Here z represents the values of the output layer neurons. These values are normalized and converted to probabilities by dividing each exponential by the sum of the exponential values. In other words, the softmax is a multi-class generalization of the sigmoid. Let us analyze an example of how the softmax works; the neural network below is what we have.

Fig. 6. Softmax classification.
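The formula above can be computed directly in NumPy; subtracting the maximum before exponentiating is a standard trick for numerical stability and does not change the result.

    # Softmax: exponentiate, then normalize by the sum of exponentials.
    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))   # shift for numerical stability
        return e / e.sum()

    z = np.array([2.0, 1.0, 0.1])   # example output-layer values
    print(softmax(z))               # approx. [0.659 0.242 0.099]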
B. TensorFlow

TensorFlow was designed with large numerical computations in mind rather than deep learning specifically. It proved advantageous for deep learning development, hence Google made it open source [6]. It is based on graphs that depict data flows and consist of nodes and edges. Because the execution mechanism is graph-based, TensorFlow code can be spread over a cluster of computers using GPUs much more easily.

C. Keras

Keras is a high-level deep learning API used for building neural networks. Keras is built using Python and makes the implementation of neural networks easier, delegating the computation of the various networks to a backend [6]. It is simple to understand; because of this abstraction, Keras is slower than other deep learning frameworks, but it is also user-friendly and permits swapping back and forth among various backends. TensorFlow has chosen Keras as its official high-level API out of the above frameworks. Keras is available as a TensorFlow module, which may be used to quickly build and run deep learning models. At the same time, one can use the TensorFlow Core API to create custom computations involving computation graphs, tensors, sessions, and so on, which gives full control over the application and lets one implement one's own ideas in very little time.

Fig. 7. Keras Architecture.
D. NumPy

NumPy is a Python library that supports massive multi-dimensional arrays and matrices, together with a collection of high-level algebraic functions for manipulating them. It is an open-source module for numerical computing in Python, developed as a collaborative open-source project with a large number of contributors, and is the general-purpose tool for working with arrays and matrices. Python is slower than Fortran and other languages at performing loops; NumPy, which turns repetitive code into compiled form, is used to solve this problem. Anyone who wants to work on data analysis or machine learning projects needs to know NumPy, because additional data analysis packages (such as pandas) and the scikit-learn package are built on top of it.
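The speed argument above comes from vectorization: replacing an explicit Python loop with a single compiled array operation. A small sketch:

    # Vectorized NumPy arithmetic versus an explicit Python loop.
    import numpy as np

    pixels = np.arange(784, dtype=np.float32)   # e.g. a flattened 28x28 image

    # Loop version: one Python-level operation per element.
    scaled_loop = np.empty_like(pixels)
    for i in range(pixels.size):
        scaled_loop[i] = pixels[i] / 255.0

    # Vectorized version: one compiled operation over the whole array.
    scaled_vec = pixels / 255.0

    print(np.allclose(scaled_loop, scaled_vec))   # True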
E. Pandas

Pandas is a data cleaning and analysis tool for machine learning. It provides features for data exploration, cleansing, transformation, and visualization, and it gives swift, expressive, and flexible data structures.

Fig. 8. Machine learning method steps.

F. Matplotlib

Matplotlib, a Python package for data visualization, operates at a low level and offers a straightforward approach reminiscent of MATLAB's graphical representations. Built on NumPy arrays, the library boasts a wide array of plots, including line charts, bar charts, and histograms. While it provides considerable flexibility, leveraging its capabilities often requires writing additional code. Matplotlib is an open-source Python charting package first released in 2003. It is a large library with several charting functions that may be used in Python, and it includes a variety of graphs such as line graphs, bar graphs, scatter graphs, and histograms, which permit us to visualize different data.
G. Pyplot

Pyplot is a module of Matplotlib. Matplotlib was made with the intention of being as easy to use as MATLAB, with the added advantages of being open source and free. Pyplot creates a plotting region in a figure, creates plots with specific lines in the plotting area, and then decorates the plots with labels [9]. Line graphs, histograms, and scatter graphs are just a few of the plots available in Pyplot.
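Curves like those in Fig. 9 and Fig. 10 can be produced with a few Pyplot calls. This sketch assumes the history object returned by the earlier model.fit example.

    # Plot training vs. validation accuracy across epochs.
    import matplotlib.pyplot as plt

    plt.plot(history.history["accuracy"], label="training accuracy")
    plt.plot(history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()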
H. Seaborn

Seaborn is a data visualization library in Python based on Matplotlib. It includes a high-level interface that allows you to create visually appealing and practical statistical graphs, and it can be used to generate simple plots with very little code.
IV. RESULTS AND DISCUSSIONS

Following training, the accuracy and loss curves are shown. We are optimistic about our model's accuracy levels and accuracy/loss curves. The training accuracy, the validation accuracy, and their respective loss curves are shown below.

Fig. 9. Training and validation accuracy.

Fig. 10. Training and validation loss.

Finally, we validated the model by checking a few outputs. Here, we uploaded a testing dataset. We can either upload a picture or supply a completely new data set; using OpenCV we can interact with the system directly through the local camera and carry out the evaluation by comparing predicted and actual results. What we did is upload a dataset and write a piece of code to verify the output, that is, to see whether the expected and actual outputs match. Fortunately, our model received enough training, and from the graphs above we can see how accurate our system is. Below is a sample output.
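The per-image check described above can be reproduced with a short prediction snippet. This is an illustrative sketch reusing the variables from the earlier examples, not the authors' exact verification code.

    # Compare the predicted digit with the true label for one test image.
    import numpy as np

    probs = model.predict(x_test[:1][..., None])   # softmax probabilities
    predicted_digit = int(np.argmax(probs, axis=1)[0])
    print("predicted:", predicted_digit, "actual:", int(y_test[0]))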
V. CONCLUSION

Compared with other works, we were able to obtain a high accuracy of over 99.4%. From the earlier graphs (Fig. 9 and Fig. 10), we can see that we obtained a training accuracy of 99.4% when training the model for 10 epochs, and a validation accuracy of 99.4%. This shows that the model keeps training and correcting its errors as the number of epochs increases, until at some point only small errors are incurred. In other words, the more training we provide to the model, and the wider the dataset, the smaller the errors and the greater the accuracy. And since we use convolution layers before the neural network, it is easier to classify the data. The trained CNN model can be further improved to process a continuous stream of input digits.

VI. FUTURE SCOPE

We can develop a system which can process a continuous input flow of digits and convert them to their corresponding number. Instead of only displaying digits, the proposed recognition system might be improved to recognize gestures and facial expressions in addition to letters, and instead of displaying letter labels, whole sentences could be displayed as a more accurate translation of language, which also improves readability. The system can be expanded to include translation of digits to speech; that is, the trained CNN model can be further developed to translate the written text output to voice. This model might also be turned into a mobile app in which the programme recognizes footage of the speaker using neural networks and computer vision and then uses sophisticated algorithms to convert it to speech, providing affordably priced and always-available services for any community.

REFERENCES

[1] V. Vishwanatha, A. C. Ramachandra, S. D. Nalluri, S. M. Thota, and A. Thota (2023). "Hand Written Digit Recognition Using CNN."
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012). ImageNet Classification with Deep Convolutional Neural Networks. NIPS.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber (2010). Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition. arXiv preprint arXiv:1003.0358.
[4] P. Y. Simard, D. Steinkraus, and J. C. Platt (2003). Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis. ICDAR.
[5] L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus (2013). Regularization of Neural Networks using DropConnect. ICML.
[6] M. D. Zeiler and R. Fergus (2014). Visualizing and Understanding Convolutional Networks. ECCV.
[7] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet (2013). Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks. arXiv preprint arXiv:1312.6082.
[8] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio (2007). "An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation," in Proceedings of the 24th International Conference on Machine Learning, ACM, pp. 473-480.
[9] J. Cheng, P. Wang, G. Li, Q. Hu, and H. Lu (2018). "Recent Advances in Efficient Computation of Deep Convolutional Neural Networks," Frontiers of Information Technology and Electronic Engineering (19:1), pp. 64-77.
[10] Y. LeCun (1989). "Generalization and Network Design Strategies," in Connectionism in Perspective, R. Pfeifer, Z. Schreter, F. Fogelman, and L. Steels (eds.), Elsevier.
[11] D. Mishkin, N. Sergievskiy, and J. Matas (2017). "Systematic Evaluation of CNN Advances on the ImageNet," Computer Vision and Image Understanding (161), pp. 11-19.
[12] S. Ji, W. Xu, M. Yang, and K. Yu (2012). 3D Convolutional Neural Networks for Human Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221-231.
[13] X. Cao (2015). "A Practical Theory for Designing Very Deep Convolutional Neural Networks," Technical Report.
[14] K. He, X. Zhang, S. Ren, and J. Sun (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.
[15] I. Goodfellow, Y. Bengio, and A. Courville (2016). Deep Learning. MIT Press.
[16] K. Simonyan and A. Zisserman (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.