0% found this document useful (0 votes)
19 views

Optical Character Recognition Using Convolutional Neural Network[1][1]

Uploaded by

ATHARV RAJ
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views

Optical Character Recognition Using Convolutional Neural Network[1][1]

Uploaded by

ATHARV RAJ
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 5

Optical Character Recognition Using

Convolutional Neural Network

Ankit Kumar Punit Mittal


Dep. Computer Science & Information Dep. Computer Science & Information
Technology Technology
Meerut Instiute of Engineering & Meerut Institute of Engineering &
Technology Technology
Meerut, India Meerut, India
[email protected] [email protected]

Abstract: OCR is a technology used to convert images of handwritten, typewritten, or printed text into a machine-
readable format. The primary objectives of OCR include enabling text editing, efficient indexing and searching, and
reducing storage requirements. The process typically begins with scanning the image, followed by segmentation to
divide the text into lines, words, and individual characters. The characters are then analyzed and translated into
corresponding character codes, such as ASCII. In this paper, we propose an OCR approach that utilizes a
segmentation algorithm to divide the image into lines, words, and characters. For character recognition, we employ
CNNs, which are known for their ability to capture spatial hierarchies in image data. Our experiments demonstrate
that the CNN-based approach outperforms traditional machine learning techniques, such as SVMs and ANNs, in
terms of accuracy and robustness. The results indicate the potential of CNNs in achieving superior character
recognition performance, thereby enhancing the efficiency and accuracy of OCR systems. Furthermore, we explore
various network architectures and hyper parameter tuning to optimize performance, including the use of dropout
layers and data augmentation techniques to mitigate overfitting. The proposed model demonstrates its adaptability
across various handwriting styles and fonts, showing promise for real-world applications in document digitization
and automated data extraction. Overall, our approach highlights the transformative potential of deep learning in
advancing OCR technology for practical use cases.

Keywords: Optical Character Recognition, Artificial Neural Network, Support Vector Machines, Convolutional Neural
Network, Segmentation, Image Characters

I. INTRODUCTION processing, storage, and further analysis. With


the development of advanced OCR software,
OCR is a technology that involves new opportunities are emerging for automating
the electronic or mechanical conversion of document-based tasks and creating innovative
handwritten text into machine-encoded text or solutions for managing information. OCR is
images of typed. This process is widely used to particularly important in the field of
translate scanned documents, photos, or text handwritten character recognition, where it is
from sources such as magazines, journals, and divided into two categories: offline and online
receipts into editable and searchable formats. recognition. Offline handwritten character
OCR devices can read images from both typed recognition is more challenging due to the
and handwritten text, and can also capture variation in handwriting styles between
images through a webcam. The accuracy of individuals.
OCR systems typically reaches around 90% for This method involves the translation
typed characters, while handwritten text is of handwritten text into machine-readable
recognized with an accuracy of approximately codes, which can then be processed by a
40-50%. computer. Despite the challenges, OCR has
OCR is particularly valuable for made significant advancements through its
digitizing documents like business cards, applications in pattern recognition, artificial
passports, bank receipts, and other relevant intelligence, machine vision, and signal
papers, making them accessible for computer processing. By accurately converting scanned
or photographed text into digital format, OCR In study [3] focuses on intelligent
plays a crucial role in enhancing document optical character recognition for DFB
management, improving accessibility, and (Distributed Feedback) chip identification
supporting various business applications. As using Deep Convolutional Neural Networks
OCR technology continues to evolve, it (DCNN). It shows how DCNN can be applied
promises to further streamline the conversion to recognize characters from complex images
of physical documents into valuable, machine- such as those found in semiconductor and
readable data. micro-optical systems. This work [4] explores
OCR is a technique used to the application of deep neural networks
automatically identify and convert optical (DNNs) for both optical character recognition
characters into machine-readable text. One of and face recognition. It highlights how DNNs
the methods employed for OCR is CNNs. In are capable of learning hierarchical features for
the context of image classification, CNNs have accurate recognition in diverse settings.
surpassed traditional machine learning The study combines deep neural
techniques like SVMs and KNN in terms of networks (DNN), Long Short-Term Memory
accuracy and precision, despite requiring more (LSTM), and Tesseract for real-time license
time for training. CNNs are especially effective plate detection and OCR. It demonstrates the
in computer vision tasks due to their ability to synergy of these techniques to handle noisy
generalize at a human level and avoid and distorted images typically found in real-
overfitting through pooling layers. CNNs, a world license plate recognition. [5]
deep learning architecture inspired by the Convolutional Neural Networks
human visual system, are widely used in (CNNs) for OCR in the banking sector,
various fields such as computer vision and particularly for recognizing handwritten and
NLP. In NLP, CNNs have found applications printed text on banking forms and documents.
in speech recognition and text classification, The CNN model provides effective feature
showcasing their versatility in processing both extraction and recognition capabilities in a
visual and textual data. high-demand industry. [6]. This work employs
OCR is used in various applications 2D Convolutional Neural Networks (2D CNN)
such as document digitization, scanning books, to recognize text on bank cheques. The authors
automated data entry, and number plate showcase the use of 2D CNNs for capturing
recognition. OCR helps in converting the complex visual patterns from cheque images,
physical text (whether handwritten, printed, or enhancing OCR accuracy in a financial
scanned) into an editable, searchable, and context.
structured format for further processing. This paper [7] focuses on extracting
information from handwritten text using OCR
II. LITERATURE SURVEY integrated with neural networks. It emphasizes
the need for neural networks to address the
This study focuses on applying challenges posed by the variation in
Convolutional Neural Networks (CNNs) for handwriting styles, achieving higher accuracy
optical character recognition, leveraging in the recognition of handwritten data.
CNN’s ability to automatically learn spatial
hierarchies of features from raw image data,
enhancing OCR performance by automatically
extracting relevant features [1]. This paper [2]
presents an improved OCR method using Deep
Neural Networks (DNNs). The authors explore
how DNNs can enhance the recognition
accuracy of optical characters by modeling
complex patterns in handwritten or printed text,
surpassing traditional OCR techniques.
The paper addresses OCR for Literature survey table:
Sanskrit text using Convolutional Neural
Networks (CNNs). It demonstrates the Rf. Technolo Advantages Limitations
efficiency of CNNs in extracting relevant No gy Used
features for recognizing complex scripts such
as Sanskrit, where traditional OCR systems
[1] CNN Automatic feature Struggles
struggle due to the intricacies of the script [3] extraction from with noisy
raw image data, images and
improves OCR varying significant potential for real-world applications
performance. handwriting in document digitization and automated data
styles extraction.
[2] DNN Enhanced Requires
recognition large labelled
accuracy for datasets,
complex patterns computational
in handwritten or ly expensive.
printed text.
[3] CNN Efficient for May
complex scripts misrecognize
like Sanskrit, cursive or
capturing relevant older forms of
features for Sanskrit
recognition. script.
[4] DCNN Applied to Requires
i. Image Pre-processing:
complex images significant
like DFB chips, computational
with high resources, The first step involves pre-processing the input image,
recognition may not which includes resizing the image to a consistent size,
accuracy. perform well converting the image to gray scale, and applying
on low-
techniques like thresholding or binarization to enhance
quality
images. the text and remove any noise. This improves the overall
[5] DNN Capable of Difficulties in image quality, making the character recognition process
recognizing both noisy more accurate.
characters and environments
faces, with and
hierarchical challenging
feature learning. lighting ii. Segmentation:
conditions.
[6] (DNN), Effective for real- May not The image is then segmented into smaller regions
LSTM, time license plate perform well containing individual characters or words. This step is
Tesseract detection, robust with heavily critical for isolating each character to be processed by the
to noisy and occluded or
neural network, ensuring that the CNN focuses only on
distorted images. distorted
license plates. the relevant parts of the image.
[7] CNN Improves OCR for Performance
banking sector can degrade iii. Feature Extraction Using CNN
documents, with poor
handles image quality
handwritten and or stylized A Convolutional Neural Network (CNN) is employed to
printed text handwriting. extract hierarchical features from the image. The CNN
efficiently. consists of multiple layers, including convolutional
layers, pooling layers, and fully connected layers, which
work together to automatically learn relevant features
from the raw image data. The network captures spatial
III. USED METHODOLOGY hierarchies and patterns in the text.

The proposed method in this paper iv. Training the CNN Model:
leverages CNNs to enhance optical character
recognition (OCR) performance by The CNN is trained on a labeled dataset of characters or
automatically learning spatial hierarchies from words. During training, the network adjusts its weights to
image data. The process begins with minimize the error between its predictions and the actual
segmenting the image into lines, words, and labels. The model is trained using a back propagation
characters, followed by CNN-based character algorithm and optimized using gradient descent.
recognition. The approach outperforms
traditional methods like SVMs and ANNs in v. Character Recognition:
terms of accuracy and robustness. Through
network architecture optimization and Once the CNN has been trained, it is used to recognize
techniques like dropout and data augmentation, characters from new, unseen images. The network
the method demonstrates adaptability across outputs the predicted character or text based on the
various handwriting styles and fonts, showing learned features. The recognition results are then
compared against the ground truth to evaluate the signs, billboards, and product packaging. This
accuracy. application is critical for technologies like
augmented reality (AR) and autonomous
vi. Post-processing: vehicles, as it enables real-time text extraction
for navigation, content indexing, and more.
After character recognition, additional post-processing Tools for OCR Using CNNs:
techniques, such as spell-checking or contextual analysis, 1. Deep Learning Frameworks:
may be applied to refine the output, especially in cases Several deep learning platforms and libraries provide the
where the network might have made errors due to noisy necessary tools to build and deploy CNN-based OCR
data or complex fonts. models. Some of the most popular ones include:
o TensorFlow
o PyTorch
vii. Evaluation and Performance Metrics:
o Keras
2. OCR Frameworks:
The performance of the proposed OCR system is
evaluated using standard metrics such as accuracy,
o Tesseract OCR
precision, recall, and F1-score. The authors likely o EasyOCR
compare the CNN-based approach with traditional o OCRopus
machine learning methods like Support Vector Machines 3. Image Processing Libraries:
(SVMs) to demonstrate improvements in accuracy and o OpenCV
robustness. o Pillow
4. Pre-trained Models and Transfer Learning
IV. APPLICATION AND TOOLS
V. CONCLUSION AND FUTURE
OCR using CNNs has found wide- SCOPE
ranging applications across various industries,
automating the extraction of text from images In this paper, we have explored the
and documents. One key application is application of CNNs for OCR, a technology
document digitization, where CNN-based OCR that converts images of handwritten,
systems convert printed or handwritten text typewritten, or printed text into machine-
from books, articles, and historical records into readable formats. The OCR process involves
editable, searchable formats. This is especially several key stages, including image scanning,
useful in sectors like publishing, legal, and segmentation into lines, words, and characters,
education, where large volumes of physical and ultimately, character recognition. By
documents need to be preserved and accessed leveraging CNNs, which excel at learning
electronically. Another major application is spatial hierarchies and detecting complex
handwritten text recognition, which is more patterns in image data, we have demonstrated
challenging than printed text due to variability that this deep learning approach significantly
in writing styles and inconsistencies. CNNs outperforms traditional machine learning
have demonstrated exceptional performance in algorithms such as SVMs and ANNs in terms
recognizing handwritten forms, letters, and of both accuracy and robustness. Our
notes, which is beneficial in industries such as experiments highlight the potential of CNN-
insurance, healthcare, and government based OCR systems to improve character
documentation. recognition, enabling more efficient text
In addition to these, CNN-based extraction from images. This approach not only
OCR is widely used in license plate recognition enhances the accuracy of OCR applications but
(LPR) for vehicle identification in systems like also paves the way for their use in real-time,
parking management and toll collection. large-scale implementations, such as digitizing
Banking and finance also leverage CNN- handwritten documents, license plate
powered OCR systems to automate the recognition, and automated data entry systems.
processing of checks, invoices, and receipts, Future scopes are to Improving
increasing accuracy and efficiency. In the recognition accuracy across diverse and
healthcare sector, CNNs help digitize complex scripts, including cursive and
handwritten prescriptions, medical records, and multilingual texts. With advancements in
patient charts, improving data management and transfer learning and pre-trained models, CNN-
access for medical professionals. Finally, based OCR systems can be fine-tuned for
CNN-based OCR is increasingly used for text specific domains such as medical, legal, or
recognition in natural images, such as street historical document analysis. Additionally,
integrating CNNs with other deep learning Volume 2, pp. 589-596. Springer Singapore,
techniques, such as Recurrent Neural Networks 2019.
(RNNs) or Transformer models, may enhance 9. Mishra, Piyush, Pratik Pai, Mihir Patel, and
performance in text recognition from noisy, Reena Sonkusare. "Extraction of information
distorted, or low-resolution images. from handwriting using optical character
REFERENCES recognition and neural networks." In 2020 4th
International Conference on Electronics,
1. Seetharaman, R., Jyotsna Rajaraman, Judith Communication and Aerospace Technology
Alex, and Shreya Nigam. "Optical Character (ICECA), pp. 1328-1333. IEEE, 2020.
Recognition using Convolutional Neural 10. Gayathri, S., and R. S. Mohana. "Optical
Network." In Advances in Mechanical and Character Recognition in Banking Sectors
Industrial Engineering, pp. 251-256. CRC Using Convolutional Neural Network."
Press, 2022. In 2019 Third International conference on I-
2. Wei, Tan Chiang, U. U. Sheikh, and Ab Al- SMAC (IoT in Social, Mobile, Analytics and
Hadi Ab Rahman. "Improved optical character Cloud)(I-SMAC), pp. 753-756. IEEE, 2019.
recognition with deep neural network." In 2018
IEEE 14th international colloquium on signal
processing & its applications (CSPA), pp. 245-
249. IEEE, 2018.
3. Avadesh, Meduri, and Navneet Goyal. "Optical
character recognition for Sanskrit using
convolution neural networks." In 2018 13th
IAPR international workshop on document
analysis systems (DAS), pp. 447-452. IEEE,
2018.
4. Wang, Xudong, Yebin Li, Juanxiu Liu, Jing
Zhang, Xiaohui Du, Lin Liu, and Yong Liu.
"Intelligent micron optical character
recognition of DFB chip using deep
convolutional neural network." IEEE
Transactions on Instrumentation and
Measurement 71 (2022): 1-9.
5. Younis, Khaled S., and Abdullah A. Alkhateeb.
"A new implementation of deep neural
networks for optical character recognition and
face recognition." Proceedings of the new
trends in information technology (2017): 157-
162.
6. Singh, Jaskirat, and Bharat Bhushan. "Real
time Indian license plate detection using deep
neural networks and optical character
recognition using LSTM tesseract." In 2019
international conference on computing,
communication, and intelligent systems
(ICCCIS), pp. 347-352. IEEE, 2019.
7. Gayathri, S., and R. S. Mohana. "Optical
Character Recognition in Banking Sectors
Using Convolutional Neural Network."
In 2019 Third International conference on I-
SMAC (IoT in Social, Mobile, Analytics and
Cloud)(I-SMAC), pp. 753-756. IEEE, 2019.
8. Srivastava, Shriansh, J. Priyadarshini, Sachin
Gopal, Sanchay Gupta, and Har Shobhit Dayal.
"Optical character recognition on bank cheques
using 2D convolution neural network."
In Applications of Artificial Intelligence
Techniques in Engineering: SIGMA 2018,

You might also like