Optical Character Recognition Using Convolutional Neural Network
Abstract: OCR is a technology used to convert images of handwritten, typewritten, or printed text into a machine-
readable format. The primary objectives of OCR include enabling text editing, efficient indexing and searching, and
reducing storage requirements. The process typically begins with scanning the image, followed by segmentation to
divide the text into lines, words, and individual characters. The characters are then analyzed and translated into
corresponding character codes, such as ASCII. In this paper, we propose an OCR approach that utilizes a
segmentation algorithm to divide the image into lines, words, and characters. For character recognition, we employ
CNNs, which are known for their ability to capture spatial hierarchies in image data. Our experiments demonstrate
that the CNN-based approach outperforms traditional machine learning techniques, such as SVMs and ANNs, in
terms of accuracy and robustness. The results indicate the potential of CNNs in achieving superior character
recognition performance, thereby enhancing the efficiency and accuracy of OCR systems. Furthermore, we explore
various network architectures and hyperparameter tuning to optimize performance, including the use of dropout
layers and data augmentation techniques to mitigate overfitting. The proposed model demonstrates its adaptability
across various handwriting styles and fonts, showing promise for real-world applications in document digitization
and automated data extraction. Overall, our approach highlights the transformative potential of deep learning in
advancing OCR technology for practical use cases.
Keywords: Optical Character Recognition, Artificial Neural Network, Support Vector Machines, Convolutional Neural
Network, Segmentation, Image Characters
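The segmentation stage outlined in the abstract (dividing the page into lines, then into words and characters) can be sketched with a simple projection-profile heuristic. This is an illustrative assumption rather than the paper's actual segmentation algorithm; the `segment_rows` helper and the toy page below are hypothetical.

```python
import numpy as np

def segment_rows(binary, axis):
    """Split a binarized image (1 = ink) along one axis using its
    projection profile: any all-zero run of rows/columns is a gap."""
    profile = binary.sum(axis=axis)
    segments, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i                      # entering an inked region
        elif v == 0 and start is not None:
            segments.append((start, i))    # leaving an inked region
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments

# Toy "page": two text lines, each containing two 1-pixel "characters".
page = np.zeros((7, 7), dtype=int)
page[1, 1] = page[1, 3] = 1   # line 1
page[4, 1] = page[4, 3] = 1   # line 2

lines = segment_rows(page, axis=1)             # row spans of the text lines
print(lines)                                   # [(1, 2), (4, 5)]
for r0, r1 in lines:
    chars = segment_rows(page[r0:r1], axis=0)  # column spans within a line
    print(chars)                               # [(1, 2), (3, 4)]
```

Summing over columns gives a row profile that separates lines; the same helper applied within a line, summing over rows, separates characters.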
The proposed method in this paper leverages CNNs to enhance optical character recognition (OCR) performance by automatically learning spatial hierarchies from image data. The process begins with segmenting the image into lines, words, and characters, followed by CNN-based character recognition. The approach outperforms traditional methods such as SVMs and ANNs in terms of accuracy and robustness. Through network architecture optimization and techniques such as dropout and data augmentation, the method demonstrates adaptability across various handwriting styles and fonts, showing promise for real-world applications.

iv. Training the CNN Model:

The CNN is trained on a labeled dataset of characters or words. During training, the network adjusts its weights to minimize the error between its predictions and the actual labels. The model is trained using the backpropagation algorithm and optimized with gradient descent.

v. Character Recognition:

Once the CNN has been trained, it is used to recognize characters from new, unseen images. The network outputs the predicted character or text based on the learned features. The recognition results are then compared against the ground truth to evaluate accuracy.
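The training loop of step iv (weights adjusted by backpropagation and gradient descent) and the recognition step v can be sketched minimally. To stay self-contained, the sketch uses a single softmax layer in NumPy rather than a full CNN; the toy 2x2 "images", learning rate, and iteration count are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: four flattened 2x2 "character images", two classes.
X = np.array([[1., 0., 1., 0.],
              [1., 0., 0., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 0.]])
y = np.array([0, 0, 1, 1])
Y = np.eye(2)[y]                       # one-hot labels

W = rng.normal(0.0, 0.1, size=(4, 2))  # weights the training loop will learn
b = np.zeros(2)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lr = 0.5
for _ in range(200):               # gradient-descent training loop
    P = softmax(X @ W + b)         # forward pass: predicted class probabilities
    grad_z = (P - Y) / len(X)      # gradient of cross-entropy w.r.t. logits
    W -= lr * (X.T @ grad_z)       # backpropagated weight update
    b -= lr * grad_z.sum(axis=0)

pred = (X @ W + b).argmax(axis=1)  # recognition: pick the most likely class
print(pred)                        # should match y: [0 0 1 1]
```

A real CNN adds convolution and pooling layers before the classifier, but the update rule (propagate the prediction error backward, then step the weights against the gradient) is the same.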
vi. Post-processing:

After character recognition, additional post-processing techniques, such as spell-checking or contextual analysis, may be applied to refine the output, especially in cases where the network might have made errors due to noisy data or complex fonts.
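One common form of the post-processing described above is dictionary-based spell correction. The sketch below uses Python's standard `difflib` to snap noisy OCR output to the closest lexicon entry; the tiny lexicon and the 0.7 similarity cutoff are illustrative assumptions, not part of the paper's method.

```python
import difflib

# Tiny lexicon standing in for a real dictionary (illustrative only).
LEXICON = ["optical", "character", "recognition", "network", "neural"]

def correct_word(word, lexicon=LEXICON, cutoff=0.7):
    """Replace an OCR output word with its closest dictionary entry,
    if one is similar enough; otherwise keep the raw OCR result."""
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# 'rec0gnition' (zero misread for 'o') is a typical OCR confusion error.
print(correct_word("rec0gnition"))   # recognition
print(correct_word("xyz"))           # xyz (no close match; left unchanged)
```

The cutoff trades correction coverage against the risk of overwriting words that are simply missing from the lexicon.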
vii. Evaluation and Performance Metrics:

The performance of the proposed OCR system is evaluated using standard metrics such as accuracy, precision, recall, and F1-score. We compare the CNN-based approach with traditional machine learning methods such as Support Vector Machines (SVMs) to demonstrate improvements in accuracy and robustness.
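The metrics named above can be computed directly from per-character predictions. The sketch below derives accuracy plus macro-averaged precision and recall, taking F1 as the harmonic mean of the macro precision and recall; the toy ground-truth and prediction strings are hypothetical, not the paper's experimental data.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(labels)
    recall = sum(recalls) / len(labels)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, precision, recall, f1

# Hypothetical per-character results: one 'A' misread as 'B'.
truth = list("ABABAB")
preds = list("ABABBB")
acc, prec, rec, f1 = evaluate(truth, preds)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# → 0.833 0.875 0.833 0.854
```

Macro averaging weights each character class equally, which matters for OCR because rare characters contribute little to plain accuracy.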
IV. APPLICATION AND TOOLS

OCR using CNNs has found wide-ranging applications across various industries, automating the extraction of text from images and documents. One key application is document digitization, where CNN-based OCR systems convert printed or handwritten text from books, articles, and historical records into editable, searchable formats. This is especially useful in sectors like publishing, legal, and education, where large volumes of physical documents need to be preserved and accessed electronically. Another major application is handwritten text recognition, which is more challenging than printed text due to variability and inconsistency in writing styles. CNNs have demonstrated exceptional performance in recognizing handwritten forms, letters, and notes, which is beneficial in industries such as insurance, healthcare, and government documentation.

In addition to these, CNN-based OCR is widely used in license plate recognition (LPR) for vehicle identification in systems like parking management and toll collection. Banking and finance also leverage CNN-powered OCR systems to automate the processing of checks, invoices, and receipts, increasing accuracy and efficiency. In the healthcare sector, CNNs help digitize handwritten prescriptions, medical records, and patient charts, improving data management and access for medical professionals. Finally, CNN-based OCR is increasingly used for text recognition in natural images, such as street signs, billboards, and product packaging. This application is critical for technologies like augmented reality (AR) and autonomous vehicles, as it enables real-time text extraction for navigation, content indexing, and more.

Tools for OCR Using CNNs:
1. Deep Learning Frameworks: Several deep learning platforms and libraries provide the necessary tools to build and deploy CNN-based OCR models. Some of the most popular ones include:
o TensorFlow
o PyTorch
o Keras
2. OCR Frameworks:
o Tesseract OCR
o EasyOCR
o OCRopus
3. Image Processing Libraries:
o OpenCV
o Pillow
4. Pre-trained Models and Transfer Learning

V. CONCLUSION AND FUTURE SCOPE

In this paper, we have explored the application of CNNs for OCR, a technology that converts images of handwritten, typewritten, or printed text into machine-readable formats. The OCR process involves several key stages, including image scanning, segmentation into lines, words, and characters, and, ultimately, character recognition. By leveraging CNNs, which excel at learning spatial hierarchies and detecting complex patterns in image data, we have demonstrated that this deep learning approach significantly outperforms traditional machine learning algorithms such as SVMs and ANNs in terms of both accuracy and robustness. Our experiments highlight the potential of CNN-based OCR systems to improve character recognition, enabling more efficient text extraction from images. This approach not only enhances the accuracy of OCR applications but also paves the way for their use in real-time, large-scale implementations, such as digitizing handwritten documents, license plate recognition, and automated data entry systems.

Future work includes improving recognition accuracy across diverse and complex scripts, including cursive and multilingual texts. With advancements in transfer learning and pre-trained models, CNN-based OCR systems can be fine-tuned for specific domains such as medical, legal, or historical document analysis. Additionally, integrating CNNs with other deep learning techniques, such as Recurrent Neural Networks (RNNs) or Transformer models, may enhance performance in text recognition from noisy, distorted, or low-resolution images.
REFERENCES

1. Seetharaman, R., Jyotsna Rajaraman, Judith Alex, and Shreya Nigam. "Optical Character Recognition using Convolutional Neural Network." In Advances in Mechanical and Industrial Engineering, pp. 251-256. CRC Press, 2022.
2. Wei, Tan Chiang, U. U. Sheikh, and Ab Al-Hadi Ab Rahman. "Improved optical character recognition with deep neural network." In 2018 IEEE 14th International Colloquium on Signal Processing & Its Applications (CSPA), pp. 245-249. IEEE, 2018.
3. Avadesh, Meduri, and Navneet Goyal. "Optical character recognition for Sanskrit using convolution neural networks." In 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pp. 447-452. IEEE, 2018.
4. Wang, Xudong, Yebin Li, Juanxiu Liu, Jing Zhang, Xiaohui Du, Lin Liu, and Yong Liu. "Intelligent micron optical character recognition of DFB chip using deep convolutional neural network." IEEE Transactions on Instrumentation and Measurement 71 (2022): 1-9.
5. Younis, Khaled S., and Abdullah A. Alkhateeb. "A new implementation of deep neural networks for optical character recognition and face recognition." Proceedings of the New Trends in Information Technology (2017): 157-162.
6. Singh, Jaskirat, and Bharat Bhushan. "Real time Indian license plate detection using deep neural networks and optical character recognition using LSTM tesseract." In 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), pp. 347-352. IEEE, 2019.
7. Gayathri, S., and R. S. Mohana. "Optical Character Recognition in Banking Sectors Using Convolutional Neural Network." In 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 753-756. IEEE, 2019.
8. Srivastava, Shriansh, J. Priyadarshini, Sachin Gopal, Sanchay Gupta, and Har Shobhit Dayal. "Optical character recognition on bank cheques using 2D convolution neural network." In Applications of Artificial Intelligence Techniques in Engineering: SIGMA 2018, Volume 2, pp. 589-596. Springer Singapore, 2019.
9. Mishra, Piyush, Pratik Pai, Mihir Patel, and Reena Sonkusare. "Extraction of information from handwriting using optical character recognition and neural networks." In 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 1328-1333. IEEE, 2020.