
MATHEMATICS

EXPERIENTIAL LEARNING

Real-Time Text-to-Speech Glasses

TEAM INTRODUCTION

Shreya Chakote 1RV22CS189
Vanshika Khandelwal 1RV22CS224
Hanisha R 1RV22CV031
CONTENTS
1. INTRODUCTION

2. LITERATURE SURVEY

3. METHODOLOGY

4. IMPLEMENTATION

5. CONCLUSION
INTRODUCTION

In our mission to empower individuals with visual impairments, we introduce a smart text-to-speech model. With over 285 million people worldwide facing visual challenges, including 45 million who are blind, the need for accessible printed text is urgent. Our solution utilizes Optical Character Recognition (OCR), a technology that converts images of printed text into machine-readable characters, which are then rendered as spoken words. This process relies heavily on mathematics, from simplifying and cleaning up images to recognizing individual characters. By harnessing these mathematical principles, we aim to provide seamless access to printed content, enhancing independence and inclusion for individuals with visual impairments.
LITERATURE SURVEY
AUTHOR: Nayana G H, Sowmya N, Yaduguri Sravani, Beulah James
PAPER TITLE: Smart Reading Glasses: Conversion of Image Text into Speech
PUBLICATION: International Journal of Creative Research Thoughts (IJCRT)
SUMMARY: This paper presents a text-to-speech conversion system implemented on a Raspberry Pi board, offering a compact and portable solution suitable for assisting visually impaired individuals. The system takes input from an image captured by the Pi camera connected to the CSI port of the Raspberry Pi. The captured image undergoes preprocessing to eliminate noise and is then converted into a binary image. Text extraction techniques are applied to the binary image, followed by the generation of clear and accurate audio output. The system's main advantage lies in its compactness and portability, making it an efficient tool for supporting visually impaired individuals.
LITERATURE SURVEY
AUTHOR: Jinsoo Choo, Dr. Mukhriddin Mukhiddinov
PAPER TITLE: Smart Glass System Using Deep Learning for the Blind and Visually Impaired
PUBLICATION: Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Korea (MDPI)
SUMMARY: This paper introduces a smart glass system tailored for individuals with blindness or visual impairments (BVI), leveraging computer vision and deep learning techniques. The system comprises object detection, salient object extraction, and text recognition models, all aimed at enhancing the user's perception of their surroundings. Notably, the system operates fully automatically and is hosted on an artificial intelligence server to ensure real-time performance and overcome the energy constraints associated with embedded systems. By extending traditional smart glass systems with deep learning models and incorporating salient object extraction and text recognition functionalities, the proposed system enables users to navigate and interact with their environment even in low-light conditions.
LITERATURE SURVEY
AUTHOR: Yiyi Liu, Yuxin Wang, Hongjian Shi
PAPER TITLE: A Convolutional Recurrent Neural-Network-Based Machine Learning for Scene Text Recognition Application
PUBLICATION: Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Beijing Normal University–Hong Kong Baptist University United International College (MDPI)
SUMMARY: This paper introduces a novel composite network model structure that enhances scene text recognition by combining CRNN with other techniques such as text direction classification, DBNet, and the Retinex algorithm. The model effectively segments and recognizes text in various backgrounds and orientations by applying affine transformation, text direction classification, and clarity evaluation. Experimental results demonstrate that the proposed model surpasses the limitations of CRNN in complex and multi-oriented text scenes, achieving higher accuracy and a broader application scope. Future research will focus on compressing the model using knowledge distillation, adding more evaluation methods, and deploying the model on mobile platforms.
METHODOLOGY
IMPLEMENTATION
The working of the proposed system for text recognition and speech synthesis involves several steps (a minimal code sketch of the whole loop follows the list):

● Image Capture: The system captures images using a Raspberry Pi camera connected to the CSI
port. The camera is moved over printed text to capture clear and high-quality images.
● Pre-processing: The captured images undergo pre-processing to enhance clarity and remove
noise. This includes thresholding, where grayscale images are converted to binary images based
on a specified threshold, and blurring to eliminate noise using techniques like median filtering.
● Optical Character Recognition (OCR): The pre-processed images are then passed through an
OCR engine, in this case, Tesseract OCR. Tesseract analyzes the images, detects individual
characters, and segments them into words. It employs x-height normalization to differentiate
between capital and small text.
● Text-to-Speech Conversion: The recognized text is then converted into speech using eSpeak, an
open-source software for speech synthesis. This process involves text analysis, phonetic
analysis, prosodic analysis, and speech production. The synthesized speech output is then ready
for playback.
● Speech Output: Finally, the synthesized speech output is played through the audio jack of the
Raspberry Pi board, allowing users to hear the converted text through headphones.
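Taken together, these steps form a short capture-preprocess-recognize-speak loop. Below is a minimal end-to-end sketch, assuming the opencv-python and pytesseract packages and the espeak command-line tool are installed; the captured frame is read from disk for simplicity, and the file name and threshold value are illustrative.

```python
import subprocess

import cv2
import pytesseract

# Load a frame captured by the Pi camera (read from disk for simplicity).
image = cv2.imread("captured_page.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Median filtering removes salt-and-pepper noise before binarization.
denoised = cv2.medianBlur(gray, 3)

# Global thresholding converts the grayscale image into a binary image.
_, binary = cv2.threshold(denoised, 127, 255, cv2.THRESH_BINARY)

# Tesseract OCR extracts the text from the binary image.
text = pytesseract.image_to_string(binary)
print(text)

# eSpeak synthesizes the recognized text; on the Pi the audio plays
# through the board's audio jack.
if text.strip():
    subprocess.run(["espeak", text])
```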
IMPLEMENTATION
OCR Module:

Image Processing: OCR often begins with image preprocessing, which includes
techniques like thresholding, blurring, and edge detection. These techniques involve
mathematical operations such as convolution, averaging, and gradient calculation.
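To make these operations concrete, the sketch below applies an averaging convolution, Otsu thresholding, and Sobel gradient calculation using OpenCV. It assumes the opencv-python and numpy packages; the input file name is a placeholder.

```python
import cv2
import numpy as np

gray = cv2.imread("captured_page.jpg", cv2.IMREAD_GRAYSCALE)

# Averaging: convolution with a normalized 3x3 box kernel blurs out noise.
box_kernel = np.ones((3, 3), np.float32) / 9.0
blurred = cv2.filter2D(gray, -1, box_kernel)

# Thresholding: Otsu's method chooses the binarization threshold automatically.
_, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Gradient calculation: Sobel operators approximate the horizontal and vertical
# intensity derivatives; their magnitude highlights character edges.
gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
edges = np.sqrt(gx ** 2 + gy ** 2)
```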
IMPLEMENTATION
OCR Module:

Feature Extraction: In OCR, characters need to be distinguished from the background and other
objects in the image. Mathematical algorithms, such as histogram analysis, connected component
analysis, and contour detection, are used to extract relevant features of characters, such as shape,
size, and orientation.
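As a small illustration, the sketch below uses connected-component analysis and contour detection to recover per-character features such as position, size, and aspect ratio. It assumes the opencv-python package and a binary input image with white characters on a black background; the file name is illustrative.

```python
import cv2

binary = cv2.imread("binary_page.png", cv2.IMREAD_GRAYSCALE)

# Connected-component analysis labels each blob (candidate character);
# stats holds its bounding box and pixel area.
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
for x, y, w, h, area in stats[1:]:  # label 0 is the background
    aspect_ratio = w / h            # a simple shape/orientation feature
    print(f"component at ({x},{y}), size {w}x{h}, area {area}, ratio {aspect_ratio:.2f}")

# Contour detection traces each character's outline, another shape descriptor.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
```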
IMPLEMENTATION
OCR Module:

Pattern Recognition: Once features are extracted, OCR systems use mathematical pattern recognition algorithms to match these features with known patterns of characters. These algorithms can include template matching, statistical pattern recognition methods like Hidden Markov Models (HMMs), or machine learning techniques such as Support Vector Machines (SVMs) or neural networks.
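Of these approaches, template matching is the simplest to show concretely. The sketch below correlates one extracted glyph against a small set of stored character templates; the template directory, character set, and file names are hypothetical, not part of the actual system.

```python
import cv2

glyph = cv2.imread("glyph.png", cv2.IMREAD_GRAYSCALE)

# Hypothetical template set: one reference image per known character.
templates = {ch: cv2.imread(f"templates/{ch}.png", cv2.IMREAD_GRAYSCALE)
             for ch in "ABC"}

best_char, best_score = None, -1.0
for ch, tmpl in templates.items():
    # Resize the glyph to the template's size so the scores are comparable.
    resized = cv2.resize(glyph, (tmpl.shape[1], tmpl.shape[0]))
    # Normalized cross-correlation: a score of 1.0 is a perfect match.
    score = cv2.matchTemplate(resized, tmpl, cv2.TM_CCOEFF_NORMED)[0][0]
    if score > best_score:
        best_char, best_score = ch, score

print(f"recognized '{best_char}' with correlation {best_score:.2f}")
```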
IMPLEMENTATION
OCR Module:

Classification and Decision Making: In OCR, classification algorithms are used to determine the identity of characters based on their extracted features. This involves mathematical calculations to compare feature vectors and make decisions about which characters best match the observed patterns.
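A minimal way to illustrate this decision step is a nearest-neighbour comparison of feature vectors, sketched below. The reference vectors and the observed features are made-up example values (e.g., aspect ratio, pixel density, hole count), not measurements from the actual system.

```python
import numpy as np

# Hypothetical reference feature vectors for three known characters.
references = {
    "O": np.array([1.00, 0.55, 1.0]),
    "I": np.array([0.25, 0.90, 0.0]),
    "B": np.array([0.70, 0.65, 2.0]),
}

observed = np.array([0.95, 0.57, 1.0])  # features of an unknown glyph

# Euclidean distance measures how far the observation lies from each pattern;
# the closest reference wins the classification decision.
best = min(references, key=lambda ch: np.linalg.norm(references[ch] - observed))
print(f"classified as '{best}'")  # -> 'O'
```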
IMPLEMENTATION
OCR Module:

Error Correction:
OCR systems often incorporate mathematical algorithms for error correction, which may involve techniques like error detection and correction codes, probabilistic models, or contextual analysis to improve accuracy.

Hamming Codes: These codes add parity bits to the data to detect and correct single-bit errors. The parity bits are calculated based on specific bit positions in the data.

Reed-Solomon Codes: These codes are used for correcting multiple errors in data, commonly employed in barcode and QR code scanning applications. They work by adding redundancy to the data, enabling the correction of errors even in the presence of a significant number of corrupted bits.
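To make the Hamming idea concrete, the sketch below implements the classic Hamming(7,4) code in plain Python: three parity bits protect four data bits, and recomputing the parity checks on the receiving side yields the position of any single flipped bit.

```python
def hamming74_encode(d):
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over bit positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over bit positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over bit positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Recompute the parity checks; their binary value locates the bad bit."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 0 means no single-bit error
    if error_pos:
        c[error_pos - 1] ^= 1  # flip the corrupted bit back
    return c

codeword = hamming74_encode([1, 0, 1, 1])
codeword[2] ^= 1                    # corrupt one bit "in transit"
print(hamming74_correct(codeword))  # the original codeword is restored
```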
IMPLEMENTATION
[Figure: the captured image and the Raspberry Pi camera setup]
IMPLEMENTATION
[Figure: the recognized text and the resulting audio output]
CONCLUSION
In conclusion, the project presents an innovative solution leveraging
Raspberry Pi technology for real-time text-to-speech conversion, aimed at
enhancing accessibility for visually impaired individuals. Through the
integration of computer vision, deep learning, and text synthesis techniques,
the system offers a compact and portable means to process text input and
generate clear and accurate speech output. Its efficient processing
capabilities and real-time performance ensure immediate access to
information from printed text, thus empowering users with greater
independence and inclusion in daily activities. Furthermore, the project's
potential for future development and expansion, such as knowledge
distillation and mobile deployment, signifies its contribution to advancing
assistive technology and improving the quality of life for individuals with
visual impairments.
THANK YOU!
