Presentation 4

The document outlines the development of an Optical Text-to-Speech (TTS) converter that utilizes Optical Character Recognition (OCR) to transform printed or handwritten text into audible speech, aimed at assisting individuals with reading disabilities and language barriers. The proposed system integrates a Raspberry Pi, camera, and speaker, allowing users to capture images and convert the extracted text into speech with a user-friendly interface. The project demonstrates high accuracy in text recognition and translation across multiple languages, while also suggesting future enhancements for improved functionality and accessibility.

Uploaded by

chethanm9945

B.M.S. COLLEGE OF ENGINEERING
Bull Temple Road, Basavanagudi, Bangalore - 560 019

OPTICAL TEXT TO
SPEECH CONVERTER
USING OCR AND TTS

Guided By :-
DR. LALAITHA.S
INTRODUCTION

• The optical text-to-speech (TTS) converter is an assistive technology that lets individuals access textual information in images and translate it into different languages.

• Language, whether spoken or written, is the oldest means of communication between human beings.

• The system therefore digitizes images of text, extracts and interprets the text, and then performs text-to-speech (TTS) synthesis.

• The information is read aloud for the benefit and ease of the user. Text extraction and TTS can be used together to help people with reading disabilities.

• This project presents a low-cost technique for hearing the contents of a text image without reading them.
LITERATURE SURVEY

[Table not preserved in this copy. Columns: Sl. No., Title, IEE, Limitation]
PROBLEM DEFINITION

• The problem is to translate text for individuals who face a language barrier.

• Travelers and tourists who visit other states for vacation or business can also face language barriers.

• Some individuals have reading disabilities.

• The lack of accessible, efficient, and natural-sounding TTS solutions limits the
usability of systems for visually impaired individuals, language learners, and smart
device users.
PROPOSED SOLUTION

• An Optical Image Text-to-Speech (OITS) translator is a system that automatically converts printed or handwritten text into speech.

• This system integrates Optical Character Recognition (OCR) to extract text from images (such as photos of documents or books) and Text-to-Speech (TTS) synthesis to vocalize the extracted text.

• Text-to-Speech Conversion: The extracted text is converted into speech using the Festival Text-to-Speech (TTS) engine. The system can read the text aloud in real time with a clear and natural voice.

• User Interaction: The system is designed for ease of use, requiring the user to
simply press a single button to initiate the entire process. The button triggers
the image capture, OCR, and text-to-speech conversion automatically.
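
The single-button flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the slides name Festival as the TTS engine but do not name the OCR engine, so the Tesseract command line is assumed here. The `capture_image` callable and all function names are hypothetical.

```python
import subprocess
from typing import Callable, List


def ocr_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build a Tesseract CLI call that prints recognized text to stdout.

    Assumption: Tesseract is the OCR engine (the slides do not say).
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def tts_command() -> List[str]:
    """Build a Festival call that speaks the text it reads from stdin."""
    return ["festival", "--tts"]


def run_ocr(image_path: str, lang: str = "eng") -> str:
    """Run OCR on an image file and return the extracted text."""
    result = subprocess.run(ocr_command(image_path, lang),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def speak(text: str) -> None:
    """Send text to the TTS engine and wait until it has been spoken."""
    subprocess.run(tts_command(), input=text, text=True, check=True)


def on_button_press(capture_image: Callable[[], str],
                    ocr: Callable[[str], str] = run_ocr,
                    tts: Callable[[str], None] = speak) -> str:
    """Capture -> OCR -> TTS. Returns the recognized text."""
    image_path = capture_image()   # hypothetical: saves a photo, returns its path
    text = ocr(image_path)
    if text:                       # stay silent when nothing was recognized
        tts(text)
    return text
```

Passing the stages in as callables keeps the sequence testable on a desktop machine without a camera or speaker attached; on the device, `capture_image` would wrap whatever camera tool the build uses.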
HARDWARE & SOFTWARE REQUIREMENTS AND ESTIMATED COSTS

HARDWARE

• Raspberry Pi 3: ₹3645
• Speaker: ₹249
• Camera (OV5647): ₹200
• Cables and connectors: ₹100
• Adapter: ₹199
• PVC box: ₹350
• Battery: ₹300
• Switch: ₹50
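
Summing the listed prices gives the estimated hardware cost. The short sketch below only totals the figures from the list above; the dictionary name is illustrative.

```python
# Prices in rupees, copied from the hardware list above.
hardware_costs = {
    "Raspberry Pi 3": 3645,
    "Speaker": 249,
    "Camera (OV5647)": 200,
    "Cables and connectors": 100,
    "Adapter": 199,
    "PVC box": 350,
    "Battery": 300,
    "Switch": 50,
}

total = sum(hardware_costs.values())
print(f"Estimated hardware cost: ₹{total}")  # prints: Estimated hardware cost: ₹5093
```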

SOFTWARE

• Python 3 interpreter
• Programming language: Python
METHODOLOGY & IMPLEMENTATION
Block Diagram
1. Raspberry Pi 3B+
A single-board computer used as the core processing unit.
Features:
• Quad-core 64-bit processor.
• 1 GB RAM.
• Multiple GPIO pins for interfacing with other devices.
• 3.5mm audio jack for sound output.

2. Push Button
• Acts as an input device to trigger image capture or processing.
• Connected to the GPIO pins of the Raspberry Pi.
• Enables user interaction with the system.
3. Webcam
• Captures images as input for the project.
• Connected to the Raspberry Pi via a USB port.
• The images are processed by software running on the Raspberry Pi.

4. Speakers
• Output the speech generated from the image processing result.
• Connected to the Raspberry Pi through the 3.5mm audio jack or USB (if using a USB speaker).
5. Power Bank
• Serves as the power source for the Raspberry Pi and its peripherals.
• Connected via the Raspberry Pi's power supply port (micro-USB or USB-C
depending on the model).

Connections in the Diagram:


• Power Supply: Power bank to Raspberry Pi for continuous operation.
• USB: For connecting the webcam to capture images.
• 3.5mm Audio Jack: For connecting the speakers to output audio.
• GPIO: Push button connected for user input.
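
The GPIO push-button connection could be read in Python roughly as follows. This is a sketch under assumptions: the slides do not name a GPIO library or a pin, so the `gpiozero` library and BCM pin 17 are assumed, and the function names are hypothetical.

```python
from typing import Callable, Optional

BUTTON_PIN = 17  # assumed BCM pin number; the slides do not state which pin is used


def make_button(pin: int = BUTTON_PIN):
    """Create a debounced push button (needs gpiozero on a Raspberry Pi)."""
    from gpiozero import Button  # imported lazily so this file loads off-device
    # pull_up=True: wire the button between the GPIO pin and ground.
    return Button(pin, pull_up=True, bounce_time=0.05)


def serve_presses(button, handler: Callable[[], None],
                  max_presses: Optional[int] = None) -> int:
    """Block on each press and run handler; stop after max_presses if given."""
    handled = 0
    while max_presses is None or handled < max_presses:
        button.wait_for_press()
        handler()
        button.wait_for_release()  # avoid re-triggering while the button is held
        handled += 1
    return handled
```

On the device, `serve_presses(make_button(), handler)` would run `handler` once per press, where `handler` is whatever starts the image capture step.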
FLOW CHART
RESULTS & DISCUSSIONS
Experimental Setup: The system was tested using text in Kannada, Hindi, and English under various conditions to evaluate the effect of:
1. Text clarity and size.
2. Background complexity.
3. Lighting conditions.

Test Environment:
• Lighting: Well-lit indoor settings, with occasional tests in low-light conditions.
• Text Source: Printed documents and posters in the supported languages.
• Camera: USB webcam (720p resolution).

Parameters Evaluated:
• OCR Accuracy: Ability to correctly extract text.
• Language Detection: Ability to identify the dominant language.
• TTS Clarity: Naturalness and intelligibility of spoken output.
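
The slides do not state how OCR accuracy was computed. One common choice is character-level accuracy based on edit distance; the sketch below shows that assumed metric, with hypothetical function names, and is not necessarily the measure behind the reported figures.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def char_accuracy(reference: str, recognized: str) -> float:
    """1 - (edit distance / reference length), clamped to [0, 1]."""
    if not reference:
        return 1.0 if not recognized else 0.0
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))
```

For example, `char_accuracy("OPTICAL TEXT", "OPTICAL TEXI")` counts one wrong character out of twelve, giving roughly 0.92.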
RESULT ANALYSIS
Thresholds and Text Characteristics:

Text Size: Text between 12pt and 72pt (font sizes commonly used in printed documents) was extracted with 95% accuracy.

Small Text Issues: Fonts smaller than 10pt resulted in OCR inaccuracies, with recognition rates dropping to between 60% and 75%. Decorative or cursive fonts saw accuracy drop to 70% due to OCR limitations.

Language Detection: The dominant language was correctly identified in 93% of cases when single-language text was used. For mixed-language documents, results were inconsistent, with ~75% accuracy in detecting the dominant language.

Translation: Kannada and Hindi translations to English showed ~90% accuracy for
simple sentences. Complex sentences with idiomatic expressions or ambiguous
contexts occasionally resulted in incorrect translations.
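
The slides do not describe how the dominant language is detected. For the three tested languages, one simple approach is to count characters by Unicode block (Kannada U+0C80 to U+0CFF, Devanagari U+0900 to U+097F, basic Latin letters). The sketch below illustrates that assumed approach; the function names and labels are hypothetical.

```python
from collections import Counter
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Map one character to a language label by Unicode block, or None."""
    code = ord(ch)
    if 0x0C80 <= code <= 0x0CFF:
        return "kannada"
    if 0x0900 <= code <= 0x097F:
        return "hindi"      # Devanagari block; also used by other languages
    if ch.isascii() and ch.isalpha():
        return "english"    # basic Latin letters; also used by other languages
    return None             # digits, punctuation, spaces, other scripts


def dominant_language(text: str) -> Optional[str]:
    """Return the label with the most characters, or None if none matched."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A character count like this picks whichever script has the most characters, so it is sensitive to the proportions in a mixed-language page, which is the case the results above report as inconsistent.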
FUTURE TRENDS

• Additional Language Support: Extend capabilities to support more regional and global languages.

• Improved OCR: Integrate AI-powered OCR for better recognition of cursive or decorative text and handwritten scripts.

• Compact Design: Develop an all-in-one device with integrated camera, speaker, and tactile feedback for greater portability.

• Cloud Integration: Leverage cloud-based OCR and translation for faster processing and real-time updates.

• Accessibility Features: Add voice commands, braille displays, or haptic feedback for broader usability.
CONCLUSION

The Optical Image-to-Speech Converter Using OCR and TTS effectively transforms printed text into audible speech, providing a valuable assistive tool for visually impaired individuals.

The system demonstrated:
• Multilingual Support: Accurate recognition and translation of Kannada, Hindi, and English text.
• High Performance: Reliable OCR and TTS outputs, with an average accuracy of ~90% for clear and well-lit images.
• User-Friendly Design: Interactive GPIO buttons and LEDs ensure ease of use and portability.

By combining image preprocessing, language translation, and text-to-speech synthesis, the project successfully created a portable, cost-effective solution for bridging accessibility gaps in textual content.
THANK YOU
