Zeichen Journal, ISSN No. 0932-4747, Volume 11, Issue 02, February 2025

DOCUMENT TO VOICE CONVERTER FOR BLIND


Dr. Meril Cyriac, Aani Shaji, Amritha MM, Avani Rajeev, Thara Thilak

Assistant Professor, Department of Electronics and Communication Engineering,
LBS Institute of Technology for Women, Thiruvananthapuram

Abstract

This paper develops an enhanced document-to-voice conversion tool that addresses key limitations of current accessibility technologies. Traditional text-to-speech solutions often lack advanced features such as summarization and translation, making it difficult for users, especially those who are visually impaired or face language barriers, to process information efficiently and to access content in multiple languages. Our tool integrates these essential functionalities: text extraction, intelligent summarization, multi-language translation, and high-quality, natural-sounding speech output. Designed with accessibility and affordability in mind, it allows users to convert lengthy or complex documents into concise audio summaries, significantly reducing the cognitive load and time required to understand content. The translation feature ensures that users can seamlessly access multilingual materials, greatly expanding the scope of accessible information. These advanced features enhance the user experience, making the tool beneficial for diverse users, including students, professionals, and organizations with budget constraints. By combining affordability, accessibility, and usability, the tool empowers a wide range of individuals to interact with written content more effectively and independently. It provides an inclusive, user-centered platform that bridges accessibility gaps and supports equal access to information, creating a more supportive and equitable information landscape. This solution represents a critical step forward in accessibility technology, offering a meaningful, practical tool that allows users to engage with information regardless of visual ability or language proficiency.

1. Introduction

Document-to-voice converters are essential for visually challenged individuals, who otherwise depend on others to access books, articles, and documents. These converters make written content accessible, allowing users to read books, articles, and documents independently.


They also promote equal learning opportunities in educational settings. Users can process information quickly and engage with content dynamically, for example by listening to articles or storybooks while multitasking. Integrating features such as summarization and translation further enhances this functionality by improving efficiency and accessibility, saving time and making information more digestible. Translation also helps non-native speakers understand the content in their own language [1].

2. Problem Statement

Existing document-to-voice converters often lack additional features such as summarization and translation, which makes them less effective and leads to difficulties in processing information. There is therefore a need for a better tool that combines text-to-speech with these features. Such a solution would help visually challenged individuals access articles and books, and would also help non-native speakers understand the content in their own language, providing a more inclusive environment for all users regardless of their visual abilities or language skills.

The goal of this paper is to integrate summarization and translation into a document-to-voice converter so that visually impaired individuals can quickly grasp key information in their preferred language. This approach enhances accessibility and engagement by offering a more efficient listening experience, while compatibility with educational and professional platforms supports wider usability and adoption across diverse contexts. Summarizing long documents into meaningful short paragraphs allows users to understand the key ideas and saves time, and translation allows non-native speakers to understand the content in their own language. The system also supports multitasking by allowing users to listen to content while engaging in other activities. It is particularly helpful for students with visual impairments: by providing tools that promote equal opportunities in educational settings, it helps visually challenged students access course materials more effectively. Moreover, the system enhances both efficiency and accessibility.


4.1 Specific Objectives

The proposed system revolves around integrating several advanced functionalities to develop a
robust and efficient document-to-speech solution. First, the system seeks to implement highly
accurate Optical Character Recognition (OCR) using tools such as docTR or Tesseract-OCR.
These tools are capable of extracting text from a wide range of document types, including
printed and handwritten materials, ensuring the system is adaptable to various text formats and
maintains a high level of accuracy. This feature enables users to digitize content from physical
documents seamlessly. Next, the system incorporates summarization capabilities using AI-
driven models like those provided by Hugging Face Transformers. This feature is designed to
process lengthy documents and condense them into concise, meaningful summaries. By focusing
on the core information, this functionality significantly reduces the time users spend on content
consumption while ensuring they retain the most critical insights.
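
As a minimal illustration of this OCR stage, the sketch below uses pytesseract, the Python wrapper for the Tesseract-OCR engine named above; the image file name is a placeholder and the Tesseract binary is assumed to be installed and on the system PATH.

from PIL import Image
import pytesseract

def extract_text_tesseract(image_path: str) -> str:
    """Extract text from a scanned document image with Tesseract-OCR."""
    image = Image.open(image_path)
    return pytesseract.image_to_string(image)

# "scanned_page.png" is an illustrative file name, not from the paper.
print(extract_text_tesseract("scanned_page.png"))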

The system also integrates translation capabilities to support a multilingual user base. Using
libraries like Google Translate, the system ensures that users, particularly non-native speakers,
can access and understand content in their preferred language. This feature enhances
accessibility by breaking down language barriers and expanding the system’s usability across
different linguistic groups. Finally, the system includes a Text-to-Speech (TTS) module to
convert extracted or summarized text into speech. It utilizes tools such as pyttsx3 for offline use
and gTTS for natural-sounding audio when internet access is available. This functionality
ensures that visually impaired individuals can listen to content effortlessly, making the system
user-friendly and inclusive. These specific objectives aim to create a comprehensive solution that
seamlessly integrates OCR, summarization, translation, and TTS functionalities. The system is
optimized to provide high performance, accuracy, and accessibility, catering to diverse user
needs and enhancing their interaction with written content.

4.2 Broad Objectives

The proposed system aims to address critical challenges in accessibility, learning, efficiency,
inclusivity, and versatility by leveraging advanced technologies. One of its primary objectives is
to promote accessibility by bridging the gap between written and auditory communication,
making content more accessible to visually impaired individuals [2]. Furthermore, it enables
non-native speakers to understand written materials in their preferred language through
translation, breaking down language barriers and enhancing comprehension.


In terms of learning opportunities, the system seeks to provide equal access to educational
resources for visually impaired students by converting course content into audio format,
ensuring they can keep pace with their peers. It also supports non-native speakers in academic
and professional settings by offering language adaptability, which enhances their ability to
engage with course materials, research papers, and workplace documents without linguistic
limitations. The system is designed to improve user efficiency by summarizing lengthy
documents into concise and meaningful content, saving valuable time for users. It also enables
multitasking by providing an audio-based solution, allowing individuals to listen to content
while performing other activities, thus enhancing productivity in various scenarios. This
approach empowers users to explore books, articles, and other documents without relying on
external assistance, contributing to their independence and confidence. Finally, the system is
built with versatility in mind, ensuring compatibility across diverse platforms, including
educational institutions and professional environments. It is designed to be adaptable, allowing
for future enhancements, such as the integration of new features or deployment on embedded
devices. This flexibility ensures the system remains relevant and capable of meeting evolving
user needs in a dynamic technological landscape. By addressing these objectives, the system
aims to create a more inclusive and efficient way of accessing and interacting with written
content.

5. Proposed Methodology

The proposed methodology offers a comprehensive approach to building a document-to-voice conversion system. By integrating OCR, text cleaning, summarization, translation, and text-to-speech technologies, the system effectively transforms written text into spoken language. The modular design allows for flexibility and adaptability to different document formats and user preferences. The system's ability to process documents, extract relevant information, and present it in an auditory format makes it a valuable tool for users with visual impairments or those who prefer auditory content consumption.

5.1 Document Scanning and Text Extraction

The system begins by digitizing the physical document. This is achieved through either scanning
the document using a scanner or capturing an image of it using a camera connected to the
computer. The acquired image is then fed into an Optical Character Recognition (OCR) engine.
This engine employs advanced algorithms to analyze the image pixel by pixel, identifying and
recognizing individual characters.


Once the characters are recognized, they are converted into digital text, effectively transforming the scanned image into a machine-readable format. This extracted text can then be further processed for tasks like summarization, translation, or text-to-speech conversion [3][4].
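
A minimal sketch of this extraction stage with docTR, the deep learning OCR library adopted in Section 6, might look as follows; the pretrained model choice and the file name are illustrative assumptions, not the paper's exact configuration.

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

def extract_text(image_path: str) -> str:
    """Run docTR's text detection and recognition on a document image."""
    model = ocr_predictor(pretrained=True)        # downloads pretrained weights
    document = DocumentFile.from_images(image_path)
    result = model(document)
    return result.render()                        # plain-text reconstruction

# "scanned_page.png" stands in for the scanner or camera capture.
text = extract_text("scanned_page.png")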

5.2 Text-to-Speech (TTS) Conversion

The extracted text is fed into a Text-to-Speech (TTS) engine, which transforms it into natural-
sounding spoken language. This engine employs sophisticated algorithms to analyze the text,
identify the appropriate pronunciation of words, and generate corresponding audio waveforms.
The user can control the initiation of the reading process by pressing a physical switch. This
switch sends a signal to the PC, triggering the TTS engine to start processing the text and
generating the audio output. The synthesized speech can be played through the system's
speakers or headphones, providing an auditory representation of the written content.
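
A minimal sketch of this stage with pyttsx3, the offline engine listed in Section 6, follows; the speaking rate is an illustrative setting, and in the full system the call would be triggered by the switch signal rather than invoked directly.

import pyttsx3

def read_aloud(text: str) -> None:
    """Convert text to speech and play it through the system speakers."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)   # speaking rate in words per minute
    engine.say(text)
    engine.runAndWait()               # blocks until playback finishes

read_aloud("The document has been converted to speech.")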

5.3 Language Translation

The extracted text, once cleaned and processed, can be translated into a desired language using
a language translation API. These APIs, such as Google Translate or DeepL, leverage advanced
machine learning techniques to accurately translate text from one language to another. By
integrating such an API into the system, users can access information in their preferred
language. To initiate the translation process, a dedicated switch can be incorporated. When this
switch is pressed, the system will trigger the translation API to translate the text. The translated
text is then fed into the TTS engine [8].
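
One way to realize this stage in Python is the unofficial googletrans package, which wraps the Google Translate web API; the paper does not name a specific client, and the synchronous interface shown here matches googletrans 4.0.0rc1 (newer releases expose an asynchronous API instead).

from googletrans import Translator

def translate_text(text: str, dest_lang: str = "hi") -> str:
    """Translate extracted text into the user's preferred language."""
    translator = Translator()
    return translator.translate(text, dest=dest_lang).text

# 'hi' (Hindi) is an illustrative target language code.
print(translate_text("The scan completed successfully.", "hi"))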

5.4 Summarization of Text

To further enhance the system's functionality, a summarization model can be integrated. This
model, such as those provided by Hugging Face's Transformers or LangChain, can process the
extracted text and generate a concise summary. This summary captures the key points of the
document, making it easier for users to quickly grasp the main ideas. The user can trigger the
summarization process by pressing a dedicated switch. Upon receiving this signal, the system
will feed the extracted text into the summarization model. The model will then process the text
and generate a summary. This summary can be either displayed on the screen or spoken aloud
using the TTS engine. By incorporating a summarization model, the system can provide a more
efficient and user-friendly experience, especially when dealing with lengthy documents [5][6][7].
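
A minimal sketch of this stage with the Hugging Face Transformers pipeline is shown below; the model choice facebook/bart-large-cnn and the length limits are illustrative assumptions, not the paper's exact configuration.

from transformers import pipeline

# Model name and length limits are illustrative, not the paper's settings.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize(text: str) -> str:
    """Condense extracted document text into a short summary."""
    result = summarizer(text, max_length=120, min_length=30, do_sample=False)
    return result[0]["summary_text"]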


5.5 Audio Output

The final stage of the process involves the audio output. Once the text-to-speech engine has
converted the processed text into audio waveforms, the system plays the generated audio
through the device's speakers or headphones. This auditory output provides a convenient and
accessible way for users to consume the information. The quality of the audio output is
influenced by factors such as the TTS engine's capabilities, the quality of the input text, and the
system's hardware. By optimizing these factors, a clear and natural-sounding audio experience
can be achieved.
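
For the online path, a sketch of the audio-output stage with gTTS (named in Section 6 for more natural voices when internet access is available) could look like this; playback through the third-party playsound package is an assumption, and any audio player would serve.

from gtts import gTTS
from playsound import playsound   # playback choice is an assumption

def speak_online(text: str, lang: str = "en") -> None:
    """Synthesize speech with Google TTS and play the resulting file."""
    gTTS(text=text, lang=lang).save("output.mp3")
    playsound("output.mp3")

speak_online("This is the summarized document content.")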

Figure 1. Flow chart of the methodology.

6. Software Requirements

The system is developed using Python and runs on Windows, though it can be executed on any modern operating system that supports Python and the necessary libraries. The core functionality involves several key Python libraries. For Optical Character Recognition (OCR), docTR, a deep learning-based OCR library that extracts text from scanned images or documents, is used as the primary tool [10]. If needed, Tesseract-OCR can also be used as an alternative for traditional OCR tasks. For converting the extracted text into speech, the system utilizes pyttsx3, an offline Text-to-Speech (TTS) library, though gTTS (Google Text-to-Speech) can be used for more natural-sounding voices when internet access is available [12].


The system also integrates a Google Translate library that interfaces with the Google Translate API, enabling text translation into multiple languages [9]. For text summarization, Hugging Face Transformers is used to implement AI-based models that generate concise summaries of long texts. Additionally, the keyboard library is employed to handle keyboard inputs, allowing users to trigger actions like scanning, translating, or summarizing. Development is carried out in PyCharm, a Python IDE whose robust features for managing and debugging code make for a productive development environment. Together, these libraries create a comprehensive and efficient framework for handling text recognition, translation, summarization, and speech synthesis, providing a flexible and scalable basis for automating and improving workflows that require text extraction, translation, summarization, and speech output.
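
A sketch of how the keyboard library can map key presses to the pipeline stages follows; the key bindings and handler bodies are illustrative placeholders for the stage functions sketched in Section 5, not the paper's actual wiring.

import keyboard

# Handlers stand in for the OCR, translation, and summarization stages.
def on_scan():      print("scan and read requested")
def on_translate(): print("translation requested")
def on_summarize(): print("summarization requested")

keyboard.add_hotkey("f1", on_scan)
keyboard.add_hotkey("f2", on_translate)
keyboard.add_hotkey("f3", on_summarize)
keyboard.wait("esc")   # keep listening until Esc is pressed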

7. Results

The system successfully implements document scanning to speech conversion and summarization of the scanned document (Figures 6 and 7), leveraging docTR for Optical Character Recognition (OCR), pyttsx3 for Text-to-Speech (TTS), and Transformers classes such as T5Tokenizer and T5ForConditionalGeneration for summarization [11]. The system can scan documents (both images and printed text), extract the content using docTR's OCR capabilities, and then convert the extracted text to speech with pyttsx3, which provides clear and intelligible audio output through the system's speakers. It also produces a short summary of the scanned document. This functionality makes the system highly accessible, particularly for blind users, as it allows them to listen to the content of printed or image-based documents.
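
A sketch of the T5-based summarization path using the classes named above follows; the t5-small checkpoint follows reference [11], while the generation parameters are illustrative.

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def summarize_t5(text: str) -> str:
    """Summarize text with T5; the 'summarize:' prefix selects the task."""
    inputs = tokenizer("summarize: " + text, return_tensors="pt",
                       max_length=512, truncation=True)
    ids = model.generate(inputs["input_ids"], max_length=120, min_length=30,
                         num_beams=4, early_stopping=True)
    return tokenizer.decode(ids[0], skip_special_tokens=True)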

However, there are some challenges: OCR accuracy is generally good for printed text (Figures 2 and 3), but the system struggles with handwritten documents, where recognition accuracy is lower (Figures 4 and 5). Despite this limitation, the speech output generated by pyttsx3 is of high quality, ensuring a smooth and understandable user experience for text-to-speech conversion. Moving forward, improvements in handwritten text recognition or integration with additional OCR models could enhance the system's robustness in diverse use cases. For real-time operation, the system needs several future enhancements. First, optimizing OCR speed using GPU acceleration and image preprocessing is crucial for faster text extraction. The text-to-speech engine can also be upgraded to more advanced neural models for more natural and quicker voice output.


To minimize translation and summarization delays, local pre-trained models or efficient API
calls should be integrated. Additionally, using multithreading or asynchronous processing will
streamline the workflow, reducing overall latency. Upgrading hardware, such as faster
processors or adding a GPU, will further boost real-time performance. These improvements are
essential to achieve seamless real-time document-to-speech conversion.
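
As a sketch of the multithreading suggestion, summarization and translation could run concurrently once OCR has produced the text; summarize and translate_text refer to the earlier sketches, and the assumption that these two stages dominate the latency is ours, not the paper's measurement.

from concurrent.futures import ThreadPoolExecutor

def process_document(text: str, dest_lang: str) -> tuple[str, str]:
    """Run summarization and translation in parallel after OCR."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        summary = pool.submit(summarize, text)                      # Section 5.4 sketch
        translation = pool.submit(translate_text, text, dest_lang)  # Section 5.3 sketch
        return summary.result(), translation.result()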

Figure 2. Detection of a printed image.

Figure 3. Confidence percentage of the detection.


Figure 4. Detection of a handwritten document.

Figure 5. Confidence percentage of the detection.


Figure 6. Extracted text.

Figure 7. Generated summary.


8. Conclusion

This document-to-voice conversion project introduces an innovative solution to challenges that many existing tools overlook. By incorporating advanced features such as text summarization, language translation, and natural-sounding speech output, the tool goes beyond basic text-to-speech functionality to create a more comprehensive and inclusive experience. It reduces the cognitive load on users by providing summarized content, which is especially valuable for lengthy or complex documents. The integration of translation broadens accessibility further, enabling users to interact with content in multiple languages seamlessly.

This tool is particularly impactful for visually impaired users, who often encounter barriers in
accessing written materials, as well as for individuals facing language barriers, enhancing their
ability to access and understand information independently. By ensuring these additional
features operate with ease and affordability, the tool is designed to serve a wide range of users,
making it accessible to students, professionals, and anyone needing improved document
accessibility.

With this product, we aim to create a supportive, user-centered platform that promotes equal
access to information, regardless of visual or linguistic challenges. Ultimately, this tool fosters
a more inclusive environment by enabling all users to interact with information more
effectively and meaningfully, bridging gaps in accessibility and advancing the goal of a more
equitable information landscape.

References

[1] Singh, Anshika, and Sharvan Kumar Garg. "Comparative study of optical character
recognition using different techniques on scanned handwritten images." Micro-Electronics and
Telecommunication Engineering: Proceedings of 6th ICMETE 2022. Singapore: Springer
Nature Singapore, 2023. 411-420.

[2] Guravaiah, Koppala, et al. "Third eye: object recognition and speech generation for visually
impaired." Procedia Computer Science 218 (2023): 1144-1155.
[3] Batra, Pulkit, et al. "OCR-MRD: performance analysis of different optical character
recognition engines for medical report digitization." International Journal of Information
Technology 16.1 (2024): 447-455.
[4] Manju, S., and J. Anitha. "Investigation of Handwritten Image-To-Speech Using Deep
Learning." 2024 International Conference on Advances in Modern Age Technologies for
Health and Engineering Science (AMATHE). IEEE, 2024.
[5] Gupta, Anushka, et al. "Automated news summarization using transformers." Sustainable
Advanced Computing: Select Proceedings of ICSAC 2021. Singapore: Springer Singapore,
2022. 249-259.
[6] Bauboorally, S. M. W., and S. Pudaruth. "A Statistical and Machine Learning Approach for Summarising Computer Science Research Papers." International Journal of Computing and Digital Systems (2023).
[7] Adhikari, Surabhi. "NLP based machine learning approaches for text summarization." 2020
Fourth International Conference on Computing Methodologies and Communication
(ICCMC). IEEE, 2020.
[8] Vieira, Lucas Nunes, et al. "Machine translation in society: insights from UK users."
Language Resources and Evaluation 57.2 (2023): 893-914.
[9] Kolhar, Manjur, and Abdalla Alameen. "Artificial Intelligence Based Language Translation
Platform." Intelligent Automation & Soft Computing 28.1 (2021).
[10] Porwal, Utkarsh, Alicia Fornés, and Faisal Shafait. "Advances in handwriting recognition."
International Journal on Document Analysis and Recognition (IJDAR) 25.4 (2022): 241-243.
[11] Raj, Ankit, et al. "Document-Based Text Summarization using T5 small and gTTS." 2024
International Conference on Advances in Data Engineering and Intelligent Computing Systems
(ADICS). IEEE, 2024.
[12] Sisman, Berrak, et al. "An overview of voice conversion and its challenges: From statistical
modeling to deep learning." IEEE/ACM Transactions on Audio, Speech, and Language
Processing 29 (2020): 132-157.
