
Virtual Voice Assistant for Visually Impaired with Hindi inputs

Priyanshu Sahu
Department of Information Technology
Medicaps University, Indore
[email protected]

Yashshree Patankar
Department of Information Technology
Medicaps University, Indore
[email protected]

Pragya Patidar
Department of Information Technology
Medicaps University, Indore
[email protected]

I. ABSTRACT - International research shows that people with visual impairments are 30% less likely to access the web than individuals without disabilities, and our country also faces a major problem of communication through the English language. This paper illustrates the implementation of software that assists the visually impaired in accessing the computer system and takes inputs in the HINDI language, so that it is also beneficial and easily accessible to people who do not know English well. The software changes the way they can access the system and the internet, which increases the convenience of its usage. Although technology has grown by leaps and bounds, the web - especially websites - is still inaccessible to the visually impaired. The software provides a way to interact with these websites with much ease. With the use of voice commands rather than the traditional keyboard and mouse, our software provides a new dimension for accessing and giving commands to any website. The software reads out the content of the website, and then, using speech-to-text and text-to-speech modules together with Selenium, it can automate such websites. The user is freed from remembering complex braille keyboard commands or the effort of typing; he/she can simply voice out a command and the software will execute it. The system also has the functionality of providing a summary, as voice output, of the content on the website and of answering questions asked by the user. It also has the functionality of taking input in the Hindi language using different speech methods, with a chat-bot facility included in it. The system adapts to the user's needs and also provides system navigation functionality. Hence, this software provides its results in the form of voice outputs for visually impaired persons, so that the output is easily audible to them, and it also gives a text summary for each input search, whether it is in English or HINDI.

II. Keywords— Visually impaired; voice control; automate website and system; blind people; HINDI inputs.

III. INTRODUCTION - Today there are nearly 275 million people in the world who are visually impaired. Although technology has grown by leaps and bounds, accessibility, especially that of the web and the computer system for differently-abled people, continues to be far-fetched. Nowadays, more and more things are done digitally. From movie tickets and ordering food to booking train tickets, everything can be done online. For nearly all of these online facilities a person has to use a website and a computer system. Using technologies like this may be a trivial task for many people, but it is very difficult for visually impaired people. The internet is a highly visual medium of communication, and different "accessibility blockers" can hinder different kinds of websites and system functionalities, unlike brick-and-mortar businesses where accessibility can often be provided by adding a ramp for disabled persons. As an example, researchers found that 60% of computer systems "had significant accessibility issues," while 50% of respondents said they were "unable to access information and services through government websites and computers." Thus, we wanted to come up with a novel way of allowing visually impaired people to access the system and the internet. The W3C does include a set of recommendations that stipulate the rules to be followed when designing a website for the visually impaired. The foremost challenge in developing a stable software is to incorporate as few keystrokes as possible and to produce an end-to-end experience with the assistance of voice alone. The inclusion of multiple languages and setting the correct pace of the speech when it is played back to the user are important factors to consider. To support widespread usage of the software, a vital parameter is the dependency of the software on the local environment and operating system. While the technology has evolved greatly, accessibility, especially of the PC and the internet for the differently-abled, continues to be stagnant. Also, for local Indian or native people such systems are rather complicated to use day to day or to control through voice assistance, so to increase the convenience of the virtual voice assistant we introduce the HINDI language as a voice input. Some screen readers work only with a particular kind of browser and system, and some require the user to recall complex commands; thus screen readers and the braille system are not an efficient solution to the problem at hand and cannot be used to access the system thoroughly. The accessibility of some system functions and web content is reduced because of visual disability. Both the language and the impairment problem result in an inconsistent state of accessibility. The American Foundation for the Blind determined that individuals with visual impairments are over 31% less likely to report connecting to these technologies and over 35% less likely to use a PC than people without disabilities. Keeping all the above factors in mind, we came up with the solution of a virtual voice assistant. The primary objective is to bridge the accessibility gap between the typical user and the visually impaired individual, as well as the language gap between an English speaker and a HINDI speaker for the voice inputs to the PC system. The system is blind to the visually impaired and misunderstood by the person with a language issue; to make the converse true, in this paper we present an end-to-end voice-based software for the visually impaired and non-anglophone users to enable them to access the system with minimal or no difficulty. The user provides the commands he wants to execute as voice input rather than using a keyboard. The software then uses a speech-to-text module to convert the input speech, whether it is in English or in HINDI, into text, which becomes the command to be executed. The command is executed using the web driver and the software. Once it is executed, the user has three options: read the complete content of the search output, read a summary, or ask a question. The second and third options are implemented using machine learning. Once the voice input is taken and the command is executed, the output is spoken back to the user using the text-to-speech module. Thus, the software makes the system more easily, quickly and effectively accessible for the visually impaired and for non-anglophone users.
IV. LITERATURE SURVEY - A study was conducted to see whether the system provides opportunities for disabled people, also with the HINDI language, to carry out activities which they were previously unable to do, or whether it results in greater social exclusion. It states that there is no known research to determine the reasons people with disabilities cannot access the system and its technologies more fluently. On the other hand, it states that the primary barrier to accessibility is that of economic, theoretical and technical incapabilities. This thought is seconded by Kirsty, who states that bad HTML code and technologies cause a hindrance in accessing the PC or any digital tool for the visually impaired. Although the World Wide Web Consortium publishes a list of guidelines for maintaining a high level of accessibility for the visually impaired, it has also been reported that only 50.4% of the problems encountered by users were covered by the Success Criteria of the Web Content Accessibility Guidelines 2.0 (WCAG 2.0), and 16.7% of websites and systems implemented techniques recommended in WCAG 2.0, yet those techniques did not solve the problems. Porter points out that a lot of editing has already been done by the time newspapers are produced in Braille for visually impaired individuals, but once they are available on the net the individual has the choice of what to read, which increases accessibility. A respondent from the study stated that "without the software there's no access for blind people. JAWS is a specialized software which needs a knowledgeable person to provide support." Thus, with respect to these studies, it can be inferred that there is a need for software to access the system, and for technologies that are much easier for the user to operate than existing solutions like screen readers. For developing software to improve system accessibility for the visually impaired, Ferati mentions that a "one solution for all" model is insufficient without considering the degree of visual defect when providing a customized system experience, along with the convenience of mobility such as the native HINDI language for inputs.
V. SYSTEM OVERVIEW AND DESIGN - The system comprises a modular client-server distributed architecture. The system consists of the main menu, which runs first on startup of the software, and then the website modules. The client communicates with the server and back with the use of REST APIs, so the website modules are not local to the client. Throughout the system, the user communicates with the software via a speech-to-text interface. The Google speech-to-text library (Speech Recognition) for Python is used for this purpose. For communicating the system's output to the user, as well as for confirming the user input, the recognized input is played back to the user using the Python text-to-speech library. The modules are written in Python and make use of Selenium for automation of the respective module and Beautiful Soup for scraping the contents of the web page. The "Script" component of each module consists of the customized code that implements the features of the website covered by the module. For instance, the Wikipedia module consists of a Question and Answer feature and a Summary feature along with the conventional feature of reading out the whole article; the former is implemented by training a BERT model on the Stanford Question Answering Dataset (SQuAD). The APIs that hold the system together are written in Flask. The software is operating-system independent to support hassle-free application and usage of the system.

Figure 1: System Architecture

Figure 1 is a representation of the system architecture of our software. The user accesses the software through the web interface, where the speech-to-text (STT) module converts the voice input to text. The user is then presented with the main menu, where they have three options to choose from and decide which website they want to browse. Accordingly, the module is invoked with its corresponding speech-to-text modules, web driver and machine learning module. The output is played to the user using the text-to-speech (TTS) module. This is the overview of the software.
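To make the speech interface concrete, the following is a minimal sketch of the recognize-and-confirm loop, assuming the SpeechRecognition and pyttsx3 packages with the Google Web Speech backend; the language codes, prompts and function names are illustrative and are not taken from the paper's code.

# Minimal sketch (assumed packages: SpeechRecognition, pyttsx3, PyAudio).
# Listens for one voice command, recognizes it in Hindi or English via the
# Google Web Speech API, and plays the recognized text back for confirmation.
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
tts_engine = pyttsx3.init()

def speak(text):
    """Play a prompt or confirmation back to the user."""
    tts_engine.say(text)
    tts_engine.runAndWait()

def listen_command(language="hi-IN"):
    """Capture one utterance from the microphone and return it as text."""
    speak("Please speak after the beep.")
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        # language="hi-IN" requests Hindi recognition; "en-IN" covers English.
        return recognizer.recognize_google(audio, language=language)
    except (sr.UnknownValueError, sr.RequestError):
        return None

if __name__ == "__main__":
    command = listen_command()
    if command:
        speak(f"You said: {command}")   # playback so the user can confirm
    else:
        speak("Sorry, I could not understand that.")

Passing language="hi-IN" asks the recognizer for Hindi, while the same call with "en-IN" handles English input, matching the bilingual behaviour described above.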

Figure 2: BERT model on SQuAD dataset architecture

Dataset - We have used the Stanford Question Answering Dataset (SQuAD) to pre-train the machine learning model for the question-and-answer component of the module. The dataset has questions posed by people about Wikipedia articles, where the answer to each question is a span taken from the given excerpt of Wikipedia text, or the question may be unanswerable.
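For context, a SQuAD-style record pairs a context paragraph with questions whose answers are character-offset spans in that paragraph, or are marked unanswerable in SQuAD 2.0. The following is an illustrative, hand-written example of that structure, not an actual record from the dataset.

# Illustrative SQuAD-style record (made-up content, same shape as SQuAD v2.0).
squad_style_record = {
    "context": "The Wikipedia module reads articles aloud and answers questions.",
    "qas": [
        {
            "question": "What does the Wikipedia module read aloud?",
            "answers": [{"text": "articles", "answer_start": 27}],
            "is_impossible": False,   # answer is a span inside the context
        },
        {
            "question": "Who founded the module?",
            "answers": [],
            "is_impossible": True,    # SQuAD 2.0 also contains unanswerable questions
        },
    ],
}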

VI. METHODOLOGY - The user first interacts with the main menu of the software once the computer or laptop has been switched on. The main menu of the software is invoked either by the integrated voice assistant, for example Siri, or by a predefined keyboard shortcut, this being the only keyboard interaction required. The main menu interface provides the available options to the user, viz. the installed website modules, the pace of the audio, and the accent of the audio. Each of the website modules contains a speech-to-text and text-to-speech bundle, a Python script that automates the website, and the features specific to that website. For efficient speech recognition, the user is given a beep at all stages, after which he is free to speak. The input received and recognized by the system is also played back to the user so that the user can confirm his intended input, cutting down errors right at that stage and thus enabling a form of editing. The methodology followed to implement the three modules - Google, Gmail, Wikipedia - and the main menu is described below.
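Before the per-module descriptions, here is a minimal sketch of how the main menu's keyword dispatch and the audio pace/accent options could be realized with pyttsx3 properties; the keyword lists and handler names are hypothetical placeholders rather than the authors' code.

# Sketch of main-menu dispatch and audio preferences (hypothetical keywords/handlers).
import pyttsx3

tts_engine = pyttsx3.init()

def set_audio_preferences(rate=150, voice_index=0):
    """Adjust pace (words per minute) and accent (installed TTS voice)."""
    tts_engine.setProperty("rate", rate)
    voices = tts_engine.getProperty("voices")
    if 0 <= voice_index < len(voices):
        tts_engine.setProperty("voice", voices[voice_index].id)

# Hypothetical mapping from spoken keywords to website modules.
MENU_KEYWORDS = {
    "google": "google_module",
    "gmail": "gmail_module",
    "wikipedia": "wikipedia_module",
}

def dispatch(command_text):
    """Pick the module whose keyword appears in the recognized command."""
    text = command_text.lower()
    for keyword, module_name in MENU_KEYWORDS.items():
        if keyword in text:
            return module_name
    return None

# Example: dispatch("Wikipedia kholo") -> "wikipedia_module"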
A. Main Menu - The main menu runs when the software is first opened. Using the pyttsx3 (Python text-to-speech) module, the initial set of instructions illustrating the choices provided to the user is read out. The system takes the user input after the beep using the Google speech-to-text Python module. The keywords from the voice input are then extracted and the appropriate response is executed. The user is also free to change the voice tempo and accent to whatever suits him/her best. The user is then presented with the main menu, where they have three options to choose from and decide which website they want to browse. Accordingly, the chosen module is invoked with its corresponding speech-to-text module, web driver and machine learning module, and the output is played to the user using the text-to-speech (TTS) module.

Flow diagram for main menu

B. Google Module - This module consists of a Python script that automates the website using Selenium and Beautiful Soup. The user can search for any query through the speech-to-text and text-to-speech interfaces; the recognized query is searched with the help of Selenium, and the page contents are scraped using the Beautiful Soup module of Python. The search results are indexed, which enables quick access to the web page of the user's choice, thus saving time, as opposed to the user having the full search results read out before picking one.

Flow diagram for Google
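A rough sketch of the Google module under these assumptions (Selenium with a Chrome driver, Beautiful Soup for scraping) is shown below; the h3 selector for result titles and the helper name are guesses for illustration, not the authors' script.

# Rough sketch of the Google module (assumes Chrome + chromedriver, selenium, bs4).
from urllib.parse import quote_plus

from bs4 import BeautifulSoup
from selenium import webdriver

def google_search(query, max_results=5):
    """Open Google for the recognized query and scrape the result titles."""
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.google.com/search?q=" + quote_plus(query))
        soup = BeautifulSoup(driver.page_source, "html.parser")
        # Result titles are typically rendered inside <h3> tags; this selector
        # is an assumption and may need updating if the page layout changes.
        titles = [h3.get_text(strip=True) for h3 in soup.find_all("h3")]
        return titles[:max_results]
    finally:
        driver.quit()

# The indexed titles can then be read out one by one over TTS, and the user
# names the index of the result they want opened.
# Example: print(google_search("Medicaps University Indore"))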
C. Gmail Module - This module consists of a Python script that starts up Gmail, logs the user into his/her mailbox and provides support for the user to send or read mails. For sending a new mail, the system prompts the user to provide the relevant details and, after filtering out noise, the input fields are filled through Selenium and the mail is sent with the user's confirmation. At each stage, the user is free to edit and undo any of his inputs. The system repeats the recognized user input, and the input is finalized provided it is confirmed by the user.

D. Wikipedia Module - The Wikipedia module presents the user with novel options such as summarizing and reading out the article, and provides intelligent answers to queries using NLP and Machine Reading Comprehension. Once the web page is loaded, the user enters the search query, followed by the confirmation, after which the user is presented with three options: reading out the whole article, reading out a summary of the article, or a question-and-answer session. The entire article is read by scraping the web page, cleaning the text, and using the text-to-speech module. Summarization of the text is performed using the summary method provided by the Wikipedia Python library. For the question-and-answer session, a BERT model on the Stanford Question Answering Dataset (SQuAD) is employed. The dataset consists of 100,000 questions, along with over 50,000 unanswerable questions. BERT is used for Question Answering on the SQuAD dataset by applying two linear transformations to the BERT outputs for every sub-token: the first and second linear transformations predict the probability that the current sub-token is the start or the end position of an answer, respectively. The user can then ask any question relevant to the subject of the article searched for, and the model returns the most suitable answer to the user through text-to-speech.
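The summary and question-answer features of the Wikipedia module could be wired together roughly as follows. This sketch assumes the wikipedia Python library and a SQuAD-fine-tuned BERT checkpoint loaded through the Hugging Face transformers pipeline; the paper's references point to DeepPavlov's SQuAD model, so the loading mechanism and the model name used here are assumptions for illustration only.

# Sketch of the Wikipedia module's summary and Q&A features.
# Assumptions: the `wikipedia` and `transformers` packages are installed, and
# "deepset/bert-base-cased-squad2" stands in for whichever SQuAD-trained BERT
# checkpoint is actually deployed.
import wikipedia
from transformers import pipeline

qa_model = pipeline("question-answering", model="deepset/bert-base-cased-squad2")

def summarize_topic(topic, sentences=3):
    """Return a short summary that can be read out over TTS."""
    return wikipedia.summary(topic, sentences=sentences)

def answer_question(topic, question):
    """Answer a question about the article using the BERT QA model."""
    context = wikipedia.page(topic).content
    result = qa_model(question=question, context=context[:5000])  # keep the context short
    return result["answer"]

# Example usage:
# print(summarize_topic("Visual impairment"))
# print(answer_question("Visual impairment", "How many people are visually impaired?"))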
VII. RESULTS - The built-in modules of text-to-speech (pyttsx3) and speech-to-text (the SpeechRecognition library with Google's recognizer) in Python provide good accuracy and also provide a simple and quick way to convert the text. The speech-to-text module recognized the words with 96.25% accuracy over 4 different voice samples, each containing 20 different inputs, in a moderate to quiet environment. The BERT model on the SQuAD dataset for the question-answering feature within the Wikipedia module showed an Exact Match accuracy of 80.88%, which is the proportion of predictions that match any one of the ground-truth answers exactly, and the F1 score was found to be 88.49%. Results showed that we were able to run our software on the three selected sites: Google, Gmail and Wikipedia. The software was run on each of them separately. The software could send an email effectively using the commands from the user. The software also provided an accurate answer to the question the user asked on Wikipedia. The software managed to summarize the text from Wikipedia accurately, and thus we were able to test and build a software which makes the website easily, quickly and efficiently accessible for the visually impaired.
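For reference, Exact Match and F1 as quoted above are the standard SQuAD evaluation metrics; a simplified sketch of how they are computed per prediction follows (whitespace and case normalization only, so it approximates rather than reproduces the official evaluation script).

# Simplified SQuAD-style Exact Match and token-level F1 (illustrative only;
# the official script also strips punctuation and articles).
def normalize(text):
    return " ".join(text.lower().split())

def exact_match(prediction, ground_truths):
    """1 if the prediction matches any ground-truth answer exactly, else 0."""
    return int(any(normalize(prediction) == normalize(gt) for gt in ground_truths))

def f1_score(prediction, ground_truth):
    """Token-overlap F1 between one prediction and one ground-truth answer."""
    pred_tokens = normalize(prediction).split()
    gt_tokens = normalize(ground_truth).split()
    common = sum(min(pred_tokens.count(t), gt_tokens.count(t)) for t in set(pred_tokens))
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: exact_match("275 million", ["275 million people"]) -> 0,
# but f1_score("275 million", "275 million people") -> 0.8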
VIII. APPLICATION - The virtual assistant for the visually impaired acts as an excellent support for visually disabled people to access the web in any browser, as our software is browser independent. They can access the web using their speech and can navigate the website using voice commands. The software will read out the content of the website to the user, thus making the website more accessible. This feature will not only help the visually impaired but also allow people to access the web with ease and eliminate the use of hardware devices such as the keyboard. The virtual assistant also provides the feature of answering a specific question from a given text of information, so the user does not have to read the entire text to figure out the answer; he/she simply has to voice the question, and the software will extract the answer from the text data on its own using machine learning. The software also provides a summary of the text using machine learning, so that the user does not have to read the entire thing, thus making it easy to access the website. Thus, using machine learning and speech-to-text techniques we make the task of accessing the website, which was earlier difficult, very easy, quick and efficient. Therefore, we believe that virtual assistants for the visually impaired are the beginning of Web 3.0.
IX. CONCLUSION - In this paper, we presented a modular solution to improve web-based accessibility for the visually impaired. The virtual assistant is operating-system independent and does not rely on keyboard inputs from the user, in order to maximize ease of use, and it aims to provide a hassle-free experience for the user. Through speech-to-text and text-to-speech interfaces, the user can communicate with and customize the system. We presented the system design and methodology of the three modules that are currently implemented. The Wikipedia module uses a BERT model on the SQuAD dataset to answer user queries quickly and accurately; the Exact Match was found to be 80.88%. The virtual assistant provides an easy way for the visually impaired to access any website. It eliminates the need to remember complex keyboard commands or to use screen readers. The assistant is not only an excellent way to interact with websites but also an efficient way to do so. The software works as a stepping stone towards Web 3.0, where everything will work on voice commands.

X. FUTURE ENHANCEMENT - At present the application supports only commands given in the English language. We plan to expand this and make it available in most commonly used languages, so that people from all parts of the world can access the web without any issue. We would also like to build a standard framework that can be plugged into any website, and to make a browser extension, thus making it possible to toggle between the two modes easily, especially for educational websites, so that visually impaired individuals can access online courses much like any other individual.

XI. REFERENCES

Pilling, D., Barrett, P. and Floyd, M. (2004). Disabled people and the Internet: experiences, barriers and opportunities. York, UK: Joseph Rowntree Foundation, unpublished.
Porter, P. (1997). 'The reading washing machine', Vine, Vol. 106, pp. 34-37.
JAWS - https://www.freedomscientific.com/products/software/jaws/ accessed in April 2020.
Ferati, M., Vogel, B., Kurti, A., Raufi, B. and Astals, D. (2016). Web accessibility for visually impaired people: requirements and design issues. LNCS 9312, pp. 79-96. 10.1007/978-3-319-459165_6.
Power, C., Freire, A.P., Petrie, H., Swallow, D. (2012). Guidelines are only half of the story: accessibility problems encountered by blind users on the web. In: CHI 2012, Austin, Texas, USA, 5-10 May 2012, pp. 1-10.
Sinks, S. and King, J. (1998). Adults with disabilities: Perceived barriers that prevent Internet access. Paper presented at the CSUN 1998 Conference, Los Angeles, March. Retrieved January 24, 2000 from the World Wide Web.
Muller, M. J., Wharton, C., McIver, W. J. (Jr.) and Laux, L. (1997). Toward an HCI research and practice agenda based on human needs and social responsibility. Conference on Human Factors in Computing Systems, Atlanta, Georgia, 22-27 March.
Williamson, K., Wright, S., Schauder, D. and Bow, A. (2001). The internet for the blind and visually impaired. Journal of Computer-Mediated Communication, Volume 7, Issue 1, 1 October 2001, JCMC712.
DeepPavlov documentation - http://docs.deeppavlov.ai/en/master/features/models/squad.html accessed in April 2020.
The website of the American Foundation for the Blind - https://www.afb.org/about-afb/what-we-do/afb-consulting/afbaccessibility-resources/challenges-web-accessibility accessed in April 2020.
Zhou, R. Question answering models for SQuAD 2.0. Stanford University, unpublished.
Global data on visual impairments 2010, World Health Organization.
