Virtual Voice Assistant for the Visually Impaired with Hindi Inputs
Priyanshu Sahu, Yashshree Patankar, Pragya Patidar
Department of Information Technology, Medicaps University, Indore
[email protected], [email protected], [email protected]
I. ABSTRACT - International research shows that people with visual impairments are 30% less likely to access the web than individuals without disabilities, and our country also faces the major problem of communication in the English language. This paper describes the implementation of software that assists the visually impaired in accessing the computer system. The software takes inputs in the Hindi language, so it is also beneficial and easily accessible to people who do not know English well. It provides the means for them to access the system and the internet, which increases the convenience of its use. Although technology has grown by leaps and bounds, the web, and websites in particular, remain largely inaccessible to the visually impaired. The software provides a way to interact with these websites with ease: with voice commands used in place of the traditional keyboard and mouse, it can automate such websites. The user is freed from remembering complex Braille-keyboard commands and from the effort of typing; he or she can simply speak a command and the software will execute it. The system can also provide a spoken summary of the content of a website and answer questions asked by the user, and it takes input in the Hindi language through different speech-recognition methods, with a chatbot facility included. The system is adaptable to user needs and offers system-navigation functionality. Hence, the software delivers its results as voice output, easily audible to visually impaired persons, and also gives a text summary for each search input, whether in English or in Hindi.

II. Keywords - visually impaired; voice control; website and system automation; blind people; Hindi inputs.
III. INTRODUCTION - Today there are nearly 275 million visually impaired people in the world. Although technology has grown by leaps and bounds, accessibility, especially of the web and of computer systems for differently-abled people, remains far-fetched. Nowadays more and more tasks are performed digitally: from movie tickets and ordering food to booking train tickets, everything can be done online. For nearly all of these online facilities one has to use a website and a computer system. Using such technologies is a trivial task for most people, but it is very difficult for the visually impaired. The internet is a highly visual medium of communication, and various "accessibility blockers" can hinder different kinds of websites and system functionalities, unlike brick-and-mortar businesses, where accessibility can be provided by adding a ramp for disabled persons. Existing systems are also complicated for day-to-day use and for voice control, so to increase the convenience of a virtual voice assistant we introduce Hindi as a voice input. Some screen readers work only with a particular kind of browser and system, and some require the user to remember complex commands; screen readers and the Braille system are therefore not an efficient solution to the problem at hand and cannot be used to access the system thoroughly. For the visually disabled, access to some system functions and web content has regressed.
Both the language barrier and the impairment leave accessibility in an inconsistent state. The American Foundation for the Blind determined that individuals with visual impairments are over 31% less likely to report connecting to these technologies and over 35% less likely to use a PC than people without disabilities. The W3C provides a set of recommendations that stipulate the rules to be followed when designing a website for the visually impaired. The foremost challenge in developing stable software of this kind is to require as few keystrokes as possible and to deliver an end-to-end experience through voice alone. The inclusion of multiple languages and setting the correct pace of the speech played back to the user are important factors to consider, as is the software's dependency on the local environment and operating system, a vital parameter for its widespread use. Researchers found that 60% of automatic data-processing systems "had significant accessibility issues," while 50% of respondents said they were "unable to access information and services through government websites and computers." Thus, we wanted to come up with a novel way of allowing visually impaired people to access the system and the internet. The system as it stands is invisible to the visually impaired and incomprehensible to a person with a language barrier; to reverse this situation, keeping all the above factors in mind, we arrived at the solution of a virtual voice assistant. The primary objective is to bridge the accessibility gap between the typical user and visually impaired individuals, and also the language gap, for voice input to the PC, between an English speaker and a Hindi speaker. In this paper we present end-to-end voice-based software that enables the visually impaired and non-anglophone users to access the system with minimal or no difficulty. The user gives the command to be executed as voice input rather than through a keyboard. The software then uses a speech-to-text module to convert the input speech, whether in English or in Hindi, into text, which becomes the command to be executed. The command is executed using a web driver. Once it is executed, the user has three options: read the complete content of the search output, read a summary, or ask a question; the second and third options are implemented using machine learning. Once the voice input is taken and the command executed, the output is spoken to the user using the text-to-speech module. Thus, the software makes the system more easily, quickly, and effectively accessible for the visually impaired and non-anglophone users.
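The speech-to-text step just described, converting an English or Hindi utterance into a text command, can be illustrated with the SpeechRecognition library that the paper names later. This is a minimal sketch, not the paper's actual code: the language-code table and the helper names are illustrative assumptions, and it assumes the Google Web Speech API backend that the library wraps.

```python
# Sketch of the voice-command capture step. The pick_language_code helper
# and LANGUAGE_CODES table are illustrative assumptions; recognize_google's
# language parameter takes a BCP-47 tag, e.g. "hi-IN" for Hindi.

LANGUAGE_CODES = {"english": "en-IN", "hindi": "hi-IN"}

def pick_language_code(preference):
    """Map the user's spoken-language preference to a recognizer tag."""
    return LANGUAGE_CODES.get(preference.lower(), "en-IN")

def listen_for_command(language="english"):
    """Capture one utterance from the microphone and return it as text."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # tolerate background noise
        audio = recognizer.listen(source)            # record until silence
    return recognizer.recognize_google(audio, language=pick_language_code(language))

# Live use (requires a microphone and network access):
#   print(listen_for_command("hindi"))
```

The same recognizer call serves both languages; only the language tag changes, which is what makes the Hindi input a thin extension of the English path.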
While technology has evolved greatly, accessibility, especially of the PC and the internet for the differently abled, continues to stagnate. Porter points out that substantial editing has already been done by the time newspapers are produced in Braille for visually impaired individuals, but once they are available on the web the individual has the choice of what to read, which increases accessibility. A respondent from the study conducted stated that "without the software there's no access for blind people. JAWS is a specialized software which needs a knowledgeable person to provide support." With respect to these studies, it can be inferred that there is a need for software for accessing the system, and for technologies that are much easier for the user to operate than existing solutions such as screen readers.

IV. LITERATURE SURVEY - One study examined whether the internet provides opportunities for disabled people, including Hindi-language users, to carry out activities they were previously unable to do, or whether it instead results in greater social exclusion.
It further states that there is no known research determining the reasons people with disabilities cannot access systems and technologies fluently. Sinks and King, on the other hand, state that the primary accessibility barriers are economic, theoretical, and technical incapabilities. This thought is seconded by Kirsty Williamson, who states that bad HTML code and technologies hinder the visually impaired in accessing the PC or any digital tool. Although the World Wide Web Consortium publishes a list of guidelines for maintaining a high level of online accessibility for the visually impaired, studies report that only 50.4% of the problems encountered by users were covered by the Success Criteria of the Web Content Accessibility Guidelines 2.0 (WCAG 2.0), and 16.7% of websites and systems implemented techniques recommended in WCAG 2.0 yet the techniques did not solve the problems. For developing software to improve system accessibility for the visually impaired, Ferati mentions that a "one solution for all" model is insufficient if it does not consider the degree of visual defect when providing a customized system experience.

V. SYSTEM OVERVIEW AND DESIGN - The system comprises a modular client-server distributed architecture. It consists of the main menu, which runs first on startup of the software, and the website modules. The client communicates with the server using REST APIs, so the website modules are not local to the client. Throughout the system the user communicates with the software via a speech-to-text interface; the Google speech-to-text library (SpeechRecognition) for Python is used for this purpose.
For communicating the system's output to the user, as well as for confirming the user's input, the recognized input is played back to the user using the Python text-to-speech library. The modules are written in Python and make use of Selenium for automating the respective website and Beautiful Soup for scraping the contents of the web page. The "Script" component of each module consists of the customized code implementing the features of the website covered by that module. For instance, the Wikipedia module provides a Question-and-Answer feature and a Summary feature along with the conventional feature of reading out the whole article; the former is implemented by training a BERT model on the Stanford Question Answering Dataset (SQuAD). The APIs that hold the system together are written in Flask. The software is operating-system independent, to support hassle-free use.

Figure 1 is a representation of the system architecture of our software. The user accesses the software through the voice interface, where the speech-to-text (STT) module converts the voice input to text. The user is then presented with the main menu, where they have three options to choose from and decide which website they want to browse. Accordingly, the corresponding module is invoked with its speech-to-text module, web driver, and machine-learning module. The output is played to the user using the text-to-speech (TTS) module. This is the overview of the software.

Figure 1: System Architecture
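The confirmation playback described above, speaking the recognized input back to the user at a configurable pace, might look like the following minimal sketch using pyttsx3 (the text-to-speech library the paper names in its results). The clamp_rate helper and its bounds are illustrative assumptions, not the paper's code.

```python
# Sketch of the spoken-confirmation step. pyttsx3's setProperty("rate", ...)
# controls speaking pace in words per minute; the clamp bounds below are
# illustrative assumptions to keep playback audible.

def clamp_rate(words_per_minute, lo=80, hi=250):
    """Keep the user-chosen speech pace inside an audible range."""
    return max(lo, min(hi, words_per_minute))

def speak(text, rate=150):
    """Play the given text back to the user as speech."""
    import pyttsx3  # third-party text-to-speech engine
    engine = pyttsx3.init()
    engine.setProperty("rate", clamp_rate(rate))
    engine.say(text)
    engine.runAndWait()  # block until playback finishes

# Live use (requires audio output):
#   speak("You said: open Wikipedia. Is that correct?")
```

Echoing the recognized text this way is what lets the user catch recognition errors before a command is executed.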
Figure 2: BERT model on the SQuAD dataset architecture

Dataset - We have used the Stanford Question Answering Dataset (SQuAD) to pre-train the machine-learning model for the question-and-answer component of the module. The dataset has questions posed by people on Wikipedia articles, where the answer to each question is a span from the given excerpt of Wikipedia text, or the question may be unanswerable.
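The answer-extraction idea behind Figure 2 can be illustrated without the model itself: a SQuAD-style model scores every sub-token as a possible start and end of the answer, and the answer returned is the highest-scoring valid span of the excerpt. A minimal sketch with made-up scores; the best_span helper and the numbers are illustrative, not the paper's code.

```python
# Span selection as used in SQuAD-style question answering: given per-token
# start and end scores, pick the (start, end) pair with the highest combined
# score, subject to start <= end and a maximum answer length.

def best_span(start_scores, end_scores, max_len=15):
    """Return (i, j) maximizing start_scores[i] + end_scores[j] with i <= j."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best

tokens = ["the", "answer", "is", "forty", "two", "."]
start = [0.1, 0.2, 0.1, 2.5, 0.3, 0.0]  # start-head scores (made up)
end   = [0.0, 0.1, 0.2, 0.4, 2.8, 0.1]  # end-head scores (made up)
i, j = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # -> forty two
```

In the real model the two score vectors come from the two linear heads on top of BERT's sub-token outputs; the span search itself is exactly this loop.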
VI. METHODOLOGY - The user first interacts with the main menu of the software once the computer or laptop has been switched on. The main menu may be invoked either by an integrated voice assistant, for example Siri, or by a predefined keyboard shortcut, this being the only keyboard interaction required. The main-menu interface presents the available options to the user, viz. the installed website modules, the pace of the audio, and the accent of the audio. Each of the website modules contains a speech-to-text and text-to-speech bundle, a Python script that automates the website, and the features specific to that website. For efficient speech recognition, the user is given a beep at all stages, after which he or she is free to speak. The input received and recognized by the system is played back to the user so the user can confirm the intended input, cutting down errors right at that stage and thus enabling a form of editing. The methodology followed to implement the main menu and the three modules, Google, Gmail, and Wikipedia, is described below.

The main menu runs when the software is first opened. Using the pyttsx3 (Python text-to-speech) module, it reads out the initial set of instructions describing the choices provided to the user. The system takes the user's input after the beep using the Google speech-to-text Python module. The keywords are then extracted from the voice input and the appropriate response is executed. The user is additionally free to change the voice tempo and accent to whatever suits him or her best.
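The keyword-extraction and dispatch step just described can be sketched as a simple lookup over the recognized text. The keyword table and module names below are illustrative assumptions, not the paper's actual code.

```python
# Sketch of the main-menu keyword dispatch: scan the recognized utterance
# for a module keyword and return the module to invoke. An unrecognized
# command returns None, after which the menu would beep and re-prompt.

MODULES = {
    "google": "google_module",
    "gmail": "gmail_module",
    "wikipedia": "wikipedia_module",
}

def dispatch(recognized_text):
    """Return the module matching the recognized voice command, or None."""
    text = recognized_text.lower()
    for keyword, module in MODULES.items():
        if keyword in text:
            return module
    return None

print(dispatch("Open Wikipedia please"))  # -> wikipedia_module
```

A real implementation would likely also strip filler words and handle Hindi keywords, but the control flow is the same.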
Flow diagram for main menu

A. Main Menu - The user is presented with the main menu, where they have three options to choose from and decide which website they want to browse. Accordingly, the chosen module is invoked with its corresponding speech-to-text and text-to-speech interfaces, web driver, and machine-learning module.

B. Google Module - This module consists of a Python script that automates the website using Selenium and Beautiful Soup. The user can search for any query through the speech-to-text and text-to-speech interfaces; the recognized input is finalized once it is confirmed by the user, and the recognized query is then searched. The search results are indexed, which enables quick access to the web page of the user's choice and saves time, as opposed to reading out the full search results before the user picks one. At each stage the user is free to edit and undo any of his or her inputs.

The Wikipedia module presents the user with novel options, such as summarizing and reading out the article, and provides intelligent answers to queries using NLP and machine reading comprehension. Once the web page is loaded, the user enters the search query, followed by confirmation, after which the user is offered three options: reading out the whole article, reading out a summary of the article, or a question-and-answer session. The entire article is read by scraping the web page, cleaning the text, and using the text-to-speech module. Summarization of the text is performed using the summary method provided by the Wikipedia Python library.
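The "scrape, clean, then read aloud" step above can be sketched with the standard library. The paper itself uses Beautiful Soup for scraping; the stdlib HTMLParser version below is only a dependency-free stand-in for illustration, showing the cleaning stage that strips markup and script content before the text is spoken.

```python
# Sketch of cleaning scraped article HTML into speakable text.
# Stand-in for the Beautiful Soup cleaning the paper describes.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def clean_article(html):
    """Return the visible text of an HTML fragment as one string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

print(clean_article("<p>Hello <b>world</b></p><script>x=1</script>"))  # -> Hello world
```

The cleaned string is what would then be handed to the text-to-speech module for reading out.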
For the question-and-answer session, a BERT model trained on the Stanford Question Answering Dataset (SQuAD) is employed. The dataset consists of over 100,000 questions, including over 50,000 unanswerable ones. BERT is used for question answering on SQuAD by applying two linear transformations to the BERT output for each sub-token: the first and second transformations predict the probability that the current sub-token is the start or end position of an answer, respectively. The user can then ask any question relevant to the subject of the article searched for, and the model returns the most suitable answer through text-to-speech.

Flow diagram for Google

C. Gmail Module - This module consists of a Python script that starts up Gmail, logs the user into his or her mailbox, and provides support for sending and reading mails. To send a new mail, the system prompts the user to provide the relevant details and, after filtering out noise, fills in the input fields through Selenium; the mail is then sent with the user's confirmation. Mails are read by scraping the Gmail contents using the Beautiful Soup module of Python.

VII. RESULTS - The built-in Python modules for text-to-speech (pyttsx3) and speech-to-text (the speech-recognition library by Google) provide good accuracy as well as a straightforward and quick way to convert between speech and text. The speech-to-text module recognized words with 96.25% accuracy over four different voice samples, each containing 20 different inputs, in a moderately quiet environment. The BERT model on the SQuAD dataset for question answering in the Wikipedia module achieved an Exact Match accuracy of 80.88%, which is the proportion of predictions that match any one of the ground-truth answers exactly, and an F1 score of 88.49%. The software ran on each of the three chosen sites, Google, Gmail, and Wikipedia, separately. It could send an email effectively using the user's commands, provided an accurate answer to the question the user asked on Wikipedia, and summarized Wikipedia text accurately. Thus, we were able to build and test software that makes websites easily, quickly, and efficiently accessible for the visually impaired.

VIII. APPLICATION - The Virtual Assistant for the visually impaired is an excellent support for visually disabled people in accessing the web, on any browser, as our software is browser-independent.
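The two metrics reported in the results, Exact Match and token-overlap F1, follow the standard SQuAD evaluation and can be sketched directly. The helper names below are illustrative; the normalization here is simplified (lowercasing only) relative to the official SQuAD script.

```python
# Sketch of the two QA metrics the results report: Exact Match is the share
# of predictions equal to any ground-truth answer; F1 is the harmonic mean
# of token-level precision and recall against one ground truth.
from collections import Counter

def exact_match(prediction, truths):
    """True if the prediction equals any ground-truth answer (case-insensitive)."""
    return prediction.strip().lower() in [t.strip().lower() for t in truths]

def f1_score(prediction, truth):
    """Token-overlap F1 between a prediction and one ground-truth answer."""
    pred, gold = prediction.lower().split(), truth.lower().split()
    common = Counter(pred) & Counter(gold)   # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(exact_match("forty two", ["Forty two", "42"]))      # -> True
print(round(f1_score("the forty two", "forty two"), 2))   # -> 0.8
```

Averaging these per-question scores over the evaluation set yields figures of the kind quoted above (80.88% EM, 88.49% F1).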
Users can access the web using their speech and navigate websites using voice commands. The software reads out the content of the website to the user, making the website more accessible. This feature will not only help the visually impaired but also allow other people to access the web with ease and eliminate the use of hardware devices such as the keyboard. The Virtual Assistant also answers specific questions from a given text, so the user no longer has to read the entire text to find the answer; he or she simply speaks the question, and the software extracts the answer from the text itself using machine learning. The software also provides a machine-learning-based summary of the text, so the user does not need to read the entire article. The virtual assistant thus provides an easy way for the visually impaired to access any website. It eliminates the need to remember complex keyboard commands or to use screen readers. The assistant is not only a pleasant way to interact with websites but also an efficient one. The software works as a steppingstone towards Web 3.0, where everything will operate on voice commands. Thus, using machine learning and speech-to-text techniques, we make the task of accessing websites, which was earlier difficult, easy, quick, and efficient.
IX. CONCLUSION - In this paper we presented a modular solution to enhance web-based accessibility for the visually impaired. The virtual assistant is operating-system independent and does not rely on keyboard inputs from the user, to maximize ease of use, and it aims to provide a hassle-free experience. Through speech-to-text and text-to-speech interfaces, the user can communicate with and customize the system. We presented the system design and methodology of the three modules currently implemented. The Wikipedia module uses a BERT model on the SQuAD dataset to answer user queries quickly and accurately; the Exact Match was found to be 80.88%. We believe that virtual assistants for the visually impaired are the beginning of Web 3.0.

X. FUTURE ENHANCEMENT - At present the application supports only commands given in the English language. We plan to expand it and make it available in most widely used languages, so that people from all parts of the world can access the web without any issue. We would also like to build a general framework that can be plugged into any website as a browser extension, making it possible to toggle between the two modes easily, especially for educational websites, to enable visually impaired individuals to access online courses just like anyone else.

XI. REFERENCES

Pilling, D., Barrett, P. and Floyd, M. (2004). Disabled people and the Internet: experiences, barriers and opportunities. York, UK: Joseph Rowntree Foundation, unpublished.
Porter, P. (1997). 'The reading washing machine', Vine, Vol. 106, pp. 34-37.
JAWS - https://www.freedomscientific.com/products/software/jaws/, accessed April 2020.
Ferati, M., Vogel, B., Kurti, A., Raufi, B. and Astals, D. (2016). Web accessibility for visually impaired people: requirements and design issues. 9312, pp. 79-96. 10.1007/978-3-319-45916-5_6.
Power, C., Freire, A.P., Petrie, H. and Swallow, D. (2012). Guidelines are only half of the story: accessibility problems encountered by blind users on the web. In: CHI 2012, Austin, Texas, USA, 5-10 May 2012, pp. 1-10.
Sinks, S. and King, J. (1998). Adults with disabilities: perceived barriers that prevent Internet access. Paper presented at the CSUN 1998 Conference, Los Angeles, March. Retrieved January 24, 2000 from the World Wide Web.
Muller, M.J., Wharton, C., McIver, W.J. (Jr.) and Laux, L. (1997). Toward an HCI research and practice agenda based on human needs and social responsibility. Conference on Human Factors in Computing Systems, Atlanta, Georgia, 22-27 March.
Williamson, K., Wright, S., Schauder, D. and Bow, A. (2001). The internet for the blind and visually impaired. Journal of Computer-Mediated Communication, Vol. 7, Issue 1, 1 October 2001, JCMC712.
DeepPavlov documentation - http://docs.deeppavlov.ai/en/master/features/models/squad.html, accessed April 2020.
American Foundation for the Blind website - https://www.afb.org/about-afb/what-we-do/afb-consulting/afbaccessibility-resources/challenges-web-accessibility, accessed April 2020.
Zhou, R. Question answering models for SQuAD 2.0. Stanford University, unpublished.
World Health Organization (2010). Global data on visual impairments.