TECHNICAL SEMINAR REPORT

ON
VOICE BASED EMAIL SYSTEM FOR BLIND
Submitted in

Partial fulfilment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
BY
N. PAVANI
(15D21A05E8)

Department of Computer Science and Engineering


SRIDEVI WOMEN’S ENGINEERING COLLEGE
(Approved by AICTE and Affiliated to JNTU HYD)
V.N.PALLY, Gandipet, Hyderabad-75
2018-2019


CERTIFICATE
This is to certify that the TECHNICAL SEMINAR report entitled “VOICE
BASED EMAIL SYSTEM FOR BLIND”, submitted by Ms. NEMALIKONDA
PAVANI (15D21A05E8) in partial fulfilment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science and Engineering, is a record of bona fide work
carried out by her.

UNDER THE GUIDANCE OF: Mrs. D. MADHAVI, Asst. Professor

COORDINATOR: Mrs. E. Krishnaveni, Professor

HEAD OF THE DEPARTMENT: MS. M. Ramasubramanian, HOD

EXTERNAL EXAMINER
ACKNOWLEDGEMENT
First of all, I would like to express my deep gratitude to my internal guide, Mrs. D.
MADHAVI, ASSISTANT PROFESSOR in CSE, and to E. Krishnaveni, ASSISTANT
PROFESSOR in CSE, and DR.TKS. RATISHBABU, PROFESSOR in CSE, for their support
in the completion of my technical project.

I wish to express my sincere thanks to Dr. M. RAMASUBRAMANIAM,
PROFESSOR & HEAD OF DEPARTMENT OF CSE, and also to my principal, Dr. B.L.
MALLESWARI, for providing the facilities to complete the technical seminar.

I would like to thank all my faculty and friends for their guidance and constant
cooperation, and for extending all possible help to complete the task.

Finally, I am very much indebted to my parents for their moral support and
encouragement to achieve my goals.

N. PAVANI

(15D21A05E8)

TABLE OF CONTENTS

CHAPTER TOPIC

CERTIFICATE

ACKNOWLEDGEMENT

ABSTRACT

1. INTRODUCTION
1.1 LITERATURE REVIEW
1.2 EASE OF SCOPE
1.3 AIM AND OBJECTIVE
1.4 OVERALL DESCRIPTION
1.4.1 PROJECT PERSPECTIVE
1.4.2 SOFTWARE INTERFACE
1.4.3 HARDWARE INTERFACE
1.4.4 PROJECT FUNCTION
1.4.5 USER CHARACTERISTICS
1.4.6 CONSTRAINTS
2. ARCHITECTURE DESIGN
3. ADVANTAGES & DISADVANTAGES
3.1 ADVANTAGES
3.2 DISADVANTAGES
4. IMPLEMENTATION
5. CONCLUSION
5.1 FUTURE ENHANCEMENTS
6. REFERENCES

ABSTRACT
In today’s world, communication has become easy thanks to the integration of communication
technologies with the internet. However, visually challenged people find it very difficult to
use this technology because it requires visual perception. Even though many advancements
have been made to help them use computers efficiently, a novice user who is visually
challenged cannot use this technology as efficiently as a sighted novice can; unlike sighted
users, they need some practice with the available technologies. This paper aims at developing
an email system that will help even a novice visually impaired person to use email for
communication without previous training. The system does not require the keyboard; it works
only on mouse operations and speech-to-text conversion. The system can also be used by
sighted people, for example those who are not able to read. The system is completely based on
interactive voice response, which makes it user friendly and efficient to use.
CHAPTER-1

INTRODUCTION
The internet is considered a major storehouse of information in today’s world. Hardly any work
can be done without its help. It has even become one of the de facto methods of
communication, and of all the methods available, email is one of the most common, especially
in the business world. However, not all people can use the internet, because accessing it
requires knowing what is written on the screen. If the screen is not visible, it is of no use.
This makes the internet largely unusable for visually impaired and illiterate people. Even the
systems currently available, such as screen readers, TTS and ASR, do not let blind people use
the internet with full efficiency. As nearly 285 million people worldwide are estimated to be
visually impaired, it becomes necessary to make internet communication facilities usable for
them as well. Therefore we have come up with this project, in which we develop a voice based
email system that helps visually impaired people who are new to computer systems to use
email in a hassle-free manner. Users of this system do not need any knowledge of keyboard
shortcuts or of where the keys are located. All functions are based on simple mouse click
operations, making the system easy for any type of user. The user also need not remember
which mouse click to perform for a given service, because the system itself prompts them as
to which click provides which operation. The most common mail services that we use in our
day-to-day life cannot be used by visually challenged people, because they provide no
facility for the person in front of the screen to hear its content.

1.1 LITERATURE REVIEW


The main aim of our application is to help visually impaired people enjoy the benefits of
email and be self-sufficient in sending and receiving it independently. There is a working
module of the application that works on instructions given specifically in English. As future
scope, the Voice Based Email Application (VMAIL) can also be designed to work with other
languages.

The basic function of the application is to provide the user with a simple way to perform
email operations on his or her phone without compromising security. The application is
totally voice-based, allowing a blind person to send and receive emails on the go. It converts
the user's spoken voice into text and performs the action accordingly. It includes voice
confirmation, i.e., confirming whether the user actually spoke the recognized text, which
minimizes the errors involved.
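The voice confirmation step described above can be sketched as a small loop. This is a minimal illustration, not the report's implementation: the `listen` and `speak` callables stand in for a real speech recognizer and synthesizer, and the function names, the set of accepted "yes" words and the retry limit are all assumptions made for the example.

```python
from typing import Callable, Iterable, Optional

YES_WORDS = {"yes", "correct", "confirm"}  # hypothetical confirmation vocabulary


def confirm_spoken_input(
    listen: Callable[[], str],
    speak: Callable[[str], None],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for input, read it back, and accept it only after a spoken 'yes'."""
    for _ in range(max_attempts):
        speak("Please speak now.")
        heard = listen().strip()
        speak(f"You said: {heard}. Is that correct?")
        answer = listen().strip().lower()
        if answer in YES_WORDS:
            return heard
        speak("Let us try again.")
    return None  # nothing was confirmed within the allowed attempts


def scripted_listener(utterances: Iterable[str]) -> Callable[[], str]:
    """Stand-in for a speech recognizer that replays fixed utterances."""
    it = iter(utterances)
    return lambda: next(it)


prompts = []  # collects what the system would have spoken
result = confirm_spoken_input(
    scripted_listener(["compose male", "no", "compose mail", "yes"]),
    prompts.append,
)
print(result)  # compose mail
```

Passing the recognizer and synthesizer in as functions keeps the confirmation logic testable without a microphone; a real engine would be plugged in behind the same two callables.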

Components:

1) Authentication: Since users tend to forget their passwords or use weak passwords that
allow an adversary to break into their email accounts, the application makes use of
fingerprints. The Secure Hash Algorithm (SHA) is used to hash the password, and the hash
value is stored in the database instead of the password itself to enhance security. SQLite is
a software library that implements a self-contained, serverless, zero-configuration,
transactional SQL database engine. The Java Mail Application Programming Interface (API)
provides a platform-independent and protocol-independent framework to build mail and
messaging applications. It provides a set of abstract classes defining the objects that
comprise a mail system, and is an optional package (standard extension) for reading,
composing and sending electronic messages. Simple Mail Transfer Protocol (SMTP) is used
when email is delivered from an email client to an email server or from one email server to
another. Post Office Protocol (POP) allows a client to download email from a mail server.
Internet Message Access Protocol (IMAP) is an Internet standard protocol used by email
clients to retrieve email messages from a mail server over a TCP/IP connection. IMAP is
defined by RFC 3501.

2) Navigation: Here, the user speaks certain keywords that trigger the corresponding
actions. The keywords include: Compose, Received Mails, Sent Mails, Go Back.

3) Speech to text (STT): Here whatever the user speaks is converted to text. There will be a
small microphone icon; after clicking it the user speaks, and the speech is converted to
text, which sighted people can see and read.

4) Text to speech: This is the opposite of STT. It converts the text of the emails to
synthesized speech.
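The authentication component above stores a hash instead of the password. The sketch below illustrates that idea with SHA-256 and an in-memory SQLite database. The table layout, the function names and the per-user salt are assumptions added for the example and are not taken from the report; a deployed system would normally prefer a deliberately slow key-derivation function such as `hashlib.pbkdf2_hmac` over a single SHA-256 pass.

```python
import hashlib
import hmac
import os
import sqlite3


def hash_password(password: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + password."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def register(db: sqlite3.Connection, username: str, password: str) -> None:
    """Store a random salt and the password hash, never the password itself."""
    salt = os.urandom(16)  # the salt is an addition for this sketch
    db.execute(
        "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
        (username, salt, hash_password(password, salt)),
    )


def verify(db: sqlite3.Connection, username: str, password: str) -> bool:
    """Recompute the hash for the supplied password and compare it safely."""
    row = db.execute(
        "SELECT salt, pw_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(stored, hash_password(password, salt))


# Hypothetical schema; the report does not list its table definitions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB, pw_hash TEXT)")
register(db, "user1", "open sesame")
print(verify(db, "user1", "open sesame"))  # True
print(verify(db, "user1", "wrong guess"))  # False
```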

1.2 EASE OF SCOPE

For people who can see, emailing is not a big deal, but for people without vision it poses a
key concern because of its intersection with many vocational responsibilities. This voice
based email system is well suited to blind people because it lets them understand where they
are on the screen. For example, whenever the cursor moves to an icon on the website, say
Register, the system speaks “Register Button”. Many screen readers are available, but with
them people have to remember mouse clicks. This system reduces that problem, because the
position of the mouse pointer is read out. The system focuses on user friendliness for all
types of users, including sighted users, visually impaired people and illiterate people. It
lets disabled people feel like any other user. They can hear the mails recently received in
the Inbox, and the IVR technology proves very effective for them in terms of guidance.

1.3 AIM AND OBJECTIVE

The project aims to develop a voice based email system that helps blind people access email
in a hassle-free manner with the help of a smart watch. The system does not require the
keyboard; instead it works on speech recognition. Today much communication takes place
through the internet. To let visually challenged people benefit from the internet, we propose
a voice based email system on a smart watch. The smart watch recognizes speech and converts
it into text, which makes it user friendly for them. It is connected to the internet via
Bluetooth, a Wi-Fi hotspot or a standalone internet connection so that the email can be sent
to the receiver. An Arduino smart watch processor will be used to provide access to
Bluetooth, Wi-Fi and battery status.

1.4. OVERALL DESCRIPTION


1.4.1 Project perspective

To provide a user friendly system to all visually impaired people; to help them move forward
in the challenging world of the internet; and to give them a facility to use these
technologies, through which they have a chance to overcome their visual disability.

1.4.2 Software Interface

Front End: JSP, Java.

Back End: SQL.

The proposed system has 4 stages of implementation, namely:

1. System and mailing server

2. Traditional mailing system

3. Voice based command detection

4. Voice based mailing system


1.4.3 Hardware Interface

1. Pentium core processor.

2. 512 MB RAM.

3. Microphone.

1.4.4 Project Function

This voice mail system is being developed to help visually impaired people feel like any
other user. Voice interaction removes the physical limitations of the keypad and helps users
access mail easily. The system can be used by both sighted and visually impaired persons.
The proposed system is a desktop application that allows sending and receiving of mail via
the internet. We use artificial intelligence so that blind people can benefit from advanced
technology for their growth and improvement; the use of artificial intelligence is also
intended to make the application cost-effective and easy to maintain.

1.4.5 User Characteristics

• Automated mailing system instead of keypad and mouse input.
• User friendly.
• Easy to access.
• Supports the user at any time.

1.4.6 Constraints

The information of all the users must be stored in a database that is accessible by the
Administrator. Voice Mail system facility is available to all the users 24 hours a day.

User can access their account from any computer and can send or retrieve messages
previously stored.

CHAPTER-2
ARCHITECTURE DESIGN
The design of this project is divided into three phases as described below:

A. User Interface Design: The user interface is designed using Java in Eclipse (HTML, CSS and
JavaScript). The website focuses more on how well the interactive voice response (IVR)
prompts can be understood than on the look and feel of the system, because the system is
primarily developed for blind people, for whom the look and feel is less important than
understanding the prompts.
B. Database Design: Our system maintains a database for user validation and for storing the
user's mails. The database stores user information such as username, password and mails.
When the user requests any information, it is retrieved from the database. There are five
tables in total. The relationships between them were assigned after much consideration. The
implementation part is shown in Fig. 1.

C. System Design: Fig. 2 depicts the complete system design. It is the level-2 data flow
diagram, which gives the complete detailed flow of events in the system. As it shows,
operations are performed mainly by mouse click events, with voice input required at some
places.

FIG NO -2 ARCHITECTURE

1. STT: Accepts speech from the user and produces text.
2. Language Understanding Component: Extracts semantics from a text string by using a
pre-specified grammar.
3. Context Interpreter: Enhances the semantics from the Language Understanding
Component by obtaining context information from the dialog history. For example, the
Context Interpreter may replace a pronoun with the noun to which the pronoun refers.
4. Dialog Manager: Prompts the user for input, makes sense of the input, and determines
what to do next according to the instructions in the specified dialog script.
5. Language Generator: Accepts text from the Dialog Manager and presents it to the user
as spoken voice via the text-to-speech synthesizer (TTS).
6. Text-to-speech Synthesizer (TTS): Accepts text from the Language Generator and
produces acoustic signals which the user hears as a human-like voice.
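The six components listed above form a pipeline, which the following sketch walks through for one dialog turn. It is only an illustration under stated assumptions: the STT and TTS stages are stubbed out (input is already text, output is the sentence that would be spoken), and the grammar patterns, intent names and response wording are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical pre-specified grammar: intent name -> pattern.
GRAMMAR = {
    "open": re.compile(r"^open (?P<target>inbox|sent mail|trash|it)$"),
    "compose": re.compile(r"^compose$"),
}


def understand(text: str) -> Optional[Dict[str, str]]:
    """Language understanding: extract an intent and its slots from text."""
    for intent, pattern in GRAMMAR.items():
        match = pattern.match(text.lower().strip())
        if match:
            return {"intent": intent, **match.groupdict()}
    return None


def interpret_context(frame, history: List[Dict[str, str]]):
    """Context interpreter: replace the pronoun 'it' with the last named target."""
    if frame and frame.get("target") == "it":
        for past in reversed(history):
            if past.get("target"):
                return {**frame, "target": past["target"]}
    return frame


def dialog_manager(frame) -> str:
    """Decide the next action from the interpreted input."""
    if frame is None or frame.get("target") == "it":
        return "unknown"  # not understood, or a pronoun with nothing to refer to
    if frame["intent"] == "open":
        return "opening:" + frame["target"]
    return "composing"


def generate(action: str) -> str:
    """Language generator: text that would be handed to the TTS stage."""
    if action.startswith("opening:"):
        return "Opening " + action.split(":", 1)[1] + "."
    if action == "composing":
        return "Starting a new mail."
    return "Sorry, I did not understand. Please repeat."


def respond(text: str, history: List[Dict[str, str]]) -> str:
    """Run one turn: understanding, context, dialog manager, generator."""
    frame = interpret_context(understand(text), history)
    action = dialog_manager(frame)
    if action != "unknown":
        history.append(frame)
    return generate(action)


history: List[Dict[str, str]] = []
print(respond("open inbox", history))  # Opening inbox.
print(respond("open it", history))     # Opening inbox.
print(respond("dance", history))       # Sorry, I did not understand. Please repeat.
```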

METHODOLOGY
• The Software Development Life Cycle includes models such as the Waterfall Model,
Prototype Model and Object-oriented Model for developing correct software.
The Waterfall Model is the earliest method of structured system development.

Fig. 2: Waterfall Model


The software model used by our system is the waterfall model, a systematic and
sequential approach to software development. It includes system engineering and
modelling, which establishes requirements for all the system elements and allocates
some subset of these requirements to software. System engineering and analysis
encompass requirement gathering at the system level with a small amount of
top-level design.
A. Problem definition:
This is the very first stage in developing any project. It defines the aim and the
concept of the project. The aim of the “Voice based email system” is to provide
effective access to people with reading disabilities and visual impairments.
B. Analysis:
Existing systems such as screen readers and ASR are analyzed. Care is taken to
ensure that the drawbacks of the existing browsers are overcome.
C. Design and Coding:
The logical flow of the software is worked out and then coded.
D. Testing:
Testing covers the working of each individual module and of the system after
integration.
E. Maintenance:
The system is maintained to check that it accepts all speech commands as
expected and that dynamic additions to the grammar do not cause any problems.

CHAPTER-3
ADVANTAGES AND DISADVANTAGES

3.1 ADVANTAGES
• Messages can be created in the user’s voice mailbox and then transported to
another voice mailbox. Voice messaging is a viable alternative to e-mail and
fax systems as a business communication tool, and a voice-messaging system
improves a company's public relations.
• Voice-messaging systems include many services, such as voice messages,
voice-mail distribution lists, fax-in and fax-on-demand in the mailbox,
interactive voice response, and voice forms that any user can access from
anywhere in the world.

FIG NO-3.1 Voicemail


o Voice mail provides twenty-four-hour-a-day answering capability. It can
enhance efficiency and boost job productivity, save and generate money for
the company, improve the accuracy of message content, and enable one to
send a message to multiple people.
o Voice mail allows messages to be easily updated. It can reduce the need for
administrative, receptionist or secretarial support, serve as an important
medium for business communication, and make transferring phone calls from
department to department easier and more efficient.
o You no longer miss calls, because people can leave messages on your voice
mail. You can listen to your messages later, which helps you remember your
schedule and keeps you in the loop.
o The complete system is based on IVR (interactive voice response). When using
this system, the computer prompts the user with the operation to perform for
each service, and the user performs that operation to access the service.
o One of the major advantages of this system is that the user does not need to
use the keyboard. All operations are based on mouse click events. The
question that arises is how blind users will find the location of the mouse
pointer; as described in Section 1.2, the system reads out the element under
the pointer.
o Able to read large paragraphs.
o It offers a range of different accents and voices.
o Provides significant help for people with visual disabilities.
o It can be adapted easily to say whatever users want it to say.
o Supports interaction in applications where the user’s eyes and hands are busy.
o Your callers will be able to get in touch with you by leaving a voice mail
message. Instead of calling until they get hold of you, they can leave the
message, their name and their phone number.
o A well-implemented voice mail system can benefit both the customer and the
business. Customers can leave a message at any time, without waiting on hold
or navigating the system.
3.2 DISADVANTAGES
• Some people cannot use voice-messaging systems. A voice-messaging system is
less economical for smaller companies. Some people do not see any benefit in
having a voice mailing system in place and consider it a nuisance.
• Some people do not like that they cannot reach a live person, and too many
voice-messaging options can make it difficult to recall which options were
used previously.
• If you miss a lot of calls, you will be flooded with voicemail messages.
Listening to voice mail is tiring and time consuming; that time could be
spent on more important and urgent tasks.
• You may get tired of listening to the messages and end up deleting them
without listening, which causes you to miss important messages.
• Message recording systems can fall prey to hackers who phish passwords
through spam email or social engineering. They can then access the messages
and take personally identifiable or proprietary business information.
• Outdoor communication is becoming harder for blind and visually impaired
people in the complex urban world.
• Advances in technology are causing the blind to fall behind, sometimes even
putting their lives at risk.
• Although many screen readers are available, blind people still face some
difficulties.
• A user who is new to computers cannot use such services, as they are not
aware of the key locations.
• Many of today’s mail services are of no use to visually impaired people
because they do not provide audio feedback.
• Visually impaired people had to remember and recognize the characters of the
keyboard, which was very difficult for them. The problem was later addressed
with Braille keyboards, but these keyboards were very costly.
• Visual layout: a screen reader cannot survey the entirety of a screen as a
sighted user can. A sighted user can quickly scan a webpage and see how it
is organized.

CHAPTER-4

IMPLEMENTATION

This system is currently being developed by us. The following modules have already been
developed. Their working is as follows:

A. Registration:
This is the first module of the system. Any user who wishes to use the system must first
register to obtain a username and password. This module collects the complete information of
the user by prompting for the details that need to be entered. The user speaks the details,
which the system then confirms by reading them back letter by letter. If the information is
not correct, the user can re-enter it; otherwise the prompt specifies the operation to
perform to confirm.

B. Login: Once registration is done, the user can log in to the system. This module asks the
user to provide the username and password, which are accepted as speech. The speech is
converted to text, and the user is asked to validate whether the details were entered
correctly. Once the entry is correct, the database is checked for a matching entry. If the
user is authorized, he/she is directed to the home page.
C. Forgot Password: If an authorized user forgets the password and is thus unable to log in,
he/she can select the forgot password module. In this module the user is first asked to enter
the username. The security question for that username, provided at the time of registration,
is looked up in the database and spoken out by the computer. The user then gives the answer
that he/she provided during registration. If the two match, the user is given the option to
change the password.
D. Home Page: The user is redirected to this page once login is successful. From this page
the user can perform the operations he/she wishes. The options available are: 1. Inbox,
2. Compose, 3. Sent mail, 4. Trash. Prompting provides the mouse click operation that needs
to be performed for the required service. The double right click event is specifically
reserved for logging out of the system at any time. This is specified by the prompt right at
the beginning, after login.
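The home page behaviour described in module D can be sketched as a lookup from click events to options. Only the double right click for logout comes from the text; the assignment of the other click events to Inbox, Compose, Sent mail and Trash is a hypothetical choice made for this example, as are the function names.

```python
from typing import Dict, Tuple

Click = Tuple[str, int]  # (mouse button, number of clicks)

LOGOUT: Click = ("right", 2)  # from the text: double right click logs out

# Hypothetical assignment of the remaining click events to the four options.
HOME_MENU: Dict[Click, str] = {
    ("left", 1): "Inbox",
    ("left", 2): "Compose",
    ("right", 1): "Sent mail",
    ("middle", 1): "Trash",
}


def describe(click: Click) -> str:
    """Turn a click event into words for the spoken prompt."""
    button, count = click
    return ("single" if count == 1 else "double") + f" {button} click"


def home_prompt() -> str:
    """Build the prompt that is read out right after login."""
    parts = [f"{describe(click)} for {name}" for click, name in HOME_MENU.items()]
    parts.append(f"{describe(LOGOUT)} to log out")
    return "Use " + ", ".join(parts) + "."


def handle_click(click: Click) -> str:
    """Map a click to the selected option; unknown clicks repeat the prompt."""
    if click == LOGOUT:
        return "Logout"
    return HOME_MENU.get(click, "Repeat prompt")


print(home_prompt())
print(handle_click(("left", 2)))   # Compose
print(handle_click(("right", 2)))  # Logout
```

Keeping the mapping in one table means the spoken prompt and the click handling cannot drift apart, which matters when the prompt is the user's only description of the interface.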
All these functionalities have been implemented. The modules given below are to be included
in the system and will be implemented as part of the proposed system. The complete
walkthrough is as follows:

E. Compose mail: This is one of the most important options provided by mail services. The
compose mail option does not work as it does in existing mail systems. Since the system is
for visually challenged people and keyboard operations are completely avoided, composing
mail is done only with voice input and mouse operations. No typed input is required. The
user can directly record the message to be sent and send it. This voice message goes as an
attachment, and the receiver can hear the recording to get the message. The user does not
need to attach the file manually: a record option is provided in the compose window itself.
Once the message is recorded, the system lets the user hear it to confirm that the recording
is satisfactory, and if the user confirms, it is automatically attached to the mail.

FIG NO -4.1. COMPOSE MAIL


F. Inbox: This option lets the user go through all the mails that have been received in
his/her account. The user can listen to any mail by performing the click operation specified
by the prompt. To navigate through different mails, the prompt specifies which operations to
perform. Each time a mail is selected, the user is told who the sender is and what the
subject of that mail is. The user can then decide whether the mail should be read or
deleted.

FIG NO -4.1. INBOX

G. Sent mail: This option keeps track of all the mails sent by the user and lets the user
access them. To navigate between sent mails, the user performs the actions given by the
prompt. When the control lands on a particular mail, the user is told who the receiver was
and what the subject of the mail is. This helps the user find the required mail efficiently.
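Modules E to G above can be illustrated with Python's standard `email` package: the recorded voice message travels as an audio attachment, and the sender or receiver and the subject are turned into the sentence read out when a mail is selected. This is a sketch under assumptions, not the report's Java implementation; the addresses, the file name, the WAV format and the function names are hypothetical.

```python
from email.message import EmailMessage


def build_voice_mail(sender: str, recipient: str, subject: str,
                     recording: bytes) -> EmailMessage:
    """Create a mail whose content is a recorded voice message (module E)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("This mail carries a recorded voice message as an attachment.")
    # The recording is attached automatically; the user never picks a file.
    msg.add_attachment(recording, maintype="audio", subtype="wav",
                       filename="message.wav")
    return msg


def announce(msg: EmailMessage, folder: str) -> str:
    """Sentence to read out when a mail is selected (modules F and G)."""
    who = f"From {msg['From']}" if folder == "inbox" else f"To {msg['To']}"
    return f"{who}. Subject: {msg['Subject']}."


# Placeholder bytes stand in for real audio captured from the microphone.
mail = build_voice_mail("alice@example.com", "bob@example.com",
                        "Meeting", b"placeholder-audio-bytes")
print(announce(mail, "inbox"))  # From alice@example.com. Subject: Meeting.
print(announce(mail, "sent"))   # To bob@example.com. Subject: Meeting.
```

A message built this way could be handed to `smtplib.SMTP(...).send_message(mail)` for delivery; the server address and credentials are deployment details the report does not specify.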

4.1 IMPLEMENTATION PROCESS

This project is designed using a set of APIs. SMTP (Simple Mail Transfer Protocol) is used
for the mailing service. Voice Typing and Dictation Speech Interaction Models are designed
using the Windows 7 LVCSR dictation engine.
In order to control speech accuracy, we turned off the (default) MLLR acoustic adaptation.
Error Correction Methods are implemented using the Windows 7 APIs and Windows
Presentation Foundation (WPF).

FIG NO -4.1 IMPLEMENTATION PROCESS

4.2: Interactive Voice Response (IVR)

Interactive voice response (IVR) is a technology that allows a computer to interact
with humans through the use of voice and DTMF tones input via a keypad. In
telecommunications, IVR allows customers to interact with a company’s host system
via a telephone keypad or by speech recognition, after which services can be inquired
about through the IVR dialogue. IVR systems can respond with pre-recorded or
dynamically generated audio to further direct users on how to proceed. IVR systems
deployed in the network are sized to handle large call volumes and also used for
outbound calling, as IVR systems are more intelligent than many predictive dialer
systems.

IVR systems can be used for mobile purchases, banking payments and services, retail
orders, utilities, travel information and weather conditions. A common misconception
refers to an automated attendant as an IVR. The terms are distinct and mean different
things to traditional telecommunications professionals—the purpose of an IVR is to
take input, process it, and return a result, whereas that of an automated attendant is to
route calls. The term voice response unit (VRU) is sometimes used as well. DTMF
decoding and speech recognition are used to interpret the caller's response to voice
prompts. DTMF tones are entered via the telephone keypad.
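DTMF decoding, mentioned above, can be illustrated briefly. Each keypad key is signalled as the sum of one low-group and one high-group tone, and a decoder finds the strongest frequency in each group. The sketch below generates such a tone and decodes it with the Goertzel algorithm, a standard single-frequency detector. The frequency table is the standard DTMF assignment; the sample rate, tone length and function names are choices made for this example, and the sketch ignores noise and silence detection.

```python
import math

ROWS = (697, 770, 852, 941)        # low-group frequencies in Hz
COLS = (1209, 1336, 1477, 1633)    # high-group frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")
RATE = 8000                        # samples per second, a common telephony rate


def tone(key: str, duration: float = 0.05):
    """Return samples of the two-frequency tone for one keypad key."""
    for r, row in enumerate(KEYS):
        if len(key) == 1 and key in row:
            low, high = ROWS[r], COLS[row.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    n = int(RATE * duration)
    return [math.sin(2 * math.pi * low * t / RATE)
            + math.sin(2 * math.pi * high * t / RATE) for t in range(n)]


def goertzel_power(samples, freq: float) -> float:
    """Signal power near one frequency, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode(samples) -> str:
    """Pick the strongest row and column frequency and look up the key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, COLS[i]))
    return KEYS[row][col]


print(decode(tone("5")))  # 5
```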

Other technologies include using text-to-speech (TTS) to speak complex and dynamic
information, such as e-mails, news reports or weather information. IVR technology is
also being introduced into automobile systems for hands-free operation. TTS is
computer-generated synthesized speech that is no longer the robotic voice
traditionally associated with computers. Real voices create the speech in fragments
that are spliced together (concatenated) and smoothed before being played to the
caller.

4.3 Speech Recognition

Speech recognition is the inter-disciplinary sub-field of computational linguistics that
develops methodologies and technologies that enable the recognition and translation
of spoken language into text by computers. It is also known as "automatic speech
recognition" (ASR), "computer speech recognition", or just "speech to text" (STT). It
incorporates knowledge and research in the linguistics, computer science, and
electrical engineering fields. Some speech recognition systems require "training" (also
called "enrollment") where an individual speaker reads text or isolated vocabulary
into the system.
The system analyzes the person's specific voice and uses it to fine-tune the
recognition of that person's speech, resulting in increased accuracy. Systems that do
not use training are called "speaker independent" systems. Systems that use training
are called "speaker dependent".
Speech recognition applications include voice user interfaces such as voice dialing
(e.g. "Call home"), call routing (e.g. "I would like to make a collect call"), domotic
appliance control, search (e.g. find a podcast where particular words were spoken),
simple data entry (e.g., entering a credit card number), preparation of structured
documents (e.g. a radiology report), speech-to-text processing (e.g., word processors
or emails), and aircraft (usually termed Direct Voice Input).
The term voice recognition or speaker identification refers to identifying the speaker,
rather than what they are saying. Recognizing the speaker can simplify the task of
translating speech in systems that have been trained on a specific person's voice or it
can be used to authenticate or verify the identity of a speaker as part of a security
process.
From the technology perspective, speech recognition has a long history with several
waves of major innovations. Most recently, the field has benefited from advances in
deep learning and big data. The advances are evidenced not only by the surge of
academic papers published in the field, but more importantly by the worldwide
industry adoption of a variety of deep learning methods in designing and deploying
speech recognition systems. Speech recognition works by applying algorithms for
acoustic and language modeling.
Acoustic modeling represents the relationship between linguistic units of speech and
audio signals; language modeling matches sounds with word sequences to help
distinguish between words that sound similar. Often, hidden Markov models are used
as well to recognize temporal patterns in speech to improve accuracy within the
system. The most frequent applications of speech recognition within the enterprise
include call routing, speech-to-text processing, voice dialing and voice search.
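The division of labour between the acoustic model and the language model can be shown with a toy example. Two words that sound alike get the same acoustic score, and the language model breaks the tie from the preceding word. All probabilities below are invented for illustration, and the sketch leaves out the hidden Markov model that a real recognizer would use to align audio with words.

```python
import math

# Hypothetical acoustic scores P(audio | word): the two words sound alike.
ACOUSTIC = {"mail": 0.5, "male": 0.5}

# Hypothetical bigram language model P(word | previous word).
BIGRAM = {("send", "mail"): 0.2, ("send", "male"): 0.001}

UNSEEN = 1e-6  # small floor probability for word pairs missing from the table


def best_word(previous: str, candidates=ACOUSTIC) -> str:
    """Pick the candidate with the highest combined log score."""
    def score(word: str) -> float:
        language = BIGRAM.get((previous, word), UNSEEN)
        return math.log(candidates[word]) + math.log(language)
    return max(candidates, key=score)


print(best_word("send"))  # mail
```

Adding log probabilities is equivalent to multiplying the two model scores, which is how the acoustic and language evidence are usually combined.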
While convenient, speech recognition technology still has a few issues to work
through as it continues to be developed. Its advantages are that it is easy to use
and readily available: speech recognition software is now frequently installed on
computers and mobile devices, allowing for easy access. Its downsides include
difficulty capturing words because of variations in pronunciation, limited support
for many languages other than English, and difficulty separating speech from
background noise. These factors can lead to inaccuracies.
Speech recognition performance is measured by accuracy and speed. Accuracy is
measured with the word error rate (WER), which works at the word level and identifies
inaccuracies in transcription, although it cannot identify how an error occurred.
Speed is measured with the real-time factor. A variety of factors can affect computer
speech recognition performance, including pronunciation, accent, pitch, volume and
background noise.
It is important to note that the terms speech recognition and voice recognition are
sometimes used interchangeably, although they mean different things. Speech
recognition identifies the words in spoken language. Voice recognition is a
biometric technology that identifies a particular individual's voice, that is,
speaker identification.
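The two performance measures mentioned above can be sketched in a few lines. The word error rate is commonly computed as the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The real-time factor is the processing time divided by the duration of the audio. The function names and sample sentences below are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF below 1.0 means the recognizer runs faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# One substituted word out of four reference words.
print(word_error_rate("send mail to john", "send male to john"))  # 0.25
# Two seconds of processing for ten seconds of audio.
print(real_time_factor(2.0, 10.0))  # 0.2
```

As the text notes, the resulting number reports how many words were wrong but not why they were wrong.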

4.4 Speech to Text Converter

The process of converting spoken speech or audio into text is called speech to text
conversion. It is usually called speech recognition, although that term is also
used for the broader operation of deriving content from speech, which is known as
speech understanding. The terms voice recognition and speaker recognition are
associated with identifying a person from their voice, so it is wrong to use them
for speech to text conversion. As shown in the above block diagram, speech to text
converters depend mostly on two models: 1. an acoustic model and 2. a language
model. Systems generally use a pronunciation model as well. It is important to
understand that there is no such thing as a universal speech recognizer.
To get the best quality of transcription, the above models can be specialized for
the given language and communication channel. Like any other pattern recognition
technology, speech recognition cannot be free of error. The accuracy of a speech
transcript depends heavily on the voice of the speaker, the characteristics of the
speech and the environmental conditions. Speech recognition is a harder task than
people commonly assume, even for a human being. Humans are made for understanding
speech, not for transcribing it, and only speech that is well formed can be
transcribed unambiguously. From the user's point of view, a speech to text system
can be categorized based on its use.

4.5 Speech Synthesis (TTS)

Speech synthesis is the artificial production of speech. A computer system used for
this purpose is called a speech synthesizer, and may be implemented in software or
hardware products. A text-to-speech (TTS) system converts language text into
speech; other systems render symbolic linguistic representations into speech.
Synthesized speech can be created by concatenating pieces of
recorded speech that are stored in a database. Systems differ in the size of the stored
speech units; a system that stores phones or diphones provides the largest output
range, but may lack clarity. For specific usage domains, the storage of entire words or
sentences allows for high-quality output. Alternatively, a synthesizer can incorporate
a model of the vocal tract and other human voice characteristics to create a completely
"synthetic" voice output.
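The concatenative approach described above can be sketched as a lookup in a unit database followed by joining the stored samples. In this minimal sketch the "recordings" are plain sine tones standing in for real recorded speech, the units are whole words, and the sample rate, database contents and function names are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def placeholder_unit(frequency_hz: float, duration_ms: int) -> list[float]:
    """Stand-in for a recorded speech unit: a plain sine tone."""
    sample_count = SAMPLE_RATE * duration_ms // 1000
    return [
        math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
        for n in range(sample_count)
    ]


# Hypothetical unit database keyed by whole words. A real system would store
# recorded phones, diphones, words or sentences instead of tones.
UNIT_DATABASE = {
    "you": placeholder_unit(220.0, 100),
    "have": placeholder_unit(330.0, 100),
    "mail": placeholder_unit(440.0, 100),
}


def synthesize(text: str, pause_ms: int = 50) -> list[float]:
    """Concatenate the stored unit for each word, with silence in between."""
    pause = [0.0] * (SAMPLE_RATE * pause_ms // 1000)
    samples: list[float] = []
    for word in text.lower().split():
        if word not in UNIT_DATABASE:
            raise KeyError(f"no stored unit for {word!r}")
        if samples:
            samples.extend(pause)
        samples.extend(UNIT_DATABASE[word])
    return samples


audio = synthesize("you have mail")
print(len(audio) / SAMPLE_RATE, "seconds of audio")
```

Storing whole words keeps the lookup trivial but limits the output to the stored vocabulary, which is the trade-off between unit size and output range noted in the text.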

The quality of a speech synthesizer is judged by its similarity to the human voice and
by its ability to be understood clearly. An intelligible text to speech program allows
people with visual impairments or reading disabilities to listen to written words
on a computing device. Many computer operating systems have included speech
synthesizers since the early 1990s.
A text to speech system consists of two parts: a front-end and a back-end. The
front-end has two major tasks. First, it converts unprocessed text containing symbols
like numbers and abbreviations into the equivalent of written-out words. This process
is commonly known as text normalization or pre-processing. The front-end then assigns
phonetic transcriptions to every word, and divides and marks the text into prosodic
units, like phrases, clauses, and sentences. The process of assigning phonetic
transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion.
Phonetic transcriptions and prosody information together make up the symbolic
linguistic representation that is output by the front-end. The back-end, often
referred to as the synthesizer, then converts the symbolic linguistic representation
into sound. In certain systems, this part includes the computation of the target
prosody (pitch contour, phoneme durations), which is then imposed on the output speech.
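The first front-end task, turning numbers and abbreviations into written-out words, can be sketched as follows. This is a minimal illustration under stated assumptions: the abbreviation table is hypothetical, only whole numbers from 0 to 999 are expanded, and tokens with attached punctuation (for example "25,") are left unchanged.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical abbreviation table; a real front-end would use a much larger
# list and would resolve ambiguous cases from context.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "no.": "number"}


def number_to_words(n: int) -> str:
    """Spell out a whole number between 0 and 999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0 to 999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Replace known abbreviations and short numbers with spoken words."""
    words = []
    for token in text.split():
        if token.lower() in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token.lower()])
        elif re.fullmatch(r"\d{1,3}", token):
            words.append(number_to_words(int(token)))
        else:
            words.append(token)
    return " ".join(words)


print(normalize("Dr. Rao has 25 new mails"))
# doctor Rao has twenty five new mails
```

The normalized text would then be passed to the grapheme-to-phoneme step, which is not shown here.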
FIG NO – 4.6. TEXT-TO-SPEECH
Text-to-speech (TTS) is a type of speech synthesis application that is used to create a
spoken sound version of the text in a computer document, such as a help file or a Web
page. TTS can enable the reading of computer display information for the visually
challenged person, or may simply be used to augment the reading of a text message.
Current TTS applications include voice-enabled e-mail and spoken prompts in voice
response systems. TTS is often used with speech recognition programs. There are
numerous TTS products available, including Read Please 2000, Proverb Speech Unit,
and Next Up Technology's Text Aloud. Lucent, Elan, and AT&T each have products
called “Text-to-Speech”.

In addition to TTS software, a number of vendors offer products involving hardware,
including the Quick Link Pen from WizCom Technologies, a pen-shaped device that
can scan and read words; the Road Runner from Ostrich Software, a handheld device
that reads ASCII text; and DecTalk TTS from Digital Equipment, an external
hardware device that substitutes for a sound card and which includes an internal
software device that works in conjunction with the PC's own sound card.
CHAPTER-5
CONCLUSION

A voice based email system helps visually challenged people to access email services
efficiently. It has been observed that nearly 60% of the total blind population across the
world is in India. This system overcomes difficulties faced by visually impaired people as
well as illiterate people. It reduces the drawbacks of existing systems, such as the software
load of using screen readers and an Automatic Speech Recognizer (ASR). The system guides
the user through prompts that state what needs to be done to obtain the desired result.
This relieves the user of remembering keyboard shortcuts and the location of keys.
The user only needs to follow the instructions given by the system.

5.1 FUTURE ENHANCEMENTS

The system we are developing will work only on desktops. As the use of mobile phones is
increasing day by day, there is a need to offer this facility as a mobile application as
well. Security features can also be implemented in the login phase to make the system
more secure.
