Bhavika Voice XML

The document discusses Voice XML, which is a standard markup language for building voice user interfaces. It describes how Voice XML works and is defined, provides an overview of its architecture and concepts, and discusses some applications of Voice XML.


VOICE XML

A Seminar Report
Seminar Report submitted in partial fulfillment of the requirements for the award of
the degree of B.Tech. in Computer Science & Engineering under
Bikaner Technical University
by

Bhavika
University Roll No.: 20EMCCS026

Under the Guidance of


Manoj Kumar Mishra
(Asst. Professor, Dept. of CSE )

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


MITRC, ALWAR
2023
CERTIFICATE

This is to certify that the Seminar entitled Voice XML presented by Bhavika bearing University
Roll No. 20EMCCS026 of Computer Science & Engineering in MITRC has been completed
successfully.

This is in partial fulfillment of the requirements of Bachelor Degree in Computer Science &
Engineering under Bikaner Technical University, Bikaner, Rajasthan.

I wish her success in all future endeavors.

Manoj Kumar Mishra


(Asst. Professor, Department of CSE / AI & DS)
ACKNOWLEDGEMENTS
I would like to express my deep and sincere gratitude to my guide, Asst. Prof. Manoj Kumar
Mishra of CSE for his unflagging support and continuous encouragement throughout the seminar
work. Without his guidance and persistent help this report would not have been possible.

I express my sincere gratitude to Dr. J. R. ArunKumar, HOD of Computer Science Engineering, and
to the faculty and staff for their support and guidance.

Bhavika
Department of CSE
University Roll no. - 20EMCCS026

ABSTRACT

VoiceXML is the standard markup language for rendering web content over the telephone.
The cost of developing an interactive phone application has changed dramatically with the new
markup language. VoiceXML builds on the basic concepts and rules set by XML. Interactive
applications contain synthesized speech, pre-recorded audio, grammars defining the words that can be
recognised, and DTMF key input. By saying something or pressing keys on the phone keypad, the user
transitions between different pages. VoiceXML can be used in many different ways. You can
integrate it with your web page, letting people access it through a phone. It is simple to create
services such as booking a ticket or looking up when the bus leaves. It can be used to build a
voicemail service for your phone or to have your regular e-mail read to you. Some vendors of voice
gateways even offer SMS add-ons. VoiceXML also works with both the traditional
PSTN and the newer technology of Voice over IP (VoIP).

This report describes how VoiceXML is defined and how it works. It covers the concepts of
VoiceXML, its architectural model and implementation, and some of its
applications in daily life.

Signature of the Student


Name: Bhavika
University Roll No.: 20EMCCS026
Branch: CSE
Semester: VII
Section: A
Date:
Table of Contents
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

1. Introduction to VoiceXML
1.1 Basic Overview
1.2 History
1.3 Goals of VoiceXML
2. Creating a basic VoiceXML document
2.1 VoiceXML elements
3. Architectural model of VoiceXML
3.1 Principles of design
4. Concepts of VoiceXML
4.1 Concepts in VoiceXML
4.2 Supported audio file formats
4.3 Applications of VoiceXML
5. Voice User Interface using VoiceXML
5.1 Interaction
5.2 Dialogue Initiative

CONCLUSIONS
REFERENCES
LIST OF TABLES

Table 2.1 VoiceXML Elements
Table 4.1 Supported Audio file formats
LIST OF FIGURES

Figure 3.1 Architectural model of VoiceXML
Figure 4.1 Transitioning between documents in an application
Figure 5.1 Voice User Interface
CHAPTER - 1
INTRODUCTION TO VOICE XML

VoiceXML was developed as a standard markup language for delivering and processing voice
dialogs. VoiceXML applications include automated driving assistance, voice access to email, voice
directory access and other services. VoiceXML pages are transported over HTTP.

There are two basic VoiceXML file types:

• Static: hard-coded by the application developer.
• Dynamic: generated by the server in response to client requests.

1.1 BASIC OVERVIEW

VoiceXML is a language for creating voice-user interfaces, particularly for the telephone. It uses
speech recognition and touchtone (DTMF keypad) for input, and pre-recorded audio and text-to-
speech synthesis (TTS) for output. It is based on the World Wide Web Consortium's (W3C's)
Extensible Markup Language (XML), and leverages the web paradigm for application development
and deployment. By having a common language, application developers, platform vendors, and tool
providers all can benefit from code portability and reuse.

With VoiceXML, speech recognition application development is greatly simplified by using familiar
web infrastructure, including tools and Web servers. Instead of using a PC with a Web browser, any
telephone can access VoiceXML applications via a VoiceXML "interpreter" (also known as a
"browser") running on a telephony server. Whereas HTML is commonly used for creating graphical
Web applications, VoiceXML can be used for voice-enabled Web applications.

There are two schools of thought regarding the use of VoiceXML:

1. As a way to voice-enable a Web site, or


2. As an open-architecture solution for building next-generation interactive voice response
telephone services.

One popular type of application is the voice portal, a telephone service where callers dial a phone
number to retrieve information such as stock quotes, sports scores, and weather reports. Voice
portals have received considerable attention lately, and demonstrate the power of speech
recognition-based telephone services. These, however, are certainly not the only applications
for VoiceXML. Other application areas, including voice-enabled intranets and contact centers,
notification services, and innovative telephony services, can all be built with VoiceXML.

By separating application logic (running on a standard Web server) from the voice dialogs (running
on a telephony server), VoiceXML and the voice-enabled Web allow for a new business model for
telephony applications known as the Voice Service Provider. This permits developers to build phone
services without having to buy or run equipment.

While originally designed for building telephone services, other applications of VoiceXML, such as
speech-controlled home appliances, are starting to be developed.

1.2 HISTORY

VoiceXML has its roots in a research project called Phone Web at AT&T Bell Laboratories. After
the AT&T/Lucent split, both companies pursued development of independent versions of a phone
markup language.

Lucent's Bell Labs continued work on the project, now known as TelePortal. The recent research
focus has been on service creation and natural language applications.

AT&T Labs has built a mature phone markup language and platform that have been used to
construct many different types of applications, ranging from call center-style services to consumer
telephone services that use a visual Web site for customers to configure and administer their
telephone features. AT&T's intent has been twofold. First, it wanted to forge a new way for its
business clients to construct call center applications with AT&T-provided network call handling.
Second, AT&T wanted a new way to build and quickly deploy advanced consumer telephone
services, and in particular define new ways in which third parties could participate in the creation of
new consumer services.

Motorola embraced the markup approach as a way to provide mobile users with up-to-the-minute
information and interactions. Given the corporate focus on mobile productivity, Motorola's efforts
focused on hands-free access. This led to an emphasis on speech recognition rather than touch-tones
as an input mechanism. Also, by starting later, Motorola was able to base its language on the
recently-developed XML framework. These efforts led to the October 1998 announcement of the
VoxML™ technology. Since the announcement, thousands of developers have downloaded the
VoxML language specification and software development kit.

There has been growing interest in this general concept of using a markup language to define voice
access to Web-based applications. For several years Netphonic has had a product known as Web-on-
Call that used an extended HTML and software server to provide telephone access to Web services;
in 1998, General Magic acquired Netphonic to support Web access for phone customers. In October
1998, the World Wide Web Consortium (W3C) sponsored a workshop on Voice Browsers. A
number of leading companies, including AT&T, IBM, Lucent, Microsoft, Motorola, and Sun,
participated.

Most recently, IBM has announced SpeechML, which provides a markup language for speech
interfaces to Web pages; the current version provides a speech interface for desktop PC browsers.

1.3 GOALS OF VOICE XML

VoiceXML’s main goal is to bring the full power of web development and content delivery to voice
response applications, and to free the authors of such applications from low-level programming and
resource management. It enables integration of voice services with data services using the familiar
client-server paradigm. A voice service is viewed as a sequence of interaction dialogs between a
user and an implementation platform. The dialogs are provided by document servers, which may be
external to the implementation platform. Document servers maintain overall service logic, perform
database and legacy system operations, and produce dialogs. A VoiceXML document specifies each
interaction dialog to be conducted by a VoiceXML interpreter. User input affects dialog
interpretation and is collected into requests submitted to a document server. The document server
may reply with another VoiceXML document to continue the user’s session with other dialogs.

VoiceXML is a markup language that:

• Minimizes client/server interactions by specifying multiple interactions per document.
• Shields application authors from low-level and platform-specific details.
• Separates user interaction code (in VoiceXML) from service logic (CGI scripts).
• Promotes service portability across implementation platforms. VoiceXML is a common
language for content providers, tool providers, and platform providers.
• Is easy to use for simple interactions, and yet provides language features to support complex
dialogs.

CHAPTER - 2
CREATING A BASIC VOICEXML DOCUMENT

VoiceXML is an XML-based markup language for creating automated speech
recognition (ASR) and interactive voice response (IVR) applications. Following the XML
tag/attribute format, VoiceXML syntax encloses instructions (items) within a tag
structure in the following manner:

<element_name attribute_name="attribute_value">

......contained items......

</element_name>

A VoiceXML application consists of one or more text files called documents. These document files
are denoted by a ".vxml" file extension and contain the various VoiceXML instructions for the
application. It is recommended that the first line of any document seen by the interpreter
be the XML declaration:

<?xml version="1.0"?>

The remainder of the document's instructions should be enclosed by the vxml element, with the version
attribute set to the version of VoiceXML being used ("1.0" in the present case), as follows:

<vxml version="1.0">
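
Putting the three pieces above together, a complete minimal document might look like the sketch below. The form id "greeting" and the prompt text are made up for illustration, and the file is assumed to be saved with a ".vxml" extension and fetched by a VoiceXML 1.0 interpreter.

```xml
<?xml version="1.0"?>
<!-- XML declaration first, then a single vxml element wrapping everything. -->
<vxml version="1.0">
  <!-- One dialog (a form) holding one non-interactive block. -->
  <form id="greeting">
    <block>
      Thank you for calling.
    </block>
  </form>
</vxml>
```

Because the form has no successor dialog, the interpreter would be expected to speak the text and then end the session.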

2.1 VOICE XML ELEMENTS

Table 2.1 Voice XML elements


Element       Purpose
<assign>      Assign a variable a value.
<audio>       Play an audio clip within a prompt.
<block>       A container of (non-interactive) executable code.
<break>       JSML element to insert a pause in output.
<catch>       Catch an event.
<choice>      Define a menu item.
<clear>       Clear one or more form item variables.
<div>         JSML element to classify a region of text as a particular type.
<dtmf>        Specify a touch-tone key grammar.
<else>        Used in <if> elements.
<elseif>      Used in <if> elements.
<emp>         JSML element to change the emphasis of speech output.
<enumerate>   Shorthand for enumerating the choices in a menu.
<error>       Catch an error event.
<exit>        Exit a session.
<field>       Declares an input field in a form.
<filled>      An action executed when fields are filled.
<form>        A dialog for presenting information and collecting data.
<goto>        Go to another dialog in the same or different document.
<grammar>     Specify a speech recognition grammar.
<help>        Catch a help event.
<if>          Simple conditional logic.
<initial>     Declares initial logic upon entry into a (mixed-initiative) form.
<link>        Specify a transition common to all dialogs in the link’s scope.
<menu>        A dialog for choosing amongst alternative destinations.
<meta>        Define a meta data item as a name/value pair.
<noinput>     Catch a noinput event.
<object>      Interact with a custom extension.
<option>      Specify an option in a <field>.
<param>       Parameter in <object> or <subdialog>.
<prompt>      Queue TTS and audio output to the user.
<property>    Control implementation platform settings.
<pros>        JSML element to change the prosody of speech output.
<record>      Record an audio sample.
<reprompt>    Play a field prompt when a field is re-visited after an event.
<return>      Return from a subdialog.

CHAPTER - 3
ARCHITECTURAL MODEL

The architectural model assumed by this document has the following components:

Figure 3.1 Architecture model of Voice XML

A document server (e.g. a web server) processes requests from a client application, the VoiceXML
Interpreter, through the VoiceXML interpreter context. The server produces VoiceXML documents
in reply, which are processed by the VoiceXML Interpreter. The VoiceXML interpreter context may
monitor user inputs in parallel with the VoiceXML interpreter. For example, one VoiceXML
interpreter context may always listen for a special escape phrase that takes the user to a high-level
personal assistant, and another may listen for escape phrases that alter user preferences like volume
or text-to-speech characteristics.

The implementation platform is controlled by the VoiceXML interpreter context and by the
VoiceXML interpreter. For instance, in an interactive voice response application, the VoiceXML
interpreter context may be responsible for detecting an incoming call, acquiring the initial
VoiceXML document, and answering the call, while the VoiceXML interpreter conducts the dialog
after answer. The implementation platform generates events in response to user actions (e.g. spoken
or character input received, disconnect) and system events (e.g. timer expiration). Some of these
events are acted upon by the VoiceXML interpreter itself, as specified by the VoiceXML document,
while others are acted upon by the VoiceXML interpreter context.

3.1 PRINCIPLES OF DESIGN

VoiceXML is an XML schema. For details about XML, refer to the Annotated XML Reference
Manual.

1. The language promotes portability of services through abstraction of platform resources.


2. The language accommodates platform diversity in supported audio file formats, speech grammar
formats, and URI schemes. While platforms will respond to market pressures and support common
formats, the language per se will not specify them.
3. The language supports ease of authoring for common types of interactions.
4. The language has a well-defined semantics that preserves the author's intent regarding the
behavior of interactions with the user. Client heuristics are not required to determine document
element interpretation.
5. The language has a control flow mechanism.
6. The language enables a separation of service logic from interaction behavior.
7. It is not intended for heavy computation, database operations, or legacy system operations. These
are assumed to be handled by resources outside the document interpreter, e.g. a document server.
8. General service logic, state management, dialog generation, and dialog sequencing are assumed to
reside outside the document interpreter.
9. The language provides ways to link documents using URIs, and also to submit data to server
scripts using URIs.
10. VoiceXML provides ways to identify exactly which data to submit to the server, and which
HTTP method (get or post) to use in the submittal.
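
Principles 9 and 10 can be illustrated with a short sketch, written here against VoiceXML 2.0. The server URL, the field names and the use of the built-in "digits" field type are assumptions made for the example; support for built-in types varies by platform.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <field name="account" type="digits">
      <prompt>Please say or key in your account number.</prompt>
    </field>
    <field name="pin" type="digits">
      <prompt>Now your PIN.</prompt>
    </field>
    <!-- Runs once both fields hold a value. -->
    <filled mode="all">
      <!-- namelist names exactly which variables are sent;
           method selects the HTTP method (get or post). -->
      <submit next="http://www.example.com/login"
              namelist="account pin" method="post"/>
    </filled>
  </form>
</vxml>
```

The server script at the assumed URL would reply with the next VoiceXML document, which keeps the service logic outside the interpreter as principles 7 and 8 require.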

The language does not require document authors to explicitly allocate and deallocate dialog
resources, or deal with concurrency. Resource allocation and concurrent threads of control are to be
handled by the implementation platform.

CHAPTER - 4
CONCEPTS OF VOICE XML

A VoiceXML document (or a set of documents called an application) forms a conversational finite
state machine. The user is always in one conversational state, or dialog, at a time. Each dialog
determines the next dialog to transition to. Transitions are specified using URIs, which define the
next document and dialog to use. If a URI does not refer to a document, the current document is
assumed. If it does not refer to a dialog, the first dialog in the document is assumed. Execution is
terminated when a dialog does not specify a successor, or if it has an element that explicitly exits the
conversation.
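
The URI rules above can be sketched as follows, using VoiceXML 2.0 syntax. The dialog ids and the file name other.vxml are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="start">
    <block>
      <prompt>Welcome.</prompt>
      <!-- Fragment only: no document named, so the current document
           is assumed and control moves to the dialog "details". -->
      <goto next="#details"/>
    </block>
  </form>

  <form id="details">
    <block>
      <prompt>Here are the details.</prompt>
      <!-- Document plus fragment: load other.vxml and start at its
           dialog "summary". Writing next="other.vxml" alone would
           start at the first dialog of that document. -->
      <goto next="other.vxml#summary"/>
    </block>
  </form>
</vxml>
```

If the second block had no goto, the dialog would name no successor and execution would end there.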

4.1 CONCEPTS OF VOICE XML

1. DIALOGS AND SUBDIALOGS : There are two kinds of dialogs: forms and menus. Forms define an
interaction that collects values for a set of field item variables. Each field may specify a grammar
that defines the allowable inputs for that field. If a form-level grammar is present, it can be used to
fill several fields from one utterance. A menu presents the user with a choice of options and then
transitions to another dialog based on that choice.

A subdialog is like a function call, in that it provides a mechanism for invoking a new interaction,
and returning to the original form. Local data, grammars, and state information are saved and are
available upon returning to the calling document. Subdialogs can be used, for example, to create a
confirmation sequence that may require a database query; to create a set of components that may be
shared among documents in a single application; or to create a reusable library of dialogs shared
among many applications.
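
A confirmation sequence built as a subdialog might look like the sketch below (VoiceXML 2.0). The grammar file drink.grxml, the variable names and the built-in "boolean" field type are assumptions for illustration.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
      <grammar src="drink.grxml" type="application/srgs+xml"/>
    </field>

    <!-- Call the "confirm" dialog like a function, passing one argument. -->
    <subdialog name="check" src="#confirm">
      <param name="item" expr="drink"/>
      <filled>
        <!-- Returned values arrive as properties of the subdialog variable. -->
        <if cond="check.ok">
          <prompt>Your order is placed.</prompt>
        <else/>
          <!-- Forget both answers so the form asks again. -->
          <clear namelist="drink check"/>
        </if>
      </filled>
    </subdialog>
  </form>

  <form id="confirm">
    <!-- Each param passed by the caller must be declared here. -->
    <var name="item"/>
    <field name="ok" type="boolean">
      <prompt>You said <value expr="item"/>. Is that right?</prompt>
      <filled>
        <return namelist="ok"/>
      </filled>
    </field>
  </form>
</vxml>
```

Because "confirm" takes its input through a parameter and hands back a single result, the same dialog could be moved to its own document and shared by several applications.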

2. SESSIONS : A session begins when the user starts to interact with a VoiceXML interpreter
context, continues as documents are loaded and processed, and ends when requested by the user, a
document, or the interpreter context.

3. APPLICATION : An application is a set of documents sharing the same application root
document. Whenever the user interacts with a document in an application, its application root
document is also loaded. The application root document remains loaded while the user is
transitioning between other documents in the same application, and it is unloaded when the user
transitions to a document that is not in the application. While it is loaded, the application root
document’s variables are available to the other documents as application variables, and its
grammars can also be set to remain active for the duration of the application.
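
As a sketch of how a document joins an application (VoiceXML 2.0): the leaf document below names its root through the application attribute. The file root.vxml and the variable caller_name are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Leaf document. root.vxml is a hypothetical root document assumed to
     contain <var name="caller_name" expr="'guest'"/> at document level. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="root.vxml">
  <form id="greet">
    <block>
      <!-- Variables of the root document are read in application scope. -->
      <prompt>Hello, <value expr="application.caller_name"/>.</prompt>
    </block>
  </form>
</vxml>
```

Every other document that names the same root would see the same caller_name for as long as the caller stays inside the application.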

Figure 4.1: Transitioning between documents in an application.

4. GRAMMARS : Each dialog has one or more speech and/or DTMF grammars associated with it.
In machine directed applications, each dialog’s grammars are active only when the user is in that
dialog. In mixed initiative applications, where the user and the machine alternate in determining
what to do next, some of the dialogs are flagged to make their grammars active (i.e., listened for)
even when the user is in another dialog in the same document, or on another loaded document in the
same application. In this situation, if the user says something matching another dialog’s active
grammars, execution transitions to that other dialog, with the user’s utterance treated as if it were
said in that dialog. Mixed initiative adds flexibility and power to voice applications.
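
The flagging described above can be sketched with the scope attribute of a form in VoiceXML 2.0. The dialog ids, prompts and the one-word inline grammar are invented for the example.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- First dialog: the caller normally starts here. -->
  <form id="news">
    <field name="topic">
      <prompt>Which news would you like, sports or politics?</prompt>
      <option>sports</option>
      <option>politics</option>
      <filled>
        <prompt>Here is the <value expr="topic"/> news.</prompt>
      </filled>
    </field>
  </form>

  <!-- scope="document" keeps this form's grammar active in every dialog
       of the document, not only while the caller is inside the form. -->
  <form id="weather" scope="document">
    <grammar mode="voice" version="1.0" root="weather_cmd">
      <rule id="weather_cmd">weather</rule>
    </grammar>
    <block>
      <prompt>It will be partly cloudy today.</prompt>
    </block>
  </form>
</vxml>
```

A caller who answers the news question with "weather" would be moved to the second form; with the default scope of "dialog" the same utterance would simply fail to match.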

5. EVENTS : VoiceXML provides a form-filling mechanism for handling "normal" user input. In
addition, VoiceXML defines a mechanism for handling events not covered by the form
mechanism. Events are thrown by the platform under a variety of circumstances, such as when the
user does not respond, doesn't respond intelligibly, requests help, etc. The interpreter also throws
events if it finds a semantic error in a VoiceXML document. Events are caught by catch elements or
their syntactic shorthand. Each element in which an event can occur may specify catch elements.
Catch elements are also inherited from enclosing elements "as if by copy". In this way, common
event handling behavior can be specified at any level, and it applies to all lower levels.
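
A minimal sketch of event handling at two levels, assuming VoiceXML 2.0; the prompts and the choice of three attempts are illustrative only.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level handler: inherited by every dialog and field below. -->
  <help>
    <prompt>You can say coffee or tea.</prompt>
    <reprompt/>
  </help>

  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
      <option>coffee</option>
      <option>tea</option>

      <!-- Shorthand for <catch event="noinput">. -->
      <noinput>
        <prompt>I did not hear anything.</prompt>
        <reprompt/>
      </noinput>

      <!-- Full catch syntax; count="3" selects this handler from the
           third nomatch event onwards. -->
      <catch event="nomatch" count="3">
        <prompt>Sorry, I am still having trouble. Goodbye.</prompt>
        <exit/>
      </catch>
    </field>
  </form>
</vxml>
```

The first two nomatch events fall through to the platform's default handler, since no handler with a lower count is defined here.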

6. LINKS : A link supports mixed initiative. It specifies a grammar that is active whenever the user
is in the scope of the link. If user input matches the link’s grammar, control transfers to the link’s
destination URI. A <link> can be used either to throw an event or to go to a destination URI.
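
Both uses of a link can be sketched as follows (VoiceXML 2.0). The phrase "main menu", the DTMF key 0 and the dialog ids are assumptions for the example.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Document-level link: its grammar is active in every dialog here.
       Saying "main menu" transfers control to the destination URI. -->
  <link next="#main_menu">
    <grammar mode="voice" version="1.0" root="menu_cmd">
      <rule id="menu_cmd">main menu</rule>
    </grammar>
  </link>

  <!-- A link that throws an event instead: pressing 0 raises "help". -->
  <link event="help">
    <grammar mode="dtmf" version="1.0" root="zero">
      <rule id="zero">0</rule>
    </grammar>
  </link>

  <menu id="main_menu">
    <prompt>Say weather or news.</prompt>
    <choice next="#weather">weather</choice>
    <choice next="#news">news</choice>
    <help>
      <prompt>Say weather or news, or say main menu at any time.</prompt>
      <reprompt/>
    </help>
  </menu>

  <form id="weather">
    <block><prompt>It will be partly cloudy today.</prompt></block>
  </form>

  <form id="news">
    <block><prompt>There is no news today.</prompt></block>
  </form>
</vxml>
```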

4.2 SUPPORTED AUDIO FILE FORMATS

VoiceXML recommends that a platform support playing and recording the audio formats specified
below. Note: a platform need not support both A-law and μ-law simultaneously.

Table 4.1 Supported Audio file formats

Audio Format                                                  MIME Type
Raw (headerless) 8 kHz 8-bit mu-law [PCM], single channel     audio/basic
Raw (headerless) 8 kHz 8-bit A-law [PCM], single channel      audio/x-alaw-basic
WAV (RIFF header) 8 kHz 8-bit mu-law [PCM], single channel    audio/wav
WAV (RIFF header) 8 kHz 8-bit A-law [PCM], single channel     audio/wav
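
As a sketch of how such a file is played (VoiceXML 2.0): welcome.wav is a hypothetical recording assumed to be in one of the formats listed above.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="welcome">
    <block>
      <prompt>
        <!-- The enclosed text is spoken by TTS only if the file
             cannot be fetched or played. -->
        <audio src="welcome.wav">Welcome to the weather service.</audio>
      </prompt>
    </block>
  </form>
</vxml>
```

Providing the fallback text keeps the dialog usable on a platform that does not support the recording's format.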

EXAMPLES

The top-level element is <vxml>, which is mainly a container for dialogs.

There are two types of dialogs: forms and menus.

A form in a VoiceXML document presents information and gathers input from the user. A form is
represented by the <form> tag and has an id attribute associated with it. The id attribute is the
name of the form. Following is an example of the use of a form element:

<form id="hello">
  <block>
    Hello world!
  </block>
</form>
In this example, the name of the form is “hello” and “Hello world!” is presented to the user.

Form items
Two types of form items exist: field items and control items. A field item prompts the user on what
to say or key in and then collects the information from the user that is then filled into the field item
variable. A field item also has grammars that define the allowed inputs, event handlers to process
the resulting events, and a <filled> element that defines an action to be taken after the field item
variable has been filled. Following is a list of types of field items:

<field>: value of the field item is obtained from the user via speech or DTMF grammars

<record>: value of the field item is an audio clip recorded by the user, such as a voice mail message,
which can be collected by the <record> element

<transfer>: used for transferring the user to another telephone number

<object>: invokes platform-specific object with one or more properties

<subdialog>: like a function call, invokes another dialog on the current page or in another
VoiceXML document.

A control item's task is to help control the gathering of the form's fields. Following are the two
types of control items:

<block>: sequence of statements used for prompting and computation

<initial>: useful in mixed initiative dialogs that prompt the user for information
Form item variables and conditions
A form item variable is associated with each form item. By default, the form item variable is set to
undefined initially and contains a result (collected from the user) once the form item has been
interpreted. You can define the name of a form item variable by using the name attribute. A guard
condition exists for each form item. The guard condition tests whether the item's variable currently
has a value. If a value exists, then the form item is skipped.
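
The guard condition can be sketched as follows (VoiceXML 2.0). The variable names are invented, and the built-in "boolean" field type is an assumption about the platform. The cond and expr attributes are the two standard ways an author influences whether an item is visited.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="survey">
    <!-- Variable has_pet starts undefined, so this field is visited. -->
    <field name="has_pet" type="boolean">
      <prompt>Do you have a pet?</prompt>
    </field>

    <!-- cond is tested together with the guard: the field is visited
         only while pet_kind is undefined and has_pet is true. -->
    <field name="pet_kind" cond="has_pet">
      <prompt>Is it a cat or a dog?</prompt>
      <option>cat</option>
      <option>dog</option>
    </field>

    <!-- expr gives the variable an initial value, so the guard
         condition skips this block entirely. -->
    <block name="skipped_note" expr="true">
      <prompt>This prompt is never played.</prompt>
    </block>
  </form>
</vxml>
```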
Menus
A menu gives the user a list of choices to select from and transitions to a different dialog or
document based on the user's choice. Following is an example of a menu:
<menu>
  <prompt>Say what sports news you are interested in:
    <enumerate/></prompt>
  <choice next="http://www.news.com/hockey.vxml">
    Hockey
  </choice>
  <choice next="http://www.news.com/baseball.vxml">
    Baseball
  </choice>
  <choice next="http://www.news.com/football.vxml">
    Football
  </choice>
  <noinput>Please say what sports news you are interested in
    <enumerate/>
  </noinput>
</menu>
Our second example asks the user for a choice of drink and then submits it to a server script:
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
      <grammar src="drink.gram" type="application/x-jsgf"/>
    </field>
    <block> <submit next="http://www.drink.example/drink2.asp"/> </block>
  </form>
</vxml>
A field is an input field. The user must provide a value for the field before proceeding to the next
element in the form. A sample interaction is:
C (computer): Would you like coffee, tea, milk, or nothing?
H (human): Orange juice.
C: I did not understand what you said.
C: Would you like coffee, tea, milk, or nothing?
H: Tea
C: (continues in document drink2.asp)

Creating the welcome message


In this VoiceXML example, you will create an application that offers a selection to choose
from. Once you make a selection, you are taken to the appropriate document or dialog. In this
section, you will create the main greeting message of the application. In the code below, the user
hears a “welcome” message and is then given a list of choices from the main menu. The <goto>
element is used to skip to the menu section.

<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- user hears welcome the first time -->
  <form id="intro">
    <block>
      <prompt>Welcome</prompt>
      <!-- make_choice is the main menu dialog, defined elsewhere in this document -->
      <goto next="#make_choice"/>
    </block>
  </form>
</vxml>
Creating the weather document
In this section you will create the document that gives the user weather information. Once the user
says or selects weather from the menu, the following code is executed:

<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather_report">
    <block>
      <prompt>It will be partly cloudy today.</prompt>
      <!-- return to the main document once the report has been read -->
      <goto next="http://www.hostname.com/main.vxml"/>
    </block>
  </form>
</vxml>

4.3 APPLICATIONS OF VOICE XML


Below are a few examples of areas in which VoiceXML applications can be used:
1. Voice portals: Just like Web portals, voice portals can be used to provide personalized services to
access information like stock quotes, weather, restaurant listings, news, etc.
2. Location-based services: You can receive targeted information specific to the location you are
dialing from. Applications use the telephone number you are dialing from.
3. Voice alerts (such as for advertising): VoiceXML can be used to send targeted alerts to a user.
The user would sign up to receive special alerts informing them of upcoming events.
4. Commerce: VoiceXML can be used to implement applications that allow users to purchase over the
phone. Products that don't need a lot of description (such as tickets, CDs, office supplies, etc.) work well.
5. Healthcare: VoiceXML can be employed to create medication reminders for patients and
automated systems for scheduling and confirming healthcare appointments.
6. Education: VoiceXML can be used in language learning applications with voice-based interactions
and pronunciation exercises, and for educational hotlines that provide information on educational
programs, courses, and resources.
7. Finance: Users can check their account balances and recent transactions through voice
interactions. Some financial institutions use VoiceXML for secure and automated fund transfer
applications.
8. Governmental Services: VoiceXML can be used for automated systems that provide information about
government services and programs, and to disseminate emergency alerts and information.

CHAPTER - 5
VOICE USER INTERFACE USING VOICEXML

Constructing voice interface applications is a challenge because language
is deeply tied to human behaviour (Schinelle, 2005). As a consequence, expectations of
the interface become very high. This kind of interface tries to give the user the sensation of
speaking with a human; however, this has not yet been perfectly achieved.
The main objective of a voice user interface project is to support user navigation through the options,
commands and information available in a system in order to carry out a specific task. Unfortunately, accessing
information through navigation is more complex in the audio domain. For this reason, some factors must be
considered in voice interface design: the application requirements, the capabilities and limitations
of the technology, and the characteristics of the user population (Kamm, 1995). Once those factors are understood,
the voice interface designer can anticipate difficulties and incompatibilities that would affect the
success of the application, and minimize their impact.
5.1 INTERACTION
Voice interfaces give information systems an interesting alternative for data input and output,
either as a voice-only interface (phone) or as a component of a multimodal and/or multimedia
system.

Figure 5.1 Voice User Interface (VUI)

A voice-only interface in an information system can be desirable for two reasons. First, the
application may require hands-free interaction. Second, the telephone system is a
truly robust and universal network technology, so it makes sense to extend information services from
the computer to the phone (Dey, 1997). Multimodal interfaces support human-machine interaction through
sequential or parallel use of several input/output modes. Speech recognition, keyboard, mouse,
mime and gestures can be used as input modalities, with synthesized voice, graphics
or text messages as output. These modes of interaction can be combined dynamically to give the user
greater mobility (Englert, 2006).
5.2 DIALOGUE INITIATIVE
One of the fundamental aspects of developing applications with a voice interface is the way
the dialogue initiative is taken. The dialogue management strategy can be system-initiative, user-initiative or
mixed-initiative (SPI Group, 2006). In a system-initiative dialogue, the computer questions the user and,
once the necessary information is received, computes the solution and gives the answer.
User-initiative dialogues assume that the user knows what to do and how to interact with the
system. Generally, the system waits for user input and responds by carrying out the requested operations.
Mixed-initiative applications assume that the initiative in the dialogue can be taken by either the
system or the user.

CONCLUSIONS

VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio,
recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-
initiative conversations. Its major goal is to bring the advantages of web-based development and
content delivery to interactive voice response applications. We can conclude that VoiceXML greatly
simplifies the development of speech recognition applications by using familiar web infrastructure,
including tools and Web servers. Instead of using a PC with a Web browser, any telephone can access
VoiceXML applications via a VoiceXML "interpreter" (also known as a "browser") running on a
telephony server.

REFERENCES

[1] Dave Raggett’s Introduction to VoiceXML
http://www.w3.org/Voice/Guide/

[2] VoiceXML: What's Everyone Talking About, Kevin Reichard
http://networking.earthweb.com/netsp/article.php/10953_3349421_1

[3] Integrate VoIP into your enterprise infrastructure, Veronika Megler
http://www-106.ibm.com/developerworks/wireless/library/wi-calvoip/

[4] The Fundamentals of Text-To-Speech Synthesis, Juergen Schroeter
http://www.voicexmlreview.org/Mar2001/features/tts.html