Voice User Interface Design Patterns
All content following this page was uploaded by Dirk Schnelle-Walka on 14 January 2014.
Abstract
In this paper we present a set of design patterns that we have mined in
the area of Voice User Interfaces (VUI). In a previous paper [14], we
introduced several patterns regarding fundamental issues of developing
a voice application. In this paper we explore further aspects concerning
the internal structure of an audio interface, the construction of the
interaction style, the system response architecture, and implementation
strategies to meet the demands of real-world scenarios.
1 Introduction
Voice User Interfaces are particularly difficult to build due to their
transient and invisible nature. Unlike visual interfaces, once the commands
and actions have been communicated to the user, they are not there anymore.
Another particular aspect of VUIs is that the interaction with the user
interface is affected not only by the objective limitations of a voice
channel; human factors also play a decisive role: auditive and orientation
capabilities, attention, clarity, diction, speed, and ambient noise.
The design patterns presented in [14] demonstrated that it is possible to
build a language to talk about VUI designs. In this paper we continue
building this new pattern language, focusing on the different dialog
strategies, auditory design, and usage scenarios.
The structure of the paper is as follows: we first provide an overview
of the whole pattern language. Afterwards, we introduce the discovered
patterns grouped by the VUI design aspect they address: Interaction Style,
the organization of a spoken response and a final set of patterns that
address voice usability issues in large scale settings.
2 Overview
In this section we present patterns we have mined and categorized according
to their main purpose. An overview is given in figure 1. The existing
patterns in the language (taken from [14]), marked with a grayed box, are
not described in this paper.
The patterns belong to a language we started in [14], as shown in figure 2.
These new patterns expand the language in important areas of VUI design
and follow more closely how VUI design actually takes place in practice.
In this context, selecting and developing an appropriate dialog strategy
is a first step designers must take when building a VUI. Menu Hierarchy
serves as a general organizational scheme for many Audio Patterns. Novice
designers would naturally apply it, although it is just one characteristic
of a dialog strategy. Further dialog strategies are explained in
section 4.1. Another major design concern, described later in section 4.2,
focuses on the system output, or rather the system response. Note that the
system response structure underlies the dialog structuring and interaction
mechanisms implemented later, and thus becomes the cornerstone of how the
overall user interface will be perceived.
3 Challenges with Audio
Speech is asymmetric People can speak faster than they can type, but
listen much more slowly than they can read. This has a direct influence
on the amount of audio data and the information being delivered. This
property is extremely useful in cases where we have the possibility
of using additional displays to supplement the basic audio interface.
We can use the displays for delivering information which is unsuitable
for audio due to its length, and focus on using the audio device for
interaction and delivering short pieces of information.
Flexibility vs. Accuracy The same request can be phrased in many ways,
and natural language user interfaces must handle many of them. This
has a direct impact on recognition accuracy. To illustrate this trade-off
between the flexibility of the interface and its accuracy, consider the
following example for entering a date. A flexible interface would allow
the user to speak the date in any format (e.g., "March 2nd",
"yesterday", "2nd of March 2004"). Another possibility would be to
prompt the user individually for each component of the date (e.g.,
"Say the year", "Say the month"). Obviously, the first method is much
more flexible for the user, but it requires much more work from the
recognition software and is far more error-prone than the second
approach.
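The two ends of this trade-off can be sketched in VoiceXML, the notation used for the sample code later in this paper. This is a minimal illustration and not part of the original pattern text: the form ids and prompts are made up, and the built-in field types (digits, number, date) are optional platform features whose coverage varies. A built-in date grammar, for instance, may not accept an utterance such as "yesterday".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Directed variant (hypothetical ids): one narrow question per
       date component, each backed by a small grammar. -->
  <form id="date_directed">
    <field name="year" type="digits?length=4">
      <prompt>Say the year.</prompt>
    </field>
    <field name="month" type="number">
      <prompt>Say the month as a number.</prompt>
    </field>
    <field name="day" type="number">
      <prompt>Say the day of the month.</prompt>
    </field>
  </form>

  <!-- Flexible variant: a single field backed by a broader grammar.
       Only the first form of a document is entered by default, so this
       form is shown for comparison only. -->
  <form id="date_flexible">
    <field name="date" type="date">
      <prompt>What is the date?</prompt>
    </field>
  </form>
</vxml>
```

The directed form needs three turns but keeps each recognition task small, whereas the flexible form needs one turn and depends entirely on how much the date grammar covers.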
4 VUI Design Patterns
• Experienced users know what to say and need a fast way to enter the
data
4.1 Dialog strategy
Context The user has to provide data that cannot be gathered through
a selection process. Forms are a well-known real-world concept for
providing information sequentially, one item at a time.
Forces
• Forms force users to provide one datum at a time
• Speech is invisible: users might forget what they entered at the
beginning
Consequences
+ Dialogs are clear and structured
Known Uses The Poldi Gewinnspiel of the 1. FC Köln used Form Filling
to request the desired information about the caller, such as membership
of the 1. FC Köln, and to collect the answers to the quiz. The application
went offline in 2005, but was callable at +49 (180) 5 29 00 29.
The dating line L.U.C.Y. from Com Vision uses Form Filling to request
the data of the caller, i.e. sex, age and ZIP code. Once the caller has
entered all data, the system summarizes it in one go. The application can
be called at +49 (12) 345 662 662.
The Citiphone Banking application uses Form Filling to enter the
data, i.e. account number or bank code for a bank transfer. After the
caller has entered a value for a field, she is asked for confirmation.
The application can be called at +49 (180) 33 22 111.
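The per-field confirmation described for the banking application could be sketched in VoiceXML roughly as follows. This is not the code of any of the applications above. The form id, the field names and the servlet URL are hypothetical, and the built-in digits and boolean types are assumed to be supported by the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="transfer">
    <field name="account" type="digits">
      <prompt>Please say the account number.</prompt>
    </field>
    <!-- Visited right after "account" is filled, because it is then the
         first form item without a value. -->
    <field name="account_ok" type="boolean">
      <prompt>
        I understood <value expr="account"/>. Is that correct?
      </prompt>
      <filled>
        <if cond="!account_ok">
          <!-- On "no", reset both items so that the account number is
               requested again. -->
          <clear namelist="account account_ok"/>
        </if>
      </filled>
    </field>
    <block>
      <!-- Hypothetical back-end URL. -->
      <submit next="http://www.example.com/servlet/transfer"
              namelist="account"/>
    </block>
  </form>
</vxml>
```

Confirming each field immediately addresses the force that users forget what they entered at the beginning, at the cost of one extra turn per field. A single summary after all fields, as in the dating line example, needs fewer turns.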
Related patterns The Form pattern [17] solves a similar problem for
graphical user interfaces.
Escalating Detail can serve as a means of error recovery.
Mixed Initiative is an alternative approach and enables the user to
provide more than one datum at a time.
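Escalating Detail itself is described in [14] and not in this paper. Assuming it means giving progressively more detailed help after repeated recognition errors, one way to attach such error recovery to a form field is the count attribute of VoiceXML event handlers. The field name and prompt wording below are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="order_number">
    <field name="order" type="digits">
      <prompt>Please say the order number.</prompt>
      <!-- First failure: a short apology, then the normal prompt again. -->
      <nomatch count="1">
        Sorry, I did not understand.
        <reprompt/>
      </nomatch>
      <!-- Second and later failures: more detailed guidance instead of
           the normal prompt. -->
      <nomatch count="2">
        Sorry, I still did not understand. The order number consists of
        digits only. Please say it one digit at a time.
      </nomatch>
    </field>
  </form>
</vxml>
```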
Sample code In a car inspection scenario, the worker enters the data after
a car has been repaired. He enters his worker id, the order number and other
data that belong to the order, such as part numbers, in a form before the
back-end system continues with the billing process.
Note: The VoiceXML <form> element must not be confused with the
form concept described in this pattern. <field> elements are just
one kind of element that a <form> may contain.
<form id="prepare_bill">
  <field name="id" type="digits">
    <prompt>Please say your id.</prompt>
  </field>
  ...
</form>
Context The user has to provide data that can be gathered through a
selection process. The options to be presented to the user are interrelated in
a hierarchy, or can be made to appear that way.
Forces
• Hierarchical structures are easy to understand
• Speech is one-dimensional: Long menus force the user to wait until the
wanted option is presented
Consequences
+ Dialogs are clear and structured
+ Good for beginners/novices
+ The user is guided by the system and gets more details on demand
+ The user is able to control the leaves she wants to enter or leave
o System controls dialog flow
- Dialogs are too structured
- The lost-in-space problem increases when the number of options or the
depth of the hierarchy is too high
- Poorly chosen categories force the user to re-enter the menu to find the
category that matches their reason for calling [16]
Sample code At the car inspection the worker browses for information
about a part that needs to be exchanged. The parts are sorted in a
hierarchical structure. Having navigated to the exhaust system, she is
asked for the sub-part in which she is interested.
<menu>
  <prompt>
    Say the part of the exhaust system <enumerate/>
  </prompt>
  <choice next="http://www.example.com/pipe.vxml">
    Exhaust pipe
  </choice>
  <choice next="http://www.example.com/converter.vxml">
    Catalytic converter
  </choice>
  <choice next="http://www.example.com/rear_muffler.vxml">
    Rear muffler
  </choice>
  ...
</menu>
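The sample above shows a single level of the hierarchy. A sketch of two levels in one document, with a way back to the top as one possible answer to the lost-in-space problem, might look as follows. The menu ids, the "Brakes" branch and its URLs are invented for illustration, and the document-level link with the phrase "main menu" is an assumption, not part of the pattern description.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Document-scoped link: saying "main menu" returns to the top menu
       from any level. -->
  <link next="#top">
    <grammar mode="voice" version="1.0" root="main">
      <rule id="main" scope="public">main menu</rule>
    </grammar>
  </link>

  <menu id="top">
    <prompt>Which system? <enumerate/></prompt>
    <choice next="#exhaust">Exhaust system</choice>
    <choice next="#brakes">Brakes</choice>
  </menu>

  <menu id="exhaust">
    <prompt>Say the part of the exhaust system <enumerate/></prompt>
    <choice next="http://www.example.com/pipe.vxml">Exhaust pipe</choice>
    <choice next="http://www.example.com/converter.vxml">
      Catalytic converter
    </choice>
  </menu>

  <menu id="brakes">
    <prompt>Say the part of the brakes <enumerate/></prompt>
    <choice next="http://www.example.com/pads.vxml">Brake pads</choice>
    <choice next="http://www.example.com/discs.vxml">Brake discs</choice>
  </menu>
</vxml>
```

Keeping each menu short limits the waiting time named in the forces, while the global link gives the user an escape when a category was chosen wrongly.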
Context Users with different levels of experience have to enter several
pieces of information that may have interdependencies. The dialog should
imitate a natural human-to-human conversation.
Forces
• Users want to enter data without too many restrictions
• Grammars can cover only part of the ways in which users may phrase
their input
Solution Start the conversation with an open-ended prompt, e.g. "How may
I help you?". Create a field, also known as a slot, for each piece of
information to obtain from the user. Allow the user to provide more
information than the current question expects, i.e.
[Figure: dialog state diagram with the states InputPrompt, PromptField 2
and ConfirmAll and the transition guards [filled=field1,field2] and
[filled=field2]]
Consequences
+ Dialogs are more natural
- Recognition rate decreases since the user is free to make any utterance
- The user is misled into thinking that truly free-form input is possible,
which will cause her to use words that might not belong to the grammar
- Turn taking (When should or can the user speak?) and intention
recognition (What are the goals of the user?) come up as new problems [7]
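The solution outlined above (an open-ended prompt, one slot per datum, and a fallback when the open question fails) maps fairly directly onto a VoiceXML form with a form-level grammar and an <initial> element. The following is a minimal sketch under several assumptions: the form id is made up, repair.grxml is assumed to return slots named id and order, and the built-in digits type is assumed to be available for the directed fallback.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="repair_done_sketch">
    <!-- Form-level grammar: a single utterance may fill several fields. -->
    <grammar src="repair.grxml" type="application/srgs+xml"/>

    <initial name="start">
      <prompt>How may I help you?</prompt>
      <nomatch count="1">
        Please say something like: worker seven, order one two three.
        <reprompt/>
      </nomatch>
      <nomatch count="2">
        Sorry, I will ask for one piece of information at a time.
        <!-- Marking the initial item as filled switches the form to
             directed, field-by-field prompting. -->
        <assign name="start" expr="true"/>
        <reprompt/>
      </nomatch>
    </initial>

    <!-- Fields not filled by the open question are prompted for in turn. -->
    <field name="id" type="digits">
      <prompt>Please say your worker id.</prompt>
    </field>
    <field name="order" type="digits">
      <prompt>Please say the order number.</prompt>
    </field>
  </form>
</vxml>
```

The fallback after the second failure softens the recognition-rate liability listed above: an experienced user can say everything at once, while a user who strays outside the grammar is guided as in Form Filling.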
Sample code In a car inspection scenario, the worker enters the data after
a car has been repaired. He enters his worker id, the order number and other
data that belong to the order, such as part numbers, into the system before
the back-end system continues with the billing process. The worker can enter
all data at once or in sequential order.
<form id="repair_done">
  <grammar src="repair.grxml"
           type="application/srgs+xml"/>
  ...
  <block>
    <!-- namelist takes a space-separated list of variable names -->
    <submit next="http://www.example.com/servlet/bill"
            namelist="id order part_number"/>
  </block>
</form>
4.2 Design of the system response
4.2.1 Persona
Intent Define a look & feel for voice based applications.
Context Users of voice based applications build their own mental image
of a personality or character that they infer from the application's voice
and language. Such a mental image attributes certain properties to the
virtual dialog partner, so system responses should fall within a
foreseeable range of possibilities.
Problem How to realize a look & feel for voice based applications?
Forces
• Interests of the target groups are different
• The character must match the user's mental image of the application
Consequences
+ VUI anticipates the caller's needs
Known Uses BERTI, the soccer information service from Sympalog, uses
the fictive person BERTI to interact in a Mixed Initiative dialog with the
caller. The application can be called at +49 (9131) 61 00 17.
The Poldi Gewinnspiel of the 1. FC Köln used a very famous person to
create their Persona: Lukas Podolski. The application went offline in 2005,
but was callable at +49 (180) 5 29 00 29.
Sample code In our car inspection scenario, the customer can call to get
information about the progress of the repair. She has to identify herself
with her customer id to retrieve the current repairs that are associated
with this customer.
The sample code shows how SSML tags can be used to add prosodic
information that controls the TTS output. This example can only give an
idea of this pattern. The choice of a persona is more or less a
philosophical question, and the persona has to be consistent throughout
the whole dialog.
<field name="customer">
  <prompt>
    Hi, my name is Bill. I can give you some information
    about the status of your car repair.<break/>
    Please say your
    <emphasis level="strong">customer id</emphasis> that
    is printed on your copy of the
    <emphasis level="strong">order</emphasis>.
  </prompt>
  <grammar mode="voice" root="customer_id">
    ...
  </grammar>
</field>
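Besides emphasis, SSML [19] offers the voice and prosody elements, which could be used to keep a persona's voice recognizable across prompts. The following sketch is only an illustration. The attribute values are arbitrary choices for a hypothetical "Bill" persona, and how far a given TTS engine honours them is platform dependent.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="greeting">
    <block>
      <prompt>
        <!-- Request a male voice for every prompt of the persona. -->
        <voice gender="male">
          <!-- Slightly slower and lower for a calm greeting. -->
          <prosody rate="slow" pitch="low">
            Hi, my name is Bill.
          </prosody>
          <break time="300ms"/>
          I can give you some information about the status of your
          car repair.
        </voice>
      </prompt>
    </block>
  </form>
</vxml>
```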
Forces
• Structural information helps the user to access information easily
and quickly
Consequences
+ Use of structural information in audio
Known Uses The AHA framework [9] uses Structured Audio to render
structural information of web pages in audio.
Sample code In our car inspection scenario, the worker listens to a list
of repairs that have to be done. Each item is marked with an item sound to
indicate the list structure.
<field name="detailChoice">
  ...
  <prompt>
    This car needs the following repairs:
    <mark name="oilChange"/>
    <audio src="item.raw"/> oil change <break/>
    <mark name="changeTyres"/>
    <audio src="item.raw"/> change tyres <break/>
    <mark name="checkNoise"/>
    <audio src="item.raw"/> check noise from exhaust
    <break/>
  </prompt>
  ...
</field>
4.3 Usability in Business Scenarios
Forces
• Global companies and increasing migration within multilingual
economic regions such as the EU or Mercosur force services to be offered
in multiple languages
• The default language of the company that offers the service is unknown
to users
[Figure: dialog diagram with the state "L2 Application" and the
transition guard [L2]]
Consequences
+ Allow users to easily recognize an option
+ Enable users with little foreign language skills to access the system
Sample code When a customer calls our international garage to ask whether
her car is ready, the application asks which language she wants to use.
<form id="language">
  <!-- VoiceXML fields are identified by name, not id -->
  <field name="language">
    <prompt>
      Welcome to the car inspection.
      <p xml:lang="en">Do you want to use English?</p>
      <p xml:lang="de">
        Moechten Sie Deutsch verwenden?
      </p>
      <p xml:lang="es">Desea acceder en Espanol?</p>
    </prompt>
    <grammar src="language.grxml"
    ...
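How the caller's answer might then be routed to a language-specific dialog can be sketched as follows. This sketch is separate from the sample above. It replaces the external grammar with inline <option> elements, and the form id and the per-language URLs are hypothetical. Whether a recognizer configured for one language reliably recognizes the name of another language is platform dependent and assumed here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="language_sketch">
    <field name="language">
      <prompt>
        <p xml:lang="en">For English, say English.</p>
        <p xml:lang="de">Fuer Deutsch sagen Sie bitte Deutsch.</p>
        <p xml:lang="es">Para espanol, diga espanol.</p>
      </prompt>
      <!-- Each option maps a spoken language name to a language code. -->
      <option value="en">English</option>
      <option value="de">Deutsch</option>
      <option value="es">espanol</option>
      <filled>
        <!-- Continue in a per-language document, e.g. .../de/status.vxml
             (hypothetical URL scheme). -->
        <goto expr="'http://www.example.com/' + language + '/status.vxml'"/>
      </filled>
    </field>
  </form>
</vxml>
```

Naming each language in that language lets callers recognize their option even if they understand nothing else in the prompt, which is the first benefit listed in the consequences.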
Problem How to transform the passive user wait in a phone line into a
more active, engaging experience (and keep them waiting!)?
Forces
• Waiting is avoided by customers
• Avoid users leaving the service (possibly in favor of a competitor)
• Minimize the time users spend waiting to be attended
• Physical limitations of the telephone system or used services to attend
more people at the same time
• The company wants to offer the service, but adding more operators is
costly
• Users expect a 24x7 service
• Not all call centers are available 24x7
• Let the user know that the system is aware of the waiting, apologize
and reinforce the message that the user will be attended as soon as
possible.
• Provide information about how many people are in the queue (as a
subjective delay measure) or, better, provide an estimate of the waiting
time
• Play a smooth and calming music in the background. This can be com-
bined with brief corporate or branding messages and intervals where
only the music is heard.
[Figure: dialog diagram with the state "Info" and the transition guard
[timeout]]
Consequences
+ Users become more satisfied because they know more.
Known Uses The customer service at T-Online informs the user about
alternative ways to submit a request if all technicians are busy. While
the user is in the waiting queue, she is notified from time to time that
she will be connected to a technician as soon as possible. The application
can be called at +49 (180) 5 30 50 00.
Sample code If the customer calls the garage to ask whether her car is
ready, she may end up in a queue. Every 60 seconds, she is informed about
the number of persons to be served first.
<form id="xfer">
  <transfer name="mycall" dest="tel:+1-555-123-4567"
            transferaudio="music.wav" connecttimeout="60s"
            bridge="true">
    <prompt>
      Say cancel to disconnect this call at any time.
    </prompt>
    <grammar src="cancel.grxml"
             type="application/srgs+xml"/>
    <filled>
      <object name="size"
              classid="method:///queue.size"
              data="http://www.example.com/queue.jar">
      </object>
      <prompt>
        There are only <value expr="size"/>
        persons to serve before you.
      </prompt>
      <reprompt/>
    </filled>
  </transfer>
</form>
Note that this example would not really work, since the caller would be
disconnected and enqueued again if the connection timeout expires. The
reason is the limitation of the <transfer> element in VoiceXML. A workaround
for VoiceXML is to dynamically create the audio file music.wav and
synthesize the announcements into this file. Proprietary implementations
with a CTI connection to the telephony system do not have this limitation.
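Another way to approximate the announcements without <transfer> is a page that the server regenerates on every request. This is a different technique from the workaround described above and is only sketched here under strong assumptions: the servlet URL is hypothetical, the literal 3 stands in for a queue length that the server would write into the page, and the server is assumed to return a transfer dialog instead of this page once an operator is free.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Assumed to be rendered by the server on each request; 3 is a
       placeholder for the current queue length. -->
  <var name="size" expr="3"/>
  <form id="wait">
    <block>
      <prompt>
        We are sorry to keep you waiting. You will be connected as
        soon as possible.
        There are <value expr="size"/> callers ahead of you.
        <audio src="music.wav"/>
      </prompt>
      <!-- Reload the page after the music to pick up a fresh queue
           length (hypothetical URL). -->
      <goto next="http://www.example.com/servlet/queue"/>
    </block>
  </form>
</vxml>
```

The length of music.wav then determines how often the caller hears an update, for example about 60 seconds as in the scenario above.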
Context Requesting services and goods over the phone is a common part
of our lives. Voice applications are used every day for the most diverse
purposes. From making appointments and accessing bank services to
ordering a pizza or a taxi, spoken requests are an everyday activity.
A certain amount of context information is often required for a request
to be meaningful. In voice interfaces, such information must be
provided by the user.
Forces
• Conveying directions and specifications of what is being asked can be
time consuming when the amount of information is large
• Users do not feel annoyed if they do not use the service frequently
• Users need feedback about the information that the system already
has, and temporal orientation
• Low speech recognizer performance makes data entry more error
prone
[Figure: dialog diagram with the states "Process" and "Enter" and the
transition guard [ctx!=available]]
Consequences
+ Personalized service. Information about who is calling and about
potential needs and preferences is already available
+ Fewer mistakes in the overall process of attending a request. Elimination
of repetitive data entry favors accuracy
+ More efficient interaction between the user and the operator.
Substantially more calls can be attended in the same amount of time
+ Other planning and logistic systems can be integrated and use the
contextual information to schedule service according to other criteria
such as logistics or customer rate
- Cannot be applied to services that require authentication, since a
caller ID might not be enough to identify the caller for critical
applications
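For telephone callers, the caller ID mentioned in the last consequence is typically exposed to a VoiceXML application through the session variable session.connection.remote.uri. A sketch of handing it to a back-end that looks up the caller's context could look like this. The form id, the variable name caller and the servlet URL are hypothetical, and the value may be undefined or withheld depending on platform and network.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="identify_caller">
    <!-- Address of the calling party as reported by the platform. -->
    <var name="caller" expr="session.connection.remote.uri"/>
    <block>
      <!-- The back-end (hypothetical URL) is assumed to answer with a
           dialog that is already personalized for this caller. -->
      <submit next="http://www.example.com/servlet/context"
              namelist="caller"/>
    </block>
  </form>
</vxml>
```

As the liability above states, such an identifier is a convenience and not an authentication mechanism.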
Sample code In a car inspection scenario, the worker has to get a list of
the repairs that have to be done for the next car. The car can be
identified manually, by reading the license plate to the system, or by a
location system that automatically detects that the worker is standing
near a certain car.
<form id="get_order">
  <object name="location"
          classid="method://Locator/locate"
          data="http://www.example.com/location.jar">
    <param name="worker" expr="id"/>
  </object>
  <block>
    <if cond="location">
      <submit next="http://www.example.com/read.jsp"
              namelist="location"/>
    </if>
  </block>
</form>
5 Summary
We have presented in this paper 8 new patterns in the area of Voice User
Interfaces (VUI). Several known guidelines for VUI design are covered. The
challenges of designing VUI applications have their roots in the
one-dimensional, transient and invisible nature of the audio medium,
aggravated by technical limitations such as speech synthesis quality and
speech recognition performance. These three factors introduce a particular
set of problems and requirements to the application that must be
addressed, and that we have documented in this paper.
The design patterns we presented in this work aim to help the designer
of a VUI understand the nature of the problems, and analyse and solve
these issues to provide a successful voice interface.
The new patterns consistently build on the previously mined VUI patterns
and show non-trivial design decisions in three different aspects of VUI
applications: Dialog Strategy, System Response and Usage Scenarios.
Form Filling, Menu Hierarchy and Mixed Initiative are the fundamental
choices of dialog strategy that the designer makes in her application.
Persona and Structured Audio help to design the system response.
Busy Waiting, Language Selector, and Context Aware Call
address some common usability issues in business scenarios.
These patterns are the continuation of a new pattern language,
introduced in [14]. We hope to continue growing it with the help of the
VUI community.
Acknowledgments
Thanks a lot to Amir Raveh, who shepherded this paper for EuroPLoP 2006.
His comments were very useful and helped to develop this paper. We would
also like to thank Jussi Kangasharju for reviewing the paper and Jürgen
Haas for his comments from the developer's perspective.
References
[1] C. Alexander, S. Ishikawa, and M. Silverstein. A Pattern Language:
Towns, Buildings, Construction. Oxford University Press, UK, 1977.
[2] Michael H. Cohen, James P. Giangola, and Jennifer Balogh. Voice User
Interface Design. Addison-Wesley, Boston, January 2004.
[3] Andy Dearden and Janet Finlay. Pattern Languages in HCI: A Critical
Review. Human-Computer Interaction, 21, 2006.
[5] James A. Larson et al. Ten Guidelines for Designing a Successful Voice
User Interface. Speech Technology Magazine, January 2005.
[8] Speech Science Institute. Guidelines for Developing Voice User
Interfaces. http://www.ssi-interactive.com/vui_guidelines1.htm, 2004.
[11] SUN Microsystems. Java Speech API Programmer's Guide. SUN
Microsystems, 1998.
[12] George A. Miller. The magical number seven, plus or minus two: Some
limits on our capacity for processing information. Psychological Review,
63:81-97, 1956.
[14] Dirk Schnelle, Fernando Lyardet, and Tao Wei. Audio Navigation
Patterns. In EuroPLoP 2005 Conference Proceedings, 2005.
[15] Ben Shneiderman. Designing the User Interface: Strategies for
Effective Human-Computer Interaction. Addison-Wesley Longman Publishing
Co., Inc., Boston, MA, USA, 1986.
[16] Bernhard Suhm, Barbara Freeman, and David Getty. Curing the Menu
Blues in Touch-tone Voice Interfaces. In CHI '01: CHI '01 extended
abstracts on Human factors in computing systems, pages 131-132, New
York, NY, USA, 2001. ACM Press.
[19] W3C. Speech Synthesis Markup Language (SSML) Version 1.0.
http://www.w3.org/TR/speech-synthesis/, September 2004.
[20] Nicole Yankelovich. How Do Users Know What to Say? ACM
interactions, 3(6), November 1996.