CHAPTER 11 - Multimodal Interaction
Providing access to information through more than one mode of interaction is an important
principle of universal design. Such design relies on multi-modal interaction.
We have five senses: sight, hearing, touch, taste and smell.
Sight is the predominant sense for the majority of people, and most interactive systems
consequently use the visual channel as their primary means of presentation, through graphics,
text, video and animation.
However, sound is also an important channel, keeping us aware of our surroundings, monitoring
people and events around us, reacting to sudden noises, providing clues that switch our attention
from one thing to another.
It can also have an emotional effect on us, particularly in the case of music. Music is almost
completely an auditory experience, yet is able to alter moods, conjure up visual images, evoke
atmospheres or scenes in the mind of the listener.
Touch, too, provides important information: tactile feedback forms an intrinsic part of the
operation of many common tools – cars, musical instruments, pens, anything that requires
holding or moving. It can form a sensuous bond between individuals, communicating a wealth of
non-verbal information.
Taste and smell are often less appreciated (until they are absent), but they also provide useful
information in daily life: checking if food is bad, detecting early signs of fire, noticing that
manure has been spread in a field, and simply giving pleasure.
Examples of the use of sensory information are easy to come by but the important point is that
our everyday interaction with each other and the world around us is multi-sensory, each sense
providing different information that informs the whole. Since our interaction with the world is
improved by multi-sensory input, it makes sense that interactive systems that utilize more than
one sensory channel will also provide a richer interactive experience.
In addition, such multi-sensory or multi-modal systems support the principle of redundancy
required for universal design, enabling users to access the system using the mode of interaction
that is most appropriate to their abilities.
The majority of interactive computer systems are predominantly visual in their interactive
properties; often WIMP-based, they usually make use of only rudimentary sounds while adding
more and more visual information to the screen. As systems become more complex, the visual
channel may be overloaded if too much information is presented all at once. This may lead to
frustration or errors in use.
Using the other sensory channels relieves the visual channel of the pressure of providing all
the required information, and so interaction should improve.
It should always be remembered that multi-modal interaction is not just about enhancing the
richness of the interaction, but also about redundancy. Redundant systems provide the same
information through a range of channels, so, for example, information presented graphically is
also captioned in readable text or speech, or a verbal narrative is provided with text captions. The
aim is to provide at least an equivalent experience to all, regardless of their primary channel of
interaction.
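To make the redundancy principle concrete, here is a minimal Python sketch in which one message is rendered through several output channels at once. The `Notification` type and the three renderer functions are hypothetical names chosen for illustration; a real system would hand the results to a display, a speech synthesizer and an audio player rather than returning strings.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    """One piece of information to be presented redundantly (hypothetical type)."""
    text: str   # the core message
    icon: str   # name of a graphical symbol for the visual channel
    sound: str  # name of a non-speech audio cue


def render_visual(n: Notification) -> str:
    # Visual channel: the graphic is always accompanied by a readable caption.
    return f"[{n.icon}] {n.text}"


def render_speech(n: Notification) -> str:
    # Auditory channel: the same text, destined for a text-to-speech engine.
    return f"say: {n.text}"


def render_sound(n: Notification) -> str:
    # Non-speech audio cue (for example an earcon) signalling the event type.
    return f"play: {n.sound}"


def present(n: Notification, channels) -> list[str]:
    """Send the same information through every available channel."""
    return [render(n) for render in channels]


if __name__ == "__main__":
    msg = Notification(text="File saved", icon="disk", sound="chime")
    for line in present(msg, [render_visual, render_speech, render_sound]):
        print(line)
```

Because every renderer receives the whole message, a user who cannot see the screen, or who cannot hear the audio, still gets the same information from the remaining channels.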
Video games offer further evidence of the value of sound: experts tend to score less well when the
sound is turned off than when it is on, because they pick up vital clues and information from the
sound while concentrating their visual attention elsewhere. The dual presentation of information
through sound and vision supports universal design, by enabling access for users with visual and
hearing impairments respectively. It also enables information to be accessed in poorly lit or
noisy environments.
Sound can convey transient information and does not take up screen space, making it potentially
useful for mobile applications.
3. Gesture recognition
Gesture is a component of human–computer interaction that has become the subject of attention
in multi-modal systems. Being able to control the computer with certain movements of the hand
would be advantageous in many situations where there is no possibility of typing, or when other
senses are fully occupied.
a) Visual impairment
The rise in the use of graphical interfaces has reduced the possibilities for visually impaired users. In
text-based interaction, screen readers using synthesized speech or braille output devices provided
complete access to computers: input relied on touch-typing, with these mechanisms providing
the output.
However, today the standard interface is graphical. Screen readers and braille output are far more
restricted in interpreting the graphical interface. A number of systems use sound to provide
access to graphical interfaces for people with visual impairment.
A range of sound-based approaches, such as speech, earcons and auditory icons, has been used in
interfaces for blind users.
Auditory icons are short sound messages that convey information about a computer object, event
or situation by analogy with everyday sounds. For instance, when we mark a computer file for
deletion, we might hear the sound of an object crashing into a wastebasket. The type of object we
hear might indicate the type of object we've discarded (whether it is a text file, a program, etc.).
Earcons differ from auditory icons in that they are generally synthesized tones or sound patterns
with no direct relationship to the event they signal. Because the mapping is arbitrary, users must
learn what each earcon means before the sound acquires a specific meaning.
Examples of earcons include the sound an iPhone makes when we type a message, and tones that
signal the arrival of a mail message, a warning that the system will go offline in a few minutes,
or a misspelled word in user input.
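Because earcons are synthesized rather than recorded, they can be generated directly from tones. The minimal Python sketch below uses only the standard `math`, `struct` and `wave` modules to build a short rising two-note pattern of the kind that might signal the arrival of mail. The note frequencies, durations, sample rate and the file name `new_mail.wav` are illustrative assumptions, not a standard.

```python
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second; a low rate keeps the example small


def tone(freq_hz: float, duration_s: float, volume: float = 0.5) -> list[int]:
    """Return 16-bit PCM samples for a sine tone."""
    n = round(SAMPLE_RATE * duration_s)
    return [int(volume * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
            for i in range(n)]


def earcon(notes) -> list[int]:
    """Concatenate (frequency, duration) pairs into one sound pattern."""
    samples: list[int] = []
    for freq, dur in notes:
        samples.extend(tone(freq, dur))
    return samples


def save_wav(samples: list[int], path: str) -> None:
    """Write mono 16-bit little-endian samples to a WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(SAMPLE_RATE)
        f.writeframes(struct.pack(f"<{len(samples)}h", *samples))


# Hypothetical "new mail" earcon: two rising notes (C5 then E5), 0.15 s each.
NEW_MAIL = [(523.25, 0.15), (659.25, 0.15)]

if __name__ == "__main__":
    save_wav(earcon(NEW_MAIL), "new_mail.wav")
```

The rising pair means "new mail" only because the designer decided so; nothing in the sound itself refers to mail, which is why earcons, unlike auditory icons, have to be learned.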
b) Hearing impairment
Computer technology can actually enhance communication opportunities for people with hearing
loss. Email and instant messaging are great levellers and can be used equally by hearing and deaf
users.
Gesture recognition has also been proposed to enable translation of signing to speech or text,
again to improve communication, particularly with non-signers.
c) Physical impairment
Users with physical disabilities vary in the amount of control and movement that they have over
their hands, but many find the precision required in mouse control difficult. Speech input and
output are an option for those without speech difficulties.
d) Speech impairment
For users with speech and hearing impairments, multimedia systems provide a number of tools
for communication, including synthetic speech and text-based communication and conferencing
systems. Textual communication is slow, however, which can reduce the effectiveness of the
communication.
A speech synthesizer produces a computerized voice that turns written text into speech; the
technique is often called text-to-speech. The computer reads the words aloud in a simulated
voice. The aim is not only to make machines talk but also to make them sound like humans of
different ages and genders.
Predictive algorithms have been used to anticipate the words being typed and fill them in,
reducing the amount of typing required. Conventions can help to restore context that is lost
without face-to-face communication, for example the ‘smilie’ :-) to indicate a joke.
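The prediction step can be sketched as prefix matching against word frequencies. This minimal Python sketch assumes a small hypothetical corpus of previously typed text and a hypothetical `WordPredictor` class; practical systems use far larger language models and also take the preceding words into account.

```python
from collections import Counter


class WordPredictor:
    """Suggest completions for a partly typed word, most frequent first."""

    def __init__(self, corpus: str):
        # Count how often each word appears in previously typed text.
        self.counts = Counter(corpus.lower().split())

    def learn(self, word: str) -> None:
        # Update counts as the user types, so frequently used words rise in the ranking.
        self.counts[word.lower()] += 1

    def suggest(self, prefix: str, k: int = 3) -> list[str]:
        prefix = prefix.lower()
        matches = [w for w in self.counts if w.startswith(prefix)]
        # Sort by descending frequency, then alphabetically so ties are stable.
        matches.sort(key=lambda w: (-self.counts[w], w))
        return matches[:k]


if __name__ == "__main__":
    p = WordPredictor("see you soon see you tomorrow sorry I am late see you soon")
    print(p.suggest("so"))  # ['soon', 'sorry']
    print(p.suggest("t"))   # ['tomorrow']
```

Each accepted suggestion saves the remaining keystrokes of the word, which matters most for users for whom typing is slow or effortful.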
Older people
Older people and children have specific needs when it comes to interactive technology.
The proportion of disabilities increases with age: more than half of people over 65 have some
kind of disability. Just as in younger people with disabilities, technology can provide support for
failing vision, hearing, speech and mobility. New communication tools, such as email and instant
messaging, can provide social interaction in cases where lack of mobility or speech difficulties
reduce face-to-face possibilities.
Mobile technologies can be used to provide memory aids where there is age-related memory
loss.
Basic universal design principles are important here. Access to information must make use of
redundancy and support the use of access technologies.
Designs must be clear and simple and forgiving of errors. In addition, thought needs to be given
to sympathetic and relevant training aimed at the user’s current knowledge and skills.
In spite of the potential benefits of interactive technology to older people, very little attention has
been paid to this area until recently. Researchers are now beginning to address issues such as
how technology can best support older people, what the key design issues are, and how older
people can be effectively included in the design process.
Children
Paper prototyping, using art tools familiar to children, enables both adults and children to
participate in building and refining prototype designs on an equal footing. The approach has been
used effectively to develop a range of new technologies for children. As well as their likes and
dislikes, children’s abilities will also be different from those of adults. Younger children may
have difficulty using a keyboard, for instance, and may not have well-developed hand–eye
coordination. Pen-based interfaces can be a useful alternative input device. Again, universal
design principles guide us in designing interfaces that children can use. Interfaces that allow
multiple modes of input, including touch or handwriting, may be easier for children than
keyboard and mouse. Redundant displays, where information is presented through text, graphics
and sound will also enhance their experience.
Key factors that we need to consider carefully if we are to practice universal design include
language, cultural symbols, gestures and use of color.
Symbols have different meanings in different cultures. Ticks (✓) and crosses (✗)
represent positive and negative respectively in some cultures, and are interchangeable in others.
We cannot assume that everyone will interpret symbols in the same way and should ensure that
alternative meanings of symbols will not create problems or confusion for the user. The study of
the meaning of symbols is known as semiotics and is a worthwhile diversion for the student of
universal design.