It Will Happen
ABSTRACT
Growing awareness of the possible over-dominance of the visual
modality in the field of interactive media and of the existence of
untapped dimensions of sound has led many developers to embark
upon the development of sound-based projects. Many of these
projects restrict themselves to delivering and fostering audio-visual
and audio-physical artworks. In this paper, we advocate the
progression of audio-visual applications into voice-visual
performances, and the evolution of audio-physical applications into
voice-physical installations. We suggest new forms of voice-physical
artwork aimed at the use of paralinguistic vocalizations to
physically control real inanimate objects.

Categories and Subject Descriptors
H.5.5 Sound and Music Computing (e.g., HCI): Miscellaneous.

General Terms
Performance, Design, Experimentation

Keywords
Paralanguage, vocal input, vocal telekinesis, voice-physical

Copyright is held by the author/owner(s).
MM'06, October 23–27, 2006, Santa Barbara, California, USA.
ACM 1-59593-447-2/06/0010.

1. INTRODUCTION
Motion can cause sound, but can sound cause motion? If not, then
can sound be programmed to cause motion?

This question has resonated in our minds every time we watched the
table vibrate below the stereo's speakers while we tested our
sound-based projects. The sound output from those speakers not only
moved the surface of the table but also directed our thoughts towards
a way to use sound to physically move inanimate objects in real life.
Most people who have conducted the 'sound-vibrations experiment'
with rice at school know that sound does actually cause motion; it
causes objects to vibrate at the frequency of the waveform. Each
object has a set of natural frequencies at which it may vibrate.
While the stereo is playing loud music, the air around the stereo
vibrates. If one of the sound frequencies generated from the stereo
matches the natural frequency of a nearby object, resonance will
occur. It is also known that some sopranos are able to shatter glass
with their voices. When the singer's voice is loud enough and when
its pitch matches the resonance frequency of the glass, the glass may
shatter. Another phenomenon that can be caused by resonance is
the collapse of a bridge as a result of soldiers’ rhythmic marching.

Within the field of interactive media, sound can also be
programmed to "cause" any kind of movement. Nonetheless, most
of the attempts to explore correlates of sound have been made to
map acoustic characteristics to visual screen-based parameters. As a
result, many sound-related installations are audiovisual applications
while very few, in comparison, are audio-physical, and hardly any
are voice-physical. In many of the few audio-physical applications
which exist today, physical movement is what controls sound and
not the opposite. Focusing on sound output, or even sound input, as
a primary means of interacting with an installation may allow the use
of other modes of interaction simultaneously. It facilitates the
avoidance of using the traditional mouse and keyboard. Using the
acoustic mode as the main mode of interaction may reduce visual
interaction with the screen, but it may also facilitate and open up
possibilities for visual interaction with objects other than, and away
from, the screen. William Gaver explains this in terms of attention
directionality:

“Visual objects exist in space but over time, while sound exists in
time, but over space […] One does not have to face a source of sound
to listen to it. This implies that sound can convey information to
users despite their orientation, while visual information depends on
users' directed attention” [2].

Many audiovisual works, however, seem to neglect the fact that
sound has overlooked dimensions that allow for its use as an input
as well as an output. One neglected dimension is the paralinguistic
dimension, which includes voice characteristics (pitch, volume,
timbre, etc.), emotional vocalizations (laughing, crying, screaming,
etc.), and vocal segregates (fillers like ahh and mmm, pauses, and
other hesitation phenomena). Using this dimension of voice in an
interactive application does not require access to the verbal content.
This may expand the cross-cultural scope of the interactive work
and allow its use by users regardless of their language. The use
of paralinguistic vocal control creates a real-time causal relationship
between the acoustic input and the visual, physical, or haptic
output, and may therefore facilitate continuity and direct
engagement. It may also allow an installation to be used for
therapeutic and vocal training purposes.

Through reversing the role of sound in an installation by making it a
controlling factor rather than a controllable element, we hope to
introduce the terms voice-visual and voice-physical to the field of
interactive media and hence to enrich the repertoire of interaction.
In this paper, we will use the term voice-visual to refer to the use of
voice as an input and the use of screen-based visuals as an output.
The term voice-physical will be used to refer to the use of voice as
an input and non-screen-based physical events as an output.
We started exploring the use of paralinguistic voice as an input by
developing a number of voice-visual games. One of these games is
Sing Pong, a voice-controlled version of 'Pong'. Whereas in
traditional Pong the arrow keys on a keyboard are used to move
the paddles, Sing Pong allows players to move the paddles using
their voices and shadows. The paddle's height is mapped to volume,
while its position is mapped to the position of the player's shadow
on a projected screen. Players' interaction with Sing Pong in an
exhibition in London motivated us to further explore possible novel
uses of paralinguistic voice. We soon realized that moving the
players away from the monitor was not the only advantage that
vocal input allows, as it also allows moving the output out of the
monitor.

2. VOCAL TELEKINESIS
For our purposes, we define Vocal Telekinesis as the physical
control of inanimate objects via simple paralinguistic vocal input.
We aim to explore a variety of novel voice-physical mappings
which will extend beyond the graphical output to include physical
feedback such as changes in the size, temperature, brightness, speed,
direction, and height of real objects. Our first implementation of
Vocal Telekinesis was in sssSnake.

sssSnake is a two-player voice-physical version of the classic 'Snake'
game (see Figure 1). It consists of an installation table on which a
real coin is placed and a virtual snake is projected. One player
controls the snake by uttering "SSS", while the other player moves
the coin away from the snake by uttering "AHH". The position
of a player round the table determines the direction of the coin's or
snake's path. The snake moves towards the player uttering "SSS",
and the coin moves away from the player uttering "AHH". By
prompting players to run round the playing area, the game
encourages physical movement as well as vocal activity. The "SSS"
and "AHH" voices are not differentiated through speech recognition
but rather through the detection of frequency range differences
between the high-pitched "SSS" and the low-pitched "AHH".

Figure 1. sssSnake: a voice-physical version of the
classic 'Snake' game (2005)

We programmed sssSnake in Lingo/Macromedia Director, using the
asFFT Xtra (external software module) which employs the Fast
Fourier Transform (FFT) algorithm [3]. The movement of the coin
involved using the Hewlett-Packard Graphics Language (HPGL) to
program a plotter to move its head in response to the "AHH" voice.
A magnet was attached to the plotter head, and the plotter was
hidden below the surface of the installation table.

3. CONCLUSION AND FUTURE WORK
Many multimedia applications are used silently, and any
vocalizations are usually made only while interacting with speech
recognition software, during a Voice over Internet Protocol (VoIP)
audio conversation, or while cursing a crashed computer. Most of
these vocalizations consist of speech, and whenever non-speech
sound is digitally involved, it is almost always generated by the
computer in the form of beeps and similar sounds. We believe that
voice is still partly mute in the field of interactive media. The use of
its verbal dimension mainly as a communication tool overshadows
many other possible uses. We hope that our voice-visual and
voice-physical installations will blur the boundaries between input and
output by enabling the voice to be both an input to the installation
and an output to the audience. One of our goals is to use the various
emerging non-conventional input mechanisms to exploit the
potential of the body to be a rich source of input; rich enough to be
an output too.

Moreover, most causality experiments to date have concerned a
physical object causing another physical object to move, or a
physical event causing sound (e.g., Michotte’s experiments in [1]).
As no causality studies appear to exist in the voice-physical area
addressed by our research, our future work will involve designing a
usability test of the perception of causality in voice-controlled
applications, with particular emphasis on the issue of latency.

We think that using voice as a physical tool pushes some of the
boundaries of human-computer interaction and allows interaction
with forms of output beyond conventional screen-based
two-dimensional and three-dimensional visuals. It helps the interface
extend beyond the monitor and become an integral part of the
physical world.

4. REFERENCES
[1] Cavazza, M., Lugrin, J., Crooks, S., Nandi, A., Palmer, M., and
Le Renard, M. Causality and virtual reality art. In Proceedings
of the 5th Conference on Creativity & Cognition
(London, United Kingdom, April 12-15). C&C '05. ACM
Press, New York, NY, 4-12, 2005.
[2] Gaver, W. The Sonic Finder: An Interface that Uses Auditory
Icons. In Human-Computer Interaction, 4(1), 67-94, 1989.
[3] Schmitt, A. asFFT Xtra. http://www.as-ci.net/asFFTXtra,
2003.
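
As a rough illustration only, the frequency-range test described in
Section 2 (telling a high-pitched "SSS" from a low-pitched "AHH"
without speech recognition) could be sketched in Python as follows.
sssSnake itself was written in Lingo with the asFFT Xtra, whose
interface is not reproduced here. The 2 kHz cutoff, the energy
threshold, the frame length, and the names fft and classify_frame are
all assumptions made for this sketch, not values taken from the
installation.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def classify_frame(samples, sample_rate, cutoff_hz=2000.0, min_energy=1e-3):
    """Label one audio frame as "SSS", "AHH", or None (too quiet).

    cutoff_hz and min_energy are hypothetical tuning values.
    """
    n = len(samples)
    if n < 2 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    spectrum = fft(samples)
    low = high = 0.0
    # Skip the DC bin and use only the positive-frequency half.
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        power = abs(spectrum[k]) ** 2
        if freq < cutoff_hz:
            low += power
        else:
            high += power
    # Normalise by n^2 so the threshold does not depend on frame length.
    if (low + high) / (n * n) < min_energy:
        return None
    return "SSS" if high > low else "AHH"


if __name__ == "__main__":
    rate, n = 16000, 512
    # Synthetic stand-ins for the two vocalizations: pure tones.
    low_tone = [math.sin(2 * math.pi * 250 * i / rate) for i in range(n)]
    high_tone = [math.sin(2 * math.pi * 5000 * i / rate) for i in range(n)]
    print(classify_frame(low_tone, rate))   # AHH
    print(classify_frame(high_tone, rate))  # SSS
```

Comparing energy in two broad bands, rather than matching exact
pitches, keeps the sketch independent of any one player's voice. A real
installation would also need to window and smooth successive frames.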