2 Theory of Sonification
Bruce N. Walker and Michael A. Nees
Principles of Sonification: An Introduction to Auditory Display and Sonification
1. Intentional sounds are purposely engineered to perform as an information display (see
Walker & Kramer, 1996), and stand in contrast to incidental sounds, which are non-
engineered sounds that occur as a consequence of the normal operation of a system (e.g., a
car engine running). Incidental sounds may be quite informative (e.g., the sound of wind
rushing past can indicate a car’s speed), though this characteristic of incidental sounds is
serendipitous rather than designed. The current chapter is confined to a discussion of
intentional sounds.
discussed at length by Edworthy and Hellier (2006), blur the line between alarms
and status indicators, discussed next.
and health data ("Global music - The world by ear", 2006), among others. Quinn
(2001, 2003) has used data sonifications to drive ambitious musical works, and
he has published entire albums of his compositions.
2.4.2.1 Monitoring
Monitoring requires the listener to attend to a sonification over time, to
detect events (represented by sounds), and to identify the meaning of
the event in the context of the system’s operation. These events are generally
discrete and occur as the result of the attainment of some threshold in the system.
Sonifications for monitoring tasks communicate the crossing of a threshold to the
user, and they often require further (sometimes immediate) action in order for the
system to operate properly (see the treatment of alerts and notifications above).
Kramer (1994) has described monitoring tasks as “template matching”
in that the listener has a priori knowledge and expectations of a particular sound
and its meaning. The acoustic pattern is already known, and the listener’s task is
to detect and identify the sound from a catalogue of known sounds. Consider a
worker in an office environment that is saturated with intentional sounds from
common devices, including telephones, fax machines, and computer interface
sounds (e.g., email or instant messaging alerts). Part of the listener’s task within
such an environment is to monitor these devices. The alerting and notification
sounds emitted from these devices facilitate that task in that they produce known
acoustic patterns; the listener must hear and then match the pattern against the
catalogue of known signals.
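Kramer's "template matching" characterization lends itself to a simple computational analogy. The sketch below is a hypothetical illustration, not a published model of auditory monitoring: it scores an incoming signal against a catalogue of known alert patterns by peak normalized cross-correlation and reports the best match, if any clears a detection threshold.

```python
import numpy as np

def normalized_xcorr_peak(signal, template):
    """Peak of the normalized cross-correlation of `signal` with `template`."""
    tmpl = (template - template.mean()) / (template.std() * len(template))
    sig = (signal - signal.mean()) / signal.std()
    return float(np.max(np.correlate(sig, tmpl, mode="valid")))

def identify(signal, catalogue, threshold=0.5):
    """Score the signal against every known pattern; return the name of the
    best match, or None when nothing clears the detection threshold."""
    scores = {name: normalized_xcorr_peak(signal, tmpl)
              for name, tmpl in catalogue.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

The catalogue plays the role of the listener's a priori knowledge; detection and identification collapse into a single maximum-correlation decision, which is of course a drastic simplification of human listening.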
2. Human factors scientists have developed systematic methodologies for describing and
understanding the tasks of humans in a man-machine system. Although an in-depth
treatment of these issues is beyond the scope of this chapter, see Luczak (1997) for
thorough coverage of task analysis purposes and methods.
the interface, such as scrolling, clicking, and dragging with the mouse, or
deleting files, etc. Whereas the task that follows from monitoring an auditory
display cannot occur in the absence of the sound signal (e.g., one can’t answer a
phone until it rings), the task-related processes in a computer interface can occur
with or without the audio. The sounds are employed to promote awareness of the
processes rather than to solely trigger some required response.
Similarly, soundscapes—ongoing ambient sonifications—have been
employed to promote awareness of dynamic situations (a bottling plant, Gaver et
al., 1991; financial data, Mauney & Walker, 2004; a crystal factory, Walker &
Kramer, 2005). Although the soundscape may not require a particular response at
any given time, it provides ongoing information about a situation to the listener.
magnitude to a baseline or reference tone (i.e., determine the scaling factor); and
5) report the value.
Point comparison, then, is simply comparing more than one datum;
thus, point comparison involves performing point estimation twice (or more) and
then using basic arithmetic operations to compare the two points. In theory, point
comparison should be more difficult for listeners to perform accurately than
point estimation, as listeners have twice as much opportunity to make errors, and
there is the added memory component of the comparison task. Empirical
investigations to date, however, have not examined point comparison tasks with
sonifications.
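The intuition that point comparison inherits the noise of two point estimations can be made concrete with a toy simulation. Everything below is illustrative: the linear mapping, the data and frequency ranges, and the 3% perceptual-noise figure are assumptions for the sketch, not values from the literature.

```python
import random

V_MIN, V_MAX = 0.0, 100.0      # assumed data range
F_MIN, F_MAX = 220.0, 880.0    # assumed display range in Hz

def encode(value):
    """Display side: map a data value to a pitch (linear, for simplicity)."""
    frac = (value - V_MIN) / (V_MAX - V_MIN)
    return F_MIN + frac * (F_MAX - F_MIN)

def estimate(freq, rel_noise=0.03):
    """Listener side: invert the mapping, with multiplicative perceptual
    noise (the 3% figure is an arbitrary stand-in, not an empirical value)."""
    heard = freq * (1.0 + random.gauss(0.0, rel_noise))
    frac = (heard - F_MIN) / (F_MAX - F_MIN)
    return V_MIN + frac * (V_MAX - V_MIN)

# Point comparison: two estimations plus arithmetic, so estimation
# error enters the comparison twice.
difference = estimate(encode(70.0)) - estimate(encode(30.0))
```

Because `difference` combines two noisy estimates, its error variance is roughly the sum of the two individual variances, which is the intuition behind expecting point comparison to be the harder task.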
the association between the sound and its intended meaning is more direct and
should require little or no learning, but many of the actions and processes in a
human-computer interface have no inherent auditory representation. For
example, what should accompany a “save” action in a word processor? How can
that sound be made distinct from a similar command, such as “save as”?
Earcons, on the other hand, use sounds as symbolic representations of actions or
processes; the sounds have no ecological relationship to their referent (see
Blattner, Sumikawa, & Greenberg, 1989; Kramer, 1994). Earcons are made by
systematically manipulating the pitch, timbre, and rhythmic properties of sounds
to create a structured set of non-speech sounds that can be used to represent any
object or concept through an arbitrary mapping of sound to meaning. Repetitive
or related sequences or motifs may be employed to create “families” of sounds
that map to related actions or processes. While earcons can represent virtually
anything, making them more flexible than auditory icons, a trade-off exists in
that the abstract nature of earcons may require longer learning time or even
formal training in their use. Walker (2006) has discussed a new type of interface
sound, the spearcon, which is intended to overcome the shortcomings of both
auditory icons and earcons. Spearcons are created by speeding up a spoken
phrase to the point where it is no longer recognizable as speech, and as such can
represent anything (like earcons can), but are non-arbitrarily mapped to their
concept (like auditory icons). The main point here is that there are tradeoffs
when choosing how to represent a concept with a sound, and the designer needs
to make explicit choices with the tradeoffs in mind.
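To give a rough feel for spearcon construction, the sketch below time-compresses a waveform by simple resampling. This shortcut also transposes the pitch upward by the compression factor; a fuller implementation would likely use a pitch-preserving time-scale modification algorithm so that compression does not double as transposition. The function name and the factor of 3 are illustrative choices, not parameters from the spearcon literature.

```python
import numpy as np

def naive_spearcon(speech, factor=3.0):
    """Crude time compression by linear-interpolation resampling: the output
    is 1/factor as long as the input. Note this also shifts pitch up by
    `factor`, unlike pitch-preserving time-scale modification."""
    n_out = int(len(speech) / factor)
    old_idx = np.linspace(0, len(speech) - 1, n_out)
    return np.interp(old_idx, np.arange(len(speech)), speech)
```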
2.5.3.3 Scaling
Once an effective mapping and polarity has been chosen, it is important
to determine how much change in, say, the pitch of a sound is used to convey a
given change in, for example, temperature. Matching the data-to-display scaling
function to the listener’s internal conceptual scaling function between pitch and
temperature is critical if the sonification is to be used to make accurate
comparisons and absolute or exact judgments of data values, as opposed to
simple trend estimations. This is a key distinction between sonifications and
warnings or trend monitoring sounds. Again, Walker (2002, in press) has
empirically determined scaling factors for several mappings, in both positive and
negative polarities. Such values begin to provide guidance about how different
data sets would be represented most effectively. However, it is important not to
over-interpret the exact exponent values reported in any single study, to the point
where they are considered “the” correct values for use in all cases. As with any
performance data that are used to drive interface guidelines, care must always be
taken to avoid treating the numbers as components of a design recipe. Rather,
they should be treated as guidance, at least until repeated measurements and
continued application experience converge toward a clear value or range.
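The notion of a data-to-display scaling function can be written down directly. In the magnitude-estimation tradition, such functions are often power functions, so that equal data ratios map onto equal display ratios. The reference points and the default exponent below are placeholders for illustration, deliberately not any of the empirically derived values just cautioned against over-interpreting.

```python
def scale_frequency(value, value_ref=20.0, freq_ref=440.0, exponent=1.0):
    """Data-to-display scaling as a power function: a data value equal to
    `value_ref` maps to `freq_ref` Hz, and data ratios map to display
    ratios raised to `exponent`. All constants here are placeholders."""
    return freq_ref * (value / value_ref) ** exponent
```

With `exponent=1.0` a doubling of the data value doubles the displayed frequency; an exponent below 1 compresses the display range, and one above 1 expands it, which is the kind of choice the empirical scaling studies aim to inform.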
Beyond the somewhat specific scaling factors discussed to this point,
there are some practical considerations that relate to scaling issues. Consider, for
2.5.3.5 Context
Context refers to the purposeful addition of non-signal information to a
display (Smith & Walker, 2005; Walker & Nees, 2005a). In visual displays,
additional information such as axes and tick marks can increase readability and
aid perception by enabling more effective top-down processing (Bertin, 1983;
Tufte, 1990). A visual graph without context cues (e.g., no axes or tick marks)
provides no way to estimate the value at any point. The contour of the line
provides some incidental context, which might allow an observer to perform a
trend analysis (rising versus falling), but the accurate extraction of a specific
value (i.e., a point estimation task) is impossible.
Even sonifications that make optimal use of mappings, polarities, and
scaling factors need to include contextual cues equivalent to axes, tick marks and
labels, so the listener can perform the interpretation tasks. Recent work (Nees &
Walker, 2006; Smith & Walker, 2005) has shown that even for simple
sonifications, the addition of some kinds of context cues can provide useful
information to users of the display. For example, simply adding a series of clicks
to the display can help the listener keep better track of time, which keeps
their interpretation of the graph values more “in phase” (see also Bonebright et
al., 2001; Flowers et al., 1997; Gardner, Lundquist, & Sahyun, 1996). Smith and
Walker (2005) showed that when the clicks played at a rate that was twice the
rate of the sounds representing the data, the two sources of information
combined like the major and minor tick marks on the x-axis of a visual graph.
The addition of a repeating reference tone that signified the maximum value of
the data set provided dramatic improvements in the attempts by listeners to
estimate exact data values, whereas a reference tone that signified the starting
value of the data did not improve performance. Thus, it is clear that adding
context cues to auditory graphs can play the role that x- and y-axes play in visual
graphs, but not all implementations are equally successful. Researchers have
only scratched the surface of possible context cues and their configurations, and
we need to implement and validate other, perhaps more effective, methods (see,
e.g., Nees & Walker, 2006).
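The click-track and reference-tone context cues described above can be sketched in a few lines. The sample rate, amplitudes, and durations below are arbitrary choices for illustration, not parameters from the cited studies: data values become a sequence of tones, a click track runs at twice the data rate (coarse "tick marks"), and an optional sustained reference tone marks a salient data value such as the maximum.

```python
import numpy as np

SR = 8000  # illustrative sample rate

def tone(freq, dur, sr=SR):
    """A sine tone of the given frequency and duration."""
    t = np.arange(int(dur * sr)) / sr
    return np.sin(2 * np.pi * freq * t)

def click(dur, sr=SR):
    """A 5 ms broadband click, padded with silence to `dur` seconds."""
    out = np.zeros(int(dur * sr))
    out[: int(0.005 * sr)] = 1.0
    return out

def auditory_graph(freqs, note_dur=0.25, ref_freq=None):
    """Mix data tones with a click track at twice the data rate and, when
    `ref_freq` is given, a quiet sustained reference tone."""
    data = np.concatenate([tone(f, note_dur) for f in freqs])
    clicks = np.concatenate([click(note_dur / 2)
                             for _ in range(2 * len(freqs))])
    mix = data + 0.3 * clicks[: len(data)]
    if ref_freq is not None:
        mix += 0.2 * tone(ref_freq, note_dur * len(freqs))
    return mix
```

For example, `auditory_graph([440.0, 550.0, 660.0], ref_freq=660.0)` renders three data points with clicks marking half-note intervals and a reference tone at the data maximum.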
2.6.2.4 Training
Sonification offers a novel approach to information representation, and
this novelty stands as a potential barrier to the success of the display unless the
user can be thoroughly and efficiently acclimated to the meaning of the sounds
being presented. Visual information displays owe much of their success to their
pervasiveness as well as to users’ formal education and informal experience at
deciphering their meanings. Graphs, a basic form of visual display, are
incredibly pervasive in print media (see Zacks et al., 2002), and virtually all
children are taught how to read graphs from a very young age in formal
education settings. Complex auditory displays currently are not pervasive, and
users are not taught how to comprehend auditory displays as part of a standard
education. This problem can be partially addressed by exploiting the
analytic prowess and intuitive meaning-making processes of the auditory
system (see Gaver, 1993), but training will likely be necessary even when
ecological approaches to sound design are pursued.
To date, little attention has been paid to the issue of training sonification
users. Empirical findings suggesting that sonifications can be effective are
particularly encouraging considering that the majority of these studies sampled
naïve users who had presumably never listened to sonifications before entering
the lab. For the most part, information regarding performance ceilings for
sonifications remains speculative, as few, if any, studies have examined the role of
extended training in performance.
As Watson and Kidd (1994) suggested, many populations of users may
be unwilling to undergo more than nominally time-consuming training programs,
but research suggests that even brief training for sonification users offers
benefits. Smith and Walker (2005) showed that brief training for a point
estimation task (i.e., naming the Y axis value for a given X axis value in an
auditory graph) resulted in better performance than no training, while Walker
and Nees (2005b) further demonstrated that a brief training period (around 20
min) can reduce performance error by 50% on a point estimation sonification
task. Recent and ongoing work is examining exactly what kinds of training
methods are most effective for different classes of sonifications (e.g., Walker &
Nees, 2005c).
References
Lakatos, S., McAdams, S., & Causse, R. (1997). The representation of auditory
source characteristics: Simple geometric form. Perception &
Psychophysics, 59(8), 1180-1190.
Levitin, D. J. (1999). Memory for musical attributes. In P. Cook (Ed.), Music,
Cognition, and Computerized Sound: An Introduction to
Psychoacoustics. (pp. 209-227). Cambridge, MA: MIT Press.
Listening to the mind listening: Concert of sonifications at the Sydney Opera
House. (2004). [Concert]. Sydney, Australia: International Conference
on Auditory Display (ICAD04).
Luczak, H. (1997). Task analysis. In G. Salvendy (Ed.), Handbook of Human
Factors and Ergonomics (2nd ed., pp. 340-416). New York: Wiley.
Mauney, B. S., & Walker, B. N. (2004). Creating functional and livable
soundscapes for peripheral monitoring of dynamic data. Proceedings of
the 10th International Conference on Auditory Display (ICAD04),
Sydney, Australia.
McAdams, S., & Bigand, E. (1993). Thinking in sound: The cognitive psychology
of human audition. Oxford: Oxford University Press.
McGookin, D. K., & Brewster, S. A. (2004). Understanding concurrent earcons:
Applying auditory scene analysis principles to concurrent earcon
recognition. ACM Transactions on Applied Perception, 1, 130-150.
Meyer, J. (2000). Performance with tables and graphs: Effects of training and a
visual search model. Ergonomics, 43(11), 1840-1865.
Meyer, J., Shinar, D., & Leiser, D. (1997). Multiple factors that determine
performance with tables and graphs. Human Factors, 39(2), 268-286.
Moore, B. C. J. (1997). An introduction to the psychology of hearing (4th ed.).
San Diego, CA: Academic Press.
Mulligan, B. E., McBride, D. K., & Goodman, L. S. (1984). A design guide for
nonspeech auditory displays (Technical Report). Naval Aerospace Medical
Research Laboratory.
Nees, M. A., & Walker, B. N. (2006). Relative intensity of auditory context for
auditory graph design. Proceedings of the Twelfth International
Conference on Auditory Display (ICAD06) (pp. 95-98), London, UK.
Neuhoff, J. G., & Heller, L. M. (2005). One small step: Sound sources and
events as the basis for auditory graphs. Proceedings of the Eleventh
Meeting of the International Conference on Auditory Display, Limerick,
Ireland.
Neuhoff, J. G., Kramer, G., & Wayand, J. (2002). Pitch and loudness interact in
auditory displays: Can the data get lost in the map? Journal of
Experimental Psychology: Applied, 8(1), 17-25.
Neuhoff, J. G., & Wayand, J. (2002). Pitch change, sonification, and musical
expertise: Which way is up? Proceedings of the International
Conference on Auditory Display (pp. 351-356), Kyoto, Japan.