
How We Localize Sound

William M. Hartmann

Citation: Physics Today 52, 11, 24 (1999); doi: 10.1063/1.882727


Relying on a variety of cues, including intensity, timing, and spectrum, our brains recreate a three-dimensional image of the acoustic landscape from the sounds we hear.

For as long as we humans have lived on Earth, we have been able to use our ears to localize the sources of sounds. Our ability to localize warns us of danger and helps us sort out individual sounds from the usual cacophony of our acoustical world. Characterizing this ability in humans and other animals makes an intriguing physical, physiological, and psychological study (see figure 1).
John William Strutt (Lord Rayleigh) understood at least part of the localization process more than 120 years ago.1 He observed that if a sound source is to the right of the listener's forward direction, then the left ear is in the shadow cast by the listener's head. Therefore, the signal in the right ear should be more intense than the signal in the left one, and this difference is likely to be an important clue that the sound source is located on the right.
Interaural level difference

The standard comparison between intensities in the left and right ears is known as the interaural level difference (ILD). In the spirit of the spherical cow, a physicist can estimate the size of the effect by calculating the acoustical intensity at opposite poles on the surface of a sphere, given an incident plane wave, and then taking the ratio. The level difference is that ratio expressed in decibels.
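This spherical-cow estimate is easy to carry out numerically. The sketch below is not from the article; it evaluates the classic partial-wave series for the pressure on the surface of a rigid sphere struck by a plane wave and forms the ILD between the two poles. The 8.75 cm head radius matches the value quoted later in the article, and the 45° source azimuth is an arbitrary choice.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

A = 0.0875   # head radius in meters (the 8.75 cm used in the article)
C = 344.0    # speed of sound in m/s

def surface_pressure(ka, gamma):
    """Pressure on a rigid sphere at angle gamma from the propagation
    direction of a unit-amplitude incident plane wave, from the classic
    partial-wave series p = (i/(ka)^2) sum of
    i^m (2m+1) P_m(cos gamma) / h_m'(ka)."""
    m = np.arange(0, int(ka) + 25)
    dh = spherical_jn(m, ka, derivative=True) \
        + 1j * spherical_yn(m, ka, derivative=True)   # h_m^(1)'(ka)
    terms = (1j**m) * (2*m + 1) * eval_legendre(m, np.cos(gamma)) / dh
    return 1j / ka**2 * terms.sum()

def ild_db(freq, azimuth_deg):
    """ILD in dB for ears at opposite poles and a distant source."""
    ka = 2 * np.pi * freq * A / C
    th = np.radians(azimuth_deg)
    # The ear nearer the source sits at angle pi/2 + th from the
    # propagation direction; the far ear sits at pi/2 - th.
    p_near = surface_pressure(ka, np.pi/2 + th)
    p_far = surface_pressure(ka, np.pi/2 - th)
    return 20 * np.log10(abs(p_near) / abs(p_far))

for f in (250, 500, 1000, 2000, 4000):
    print(f"{f:5d} Hz: ILD = {ild_db(f, 45.0):5.1f} dB")
```

Consistent with the discussion that follows, the computed ILD is small at the lowest frequencies and grows rapidly once the wavelength becomes comparable to the head diameter.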
As shown in figure 2, the ILD is a strong function of frequency over much of the audible spectrum (canonically quoted as 20-20 000 Hz). That is because sound waves are effectively diffracted when their wavelength is longer than the diameter of the head. At a frequency of 500 Hz, the wavelength of sound is 69 cm—four times the diameter of the average human head. The ILD is therefore small for frequencies below 500 Hz, as long as the source is more than a meter away. But the scattering by the head increases rapidly with increasing frequency, and at 4000 Hz the head casts a significant shadow.

Ultimately, the use of an ILD, small or large, depends on the sensitivity of the central nervous system to such differences. In evolutionary terms, it would make sense if the sensitivity of the central nervous system would somehow reflect the ILD values that are actually physically present. In fact, that does not appear to be the case. Psychoacoustical experiments find that the central nervous system is about equally sensitive at all frequencies. The smallest detectable change in ILD is approximately 0.5 dB, no matter what the frequency.2 Therefore the ILD is a potential localization cue at any frequency where it is physically greater than a decibel. It is as though Mother Nature knew in advance that her offspring would walk around the planet listening to portable music through headphones.
BILL HARTMANN is a professor of physics at Michigan State University in East Lansing, Michigan ([email protected]; http://www.pa.msu.edu/acoustics). He is the author of the textbook Signals, Sound, and Sensation (AIP Press, 1997).
The spherical-head model is obviously a simplification. Human heads include a variety of secondary scatterers that can be expected to lead to structure in the higher-frequency dependence of the ILD. Conceivably, this structure can serve as an additional cue for sound localization. As it turns out, that is exactly what happens, but that is another story for later in this article.

In the long-wavelength limit, the spherical-head model correctly predicts that the ILD should become uselessly small. If sounds are localized on the basis of ILD alone, it should be very difficult to localize a sound with a frequency content that is entirely below 500 Hz. It therefore came as a considerable surprise to Rayleigh to discover that he could easily localize a steady-state low-frequency pure tone such as 256 or 128 Hz. Because he knew that localization could not be based on ILD, he finally concluded in 1907 that the ear must be able to detect the difference in waveform phases between the two ears.3

Interaural time difference

For a pure tone like Rayleigh used, a difference in phases is equivalent to a difference in arrival times of waveform features (such as peaks and positive-going zero crossings) at the two ears. A phase difference Δφ corresponds to an interaural time difference (ITD) of Δt = Δφ/(2πf) for a tone with frequency f. In the long-wavelength limit, the formula for diffraction by a sphere4 gives the interaural time difference Δt as a function of the azimuthal (left-right) angle θ:

Δt = (3a/c) sin θ,   (1)

where a is the radius of the head (approximately 8.75 cm) and c is the speed of sound (34 400 cm/s). Therefore, 3a/c = 760 μs.
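Equation (1) is simple enough to check directly. A few lines of Python, assuming only the article's values for a and c, reproduce the maximum delay 3a/c and the roughly 13 μs that correspond to a 1° change of azimuth near the forward direction:

```python
import numpy as np

A = 8.75       # head radius in cm (the article's value)
C = 34_400.0   # speed of sound in cm/s

def itd_us(azimuth_deg):
    """Interaural time difference from equation (1), in microseconds."""
    return 3 * A / C * np.sin(np.radians(azimuth_deg)) * 1e6

print(f"3a/c = {3 * A / C * 1e6:.0f} us")          # ~763 us at full lateral
print(f"ITD at 1 degree:  {itd_us(1.0):.1f} us")   # ~13 us
print(f"ITD at 45 degrees: {itd_us(45.0):.0f} us")
```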


Psychoacoustical experiments show that human listeners can localize a 500 Hz sine tone with considerable accuracy. Near the forward direction (θ near zero), listeners are sensitive to differences Δθ as small as 1-2°. The idea that this sensitivity is obtained from an ITD initially seems rather outrageous. A 1° difference in azimuth corresponds to an ITD of only 13 μs. It hardly seems possible that a neural system, with synaptic delays on the order of a millisecond, could successfully encode such small time differences. However, the auditory system, unaware of such mathematical niceties, goes ahead and does it anyway. This ability can be proved in headphone experiments, in which the ITD can be presented independently of the ILD. The key to the brain's success in this case is parallel processing. The binaural system apparently beats the unfavorable timing dilemma by transmitting timing information through many neurons. Estimates of the number of neurons required, based on statistical decision theory, have ranged from 6 to 40 for each one-third-octave frequency band.

There remains the logical problem of just how the auditory system manages to use ITDs. There is now good evidence that the superior olive—a processing center, or "nucleus," in the midbrain—is able to perform a cross-correlation operation on the signals in the two ears, as described in the accompanying box.

The headphone experiments with an ITD give the listener a peculiar experience. The position of the image is located to the left or right as expected, depending on the sign of the ITD, but the image seems to be within the listener's head—it is not perceived to be in the real external world. Such an image is said to be "lateralized" and not localized. Although the lateralized headphone sensation is quite different from the sensation of a localized source, experiments show that lateralization is intimately connected to localization.

Using headphones, one can measure the smallest detectable change in ITD as a function of the ITD itself. These ITD data can be used with equation 1 to predict the smallest detectable change in azimuth Δθ for a real source as a function of θ. When the actual localization experiment is done with a real source, the results agree with the predictions, as is to be expected if the brain relies on ITDs to make decisions about source location.

Like any phase-sensitive system, the binaural phase detector that makes possible the use of ITDs suffers from phase ambiguity when the wavelength is comparable to the distance between the two measurements. This problem is illustrated in figure 3. The equivalent temporal viewpoint is that, to avoid ambiguity, a half period of the wave must be longer than the delay between the ears. When the delay is exactly half a period, the signals at the two ears are exactly out of phase and the ambiguity is complete. For shorter periods, between twice the delay and the delay itself, the ITD leads to an apparent source location that is on the opposite side of the head compared to the true location. It would be better to have no ITD sensitivity at all than to have a process that gives such misleading answers. In fact, the binaural system solves this problem in what appears to be the best possible way: The binaural system rapidly loses sensitivity to any ITD at all as the frequency of the wave increases from 1000 to 1500 Hz—exactly the range in which the interaural phase difference becomes ambiguous.
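The onset of the ambiguity can be estimated from equation (1). The sketch below, not from the article, finds for each azimuth the frequency at which the ITD equals half a period; the cue first fails for fully lateral sources near 650 Hz, and by the 1000-1500 Hz range quoted above it has become unreliable over much of the azimuthal plane.

```python
import numpy as np

A, C = 8.75, 34_400.0   # head radius (cm) and speed of sound (cm/s)

def ambiguity_freq_hz(azimuth_deg):
    """Frequency at which the ITD of equation (1) equals half a period,
    so that the waveforms at the two ears are exactly out of phase."""
    itd_s = 3 * A / C * np.sin(np.radians(azimuth_deg))
    return 1.0 / (2.0 * itd_s)

for az in (90, 45, 30, 20):
    print(f"azimuth {az:2d} deg: ambiguous at {ambiguity_freq_hz(az):4.0f} Hz")
```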
FIGURE 1. THE SOUND LOCALIZATION FACILITY at Wright-Patterson Air Force Base in Dayton, Ohio, is a geodesic sphere, nearly 5 m in diameter, housing an array of 277 loudspeakers. Each speaker has a dedicated power amplifier, and the switching logic allows the simultaneous use of as many as 15 sources. The array is enclosed in a 6 m cubical anechoic room: Foam wedges 1.2 m long on the walls of the room make the room strongly absorbing for wavelengths shorter than 5 m, or frequencies above 70 Hz. Listeners in localization experiments indicate perceived source directions by placing an electromagnetic stylus on a small globe. (Courtesy of Mark Ericson and Richard McKinley.)
One might imagine that the network of delay lines and coincidence detectors described in the box vanishes at frequencies greater than about 1500 Hz. Such a model would be consistent with the results of pure-tone experiments, but it would be wrong. In fact, the binaural system can successfully register an ITD that occurs at a high frequency such as 4000 Hz, if the signal is modulated. The modulation, in turn, must have a rate that is less than about 1000 Hz. Therefore, the failure of the binaural timing system to process sine tones above 1500 Hz cannot be thought of as a failure of the binaural neurons tuned to high frequency. Instead, the failure is best described in the temporal domain, as an inability to track rapid variations.

To summarize the matter of binaural differences, the physiology of the binaural system is sensitive to amplitude cues from ILDs at any frequency, but for incident plane waves, ILD cues exist physically only for frequencies above about 500 Hz. They become large and reliable for frequencies above 3000 Hz, making ILD cues most effective at high frequencies. In contrast, the binaural physiology is capable of using phase information from ITD cues only at low frequencies, below about 1500 Hz. For a sine tone of intermediate frequency, such as 2000 Hz, neither cue works well. As a result, human localization ability tends to be poor for signals in this frequency region.

The inadequacy of binaural difference cues

The binaural time and level differences are powerful cues for the localization of a source, but they have important limitations. Again, in the spherical-head approximation, the inadequacy of interaural differences is evident because, for a source of sound moving in the midsagittal plane (the perpendicular bisector of a line drawn through both ears), the signals to left and right ears—and therefore binaural differences—are the same. As a result, the listener with the hypothetical spherical head cannot distinguish between sources in back, in front, or overhead. Because of a fine sensitivity to binaural differences, this listener can detect displacements of only a degree side-to-side, but cannot tell back from front! This kind of localization difficulty does not correspond to our usual experience.



FIGURE 2. INTERAURAL LEVEL DIFFERENCES, calculated for a source in the azimuthal plane defined by the two ears and the nose. The source radiates frequency f and is located at an azimuth θ of 10° (green curve), 45° (red), or 90° (blue) with respect to the listener's forward direction. The calculations assume that the ears are at opposite poles of a rigid sphere.
There is another problem with this binaural difference model: If a tone or broadband noise is heard through headphones with an ITD, an ILD, or both, the listener has the impression of laterality—coming from the left or right—as expected, but, as previously mentioned, the sound image appears to be within the head, and it may also be diffuse and fuzzy instead of compact. This sensation, too, is unlike our experience of the real world, in which sounds are perceived to be externalized. The resolution of front-back confusion and the externalization of sound images turn on another sound localization cue, the anatomical transfer function.

The anatomical transfer function
Sound waves that come from different directions in space are differently scattered by the listener's outer ears, head, shoulders, and upper torso. The scattering leads to an acoustical filtering of the signals appearing at left and right ears. The filtering can be described by a complex response function—the anatomical transfer function (ATF), also known as the head-related transfer function (HRTF). Because of the ATF, waves that come from behind tend to be boosted in the 1000 Hz frequency region, whereas waves that come from the forward direction are boosted near 3000 Hz. The most dramatic effects occur above 4000 Hz: In this region, the wavelength is less than 10 cm and details of the head, especially the outer ears, or pinnae, become significant scatterers. Above 6000 Hz, the ATF for different individuals becomes strikingly individualistic, but there are a few features that are found rather generally. In most cases, there is a valley-and-peak structure that tends to move to higher frequencies as the elevation of the source increases from below to above the head. For example, figure 4 shows the spectrum for sources in front, in back, and directly overhead, measured inside the ear of a Knowles Electronics Manikin for Acoustic Research (KEMAR). The peak near 7000 Hz is thought to be a particularly prominent cue for a source overhead.

The direction-dependent filtering by the anatomy, used by listeners to resolve front-back confusion and to determine elevation, is also a necessary component of externalization. Experiments further show that getting the ATF correct with virtual reality techniques is sufficient to externalize the image. But there is an obvious problem in the application of the ATF. A priori, there is no way that a listener can know if a spectrally prominent feature comes from direction-dependent filtering or whether it is part of the original source spectrum. For instance, a signal with a strong peak near 7000 Hz may not necessarily come from above—it might just come from a source that happens to have a lot of power near 7000 Hz.

Confusion of this kind between the source spectrum and the ATF immediately appears with narrow-band sources such as pure tones or noise bands having a bandwidth of a few semitones. When a listener is asked to say whether a narrow-band sound comes from directly in front, in back, or overhead, the answer will depend entirely on the frequency of the sound—the true location of the sound source is irrelevant.5 Thus, for narrow-band sounds, the confusion between source spectrum and location is complete. The listener can solve this localization problem only by turning the head so that the source is no longer in the midsagittal plane. In an interesting variation on this theme, Frederic Wightman and Doris Kistler at the University of Wisconsin-Madison have shown that it is not enough if the source itself moves—the listener will still be confused about front and back. The confusion can be resolved, though, if the listener is in control of the source motion.6

Fortunately, most sounds of the everyday world are broadband and relatively benign in their spectral variation, so that listeners can both localize the source and identify it on the basis of the spectrum. It is still not entirely clear how this localization process works. Early models of the process that focused on particular spectral features (such as the peak at 7000 Hz for a source overhead) have given way, under the pressure of recent research, to models that employ the entire spectrum.
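In signal-processing terms, applying an ATF is just a convolution of the source signal with a direction-specific impulse-response pair, one filter per ear. The sketch below is not taken from the article: it uses crude stand-in impulse responses, and the function name render_direction is invented for the illustration. Real pairs would be measured with probe microphones or a manikin such as KEMAR.

```python
import numpy as np

FS = 48_000  # sample rate in Hz; an assumed value, not from the article

def render_direction(mono, hrir_left, hrir_right):
    """Filter a monaural signal with the impulse-response pair for one
    source direction. The pair carries all three cues at once: the ITD
    (relative delay), the ILD (relative gain), and the ATF spectral shape."""
    return np.convolve(mono, hrir_left), np.convolve(mono, hrir_right)

# Crude stand-ins for a source off to the right: the right ear leads by
# about 0.4 ms and is louder. Measured pairs would also carry the
# direction-dependent spectral peaks and valleys discussed in the text.
hrir_right_ear = np.zeros(128); hrir_right_ear[0] = 1.0
hrir_left_ear = np.zeros(128); hrir_left_ear[19] = 0.6   # 19/48000 s ~ 0.4 ms

rng = np.random.default_rng(0)
mono = rng.standard_normal(FS // 10)   # 100 ms of broadband noise
left_ear, right_ear = render_direction(mono, hrir_left_ear, hrir_right_ear)
```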

FIGURE 3. INTERAURAL TIME DIFFERENCES, given by the difference in arrival times of waveform features at the two ears, are useful localization cues only for long wavelengths. In (a), the signal comes from the right, and waveform features such as the peak numbered 1 arrive at the right ear before arriving at the left. Because the wavelength is greater than twice the head diameter, no confusion is caused by other peaks of the waveform, such as peaks 0 or 2. In (b), the signal again comes from the right, but the wavelength is shorter than twice the head diameter. As a result, every feature of cycle 2 arriving at the right ear is immediately preceded by a corresponding feature from cycle 1 at the left ear. The listener naturally concludes that the source is on the left, contrary to fact.


The Binaural Cross-Correlation Model

In 1948, Lloyd Jeffress proposed that the auditory system processes interaural time differences by using a network of neural delay lines terminating in e-e neurons.10 An e-e neuron is like an AND gate, responding only if excitation is present on both of two inputs (hence the name "e-e"). According to the Jeffress model, one input comes from the left ear and the other from the right. Inputs are delayed by neural delay lines so that different e-e cells experience a coincidence for different arrival times at the two ears.

An illustration of how the network is imagined to work is shown in the figure. An array of e-e cells is distributed along two axes: frequency and neural internal delay. The frequency axis is needed because binaural processing takes place in tuned channels. These channels represent frequency analysis—the first stage of auditory processing. Any plausible auditory model must contain such channels.

Inputs from left ear (blue) and right ear (red) proceed down neural delay lines in each channel and coincide at the e-e cells for which the neural delay τ exactly compensates for the fact that the signal started at one ear sooner than the other. For instance, if the source is off to the listener's left, then signals start along the delay lines sooner from the left side. They coincide with the corresponding signals from the right ear at neurons to the right of τ = 0, that is, at a positive value of τ. The coincidence of neural signals causes the e-e neurons to send spikes to higher processing centers in the brain.

The expected value for the number of coincidences Nc at the e-e cell specified by delay τ is given in terms of the rates PL(t) and PR(t) of neural spikes from left and right ears by the convolution-like integral

Nc(τ) = Tw ∫₀^Ts dt′ PL(t′) PR(t′ + τ),

where Tw is the width of the neuron's coincidence window and Ts is the duration of the stimulus.11 Thus, Nc is the cross correlation between signals in the left and right ears. Neural delay and coincidence circuits of just this kind have been found in the superior olive in the midbrain of cats.12
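The box's integral is easy to mimic discretely. The following sketch is an illustration rather than a model of real neurons: it builds two rectified "rate" signals with a 300 μs interaural lead and scans an array of internal delays τ for the one that maximizes the coincidence count. The peak falls at the true ITD, on the positive-τ side, just as the box describes; the sample rate and signal choices are assumptions.

```python
import numpy as np

FS = 48_000          # sample rate (Hz), an assumed value
TRUE_ITD = 300e-6    # the source is to the listener's left: left ear leads

rng = np.random.default_rng(1)
noise = rng.standard_normal(FS // 2)          # half a second of noise
shift = int(round(TRUE_ITD * FS))             # ITD in samples (~14)
p_left = np.clip(noise[shift:], 0.0, None)    # rectified "rate," leads
p_right = np.clip(noise[:-shift], 0.0, None)  # same signal, delayed

def coincidence_count(p_l, p_r, tau):
    """Discrete stand-in for Nc(tau): sum of PL(t) * PR(t + tau), with
    the coincidence window Tw absorbed into the overall scale."""
    if tau < 0:
        return coincidence_count(p_r, p_l, -tau)
    return float(np.dot(p_l[:len(p_l) - tau], p_r[tau:]))

taus = np.arange(-40, 41)                     # internal delays in samples
nc = [coincidence_count(p_left, p_right, t) for t in taus]
best_tau = taus[int(np.argmax(nc))] / FS
print(f"coincidence peak at tau = {best_tau * 1e6:+.0f} us")  # ~ +300 us
```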

The experimental art

Most of what we know about sound localization has been learned from experiments using headphones. With headphones, the experimenter can precisely control the stimulus heard by the listener. Even experiments done on cats, birds, and rodents have these creatures wearing miniature earphones.

In the beginning, much was learned about fundamental binaural capabilities from headphone experiments with simple differences in level and arrival time for tones of various frequencies and noises of various compositions.7 However, work on the larger question of sound localization had to await several technological developments to achieve an accurate rendering of the ATF in each ear. First were the acoustical measurements themselves, done with tiny probe microphones inserted in the listener's ear canals to within a few millimeters of the eardrums. Transfer functions measured with these microphones allowed experimenters to create accurate simulations of the real world using headphones, once the transfer functions of the microphones and headphones themselves had been compensated by inverse filtering.

Adequate filtering requires fast, dedicated digital signal processors linked to the computer that runs experiments. The motion of the listener's head can be taken into account by means of an electromagnetic head tracker. The head tracker consists of a stationary transmitter, whose three coils produce low-frequency magnetic fields, and a receiver, also with three coils, that is mounted on the listener's head. The tracker gives a reading of all six degrees of freedom in the head motion, 60 times per second. Based on the motion of the head, the controlling computer directs the fast digital processor to refilter the signals to the ears so that the auditory scene is stable and realistic. This virtual reality technology is capable of synthesizing a convincing acoustical environment. Starting with a simple monaural recording of a conversation, the experimenter can place the individual talkers in space. If the listener's head turns to face a talker, the auditory image remains constant, as it does in real life. What is most important for the psychoacoustician, this technology has opened a large new territory for controlled experiments.
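The head-tracked update described above amounts to a small control loop: read the head pose, recompute each source's direction relative to the head, and swap in the filters for that direction. Here is a minimal sketch of that loop; read_head_yaw and refilter are hypothetical stand-ins for the tracker readout and the DSP stage, and only the yaw degree of freedom is shown.

```python
def world_to_head(source_az_deg, head_yaw_deg):
    """Source azimuth relative to the head, wrapped to (-180, 180]. When
    the head turns toward the source, the relative azimuth goes to zero,
    so the rendered image stays fixed in the room."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def render_loop(source_az_deg, read_head_yaw, refilter, n_frames=60):
    """One second of updates at the 60 Hz tracker rate given in the text."""
    for _ in range(n_frames):
        rel_az = world_to_head(source_az_deg, read_head_yaw())
        refilter(rel_az)   # load the impulse-response pair for rel_az

# Toy usage: a talker fixed at 30 degrees, a head turned 10 degrees right.
render_loop(30.0, read_head_yaw=lambda: 10.0, refilter=lambda az: None)
```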



FIGURE 4. THE ANATOMICAL TRANSFER FUNCTION, which incorporates the effects of secondary scatterers such as the outer ears, assists in eliminating front-back confusion. (a) The curves show the spectrum of a small loudspeaker as heard in the left ear of a manikin when the speaker is in front (red), overhead (blue), and in back (green). A comparison of the curves reveals the relative gains of the anatomical transfer function. (b) The KEMAR manikin is, in every gross anatomical detail, a typical American. It has silicone outer ears and microphones in its head. The coupler between the ear canal and the microphone is a cavity tuned to have the input acoustical impedance of the middle ear. The KEMAR shown here is in an anechoic room accompanied by Tim, an undergraduate physics major at Michigan State.

Making it wrong

With headphones, the experimenter can create conditions not found in nature to try to understand the role of different localization mechanisms. For instance, by introducing an ILD that points to the left opposed by an ITD that points to the right, one can study the relative strengths of these two cues. Not surprisingly, it is found that ILDs dominate at high frequency and ITDs dominate at low frequency. But perception is not limited to just pointlike localization; it also includes size and shape. Rivalry experiments such as contradictory ILDs and ITDs lead to a source image that is diffuse: The image occupies a fuzzy region within the head that a listener can consistently describe. The effect can also be measured as an increased variance in lateralization judgements.

Incorporating the ATF into headphone simulations considerably expands the menu of bizarre effects. An accurate synthesis of a broadband sound leads to perception that is like the real world: Auditory images are localized, externalized, and compact. Making errors in the synthesis, for example progressively zeroing the ITD of spectral lines while retaining the amplitude part of the ATF, can cause the image to come closer to the head, push on the face, and form a blob that creeps into the ear canal and finally enters the head. The process can be reversed by progressively restoring accurate ITD values.8
A wide variety of effects can occur, by accident or design, with inaccurate synthesis. There are a few general rules: Inaccuracies tend to expand the size of the image, put the images inside the head, and produce images that are in back rather than in front. Excellent accuracy is required to avoid front-back confusion. The technology permits a listener to hear the world with someone else's ears, and the usual result is an increase in confusion about front and back. Reduced accuracy often puts all source images in back, although they are nevertheless externalized. Further reduction in accuracy puts the images inside the back of the head.

Rooms and reflections

The operations of interaural level and time difference cues and of spectral cues have normally been tested with headphones or by sound localization experiments in anechoic rooms, where all the sounds travel in a straight path from the source to the listener. Most of our everyday listening, however, is done in the presence of walls, floors, ceilings, and other large objects that reflect sound waves. These reflections result in dramatic physical changes to the waveforms. It is hard to imagine how the reflected sounds, coming from all directions, can contribute anything but random variation to the cues used in localization. Therefore, it is expected that the reflections and reverberation introduced by the room are inevitably for the worse as far as sound localization is concerned. That is especially true for the ITD cue.

The ITD is particularly vulnerable because it depends on coherence between the signals in the two ears—that is, the height of the cross-correlation function, as described in the box. Reverberated sound contains no useful coherent information, and in a large room where reflected sound dominates the direct sound, the ITD becomes unreliable.

By contrast, the ILD fares better. First, as shown by headphone experiments, the binaural comparison of intensities does not care whether the signals are binaurally coherent or not. Such details of neural timing appear to be stripped away as the ILD is computed. Of course, the ILD accuracy is adversely affected by standing waves in a room, but here the second advantage of the ILD appears: Almost every reflecting surface has the property that its acoustical absorption increases with increasing frequency; as a result, the reflected power becomes relatively smaller compared to the direct power. Because the binaural neurophysiology is capable of using ILDs across the audible spectrum with equal success, it is normally to the listener's advantage to use the highest frequency information that can be heard. Experiments in highly reverberant environments find listeners doing exactly that, using cues above 8000 Hz. A statistical decision theory analysis using ILDs and ITDs measured with a manikin shows that the pattern of localization errors observed experimentally can be understood by assuming that listeners rely entirely on ILDs and not at all on ITDs. This strategy of reweighting localization cues is entirely unconscious.

The precedence effect



FIGURE 5. PRECEDENCE EFFECT demonstration with two loudspeakers reproducing the same pulsed wave. The pulse from the left speaker leads in the left ear by a few hundred microseconds, suggesting that the source is on the left. The pulse from the right speaker leads in the right ear by a similar amount, which provides a contradictory localization cue. Because the listener is closer to the left speaker, the left pulse arrives sooner and wins the competition—the listener perceives just one single pulse coming from the left.

There is yet another strategy that listeners unconsciously employ to cope with the distorted localization cues that occur in a room: They make their localization judgments instantly based on the earliest arriving waves in the onset of a sound. This strategy is known as the precedence effect, because the earliest arriving sound wave—the direct sound with accurate localization information—is given precedence over the subsequent reflections and reverberation that convey inaccurate information. Anyone who has wandered around a room trying to locate the source of a pure tone without hearing the onset can appreciate the value of the effect. Without the action of the precedence effect on the first arriving wave, localization is virtually impossible. There is no ITD information of any use, and, because of standing waves, the loudness of the tone is essentially unrelated to the nearness of the source.

The operation of the precedence effect is often thought of as a neural gate that is opened by the onset of a sound, accumulates localization information for about 1 ms, and then closes to shut off subsequent localization cues. This operation appears dramatically in experiments where it is to the listener's advantage to attend to the subsequent cues but the precedence effect prevents it. An alternative model regards precedence as a strong reweighting of localization cues in favor of the earliest sound, because the subsequent sound is never entirely excluded from the localization computation.

Precedence is easily demonstrated with a standard home stereo system set for monophonic reproduction, so that the same signal is sent to both loudspeakers. Standing midway between the speakers, the listener hears the sound from a forward direction. Moving half a meter closer to the left speaker causes the sound to appear to come entirely from that speaker. The analysis of this result is that each speaker sends a signal to both ears. Each speaker creates an ILD and—of particular importance—an ITD, and these cues compete, as shown in figure 5. Because of the precedence effect, the first sound (from the left speaker) wins the competition, and the listener perceives the sound as coming from the left. But although the sound appears to come from the left speaker alone, the right speaker continues to contribute loudness and a sense of spatial extent. This perception can be verified by suddenly unplugging the right speaker—the difference is immediately apparent. Thus, the precedence effect is restricted to the formation of a single fused image with a definite location. The precedence effect appears not to depend solely on interaural differences; it operates also on the spectral differences caused by anatomical filtering for sources in the midsagittal plane.9
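The arithmetic behind the stereo demonstration is worth making explicit. In the sketch below, the 3 m speaker spacing is an assumed geometry, not from the article; stepping half a meter toward the left speaker gives the left signal roughly a 3 ms head start, and by the precedence effect that leading wave determines the perceived location:

```python
SPEED_OF_SOUND = 344.0   # m/s

def lead_time_ms(dist_left_m, dist_right_m):
    """Arrival-time advantage of the nearer loudspeaker, in milliseconds."""
    return (dist_right_m - dist_left_m) / SPEED_OF_SOUND * 1e3

# Speakers 3 m apart. Midway, the paths are equal and the image is frontal;
# 0.5 m toward the left speaker, the left signal leads by about 2.9 ms.
print(f"midway:  lead = {lead_time_ms(1.5, 1.5):.1f} ms")
print(f"offset:  lead = {lead_time_ms(1.0, 2.0):.1f} ms (left speaker wins)")
```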
Conclusions and conjectures

After more than a century of work, there is still much about sound localization that is not understood. It remains an active area of research in psychoacoustics and in the physiology of hearing. In recent years, there has been growing correspondence between perceptual observations, physiological data on the binaural processing system, and neural modeling. There is good reason to expect that next year we will understand sound localization better than we do this year, but it would be wrong to think that we have only to fill in the details. It is likely that next year will lead to a qualitatively improved understanding with models that employ new ideas about neural signal processing.

In this environment, it is risky to conjecture about future development, but there are trends that give clues. Just a decade ago, it was thought that much of sound localization in general, and precedence in particular, might be a direct result of interaction at early stages of the binaural system, as in the superior olive. Recent research suggests that the process is more widely distributed, with peripheral centers of the brain such as the superior olive sending information—about ILD, about ITD, about spectrum, and about arrival order—to higher centers where the incoming data are evaluated for self-consistency and plausibility, and are probably compared with information obtained visually. Therefore, sound localization is not simple; it is a large mental computation. But as the problem has become more complicated, our tools for studying it have become better. Improved psychophysical techniques for flexible synthesis of realistic stimuli, physiological experiments probing different neural regions simultaneously, faster and more precise methods of brain imaging, and more realistic computational models will one day solve this problem of how we localize sound.

The author is grateful to his colleagues Brad Rakerd, Tim McCaskey, Zachary Constan, and Joseph Gaalaas for help with this article. His work on sound localization is supported by the National Institute on Deafness and Other Communication Disorders, one of the National Institutes of Health.

References
1. J. W. Strutt (Lord Rayleigh), Phil. Mag. 3, 456 (1877).
2. W. A. Yost, J. Acoust. Soc. Am. 70, 397 (1981).
3. J. W. Strutt (Lord Rayleigh), Phil. Mag. 13, 214 (1907).
4. G. F. Kuhn, J. Acoust. Soc. Am. 62, 157 (1977).
5. J. Blauert, Spatial Hearing, 2nd ed., J. S. Allen, trans., MIT Press, Cambridge, Mass. (1997).
6. F. L. Wightman, D. J. Kistler, J. Acoust. Soc. Am. 105, 2841 (1999).
7. N. I. Durlach, H. S. Colburn, in Handbook of Perception, vol. 4, E. Carterette, M. P. Friedman, eds., Academic, New York (1978).
8. W. M. Hartmann, A. T. Wittenberg, J. Acoust. Soc. Am. 99, 3678 (1996).
9. R. Y. Litovsky, B. Rakerd, T. C. T. Yin, W. M. Hartmann, J. Neurophysiol. 77, 2223 (1997).
10. L. A. Jeffress, J. Comp. Physiol. Psychol. 41, 35 (1948).
11. R. M. Stern, H. S. Colburn, J. Acoust. Soc. Am. 64, 127 (1978).
12. T. C. T. Yin, J. C. K. Chan, J. Neurophysiol. 64, 465 (1990).
