ABSTRACT
Typical efforts to collect sets of head-related transfer functions (HRTFs) attempt to remove from the derived transfer
functions the acoustical properties of the sound source used to make those measurements. This is done so that the
directionally-dependent variations in the binaural response at the receiver’s location can be independently character-
ised. For sound sources that are far from the receiver’s location, this is appropriate and relatively straightforward to
achieve. However, at close range (e.g., at distances between source and receiver position of less than 1 meter), char-
acterising the variation in derived transfer functions that is dependent upon the acoustical characteristics of an orally-
radiated source becomes potentially useful. The measurements reported here attempted to capture both source and
receiver characteristics for a particular case, that in which the sound source is radiated from the mouth of an anthro-
pomorphic manikin, and is received at the ear of a nearby manikin. Substantial range-dependent variation in the
measured transfer functions was observed, clearly due to the presence of reflections between the surfaces of the
source manikin and the receiver manikin’s head. These results have implications for spoken telecommunication ap-
plications employing headphone-based virtual acoustic simulations.
Acoustics 2011 1
2-4 November 2011, Gold Coast, Australia Proceedings of ACOUSTICS 2011
METHOD
Measurements were made in an anechoic room. The receiv-
ing Head And Torso Simulator (HATS) was mounted on a
turntable, with a pole supporting the HATS manikin at an ear
height of 1.5 m above the acoustically transparent floor. This
HATS model was Brüel & Kjær type 4100, which has micro-
phones where the entrance of the ear canals would be in a
human ear. For successive measurements, this receiving
HATS was rotated in 2° increments, starting with the nose
directly facing the sound source (0°), the configuration shown
in Figure 1, and finishing with the nose facing away from the
sound source (180°), yielding 91 orientations. Note that the
manikins were carefully positioned using a laser that is visi-
ble just over the left manikin’s nose in Figure 1.
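The count of 91 orientations follows directly from the 2° step over the 0° to 180° span. A minimal sketch of that azimuth grid is given below; the variable names are illustrative and are not taken from the measurement software.

```python
# Azimuth grid described in the text: 0 to 180 degrees inclusive, in 2-degree steps.
STEP_DEG = 2
azimuths_deg = list(range(0, 180 + STEP_DEG, STEP_DEG))

# Number of receiver orientations measured at each source distance.
n_orientations = len(azimuths_deg)
print(n_orientations)
```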
The sound source was a HATS (Brüel & Kjær type 4128C), which features a mouth simulator. A Brüel & Kjær type 4134 microphone was mounted at the source (the HATS has two possible microphone positions, and we used the position with the microphone right at the mouth). Calibration tones were recorded on all microphones so that the gains associated with transfer functions could be derived.

The sound source was positioned at four distances from the centre of the interaural axis of the receiving HATS when it was facing the source: 2 m, 0.71 m, 0.35 m and 0.25 m. Closer distances were not possible because the torso of the rotating HATS would have collided with the HATS sound source. Hence, in total, 91 x 4 binaural measurements were made. The distances mentioned here should be regarded as nominal distances, because in fact the distance increased as the HATS rotation angle increased (because the mounting position of the pole supporting the HATS was slightly behind the interaural axis).

The measurement was made using a logarithmic sinusoidal sweep (cf. Farina, 2000) with a frequency range extending between 50 Hz and 20 kHz, at a duration of 45 s and sampling rate of 44.1 kHz. Impulse responses were derived from the sweep recordings from each microphone. For measurements with the head-centre microphone, we synchronously averaged six impulse responses to further increase signal-to-noise ratio. For the measurements with HATS as receiver, we made a single sweep recording per angle per distance.

The data most simply derived from the measurements were impulse responses from the system output to the receiver HATS’ ear microphones (by convolving the recorded signals with the inverse filter of the test signal). Following this, head-related impulse responses (HRIRs) were derived from the transfer function between the head-centre microphone and the ear microphones. As reflective surfaces come close to high-impedance (constant volume velocity) sound sources, the sound pressure may increase, and this would be seen in the HRIRs derived in this way. Since a real mouth is best modelled as constant volume velocity, this effect may be regarded as a beneficial contribution to the measurements. However, using the reference microphone on the sound source, it was also possible to derive HRIRs for a constant pressure (low-impedance) sound source.

In deriving the impulse responses and transfer functions mentioned above, we truncated the signal in both frequency and time. In the frequency domain, very low and very high spectral components were suppressed (using a window function) to remove noise that was outside of the measurement range. In the time domain, a window function was similarly used to suppress sound outside the plausible time response of the system (e.g., from rogue reflections in the room). Examination of the results for such artefacts revealed that some small unwanted reflections still remained within the analysis window; however, these were much lower in level than those due to the reflections of interest, those being the reflections between the head and torso of the two manikins. Time-domain displays of these substantial reflections of interest, and their range-dependent and azimuth-dependent effects upon the obtained magnitude responses, are presented in the following section.
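The derivation of an impulse response by convolving a recorded sweep with the inverse filter of the test signal can be sketched as follows. This is a minimal illustration and not the processing code used for the measurements: the inverse filter shown is the commonly used formulation (a time-reversed sweep with a decaying amplitude envelope), which is an assumption here, and all function names are hypothetical. The measurements used a 45 s sweep from 50 Hz to 20 kHz at 44.1 kHz; the demonstration below uses a much shorter sweep and a simulated recording (a delayed, attenuated copy of the sweep) so that it runs quickly.

```python
import numpy as np

def exponential_sweep(f_start, f_stop, duration_s, fs):
    """Exponential (logarithmic) sine sweep from f_start to f_stop, in Hz."""
    n = int(round(duration_s * fs))
    t = np.arange(n) / fs
    log_ratio = np.log(f_stop / f_start)
    phase = (2.0 * np.pi * f_start * duration_s / log_ratio) * (
        np.exp(t * log_ratio / duration_s) - 1.0
    )
    return np.sin(phase)

def inverse_filter(sweep, f_start, f_stop):
    """Time-reversed sweep with an envelope that decays by f_start/f_stop overall.

    The envelope offsets the low-frequency emphasis of the exponential sweep
    (assumed formulation, not confirmed by the paper).
    """
    n = len(sweep)
    log_ratio = np.log(f_stop / f_start)
    envelope = np.exp(-np.arange(n) / n * log_ratio)
    return sweep[::-1] * envelope

def deconvolve(recording, inv_filter):
    """Linear convolution of the recording with the inverse filter, via FFT."""
    n_out = len(recording) + len(inv_filter) - 1
    nfft = 1 << (n_out - 1).bit_length()  # next power of two >= n_out
    spectrum = np.fft.rfft(recording, nfft) * np.fft.rfft(inv_filter, nfft)
    return np.fft.irfft(spectrum, nfft)[:n_out]

# Demonstration with illustrative (not measured) values.
fs = 8000
sweep = exponential_sweep(50.0, 3000.0, 1.0, fs)
delay_samples = 40
recording = np.concatenate([np.zeros(delay_samples), 0.5 * sweep])

ir = deconvolve(recording, inverse_filter(sweep, 50.0, 3000.0))
# Zero lag sits at index len(sweep) - 1 of the convolution output.
peak_lag = int(np.argmax(np.abs(ir))) - (len(sweep) - 1)
print(peak_lag)
```

In this sketch the strongest arrival in the recovered response appears at a lag close to the simulated propagation delay; later arrivals such as reflections would show up as additional, weaker peaks at longer lags.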
Figure 2. Two graphical perspectives on the HRIRs derived from the measurements made at three source ranges are displayed sepa-
rately here as a function of azimuth angle (on the x-axis) and time (on the y-axis). The y-axis values range from 0 ms, corresponding
to the arrival time of the source at 90 deg incidence, through the first 5 ms of the impulse response. The top row of three images
shows from left to right the original amplitude of the envelope functions observed at source ranges of .25 m, .35 m, and .71 m, re-
spectively (indicated by the yellow characters internal to each image). The row of three images on the bottom present an enhanced
display of the low-level reflections that occur between 1 and 3 ms after arrival of the first wavefront (see text for a description of the
enhancement method). The colorbar on the left shows the colormap that was used to display the amplitude of the envelope function
for all six of the images displayed here, with colours ranging from the lowest brightness for minimum data values to the highest
brightness for maximum amplitudes observed.
Figure 3. Images showing the difference in magnitude between ipsilateral HRTFs measured at close range relative to a reference set
of HRTFs measured at a source range of 2 m. As in Figure 2, the results from the measurements at three ranges are displayed as a
function of azimuth angle (on the x-axis), but only for incidence in the frontal hemifield. In contrast to Figure 2, the y-axis parame-
ter of each image is frequency, with values ranging from 0 to 8 kHz. Again, the three images show results observed at source ranges
of .25 m, .35 m, and .71 m, respectively (labelled in yellow characters). Since the displayed data are magnitude differences relative
to a reference measure, a different colormap is used in this figure, so that positive and negative deviations from the reference magni-
tude can be easily distinguished. Positive deviations (indicating responses greater than the reference) are always colour coded in
shades of magenta, and negative deviations (responses below that of the reference) are always colour coded in shades of cyan. The
colorbar on the right shows that data values near zero dB are colour coded as black.
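The magnitude differences displayed in Figure 3 amount to a subtraction of decibel magnitude spectra. A minimal sketch under stated assumptions is given below: the function names, the FFT length, and the numerical floor are all illustrative choices, and any level normalisation applied before the comparison in the actual analysis is not represented.

```python
import numpy as np

def magnitude_db(hrir, nfft=1024, floor=1e-12):
    """Magnitude spectrum of an impulse response in decibels."""
    spectrum = np.fft.rfft(hrir, nfft)
    # The floor avoids log10(0) for bins with no energy.
    return 20.0 * np.log10(np.maximum(np.abs(spectrum), floor))

def magnitude_difference_db(hrir_near, hrir_ref, nfft=1024):
    """Close-range magnitude response minus the reference response, in dB."""
    return magnitude_db(hrir_near, nfft) - magnitude_db(hrir_ref, nfft)

# Toy check: a response with twice the amplitude of the reference
# differs from it by 20*log10(2) dB at every frequency.
reference = np.zeros(256)
reference[0] = 1.0
difference = magnitude_difference_db(2.0 * reference, reference)
print(difference[:3])
```

Positive values correspond to responses above the reference and negative values to responses below it, matching the sign convention of the colormap described in the caption.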
In order to enable visual inspection of the range dependence of HRTF magnitude response, the magnitude response surface (in decibels), with azimuth and frequency as parameters, associated with the reference measurements made at 2 m was subtracted from the HRTF magnitude response data measured at the three smaller source ranges. Just as in Figure 2, but in the frequency domain instead of the time domain, the three images show results observed at source ranges of .25 m, .35 m, and .71 m, respectively. At the largest of the three range values (.71 m), the differences from the reference measurements are so small that very few details are visible. In effect, the response surface visualized in this rightmost image appears nearly black because the deviations are all near 0 dB (as indicated in the colorbar on the right of the figure).

At the two smaller source range values, however, the deviations approach extremes of 6 dB, as the responses were modulated above and below the magnitude of the reference measurement in the manner of the ripple pattern associated with a comb filtering effect. Indeed, the regular pattern of peaks and troughs that is clearly visible for azimuth angles between 0 and 35 degrees is just what would be expected if a reflection with an amplitude just a bit less than that of the direct sound were summed with it at a relatively constant delay. However, as the receiver HATS was rotated so that the receiving ear faced the source more directly, up to around 75 degrees azimuth, the pattern of the ripple shifts toward the higher frequencies, as the reflection latency is reduced. When the azimuth angle approaches 90 degrees, however, the pattern is not so clear. The same type of behaviour is seen at the .35-m source range, although the peaks and troughs are more closely spaced. This is what is expected for the slightly longer reflection latencies in this case. The conclusion regarding the contribution of reflections of interest between the two manikins is that substantial modulation of the response is to be expected only when the source manikin is within arm’s reach of the receiving manikin, since the modulation all but disappears as the source range increases through the .71 m case that was observed here. But the pattern of variation in responses measured at the ipsilateral ear tells only half the story; therefore, modulation in contralateral ear responses was also observed.
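The comb-filtering interpretation can be illustrated with a simple two-path model, in which the direct sound is summed with a single reflection of relative amplitude a arriving after a delay tau, giving the response H(f) = 1 + a·exp(-j2πf·tau). The sketch below is an idealisation with illustrative values of a and tau, not values fitted to the measurements, and the function names are hypothetical.

```python
import cmath
import math

def comb_magnitude_db(freq_hz, delay_s, reflection_gain):
    """Magnitude in dB of direct sound plus one delayed reflection."""
    h = 1.0 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(h))

def peak_spacing_hz(delay_s):
    """Frequency spacing between adjacent peaks (or troughs) of the ripple."""
    return 1.0 / delay_s

# Illustrative values only: a reflection slightly weaker than the direct sound.
gain = 0.9
for delay_ms in (1.0, 1.5):
    delay_s = delay_ms / 1000.0
    peak_db = comb_magnitude_db(1.0 / delay_s, delay_s, gain)    # at a peak
    trough_db = comb_magnitude_db(0.5 / delay_s, delay_s, gain)  # at a trough
    print(delay_ms, peak_spacing_hz(delay_s), round(peak_db, 2), round(trough_db, 2))
```

In this model the peaks are bounded by 20·log10(1 + a), which approaches about 6 dB as a approaches 1, and the peak spacing 1/tau shrinks as the delay grows. Both properties are in line with the behaviour described above: deviations approaching 6 dB, and more closely spaced peaks and troughs where the reflection latency is longer.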
Figure 4. Images showing the difference in magnitude between contralateral HRTFs measured at close range and a reference set of HRTFs, displayed just as was done for the ipsilateral HRTFs in Figure 3. Note that the images displayed here were generated using the same colormap as that used in Figure 3, so that the extreme positive and negative deviations from the reference magnitude reached at the contralateral ear could be directly compared with those of the ipsilateral ear.
be, as compared with the more straightforward extension of more conventional dry HRTF-based systems, such as that reviewed by Martens (2003). That review examined the development and evaluation of a binaural synthesis system providing close-range HRTF-based cues to source range, a system that has also been described as a ‘near-field virtual audio display’ by Brungart (2002).

The reason this application of binaural technology is considered to be important is that control of perceived source range in auditory displays is a complicated matter that requires more sophisticated treatment than attention to level cues alone. The level-based cue is the only range-related parameter that typically is manipulated (i.e., that due to direct sound propagation attenuation). Of course there are many cues that can contribute to the modulation of the perceived range of a sound source. For example, Little, Mershon, & Cox (1992) have examined the role of spectral content as a cue to perceived auditory distance of the direct sound. Perhaps even more important, however, is the role of indirect sound resulting from room reflections in providing more graphic modulation of source range when more realistic spatial impressions are desired. In this case, it is crucial to consider the role of indirect sound that arrives soon after the direct sound when these room reflections are included in what the human listener hears (Bronkhorst & Houtgast, 1999). Indeed, Martens (2004) has shown that when simulated reflections based upon a small virtual acoustic environment are included in a headphone-based auditory display, much improved externalization of the virtual sources may be expected. Furthermore, as the indirect sound simulation is held constant for sources in the space quite near the listener, source range was shown to be manipulated in a manner that decouples direct sound level (i.e., loudness) from the range control made possible by including range-related HRTF variation in the sound processing. For example, the results reported by Martens (2004) indicated that the decrease in source range associated with a 9 dB increase in interaural level difference (capturing the range-dependent variation of close-range HRTFs) could be counteracted by a 3 dB decrease in direct sound level. One interesting implication of this finding is that source range and source loudness could be somewhat decoupled, at least for sources quite near a listener’s ear. What was not studied in that previous investigation of virtual source range control was the special situation in which sound is radiated from a talker’s mouth and received at the ear of a nearby listener.

Of particular interest here is the potential application of the more comprehensive virtual acoustic simulation suggested here to the development of spoken telecommunication systems, such as that described by Kan, Pope, Jin, & van Schaik (2004). But what may have been less well appreciated in recent literature on this topic are the benefits that might be derived from the inclusion of sophisticated applications of binaural technology such as those proposed in the current paper. For example, when the voice of a talker at one end of a transmission is treated as a virtual acoustic source that is allowed to approach quite close to the receiver’s position, the possibility of delivering messages in confidence that sound like a whisper in the listener’s ear may be enabled (Martens & Yoshida, 2000). It might be said that the natural sound of a source would afford immediate awareness by the listener that a message was being delivered in confidence, due to the apparent source proximity rather than any explicit statement confirming the talker’s intention to deliver such a confidential message. It will be interesting to examine in more depth whether the reflections between talker and receiver will provide useful information to the listener when virtual acoustic simulations include acoustical cues resembling those that were measured between two manikins in the current study.

CONCLUSION

The directionally-dependent variations in the binaural response were measured for an orally-radiated source at a number of nearby source ranges, relative to a more distant source. At the two closest ranges tested (.25 m and .35 m), substantial modulation was observed that resembled the comb-filtering effect associated with the summing of a direct sound with a single reflection arriving at short latency (between 1 ms and 3 ms), and at a level near that of the direct sound. This modulation in HRTF magnitude response was even more pronounced at the contralateral ear. A slightly more distant source (at .71 m) showed much less modulation in HRTF magnitude, and the reflection pattern was so low in level at this longer latency that an enhanced visualization did not render it visible. These results have implications for applications of binaural technology, especially for spoken telecommunication systems employing headphone-based virtual acoustic simulations. Although no subjective tests using human listeners have been run to establish the relative importance of these modulations, it is proposed that such patterns may make a clearly audible difference when compared to virtual acoustic rendering solutions that include only dry HRTFs in their close-range simulation of orally-radiated sources.

REFERENCES

Blauert J 1997, 'Spatial Hearing,' MIT Press.

Bronkhorst A & Houtgast T 1999, 'Auditory distance perception in rooms,' Nature, 397:517-520.

Brungart D 2002, 'Near-field virtual audio displays,' Presence: Teleoperators & Virtual Environments, 11(1):93-106.

Brungart D & Rabinowitz W 1999, 'Auditory localization of nearby sources: Head-related transfer functions,' The Journal of the Acoustical Society of America, 106(3):1465-1479.

Duda RO & Martens WL 1998, 'Range dependence of the response of a spherical head model,' The Journal of the Acoustical Society of America, 104(5):3048-3058.

Farina A 2000, 'Simultaneous measurement of impulse response and distortion with a swept-sine technique,' Proceedings of the 108th AES Convention, Paris, France.

Kan A, Pope G, Jin C, & van Schaik A 2004, 'Mobile spatial audio communication system,' In Proceedings of ICAD 04-Tenth Meeting of the International Conference on Auditory Display, Sydney, Australia.

Little AD, Mershon DH, & Cox PH 1992, 'Spectral content as a cue to perceived auditory distance,' Perception, 21(3):405-416.

Martens WL 2003, 'Perceptual evaluation of filters controlling source direction: Customized and generalized HRTFs for binaural synthesis,' Acoustical Science and Technology, 24(5):220-232.