
Paper Number 76, Proceedings of ACOUSTICS 2011 2-4 November 2011, Gold Coast, Australia

Close-range variation in binaural responses to orally-radiated sources

William L. Martens, Densil Cabrera, and Ken Stewart


Faculty of Architecture, Design and Planning, University of Sydney, Australia

ABSTRACT
Typical efforts to collect sets of head-related transfer functions (HRTFs) attempt to remove from the derived transfer functions the acoustical properties of the sound source used to make those measurements. This is done so that the directionally-dependent variations in the binaural response at the receiver’s location can be independently characterised. For sound sources that are far from the receiver’s location, this is appropriate and relatively straightforward to achieve. However, at close range (e.g., at distances between source and receiver position of less than 1 meter), characterising the variation in derived transfer functions that is dependent upon the acoustical characteristics of an orally-radiated source becomes potentially useful. The measurements reported here attempted to capture both source and receiver characteristics for a particular case, that in which the sound is radiated from the mouth of an anthropomorphic manikin, and is received at the ear of a nearby manikin. Substantial range-dependent variation in the measured transfer functions was observed, clearly due to the presence of reflections between the surfaces of the source manikin and the receiver manikin’s head. These results have implications for spoken telecommunication applications employing headphone-based virtual acoustic simulations.

INTRODUCTION

It is well established that measured head-related transfer functions (HRTFs) vary with source azimuth and elevation (Blauert, 1997), but less well studied is how HRTFs vary with range as well (see, e.g., Brungart & Rabinowitz, 1999). However, when measurements are made at close range (i.e., less than 1 meter from source to receiver position), substantial variations in the measured transfer functions can be observed under different measurement conditions, due to the difficulty in delivering sound from a loudspeaker without the loudspeaker’s presence affecting the measurement in a range-dependent fashion. Theoretically, measurements would show range dependence even if it were possible to employ an ideal point source that would present no physical structure within the time window of analysis. Such range dependence has been studied analytically for the ideal spherical head, and the associated predicted responses have agreed well with measurements under nearly ideal conditions (Duda & Martens, 1998). In addition to these range-dependent variations, which occur even in the ideal case, there is a further range-dependent factor found in most practical HRTF measurements, one that is due to the physical structure of the test loudspeaker. In almost every measurement system, the loudspeaker presents a reflecting surface, the acoustical effects of which are hard to remove from the desired HRTF to be derived from the raw test signal.

Rather than attempt to correct for the acoustical influence of the spatially extended transducer, the acoustical measurements reported here intentionally attempt to capture both source and receiver characteristics for a particular case of interest. The case under test is representative of an acoustical situation that is found in everyday life when a human listener is positioned within arm’s reach of a human talker. Instead of using human subjects, however, the test case reported here used a pair of anthropomorphic manikins, so that many measurements could be taken at high spatial resolution over a wide range of receiver azimuth angles without the error variance that is typically introduced when using human subjects. The orally-radiated source signal in this study was therefore produced at the mouth of one anthropomorphic manikin, and received at the ear canal entrance of a second anthropomorphic manikin. The primary goal of this study is to observe variations that are clearly dependent upon the presence of reflections between the surfaces of the source manikin and the receiver manikin’s head and torso.

In a previous study (Duda & Martens, 1998), an attempt was made to develop a better understanding of the close-range variation in the HRTF through a theoretical and experimental investigation of the response on the surface of an ideal rigid sphere. An algorithm was developed for computing the variation in sound pressure at the surface of the sphere as a function of direction and range to the sound source. Impulse responses were measured using a hard-surfaced sphere (in fact, a bowling ball) at a number of source ranges and many azimuth angles. The results may be summarized as follows: First, the experimental measurements were in close agreement with the theoretical solution. Second, the variation of low-frequency interaural level difference with range was found to be quite substantial for source ranges smaller than about five times the sphere radius. Third, the impulse response revealed the source of the ripples observed in the magnitude response, and provided direct evidence that the interaural time difference (ITD) is not a strong function of the range. Finally, the transfer function for the ideal sphere appears to be minimum-phase, permitting exact recovery of the impulse response from the magnitude response in the frequency domain.

These prior results set the stage for the current investigation of a range-dependent factor that was not included in the Duda & Martens (1998) study. Because the acoustics of real-life situations include reflections between source and receiver that the prior study explicitly excluded, the current study addresses this issue specifically. In particular, the influence of these reflections on binaural responses measured for an orally-radiated source will be examined as a function of source range and receiver azimuth angle. Responses will be observed both in the time and frequency domains, so that the patterns of ripples in HRTF magnitude can be related to the reflection patterns.
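The minimum-phase property reported for the ideal sphere by Duda & Martens (1998) implies that an impulse response can be rebuilt from its magnitude response alone. The following is a minimal sketch of one standard way to do this (folding the real cepstrum), and is not taken from that study. The test sequence `h_true`, the 64-point transform length, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out

def minimum_phase_from_magnitude(magnitude):
    """Rebuild a minimum-phase impulse response from |H| (even length, no zeros)."""
    n = len(magnitude)
    # Real cepstrum: inverse transform of the log magnitude.
    cep = [c.real for c in dft([math.log(m) for m in magnitude], inverse=True)]
    # Fold the cepstrum onto positive time to obtain the minimum-phase version.
    folded = [0.0] * n
    folded[0] = cep[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cep[i]
    folded[n // 2] = cep[n // 2]
    spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(spectrum, inverse=True)]

# Hypothetical minimum-phase test sequence (its single zero lies inside the unit circle).
h_true = [1.0, 0.5] + [0.0] * 62
magnitude = [abs(v) for v in dft(h_true)]
h_rec = minimum_phase_from_magnitude(magnitude)
print([round(v, 6) for v in h_rec[:4]])
```

For a response that is not minimum-phase, the same procedure would return a different impulse response with the same magnitude, which is why the property matters for the recovery described above.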


METHOD
Measurements were made in an anechoic room. The receiving Head And Torso Simulator (HATS) was mounted on a turntable, with a pole supporting the HATS manikin at an ear height of 1.5 m above the acoustically transparent floor. This HATS model was Brüel & Kjær type 4100, which has microphones where the entrance of the ear canals would be in a human ear. For successive measurements, this receiving HATS was rotated in 2° increments, starting with the nose directly facing the sound source (0°), the configuration shown in Figure 1, and finishing with the nose facing away from the sound source (180°), yielding 91 orientations. Note that the manikins were carefully positioned using a laser that is visible just over the left manikin’s nose in Figure 1.

Figure 1. Photograph of the pair of anthropomorphic manikins that were employed to make measurements between the orally-radiated source (right manikin) and the ear canal entrance of the receiver (left manikin) as that receiver is rotated through a 180° range of azimuth angles. Note the presence of a microphone at the mouth opening of the source manikin (on the right) that could be used to examine directly the oral-to-ear transfer function between manikins.

The receiving manikin was rotated clockwise (when viewed from above), meaning that the left ear was ipsilateral, and the right ear contralateral. It should be pointed out here that the employed two-manikin test situation requires a distinction to be made between azimuthal variation due to rotation of the receiver, and azimuthal variation due to rotation of the source manikin, the latter potentially exhibiting directional dependence as a radiator relative to a fixed receiver location. Such directional dependence in oral radiation with source-manikin rotation was not investigated in the current study.

The sound source was a HATS (Brüel & Kjær type 4128C), which features a mouth simulator. A Brüel & Kjær type 4134 microphone was mounted at the source (the HATS has two possible microphone positions, and we used the position with the microphone right at the mouth). Calibration tones were recorded on all microphones so that the gains associated with transfer functions could be derived.

The sound source was positioned at four distances from the centre of the interaural axis of the receiving HATS when it was facing the source: 2 m, 0.71 m, 0.35 m and 0.25 m. Closer distances were not possible because the torso of the rotating HATS would have collided with the HATS sound source. Hence, in total, 91 × 4 binaural measurements were made. The distances mentioned here should be regarded as nominal distances, because in fact the distance increased as the HATS rotation angle increased (because the mounting position of the pole supporting the HATS was slightly behind the interaural axis).

The measurement was made using a logarithmic sinusoidal sweep (cf. Farina, 2000) with a frequency range extending between 50 Hz and 20 kHz, with a duration of 45 s and a sampling rate of 44.1 kHz. Impulse responses were derived from the sweep recordings from each microphone. For measurements with the head-centre microphone, we synchronously averaged six impulse responses to further increase the signal-to-noise ratio. For the measurements with HATS as receiver, we made a single sweep recording per angle per distance.

The data most simply derived from the measurements were impulse responses from the system output to the receiver HATS’ ear microphones (obtained by convolving the recorded signals with the inverse filter of the test signal). Following this, head-related impulse responses (HRIRs) were derived from the transfer function between the head-centre microphone and the ear microphones. As reflective surfaces come close to high-impedance (constant volume velocity) sound sources, the sound pressure may increase, and this would be seen in the HRIRs derived in this way. Since a real mouth is best modelled as a constant volume velocity source, this effect may be regarded as a beneficial contribution to the measurements. However, using the reference microphone on the sound source, it was also possible to derive HRIRs for a constant-pressure (low-impedance) sound source.

In deriving the impulse responses and transfer functions mentioned above, we truncated the signal in both frequency and time. In the frequency domain, very low and very high spectral components were suppressed (using a window function) to remove noise that was outside of the measurement range. In the time domain, a window function was similarly used to suppress sound outside the plausible time response of the system (e.g., from rogue reflections in the room). Examination of the results for such artefacts revealed that some small unwanted reflections still remained within the analysis window; however, these were much lower in level than those due to the reflections of interest, those being the reflections between the head and torso of the two manikins. Time-domain displays of these substantial reflections of interest, and their range-dependent and azimuth-dependent effects upon the obtained magnitude responses, are presented in the following section.
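The derivation of an impulse response from a sweep recording, as described in this section, can be illustrated with a small sketch. It is not the authors' processing chain: the sweep here is deliberately short (0.05 s at 8 kHz rather than 45 s at 44.1 kHz) so that a direct convolution runs quickly, the "recording" is a hypothetical pure delay of 12 samples, and the inverse filter is simply the time-reversed sweep, omitting the amplitude compensation that Farina's method applies to flatten the spectrum.

```python
import math

def log_sweep(f1, f2, duration, fs):
    """Logarithmic (exponential) sine sweep from f1 to f2 Hz."""
    n = int(round(duration * fs))
    rate = math.log(f2 / f1)
    k = 2.0 * math.pi * f1 * duration / rate
    return [math.sin(k * (math.exp(rate * i / (fs * duration)) - 1.0))
            for i in range(n)]

def convolve(a, b):
    """Direct linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        if x == 0.0:
            continue
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def deconvolve(recording, sweep):
    """Convolve the recording with the time-reversed sweep (simplified inverse filter)."""
    return convolve(recording, sweep[::-1])

fs = 8000                                   # assumed sampling rate for the demo
sweep = log_sweep(50.0, 3000.0, 0.05, fs)   # assumed short test sweep
delay = 12                                  # hypothetical propagation delay in samples
recording = [0.0] * delay + sweep           # idealised system: a pure delay

ir = deconvolve(recording, sweep)
peak = max(range(len(ir)), key=lambda i: abs(ir[i]))
# The main peak sits at (sweep length - 1) plus the system delay.
print("recovered delay in samples:", peak - (len(sweep) - 1))
```

With a real measurement, each reflection between the manikins would appear as a further, weaker peak at its own delay, and a time window around the expected response would then discard later room artefacts.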


Figure 2. Two graphical perspectives on the HRIRs derived from the measurements made at three source ranges are displayed sepa-
rately here as a function of azimuth angle (on the x-axis) and time (on the y-axis). The y-axis values range from 0 ms, corresponding
to the arrival time of the source at 90 deg incidence, through the first 5 ms of the impulse response. The top row of three images
shows from left to right the original amplitude of the envelope functions observed at source ranges of .25 m, .35 m, and .71 m, re-
spectively (indicated by the yellow characters internal to each image). The row of three images on the bottom present an enhanced
display of the low-level reflections that occur between 1 and 3 ms after arrival of the first wavefront (see text for a description of the
enhancement method). The colorbar on the left shows the colormap that was used to display the amplitude of the envelope function
for all six of the images displayed here, with colours ranging from the lowest brightness for minimum data values to the highest
brightness for maximum amplitudes observed.

RESULTS

In order to enable visual inspection of the measured impulse responses, two graphical perspectives on the HRIRs are displayed in Figure 2, both using a ‘geological’ colour map indicating amplitude over a response surface with time and azimuth angle as independent parameters. The HRIRs are displayed over the first 5 ms following the arrival of the first wavefront at each of the 91 azimuth angles at which measurements were taken. The top row of images shows the amplitude of the envelope functions for the HRIRs as observed at source ranges of .25 m, .35 m, and .71 m. These images are labelled as ‘original’ to distinguish this simple visualization from the ‘enhanced’ version of each displayed in the second row. The enhancement uses a technique commonly used in image processing to highlight edges in photographic images (the ‘unsharp mask’ approach, which subtracts a Gaussian blur of the original image from itself, enabled by the MATLAB routine fspecial).

It is the bottom row of images that shows best the pattern of the reflections of interest between the two manikins. In the leftmost plot of HRIRs in Figure 2, observed at a source range of .25 m, the longest-latency reflections that are clearly visible are those that occur between 1 and 3 ms after arrival of the first wavefront. Furthermore, the pattern of variation shows the expected modulation of a primary reflection attributed to the proximity of the two manikins, reaching its longest delay when the receiving manikin faces either directly towards or directly away from the source manikin. The pattern looks almost the same in the middle image visualizing the results observed at a source range of .35 m, except that the reflections arrive at longer delays and are not quite as pronounced, as expected from the increased source range. At a source range of .71 m the reflection pattern is not so clearly visible, as is expected from the loss in level of the reflection relative to the direct sound. This detail should also make a clear distinction in the frequency domain: whereas the higher-level reflection found at the two smaller source ranges should result in a clear comb filtering effect in the magnitude response, at the larger source range the ripples in the magnitude response due to the reflection should not show so clearly. This detail will be examined next in this section, but not for the whole 180-degree set of azimuth angles. Note that the reflection pattern was more pronounced in the frontal region (between 0 and 90 degrees azimuth), and so the magnitude response data will be examined only for these angles.
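The ‘unsharp mask’ enhancement described above, in which a Gaussian blur of the image is subtracted from the image itself, can be sketched as follows. The paper's processing used MATLAB; this is an independent standard-library Python illustration, and the blur width `sigma`, the kernel radius of three standard deviations, the edge handling, and the small test image are all assumptions, since the paper does not state the parameters used.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian weights over [-radius, radius]."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Blur one row or column, replicating the edge samples."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * values[j]
        out.append(acc)
    return out

def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian blur of a list-of-rows image."""
    radius = max(1, int(math.ceil(3 * sigma)))
    kernel = gaussian_kernel(sigma, radius)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def unsharp_detail(image, sigma=1.0):
    """Image minus its Gaussian blur: smooth regions go to zero, fine detail remains."""
    blurred = gaussian_blur(image, sigma)
    return [[p - b for p, b in zip(row, brow)] for row, brow in zip(image, blurred)]

# Hypothetical test image: a strong uniform level with one faint local feature.
spike = [[1.0] * 9 for _ in range(9)]
spike[4][4] = 1.2
detail = unsharp_detail(spike)
print(round(detail[4][4], 4), round(detail[0][0], 4))
```

Because the slowly varying part of the image is removed, a low-level feature such as a weak reflection ridge occupies a larger share of the displayed colour range than it does in the original image.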


Figure 3. Images showing the difference in magnitude between ipsilateral HRTFs measured at close range relative to a reference set
of HRTFs measured at a source range of 2 m. As in Figure 2, the results from the measurements at three ranges are displayed as a
function of azimuth angle (on the x-axis), but only for incidence in the frontal hemifield. In contrast to Figure 2, the y-axis parame-
ter of each image is frequency, with values ranging from 0 to 8 kHz. Again, the three images show results observed at source ranges
of .25 m, .35 m, and .71 m, respectively (labelled in yellow characters). Since the displayed data are magnitude differences relative
to a reference measure, a different colormap is used in this figure, so that positive and negative deviations from the reference magni-
tude can be easily distinguished. Positive deviations (indicating responses greater than the reference) are always colour coded in
shades of magenta, and negative deviations (responses below that of the reference) are always colour coded in shades of cyan. The
colorbar on the right shows that data values near zero dB are colour coded as black.

In order to enable visual inspection of the range dependence of HRTF magnitude response, the magnitude response surface (in decibels, with azimuth and frequency as parameters) associated with the reference measurements made at 2 m was subtracted from the HRTF magnitude response data measured at the three smaller source ranges. The resulting differences for the ipsilateral ear are displayed in Figure 3. Just as in Figure 2, but in the frequency domain instead of the time domain, the three images show results observed at source ranges of .25 m, .35 m, and .71 m, respectively. At the largest of the three range values (.71 m), the differences from the reference measurements are so small that very few details are visible. In effect, the response surface visualized in this rightmost image appears nearly black because the deviations are all near 0 dB (as indicated in the colorbar on the right of the figure). At the two smaller source range values, however, the deviations approach extremes of 6 dB, as the responses were modulated above and below the magnitude of the reference measurement in the manner of the ripple pattern associated with a comb filtering effect. Indeed, the regular pattern of peaks and troughs that is clearly visible for azimuth angles between 0 and 35 degrees is just what would be expected if a reflection with an amplitude just a bit less than that of the direct sound were summed with it at a relatively constant delay. However, as the receiver HATS was rotated so that the receiving ear faced the source more directly, up to around 75 degrees azimuth, the pattern of the ripple shifts toward the higher frequencies, as the reflection latency is reduced. When the azimuth angle approaches 90 degrees, however, the pattern is not so clear. The same type of behaviour is seen at the .35-m source range, although the peaks and troughs are more closely spaced. This is what is expected for the slightly longer reflection latencies in this case. The conclusion regarding the contribution of reflections of interest between the two manikins is that substantial modulation of the response is to be expected only when the source manikin is within arm’s reach of the receiving manikin, since the modulation all but disappears as the source range increases through the .71 m case that was observed here. But the pattern of variation in responses measured at the ipsilateral ear tells only half the story; therefore, modulation in contralateral ear responses was also observed.
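The link drawn above between reflection latency and ripple spacing follows from the response of a direct sound summed with a single delayed copy, whose magnitude is |1 + a·exp(-j2πfτ)| for a relative reflection amplitude a and delay τ. The sketch below evaluates this under assumed values: the amplitude of 0.5 and the delays of 1 ms and 2 ms are hypothetical choices for illustration and are not measured values from this study.

```python
import cmath
import math

def comb_magnitude_db(freq_hz, gain, delay_s):
    """Level in dB, relative to the direct sound alone, of direct sound plus one reflection."""
    h = 1.0 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(h))

def ripple_spacing_hz(delay_s):
    """Frequency spacing between adjacent peaks (or adjacent troughs) of the comb."""
    return 1.0 / delay_s

gain = 0.5  # hypothetical reflection amplitude relative to the direct sound
for delay_ms in (1.0, 2.0):
    delay_s = delay_ms / 1000.0
    spacing = ripple_spacing_hz(delay_s)
    peak = comb_magnitude_db(spacing, gain, delay_s)          # reflection in phase
    trough = comb_magnitude_db(0.5 * spacing, gain, delay_s)  # reflection out of phase
    print(f"{delay_ms} ms: spacing {spacing:.0f} Hz, "
          f"peak {peak:+.2f} dB, trough {trough:+.2f} dB")
```

Doubling the delay halves the spacing, which is consistent with the more closely spaced peaks and troughs described for the longer latencies at the .35-m range, and a shorter delay moves the pattern toward higher frequencies.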


Figure 4. Images showing the difference in magnitude between contralateral HRTFs measured at close range and a reference set of HRTFs, displayed just as for the ipsilateral HRTFs in Figure 3. Note that the images displayed here were generated using the same colormap as that used in Figure 3, so that the extreme positive and negative deviations from the reference magnitude reached at the contralateral ear can be directly compared with those at the ipsilateral ear.

For comparison of the range dependence of HRTF magnitude response deviations between those observed at the ipsilateral ear and those observed at the contralateral ear, the contralateral magnitude differences are displayed in Figure 4. Just as in Figure 3, the three images show results observed at source ranges of .25 m, .35 m, and .71 m, respectively. Again, at the largest of the three range values (.71 m), the differences from the reference measurements are very small at most azimuth angles, with the exception that increased modulation is observed as the receiver azimuth angle approaches 90 degrees. Although the response surface visualized in this rightmost image appears nearly black at smaller azimuth angles, more substantial modulation is observed between 70 and 90 degrees, swinging nearly 6 dB above and below the reference. The source of this modulation is the subject of further investigation, and no speculation as to the cause of this phenomenon will be presented here. At the two smaller source range values, however, the deviations in magnitude are quite similar to those in the ipsilateral ear, although the modulation appears to be a bit greater at the contralateral ear. This occurs most likely because of the more extreme contralateral attenuation of the direct sound that is observed at such close range. Indeed, quite substantial increases in the head shadowing effect are well known, even in the analytical solution in the case of the ideal sphere response, as explained by Duda and Martens (1998). In the case of the comb-filtering effects observed in the contralateral ear in the present study, it may be that the greater attenuation of direct sound, relative to that of the reflected sound, brings these two acoustical components of the response closer in level, producing greater modulation than in the ipsilateral case, where the direct sound is relatively stronger than the reflected sound.

DISCUSSION

The directionally-dependent variation that was observed in the measured HRTFs for an orally-radiated source showed a clear range dependence, just as expected from the results of related previous studies (Duda & Martens, 1998; Brungart & Rabinowitz, 1999; Qu et al., 2009). However, the range-dependent modulation of HRTF magnitude associated with the reflections between source and receiver has apparently received no attention as a matter of interest in its own right, as was the perspective for the current study. Indeed, in a relatively recent review of auditory distance perception in humans (Zahorik et al., 2005), the summary of past and present research did not include any mention of the close-range reflections investigated in the current study. Whether the investigated phenomena are important in auditory distance perception or not should be established through subjective testing using human listeners. Suffice it to say that the substantial modulation in the HRTF magnitude that is observed at close range, when referenced to HRTFs measured at greater distance between source and receiver, suggests that the effects will be quite audible. More important than audibility, however, is the consideration of how effective the inclusion of such effects in a virtual acoustic simulation will be, as compared with the more straightforward extension of more conventional dry HRTF-based systems, such as that reviewed by Martens (2003). That review examined the development and evaluation of a binaural synthesis system providing close-range HRTF-based cues to source range, a system that has also been described as a ‘near-field virtual audio display’ by Brungart (2002).

The reason this application of binaural technology is considered to be important is that control of perceived source range in auditory displays is a complicated matter that requires more sophisticated treatment than attention to level cues alone. The level-based cue is the only range-related parameter that typically is manipulated (i.e., that due to direct sound propagation attenuation). Of course there are many cues that can contribute to the modulation of the perceived range of a sound source. For example, Little, Mershon, & Cox (1992) have examined the role of spectral content as a cue to perceived auditory distance of the direct sound. Perhaps even more important, however, is the role of indirect sound resulting from room reflections in providing more graphic modulation of source range when more realistic spatial impressions are desired. In this case, it is crucial to consider the role of indirect sound that arrives soon after the direct sound when these room reflections are included in what the human listener hears (Bronkhorst & Houtgast, 1999). Indeed, Martens (2004) has shown that when simulated reflections based upon a small virtual acoustic environment are included in a headphone-based auditory display, much improved externalization of the virtual sources may be expected. Furthermore, as the indirect sound simulation is held constant for sources in the space quite near the listener, source range was shown to be manipulated in a manner that decouples direct sound level (i.e., loudness) from the range control made possible by including range-related HRTF variation in the sound processing. For example, the results reported by Martens (2004) indicated that the decrease in source range associated with a 9 dB increase in interaural level difference (capturing the range-dependent variation of close-range HRTFs) could be counteracted by a 3 dB decrease in direct sound level. One interesting implication of this finding is that source range and source loudness could be somewhat decoupled, at least for sources quite near a listener’s ear. What was not studied in that previous investigation of virtual source range control was the special situation in which sound is radiated from a talker’s mouth, and received at the ear of a nearby listener.

Of particular interest here is the potential use of the more comprehensive virtual acoustic simulation suggested here in the development of spoken telecommunication systems, such as that described by Kan, Pope, Jin, & van Schaik (2004). But what may have been less well appreciated in recent literature on this topic are the benefits that might be derived from the inclusion of sophisticated applications of binaural technology such as those proposed in the current paper. For example, when the voice of a talker at one end of a transmission is treated as a virtual acoustic source that is allowed to approach quite close to the receiver’s position, the possibility for delivering messages in confidence that sound like a whisper in the listener’s ear may be enabled (Martens & Yoshida, 2000). It might be said that the natural sound of such a source would afford immediate awareness by the listener that a message was being delivered in confidence, due to the apparent source proximity rather than any explicit statement confirming the talker’s intention to deliver such a confidential message. It will be interesting to examine in more depth whether the reflections between talker and receiver will provide useful information to the listener when virtual acoustic simulations include acoustical cues resembling those that were measured between two manikins in the current study.

CONCLUSION

The directionally-dependent variations in the binaural response were measured for an orally-radiated source at a number of nearby source ranges, relative to a more distant source. At the two closest ranges tested (.25 m and .35 m), substantial modulation was observed that resembled the comb-filtering effect associated with the summing of a direct sound with a single reflection arriving at short latency (between 1 and 3 ms), and at a level near that of the direct sound. This modulation in HRTF magnitude response was even more pronounced at the contralateral ear. A slightly more distant source (at .71 m) showed much less modulation in HRTF magnitude, and the reflection pattern was so low in level at this longer latency that an enhanced visualization did not render it visible. These results have implications for applications of binaural technology, especially for spoken telecommunication systems employing headphone-based virtual acoustic simulations. Although no subjective tests using human listeners have been run to establish the relative importance of these modulations, it is proposed that such patterns may make a clearly audible difference when compared to virtual acoustic rendering solutions that include only dry HRTFs in their close-range simulation of orally-radiated sources.

REFERENCES

Blauert J 1997, 'Spatial Hearing,' MIT Press.

Bronkhorst A & Houtgast T 1999, 'Auditory distance perception in rooms,' Nature, 397:517-520.

Brungart D 2002, 'Near-field virtual audio displays,' Presence: Teleoperators & Virtual Environments, 11(1):93-106.

Brungart D & Rabinowitz W 1999, 'Auditory localization of nearby sources: Head-related transfer functions,' The Journal of the Acoustical Society of America, 106(3):1465-1479.

Duda RO & Martens WL 1998, 'Range dependence of the response of a spherical head model,' The Journal of the Acoustical Society of America, 104(5):3048-3058.

Farina A 2000, 'Simultaneous measurement of impulse response and distortion with a swept-sine technique,' Proceedings of the 108th AES Convention, Paris, France.

Kan A, Pope G, Jin C, & van Schaik A 2004, 'Mobile spatial audio communication system,' Proceedings of ICAD 04-Tenth Meeting of the International Conference on Auditory Display, Sydney, Australia.

Little AD, Mershon DH, & Cox PH 1992, 'Spectral content as a cue to perceived auditory distance,' Perception, 21(3):405-416.

Martens WL 2003, 'Perceptual evaluation of filters controlling source direction: Customized and generalized HRTFs for binaural synthesis,' Acoustical Science and Technology, 24(5):220-232.


Martens WL 2004, 'Decoupled loudness and range control for a source located within a small virtual acoustic environment,' Proceedings of ICAD 04-Tenth Meeting of the International Conference on Auditory Display, Sydney, Australia.

Martens WL & Yoshida A 2000, 'Psychoacoustically-based control of auditory range: Display of virtual sound sources in the listener's personal space,' Proceedings of the International Conference on Information Society in the 21st Century: Emerging Technologies and New Challenges (IS2000), Aizu-Wakamatsu, Japan.

Qu TS, Xiao Z, Gong M, Huang Y, Li XD, & Wu XH 2009, 'Distance-dependent head-related transfer functions measured with high spatial resolution using a spark gap,' IEEE Transactions on Audio, Speech, and Language Processing, 17(6):1124-1132.

Thurlow W, Mangels JW, & Runge PS 1967, 'Head movements during sound localization,' The Journal of the Acoustical Society of America, 42:489-493.

Zahorik P, Brungart D, & Bronkhorst A 2005, 'Auditory distance perception in humans: a summary of past and present research,' Acta Acustica united with Acustica, 91:409-420.
