Distance Perception in Interactive Virtual Acoustic Environments Using First and Higher Order Ambisonic Sound Fields
Vol. 98 (2012) 61 – 71
DOI 10.3813/AAA.918492
Summary
In this paper, we present an investigation into the perception of source distance in interactive virtual auditory
environments in the context of First (FOA) and Higher Order Ambisonic (HOA) reproduction. In particular, we
investigate the accuracy of sound field reproduction over virtual loudspeakers (headphone reproduction) with
increasing Ambisonic order. Performance of 1st, 2nd and 3rd order Ambisonics in representing distance cues is
assessed in subjective audio perception tests. Results demonstrate that 1st order sound fields can be sufficient in
representing distance cues for Ambisonic-to-binaural decodes.
PACS no. 43.20.-f, 43.55.-n, 43.58.-e, 43.60.-c, 43.71.-k, 43.75.-z
2. Distance Perception
It is important to note that throughout the literature there
Kearney et al.: Distance perception ACTA ACUSTICA UNITED WITH ACUSTICA
the order of spherical harmonic decomposition. However, better directional localization can be achieved without affecting other important cues for distance estimation like overall sound intensity or direct to reverberant energy ratio. Thus it can constitute an ideal framework for testing whether less apparent properties of a sound field can influence the perception of distance.

2.3. Former Psychoacoustical Studies on Distance Perception

The perception of distance has been shown to be one that is not linearly proportional to the source distance. For example, both Nielsen [12] and Gardner [13] have shown that the perceived distance of speech signals is consistently underestimated in an anechoic environment. This underestimation has also been shown by other authors in the context of reverberant environments, both real and virtual. In [14], Bronkhorst and Houtgast demonstrate that in a damped virtual environment, sources are consistently perceived to be closer than in a reverberant virtual environment, due to the direct to reverberant ratio. In their studies, the room simulation is conducted using simulated Binaural Room Impulse Responses (BRIRs) created from the image source method [15]. They show how perceived distance increases rapidly with the number and amplitude of the reflections.

In a similar study, Rychtarikova et al. [16] investigated the difference in localization accuracy between real rooms and computationally derived BRIRs. Their findings show that at 1 m, localization accuracy in both the virtual and real environments is in good agreement with the true source position. However, at 2.4 m, the accuracy degrades, and high frequency localization errors were found in the virtual acoustics, pertaining to the difference in HRTFs between the model and the subject. In the same vein, Chan et al. [17] have shown that distance perception using recordings made from in-ear microphones on individual subjects again leads to underestimation of the source distance in virtual reverberant environments, more so than with real sources.

Waller [18] and Ashmead et al. [10] have identified that one of the factors improving distance perception is the listener's movement in the virtual or real space. It is therefore crucial to account for any listener movements (or lack thereof) in the experimental design.

Similarly, for headphone reproduction of virtual acoustic environments, small, subconscious head rotations may lead to improvements in distance perception by providing enhanced ILD and ITD cues. Therefore, the sound field transformations should accurately reflect small changes in the orientation of the listener's head.

3. Ambisonic Spatialization

Ambisonics was originally developed by Gerzon, Barton and Fellgett [7] as a unified system for the recording, reproduction and transmission of surround sound. The theory of Ambisonics is based on the decomposition of the sound field measured at a single point in space into spherical harmonic functions defined as

\[
Y_{mn}^{\sigma}(\Phi, \Theta) = A_{mn} P_{mn}(\sin\Theta) \cdot
\begin{cases}
\cos(n\Phi) & \text{if } \sigma = +1 \\
\sin(n\Phi) & \text{if } \sigma = -1,
\end{cases}
\tag{1}
\]

where m is the order and n is the degree of the spherical harmonic and $P_{mn}$ is the fully normalized (N3D) associated Legendre function. The coordinate system used comprises x, y and z axes pointing to the front, left and up respectively, Φ is the azimuthal angle with the clockwise rotation and Θ is the elevation angle from the x-y plane. For each order m there are (2m + 1) spherical harmonics.

In order for plane wave representation over a loudspeaker array we must ensure that

\[
s\, Y_{mn}^{\sigma}(\Phi, \Theta) = \sum_{i=1}^{I} g_i\, Y_{mn}^{\sigma}(\varphi_i, \theta_i),
\tag{2}
\]

where s is the pressure of the source signal from direction (Φ, Θ) and $g_i$ is the ith loudspeaker gain from direction $(\varphi_i, \theta_i)$. We can then express the left hand side of equation (2) in vector notation, giving the Ambisonic channels

\[
\mathbf{B} = \mathbf{Y}_{\Phi\Theta}\, s
= \left[ Y_{0,0}^{1}(\Phi, \Theta),\; Y_{1,0}^{1}(\Phi, \Theta),\; \ldots,\; Y_{mm}^{\sigma}(\Phi, \Theta) \right]^{T} s.
\tag{3}
\]

Equation (2) can then be rewritten as

\[
\mathbf{B} = \mathbf{C} \cdot \mathbf{g},
\tag{4}
\]

where C contains the encoding gains associated with the loudspeaker positions and g is the loudspeaker signal vector. In order to obtain g, we require a decode matrix, D, which is the inverse of C. However, to invert C we need the matrix to be square, which is only possible when the number of Ambisonic channels is equal to the number of loudspeakers. When the number of loudspeaker channels is greater than the number of Ambisonic channels, which is usually the case, we then obtain the pseudo-inverse of C where

\[
\mathbf{D} = \operatorname{pinv}(\mathbf{C}) = \mathbf{C}^{T} (\mathbf{C}\mathbf{C}^{T})^{-1}.
\tag{5}
\]

Since the sound field is represented by a spherical coordinate system, sound field transformation matrices can be used to rotate, tilt and tumble the sound fields. In this way, the Ambisonic signals themselves can be controlled by the user, allowing for the virtual loudspeaker approach to be employed. For 3-D reproduction, the number I of virtual loudspeakers employed with the Ambisonics approach is dependent on the Ambisonic order m, where

\[
I \geq N = (m + 1)^2.
\tag{6}
\]
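The encode and decode steps of equations (3) to (6) can be illustrated with a minimal first order sketch. It is not the authors' implementation: the cube loudspeaker layout, the source direction and all function names are hypothetical choices made for illustration, and the small matrix helpers are written out in plain Python only so that the example has no external dependencies. The first order N3D gains used are 1 for order 0 and √3 times the direction cosines for order 1.

```python
import math

def encode_first_order(azimuth, elevation):
    """First order N3D gains for one direction (radians): order 0, then order 1."""
    cos_el = math.cos(elevation)
    root3 = math.sqrt(3.0)
    return [
        1.0,                                 # m = 0, n = 0
        root3 * math.cos(azimuth) * cos_el,  # m = 1, n = 1, sigma = +1
        root3 * math.sin(azimuth) * cos_el,  # m = 1, n = 1, sigma = -1
        root3 * math.sin(elevation),         # m = 1, n = 0
    ]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def decode_matrix(c):
    """Equation (5): D = C^T (C C^T)^-1 for an N x I encoding matrix C."""
    ct = transpose(c)
    return matmul(ct, invert(matmul(c, ct)))

# Hypothetical layout: the 8 corners of a cube, so I = 8 >= N = (1 + 1)^2 = 4.
corner_el = math.asin(1.0 / math.sqrt(3.0))
speakers = [(math.radians(az), sign * corner_el)
            for az in (45, 135, 225, 315) for sign in (1, -1)]

C = transpose([encode_first_order(az, el) for az, el in speakers])  # N x I
D = decode_matrix(C)                                                # I x N

# Encode a unit-pressure plane wave (s = 1) from azimuth 30 degrees, elevation 0.
B = [[v] for v in encode_first_order(math.radians(30), 0.0)]        # N x 1
g = matmul(D, B)           # loudspeaker gains, I x 1
B_check = matmul(C, g)     # re-encoding the gains should reproduce B, as in (4)
```

Because the layout has more loudspeakers than Ambisonic channels, many gain vectors satisfy equation (4); the pseudo-inverse selects the one with the smallest total energy, and re-encoding those gains returns the original channels.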
7. Results

The perceived sound source distance (indicated by the distance walked) was collected from 7 subjects for 4 presentation points (2 m, 4 m, 6 m and 8 m), two stimuli (female speech and pink noise bursts) and four playback options: 1st, 2nd and 3rd Order Ambisonics and real loudspeakers, which for analysis we will denote FOA, SOA, TOA and REAL respectively. With headphone trials, none of the participants reported in-head localization; however, there were 3 cases where the proximity of the sound source was so apparent that participants decided not to move at all. In some cases, the virtual sound source was initially localized behind the subjects but all participants were able to resolve the confusion by applying head-rotation.

We computed the mean values of walked distances µ for each test condition along with the corresponding standard errors se(µ). The results are presented separately for each stimulus type with 95% confidence intervals.

[Figure 9. Mean localization of real and virtual sound sources (pink noise bursts). Horizontal axis: real distance [m].]

As expected, the perception of distance for the real sources was more accurate for near sources. Beyond 4 m, distance was consistently underestimated, which is congruent with the previous studies outlined in section 2. Furthermore, the standard deviation of localization increases as the source moves further into the diffuse field. We also see that unfamiliar stimuli produce greater variability in subjects' answers. The mean localization of the virtual sources follows the reference source localization well. The answers for virtual sources deviate from their means roughly in the same fashion as the answers for reference sources, as localization becomes more difficult within the diffuse field.

Since the study followed the within-subject factorial design with 2 (stimuli) × 4 (playback conditions), in order to investigate the effects of these two factors (referred to later as factors A and B) as well as potential interaction effects, a two-way ANOVA has been performed for each presentation distance. The null hypothesis being tested here is that all the mean perceived distances for all the stimuli and playback methods do not differ significantly

\[
H_0 : \mu_{FOA} = \mu_{SOA} = \mu_{TOA} = \mu_{Real} = \mu,
\]
\[
H_1 : \text{not all localization means } (\mu_i) \text{ are the same.}
\]

No statistically significant effect of stimuli (familiar vs. unfamiliar) on the perception of distance has been found (F_2m(3, 48) = 0.835, p = 0.365; F_4m(3, 48) = 2.0462, p = 0.159; F_6m(3, 48) = 2.575, p = 0.115; F_8m(3, 48) = 2.0462, p = 0.159). For distances of 4 m and more, the playback option also had no statistically significant effect (F_4m(3, 48) = 2.192, p = 0.101; F_6m(3, 48) = 0.665, p = 0.577; F_8m(3, 48) = 0.202, p = 0.894).

However, a statistically significant difference has been detected for the distance of 2 m. In larger study designs with multiple levels it is advisable to use the Honestly Significant Difference (HSD) approach, since there is an increased risk of spuriously significant differences arising purely by chance. So, in order to investigate further where the difference occurs, an HSD has been computed (HSD = 1.423 m). If we now compile the table of mean perceived distances for the sound sources located at 2 m (Table I), we can see that all of the values clearly lie within a single HSD of each other and cannot be distinguished. We can safely assume then an ANOVA false alarm (type I error) and no statistically significant effect of playback method for the sources at the distance of 2 m as well.

Table I. Mean localization [m] of virtual and real sound sources at 2 m.

          µ_FOA    µ_SOA    µ_TOA    µ_Real
Speech    1.119    1.389    0.841    1.638
Noise     0.877    1.001    0.902    1.641

Lastly, for all the distances no synergetic effects of factors A (stimuli) and B (playback conditions) have been detected.

Additionally, we calculated correlation coefficients ρ for pairs of distance estimations for real and virtual sound sources (either 1st, 2nd or 3rd order) and two stimuli (Tables II and III). In all cases, high correlation coefficients have been obtained, which confirms our findings that for these particular test conditions, the perception of distance of binaurally rendered Ambisonic sound fields of orders 1 to 3 cannot be distinguished from the perception of distance of the real sound sources.

Table II. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Speech).

               ρ         p-value
Real vs FOA    0.9828    0.0172
Real vs SOA    0.9960    0.0040
Real vs TOA    0.9590    0.0410

Table III. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Noise).

               ρ         p-value
Real vs FOA    0.9913    0.0087
Real vs SOA    0.9857    0.0143
Real vs TOA    0.9972    0.0028

8. Discussion

The results presented for real sources corroborate the classic underestimation of source distance, as reported in the literature. These results were used as a basis with which to measure the ability of Ambisonic sound fields of different orders to present sources at different distances. It was expected that a further underestimation of the source distance would ensue with the binaural rendering, as reported in [17]. However, this was not the case, even for first order presentations, and the apparent distances of the virtual sources matched the real source distances well. One should note that the major difference between this study and that of [17] is our use of head-tracking, indicating the importance of head-movements in perceiving source distance, which develops the findings of Waller [18] and Ashmead et al. [10] on user interaction in a virtual space. Further work is required to quantify the effect of this.

Moreover, the presented study demonstrates that the enhanced directional accuracy gained by presenting sound sources in HOA through head-tracked binaural rendering does not yield a significant improvement in the perception of the source distance. What is noteworthy is that for each order, there is no significant difference in the perception of the source location when compared to real-world sources. We therefore conclude that sound field directionality for distance perception is sufficient with 1st order playback.

The presence of the ANOVA false alarm at the 2 m point is of interest. It is noteworthy that the 2 m point represents a source inside the virtual array geometry. It is a known issue that virtual sound sources rendered inside the array of loudspeakers cannot be reproduced in a straightforward way without artifacts. Some of these artifacts include incorrect wave-front curvature and insufficient bass boost.

In the first case, there is ample evidence in the literature to suggest that the wavefront curvature translates to significant binaural cues for sound sources near the head [30, 38]. It was already shown in section 2.1 that as a source moves closer to the head the levels of the monaural transfer function and the ILD both change significantly with source angle. However, this effect is not strong at 1 m and beyond. For sources further away, it has been shown in [39] that it is very difficult to assess distance by binaural cues alone.

In the second case, the requirement for distance compensation filtering due to near field effects for the large loudspeaker radius (3.27 m) and the given source distances (>2 m) is only prominent below 100 Hz. For the female speech test stimuli, this will not have an effect, since the first formant frequencies do not go down below 180 Hz. Also, the current method employed for capturing HRIRs allowed for reliably obtaining filters with a frequency response reaching down to around 170 Hz, thereby also band-limiting the delivery of the pink noise stimuli.

Finally, there was no significant difference in the results presented for different sources, although the greater variance in the results for pink noise suggests that the familiarity of the source does indeed play a role in the perception of source distance, as mentioned in section 2.3. Future studies will investigate the use of these monaural cues further, and will utilize 0th order sound field rendering, since it will remove the influence of any directional information.

Considering the aforementioned study of Bronkhorst and Houtgast [14], where the accuracy of distance perception for binaural playback increases with the number of reflections,
our findings demonstrate that the net effect of the monaural cues of direct to reverberant ratio, level difference and time of arrival of early reflections is of greater importance in distance perception for binaural rendering than Ambisonic directional accuracy beyond 1st order.

9. Conclusions

We have assessed through subjective analysis the perceived source distance in virtual Ambisonic sound fields in comparison to real world sources. The hypothesis tested was that enhanced directional accuracy of the deterministic part of the sound field may lead to better reconstruction of environmental depth and thus improve the perception of sound source distance. However, it was shown that Ambisonic reproduction matches the perceived real world source distances well even at 1st order and no improvement in this regard was observed when increasing the order. It must be emphasized, though, that this analysis applies to Ambisonic-to-binaural decodes with higher order synthesis achieved using the directional analysis method of [23]. Therefore, further work will examine this topic for loudspeaker reproduction for both centre and off-centre listening as well as investigate the effectiveness of HOA synthesis in comparison to real world HOA measurements.

Acknowledgments

The authors gratefully acknowledge the participation of the test subjects for both their time and constructive comments, as well as the technical support staff at the Department of Theatre, Film and Television at the University of York for their assistance in the experimental setups. This research is supported by Science Foundation Ireland.

References

[1] L. Fauster: Stereoscopic techniques in computer graphics. Technical paper, TU Wien, 2007.
[2] J. Lee: Head tracking for desktop VR displays using the Wii remote. http://johnnylee.net/projects/wii/, accessed 30th Sept. 2011.
[3] D. R. Begault: Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual sound source. J. Audio Eng. Soc. 49 (2001) 904–916.
[4] M. Otani, T. Hirahara: Auditory artifacts due to switching head-related transfer functions of a dynamic virtual auditory display. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E91-A (2008) 1320–1328.
[5] V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning. J. Audio Eng. Soc. 45 (1997) 456–466.
[6] A. J. Berkhout: A Holographic Approach to Acoustic Control. J. Audio Eng. Soc. 36 (1988) 977–995.
[7] M. A. Gerzon: Periphony: With-height sound reproduction. J. Audio Eng. Soc. 21 (1973) 2–10.
[8] F. Rumsey: Spatial quality evaluation for reproduced sound: Terminology, meaning, and a scene-based paradigm. J. Audio Eng. Soc. 50 (2002) 651–666.
[9] J. Blauert: Communication acoustics. Springer, 2008.
[10] D. H. Ashmead, D. L. Davis, A. Northington: Contribution of listeners' approaching motion to auditory distance perception. J. Exp. Psy: Hum. Percep. and Perform. 21 (1995) 239–256.
[11] E. Czerwinski, A. Voishvillo, S. Alexandrov, A. Terekhov: Propagation distortion in sound systems: Can we avoid it? J. Audio Eng. Soc. 48 (2000) 30–48.
[12] S. H. Nielsen: Auditory distance perception in different rooms. J. Audio Eng. Soc. 41 (1993) 755–770.
[13] M. B. Gardner: Distance estimation of 0° or apparent 0° oriented speech signals in anechoic space. J. Acoust. Soc. Am. 45 (1969) 47–53.
[14] A. W. Bronkhorst, T. Houtgast: Auditory distance perception in rooms. Nature 397 (1999) 517–520.
[15] J. B. Allen, D. A. Berkley: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65 (1979) 943–950.
[16] M. Rychtarikova, T. V. d. Bogaert, G. Vermeir, J. Wouters: Binaural sound source localization in real and virtual rooms. J. Audio Eng. Soc. 57 (2009) 205–220.
[17] J. S. Chan, C. Maguinness, D. Lisiecka, C. Ennis, M. Larkin, C. O'Sullivan, F. Newell: Comparing audiovisual distance perception in various real and virtual environments. Proc. of the 32nd Euro. Conf. on Vis. Percep., Regensburg, Germany, 2009.
[18] D. Waller: Factors affecting the perception of interobject distances in virtual environments. Presence: Teleoper. Virtual Environ. 8 (1999) 657–670.
[19] A. McKeag, D. McGrath: Sound field format to binaural decoder with head-tracking. Proc. of the 6th Australian Regional Convention of the AES, 1996.
[20] M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich: A 3D Ambisonic based binaural sound reproduction system. Proc. of the 24th Int. Conf. of the Audio Eng. Soc., Alberta, Canada, 2003.
[21] B.-I. Dalenbäck, M. Strömberg: Real time walkthrough auralization - the first year. Proc. of the Inst. of Acous., Copenhagen, Denmark, 2006.
[22] C. Masterson, S. Adams, G. Kearney, F. Boland: A method for head related impulse response simplification. Proc. of the 17th European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, 2009.
[23] J. Merimaa, V. Pulkki: Spatial impulse response rendering I: Analysis and synthesis. J. Audio Eng. Soc. 53 (2005).
[24] W. M. Hartmann: Localization of sound in rooms. J. Acoust. Soc. Am. 74 (1983) 1380–1391.
[25] D. Griesinger: Spatial impression and envelopment in small rooms. Proc. of the 103rd Conv. of the Audio Eng. Soc., New York, USA, 1997.
[26] G. Kearney, M. Gorzel, H. Rice, F. Boland: Depth perception in interactive virtual acoustic environments using higher order ambisonic soundfields. Proc. of the 2nd Int. Ambisonics Symp., Paris, France, 2010.
[27] A. Farina: Simultaneous measurement of impulse response and distortion with a swept-sine technique. Proc. of the 108th Conv. of the Audio Eng. Soc., Paris, France, 2000.
[28] M. Gerzon: The design of precisely coincident microphone arrays for stereo and surround sound. Proc. of the 50th Conv. of the Audio Eng. Soc., London, UK, 1975.
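Supplementary sketch for the Results (section 7). The snippet below re-checks two reported statistics using only the values printed in Tables I to III and the reported HSD of 1.423 m. The first check confirms that every pair of mean perceived distances at 2 m differs by less than one HSD. The second rests on an assumption that is not stated in the text: that ρ is Pearson's correlation computed over the four presentation distances (n = 4). With two degrees of freedom the two-sided p-value of Pearson's ρ reduces to 1 − |ρ|, which can be compared directly with the tabulated p-values. All variable and function names are hypothetical.

```python
from itertools import combinations

HSD = 1.423  # metres, as reported in the Results

# Table I: mean perceived distance [m] at the 2 m presentation point.
means_2m = {
    "Speech": {"FOA": 1.119, "SOA": 1.389, "TOA": 0.841, "Real": 1.638},
    "Noise": {"FOA": 0.877, "SOA": 1.001, "TOA": 0.902, "Real": 1.641},
}

def largest_gap(means):
    """Largest absolute difference between any two condition means."""
    return max(abs(a - b) for a, b in combinations(means.values(), 2))

# Largest pairwise difference per stimulus, compared against one HSD.
gaps = {stimulus: largest_gap(m) for stimulus, m in means_2m.items()}
within_hsd = all(gap < HSD for gap in gaps.values())

# Tables II and III: (rho, reported p-value) for each Real-vs-order pair.
reported = {
    ("Speech", "FOA"): (0.9828, 0.0172),
    ("Speech", "SOA"): (0.9960, 0.0040),
    ("Speech", "TOA"): (0.9590, 0.0410),
    ("Noise", "FOA"): (0.9913, 0.0087),
    ("Noise", "SOA"): (0.9857, 0.0143),
    ("Noise", "TOA"): (0.9972, 0.0028),
}

def p_value_four_points(rho):
    """Two-sided p-value of Pearson's rho for n = 4 (assumed sample size)."""
    return 1.0 - abs(rho)

# Difference between the recomputed and the tabulated p-value for each pair.
p_mismatch = {
    pair: abs(p_value_four_points(rho) - p) for pair, (rho, p) in reported.items()
}

print("largest gaps [m]:", gaps, "all within one HSD:", within_hsd)
print("largest p-value mismatch:", max(p_mismatch.values()))
```

If the recomputed p-values agree with the tables to the printed precision, the tabulated correlations are consistent with having been computed from four points per pair, one for each presentation distance.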