2 AES Immersive
Immersive audio
Objects, mixing, and rendering
Francis Rumsey
Consultant Technical Writer
Any visitor to the recent 140th Convention in Paris would have been left in no doubt about the importance of the topic of immersive audio. The term immersive audio seems to be emerging as the most commonly used to describe systems and techniques that deliver spatial audio content from all around the listener, although "3D" is used almost interchangeably by some. Alongside the many demonstrations and workshops related to the theme were a number of research papers, and some of these are summarized here for the benefit of those who don't have time to read them in depth.

STEREO TO 3D UPMIXING
In their paper, "Low-Complexity Stereo Signal Decomposition and Source Separation for Application in Stereo to 3D Upmixing" (paper 9586), Sebastian Kraft and Udo Zölzer point out that while the movie industry has adapted to multichannel content production, music is still almost all produced in two-channel stereo. Upmixing two-channel content for surround and immersive reproduction formats is therefore an attractive proposition if it can be made to deliver a convincing experience.

Kraft and Zölzer describe a signal-decomposition and source-separation approach based on mid-side (sum and difference) processing in the frequency domain, enhanced by improved ambience signal processing for stereo to 3D upmixing. The decomposed direct sound signals are repanned using VBAP (vector-base amplitude panning), while ambient sound is processed for the target channels in question using decorrelation filters. A number of assumptions enable the two-channel original to be decomposed reasonably successfully. First, it is assumed that at any moment in time, and in any frequency band, only one dominant source will be active. Second, left and right ambience signals are assumed to sound similar but be decorrelated, and to have an amplitude that is much lower than the direct sound.

There are numerous possible immersive audio loudspeaker layouts, and the

Fig. 1. 9-channel immersive audio loudspeaker setup (Figs. 1 and 2 courtesy Kraft and Zölzer)
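The repanning step applies Pulkki's VBAP formulation to the extracted direct signals. Purely as a two-dimensional sketch (not the authors' implementation), the gain pair for a source direction between two loudspeakers can be computed by inverting the loudspeaker direction matrix:

```python
import numpy as np

def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Vector-base amplitude panning between two loudspeakers (2D).

    Solves p = g @ L for the gain pair, where the rows of L are the
    loudspeaker unit vectors, then power-normalizes the result.
    """
    p = np.array([np.cos(np.radians(source_deg)),
                  np.sin(np.radians(source_deg))])
    L = np.array([[np.cos(np.radians(spk1_deg)), np.sin(np.radians(spk1_deg))],
                  [np.cos(np.radians(spk2_deg)), np.sin(np.radians(spk2_deg))]])
    g = p @ np.linalg.inv(L)       # raw gains for the two loudspeakers
    return g / np.linalg.norm(g)   # constant-power normalization

# Source midway between loudspeakers at +/-30 degrees
gains = vbap_2d(0.0, 30.0, -30.0)
```

For a source midway between loudspeakers at ±30° this returns equal gains of about 0.707, i.e., a constant-power pan; in a layout such as the 9-channel setup of Fig. 1, the renderer would select the loudspeaker pair (or, with height, triplet) surrounding the source direction before computing gains.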
584 J. Audio Eng. Soc., Vol. 64, No. 7/8, 2016 July/August
FEATURE ARTICLE
…which attempts to build a spatial audio scene for various different output formats. In this case the two reference renderers are the Sonic Emotion WFS system for loudspeaker arrays and the BinauralWave Max MSP suite for headphones (although the latter can also handle VBAP rendering for multispeaker layouts). Communication between the system elements uses OSC (Open Sound Control), which means that alternative renderers could also be employed, say the authors.

Because ADM is not yet implemented in commercial DAWs, a freely available library known as bbcat from BBC Research was used (see http://www.bbc.co.uk/rd/publications/audio-definition-model-software). This enables ADM metadata plus associated audio to be encapsulated into a Broadcast Wave File (BWF) container in such a way that the main BWF contents can still be read by a conventional DAW. The ADM metadata is stored in the aXML header of the file, as shown in Fig. 7.

Fig. 7. Architecture of the EDISON 3D ADM/BWF export system

HEADPHONE VIRTUALIZATION
Considering that a large amount of entertainment material is enjoyed on headphones these days, the importance of headphone rendering cannot be overstated. Grant Davidson and his colleagues describe a novel approach to this in "Design and Subjective Evaluation of a Perceptually-Optimized Headphone Virtualizer" (paper 9588). Here the aim is to maximize perceived externalization by simulating reproduction in a virtual room, while maintaining a natural timbral balance.

The authors explain that listeners often prefer conventional loudspeaker stereo over various forms of spatial enhancement for headphone presentation, largely because of the timbral changes introduced by HRTF processing, but also partly because of problems with perceived spatial width. The aim was therefore an echoic headphone virtualizer that would deliver higher listener preferences than one using conventional amplitude panning techniques. Externalization was considered important, but only to the extent that any side effects did not outweigh the benefits. In these tests immersive audio content from Dolby Atmos printmasters was rendered to a 7.1.4 loudspeaker format before being virtualized for headphones. As a reference for conventional stereo, the 7.1.4 material was downmixed using standard ITU downmix coefficients. The aim was to exceed 70% listener preference for the new headphone version compared with the stereo downmix, and for no test of the headphone virtualizer to perform worse than stereo.

A stochastic room model was employed that did not have to obey the constraints of real rooms, but enabled the capture of only the most perceptually relevant binaural room impulse response (BRIR) features. The model delivered direct sound, plus early reflections and a late reverberant tail made up of individual reflections with specific directionality, tailored to enhance the sense of externalization. By trying different lengths of BRIR it was found that the sound moved out of the listener's head with more than 10 ms of tail, and that the effect stopped increasing after 30–70 ms, depending on the nature of the sound source and the size of the room. For this reason they set the reverb duration in the system to only 80 ms, which delivered the sense of externalization without noticeable echoes. It was also found that strong azimuthal fluctuations of impulse response segments occurring 15–50 ms after the first arrival contributed to higher externalization. Numerical optimization was used to select BRIRs that delivered the lowest spectral distortion, and results were tuned using a small listening panel.

In listening tests it was found that the new system was preferred over stereo by between 60 and 95% of listeners, with the average overall preference being 75%. One item showed a 50/50 split, but others showed a majority preferring the new system. The precise reasons for these preferences were not explored in these experiments, but the authors suggested that further tests could examine specific audio attributes in addition to preference.

LIVE SPATIAL SOUND MIXING
An approach to live sound mixing combining object-based mixing with WFS rendering is discussed by Etienne Corteel, Raphaël Foulon, and Frédéric Changenet in their paper "A Hybrid Approach to Live Spatial Sound Mixing" (paper 9527). The authors say that WFS is the only sound rendering technique that can deliver effective spatial rendering over the large listening areas involved in live sound events, but point to the challenge arising from the large number of loudspeakers needed by the technology. In the application described here they combine standard stereo techniques with spatial mixing based on WFS.

It is common to use a number of large line arrays with fill loudspeakers in live sound systems, in order to cover the audience area with a more or less even SPL distribution. Because coverage is the primary aim in such applications, relatively little attention is paid to the spatial positioning of reproduced sources. It's suggested that spatial sound reinforcement based on WFS enables naturally enhanced intelligibility, because of the increased ability of listeners to distinguish sources spatially from one another, and the resulting reduction in masking. This in turn can lead to reduced use of compression and EQ, resulting in a more natural sound quality. The perception
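The container mechanics behind the ADM-in-BWF packaging described above can be illustrated with a small sketch. This is not bbcat's API, and a real BWF would also carry a bext chunk and a complete ADM document; it only shows how an axml chunk carrying XML can be appended to a RIFF/WAVE file while the audio stays readable by software that skips unknown chunks:

```python
import io
import struct
import wave

def add_axml_chunk(wav_bytes: bytes, adm_xml: str) -> bytes:
    """Append an 'axml' chunk carrying ADM metadata to a WAVE file.

    Conventional readers skip chunks they don't recognize, so the
    'fmt ' and 'data' chunks remain usable in an ordinary DAW.
    """
    payload = adm_xml.encode("utf-8")
    pad = b"\x00" if len(payload) % 2 else b""   # RIFF chunks are word-aligned
    chunk = b"axml" + struct.pack("<I", len(payload)) + payload + pad
    out = wav_bytes + chunk
    # Patch the overall RIFF size field (bytes 4-7) to cover the new chunk.
    return out[:4] + struct.pack("<I", len(out) - 8) + out[8:]

# Build a tiny mono WAV in memory, then attach a placeholder ADM fragment.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(48000)
    w.writeframes(b"\x00\x00" * 48)   # 48 silent 16-bit samples
bwf = add_axml_chunk(buf.getvalue(), "<audioFormatExtended/>")
```

The resulting bytes still open as a normal WAV, since the reader stops at the data chunk and never needs to parse the trailing axml chunk.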
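The stereo reference used in Davidson et al.'s tests came from an ITU-style downmix. As a rough illustration only: ITU-R BS.775 defines the −3 dB center/surround fold for 5.1, and the sketch below reuses that same weight for the rear-surround and height channels of a 7.1.4 set, which is an assumption on my part (as are the channel names), not something the standard or the paper specifies:

```python
import numpy as np

A = 10 ** (-3 / 20)   # the -3 dB coefficient BS.775 uses for center/surround folds

def downmix_714_to_stereo(ch):
    """Fold a dict of 7.1.4 channel signals down to 2.0.

    BS.775 covers only the 5.1 case; applying the same -3 dB weight to
    the rear surrounds and the four height channels is illustrative.
    The LFE channel is conventionally omitted from the downmix.
    """
    lo = ch["L"] + A * (ch["C"] + ch["Ls"] + ch["Lrs"] + ch["Ltf"] + ch["Ltr"])
    ro = ch["R"] + A * (ch["C"] + ch["Rs"] + ch["Rrs"] + ch["Rtf"] + ch["Rtr"])
    return lo, ro

# Center-only content lands equally in both stereo channels at -3 dB.
names = ["L", "R", "C", "LFE", "Ls", "Rs",
         "Lrs", "Rrs", "Ltf", "Rtf", "Ltr", "Rtr"]
chans = {n: np.zeros(4) for n in names}
chans["C"] = np.ones(4)
lo, ro = downmix_714_to_stereo(chans)
```

In practice such a downmix also needs limiting or normalization to avoid overload when many channels fold into two, which is one reason listening tests compare against a carefully prepared reference rather than an ad hoc fold-down.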