Spatial Sound - Technologies and Psychoacoustics: This Tutorial
[Figure: loudspeakers m, k and n forming a triplet around the virtual source]
\[ g = [p_1 \; p_2 \; p_3] \begin{bmatrix} l_{11} & l_{12} & l_{13} \\ l_{21} & l_{22} & l_{23} \\ l_{31} & l_{32} & l_{33} \end{bmatrix}^{-1} \]
Normalization ||g|| = 1
Normalized barycentric coordinates
as gain factors [Pulkki-97]
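The gain computation above can be sketched in a few lines. This is a minimal illustration, not the reference VBAP implementation: it assumes the rows of the matrix are the unit vectors of the three loudspeakers and p is the unit vector of the virtual source, and it solves p = g1*l1 + g2*l2 + g3*l3 with Cramer's rule instead of an explicit matrix inverse. The helper names (unit_vector, vbap_gains) are made up for this example.

```python
import math

def unit_vector(azimuth_deg, elevation_deg=0.0):
    """Cartesian unit vector for a direction given in degrees."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def vbap_gains(p, l1, l2, l3):
    """Solve p = g1*l1 + g2*l2 + g3*l3 by Cramer's rule, then scale so ||g|| = 1."""
    det = dot(l1, cross(l2, l3))
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions do not span 3-D space")
    g = (dot(p, cross(l2, l3)) / det,
         dot(l1, cross(p, l3)) / det,
         dot(l1, cross(l2, p)) / det)
    norm = math.sqrt(dot(g, g))
    return tuple(gi / norm for gi in g)

# Assumed example layout: two loudspeakers at +/-45 degrees azimuth, one elevated.
speakers = [unit_vector(45), unit_vector(-45), unit_vector(0, 45)]
gains = vbap_gains(unit_vector(0), *speakers)
```

A negative gain in the result would indicate that the virtual source lies outside the chosen triplet, in which case another triplet should be selected.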
Spatial sound Technologies and Psychoacoustics 92/145
Pulkki Jan 2012
Aalto University IEEE Winter School 2012
Vector base amplitude panning, VBAP
5-10 implementations for different platforms have been quite
successful
very robust to different loudspeaker setups
number of loudspeakers needed is 6-60, which is acceptable in many
cases
left/right direction quality ok
up/down direction quality varies between individuals
does not color sound (!!!)
quality degrades smoothly outside best listening position
does not recreate the sound field, or wave field curvature
cannot bring virtual sources inside the array
Techniques for loudspeaker listening
Loudspeaker layouts
Virtual source positioning
Microphone techniques
Ambisonics
Wave field synthesis
Signal-dependent methods
Microphone polar patterns
[Figure: polar patterns of an omnidirectional microphone and a figure-of-eight microphone; the two lobes of the figure of eight have opposite polarity (+ and -)]
Multiple coincident microphones: polar patterns are additive.
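As a small illustration of this additivity, the sketch below uses the standard first-order pattern a + (1 - a) cos(theta), where a = 1 gives an omnidirectional and a = 0 a figure-of-eight response. Summing equal parts of the two yields a cardioid. The function names are chosen for this example only.

```python
import math

def first_order_pattern(a, theta_deg):
    """Gain of the first-order pattern a + (1 - a) * cos(theta)."""
    return a + (1.0 - a) * math.cos(math.radians(theta_deg))

def omni(theta_deg):
    return first_order_pattern(1.0, theta_deg)

def figure_of_eight(theta_deg):
    return first_order_pattern(0.0, theta_deg)

def cardioid_from_sum(theta_deg):
    """Equal mix of coincident omni and figure-of-eight signals."""
    return 0.5 * omni(theta_deg) + 0.5 * figure_of_eight(theta_deg)

for theta in (0, 90, 180):
    print(theta, round(cardioid_from_sum(theta), 3))
```

Other mixing ratios give the remaining first-order patterns; a = 0.25, for example, is the usual hypercardioid.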
Microphone polar patterns
[Figure: polar patterns of a cardioid microphone and a hypercardioid microphone; the hypercardioid has a small rear lobe of opposite polarity (+ and -)]
Coincident microphones for stereo
implement amplitude panning
point-like virtual sources
directional patterns favor front
tend to decrease reverberation in listening
Spaced microphones for stereo
implement amplitude panning + time panning
spread virtual sources
capture all directions depending on setup
more spacious reverberation
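The difference between the two stereo approaches can be made concrete with a rough sketch. It assumes ideal cardioids aimed at +/-45 degrees for the coincident pair, a far-field source for the spaced pair, and a speed of sound of 343 m/s; none of these values come from the slides. The first function returns the inter-channel level difference that a coincident pair produces, the second the inter-channel delay of a spaced pair (level differences of the spaced pair are ignored here).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def xy_level_difference_db(source_az_deg, half_angle_deg=45.0):
    """Level difference (left/right, dB) of two coincident cardioids aimed at +/- half_angle."""
    def cardioid(off_axis_deg):
        return 0.5 + 0.5 * math.cos(math.radians(off_axis_deg))
    eps = 1e-12  # avoids log of zero when a source sits in a cardioid null
    left = cardioid(source_az_deg - half_angle_deg)
    right = cardioid(source_az_deg + half_angle_deg)
    return 20.0 * math.log10((left + eps) / (right + eps))

def spaced_time_difference_ms(source_az_deg, spacing_m):
    """Far-field inter-channel delay (ms) between two spaced microphones."""
    return 1000.0 * spacing_m * math.sin(math.radians(source_az_deg)) / SPEED_OF_SOUND

print(xy_level_difference_db(30.0))
print(spaced_time_difference_ms(30.0, 0.5))
```

Positive azimuths point towards the left microphone in this convention, so positive results mean the left channel is louder or earlier.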
Spaced microphone arrays for multichannel
Decca tree
Spaced microphone arrays for multichannel
Fukada tree
Spaced microphone arrays for multichannel
number of microphones equals the number of loudspeakers
no well-established system
spaced array has to be tuned to the recording space
directions are reproduced somewhat diffusely
reverberation perceived spacious and enveloping
no/low coloration artifacts
Techniques for loudspeaker listening
Loudspeaker layouts
Virtual source positioning
Microphone techniques
Ambisonics
Wave field synthesis
Signal-dependent methods
B-format recording
B-format microphones
Omni + 3 dipoles on the Cartesian axes
Steerable first-order microphone
Cardioid or hypercardioid for each loudspeaker
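A steerable first-order microphone can be sketched as a weighted sum of the B-format channels. The example assumes the traditional B-format convention in which W carries the pressure scaled by 1/sqrt(2) and X, Y, Z are dipoles along the Cartesian axes; other normalizations exist and would change the sqrt(2) factor. The function names and the parameter a (1 = omni, 0.5 = cardioid, 0 = dipole) are illustrative.

```python
import math

def encode_bformat(sample, az_deg, el_deg=0.0):
    """Encode one sample of a plane-wave source into W, X, Y, Z (assumed W = p / sqrt(2))."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z

def virtual_microphone(w, x, y, z, look_az_deg, look_el_deg=0.0, a=0.5):
    """First-order virtual microphone a + (1 - a) * cos(angle) aimed at the look direction."""
    az, el = math.radians(look_az_deg), math.radians(look_el_deg)
    dipole = (x * math.cos(az) * math.cos(el)
              + y * math.sin(az) * math.cos(el)
              + z * math.sin(el))
    return a * math.sqrt(2.0) * w + (1.0 - a) * dipole

w, x, y, z = encode_bformat(1.0, 0.0)
front = virtual_microphone(w, x, y, z, look_az_deg=0.0)
back = virtual_microphone(w, x, y, z, look_az_deg=180.0)
```

Decoding for loudspeakers then amounts to pointing one such virtual cardioid or hypercardioid at each loudspeaker direction.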
B-format recording
www.soundfield.com
First-order Ambisonics
[Gerzon 70s]
A signal for each loudspeaker is decoded from B-format
Loudspeaker channels are relatively coherent
Coloration
OK quality in the best listening position and in a good listening room
Nearest loudspeaker dominates outside the best listening position
Higher-order microphone patterns
Higher-order Ambisonics benefits
lower coherence between loudspeaker signals
more stable localization in a larger listening area
better overall quality
Higher-order Ambisonics challenges
high directivity can be obtained only in a limited frequency window
low-frequency noise
at least 8-24 transducers are needed
quality of transducers has to be high
more expensive microphones
microphones still under development
Commercial higher-order microphone array
www.trinnov-audio.com
Different patterns at different frequencies (order 1-3 ??)
Techniques for loudspeaker listening
Loudspeaker layouts
Virtual source positioning
Microphone techniques
Ambisonics
Wave field synthesis
Signal-dependent methods
Wave field synthesis
[Snow 1955, Berkhout 1988]
Recreate the same acoustic field as in the original space
Original idea: curtain of microphones, curtain of loudspeakers
Nowadays: a dense loudspeaker array around the listeners
Lots of academic interest
Some commercial applications
Wave field synthesis
[Figure © IRCAM web pages]
Wave field synthesis
Large listening area
Virtual sources can be positioned inside the loudspeaker array in
some cases
Wave front curvature is reproduced correctly
perspective is correct
human capability to perceive curvature is limited
Typical horizontal setups consist of about 100-200 loudspeakers
3D setups would require about 100,000 loudspeakers !!!
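Why the loudspeaker count explodes in 3D can be seen from a back-of-envelope sketch. It assumes the commonly quoted worst-case spatial aliasing estimate f = c / (2 * spacing), a rectangular room, and c = 343 m/s; the room sizes and spacing below are arbitrary example values, and the sketch is not meant to reproduce the exact figures on the slide. The point is only that a horizontal array grows with perimeter / spacing, while a full surface grows with area / spacing squared.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value

def aliasing_frequency_hz(spacing_m):
    """Worst-case spatial aliasing estimate for a given loudspeaker spacing."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)

def horizontal_count(width_m, depth_m, spacing_m):
    """Loudspeakers needed to line the perimeter of a rectangular room."""
    perimeter = 2.0 * (width_m + depth_m)
    return round(perimeter / spacing_m)

def surface_count(width_m, depth_m, height_m, spacing_m):
    """Loudspeakers needed to cover all six surfaces of the room."""
    area = 2.0 * (width_m * depth_m + width_m * height_m + depth_m * height_m)
    return round(area / spacing_m ** 2)

print(aliasing_frequency_hz(0.1))
print(horizontal_count(6.0, 4.0, 0.1))
print(surface_count(6.0, 4.0, 3.0, 0.1))
```

Halving the spacing doubles the horizontal count but quadruples the surface count, which is why full 3D arrays are rarely considered practical.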
Wave field synthesis
Complications
very expensive systems
recording is not possible, although it was the original idea
microphone directional patterns should be matched with
loudspeaker directional patterns, which is not possible
positioning tool for virtual sources and for reverberation
a number of artifacts: coloration, WF truncation, etc.
some of the artifacts may exist only in theory
Least squares method
Techniques for loudspeaker listening
Loudspeaker layouts
Virtual source positioning
Microphone techniques
Ambisonics
Wave field synthesis
Signal-dependent methods
Faller's system
Directional Audio Coding (DirAC)
Signal-dependent processing for multichannel
eliminate signal X1 from X2 based on cross correlation
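One simple way to read this step is as a least-squares projection: estimate how much of X1 is present in X2 from their zero-lag cross-correlation and subtract that part. The sketch below does this broadband on plain sample lists. It is an interpretation for illustration, not necessarily the processing shown on the slide, and the function name is made up.

```python
def remove_correlated_part(x1, x2):
    """Subtract from x2 its least-squares projection onto x1 (zero-lag correlation)."""
    energy = sum(a * a for a in x1)
    if energy == 0.0:
        return list(x2)
    weight = sum(a * b for a, b in zip(x1, x2)) / energy
    return [b - weight * a for a, b in zip(x1, x2)]

x1 = [1.0, -1.0, 1.0, -1.0]
other = [1.0, 1.0, -1.0, -1.0]                   # uncorrelated with x1
x2 = [0.5 * a + n for a, n in zip(x1, other)]     # x2 leaks half of x1
residual = remove_correlated_part(x1, x2)
```

After the subtraction the residual is uncorrelated with x1 by construction; in this toy case it equals the uncorrelated component exactly.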
Signal-dependent processing for multichannel
frequency-band processing
Wiener filtering
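In its textbook form, Wiener filtering in frequency bands reduces to one real gain per band, G = S / (S + N), where S and N are the estimated powers of the wanted and unwanted components in that band. The sketch below only computes and applies these gains; how the band powers are estimated is left open, and the names are illustrative.

```python
def wiener_gains(wanted_power, unwanted_power):
    """Per-band Wiener gain S / (S + N); bands with no energy get gain 0."""
    gains = []
    for s, n in zip(wanted_power, unwanted_power):
        total = s + n
        gains.append(s / total if total > 0.0 else 0.0)
    return gains

def apply_gains(band_values, gains):
    """Scale each frequency-band value by its gain."""
    return [v * g for v, g in zip(band_values, gains)]

gains = wiener_gains([1.0, 0.0, 3.0], [1.0, 1.0, 1.0])
filtered = apply_gains([2.0, 2.0, 2.0], gains)
```

Bands dominated by the wanted component pass almost unchanged, while bands dominated by the unwanted component are attenuated.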
Signal-dependent processing for multichannel
Arbitrary directional patterns can be formed from B-format
recordings
1st-order Ambisonics vs. Faller's system
Faller's system
directional patterns valid for dry, or non-diffuse, sound
in a reverberant field, the response is just the original cardioid
in many recording schemes a large advantage over first-order
Ambisonics is obtained
Directional Audio Coding (DirAC)
Psychoacoustics: in a single frequency channel we decode
single ITD
single ILD
when two sinusoids from two loudspeakers are near in
frequency, they cannot be localized individually
Humans are also sensitive to interaural coherence.
Directional Audio Coding (DirAC)
Hypothesis: If we reproduce correctly at each frequency band
the direction of sound and
the diffuseness of sound,
the spatial audio should be perceived with high quality.
Directional Audio Coding (DirAC)
[Pulkki07 JAES][Merimaa & Pulkki 05 JAES]
Energetic analysis of sound field
Length of intensity vector equals energy density (scaled by the speed of sound)
Direction opposite to the intensity vector gives the direction of arrival of sound
Random field
Length of intensity vector small and varies with time
Direction of intensity vector varies with time
Directional analysis
B-format microphone
Direction vector = -Intensity vector
Diffuseness estimate = 1 - (abs(Intensity) / Energy)
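The directional analysis can be sketched as follows. The example works broadband on short sample lists, whereas DirAC itself performs the analysis per frequency band with short-time averaging. It assumes the traditional B-format scaling (W equals pressure divided by sqrt(2)) and normalized units in which air density and speed of sound are 1, so that the diffuseness is simply 1 - |I| / E with time-averaged intensity I and energy E. The function name is illustrative.

```python
import math

def dirac_analysis(w, x, y, z):
    """Return (direction-of-arrival unit vector or None, diffuseness) from B-format samples."""
    n = len(w)
    ix = iy = iz = energy = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        p = math.sqrt(2.0) * wi  # pressure, assuming W = p / sqrt(2)
        # Particle velocity points away from the source, opposite to the dipole signals.
        ix -= p * xi
        iy -= p * yi
        iz -= p * zi
        energy += 0.5 * (p * p + xi * xi + yi * yi + zi * zi)
    ix, iy, iz, energy = ix / n, iy / n, iz / n, energy / n
    norm = math.sqrt(ix * ix + iy * iy + iz * iz)
    diffuseness = 1.0 - norm / energy if energy > 0.0 else 0.0
    diffuseness = min(1.0, max(0.0, diffuseness))
    # Direction of arrival is opposite to the averaged intensity vector.
    direction = (-ix / norm, -iy / norm, -iz / norm) if norm > 0.0 else None
    return direction, diffuseness

# Single plane wave from 60 degrees azimuth: expect that direction and diffuseness near 0.
s = [1.0, -0.5, 0.25, 2.0]
az = math.radians(60.0)
direction, psi = dirac_analysis([v / math.sqrt(2.0) for v in s],
                                [v * math.cos(az) for v in s],
                                [v * math.sin(az) for v in s],
                                [0.0 for _ in s])
```

When the intensity vector keeps changing direction, the time average shrinks relative to the energy and the diffuseness estimate approaches 1.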
Synthesis, version I
teleconference application
single audio channel + metadata
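With a single audio channel, the metadata (direction and diffuseness per band) decides how the channel is distributed to the loudspeakers. A rough sketch of the gain logic, assuming the usual DirAC split into a non-diffuse stream weighted by sqrt(1 - diffuseness) and a diffuse stream weighted by sqrt(diffuseness), is given below. The panning gains for the analysed direction are taken as an input (for example from amplitude panning), and the decorrelation normally applied to the diffuse stream is omitted. The function name is hypothetical.

```python
import math

def dirac_stream_gains(pan_gains, diffuseness):
    """Per-loudspeaker (direct_gain, diffuse_gain) pairs for one frequency band.

    pan_gains: amplitude-panning gains for the analysed direction, with sum of squares 1.
    """
    count = len(pan_gains)
    direct = math.sqrt(1.0 - diffuseness)
    diffuse = math.sqrt(diffuseness / count)  # diffuse energy spread evenly
    return [(direct * g, diffuse) for g in pan_gains]

pairs = dirac_stream_gains([0.6, 0.8, 0.0], diffuseness=0.25)
total_energy = sum(d * d + f * f for d, f in pairs)
```

With normalized panning gains, the total energy over both streams stays equal to that of the transmitted channel regardless of the diffuseness value.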
DirAC teleconferencing
DirAC Teleconferencing, low-cost microphone
W = mic 2 ; X = mic1 - mic3 ; Y = mic4 - mic2 ;
Low-cost microphone
Dipole signals X and Y contain prominent self-noise
In traditional techniques the dipole signals would be useless
In DirAC teleconferencing, dipole signals are not used as audio
signal
Dipole signals are used as data, to steer omni signal
High-quality DirAC
High-quality DirAC
can be seen as signal-dependent enhancement of first-order
Ambisonics
minimizes coherence between loudspeakers for diffuse and
non-diffuse sound
different effects processing possible for diffuse and non-diffuse sound
High-quality DirAC
Listening tests
Auditory virtual reality generated with 25 loudspeaker channels in
anechoic room
B-format recording, reproduction with DirAC
DirAC quality was rated as excellent [unpublished results]
Work in progress: in some settings, some small differences between
reference and reproduction can be perceived.
Spatial sound technologies
Topics
Development paradigms
Binaural technologies
Techniques for loudspeaker listening
Coding of spatial audio
Spatial sound technologies not covered in this tutorial
Spatial audio coding
Idea by Faller and Baumgarte in 2002
Encode multichannel audio into mono or stereo with metadata
Metadata consists of inter-channel level and time differences
Evolved to MPEG-surround
Also: Creative with Gerzon vectors
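The core idea, a downmix plus level-difference metadata, can be sketched for a single frame of a stereo signal. Real spatial audio coders work per time-frequency tile and also transmit time or phase differences and coherence; this toy version keeps only one broadband channel level difference in dB. All function names are invented for the example.

```python
import math

EPS = 1e-12  # guards against log of zero for silent channels

def level_difference_db(left, right):
    """Channel level difference (left relative to right) in dB."""
    e_left = sum(v * v for v in left)
    e_right = sum(v * v for v in right)
    return 10.0 * math.log10((e_left + EPS) / (e_right + EPS))

def encode(left, right):
    """Downmix to mono and keep the level difference as metadata."""
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    return mono, level_difference_db(left, right)

def decode(mono, cld_db):
    """Re-create two channels whose level difference matches the metadata."""
    ratio = 10.0 ** (cld_db / 10.0)            # energy ratio left / right
    g_left = math.sqrt(2.0 * ratio / (1.0 + ratio))
    g_right = math.sqrt(2.0 / (1.0 + ratio))
    return [g_left * m for m in mono], [g_right * m for m in mono]

left = [1.0, -1.0, 0.5, -0.5]
right = [0.5 * v for v in left]
mono, cld = encode(left, right)
dec_left, dec_right = decode(mono, cld)
```

The decoded channels are scaled copies of the downmix, so only the level relation between the channels is restored, not the original waveforms.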
MPEG surround encoding
www.mpegsurround.com
Stereo audio + metadata of level differences in 5.1 input
MPEG surround decoding
www.mpegsurround.com
Decode to 5.1, or stereo, or headphones
MPEG surround vs. DirAC
What is the difference?
input to MPEG surround is audio tracks
measures differences between the tracks
only an audio coding technique
input to DirAC is pressure and 2D or 3D velocity vector
measured from B-format recording
deals with physical quantities, direction, diffuseness
is also a microphone technique
DirAC in audio coding
Spatial sound technologies
Topics
Development paradigms
Binaural technologies
Techniques for loudspeaker listening
Coding of spatial audio
Spatial sound technologies not covered in this tutorial
Other spatial sound technologies
Reverberators
process [dry] input sound so that the listener perceives the
sound as being in a virtual space
DSP structures
Convolving reverberators
Directional microphones
Capture sound selectively by direction
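A convolving reverberator, mentioned in the list above, is in its simplest form a direct convolution of the dry signal with a room impulse response. The sketch below uses plain time-domain convolution on short lists; practical implementations use FFT-based (partitioned) convolution for long responses. The impulse response values are made up.

```python
def convolve(dry, impulse_response):
    """Direct-form convolution of a dry signal with an impulse response."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, d in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += d * h
    return out

# Toy impulse response: direct sound followed by two decaying reflections.
room = [1.0, 0.5, 0.25]
wet = convolve([1.0, 0.0, 0.0], room)
```

Feeding a unit impulse through the reverberator returns the impulse response itself, which is a convenient sanity check.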
Acknowledgment
The Academy of Finland (Projects #105780 and #119092) and Fraunhofer
Gesellschaft IIS have supported this work. The research leading to these results
has received funding from the European Research Council under the European
Community's Seventh Framework Programme (FP7/2007-2013) / ERC grant
agreement no [240453].
Suggested reading
Applies to all topics:
J. Blauert, Spatial hearing: the psychophysics of human sound localisation, MIT Press, Cambridge, MA, (1995).
Acoustics
Kuttruff H., Room Acoustics, Applied Science Publishers Ltd, London (1973)
Fahy F.J., Sound intensity Elsevier Applied Science 1989
Savioja L., Huopaniemi J., Lokki T., Väänänen R.: Creating Interactive Virtual Acoustic Environments JAES Volume
47 Issue 9 pp. 675-705; 1999.
Psychoacoustics
Brian C.J. Moore: An Introduction to the Psychology of Hearing Elsevier, 2003
Eds Gilkey R, Anderson T: Binaural and spatial hearing in real and virtual environments Psychology Press, 1997.
Bech S., Zacharov N: Perceptual Audio Evaluation: Theory, Method and Application John Wiley & Sons 2006.
Goupell and Hartmann: Interaural fluctuations and the detection of interaural incoherence: Bandwidth effects, JASA
119(6), 2006
Suggested reading
Binaural technologies
Moller H., Fundamentals of binaural technology, Applied Acoustics, 171-218, (1992).
Begault D. R., 3-D sound for virtual reality and multimedia, (1994).
A. Härmä, J. Jakka, M. Tikander, M. Karjalainen, T. Lokki, J. Hiipakka, and G. Lorho, "Augmented reality audio for
mobile and wearable appliances", Journal of the Audio Engineering Society (JAES), vol. 52, no. 6, pp. 618-639, June
2004.
O. Kirkeby, P.A. Nelson, H. Hamada, The "Stereo Dipole" : Binaural sound reproduction using two closely spaced
loudspeakers, Audio Engineering Society 102nd Convention, pre-print 4463 (1997).
Suggested reading
Loudspeaker techniques
V. Pulkki: "Spatial Sound Generation and Perception by Amplitude Panning Techniques", TKK PhD Dissertation, 2001
http://lib.tkk.fi/Diss/2001/isbn9512255324/
Berkhout, A. J.: A Holographic Approach to Acoustic Control JAES Volume 36 Issue 12 pp. 977-995; December 1988
Spors S., Rabenstein R., Ahrens, J.: The Theory of Wave Field Synthesis Revisited AES 124th Convention, Paper
Number: 7358. May 2008.
Fazi, Filippo M.; Nelson, Philip A.: The Ill-Conditioning Problem in Sound Field Reconstruction AES Convention: 123
(October 2007) Paper Number: 7244
Pulkki V. Spatial Sound Reproduction with Directional Audio Coding JAES Volume 55 Issue 6 pp. 503-516; June 2007
Merimaa J, Pulkki V: Spatial Impulse Response Rendering I-II JAES Volume 53 Issue 12, Volume 54, Issue 1; 2005
and 2006
Gerzon M: Periphony: With-Height Sound Reproduction JAES Volume 21 Issue 1 pp. 2-10; February 1973
Faller C.,: A Highly Directive 2-Capsule Based Microphone System Paper 7313 AES 123rd Convention 2007.
Villemoes, L; Herre, J; Breebaart, J; Hotho, G; Disch, S; Purnhagen, H; Kjörling, K: MPEG Surround: The Forthcoming
ISO Standard for Spatial Audio Coding 28th AES Conference 2006
Goodwin M., Jot J-M: Binaural 3-D Audio Rendering Based on Spatial Audio Scene Coding, Paper 7277, AES
Convention 123 October 2007