Spherical Microphone Array Processing With Wave Field Synthesis and Auralization
Master Thesis

Gyan Vardhan Singh
Matrikel No.: 47816
Thesis No.: 2181/13MA/08

Professor:
Supervisors:
Department:
Date: 21-May-2014
ACKNOWLEDGEMENTS
This master's thesis would not have been possible without the support of many people.
Firstly, I wish to express my gratitude to Univ.-Prof. Dr.-Ing. Karlheinz Brandenburg
for giving me the chance to work on such an interesting topic in his group.
I owe my deepest gratitude to my supervisor Dipl.-Ing. Johannes Nowak for giving me
the opportunity to write this master's thesis under his supervision. His constant guidance,
assistance and support were invaluable for this thesis.
Further, I wish to express my love and gratitude to my beloved family, especially my
parents, for their love, understanding and support throughout my studies.
Finally, I thank all my friends for their support during my studies.
ABSTRACT
Microphone arrays are structures in which at least two microphones are placed at
different positions in space, generally in a geometrical arrangement. In many applications
we need not only the temporal but also the spatial characterization of sound
fields, and microphone arrays are employed to achieve this goal.
For spatial sound reproduction in particular, microphone arrays play a very important
role. Researchers have used different microphone array configurations for sound
recording, for the characterization of room acoustics, and for auralization.
As research in spatial sound reproduction progressed, it was found that rendering
sound with an array of loudspeakers alone is not sufficient to fully auralize an acoustic
scene; it was proposed that microphone arrays be used on the recording side in order
to reproduce the complete three-dimensional acoustic behaviour. Researchers have
used different array configurations, such as planar or circular arrays, to map the
listening room acoustics for auralization with a rendering system, e.g. wave field
synthesis (WFS).
A drawback of two-dimensional arrays is that they cannot sufficiently characterize an
acoustic scene in three dimensions; hence the spherical microphone array came into
the picture. Spherical microphone arrays and their processing have been described by
many authors, but a perceptual analysis of the various factors that degrade the
performance of a spherical microphone array is still not fully established.
In the present work we give a detailed analysis of the processing chain, which starts
from the simulation of room characteristics with a spherical microphone array, followed
by wave field analysis of the sound fields, classification of errors, and auralization of
the free field impulse responses. We bring together the existing state of the art in
spherical microphone array processing and examine the perceptual impact of different
factors. We use a rigid sphere configuration and analyze three error categories, namely
positioning error, spatial aliasing and microphone noise. We attempt to establish a
qualitative and quantitative relation between the errors and limitations encountered in
spherical microphone array processing, and study the psychoacoustic effects by
auralizing the free field data through WFS.
A spherical microphone array gives a complete three-dimensional image of the acoustic
environment; the array data are decomposed into plane waves using plane wave
decomposition. In this process the spherical aperture of the array is discretized, which
imposes limitations on the performance of the array.
We simulate the impact of an ideal full audio spectrum wave field on the continuous
aperture of a spherical microphone array and compare it with the sampled array aperture.
In the listening test we auralize sound fields based on the ideal wave field decomposition
of a continuous aperture and compare them with different degrees of error in the
different categories. By this comparison we attempt to establish the extent to which a
given error perceptually corrupts a reproduced sound field. We also examine the extent
to which some degree of error remains perceptually insignificant, in other words the
degree of error that can be tolerated.
We examine the spatial aliasing limit imposed by the rendering system and on that
basis justify the transform order (l = 3) used in the spherical array processing. The
perceptual analysis is done in two ways: we first obtain an error level which, when
incorporated in the auralization process (simulated for l = 3), is perceptually
insignificant, and we then study the perceptual effects of this error when the transform
order l is changed stepwise.
We also try to establish a correspondence between wave field synthesis on the rendering
side and the spherical microphone array on the measurement side. We investigate to
what extent wave field synthesis can retain the perceptual quality by analyzing the
psychoacoustic effects of changing various parameters on the spherical microphone
array side. The independence of the rendering side from the measurement side is
also analysed.
Contents

1 INTRODUCTION
  1.1 Preliminaries
  1.2 Auralization
  1.3 Motivation
  1.4 Organization of Thesis

3 ERROR ANALYSIS
  3.1 Measurement errors
  3.2
  3.3
  3.4
  3.5

5 LISTENING TEST
  5.1 Listening Test
  5.2 Reproduction set up
  5.3 Auralization
    5.3.1 Aspects to be perceptually evaluated
    5.3.2 Processing
  5.4 Structure of the listening test
    5.4.1 Audio Tracks
    5.4.2 Listening test conditions
  5.5 Test subjects
  5.6 Evaluation
    5.6.1 Test subject screening
    5.6.2 Statistics for the evaluation of the listening test
    5.6.3 Definitions
  5.7 Spatial aliasing vs transform order
  5.8 Evaluation of positioning error
  5.9 Microphone noise

6 Conclusions

Bibliography

List of Figures

List of Tables

APPENDIX

A Derivations
  A.1 Orthonormality of Spherical harmonics and Spherical Fourier transform
  A.2 Position vector and Wave vector
  A.3 Plane wave pressure field for different levels
  A.4 Rigid sphere and open sphere configuration

Theses
1 INTRODUCTION
Digital processing of sounds so that they appear to come from particular locations in
three-dimensional space is an integral part of virtual acoustics. In virtual acoustics the
goal is the simulation of complex acoustic fields such that a listener experiences a
natural environment, and this is done by spatial sound reproduction systems.
The realization of spatial sound reproduction systems builds on the concept of sound
field synthesis, which subsumes various methodologies and analytical approaches.
In sound field synthesis we decompose the sound into various components or wave
fields. In simple terms, we pull apart the basic components of sound characterizing its
spatial and temporal properties. After applying complex signal processing techniques,
we reproduce the sound in such a way that these components merge together in the
propagation medium to auralize the complete three-dimensional character of the sound.
Hence, sound field synthesis is a principle whereby an acoustic environment is processed,
synthesized and reproduced, or re-created, such that the real acoustic scenario can
be perceived by the listener. Spatial sound, immersive audio, 3D sound, surround
sound; these are some of the terms often used to describe such audio systems.
Different aspects come into play in realizing a sound field reproduction system, and
broad research is being done to understand the various factors. Examples of sound
field reproduction systems dealing with different conceptual aspects of signal processing
are wave field synthesis, which is also our choice of reproduction system in this thesis,
higher order Ambisonics [1], sound field reproduction with MIMO acoustic channel
inversion [2], and vector based amplitude panning [3]. These are a few examples of
spatial sound systems developed by the respective researchers; in [4] the author presents
a very detailed mathematical treatment of various spatial sound reproduction techniques
and attempts to bring these related spatial sound systems onto a single mathematical
plane on the basis of functional analysis.
In the present work we put forward an analysis that answers various questions which
come up when an acoustic environment is recreated. Sound reproduction techniques
for virtual sound systems have been studied, developed and implemented in many
different ways and configurations. The acoustic auralization of sound fields in this
work focuses on wave field analysis (WFA) [5][6] with spherical microphone arrays,
and their auralization on a two-dimensional loudspeaker array geometry following the
principle of wave field synthesis (WFS).
In order to obtain the characteristics of an acoustic scene, researchers have proposed
the use of microphone arrays. Apart from temporal properties, spatial sound
reproduction also requires the spatial properties of the sound field, and microphone
arrays are therefore required as they can characterize the sound in space as well
[7][5][6]. Auralization using microphone arrays has been attempted with various array
geometries; in [5] the author focused on a circular microphone array and used it for
the auralization of sound fields with wave field synthesis.
In [8], spatial sound design principles are explained for the auralization of room
acoustics.
In spatial sound design the spatial properties of an audio stream, such as position,
direction and orientation in a virtual room, and the room itself, are modified. Two
things are attempted: the first is the simulation of an acoustic environment, the other
is the direction dependent visualization and modification of the sound field by the user.
In this work we focus on the part where the simulation and auralization of an acoustic
scene is done. More importantly, we investigate the factors influencing the microphone
array used for room impulse response (RIR) recording and analyse the perceptual
effects observed during the auralization process when various parameters of the
microphone array are changed.
Any sound wave can be represented as a superposition of plane waves in the far field
of its sources [9][8]. Consequently, a room can be characterized by its impulse
responses, as it can be assumed to be a linear time invariant (LTI) system. Hence,
if we are able to capture the impulse responses of a room, we can fully characterize
its acoustic nature, and in turn any acoustic event in that room can be reproduced
simply with the help of the plane wave decomposed components of its room impulse
responses.
1.1 Preliminaries
To understand how sound radiates in a medium, we introduce the soap bubble analogy
as explained by Zotter in [10, pages 6-10]. The sound radiation is pictured as a soap
bubble, as shown in Figure 1.1. We assume a free sound field and an ideal soap bubble
large enough to enclose a musician and an instrument. When sound is produced by
the instrument, the bubble surface vibrates according to the motion of the air: as the
sound propagates through the medium it reaches the bubble, and the soap film vibrates
with the air molecules. At the respective observation points on the sphere, the waveform
of the vibrating surface can be said to represent the radiated sound.
In [9], Williams explains that the acoustic radiation from the instrument is completely
defined if we can acoustically map the motion of this continuous surface enclosing the
sources. This kind of analysis of sound radiation is called the exterior problem.
Conversely, suppose there are no sources inside the soap bubble (it now encloses the
measurement set up or listening area) and the sound instead propagates from outside,
i.e. the sources are outside and the waves hit the bubble from the exterior. Again, as
the bubble is in contact with the medium it vibrates, and identifying the motion of its
surface is sufficient to describe the acoustic radiation; this is called the interior problem.
In [11], the exterior and interior problems are elaborated further. As the interior
problem is the more important one for our application, we will present it with respect
to the spherical microphone array. A more mathematical treatment
1.2 Auralization
In order to auralize the sound field while keeping the spatial characteristics of the
sound alive, a method based on WFS is applied in the present work. WFS is a
consequence of Huygens' principle, expressed mathematically by the Kirchhoff-Helmholtz
integral [12]. In [13] wave field synthesis is discussed in explicit detail; in [13] Verheijen
has explained
(Figure: data-based auralization, recording and analysis. A source in the desired hall
or recording hall is captured by close mic recording; a microphone array measures the
room response data, which is processed by wave field analysis; the dry audio is fed to
a convolver for the reproduction room.)

(Figure: data-based auralization, reproduction. Close mic recording of the source and
the measurement region feed the processing stage; the direct sound and the room
response audio are rendered by wave field synthesis in the reproduction room.)
olation area [7]. In [5] the author has shown that a microphone array of at least the
size of the listening area is required in order to achieve satisfactory results.
Wave field decomposition [17][15][18]: The wave field decomposition approach
decomposes the sound field into plane waves which arrive from different directions.
The plane wave decomposition can be considered an acoustic photograph of the sound
sources, including the secondary sources, which can be regarded as the ones generating
the reflections [7].
The impulse responses are decomposed into plane waves, which give a directional
image of the sound field. These plane waves are then reproduced as point sources in
the WFS set up. The measurement array and the reproduction site are independent of
each other in this approach, and the sound field can be reproduced for a larger area
than with the other two approaches. The sizes of the measurement array and the
loudspeaker array are independent as long as the microphone array characterizes the
room sufficiently [5][8]. The plane waves obtained through plane wave decomposition
can optimally represent the sources and reflections, and hence in principle the sound
field can be reproduced satisfactorily. Due to the considerable advantages of wave
field decomposition over the other methods, we focus our work on the plane wave
decomposition of acoustic wave fields. In the next chapter we present the analysis for
the plane wave decomposition of a spherical microphone array. In [5] a circular
microphone array was implemented for auralization with WFS, but in order to obtain
a three-dimensional plane wave decomposition the use of a spherical microphone array
became necessary [11][8].
In our work we simulate the acoustic characteristics of a free field full spectrum wave
impact on a spherical microphone array and analyze it to obtain plane waves
representing the direct sources, reflections and reverberation; these plane wave
responses are implemented in the driving filters of WFS, and we auralize the sound.
As the spherical microphone array is central to the three-dimensional sampling of
acoustic radiation, we study different aspects of the spherical microphone array in this
work and investigate their influence on spatial sound reproduction.
Finally, we auralize the sound field for the different cases, and perceptual listening
tests are conducted. Test subjects are invited to listen to our simulated wave fields,
which are auralized using a WFS spatial sound renderer consisting of 88 loudspeakers
in a two-dimensional, nearly circular geometry.
1.3 Motivation
In this thesis we investigate the perceptual effects involved in spatial sound
reproduction. Specifically, we focus on spherical microphone arrays. For auralization
applications, microphone arrays have been used and the sound waves reproduced with
different methodologies, importantly wave field synthesis (WFS). WFS is an effective
tool for spatial sound reproduction as it can synthesize sound fields for large listening
areas; apart from that, it is robust against various practical limitations [13][19]. It is
important to mention this rendering technique again here, as our work can be divided
into three parts, listed below, and we use this methodology in the second part.
1. Recording
2. Auralization using WFS
3. Investigation of Psychoacoustic effects
The need to understand and explain the perceptual effects that get incorporated was
still fairly unexplored territory as far as the use of spherical microphone arrays for
auralization purposes with WFS is concerned.
There exist many mathematical parameters and inherent errors, namely
1. Microphone Noise
2. Positioning error in the array structure
3. Spatial Aliasing
4. Transform order
These errors and artifacts are bound to perceptually influence the auralization process;
in our work we subjectively and objectively investigate these relatively unexplored
areas. We examine how well a sound field reproduction system with given parameters
and specifications tolerates the errors, and to what extent the mathematics and theory
hold good in perceptual terms.
In this chapter we discuss the fundamentals of wave propagation and sound fields and
summarize the existing state of the art in spherical microphone array processing and
its auralization using wave field synthesis.
The work presented in this thesis is based on simulating free field room impulse
responses with a spherical microphone array; these impulse responses are then utilized
for rendering spatial sound with the WFS set up.
Figure 2.1: Infinitesimal volume element used for the derivation of Euler's equation.

The momentum equation relates the force applied to a volume element to the
acceleration of the element due to this force. In figure 2.1 an infinitesimal volume
element is considered; we use this setting for the derivation of Euler's equation [20][9].
Consider an infinitesimal volume element of fluid \Delta x \Delta y \Delta z.
All six faces experience forces due to the pressure p(x, y, z) in the fluid. Assume the
pressure on one side is higher than on the other; then a net force is exerted on the
volume element and it tends to move along the direction of the force. From Newton's
laws of motion we relate this force to the acceleration. Carrying out the same analysis
for all three directions, we end up with Euler's equation, which relates the pressure
applied to the fluid to changes in the particle velocity of the fluid.
\rho_0 \frac{\partial \vec{v}}{\partial t} = -\nabla p \qquad (2.1)

Here \rho_0 is the fluid density and \vec{v} is the velocity vector at any position (x, y, z) in the medium,

\vec{v} = u\,\vec{e}_x + v\,\vec{e}_y + w\,\vec{e}_z \qquad (2.2)

and p is the pressure. \nabla is called the gradient or nabla operator and is defined as

\nabla = \frac{\partial}{\partial x}\,\vec{e}_x + \frac{\partial}{\partial y}\,\vec{e}_y + \frac{\partial}{\partial z}\,\vec{e}_z \qquad (2.3)

where \vec{e}_x, \vec{e}_y and \vec{e}_z are the unit vectors in the x, y and z directions respectively; in the literature they are sometimes written as i, j, k. \nabla p is the pressure gradient and \partial\vec{v}/\partial t is the change in particle velocity.
The second equation, which follows from conservation of mass, is given as [22][20]:

\frac{\partial \rho}{\partial t} + \rho_0\, \nabla \cdot \vec{v} = 0 \qquad (2.4)
Equation 2.5 expresses the proportionality between the time derivative of the acoustic pressure and that of the density; refer to [20][25][22] for a more detailed description.

\frac{\partial p}{\partial t} = c^2 \frac{\partial \rho}{\partial t} \qquad (2.5)

where p is the pressure, a variable of position and time t, and c is the speed of sound. Equation 2.5 gives the temporal derivative of the density of the propagation medium in terms of the changes in pressure. Combining equations 2.5 and 2.4 in view of the last assumption, we get

\frac{\partial p}{\partial t} = -\rho_0 c^2\, \nabla \cdot \vec{v} \qquad (2.6)
Equations 2.1 and 2.6 with initial and boundary conditions form a complete set of first order partial differential equations with a unique solution. These equations can be combined into a single second order equation [25][20]. Taking the time derivative of equation 2.6,

\frac{\partial^2 p}{\partial t^2} = -\rho_0 c^2\, \nabla \cdot \frac{\partial \vec{v}}{\partial t} \qquad (2.7)

and replacing the particle velocity term in equation 2.7 with the pressure gradient from Euler's equation 2.1, we obtain the homogeneous wave equation

\nabla^2 p - \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} = 0 \qquad (2.8)
where p is the pressure, a function of position and time t. Equation 2.8 can also be represented in the frequency domain by applying the Fourier transform with respect to time t to the acoustic pressure p [26][9]:

\nabla^2 P(r, \omega) + \underbrace{\left(\frac{\omega}{c}\right)^2}_{k^2} P(r, \omega) = 0 \qquad (2.9)

Equation 2.9 is known as the Helmholtz equation; r is the position, r = (x, y, z), \omega/c is the wave number k, and \omega = 2\pi f. Analytically it can be seen that k = 2\pi/\lambda, \lambda being the wavelength; hence k is the phase in radians accumulated per unit length, so if we want to know the phase of a wave after it has travelled, say, 7\lambda/9, then (7\lambda/9)\,k gives the phase.
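As a quick worked instance of this relation: after a travelled distance of 7\lambda/9 the accumulated phase is

\frac{7\lambda}{9}\, k = \frac{7\lambda}{9} \cdot \frac{2\pi}{\lambda} = \frac{14\pi}{9} \approx 4.89\ \text{rad} \quad (= 280^\circ).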
p(t) = A\, e^{i(kr - \omega_0 t)} \qquad (2.14)

A is a constant. This is the plane wave solution of the wave equation at a given frequency \omega_0. We have directly put forward the solution of the wave equation in Cartesian coordinates in an introductory form; for a detailed description please see [9].
\nabla^2 p(x, y, z, t) - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\, p(x, y, z, t) = 0 \qquad (2.15)

\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \qquad (2.16)
The spherical coordinate system shown in figure 2.2 is followed throughout this thesis. From figure 2.2 we can express the Cartesian coordinates in terms of r, \theta, \phi.

Figure 2.2: Spherical coordinate system and its relation to the Cartesian coordinate system

x = r \sin\theta \cos\phi, \quad y = r \sin\theta \sin\phi, \quad z = r \cos\theta \qquad (2.17)

with \phi = \tan^{-1}(y/x). Considering equations 2.15 and 2.17 we can express the wave equation in spherical coordinates as
\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2 \frac{\partial p}{\partial r}\right) + \frac{1}{r^2 \sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta \frac{\partial p}{\partial \theta}\right) + \frac{1}{r^2 \sin^2\theta}\frac{\partial^2 p}{\partial \phi^2} - \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} = 0 \qquad (2.18)

In this equation p is a variable of (r, \theta, \phi, t). The right hand side of the equation reflects the assumption that there are no sources in the volume for which the equation is defined. The solutions of this wave equation in the frequency domain are explained in [9] and are given in two forms as
p(r, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \left[ A_{lm}(k)\, j_l(kr) + B_{lm}(k)\, y_l(kr) \right] Y_l^m(\theta, \phi) \qquad (2.19)

p(r, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \left[ C_{lm}(k)\, h_l^{(1)}(kr) + D_{lm}(k)\, h_l^{(2)}(kr) \right] Y_l^m(\theta, \phi) \qquad (2.20)

The two solutions represent the interior and the exterior problem: equation 2.20 refers to the exterior problem and equation 2.19 refers to the interior problem. We will elaborate more on these two solutions and the coefficients A_{lm}(k), B_{lm}(k), C_{lm}(k) and D_{lm}(k) in later sections.
The level l and mode m are integers with values defined by 0 \le l < \infty and -l \le m \le l. The acoustic wave number, as defined earlier, is k = \omega/c = 2\pi f/c, where f is the frequency of the sound wave and c is the speed of sound in the medium. The functions j_l(kr) and y_l(kr) are the spherical Bessel functions of the first and second kind respectively. Similarly, h_l^{(1)}(kr) and h_l^{(2)}(kr) are known as the spherical Hankel functions of the first and second kind. Y_l^m(\theta, \phi) is the spherical harmonic of level (or order) l and mode m and is defined as

Y_l^m(\theta, \phi) = \sqrt{\frac{(2l+1)}{4\pi}\, \frac{(l-m)!}{(l+m)!}}\; P_l^m(\cos\theta)\, e^{im\phi} \qquad (2.21)
These expressions, which are the outcome of the derivation of the solution of wave equation 2.15, are obtained by separation of variables in equation 2.18. In [9, page 186], [25, page 380] and [20, page 337] the derivation and solutions are explained quite nicely; for a more detailed analysis of the separation of variables approach used in solving the wave equation please refer to [27]. In equation 2.21, P_l^m(\cos\theta) is the Legendre function of the first kind and i = \sqrt{-1}.
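As a minimal numerical sketch of equation 2.21 (assuming Python with NumPy/SciPy; the values chosen for l, m and the angles are arbitrary test values, not taken from the thesis):

    import numpy as np
    from scipy.special import sph_harm, lpmv, factorial

    # Evaluate Y_l^m from equation (2.21) directly and compare with SciPy's
    # built-in spherical harmonic.
    l, m = 3, 2
    theta, phi = 0.7, 1.2   # polar angle theta, azimuth phi (radians)

    norm = np.sqrt((2*l + 1) / (4*np.pi) * factorial(l - m) / factorial(l + m))
    Y_manual = norm * lpmv(m, l, np.cos(theta)) * np.exp(1j * m * phi)

    # Caution: SciPy's argument order is sph_harm(m, l, azimuth, polar).
    Y_scipy = sph_harm(m, l, phi, theta)

    print(Y_manual, Y_scipy)   # the two values agree

Here lpmv is the associated Legendre function P_l^m of equation 2.25, including the (-1)^m factor, so the two evaluations coincide.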
j_l(x) = \sqrt{\frac{\pi}{2x}}\, J_{l+1/2}(x), \qquad y_l(x) = \sqrt{\frac{\pi}{2x}}\, Y_{l+1/2}(x) \qquad (2.22)

The equations in 2.22 are valid for l \in \mathbb{R}. The spherical Hankel functions of the first and second kind, h_l^{(1)}(x) and h_l^{(2)}(x), are defined as

h_l^{(1)}(x) = j_l(x) + i\,y_l(x), \qquad h_l^{(2)}(x) = j_l(x) - i\,y_l(x) \qquad (2.23)
here x is the argument, in our case kr. It is seen that when x is real, h_l^{(1)}(x) is the conjugate of h_l^{(2)}(x); in our case kr is always real, as it is the product of the wave number and the radius or distance from the origin. h_l^{(1)}(x) \propto e^{ikr} and h_l^{(2)}(x) \propto e^{-ikr} [9]; hence the Hankel function of the first kind represents an outgoing wave whereas the other one represents an incoming wave. These solutions are used depending on the location of the sources; in our case the sources lie outside the measurement sphere (refer to the explanation of the soap bubble in chapter 1), hence we are interested in the incoming wave for the analysis of our spherical microphone array.
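A small sketch of these radial functions (assuming Python with NumPy/SciPy; SciPy provides j_l and y_l directly, and h_l^{(2)} is formed from equation 2.23):

    import numpy as np
    from scipy.special import spherical_jn, spherical_yn

    # Spherical Hankel function of the second kind, h_l^(2) = j_l - i*y_l
    # (equation (2.23)): the incoming-wave solution used for the interior problem.
    def spherical_hn2(l, x, derivative=False):
        return spherical_jn(l, x, derivative) - 1j * spherical_yn(l, x, derivative)

    x = np.array([0.5, 2.0, 8.0])   # sample arguments x = kr
    for l in (0, 3, 6):
        # |h_l^(2)| grows strongly for small x and large l (singularity at x = 0)
        print(l, spherical_jn(l, x), np.abs(spherical_hn2(l, x)))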
Figure 2.3: Spherical Bessel functions of the first kind j_l(x) (left) and the second kind y_l(x) (right) for orders l \in \{0, 3, 6\} [11]
Figure 2.3 shows the behaviour of these functions for different levels (orders) l with respect to the argument x. A few conclusions can be drawn. As seen from the plots, the spherical Bessel functions of the first kind are finite at the origin, but for higher orders, l > 0, there is an initial region where the function remains close to zero, the exception being j_0(x); the functions of the second kind diverge towards negative infinity near the origin. Hence, firstly, we can state the obvious from equation 2.23: the spherical Hankel functions are singular at x = 0. The other consequence, which is important in a later part of our analysis, is the decaying behaviour near the origin for l > x, where for us x is kr, the product of the wave number and the radius, i.e. a measure of the frequency of the acoustic wave. We notice that the spherical wave solution gives a kind of damped response in the low frequency region, and if we use a high value of the level l, also referred to as the transform order, we lose the low frequency information of the acoustic wave; to retrieve it the signal has to be amplified extensively. These conclusions will be recalled when we talk about the interior-exterior problem and the radial filter components or mode strength for the rigid sphere in plane wave decomposition.
The Legendre polynomial P_l(x) is given by

P_l(x) = \frac{1}{2^l l!} \frac{d^l}{dx^l} (x^2 - 1)^l \qquad (2.24)

The functions P_l^m(x), which carry two indices, are known as the associated Legendre functions, where m \ne 0. For positive m,

P_l^m(x) = (-1)^m (1 - x^2)^{m/2} \frac{d^m}{dx^m} P_l(x) \qquad (2.25)

and for negative orders,

P_l^{-m}(x) = (-1)^m \frac{(l-m)!}{(l+m)!}\, P_l^m(x), \quad m > 0 \qquad (2.26)

The property of the Legendre functions which makes them attractive for us is that they form a set of orthogonal functions for each mode m; hence the spherical harmonics are also a set of orthogonal functions. For further details the reader is referred to [9][25]. An analogous symmetry relation holds for the spherical harmonics:

Y_l^{-m}(\theta, \phi) = (-1)^m \left[ Y_l^m(\theta, \phi) \right]^*, \quad m > 0 \qquad (2.27)
where \left[ Y_l^m(\theta, \phi) \right]^* is the complex conjugate of Y_l^m(\theta, \phi). There are 2l + 1 different spherical harmonics for each level l, as -l \le m \le l. A further property of the spherical harmonics is that they are not only orthogonal but also orthonormal [9, page 191]:

\int_{S^2} Y_l^m(\Omega) \left[ Y_{l'}^{m'}(\Omega) \right]^* d\Omega = \delta_{l'l}\, \delta_{m'm} \qquad (2.28)

here \delta_{l'l} is the Kronecker delta, which is 1 for l' = l and 0 otherwise. The surface integral is defined as

\int_{S^2} d\Omega = \int_0^{2\pi} \int_0^{\pi} \sin\theta\, d\theta\, d\phi \qquad (2.29)
As said above, any function on a sphere can be decomposed into a sum of spherical harmonics [9, page 192], [29, page 202]:

f(\Omega) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}(k)\, Y_l^m(\Omega) \qquad (2.30)

This expression can also be termed the inverse spherical Fourier transform (ISFT) [29]. As the spherical harmonic functions are orthonormal, we can obtain the spherical Fourier transform coefficients as

f_{lm}(k) = \int_{S^2} f(\Omega) \left[ Y_l^m(\Omega) \right]^* d\Omega \qquad (2.31)

The derivation of this expression can be found in [29, page 202] and [11], and in appendix A.1. The importance of the expressions presented above is that with their help we obtain the spherical wave decomposition and in turn the plane wave decomposition.
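A brute-force numerical check of the orthonormality relation 2.28 and of the transform pair 2.30/2.31 (a sketch assuming Python with NumPy/SciPy; the dense equiangular grid here is for verification only, not one of the efficient sampling schemes discussed later):

    import numpy as np
    from scipy.special import sph_harm

    nth, nph = 180, 360
    theta = (np.arange(nth) + 0.5) * np.pi / nth
    phi = np.arange(nph) * 2 * np.pi / nph
    TH, PH = np.meshgrid(theta, phi, indexing="ij")
    dA = (np.pi / nth) * (2*np.pi / nph) * np.sin(TH)   # sin(theta) dtheta dphi

    def inner(l, m, lp, mp):
        # discrete version of the surface integral in equation (2.28)
        return np.sum(sph_harm(m, l, PH, TH) * np.conj(sph_harm(mp, lp, PH, TH)) * dA)

    print(inner(2, 1, 2, 1))   # ~1: orthonormal
    print(inner(2, 1, 3, 0))   # ~0: orthogonal

    # SFT (2.31) of the test function f = 3*Y_2^1 recovers the coefficient 3.
    f = 3.0 * sph_harm(1, 2, PH, TH)
    print(np.sum(f * np.conj(sph_harm(1, 2, PH, TH)) * dA))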
The spherical harmonic functions are depicted in figure 2.4 for levels l \in \{0, 1, 2, 3\}. In the expression for the spherical harmonics in equation 2.21, the Legendre function P_l^m represents standing spherical waves in \theta and the factor e^{im\phi} represents traveling spherical waves in \phi [17].
In spherical coordinates the gradient is given by

\nabla = \frac{\partial}{\partial r}\,\vec{e}_r + \frac{1}{r}\frac{\partial}{\partial \theta}\,\vec{e}_\theta + \frac{1}{r \sin\theta}\frac{\partial}{\partial \phi}\,\vec{e}_\phi \qquad (2.32)
Combining this with Euler's equation we obtain the expression for the radial velocity component

w(r, \Omega, k) = \frac{1}{i\rho_0 c k}\, \frac{\partial}{\partial r}\, p(r, \Omega, k) \qquad (2.35)
For the exterior problem, where the sources are enclosed by the measurement surface and the field radiates outwards, the pressure takes the form

p(r, \Omega, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, h_l^{(1)}(kr)\, Y_l^m(\Omega) \qquad (2.36)

Now we focus more rigorously on the interior problem, as this is the more interesting one for our work; all further explanations are given with regard to the interior problem. In the interior problem, the sound sources are located outside the spherical measurement surface, and the pressure inside can be written as
p(r, \Omega, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(k)\, j_l(kr)\, Y_l^m(\Omega) \qquad (2.37)

where p(r, \Omega, k) is the sound pressure at point (r, \Omega), k is the wavenumber, A_{lm}(k) is the coefficient of the spherical harmonic Y_l^m(\Omega) of order l and mode m, and j_l(kr) is the spherical Bessel function of the first kind.
The corresponding radial velocity follows from equation 2.35:

w(r, \Omega, k) = \frac{1}{i c \rho_0} \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(k)\, j_l'(kr)\, Y_l^m(\Omega) \qquad (2.38)

where we have used

\frac{\partial}{\partial r}\, j_l(kr) = k\, j_l'(kr) \qquad (2.39)

As we are using a spherical microphone array, we can describe the pressure at any point on the surface of the array in the same fashion as presented in the interior problem. This will become clearer in the later sections.
Since the pressure on the sphere can be expanded in terms of its spherical harmonics [29], we follow the procedure described in Appendix A.1; applying the same treatment to equation 2.37 we obtain

A_{lm}(k) = \frac{1}{j_l(kr)} \int_{S^2} p(r, \Omega, k) \left[ Y_l^m(\Omega) \right]^* d\Omega \qquad (2.40)
The expression for A_{lm}(k) is also called the spherical wave spectrum, as it can be regarded as the spherical Fourier transform of p(r, \Omega, k) [9], also written as

P_{lm}(r, k) = \frac{1}{j_l(kr)} \int_{S^2} p(r, \Omega, k) \left[ Y_l^m(\Omega) \right]^* d\Omega \qquad (2.41)

P_{lm}(r, k) describes the sound wave in frequency in terms of the wave number, or in k-space.
Figure 2.7: Geometrical description for the calculation of the pressure p(r, \theta, \phi, k) at point P for a source at Q
We consider a point source, also termed a monopole, at the origin O. The pressure p(r, k) at a point P is given by [9, page 198]

p(r, k) = i\, p_0(k)\, c\, k\, Q_s\, \frac{e^{ikr}}{4\pi r} \qquad (2.42)

Here r is the length of the position vector \vec{r} of point P, c is the speed of sound and k is the wave number. Q_s represents the source strength; it is the amount of fluid volume injected into the medium per unit time [9, page 198]. The sound radiation from a monopole is omnidirectional, hence it is independent of the angles \theta and \phi. p_0(k) is the magnitude of the source at the origin.
Now if we want to calculate the pressure field at point P due to a source located at a point Q, this can be done by some geometrical manipulation of equation 2.42. Assume the same monopole to be located at Q, at distance r_s = \lVert \vec{r}_s \rVert from the origin. The pressure at point P due to the source at Q is then equivalent to the pressure at the correspondingly shifted point P' due to the source at the origin O. Therefore the pressure p(r, \Omega, k) at point P for a source at Q is

p(r, \Omega, k) = i\, p_0(k)\, c\, k\, Q_s\, \frac{e^{ik \lVert \vec{r} - \vec{r}_s \rVert}}{4\pi \lVert \vec{r} - \vec{r}_s \rVert} \qquad (2.43)
A plane wave can be written as

p(\vec{r}, k) = p_0(k)\, e^{i \vec{k} \cdot \vec{r}} \qquad (2.44)

where p_0(k) is the magnitude of the plane wave, \vec{r} is the position vector (r, \Omega) and \vec{k} is the wave vector. Assuming p_0(k) = 1 for the purpose of the derivation and using equation 2.44 in 2.37 we get

e^{i \vec{k} \cdot \vec{r}} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(k)\, j_l(kr)\, Y_l^m(\Omega) \qquad (2.45)

Here \vec{k} and \vec{r} are the wave vector and position vector respectively. We would like to point out that the plane wave, described in the vector domain by the wave vector and position vector in equation 2.44, is expressed here in terms of the wave number k and the scalar distance r; more details are given in A.2.
Equation 2.45 can be further transformed as explained in [9, page 227]:

e^{i \vec{k} \cdot \vec{r}} = 4\pi \sum_{l=0}^{\infty} i^l\, j_l(kr) \sum_{m=-l}^{l} Y_l^m(\Omega) \left[ Y_l^m(\Omega_0) \right]^* \qquad (2.46)

here \Omega_0 \equiv (\theta_0, \phi_0) is the incidence direction of the plane wave, whereas \Omega is the point where we want to observe the pressure field. From equations 2.45 and 2.46 we can conclude that

A_{lm} = 4\pi\, i^l \left[ Y_l^m(\Omega_0) \right]^* \qquad (2.47)

and we observe that the spherical wave coefficients A_{lm} of a plane wave sound field do not depend on k, i.e. on the frequency f of the wave.
In [11, page 18] equation 2.46 has been simulated for a plane wave sound field of 1 kHz. The simulation is shown for different maximum values of the level l, and it is deduced that the plane wave field can be approximated exactly only within a bounded region around the origin, and that this region is bigger for higher values of l. If in equation 2.46 we replace \infty in the first summation by a maximum level l = L, we can establish an approximate rule given by

\frac{d}{\lambda} = \frac{L}{2\pi}, \quad \text{i.e. } kd = L \qquad (2.48)

here d is the radius of the region, L is the maximum level l and \lambda is the wavelength of the plane wave. This proportionality states that the region over which we can effectively define the pressure field is proportional to the level L. Reference plots are provided in Appendix A.3.
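The bounded-region behaviour can be reproduced with a short numerical sketch of the truncated expansion 2.46 (assuming Python with NumPy/SciPy; incidence direction and observation point are arbitrary test choices): for a 1 kHz plane wave observed at r = 5 cm, i.e. kr of about 0.9, the truncation error collapses once the maximum level L exceeds kr.

    import numpy as np
    from scipy.special import sph_harm, spherical_jn

    c = 343.0
    k = 2 * np.pi * 1000.0 / c
    theta0, phi0 = np.pi / 2, 0.0          # incidence direction Omega_0
    theta, phi, r = np.pi / 3, 0.4, 0.05   # observation point (r in metres)

    # exact plane wave e^{i k . r} via the angle between k and r
    cosg = (np.sin(theta)*np.cos(phi)*np.sin(theta0)*np.cos(phi0)
            + np.sin(theta)*np.sin(phi)*np.sin(theta0)*np.sin(phi0)
            + np.cos(theta)*np.cos(theta0))
    exact = np.exp(1j * k * r * cosg)

    for L in (1, 2, 4, 8):
        # equation (2.46) truncated at maximum level L
        approx = sum(4*np.pi * 1j**l * spherical_jn(l, k*r)
                     * sph_harm(m, l, phi, theta)
                     * np.conj(sph_harm(m, l, phi0, theta0))
                     for l in range(L + 1) for m in range(-l, l + 1))
        print(L, abs(exact - approx))   # error drops rapidly for L > kr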
The response of a spherical microphone array to the sound field can be written as

s(r, \Omega, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(k)\, b_l(kr)\, Y_l^m(\Omega) \qquad (2.49)

A_{lm}(k) = \frac{1}{b_l(kr)} \int_{S^2} s(r, \Omega, k) \left[ Y_l^m(\Omega) \right]^* d\Omega \qquad (2.50)
here s(r, \Omega, k) is the spherical microphone array response. The term b_l(kr) is called the mode strength; for different microphone array structures the interaction of the sound field with the array is approximated using this term [9][32]. In general we define two types of spherical array structures:

Open sphere configuration
Rigid sphere configuration

In the open sphere configuration a single microphone is mounted on a robotic arm and, according to predefined microphone positions, measurements are taken at the respective positions on the sphere. In the rigid sphere configuration the sensors are arranged on a solid sphere. In appendix A.4 images of the open sphere and rigid sphere configurations are given.
b_l(kr) = \begin{cases} 4\pi i^l\, j_l(kr), & \text{open sphere} \\ 4\pi i^l \left( j_l(kr) - \dfrac{j_l'(ka)}{h_l^{(2)\prime}(ka)}\, h_l^{(2)}(kr) \right), & \text{rigid sphere} \end{cases} \qquad (2.51)

here j_l(kr) is the spherical Bessel function of the first kind, h_l^{(2)}(kr) and h_l^{(2)}(ka) are the spherical Hankel functions of the second kind, (\cdot)' denotes the derivative, and a is the radius of the sphere, where r \ge a.
The rigid sphere configuration is better than the open sphere configuration [31][17][32]. The major disadvantage of the rigid sphere configuration is that it interferes with the surrounding sound field. The mode strength does account for the scattering caused by the rigid sphere when calculating the incident waves. Although the scattering effect is negligible for small spheres, it becomes more prominent when a larger sphere is used. Hence, in the case of a larger sphere, the measurement should be done more carefully, as the scattered waves can act as additional incident waves when they are reflected by other objects in the measurement environment and impinge on the sphere again [31].
In figure 2.8 the mode strength b_l(kr) is plotted as a function of kr for different orders l; in the figure the order l is represented by the letter n.
The major advantage of the rigid sphere configuration is the improved numerical conditioning: in equation 2.50 the spherical coefficient A_{lm} contains the term 1/b_l(kr), and b_l(kr) is zero for some values of kr in the open sphere configuration but not in the case of the rigid sphere [17][31][33].

Figure 2.8: Mode strength for rigid sphere array and open sphere array [31]
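A short sketch of equation 2.51 (assuming Python with NumPy/SciPy) makes the conditioning argument visible: on a dense kr grid the open sphere mode strength passes close to zero while the rigid sphere mode strength stays bounded away from it.

    import numpy as np
    from scipy.special import spherical_jn, spherical_yn

    def hn2(l, x, derivative=False):
        return spherical_jn(l, x, derivative) - 1j * spherical_yn(l, x, derivative)

    def mode_strength(l, kr, rigid=True):
        # equation (2.51), evaluated on the sphere surface (r = a)
        b = spherical_jn(l, kr)
        if rigid:
            b = b - (spherical_jn(l, kr, derivative=True)
                     / hn2(l, kr, derivative=True) * hn2(l, kr))
        return 4 * np.pi * 1j**l * b

    kr = np.linspace(0.05, 20.0, 2000)
    for l in range(4):
        print(l,
              np.abs(mode_strength(l, kr, rigid=False)).min(),   # dips to ~0
              np.abs(mode_strength(l, kr, rigid=True)).min())    # stays > 0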
In practice the continuous aperture is sampled at Q discrete microphone positions, and the spherical coefficients are approximated as

A_{lm}(k) \approx \hat{A}_{lm}(k) = \frac{1}{b_l(kr)} \sum_{q=1}^{Q} w_q\, s(r, \Omega_q, k) \left[ Y_l^m(\Omega_q) \right]^* \qquad (2.52)

where \hat{A}_{lm}(k) is the approximated spherical coefficient, Q is the number of microphone positions and w_q are the quadrature weights. The weights w_q are compensation factors used in the different quadrature schemes so as to approximate the sound field as closely as possible to that of the continuous aperture.
Spherical microphone arrays perform a spatial sampling of the sound pressure defined on a sphere and, similar to time-domain sampling, spatial sampling also requires the field to be band limited, i.e. limited in the harmonic order l, to avoid aliasing [31][34]. Hence, in order to avoid spatial aliasing the following condition must hold [8, page 44]:

A_{lm}(k) = 0, \quad \text{for } l > L_{max} \qquad (2.53)

Here L_{max} is the highest order of the spherical coefficients of the sound field. Condition 2.53 must be ensured when sampling the sphere, otherwise spatial aliasing will
corrupt the coefficients at lower orders. A more detailed analysis of spatial aliasing in spherical microphone arrays is presented in [34].
The sampling of level-limited sound fields (the words level and order are used interchangeably and refer to l) can be done in many different ways, as explained in [35][31][8]. These quadratures allow us to perform sampling on the sphere with negligible or no aliasing as long as equation 2.53 holds.
Commonly there are three sampling schemes; a more detailed mathematical description of these schemes can be found in the references provided above.

1. Chebyshev quadrature: the sampling is characterized by uniform sampling in elevation \theta and azimuth \phi. The total number of microphones in this scheme is Q_{ch} = 2L_{max}(2L_{max} + 1).

2. Gauss-Legendre quadrature: the sphere is sampled uniformly in azimuth \phi, but in elevation it is sampled at the zeros of the Legendre polynomial of level L_{max} + 1. The number of microphone positions required in this scheme is Q_{GL} = L_{max}(2L_{max} + 1).

3. Lebedev grid: in this quadrature scheme the microphone positions are spread over the surface of the sphere such that each point has the same distance to its nearest neighbours,

Q_{Lb} = \frac{4}{3}(L_{max} + 1)^2 \qquad (2.54)
In this work we use the Lebedev grid, as it has the advantage over the other two schemes of using a smaller number of microphone positions for the approximation. A more detailed description of the Lebedev grid is given in [36][37][38][39]; reference [39] gives Fortran code for calculating the grid points and weights for levels up to l = 131.
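For orientation, the counting formulas quoted above can be tabulated with a few lines of Python (a sketch; the functions simply evaluate the formulas given in this section):

    # Microphone positions required by the three sampling schemes for a
    # maximum level L_max, using the formulas quoted above.
    def q_chebyshev(L):       return 2 * L * (2 * L + 1)
    def q_gauss_legendre(L):  return L * (2 * L + 1)
    def q_lebedev(L):         return round(4 / 3 * (L + 1) ** 2)

    for L in (3, 4, 8):
        print(L, q_chebyshev(L), q_gauss_legendre(L), q_lebedev(L))
    # L_max = 3: 42 (Chebyshev), 21 (Gauss-Legendre), ~21 (Lebedev)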
Using the quadrature approach for the discretization of the sphere, we require a level limited sound field in order to obtain aliasing free sampling. For plane wave sound fields, however, the restriction to a maximum level L_{max} does not hold, as we can see from equations 2.45 and 2.46, which involve an infinite number of non-zero spherical coefficients A_{lm}(k); hence some degree of spatial aliasing does occur. But referring to section 2.2.1 we know that the spherical Bessel functions j_l(kr) decay rapidly for kr > l,
so the corresponding coefficients carry little energy. Combining equations 2.49 and 2.50, the spherical Fourier transform coefficients of the array response can be written as

f_{lm}(k) = A_{lm}(k)\, b_l(kr) \qquad (2.55)

where the mode strength b_l(kr) is defined in the previous section. Now, considering a single unit amplitude plane wave arriving from \Omega_0 = (\theta_0, \phi_0), we can get A_{lm}(k) from equation 2.47, and putting this value into equation 2.55 we get

f_{lm}(k) = 4\pi i^l\, b_l(kr) \left[ Y_l^m(\Omega_0) \right]^* \qquad (2.56)

This is the SFT coefficient for a single plane wave; we now generalize it to an infinite number of plane waves with the assumption that they have magnitudes w(\Omega_0, k) and arrive from all directions \Omega_0. Integrating equation 2.56 over all incidence directions, we obtain the spherical Fourier coefficients

f_{lm}(k) = \int_{S^2} 4\pi i^l\, b_l(kr)\, w(\Omega_0, k) \left[ Y_l^m(\Omega_0) \right]^* d\Omega_0 \qquad (2.57)

The expression in equation 2.57 is the spherical Fourier transform of the amplitudes w(\Omega_0, k), and we express it as w_{lm}(k):

w_{lm}(k) = f_{lm}(k)\, \frac{1}{4\pi i^l\, b_l(kr)} \qquad (2.58)
To obtain the amplitude w_s(\Omega_s, k) of a plane wave arriving from any direction \Omega_s, we perform an inverse SFT of equation 2.58:

w_s(\Omega_s, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}(k)\, \frac{1}{4\pi i^l\, b_l(kr)}\, Y_l^m(\Omega_s) \qquad (2.59)

w_s(\Omega_s, k) is also called the directivity function and describes the decomposed plane wave for a particular direction \Omega_s. \Omega_s is also known as the steering direction of the microphone array and gives the direction for which the plane wave decomposition is computed.
Further, if we use equation 2.55 in equation 2.59, we get the expression for the plane wave decomposition in terms of the spherical harmonic coefficients A_{lm}(k):

w_s(\Omega_s, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \frac{1}{4\pi i^l}\, A_{lm}(k)\, Y_l^m(\Omega_s) \qquad (2.60)
It has been shown in [17] that the directivity decreases for lower values of the order l; this directivity pattern has been quantified in [17] and [11, page 39] by the expression

w_s(\Theta) = \frac{L + 1}{4\pi(\cos\Theta - 1)} \left( P_{L+1}(\cos\Theta) - P_L(\cos\Theta) \right) \qquad (2.61)

Here \Theta is the angle between the arrival direction \Omega_0 of the plane wave and the steering direction \Omega_s of the microphone array. P_L(\cdot) is the Legendre polynomial of level L. w_s(\Theta) is the directional weight and defines the spatial resolution of a plane wave decomposition calculated with a maximum level L; refer to figure 2.10.

\Theta_0 = 180^\circ / L is a relation derived in [17] which tells us the extent to which a plane wave decomposition with a particular level L can decompose a wave field into different plane waves in a spatial sense. Figure 2.11 is approximated by the relation \Theta_0 = 180^\circ / L.
Figure 2.11: Half resolution of the PWD [17]
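The resolution rule can be probed numerically (a sketch assuming Python with NumPy/SciPy; taking the drop of the normalized directional weight at \Theta_0 = 180^\circ/L as the quantity of interest):

    import numpy as np
    from scipy.special import eval_legendre

    def w_s(Theta, L):
        # directional weight of equation (2.61)
        x = np.cos(Theta)
        return (L + 1) / (4*np.pi * (x - 1)) * (eval_legendre(L + 1, x) - eval_legendre(L, x))

    grid = np.radians(np.linspace(0.01, 180.0, 36000))
    for L in (3, 6, 12):
        peak = np.abs(w_s(grid, L)).max()
        half = np.radians(180.0 / L)
        # the weight has dropped far below its peak at Theta_0 = 180 deg / L
        print(L, abs(w_s(half, L)) / peak)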
3 ERROR ANALYSIS
(Figure: classification of the measurement errors into microphone noise, positioning error (elevation error and azimuth error) and spatial aliasing.)
The sampled array output including the measurement errors can be written as

s_q = s(r, \Omega_q', k) + e_q \qquad (3.1)

where k is the wave number, r is the radius of the sphere, e_q is the noise introduced by the microphones and \Omega_q' is the microphone position including positioning errors. The spherical harmonic coefficients \hat{A}_{lm}(k) can be calculated using equation 2.52, which is explained in section 2.6. Keeping these equations in mind we obtain

\hat{A}_{lm}(k) = \frac{1}{b_l(kr)} \left( \sum_{q=1}^{Q} w_q\, s(r, \Omega_q', k) \left[ Y_l^m(\Omega_q) \right]^* + \sum_{q=1}^{Q} w_q\, e_q \left[ Y_l^m(\Omega_q) \right]^* \right) \qquad (3.2)
In this equation Q is the number of microphones, w_q are the quadrature weights and b_l(kr) is the mode strength for the rigid sphere configuration (refer to section 2.5). The correct microphone positions as defined by the sampling scheme are denoted by \Omega_q. Now we express the sound field s(r, \Omega_q', k) in terms of the correct spherical harmonic coefficients A_{l'm'}(k) using equation 2.49 from section 2.5 and substitute it into equation 3.2:

\hat{A}_{lm}(k) = \frac{1}{b_l(kr)} \left[ \sum_{l'=0}^{\infty} \sum_{m'=-l'}^{l'} A_{l'm'}(k)\, b_{l'}(kr) \sum_{q=1}^{Q} w_q\, Y_{l'}^{m'}(\Omega_q') \left[ Y_l^m(\Omega_q) \right]^* + \sum_{q=1}^{Q} w_q\, e_q \left[ Y_l^m(\Omega_q) \right]^* \right] \qquad (3.3)

The discrete inner products of the spherical harmonics deviate from the ideal orthonormality relation:

\sum_{q=1}^{Q} w_q\, Y_{l'}^{m'}(\Omega_q') \left[ Y_l^m(\Omega_q) \right]^* = \begin{cases} \delta_{l'l}\, \delta_{m'm} + \epsilon(l, m, l', m'), & l, l' \le L_{max} \\ \epsilon_a(l, m, l', m') + \epsilon(l, m, l', m'), & l \le L_{max} < l' \end{cases} \qquad (3.4)

Here \delta_{l'l} and \delta_{m'm} are Kronecker deltas. The maximum level L_{max} is the highest level of the spherical harmonic coefficients A_{l'm'}(k) of the sound field which is sampled using Q microphone positions; the relation between L and Q for the Lebedev grid is given in section 2.6, equation 2.54. In the first case of equation 3.4 the level l' \le L_{max}, hence no aliasing error appears in that expression. Also, from the Kronecker deltas we see that for \epsilon = 0 the positions \Omega_q' and \Omega_q must be equal; hence \epsilon represents the positioning error. In the second case of equation 3.4, l' > L_{max}, hence spatial aliasing appears; since l and l' are different, the term \delta_{l'l}\,\delta_{m'm} does not appear in this case.
The aliasing error term is

\epsilon_a(l, m, l', m') = \sum_{q=1}^{Q} w_q\, Y_{l'}^{m'}(\Omega_q) \left[ Y_l^m(\Omega_q) \right]^*, \quad l \le L_{max} < l' \qquad (3.5)
The positioning error is obtained by subtracting equation 3.5 from equation 3.4 [31]:

\epsilon(l, m, l', m') = \sum_{q=1}^{Q} w_q \left[ Y_{l'}^{m'}(\Omega_q') - Y_{l'}^{m'}(\Omega_q) \right] \left[ Y_l^m(\Omega_q) \right]^*, \quad l \le L_{max},\; l' \ge 0 \qquad (3.6)
Finally, if we use equation 3.4 in equation 3.3 and separate the summation over l', we get the expression for the spherical harmonic coefficients with all the error contributions [31]:

\hat{A}_{lm}(k) = \underbrace{\frac{1}{b_l(kr)} \sum_{l'=0}^{\infty} \sum_{m'=-l'}^{l'} A_{l'm'}(k)\, b_{l'}(kr)\, \delta_{l'l}\, \delta_{m'm}}_{(s)} + \underbrace{\frac{1}{b_l(kr)} \sum_{l'=0}^{\infty} \sum_{m'=-l'}^{l'} A_{l'm'}(k)\, b_{l'}(kr)\, \epsilon(l, m, l', m')}_{(\epsilon)} + \underbrace{\frac{1}{b_l(kr)} \sum_{l'=L_{max}+1}^{\infty} \sum_{m'=-l'}^{l'} A_{l'm'}(k)\, b_{l'}(kr)\, \epsilon_a(l, m, l', m')}_{(a)} + \underbrace{\frac{1}{b_l(kr)} \sum_{q=1}^{Q} w_q\, e_q \left[ Y_l^m(\Omega_q) \right]^*}_{(e)} \qquad (3.7)
In equation 3.7 the first term is the error free contribution to the spherical harmonic coefficients \hat{A}_{lm}(k); as the Kronecker deltas equal one, this term simplifies to A_{lm}(k). All the other terms represent errors. From the equation itself we see that the errors depend on the level l, on kr and on the quadrature. Although we are using the rigid sphere configuration, the mode strength b_l(kr) has a different expression for each microphone configuration, hence the errors also depend on the array configuration.
Finally, we can obtain the expression for the plane wave decomposition by substituting equation 3.7 into equation 2.60, which gives the expression for the directivity function of the plane wave decomposition. Each term \hat{A}_{lm}^{(\cdot)}(k) of equation 3.7 yields the contribution of that particular error to the directional weights w_s:

w_s^{(\cdot)}(\Omega_s, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \frac{1}{4\pi i^l}\, \hat{A}_{lm}^{(\cdot)}(k)\, Y_l^m(\Omega_s) \qquad (3.8)

where \Omega_s is the steering direction of the spherical microphone array and \hat{A}_{lm}^{(\cdot)}(k) can be any of the four components in equation 3.7: \hat{A}_{lm}^{(s)}(k), \hat{A}_{lm}^{(\epsilon)}(k), \hat{A}_{lm}^{(a)}(k) or \hat{A}_{lm}^{(e)}(k). In order to quantify the effective influence of the measurement errors on the results of the plane wave decomposition, we relate each error contribution of equation 3.7 to the corresponding signal contribution and consider the relative error contribution, given by the ratio of the squared absolute values of the different errors with respect to the signal contribution [31]:
E_a(kr) = \frac{\left| w_s^{(a)}(\Omega_s, k) \right|^2}{\left| w_s^{(s)}(\Omega_s, k) \right|^2}, \qquad E_\epsilon(kr) = \frac{\left| w_s^{(\epsilon)}(\Omega_s, k) \right|^2}{\left| w_s^{(s)}(\Omega_s, k) \right|^2}, \qquad E_e(kr) = \frac{\left| w_s^{(e)}(\Omega_s, k) \right|^2}{\left| w_s^{(s)}(\Omega_s, k) \right|^2} \qquad (3.9)
In equation 3.9, noise-to-signal ratios are calculated. Figure 3.2 shows the behaviour of the different errors (noise, positioning and aliasing) for different levels l.
Comparing the various quadratures with respect to spatial aliasing, microphone noise and positioning error, the Lebedev quadrature is found to be more robust against the errors in general. Due to these characteristics we use the Lebedev grid together with the rigid sphere [31].
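The low-frequency noise amplification discussed below can be illustrated with a minimal Monte-Carlo sketch of the noise term (e) of equation 3.7 (assuming Python with NumPy/SciPy; the equiangular grid, weights and noise level are hypothetical stand-ins, not the Lebedev grid or the thesis parameters):

    import numpy as np
    from scipy.special import sph_harm, spherical_jn, spherical_yn

    rng = np.random.default_rng(0)

    def hn2(l, x, derivative=False):
        return spherical_jn(l, x, derivative) - 1j * spherical_yn(l, x, derivative)

    def b_rigid(l, kr):
        # rigid sphere mode strength of equation (2.51), r = a
        return 4*np.pi * 1j**l * (spherical_jn(l, kr)
               - spherical_jn(l, kr, derivative=True)
                 / hn2(l, kr, derivative=True) * hn2(l, kr))

    nth, nph = 8, 16
    theta = (np.arange(nth) + 0.5) * np.pi / nth
    phi = np.arange(nph) * 2 * np.pi / nph
    TH, PH = np.meshgrid(theta, phi, indexing="ij")
    w = ((np.pi/nth) * (2*np.pi/nph) * np.sin(TH)).ravel()

    for kr in (0.1, 1.0, 5.0):
        for l in (0, 2, 4):
            e = rng.normal(scale=0.01, size=w.size)   # microphone noise e_q
            Y = sph_harm(0, l, PH.ravel(), TH.ravel())
            A_e = np.sum(w * e * np.conj(Y)) / b_rigid(l, kr)   # noise term (e)
            print(kr, l, abs(A_e))   # strong amplification at low kr, high l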
significant effect with regard to the microphone noise, and they all behave in a similar way. But as the noise effect is stronger at low kr, we can say that it limits the array performance at low frequencies.

A_{lm}(k) \approx \hat{A}_{lm}(k) = \frac{1}{b_l(kr)} \sum_{q=1}^{Q} w_q\, s(r, \Omega_q, k) \left[ Y_l^m(\Omega_q) \right]^* \qquad (3.10)
Q is the number of microphone positions and w_q are the weights used for the respective microphone positions. As the number of microphones is finite, our transform order is also limited. For the different quadrature schemes there exists a relationship between the maximum level l which can be used for the calculation of the spherical harmonic coefficients and the number of microphones Q. Section 2.6 describes the various sampling techniques; our work is based on the Lebedev grid, for which equation 2.54 gives

Q_{Lb} = \frac{4}{3}(L_{max} + 1)^2 \qquad (3.11)

Here l = L_{max} is the maximum level used, and L_{max} in turn corresponds to the number of microphone positions. Hence the number of microphones restricts l to L_{max}, or, for a particular L_{max}, the above relation for the Lebedev grid should hold.
Now, the plane wave fields are not level limited, because they are represented by an infinite series of spherical harmonics, shown in equation 2.45 and reproduced here:

e^{i \vec{k} \cdot \vec{r}} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(k)\, j_l(kr)\, Y_l^m(\Omega) \qquad (3.12)

We notice that they contain an infinite number of non-zero coefficients A_{lm}(k); hence aliasing occurs due to the higher orders. We elaborate on this here. First we look at the spherical Bessel functions in figure 2.3; for better readability the figure is reproduced here.

Figure 3.3: Spherical Bessel function of the first kind j_l(x) (left) and the second kind y_l(x) (right) for orders l \in \{0, 3, 6\} (x is the argument in the plots, x = kr) [11]
The spherical Bessel function j_l(kr) decays rapidly for kr > l. In the figure, plots for different levels are depicted, and the spherical Bessel function curves for higher orders become more and more damped. Now, if we look at the spherical harmonic coefficient A_{lm}(k) values for each l, then for kr > l the A_{lm}(k) are significantly small due to the behaviour of the spherical Bessel function, while coefficient values for kr < l will be present.
Although, as l increases, the spherical Bessel function curve settles closer and closer to the x axis, if a full spectrum sound wave is considered then to some extent for every kr < l coefficient values will be present, where l is not level limited. As we are limited by the quadrature, Q_{Lb} = \frac{4}{3}(L_{max} + 1)^2, and because we can only have a limited number of microphone positions, a plane wave field containing coefficients at higher frequencies and higher l would need a larger number of microphone positions to be sampled successfully; this is not possible because Q, the number of microphones, is limited. Hence the coefficients at higher values of l are sampled erroneously, which means spatial aliasing occurs.
As the Bessel functions for higher values of l are very small for values of kr larger than l, but components for kr < l remain, we put a limit on kr, namely kr \le L_{max}, to subdue this effect.
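As a concrete worked instance of this limit (the radius here is a hypothetical example value, not the array used later): for a sphere of radius r = 5 cm and L_{max} = 3, the condition kr \le L_{max} corresponds to an upper frequency of

f \le \frac{L_{max}\, c}{2\pi r} = \frac{3 \cdot 343\ \text{m/s}}{2\pi \cdot 0.05\ \text{m}} \approx 3.3\ \text{kHz}.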
The spherical coefficients of the plane wave given in equation 2.45 are not level limited and should contain higher levels l, but in practice this matters little, as the expression for the plane wave given above contains Bessel functions which have insignificant values for l > kr. Hence levels up to l \approx kr are represented in the spherical harmonic coefficients of the plane wave and the others are insignificant or non-existent. Therefore A_{lm}(k) = 0 for l > L_{max} holds approximately if the condition kr \ll L_{max} is kept.
Considering the expression for the sound field in terms of the spherical harmonic coefficients (section 2.5, equation 2.49) and putting it into equation 3.10, with some rearrangement of terms we get

\hat{A}_{lm}(k) = \frac{1}{b_l(kr)} \sum_{l'=0}^{\infty} \sum_{m'=-l'}^{l'} A_{l'm'}(k)\, b_{l'}(kr) \underbrace{\left[ \sum_{q=1}^{Q} w_q\, Y_{l'}^{m'}(\Omega_q) \left[ Y_l^m(\Omega_q) \right]^* \right]}_{\approx\, \delta_{l'l}\, \delta_{m'm}} \qquad (3.13)
\sum_{q=1}^{Q} w_q\, Y_{l'}^{m'}(\Omega_q) \left[ Y_l^m(\Omega_q) \right]^* = \delta_{l'l}\, \delta_{m'm} + \epsilon_a(l, m, l', m'), \quad l \le L_{max} < l' \qquad (3.14)

This is an approximation, hence we get an additional term \epsilon_a(l, m, l', m') which represents the aliasing error induced by the sampling. This aliasing error is the same term which appears as the first part of the second case of equation 3.4.
As stated in section 3.3, microphone noise limits the performance of the spherical microphone array in the low frequency region, i.e. at low kr; in the case of spatial aliasing the array performance is limited at high frequencies or for larger radii.
4 WAVE FIELD SYNTHESIS

The spherical microphone array data is auralized using wave field synthesis (WFS). In this chapter we explain the basic underlying principles of WFS and connect them to the spherical microphone array analysis done in chapter 3; finally, in chapter 5 we combine the spherical microphone array analysis and WFS and auralize the different error cases for perceptual evaluation.
possible to drop the primary source, because the cumulative contribution of the secondary sources recreates the wave front originally produced by the primary source.
According to Huygens' proposition, every particle P in the medium which encounters a wave propagates it further to a nearby particle P' which is directly aligned with it and with the source A of the wave, all lying in the direct path of the wave. Apart from communicating the wave to particle P', it also induces the wave impact on the other particles. Hence it can be said that around every particle there is a wave of which this particle is the centre of propagation. In this explanation of Huygens' proposition, the particles P correspond to the secondary sources and A is the primary source. This explanation becomes clearer with figure 4.3.
Figure 4.2: Placement of secondary sources in Huygens' principle and for sound field reproduction [12]

and phases for the secondary sources: the frequency domain amplitudes and phases for the position of a given virtual source and a secondary source are transformed to the time domain, where they finally define the filters used for driving the WFS set up. For every virtual source and every loudspeaker a synthesis operator is defined, as shown in figure 4.2.

Figure 4.3: Placement of secondary sources in Huygens' principle and for sound field reproduction [12]

In figure 4.2, WFS_i represents the synthesis operator which handles the audio content from a wave file defining the sound to be reproduced at the prescribed virtual position for loudspeaker i. As an example, in this thesis we generate a 12 channel sound file
(Figure: derivation steps of WFS: starting from the Kirchhoff-Helmholtz integral, elimination of the dipoles via the Neumann Green's function, secondary source selection, and correction of the source mismatch (point sources, synthesis in a plane); from exact sound field synthesis on linear/planar arrays to 2.5-dimensional WFS.)
P_A = \frac{1}{4\pi} \oint_S \left( P\, \frac{\partial G}{\partial n} - G\, \frac{\partial P}{\partial n} \right) dS \qquad (4.1)
Figure 4.5: Derivation of the sound pressure using Green's theorem and the wave equation [13]

where G is called the Green's function, P is the pressure at the surface caused by an arbitrary source distribution outside the enclosure, and n is the inward pointing normal unit vector of the surface.
\nabla^2 G + k^2 G = -4\pi\, \delta(\vec{r} - \vec{r}_A) \qquad (4.2)
The pressure at a point A can be calculated if the wave field of an external source distribution is known at the surface of a source-free volume containing A. The general form of the Green's function is

G = \frac{\exp(-jkr)}{r} + F \qquad (4.3)

Here F may be any function satisfying the wave equation 4.2 with the right hand term set to zero. For the derivation of the Kirchhoff integral F = 0 is chosen, and the space variable r is taken with respect to A:

r = \sqrt{(x - x_A)^2 + (y - y_A)^2 + (z - z_A)^2} \qquad (4.4)
G represents the wave field of a point source at A. The physical interpretation of these choices is that an imaginary point source is placed at A to determine the acoustic wave paths from the surface towards A. With the equation of motion, the normal derivative of the pressure is related to the normal component of the particle velocity V_n:

\frac{\partial P}{\partial n} = -j\omega\rho_0 V_n \qquad (4.5)
Substituting the solution for G and the equation of motion 4.5 into integral 4.1, we finally obtain the Kirchhoff integral for homogeneous media:

P_A = \frac{1}{4\pi} \oint_S \left[ P\, \frac{1 + jkr}{r^2}\, \cos\varphi\, \exp(-jkr) + j\omega\rho_0 V_n\, \frac{\exp(-jkr)}{r} \right] dS \qquad (4.6)
In equation 4.6 the first term represents a dipole source distribution, driven by the pressure P at the surface S, and the second term represents a monopole source distribution, driven by the normal component of the particle velocity V_n at the surface. The pressure at a point A can thus be synthesized by a monopole and dipole source distribution (together called the secondary sources) on a surface S; the strength of the distributions depends on the velocity and pressure of the external sources measured at the surface.
Since A can be anywhere within the volume enclosed by S, the wave field within that volume is completely determined by equation 4.6. The positive lobes of the dipoles interfere constructively with the monopoles inside the surface, while the negative lobes of the dipoles exactly cancel the single positive lobe of the monopoles outside the surface; hence outside S the integral is zero. The complexity of the Kirchhoff equation 4.6 stems from this cancellation of the wave field of the secondary sources outside the surface S; for our application this property is not that important, and we therefore derive two special cases of the Green's function that simplify the Kirchhoff integral, under some conditions:
Fixed surface geometry
54
55
figure above. All sources are located in the half space z < 0, so for any value of the radius R the volume enclosed by S1 and S2 is source-free. The pressure in A is now found by substituting equation 4.3 into integral equation 4.1; S1 is the infinite surface at z = 0.
P_A = \frac{1}{4\pi} \int_{S_1} \left[ P\, \frac{\partial}{\partial n}\!\left( \frac{\exp(-jkr)}{r} + F \right) - \left( \frac{\exp(-jkr)}{r} + F \right) \frac{\partial P}{\partial n} \right] dS \qquad (4.7)
We choose the function F such that it cancels the first term, i.e. the term representing the dipole distribution. In the case of Rayleigh II, F is instead chosen such that the second term cancels, leaving only the first term of the integral, which represents the dipole distribution. To obtain a monopole distribution only, the normal component of the gradient of F must have the opposite sign to the normal component of the gradient of exp(-jkr)/r.
In the case of Rayleigh I we take F = exp(-jkr')/r', the field at A', the image of A mirrored in the plane S1. After solving we finally get the Rayleigh I integral
P_A = \frac{1}{2\pi} \int_{S_1} j\omega\rho_0 V_n\, \frac{\exp(-jkr)}{r}\, dS \qquad (4.8)
This equation states that a monopole distribution in the plane z = 0, driven by twice the strength of the particle velocity component perpendicular to the surface, can synthesize in the half space z > 0 the wave field of a primary source distribution located somewhere in the other half space z < 0. Hence, with the aid of the Rayleigh I integral, the wave field of a primary source distribution can be synthesized by a monopole distribution at z = 0 if the velocity at that plane is known (see figure 4.8). The Rayleigh I solution can be applied for synthesis if the wave field in the half space z < 0 is of no interest, as the monopole distribution radiates a mirror wave field into the half space z < 0; there are no dipoles to cancel this wave field.
Rayleigh II
P_A = \frac{1}{2\pi} \int_{S_1} P\, \frac{1 + jkr}{r^2}\, \cos\varphi\, \exp(-jkr)\, dS \qquad (4.9)
With the Rayleigh II integral, the wave field of a primary source distribution can be synthesized by a dipole distribution at z = 0 if the pressure at that plane is known.
to a line integral. We consider a primary source S in the xz-plane; the pressure field of the primary source is given by
P(r, \omega) = S(\omega)\, G(\theta, \phi, \omega)\, \frac{\exp(-jkr)}{r} \qquad (4.10)
P = \frac{1}{2\pi} \int_{xy\text{-plane}} j\omega\rho_0\, V_n(r, \omega)\, \frac{\exp(-jk\Delta r)}{\Delta r}\, dx\, dy \qquad (4.11)
V_n(r, \omega) is the velocity component of the primary source perpendicular to the xy-plane. The surface integral reduces to a line integral along the x-axis by evaluating the integral in the y-direction along the line m. Applying equation 4.5 to equation 4.10, which gives the pressure field of the source, and then solving the integration with the stationary-phase method of Bleistein [44], we finally get the wave field at the point R.
P_{synth} = S(\omega) \sqrt{\frac{jk}{2\pi}} \int_{-\infty}^{\infty} G(\theta, 0, \omega)\, \cos\theta\, \sqrt{\frac{\Delta r_0}{r_0 + \Delta r_0}}\; \frac{\exp(-jkr_0)}{\sqrt{r_0}}\, \frac{\exp(-jk\Delta r_0)}{\Delta r_0}\, dx \qquad (4.12)
The Rayleigh I integral is evaluated for a primary source S and receiver R, both located in the xz-plane. The vector r points from the primary source to a secondary source M on the line m; the vector \Delta r points from the secondary source to the receiver. All secondary sources M along the line m are approximated by a secondary point source at the intersection of m and the x-axis: the secondary sources along the vertical line m are first approximated, and the result is then integrated along the x-axis with the phase taken into account. The integration thus reduces to one dimension along the x-axis but still maps the whole surface. We define a driving function for a point monopole on the x-axis which approximates the line m.
Q_m(x, \omega) = S(\omega) \sqrt{\frac{jk}{2\pi}} \sqrt{\frac{\Delta r}{r + \Delta r}}\; G(\theta, 0, \omega)\, \cos\theta\; \frac{\exp(-jkr)}{\sqrt{r}} \qquad (4.13)
P_{synth} = \int_{-\infty}^{\infty} Q_m(x, \omega)\, \frac{\exp(-jk\Delta r)}{\Delta r}\, dx \qquad (4.14)
Finally we get the wave field synthesis equation considering the driving function
Qm (x, ) and the stationary phase method.
P_{synth} = S(\omega)\, G(\theta_0, 0, \omega)\, \frac{\exp(-jk(\rho + \Delta\rho))}{\rho + \Delta\rho} \qquad (4.15)
where \rho is the distance from the primary source to the stationary-phase point and \Delta\rho the distance from there to the receiver. The driving function is independent of the receiver position at z = z_0; hence for different receiver points on the line z = z_0 the synthesis still holds.
Writing the driving function explicitly,

Q_m(x, \omega) = S(\omega) \sqrt{\frac{jk}{2\pi}} \sqrt{\frac{\Delta z_0}{z_0 + \Delta z_0}}\; G(\theta, 0, \omega)\, \cos\theta\; \frac{\exp(-jkr)}{\sqrt{r}} \qquad (4.16)
there is a deviation because the Rayleigh operator is reduced from a planar to a linear geometry, which introduces the gain factor \sqrt{\Delta z_0/(z_0 + \Delta z_0)}. The reproduced wave field is therefore correct in phase but deviates slightly in amplitude because of this gain factor; if the distance to the reference line is chosen appropriately, the amplitude error can be kept small over large listening regions.
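To make the structure of this driving function concrete, the following minimal sketch evaluates equation 4.16 for a monopole primary source (S(\omega) = 1, G = 1) behind a linear loudspeaker array. All geometry values, the 17 cm spacing and the variable names are illustrative assumptions, not parameters taken from this thesis.

import numpy as np

# Sketch of the 2.5D WFS driving function of equation 4.16 for a
# monopole primary source; geometry values below are assumptions.
c = 343.0                                     # speed of sound, m/s
f = np.array([250.0, 500.0, 1000.0])          # frequencies to evaluate, Hz
k = 2 * np.pi * f / c                         # wavenumbers

x_spk = np.arange(-2.0, 2.0 + 1e-9, 0.17)     # loudspeaker x positions, m
x_src, z0 = 0.0, 1.5                          # primary source, z0 behind the array
dz0 = 1.0                                     # distance from array to reference line

r = np.hypot(x_spk - x_src, z0)               # primary source -> secondary source
cos_theta = z0 / r                            # cosine of the incidence angle

# Q_m = sqrt(jk/2pi) * sqrt(dz0/(z0+dz0)) * cos(theta) * exp(-jkr)/sqrt(r)
Q = (np.sqrt(1j * k[:, None] / (2 * np.pi))
     * np.sqrt(dz0 / (z0 + dz0))
     * cos_theta[None, :]
     * np.exp(-1j * k[:, None] * r[None, :])
     / np.sqrt(r[None, :]))

print(Q.shape)    # (n_frequencies, n_loudspeakers): one complex weight per pair

Each column of Q is, in effect, the loudspeaker filter of one secondary source: a delay given by the phase term and a gain given by the amplitude terms.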
With some more analysis, the wave field solution of the previous section, which was defined for a straight-line geometry, can be extended to an arbitrary shape of the secondary source line and receiver line. The figure below depicts the scenario.
The pressure at the receiver line is approximated as:
P_R = \int Q_m(r, \omega)\, \frac{\exp(-jk\Delta r)}{\Delta r}\, dl \qquad (4.17)
Here the integration runs along the secondary source line, and the driving function is defined as
Q_m(x, \omega) = S(\omega) \sqrt{\frac{jk}{2\pi}} \sqrt{\frac{\Delta r}{r + \Delta r}}\; \cos\theta\; \frac{\exp(-jkr)}{\sqrt{r}} \qquad (4.18)
Here r is the distance from the primary source to the secondary source, \theta is the angle of incidence of r at the secondary source line, and \Delta r is the distance from the secondary source line to the reference line. The driving function thus allows flexible loudspeaker geometries in the synthesis array. We have only considered the case of monopoles; a similar derivation holds for dipole arrays. There are some other similar formulations of wave field synthesis, e.g. by Spors and Rabenstein [45] [13].

Figure 4.12: Approximation to a line integral for a non-uniform line geometry [13]
dure and that intends to optimize the sound field produced by the loudspeaker array [46] [47].
According to the Kirchhoff-Helmholtz integral, sound recording should be performed by a microphone array. However, the WFS system as defined by Berkhout [48] [41] relies on the concept of notional sources, which consists in substituting close-up microphones for the microphone array. Each primary source is thus picked up by one individual microphone, and the microphone signal is then propagated to the virtual microphone array by applying an amplitude weight and a time delay. Therefore each microphone signal can be identified with one individual primary source and may be considered a virtual substitute for this source, i.e. a notional source.
As a result of the notional-source concept, WFS only deals with the reproduction of virtual sources, i.e. the notional sources, by a loudspeaker array. These sources are described by specifying their position within the virtual sound scene according to the parameterization. They are reproduced as monopoles located outside or even inside the listening room.
It is found that auralization with a 2D WFS system has more advantages than a 2½D WFS system; this conclusion is suggested in [5]. The reasoning is that amplitude errors play a different, more prominent part in the reproduction of room acoustics than in dry source reproduction.
One more important limitation of WFS is the spatial aliasing artifact introduced during spatial sound reproduction. As in the case of microphone arrays, the loudspeaker array in WFS is discretized, so there is always a spatial aliasing limit of a WFS system; frequencies beyond this limit are not reproduced accurately.
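As a rough illustration of the two limits involved, the sketch below computes the common rule-of-thumb aliasing frequency of a discretized linear loudspeaker array, f ≈ c/(2Δx), next to the spherical-array limit kr ≤ L rewritten as f ≤ Lc/(2πr). The spacing Δx is an assumed value (chosen to be consistent with the 1000 Hz WFS limit quoted in the conclusions); r = 15 cm and L = 3 correspond to the simulations of chapter 5.

import numpy as np

c = 343.0                         # speed of sound, m/s

# WFS side: rule-of-thumb aliasing limit of a loudspeaker array with
# spacing dx (dx = 0.17 m is an assumption, not a thesis parameter).
dx = 0.17
print(f"WFS aliasing limit  : {c / (2 * dx):.0f} Hz")

# Spherical array side: kr <= L  =>  f <= L * c / (2 * pi * r).
L, r = 3, 0.15
print(f"Array aliasing limit: {L * c / (2 * np.pi * r):.0f} Hz")  # ~1092 Hz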
5 LISTENING TEST
The work presented above is basically done to understand the effect of the various errors and artifacts which get involved due to processing with spherical microphone arrays. The various factors affecting spherical microphone arrays have been analytically explained in the previous chapters. The extent to which these errors and artifacts seep through the rendering process during auralization and interfere in a corruptive way still needs to be established. The spherical microphone array analysis discussed in this thesis is focused in particular on auralization; hence, in the end, it all comes down to a listening test. It is important to know whether an artifact that exists mathematically also has a perceptual effect, and to what extent.
In chapter 3 the different errors are described. The aim of this chapter is to check and analyse the perceptual effect of those errors when spherical microphone array data is auralized using WFS.
The three factors which induce undesirable artifacts in the spatial sound reproduction were defined in the previous chapters; they are
Microphone noise
Positioning error
Spatial Aliasing
As shown in figure 5.1, the layout of the loudspeakers is in a horizontal plane. The height of this loudspeaker layout from the ground is approximately equal to head level when a test subject is in a sitting position. Hence the horizontal plane contains the listener in the middle, surrounded by the array of loudspeakers.
Arrangement of listening position
The test subjects invited for the listening test were seated roughly in the middle of the room, so that the listener is almost equidistant from all the loudspeaker panels. The following graphic shows where the test subjects were seated for the tests.
[Graphic: listener seated in the middle of the room, surrounded by the loudspeaker panels]
5.3 Auralization
In order to auralize the effect of these errors, the impact of a full-spectrum free-field sound wave on a sampled spherical microphone array is simulated and a plane wave decomposition is done. The direction of propagation of the sound wave is Ω = (azimuth, elevation) = (φ, θ) = (0°, 90°), i.e., the source lies in the horizontal plane with no vertical elevation. The plane wave decomposition is done for 12 directions, including the direction of propagation. The simulation is done for the free-field case. Figure 5.3 shows the 12 different directions for which the PWD of the spherical microphone array data is done. For the simulation the spherical microphone array radius r is taken as 15 cm. Free-field impulse responses for all the different test cases were obtained for the 12 plane-wave-decomposed directions. These responses for the different directions, when convolved with the test audio
[Figure 5.3: The 12 plane wave decomposition directions around the array; the source direction is (0°, 90°), with further directions including (90°, 90°), (180°, 90°) and (270°, 90°)]
For evaluation we first define the questions which we want this listening test to answer.
1. The first question concerns spatial aliasing: even if the microphone array has a sufficient number of measurement positions on the sphere as defined by 2.6, we still face spatial aliasing (refer to section 3.4), because full-spectrum audio is auralized and the limit kr ≤ L is therefore not respected. We want to know the perceptual effect of this spatial aliasing.
We investigate the perceptual effect of changing the transform order l for a fixed number of microphones. For these cases the number of microphones of the spherical array was fixed at Q = 302. The highest transform order auralized and perceptually analysed is L = 6, for which, according to equation 2.54, Q ≥ 66 is sufficient. The parameter Q, i.e. the number of microphones, is 302 because, firstly, it is above the minimum number of microphones required for all conditions of L analysed in the listening test, and secondly, with a minimum of 302 positions there is no noticeable spatial aliasing for our base transform order L = 3.
The number of microphones is taken as 302 because the first test is used to establish the minimum required number of microphones at which no aliasing is perceptually observed. For our base transform order L = 3 at 302 sampling points there was no aliasing, and this fact is also substantiated by the listening test. Plots are provided in the appendix.
2. For microphone noise, we add additive white Gaussian noise to the frequency-domain output of the sampled sphere and then continue with the process of plane wave decomposition (see the sketch after this list). There are two questions for which we attempt to find answers.
For a fixed transform order and a fixed number of microphone positions, what is the level of microphone noise which becomes perceptually significant, and for what value does it remain perceptually insignificant?
The other question is the effect of the transform order l on the microphone noise: we change the transform order for a fixed noise level and see how it impacts the different orders.
3. The last aspect which we check is positioning errors (refer to section 3.5 and equation 3.6). The positioning errors are checked against varying transform orders L = 3, 4, 5, 6.
The positioning error is added to the quadrature positions obtained from the Lebedev grid structure (see the sketch after this list). The position error is an angular offset in azimuth and elevation which is added to Ω = (φ, θ). The positioning error values are normally distributed with a defined standard deviation (SD); that is, we simulate the sound field for normally distributed error values with a particular level of standard deviation. Two types of positioning errors are evaluated separately for their perceptual effect on the sound.
Positioning error in azimuth: the error is added only in azimuth.
Positioning error in elevation: the error is added only in elevation.
For both these errors, a minimum error value which has no perceptual impact compared with the reference is first established for the base transform order L = 3; this same error level is then investigated for the other transform orders.
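The sketch below illustrates how these two error types can be injected in a simulation of the sampled array: white Gaussian noise added to the frequency-domain capsule responses, and normally distributed angular offsets added to the grid positions. Variable names, shapes and the dB reference (a unit-amplitude signal) are assumptions for illustration, not the code used for this thesis.

import numpy as np

rng = np.random.default_rng(0)

def add_mic_noise(p_mic, level_db):
    # p_mic: complex capsule spectra, shape (n_mics, n_freqs).
    # Additive white Gaussian noise at level_db (e.g. -80) relative to a
    # unit-amplitude signal, split evenly over real and imaginary parts.
    sigma = 10.0 ** (level_db / 20.0)
    noise = sigma / np.sqrt(2) * (rng.standard_normal(p_mic.shape)
                                  + 1j * rng.standard_normal(p_mic.shape))
    return p_mic + noise

def perturb_grid(azimuth, elevation, sd, which="azimuth"):
    # Normally distributed angular offset with standard deviation sd,
    # added to either the azimuth or the elevation of every grid point.
    offset = rng.normal(0.0, sd, size=azimuth.shape)
    if which == "azimuth":
        return azimuth + offset, elevation
    return azimuth, elevation + offset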
5.3.2 Processing
The flow diagram 5.4 shows the sequence of processing done for the auralization of spherical microphone array data through the WFS system. The blocks "Positioning error" and "Microphone noise" in figure 5.4 mark where the positioning error and noise are added: the noise is added in the second stage, to the pressure responses of the microphone elements, while the positioning error is added in the first stage, when the discretization is done and the quadratures are calculated.
[Figure 5.4: Processing chain: sampling of the spherical microphone array aperture (positioning error added here) → microphone noise → impulse response → convolution with dry audio (12-channel audio file) → WFS rendering system]
[Table 5.1: MUSHRA scale: Excellent, Good, Fair, Poor, Bad]
Microphone noise
In the perceptual evaluation of microphone noise, two different test cases were structured; refer to table 5.3.
Perceptual analysis of different microphone noise levels: Here we analyse different noise levels for a fixed number of microphones and a fixed transform order. We try to establish the case for which the noise level is perceptually insignificant.
Microphone noise vs. transform order: The noise level obtained in the above test is taken and checked for auralization with different transform orders.
Table 5.3: Test conditions for perceptual evaluation of microphone noise

I. Perceptual analysis of different microphone noise levels
Condition 1: L = 3, Mic = 302, Noise = -80 dB
Condition 2: L = 3, Mic = 302, Noise = -65 dB
Condition 3: L = 3, Mic = 302, Noise = -55 dB
Condition 4: L = 3, Mic = 302, Noise = -40 dB
Condition 5 (anchor): L = 3, Mic = 6, Noise = -40 dB
Reference: L = 3, continuous aperture
Tracks: (g) castanet, (h) speech, (i) music (each presented under every condition)

II. Microphone noise vs. transform order
Condition 1: L = 3, Mic = 302, Noise = -80 dB
Condition 2: L = 4, Mic = 302, Noise = -80 dB
Condition 3: L = 5, Mic = 302, Noise = -80 dB
Condition 4: L = 6, Mic = 302, Noise = -80 dB
Condition 5 (anchor): L = 12, Mic = 302, Noise = -40 dB
Reference: L = 3, continuous aperture
Tracks: (j) castanet, (k) speech, (l) music (each presented under every condition)
Positioning error
For positioning error two cases were formed: one for error in azimuth and the other for error in elevation; refer to table 5.4.
Positioning Error (Elevation) Vs Transform order
Positioning Error (Azimuth) Vs Transform order
Table 5.4: Test conditions for perceptual evaluation of positioning error

I. Positioning error (elevation) vs. transform order
Condition 1: L = 3, Mic = 302, SD = 0.15
Condition 2: L = 4, Mic = 302, SD = 0.15
Condition 3: L = 5, Mic = 302, SD = 0.15
Condition 4: L = 6, Mic = 302, SD = 0.15
Condition 5 (anchor): L = 12, Mic = 302, SD = 10
Reference: L = 3, continuous aperture
Tracks: (m) castanet, (n) speech, (o) music (each presented under every condition)

II. Positioning error (azimuth) vs. transform order
Condition 1: L = 3, Mic = 302, SD = 0.15
Condition 2: L = 4, Mic = 302, SD = 0.15
Condition 3: L = 5, Mic = 302, SD = 0.15
Condition 4: L = 6, Mic = 302, SD = 0.15
Condition 5 (anchor): L = 12, Mic = 302, SD = 10
Reference: L = 3, continuous aperture
Tracks: (p) castanet, (q) speech, (r) music (each presented under every condition)
aware of spatial sound systems; hence all the test subjects were given an introductory
overview of spatial sound systems and had the listening test setup explained to them. This kind of orientation was felt to be important because the listening test and the wave studio lab at first always give the impression of three-dimensional surround sound auralization, whereas in reality our work is focused on the perceptual impact of various kinds of noise and artifacts. Hence introductory information and the main motive of the listening test were briefly explained to the listeners.
5.6 Evaluation
5.6.1 Test subject screening
Out of 21 participants, 14 were able to identify the hidden reference and anchor and
7 did not identify either one or both of them. Hence scores for 14 test subjects were
considered valid and the other 7 were considered as outliers.
\bar{X}_{tc} = \frac{1}{N} \sum_{z} x_{ztc} \qquad (5.1)

S_{tc} = \sqrt{\frac{N \sum_{z} x_{ztc}^2 - \left( \sum_{z} x_{ztc} \right)^2}{N(N-1)}} \qquad (5.2)
where t: track
c: test condition
z: index of the test subject
N : Number of test subjects
5.6.3 Definitions
Statistical significance (p value): The statistical significance of a result is the
probability that the observed relationship (e.g., between variables) or a difference
(e.g., between means) in a sample occurred by pure chance (luck of the draw),
and that in the population from which the sample was drawn, no such relationship
or differences exist. Using less technical terms, one could say that the statistical significance of a result tells us something about the degree to which the result is true (in the sense of being representative of the population). More
technically, the value of the p-value represents a decreasing index of the reliability
of a result (see Brownlee, 1960). The higher the p-value, the less we can believe
that the observed relation between variables in the sample is a reliable indicator
of the relation between the respective variables in the population. Specifically,
the p-value represents the probability of error involved in accepting our observed result as valid, that is, as representative of the population [52].
Confidence interval: The confidence interval gives us information about the reliability of the calculated mean. It is defined as the range in which the mean would lie with a given probability if the test were repeated [].
Calculation of the confidence interval:

\left[ \bar{X}_{tc} - \delta_{tc},\; \bar{X}_{tc} + \delta_{tc} \right], \qquad \delta_{tc} = t_p \frac{S_{tc}}{\sqrt{N}} \qquad (5.3)
The value of t_p is taken from the t-distribution table according to the number of test subjects N.
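A minimal sketch of equations 5.1 to 5.3 with a hypothetical score vector; scipy's t-distribution replaces the t-table lookup.

import numpy as np
from scipy import stats

scores = np.array([78., 85., 69., 90., 74., 81., 88., 72.,
                   79., 83., 76., 86., 70., 80.])   # one score per subject, N = 14
N = len(scores)

mean = scores.sum() / N                                        # equation 5.1
std = np.sqrt((N * np.sum(scores**2) - scores.sum()**2)
              / (N * (N - 1)))                                 # equation 5.2
tp = stats.t.ppf(0.975, df=N - 1)                              # 95%, two-sided
delta = tp * std / np.sqrt(N)                                  # equation 5.3
print(f"{mean:.1f} +/- {delta:.1f}")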
Analysis of variance (ANOVA): The purpose of analysis of variance (ANOVA) is to test for significant differences between means. In ANOVA the statistical significance between means is tested by comparing (i.e., analyzing) the variances. In order to establish that the data obtained for the different conditions show a perceptual difference, we further analyse the measured characteristics [53].
From the 2-way ANOVA analysis we get three p-values (explained above under statistical significance). If a p-value is near zero, the associated null hypothesis is in doubt: a sufficiently small p-value suggests that at least one column sample mean is significantly different from the other column sample means. Interpreted for the test conditions used in our test, a sufficiently small p-value shows that there is some effect due to the conditions imposed by the transform order. The p-value for the test conditions is zero, which shows that the effect of the transform order is significant. Refer to [54] [55] for more description of the 2-way ANOVA.
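The 2-way ANOVA itself was run with MATLAB [55]; an equivalent formulation with Python's statsmodels is sketched below on an assumed miniature data set, returning the three p-values discussed above (condition, item, interaction).

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One row per (subject, item, condition) score; the values are made up.
df = pd.DataFrame({
    "score": [80, 75, 62, 78, 73, 60, 82, 77, 65, 79, 72, 61],
    "item": ["castanet", "speech", "music"] * 4,
    "cond": ["c1"] * 3 + ["c2"] * 3 + ["c1"] * 3 + ["c2"] * 3,
})
model = ols("score ~ C(cond) + C(item) + C(cond):C(item)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # p-values: cond, item, interaction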
In the plots the test conditions are labelled as follows:
condition 1: reference
condition 2: number of microphones = 302
condition 3: number of microphones = 194
condition 4: number of microphones = 86
condition 5: number of microphones = 26
condition 6: anchor
[Figure 5.5: Perceptual scores versus test conditions for castanet, speech and music (spatial aliasing test)]
Figure 5.6: Analysis of positioning error in azimuth for all three test items
Comparing both these figures we see an obvious pattern: in both error cases, increasing the transform order degrades the audio quality. The next interesting part is that the slope, i.e. the extent to which error in elevation corrupts the audio quality, is not the same as in azimuth. In the case of elevation error, all three test items have similar perceptual performance; the overlapping confidence intervals further substantiate this conclusion.
On the other hand, in azimuth we see a different behaviour: for speech and music the trend is the same, as if elevation and azimuth had the same effect on them, but for the castanets the azimuth error does not seem to degrade the
Figure 5.7: Analysis of positioning error in elevation for all three test items
signal in the same way as for the other tracks. The confidence intervals do overlap with each other, but only to a small extent.
Considering the above discussion, we are tempted to investigate this issue further. In order to establish whether the perceptual effects cast by these two error cases are similar, we first check for hidden significance among the different test items in each error case.
In the case of elevation error, looking at the plots closely, the confidence intervals of the different test items (music, speech and castanet) overlap to a relatively high degree; hence we can fairly conclude on the basis of the confidence interval plots that for elevation error, conditions 3, 4 and 5 (which correspond to transform orders 4, 5 and 6) share a high degree of similarity in their corruptive effect on the auralization of the plane wave decomposition.
In the case of azimuth we need more evidence to establish the extent of the impact. For the azimuth error we therefore perform a 2-way ANOVA analysis, in which we compare the effect of the test items and of the different conditions simultaneously.
2-way ANOVA: In our test we assume a confidence level of 95%; hence any p-value bigger than 0.05 is considered high.
The second p-value corresponds to the effect caused by the test items; it is 0.0006, which is also a very small value and hence suggests that the different test items do have an impact on the complete test scenario for azimuth error. The third p-value corresponds to the interaction between test items and test conditions; as we can observe, the p-value for this third category is quite high, so there is no significant interaction.
On performing a 2-way ANOVA on the elevation error we get the following values of statistical significance.
Table 5.7: 2-way ANOVA analysis for elevation error
The statistical significance values in the case of elevation suggest that only the test conditions have an influence on the perceptual scores; the test items do not have any impact.
Finally, we plotted the combined confidence intervals of all test items for both positioning errors, to investigate whether one of the errors has a higher impact on the perceptual scores.
Figure 5.8: Average positioning error in elevation and azimuth for all three test items
[Figure 5.9: Perceptual scores versus test conditions for the different microphone noise levels]
It is observed from the figure that as the degree of noise is increased, the perceptual response also goes down. For all the test items the response to noise is similar. One case stands out with perceptual performance equivalent to the reference: the noise level of -80 dB, which is perceptually indistinguishable from the reference.
Table 5.8: 2-way ANOVA analysis for Noise levels
Table 5.8 gives the values of the 2-way ANOVA test. From the p-values it is evident that the test items had no hidden significant influence; as expected, only the noise levels cast a significant impact on the perceptual evaluation.
In figure 5.10 we compare a noise level of -80 dB against transform orders varying from 3 to 6.
Figure 5.10: Noise vs transform order
The plot shows all the test items and their perceptual degradation when the transform order is increased. The noise is heavily amplified even when the transform order changes from 3 to 4. At a transform order of 3 the test items showed perceptual performance equivalent to the reference. A 2-way ANOVA test further substantiates the significant impact of the transform order on noise.
Table 5.9: 2-way ANOVA analysis for Noise levels vs transform order
The significance values in table 5.9 show the significant effect of the transform order on the perceptual evaluation of the different test items.
6 Conclusions
Spherical microphone arrays were studied and analysed. The purpose of this thesis was to auralize spherical microphone array data with the help of wave field synthesis and then examine various errors and artifacts. Our aim was to simulate these errors in the simulation environment and then design a listening test for a perceptual evaluation, in order to establish whether a given parameter has any perceptual effect in reality, and to what extent.
Three major limitations which affect the performance of spherical microphone arrays are evaluated in this work:
Spatial aliasing
Positioning error
Microphone noise
The impact of a full-spectrum wave on a spherical microphone array is sampled, and a plane wave decomposition for 12 directions is done. We simulated many test cases and analysed them ourselves. After detailed analysis and auralization we designed the listening test.
For all auralization purposes the transform order L = 3 was selected as the base transform order. The WFS system has a spatial aliasing frequency of 1000 Hz. For aliasing-free sampling, not only is a sufficient number of microphone positions required, but the product of wavenumber and sphere radius must also satisfy kr ≤ L; this is one of the conditions which need to be satisfied in spherical microphone arrays, while on the other side the spatial aliasing limit of the WFS system should also not be crossed. The modal filters have the shape of bandpass filters, except for l = 0. Therefore, keeping L = 3 restricts the bandwidth on the spherical microphone array side as well, which is why for L = 3 we do
not see any significant corruption of the signal by spatial aliasing artifacts.
Three different types of audio tracks were used: music, speech and castanet.
The following conclusions were drawn on the basis of the listening test.
1. For the perceptual analysis of each category of error we simulated the free-field data with a fixed degree of errors and artifacts and presented it in the listening test experiment.
2. Spatial aliasing gets magnified as we increase the transform order in spherical microphone array processing.
3. The errors were evaluated for their perceptual identifiability as the transform order is increased. It is found from the test results that microphone noise gets amplified with the increase in transform order.
4. The simulated spherical microphone array was based on the Lebedev grid sampling scheme, and it is noticed that even with far more than the required number of sampling positions, aliasing effects were observed. It is concluded that the number of microphone positions as per the calculation of the Lebedev grid does not necessarily provide an aliasing-free impulse response measurement.
5. The degradation of perceptual quality with increasing transform order is very steep.
6. Positioning errors in azimuth and elevation also get amplified with increasing transform order.
7. The azimuth error is found to be influenced by the audio tracks as well (substantiated by the 2-way ANOVA test).
8. It was observed that in all cases speech is affected badly by all the error categories equally.
As a next step, real room auralization could be implemented and these errors could then be checked again.
Bibliography
[1] D. Malham, "Higher order ambisonic systems for the spatialisation of sound," in Proceedings, ICMC99, Beijing, China: International Computer Music Association, 1999, pp. 484-487.
[2] M. Kolundzija, C. Faller, and M. Vetterli, "Reproducing sound fields using MIMO acoustic channel inversion," J. Audio Eng. Soc., vol. 59, no. 10, pp. 721-734, 2011.
[3] V. Pulkki, "Spatial sound generation and perception by amplitude panning techniques," Ph.D. dissertation, Helsinki University of Technology, Helsinki, Finland, 2001.
[4] F. M. Fazi, "Sound field reproduction," February 2010.
[5] E. Hulsebos, "Auralization using wave field synthesis," Ph.D. dissertation, Delft University of Technology, 2004.
[6] J. Sonke, "Variable acoustics by wave field synthesis," Ph.D. dissertation, Delft University of Technology, 2000.
[7] D. de Vries and E. M. Hulsebos, "Auralization of room acoustics by wave field synthesis based on array measurements of impulse responses," in 12th European Signal Processing Conference (EUSIPCO), 2004.
[8] F. Melchior, "Investigations on spatial sound design based on measured room impulse responses," Ph.D. dissertation, TU Ilmenau, 2011.
[9] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press, 1999.
[10] F. Zotter, "Analysis and synthesis of sound-radiation with spherical arrays," Ph.D. dissertation, University of Music and Performing Arts, 2009.
[11] O. Thiergart, "Sound field analysis on the basis of a spherical microphone array for auralization applications," 2007.
[12] P.-A. Gauthier, A. Berry, and W. Woszczyk, "An introduction to the foundations, the technologies and the potential applications of the acoustic field synthesis for audio spatialization on loudspeaker arrays," in Proceedings of the Harvest Moon Symposium on Multichannel Sound, Montreal, Canada, 2004.
[13] E. Verheijen, "Sound reproduction by wave field synthesis," Ph.D. dissertation, Delft University of Technology, 1997.
[14] R. Boone, "Design and development of a synthetic acoustic antenna for highly directional sound measurements," Ph.D. dissertation, Delft University of Technology, 1987.
[15] J. Meyer and T. Agnello, "Spherical microphone array for spatial sound recording," in Audio Engineering Society Convention 115, Oct 2003.
[16] J. Daniel, S. Moreau, and R. Nicol, "Further investigations of high-order ambisonics and wavefield synthesis for holophonic sound imaging," in Audio Engineering Society Convention 114, Mar 2003.
[17] B. Rafaely, "Plane-wave decomposition of the sound field on a sphere by spherical convolution," Journal of the Acoustical Society of America, vol. 116, no. 4, pp. 2149-2157, 2004.
[18] M. A. Poletti, "Three-dimensional surround sound systems based on spherical harmonics," J. Audio Eng. Soc., vol. 53, no. 11, pp. 1004-1025, 2005.
[19] K. Brandenburg, S. Brix, and T. Sporer, "Wave field synthesis," in 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, May 2009, pp. 1-4.
[20] D. T. Blackstock, Fundamentals of Physical Acoustics. John Wiley, 2000.
[21] S. Spors, "Active listening room compensation for spatial sound reproduction systems," Ph.D. dissertation, University of Erlangen-Nuremberg, 2006.
[22] A. D. Pierce, Acoustics: An Introduction to Its Physical Principles and Applications. Acoustical Society of America, 1991.
[23] P. M. Morse and H. Feshbach, Methods of Theoretical Physics, Part I. New York: McGraw-Hill, 1953.
[24] P. M. Morse and H. Feshbach, Methods of Theoretical Physics, Part II. New York: McGraw-Hill, 1953.
[25] E. Skudrzyk, The Foundations of Acoustics: Basic Mathematics and Basic Acoustics. Springer-Verlag, 1971.
[26] B. Girod, R. Rabenstein, and A. Stenger, Signals and Systems. Wiley, 2001.
[27] J. Feldman, "Solution of the wave equation by separation of variables," lecture notes, January 2007.
[28] R. Collins, Mathematical Methods for Physicists and Engineers, ser. Dover Books on Physics. Dover Publications, 1999.
[29] J. Driscoll and D. Healy, "Computing Fourier transforms and convolutions on the 2-sphere," Advances in Applied Mathematics, vol. 15, no. 2, pp. 202-250, 1994.
[30] T. Abhayapala and D. B. Ward, "Theory and design of high order sound field microphones using spherical microphone array," in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, vol. 2, May 2002, pp. II-1949-II-1952.
[31] B. Rafaely, "Analysis and design of spherical microphone arrays," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 1, pp. 135-143, Jan 2005.
[32] J. Meyer and G. Elko, "A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield," in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, vol. 2, May 2002, pp. II-1781-II-1784.
[33] G. B. Arfken, H.-J. Weber, and F. E. Harris, Mathematical Methods for Physicists. Oxford: Academic, 2012.
[34] B. Rafaely, B. Weiss, and E. Bachmat, "Spatial aliasing in spherical microphone arrays," IEEE Transactions on Signal Processing, vol. 55, no. 3, pp. 1003-1010, March 2007.
[35] G. Del Galdo, "Geometry-based channel modeling for multi-user MIMO systems and applications," Ph.D. dissertation, 2007.
[36] V. Lebedev, "Quadratures on a sphere," USSR Computational Mathematics and Mathematical Physics, vol. 16, no. 2, pp. 10-24, 1976.
[37] V. Lebedev, "A quadrature formula for the sphere of 59th algebraic order of accuracy," Russian Academy of Sciences, Doklady Mathematics (AMS Translation), vol. 50, no. 2, pp. 283-286, 1995.
[38] V. Lebedev and D. Laikov, "A quadrature formula for the sphere of the 131st algebraic order of accuracy," vol. 59, no. 3, pp. 477-481, 1999.
[39] V. Lebedev, "Fortran code for Lebedev grids," Internet resource, 2009.
[40] Z. Li, R. Duraiswami, E. Grassi, and L. Davis, "Flexible layout and optimal cancellation of the orthonormality error for spherical microphone arrays," in Acoustics, Speech, and Signal Processing (ICASSP '04), IEEE International Conference on, vol. 4, May 2004, pp. iv-41-iv-44.
[41] A. J. Berkhout, "A holographic approach to acoustic control," J. Audio Eng. Soc., vol. 36, no. 12, pp. 977-995, 1988.
[42] F. Melchior and S. Spors, "Spatial audio reproduction: from theory to production," tutorial, 129th Convention of the AES, 2010.
[43] E. W. Stuart, "Application of curved arrays in wave field synthesis," in Audio Engineering Society Convention 100, May 1996.
[44] N. Bleistein, Mathematical Methods for Wave Phenomena. Academic Press, July 1984.
[45] "Wave field synthesis techniques for spatial sound reproduction," in Topics in Acoustic Echo and Noise Control, ser. Signals and Communication Technology, E. Hänsler and G. Schmidt, Eds. Springer Berlin Heidelberg, 2006, pp. 517-545.
[46] E. Corteel, U. Horbach, and R. Pellegrini, "Multichannel inverse filtering of multiexciter distributed mode loudspeakers for wave field synthesis," in Audio Engineering Society Convention 112, Apr 2002.
[47] U. Horbach, D. de Vries, and E. Corteel, "Spatial audio reproduction using distributed mode loudspeaker arrays," in Audio Engineering Society Conference: 21st International Conference: Architectural Acoustics and Sound Reinforcement, Jun 2002.
[48] A. J. Berkhout, D. de Vries, and P. Vogel, "Acoustic control by wave field synthesis," The Journal of the Acoustical Society of America, vol. 93, no. 5, pp. 2764-2778, 1993.
[49] A. Avni, J. Ahrens, M. Geier, S. Spors, H. Wierstorf, and B. Rafaely, "Spatial perception of sound fields recorded by spherical microphone arrays with varying spatial resolution," The Journal of the Acoustical Society of America, vol. 133, no. 5, pp. 2711-2721, 2013.
[50] D.-M. et al., "A comparative study of spherical microphone arrays based on subjective assessment of recordings reproduced over different audio systems," in Proceedings of Forum Acusticum 2011, Aalborg, Denmark, Jun 2011, pp. 2227-2230.
[51] ITU-R, "Recommendation BS.1534-1: Method for the subjective assessment of intermediate quality level of coding systems," Tech. Rep., 2003.
[52] G. Enderlein, "Brownlee, K. A.: Statistical theory and methodology in science and engineering," Biometrische Zeitschrift, vol. 3, no. 3, p. 221, 1961.
[53] R. A. Fisher, Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd, 1938.
[54] R. V. Hogg and J. Ledolter, Engineering Statistics. New York: Macmillan, 1987.
[55] MATLAB, "2-way ANOVA," Internet, May 2014.
List of Tables

5.1 MUSHRA scale
5.2 Test conditions for perceptual evaluation of spatial aliasing
5.3 Test conditions for perceptual evaluation of microphone noise
5.4 Test conditions for perceptual evaluation of positioning error
5.5 2-way ANOVA for spatial aliasing
5.6 2-way ANOVA analysis for azimuth error
5.7 2-way ANOVA analysis for elevation error
5.8 2-way ANOVA analysis for noise levels
5.9 2-way ANOVA analysis for noise levels vs transform order
APPENDIX
A Derivations
\int_{S^2} Y_l^m(\Omega)\, Y_{l'}^{m'*}(\Omega)\, d\Omega = \delta_{l'l}\, \delta_{m'm} \qquad (A.1)

where Y_{l'}^{m'*}(\Omega) is the complex conjugate of Y_{l'}^{m'}(\Omega) and \delta_{l'l} is the Kronecker delta, which is defined as

\delta_{l'l} = \begin{cases} 1, & \text{if } l' = l \\ 0, & \text{otherwise} \end{cases} \qquad (A.2)
Any arbitrary function f(\Omega) on a sphere can be expanded in terms of spherical harmonics [29, page 202]:

f(\Omega) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}(k)\, Y_l^m(\Omega) \qquad (A.3)
Here the f_{lm}(k) are complex constants. Equation A.3 is also called the inverse spherical Fourier transform. Exploiting this expression further, multiplying equation A.3 by Y_{l'}^{m'*} and integrating over the unit sphere, we get
\int_{S^2} f(\Omega)\, Y_{l'}^{m'*}(\Omega)\, d\Omega = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}(k) \int_{S^2} Y_l^m(\Omega)\, Y_{l'}^{m'*}(\Omega)\, d\Omega \qquad (A.4)

= \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}(k)\, \delta_{l'l}\, \delta_{m'm} \qquad (A.5)
With the help of equation A.2 we finally obtain the spherical Fourier coefficient f_{lm}(k) as

f_{lm}(k) = \int_{S^2} f(\Omega)\, Y_l^{m*}(\Omega)\, d\Omega \qquad (A.6)
The spherical Fourier transform pair is thus

f_{lm} = \int_{S^2} f(\theta, \phi)\, Y_l^{m*}(\Omega)\, d\Omega = \mathcal{FT}\{f(\theta, \phi)\}

f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}\, Y_l^m(\Omega) = \mathcal{FT}^{-1}\{f_{lm}\} \qquad (A.7)
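As a numerical cross-check of this transform pair, the sketch below projects a test function onto the spherical harmonics by simple quadrature on an angular grid; the grid resolution and the choice of test function are arbitrary.

import numpy as np
from scipy.special import sph_harm

n_az, n_pol = 180, 90
az = np.linspace(0, 2 * np.pi, n_az, endpoint=False)     # azimuth phi
pol = (np.arange(n_pol) + 0.5) * np.pi / n_pol           # colatitude theta
AZ, POL = np.meshgrid(az, pol)
dA = (2 * np.pi / n_az) * (np.pi / n_pol) * np.sin(POL)  # sin(theta) dtheta dphi

f = sph_harm(2, 3, AZ, POL)                              # test function: Y_3^2

def sft_coeff(l, m):
    # equation A.6: integral of f times conj(Y_l^m) over the sphere
    return np.sum(f * np.conj(sph_harm(m, l, AZ, POL)) * dA)

print(abs(sft_coeff(3, 2)))   # ~1: the coefficient of the test function
print(abs(sft_coeff(3, 1)))   # ~0: orthogonality (equation A.1)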
The surface integral over the surface of the sphere is described as

\int_{S^2} d\Omega = \int_0^{2\pi}\!\int_0^{\pi} \sin\theta\, d\theta\, d\phi \qquad (A.8)
Figure A.1: Spherical coordinate system and its relation to Cartesian coordinate system
x = r \sin\theta \cos\phi, \qquad y = r \sin\theta \sin\phi, \qquad z = r \cos\theta

\phi = \arctan\left(\frac{y}{x}\right), \qquad \theta = \arccos\left(\frac{z}{\sqrt{x^2 + y^2 + z^2}}\right), \qquad r = \sqrt{x^2 + y^2 + z^2} \qquad (A.9)

where 0 \le \phi \le 2\pi, 0 \le \theta \le \pi and 0 \le r < \infty.
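A small sketch of these coordinate relations (function names are illustrative; arctan2 is used instead of arctan(y/x) so that the quadrant is resolved correctly):

import numpy as np

def sph_to_cart(r, theta, phi):
    # theta: colatitude measured from the z-axis, phi: azimuth
    return (r * np.sin(theta) * np.cos(phi),
            r * np.sin(theta) * np.sin(phi),
            r * np.cos(theta))

def cart_to_sph(x, y, z):
    r = np.sqrt(x**2 + y**2 + z**2)
    return r, np.arccos(z / r), np.arctan2(y, x)

print(cart_to_sph(*sph_to_cart(1.0, np.pi / 3, np.pi / 4)))  # round trip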
The wave vector \mathbf{k} is defined as

k^2 = k_x^2 + k_y^2 + k_z^2, \qquad k_x = k \sin\theta \cos\phi, \quad k_y = k \sin\theta \sin\phi, \quad k_z = k \cos\theta \qquad (A.10)

where k is the wave number, defined in the direction of propagation of the wave front of the sound field.
In some literature, pressure and other quantities of the sound wave are represented in terms of \omega, and in others in terms of the wave number k; both are equivalent, since k = \omega/c.
Figure A.2: Pressure field of a 1 kHz plane wave for different levels l using equation
2.46. The pressure field is shown for (x, z) plane with y = 0. [11]
Figure A.3: Pressure field of a 1 kHz plane wave for different levels l using equation
2.46. The pressure field is shown for (x, y) plane with z = 0. [11]
Declaration
I hereby certify that this thesis was created autonomously without using other than
the stated references. All parts which are cited directly or indirectly are marked as
such. This thesis has not been used in the same or similar forms in parts or total in
other examinations.
Signature
Theses
1. A complete understanding of spherical microphone array processing was developed in this thesis.
2. In this thesis we simulated and auralized the free-field spherical microphone array impulse responses.
3. The auralization was done with wave field synthesis.
4. Various errors which plague the performance of spherical microphone arrays are analysed.
5. We designed a listening test for the perceptual evaluation of various errors and artifacts.
6. We investigated the effect of different transform orders on the errors and looked for patterns.