

Master's Thesis MEE04:20

Acoustic speech localization with microphone array in real time

Mikael Swartling

Degree project
Master of Science in Electrical Engineering

Blekinge Tekniska Högskola

January 2005

Blekinge Tekniska Högskola

School of Engineering
Department of Signal Processing
Examiner: Nedelko Grbic
Supervisor: Nedelko Grbic

Acoustic speech localization with microphone array in real time
Mikael Swartling
Blekinge Institute of Technology

Abstract
The purpose of this thesis is to evaluate and implement algorithms
for robust localization and tracking of moving acoustic sources in real
time using a microphone array. To identify inter-sensor delays, the
generalized cross correlation is used together with a filter bank. From
the inter-sensor delays, position is estimated using a linear intersection
algorithm. Position estimates are associated with tracks, which are filtered by a Kalman filter. Results from two real-room experiments are
presented to demonstrate the localization and tracking performance,
along with a discussion on real time implementation issues.

Contents
1 Introduction
2 Delay estimation
  2.1 Signal model
  2.2 The generalized cross correlation method
  2.3 Angle of arrival
  2.4 Multiple sensors
  2.5 Optimizing the cross correlation function
3 Filter banks
4 Position estimation
  4.1 Source localization problem
  4.2 Linear intersection
5 Track association and filtering
  5.1 Track association
  5.2 Filtering
6 Experiments
  6.1 Testing the angle of arrival
    6.1.1 Bias and variance
  6.2 Testing the localization and tracking
    6.2.1 Two fixed talkers
    6.2.2 Single moving talker
7 Real time implementation
8 Conclusion and further development

List of Figures
1 Delay due to extra propagation distance.
2 Path of possible source locations.
3 Sensor arrangement and delays when using multiple sensors.
4 Uniform DFT analysis filter bank.
5 Linear intersection.
6 Room with moderate echo.
7 Room with low echo.
8 Bias of estimated angles.
9 Standard deviation of estimated angles.
10 Two speakers having a conversation.
11 Two speakers having a conversation.
12 Single speaker moving in a circle.
13 Single talker moving in a circle.

1 Introduction

An array of microphones can be steered electronically to change its directivity pattern so that it receives sound only from certain directions. This ability can be used to replace directional microphones. The array has the advantage that its directivity pattern can be changed rapidly, which lets it pick up new sources and follow source movements. The array can also be used to search for acoustic sources instead of being steered towards a specific location. In that case, the directivity pattern is formed dynamically to sweep over the surrounding environment.
The problem of locating a source is often split into three parts: inter-sensor delay estimation, position estimation, and track association and filtering. The most important part is a precise and robust algorithm for inter-sensor delay estimation, because the delay estimates form the basis for all further calculations and location estimates. To work in real time, the delay estimator must also be computationally inexpensive. It must process the signals as they are sampled and provide a continuous flow of inter-sensor delay estimates to the location estimator.
All three parts are discussed in this report. Experiments are performed to demonstrate the performance, and real time implementation issues are discussed. Finally, conclusions and possible further developments are given.

2 Delay estimation

2.1 Signal model

Consider two spatially separated sensors (in this thesis, the sensors are microphones). The signal received from an acoustic source at one sensor is shifted in time relative to the other sensor, because of an extra propagation distance from the source to that sensor. Figure 1 illustrates this delay for a source located in the near field and in the far field, respectively. In the near-field case, the direction of arrival is different at the two sensors. In the far-field case, the directions of arrival can be considered parallel and are therefore the same for both sensors.
Figure 1: Delay due to extra propagation distance. (a) Near field source. (b) Far field source.

Assuming the relative attenuation between the two sensors is negligible, the received signals $x_0(t)$ and $x_1(t)$ can be modelled as

$$
\begin{aligned}
x_0(t) &= s(t - \tau_0) + n_0(t) \\
x_1(t) &= s(t - \tau_1) + n_1(t)
\end{aligned}
\tag{1}
$$

In this model, $s(t)$ is the acoustic source signal, $\tau_0$ and $\tau_1$ are the propagation delays from the source to the sensors, and $n_0(t)$ and $n_1(t)$ are noise signals. The noise signals are assumed to be mutually uncorrelated and uncorrelated with the source signal. The relative delay between the sensors, $\tau = \tau_1 - \tau_0$, is the delay caused by the extra propagation distance.
The task is to estimate the delay from finite-size blocks of data from $x_0(t)$ and $x_1(t)$. Tracking a talker, locating new sources and alternating quickly between several sources all require a delay estimate that is available after only a short block of data.

2.2 The generalized cross correlation method

The method used to estimate inter-sensor delays in this thesis is based on the generalized cross correlation method described in [KC76]. The delay is estimated by maximizing the cross correlation between the two signals $x_0(t)$ and $x_1(t)$:

$$ \hat{\tau} = \arg\max_{\tau} R_{x_0 x_1}(\tau) \tag{2} $$

The cross correlation $R_{x_0 x_1}(\tau)$ is related to the cross power spectrum $G_{x_0 x_1}(\omega)$ through the Fourier transform as

$$ R_{x_0 x_1}(\tau) = \int G_{x_0 x_1}(\omega)\, e^{j\omega\tau}\, d\omega \tag{3} $$

The cross power spectrum of $x_0$ and $x_1$ is calculated as

$$ G_{x_0 x_1}(\omega) = X_0^*(\omega)\, X_1(\omega) \tag{4} $$

In (4), $X_0(\omega)$ and $X_1(\omega)$ are the Fourier transforms of $x_0$ and $x_1$, respectively, and $^*$ denotes the complex conjugate. With this ordering, a pure delay $x_1(t) = x_0(t - \tau)$ gives $G_{x_0 x_1}(\omega) = |X_0(\omega)|^2 e^{-j\omega\tau}$, so (3) peaks at the relative delay $\tau = \tau_1 - \tau_0$.

The generalized cross correlation is defined in [KC76] as

$$ R_{x_0 x_1}(\tau) = \int \Psi(\omega)\, G_{x_0 x_1}(\omega)\, e^{j\omega\tau}\, d\omega \tag{5} $$

Here, $\Psi(\omega)$ is a general weighting function. The generalized correlation method known as the phase transform, or PHAT, is obtained by setting the weighting function to

$$ \Psi_{\mathrm{PHAT}}(\omega) = \frac{1}{|G_{x_0 x_1}(\omega)|} \tag{6} $$

This weighting function normalizes the magnitude of every coefficient in the cross spectrum to unity. Only the phase information is then used to calculate the cross correlation.

2.3 Angle of arrival

When the time delay of arrival has been estimated and the array geometry is known, a direction of arrival can also be estimated. From a given delay, a curve can be calculated along which the source is located. Using only two sensors, it is not possible to determine where along this curve the source is located. In two dimensions, the curve is one branch of a hyperbola with the two sensors as foci, as illustrated by the dashed line in figure 2(a). The curve is actually mirrored in the line connecting the two sensors. However, only one half-plane is considered here, because the source is assumed to be located in front of the sensor array.
In the far field, the hyperbolic curve approaches a straight line. Assuming the source is always located in the far field, the curve can be approximated with a straight line, as shown in figure 2(b). The angle $\theta$ is the angle of arrival for a distant source.

Figure 2: Path of possible source locations. A source located in the near field results in a hyperbolic curve of possible source locations (a). In the far field, the hyperbolic path can be approximated by a straight line (b).
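The angle of arrival can be calculated as

$$ \theta = \sin^{-1}\!\left(\frac{c\,\hat{\tau}}{d\, f_s}\right) \tag{7} $$

In (7), $c$ is the speed of sound, $d$ is the distance between the two sensors, $f_s$ is the sample rate and $\hat{\tau}$ is the estimated delay between the two sensors, measured in samples. Linearizing (7) around the true angle gives an estimate of the variance of the estimated angle [ADBS95]:

$$ V[\hat{\theta}] \approx \left(\frac{c}{d\, f_s}\right)^{2} \frac{V[\hat{\tau}]}{\cos^{2}\theta} \tag{8} $$

Because of the factor $1/\cos^2\theta$, the angle estimate becomes increasingly uncertain as the source approaches the line through the sensors ($\theta \to \pm 90^\circ$).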
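The following is a small sketch of (7), of the bound (10) from section 2.5, and of the first-order variance $V[\hat\theta] \approx (c/(d f_s))^2 V[\hat\tau]/\cos^2\theta$. The speed of sound $c = 343$ m/s is an assumed value, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def max_delay_samples(d, fs, c=SPEED_OF_SOUND):
    """Largest possible delay between two sensors d metres apart, in samples."""
    return d * fs / c


def angle_of_arrival(tau, d, fs, c=SPEED_OF_SOUND):
    """Far-field angle of arrival in radians from a delay tau in samples (eq. (7))."""
    arg = c * tau / (d * fs)
    # Noisy estimates can fall slightly outside the physical range; clip to asin's domain.
    return math.asin(max(-1.0, min(1.0, arg)))


def angle_std(tau_std, theta, d, fs, c=SPEED_OF_SOUND):
    """First-order standard deviation of the angle estimate, in radians."""
    return (c / (d * fs)) * tau_std / math.cos(theta)


if __name__ == "__main__":
    d, fs = 0.04, 8000.0   # 4 cm spacing at 8 kHz, as in the experiments of section 6.1
    tau_max = max_delay_samples(d, fs)
    print(f"max delay: {tau_max:.3f} samples")
    for tau in (0.0, 0.5 * tau_max, 0.9 * tau_max):
        theta = angle_of_arrival(tau, d, fs)
        print(f"tau={tau:.3f} -> {math.degrees(theta):5.1f} deg, "
              f"std={math.degrees(angle_std(0.05, theta, d, fs)):.1f} deg")
```

Section 6.1 uses d = 4 cm and f_s = 8 kHz. With those values and the assumed c, the maximum delay is 320/343, about 0.93 samples. The whole range of angles therefore maps to less than one sample of delay, so the delay must be estimated with sub-sample resolution.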

2.4 Multiple sensors

Multiple sensors can be used to increase the accuracy of the delay estimate. Here, the sensors are placed evenly spaced on a line. Assuming a far-field source, the sensor arrangement and the delays relative to the other sensors are as in figure 3.
Steered response power (SRP) algorithms steer a beamformer and search for the maximum output power. The beamformer used here is a delay-and-sum beamformer. It delays the signals from the individual sensors and then sums them to form the beamformer output.

Figure 3: Sensor arrangement and delays when using multiple sensors.
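A generalization of GCC-PHAT is the SRP-PHAT algorithm, defined as

$$ \hat{\tau} = \arg\max_{\tau} \int \sum_{n=0}^{N-2} \sum_{m=n+1}^{N-1} \Psi_{\mathrm{PHAT}}(\omega)\, G_{x_n x_m}(\omega)\, e^{j\omega (m-n)\tau}\, d\omega \tag{9} $$

In (9), $\Psi_{\mathrm{PHAT}}(\omega)$ is the weighting function defined in (6), evaluated for each sensor pair, and $G_{x_n x_m}(\omega)$ is the cross power spectrum of sensors $n$ and $m$. The variable $\tau$ is the delay between adjacent sensors, so the delay between sensors $n$ and $m$ is $(m-n)\tau$.

The SRP-PHAT algorithm maximizes the sum of the cross correlations over all combinations of sensor pairs in the array. As the number of sensors increases, the variance of the estimate decreases. For $N$ sensors, the summed cross correlation covers $\binom{N}{2} = N(N-1)/2$ sensor pairs.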
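A minimal sketch of a discrete-frequency version of the objective in (9) follows. It makes these assumptions:

- Each sensor contributes one block, and delays are circular.
- A naive DFT replaces the filter bank used in the thesis.
- The integral becomes a sum over DFT bins using signed bin frequencies.
- The real part of the sum is kept. For real signals the imaginary parts cancel, apart from the Nyquist bin.

The names are hypothetical. The thesis searches (9) with the Golden section search over the interval (10) rather than on a grid.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(K^2) DFT, sufficient for a short illustrative block."""
    K = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / K) for t in range(K))
            for k in range(K)]


def phat_pairs(blocks, eps=1e-12):
    """PHAT-weighted cross spectra X_n^*(w) X_m(w) / |X_n^*(w) X_m(w)| for all n < m."""
    spectra = [dft(b) for b in blocks]
    pairs = {}
    for n in range(len(blocks) - 1):
        for m in range(n + 1, len(blocks)):
            g = [a.conjugate() * b for a, b in zip(spectra[n], spectra[m])]
            pairs[(n, m)] = [gk / max(abs(gk), eps) for gk in g]
    return pairs


def srp_phat(pairs, tau, K):
    """Discrete-frequency objective of (9) at adjacent-sensor delay tau (samples)."""
    total = 0.0
    for (n, m), g in pairs.items():
        for k, gk in enumerate(g):
            w = 2 * math.pi * (k if k <= K // 2 else k - K) / K   # signed bin frequency
            total += (gk * cmath.exp(1j * w * (m - n) * tau)).real
    return total


if __name__ == "__main__":
    random.seed(2)
    K, N = 64, 4
    s = [random.gauss(0.0, 1.0) for _ in range(K)]
    # Uniform linear array: sensor m receives s delayed by m samples (adjacent delay 1).
    blocks = [[s[(t - m) % K] for t in range(K)] for m in range(N)]
    pairs = phat_pairs(blocks)
    taus = [i / 100 for i in range(-150, 151)]
    print(max(taus, key=lambda t: srp_phat(pairs, t, K)))
```

Each pair contributes its own PHAT-weighted correlation. With four sensors the objective therefore sums six terms, one per pair, as the $\binom{N}{2}$ count above states.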

2.5 Optimizing the cross correlation function

The optimization problem in (9) generally lacks a closed-form solution, so a numerical search method is used: the Golden section search, described in [LRV01]. The Golden section search is a one-dimensional method that searches for a maximum (or a minimum, when minimizing a function) between two end points.
The first step before optimizing is to determine the interval over which to optimize. The relative delay between two sensors in the array can never be larger than the delay caused by the distance between the two sensors. The largest relative delay occurs when the source is located on the line connecting the two sensors. Therefore, in (9), it is known that

$$ -\frac{d f_s}{c} \le \tau \le \frac{d f_s}{c} \tag{10} $$

This is also the interval of $\tau$ for which (7) is defined, since the domain of $\sin^{-1}$ is $[-1, 1]$.
Assume the search interval at iteration $i$ is $[\alpha_i, \beta_i]$, where $\alpha_i < \beta_i$. Two new points, $l_i$ and $r_i$, are chosen such that $\alpha_i < l_i < r_i < \beta_i$. The search interval is then updated depending on the function values at $l_i$ and $r_i$. If $f(l_i) > f(r_i)$, the new search interval is $[\alpha_{i+1}, \beta_{i+1}] = [\alpha_i, r_i]$; otherwise it is $[\alpha_{i+1}, \beta_{i+1}] = [l_i, \beta_i]$.
The ratio between all points is kept constant in every iteration. The inner points $l_i$ and $r_i$ can then be reused in the next iteration, both as an end point of the new search interval and as one of the new inner points. Therefore, only a single new point and its function value must be calculated in each iteration. The ratio between the points can be expressed as

$$ \frac{r_i - \alpha_i}{\beta_i - \alpha_i} = \frac{l_i - \alpha_i}{r_i - \alpha_i} \tag{11} $$

This ratio follows from the Golden ratio $\varphi = (1 + \sqrt{5})/2$, hence the name of the algorithm. The step length $\rho$ used to place the inner points is

$$ \rho = \frac{3 - \sqrt{5}}{2} = 2 - \varphi \approx 0.3820 \tag{12} $$

The inner points are $l_i = \alpha_i + \rho(\beta_i - \alpha_i)$ and $r_i = l_i + \rho(\beta_i - l_i)$. Both sides of (11) then equal $1 - \rho = 1/\varphi \approx 0.618$.
The Golden section search is shown in Algorithm 1. The parameters of the algorithm are the search interval $[\alpha, \beta]$ and the tolerance $\varepsilon$. The algorithm returns the value $\hat{\tau}$ that maximizes the function $f(\tau)$ over the search interval, with a tolerance of $\varepsilon$.
For the Golden section search to work, the function being optimized must be unimodal: it must have one, and only one, maximum in the interval being optimized. In general, the cross correlation is not unimodal. However, investigations of the cross correlation for real recordings have shown that, under the circumstances in this thesis, it can in practice be considered unimodal often enough for the Golden section search to be an option. Sometimes the optimization returns a local maximum instead of the global maximum in the specified range, but not often enough to notably affect the general performance.

Algorithm 1 The Golden section search algorithm.

Require: α < β and ε > 0
Ensure: τ̂ = arg max f(τ)
l ← α + ρ(β − α)
r ← l + ρ(β − l)
f_l ← f(l)
f_r ← f(r)
while β − α > ε do
    if f_l < f_r then
        α ← l
        l ← r
        r ← l + ρ(β − l)
        f_l ← f_r
        f_r ← f(r)
    else
        β ← r
        r ← l
        l ← α + ρ(β − α)
        f_r ← f_l
        f_l ← f(l)
    end if
end while
if f_l > f_r then
    τ̂ ← l
else
    τ̂ ← r
end if
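Algorithm 1 maps almost line by line onto the Python sketch below. The function name is hypothetical. Each loop iteration evaluates f exactly once, which is the property that (11) and (12) are designed to give. In the setting of section 2.5, f would be the objective in (9) searched over the interval (10). The usage below maximizes a cosine instead.

```python
import math

RHO = (3 - math.sqrt(5)) / 2   # eq. (12), about 0.3820


def golden_section_max(f, alpha, beta, eps):
    """Algorithm 1: maximize a unimodal f on [alpha, beta] to within eps."""
    if not (alpha < beta and eps > 0):
        raise ValueError("require alpha < beta and eps > 0")
    l = alpha + RHO * (beta - alpha)
    r = l + RHO * (beta - l)
    fl, fr = f(l), f(r)
    while beta - alpha > eps:
        if fl < fr:
            # Maximum lies in [l, beta]: old r becomes the new l, one new point on the right.
            alpha, l, fl = l, r, fr
            r = l + RHO * (beta - l)
            fr = f(r)
        else:
            # Maximum lies in [alpha, r]: old l becomes the new r, one new point on the left.
            beta, r, fr = r, l, fl
            l = alpha + RHO * (beta - alpha)
            fl = f(l)
    return l if fl > fr else r


if __name__ == "__main__":
    print(golden_section_max(math.cos, -1.0, 2.0, 1e-6))
```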

Figure 4: Uniform DFT analysis filter bank. (Polyphase structure: input X(z), delay chain z^-1, polyphase filters H_0(z) to H_{N-1}(z), an IFFT, and subband outputs X_0(z) to X_{N-1}(z).)

3 Filter banks

The generalized cross correlation, described in section 2.2, estimates the inter-sensor delays using the cross power spectrum, which is calculated as in (4). Instead of calculating the discrete Fourier transform of the signals $x_0$ and $x_1$ directly, a uniform DFT analysis filter bank is used.
The filter bank decomposes the signal $x(n)$ into a set of $N$ subbands. It consists of a set of bandpass filters derived from a single prototype filter. The prototype is a lowpass filter, and shifting its frequency response in frequency turns it into a bandpass filter. One bandpass filter is created for each of the $N$ subbands, with center frequency $2\pi n/N$ for the $n$:th subband, $n = 0, \dots, N-1$. After filtering, the subband signals are decimated. An efficient polyphase implementation is possible when the decimation factor equals the number of subbands, $N$, or divides it evenly, as shown in figure 4.
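A minimal sketch of a uniform DFT analysis filter bank in the polyphase form described above follows. These choices are assumptions and do not come from the thesis:

- A periodic Hann window of length 4N serves as the lowpass prototype.
- Each frame is computed with a naive N-point inverse DFT instead of an FFT.
- The function names are hypothetical.

Folding the windowed input history by index modulo N before the N-point inverse DFT gives the same result as filtering with each modulated bandpass filter $h(l)e^{j2\pi nl/N}$ and keeping every D:th output sample.

```python
import cmath
import math


def hann_prototype(length):
    """Periodic Hann window, used here as a stand-in lowpass prototype (assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * l / length) for l in range(length)]


def filterbank_frame(history, h, n_subbands):
    """One output sample of all N subbands.

    history[l] holds x(t - l) (newest sample first) and has the same length as h.
    Subband n equals the output of the bandpass filter h[l] * exp(j*2*pi*n*l/N).
    """
    N = n_subbands
    # Polyphase step: sum the windowed taps that share the same index modulo N.
    folded = [0j] * N
    for l, (hl, xl) in enumerate(zip(h, history)):
        folded[l % N] += hl * xl
    # Unscaled length-N inverse DFT shifts the prototype to centre frequency 2*pi*n/N.
    return [sum(folded[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N))
            for n in range(N)]


def analyze(x, h, n_subbands, decimation):
    """Run the analysis bank over x, producing one frame every `decimation` samples."""
    if len(h) % n_subbands or n_subbands % decimation:
        raise ValueError("need len(h) divisible by N and N divisible by the decimation")
    frames = []
    for t in range(len(h) - 1, len(x), decimation):
        history = [x[t - l] for l in range(len(h))]
        frames.append(filterbank_frame(history, h, n_subbands))
    return frames


if __name__ == "__main__":
    N = 8
    h = hann_prototype(4 * N)
    x = [cmath.exp(2j * math.pi * 3 * t / N) for t in range(128)]  # tone at subband 3
    for frame in analyze(x, h, N, N):                                # critically sampled
        print([round(abs(v), 3) for v in frame])
```

The subband samples of two sensors could then take the place of $X_0(\omega)$ and $X_1(\omega)$ when forming the cross spectrum in (4).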

4 Position estimation

4.1 Source localization problem

Consider a set of $N$ pairs of sensors $\{m_{i0}, m_{i1}\}$, $i = 0, \dots, N-1$. Given the positions of the two sensors in a pair, $m_{i0}$ and $m_{i1}$, and the position of the source, $s$, the time delay between the two sensors is

$$ T(\{m_{i0}, m_{i1}\}, s) = \frac{|s - m_{i0}| - |s - m_{i1}|}{c} \tag{13} $$

In (13), $c$ is the speed of sound. For each pair, there is an estimated time delay $\hat{\tau}_i$ between the two sensors and an estimated variance $\sigma_i^2$. Assume the delay estimates $\hat{\tau}_i$ are corrupted by uncorrelated, zero-mean Gaussian noise. The maximum likelihood estimate of the source location, $\hat{s}_{ML}$, is then found by minimizing a least-squares error function $J_{ML}(s)$ [BAS97]:

$$ \hat{s}_{ML} = \arg\min_{s} J_{ML}(s) \tag{14} $$

The error function is

$$ J_{ML}(s) = \sum_{i=0}^{N-1} \frac{1}{\sigma_i^2} \left[ \hat{\tau}_i - T(\{m_{i0}, m_{i1}\}, s) \right]^2 \tag{15} $$

4.2 Linear intersection

Minimizing the error function in (14) involves searching for the position $s$ whose theoretical delays match the measured delays as closely as possible. Instead of using a numerical search method to find the location of the source, a computationally less expensive closed-form solution is used. The algorithm is based on the linear intersection algorithm described in [BAS97], modified from three- to two-dimensional intersections.
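For comparison with the closed-form approach, the numerical search that (14) would otherwise require can be sketched as a brute-force grid search. This is an illustration only. The grid, the assumed speed of sound (343 m/s), the sensor layout in the usage and all names are hypothetical. The delay follows the sign convention of (13), and delays are in seconds. The cost grows with the number of grid points times the number of sensor pairs, which is the expense the linear intersection avoids.

```python
import math

C_SOUND = 343.0  # m/s, assumed speed of sound


def tdoa(m0, m1, s, c=C_SOUND):
    """Theoretical delay (13) for the sensor pair (m0, m1) and source position s."""
    return (math.dist(s, m0) - math.dist(s, m1)) / c


def j_ml(s, pairs, taus, variances, c=C_SOUND):
    """Weighted least-squares error (15)."""
    return sum((tau - tdoa(m0, m1, s, c)) ** 2 / var
               for (m0, m1), tau, var in zip(pairs, taus, variances))


def grid_search(pairs, taus, variances, xs, ys):
    """Brute-force version of (14) over a rectangular grid of candidate positions."""
    return min(((x, y) for x in xs for y in ys),
               key=lambda s: j_ml(s, pairs, taus, variances))


if __name__ == "__main__":
    pairs = [((0.0, 0.0), (0.2, 0.0)), ((0.65, 0.0), (0.85, 0.0)),
             ((1.3, 0.0), (1.5, 0.0))]
    source = (1.0, 2.0)
    taus = [tdoa(m0, m1, source) for m0, m1 in pairs]     # noise-free "measurements"
    xs = [i / 20 for i in range(0, 61)]                     # 0 to 3 m in 5 cm steps
    ys = [i / 20 for i in range(1, 61)]                     # in front of the arrays
    print(grid_search(pairs, taus, [1e-9] * len(pairs), xs, ys))
```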
Once the direction of arrival has been calculated for each sensor pair, the intersection of all estimated directions of arrival can be calculated from them and the sensor positions. Let $m_i$ be the position of sensor pair $i$ and $v_i$ its direction of arrival. Any point $p_i$ on the line originating from the array location in the direction $v_i$ can be described as

$$ p_i = m_i + t_i v_i \tag{16} $$

Here, $t_i > 0$, as shown in figure 5. $p_i$ also describes all possible locations of the source as seen from the sensor pair. Using two pairs, $\{m_{i0}, m_{i1}\}$ and $\{m_{j0}, m_{j1}\}$, the source location can be found by calculating the intersection of the lines $p_i$ and $p_j$:

$$ p_i = p_j \;\Leftrightarrow\; m_i + t_i v_i = m_j + t_j v_j \;\Leftrightarrow\; t_i v_i - t_j v_j = m_j - m_i \tag{17} $$

In matrix form, the equation becomes

$$ V t = m \tag{18} $$

The matrix $V$ and the vector $t$ are

$$ V = \begin{bmatrix} v_i & -v_j \end{bmatrix}, \qquad t = \begin{bmatrix} t_i \\ t_j \end{bmatrix} \tag{19} $$

The vector $m$ is

$$ m = m_j - m_i \tag{20} $$

Solving for $t$ gives

$$ t = V^{-1} m \tag{21} $$

The intersection point can then be calculated as

$$ s_{ij,LI} = m_i + t_i v_i = m_j + t_j v_j \tag{22} $$

Suppose $N > 2$ sensor pairs are used or, more generally, $N$ sensor subarrays when each pair is extended with more sensors for increased accuracy. Then $\binom{N}{2}$ possible intersections can be calculated, one for each combination of two subarrays. Assuming there are at least two subarrays, the final position can be estimated as the average of these intersections:

$$ \hat{s}_{LI} = \frac{1}{\binom{N}{2}} \sum_{i=0}^{N-2} \sum_{j=i+1}^{N-1} s_{ij,LI} \tag{23} $$

No information is available about the propagation delay from the source to a sensor subarray, or between subarrays. Problems therefore arise when the source is located near the line connecting two subarrays, or far away from the subarrays compared to the distance between them. In those cases, the direction-of-arrival vectors are almost parallel, and the matrix $V$ in (21) is badly conditioned or even non-invertible.
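A minimal 2-D sketch of (17)-(23) follows. It goes beyond the text in two ways, both assumptions. First, a pair is skipped when $|\det V|$ falls below a threshold (nearly parallel directions) or when the intersection violates $t > 0$. Second, the average is taken over the remaining pairs instead of all $\binom{N}{2}$. The threshold value, the helper `unit_towards` and the other names are hypothetical.

```python
import itertools
import math


def intersect(mi, vi, mj, vj, min_det=1e-3):
    """Solve t_i v_i - t_j v_j = m_j - m_i (eqs. (17)-(21)) and return (22).

    Returns None when V = [v_i, -v_j] is close to singular or when the
    intersection lies behind one of the arrays (t_i <= 0 or t_j <= 0).
    """
    a, b = vi[0], -vj[0]
    c, d = vi[1], -vj[1]
    det = a * d - b * c           # for unit vectors: sine of the angle between them
    if abs(det) < min_det:
        return None
    mx, my = mj[0] - mi[0], mj[1] - mi[1]
    ti = (d * mx - b * my) / det  # first row of V^{-1} m
    tj = (a * my - c * mx) / det  # second row of V^{-1} m
    if ti <= 0 or tj <= 0:
        return None
    return (mi[0] + ti * vi[0], mi[1] + ti * vi[1])


def linear_intersection(centers, directions, min_det=1e-3):
    """Average of the usable pairwise intersections, in the spirit of eq. (23)."""
    points = []
    for (mi, vi), (mj, vj) in itertools.combinations(zip(centers, directions), 2):
        p = intersect(mi, vi, mj, vj, min_det)
        if p is not None:
            points.append(p)
    if not points:
        return None
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def unit_towards(m, s):
    """Hypothetical helper: ideal unit DOA vector from array centre m towards source s."""
    dx, dy = s[0] - m[0], s[1] - m[1]
    r = math.hypot(dx, dy)
    return (dx / r, dy / r)


if __name__ == "__main__":
    centers = [(-0.75, 0.0), (0.75, 0.0), (0.0, -0.5)]
    source = (0.3, 1.7)
    print(linear_intersection(centers, [unit_towards(m, source) for m in centers]))
```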

5 Track association and filtering

This section describes the algorithm used for tracking sources from individual position estimates. Section 4 describes an algorithm that estimates the position of the source from the time delays between sensors in several sensor subarrays. The algorithm gives a set of points sampled at a certain time interval. The position estimates are distorted by noise and need to be filtered spatially.

Figure 5: Linear intersection.

5.1 Track association

When multiple sources are being located (for example, two or more talkers having a conversation), simply filtering the samples as they are calculated is not an option. An algorithm that determines which source each sample belongs to must be implemented, and only then can the samples be filtered properly. The track association algorithm is based on a method described in [SBS97].
A track is a state vector following a source. When a new sample is calculated, it is first associated with one of the currently stored tracks. It goes to the nearest track, provided that this track lies within a certain distance of the sample.
If no track is close enough to be associated, a new track is created. An association can fail for two main reasons: the sample belongs to a completely new source, or noise distorted the sample so much that it fell outside the acceptance region of the correct source. When a new track is created, it is not yet known whether the sample comes from a newly active source or is a noise-corrupted sample from a current track. Therefore, all new tracks are marked as potential tracks. A potential track may receive no new samples within its acceptance region within a certain time. It is then assumed to have been created from a noise-corrupted sample, and it is dropped. If, however, more samples fall within the acceptance region, the track is assumed to follow an active source and is promoted to an active track.
A track associated with a sample is updated. The sample is added to the list of samples for that track and is eventually filtered to smooth the path formed by the samples.

A track that is not updated with new samples within a certain time is considered abandoned and is dropped from the list of potential or active tracks. A completed track is an active track that was dropped. Potential tracks that were never promoted to active tracks do not count as completed tracks when they are dropped, because they were never classified as real sources.
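A minimal sketch of the association logic described above follows. It makes these assumptions, none of which are taken from the thesis:

- The acceptance region is a fixed circle with a set gate radius.
- A potential track is promoted after a fixed number of samples.
- A track is dropped after a fixed timeout.
- The latest sample serves as the track position, in place of the Kalman filter of section 5.2.

The parameter values and class names are also hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Track:
    x: float
    y: float
    last_time: float
    hits: int = 1
    active: bool = False                          # False means "potential track"
    samples: list = field(default_factory=list)


class TrackAssociator:
    """Nearest-track gating with potential and active tracks (hypothetical parameters)."""

    def __init__(self, gate=0.3, promote_after=3, timeout=1.0):
        self.gate = gate                    # acceptance radius in metres
        self.promote_after = promote_after  # samples needed to promote a potential track
        self.timeout = timeout              # seconds without samples before dropping
        self.tracks = []
        self.completed = []                 # dropped tracks that had been active

    def _drop_stale(self, t):
        alive = []
        for tr in self.tracks:
            if t - tr.last_time <= self.timeout:
                alive.append(tr)
            elif tr.active:
                self.completed.append(tr)   # dropped potential tracks are simply discarded
        self.tracks = alive

    def add_sample(self, t, x, y):
        self._drop_stale(t)
        nearest, best = None, self.gate
        for tr in self.tracks:
            dist = math.hypot(x - tr.x, y - tr.y)
            if dist <= best:
                nearest, best = tr, dist
        if nearest is None:                 # nothing inside the gate: new potential track
            nearest = Track(x, y, t, samples=[(t, x, y)])
            self.tracks.append(nearest)
            return nearest
        nearest.samples.append((t, x, y))
        # Placeholder update: a Kalman filter (section 5.2) would smooth this position.
        nearest.x, nearest.y, nearest.last_time = x, y, t
        nearest.hits += 1
        if nearest.hits >= self.promote_after:
            nearest.active = True
        return nearest


if __name__ == "__main__":
    assoc = TrackAssociator()
    for t, x, y in [(0.0, 0.0, 1.7), (0.1, 1.5, 1.7), (0.2, 5.0, 5.0),
                    (0.3, 0.02, 1.69), (0.4, 1.49, 1.72), (0.5, 0.01, 1.71)]:
        tr = assoc.add_sample(t, x, y)
        print(t, "active" if tr.active else "potential", len(assoc.tracks))
```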

5.2 Filtering

Filtering is performed using a Kalman filter. The sources being tracked are assumed to be talking humans. Since they can move around, a simple Newtonian motion model is used to model the motion of the talker. The state vector for the Kalman filter is therefore

$$ \mathbf{x}_n = \begin{bmatrix} x_n & y_n & \dot{x}_n & \dot{y}_n \end{bmatrix}^T \tag{24} $$

Here, $x_n$ and $y_n$ represent the two-dimensional position of the source, and $\dot{x}_n$ and $\dot{y}_n$ its velocity, at iteration $n$.
The filter used is a one-step predictor as described in [Hay02]. The transition matrix $\mathbf{F}$ is

$$ \mathbf{F} = \begin{bmatrix} \mathbf{I}_2 & T\,\mathbf{I}_2 \\ \mathbf{0}_2 & \mathbf{I}_2 \end{bmatrix} \tag{25} $$

The measurement matrix $\mathbf{C}$ is

$$ \mathbf{C} = \begin{bmatrix} \mathbf{I}_2 & \mathbf{0}_2 \end{bmatrix} \tag{26} $$

In (25) and (26), $\mathbf{I}_n$ is an $n \times n$ identity matrix, $\mathbf{0}_n$ is an $n \times n$ zero matrix and $T$ is the time since the last update of the state vector. The filter is updated at constant time intervals $T$, so the transition matrix $\mathbf{F}$ is also constant. Its inverse is

$$ \mathbf{F}^{-1} = \begin{bmatrix} \mathbf{I}_2 & -T\,\mathbf{I}_2 \\ \mathbf{0}_2 & \mathbf{I}_2 \end{bmatrix} \tag{27} $$

The correlation matrices of the process and measurement noise, $\mathbf{Q}_1$ and $\mathbf{Q}_2$ respectively, are

$$ \mathbf{Q}_1 = q_1 \mathbf{I}_4, \qquad \mathbf{Q}_2 = q_2 \mathbf{I}_2 \tag{28} $$

Here, $q_1$ and $q_2$ are the variances of the process and measurement noise.

Algorithm 2 shows how the source's state vector at iteration $n$, $\hat{\mathbf{x}}_n$, is estimated from the estimated position samples $\mathbf{y}_n$. The initial state vector $\hat{\mathbf{x}}_0$ is the estimated position and velocity of the source at the time the Kalman filter starts tracking the source. The position is estimated from the samples collected before the track was promoted to an active track (see section 5.1), and the velocity is assumed to be zero. The initial predicted state-error correlation matrix is $\mathbf{K}_0 = \mathbf{0}_4$.

Algorithm 2 Kalman filter based on one-step prediction.
for n = 0, 1, 2, ... do
    G_n ← F K_n C^H [C K_n C^H + Q_2]^{-1}
    α_n ← y_n − C x̂_n
    x̂_{n+1} ← F x̂_n + G_n α_n
    K_{n+1} ← F [K_n − F^{-1} G_n C K_n] F^H + Q_1
end for

Algorithm 2 is not run as a single for-loop over all the samples. Instead, each newly calculated sample triggers a single pass through the loop. This is necessary for real time filtering, where the filtered result is needed as new samples are calculated.
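Algorithm 2 with the model (24)-(28) can be sketched in pure Python as below. Matrices are nested lists, to avoid third-party dependencies. All quantities are real, so the Hermitian transpose reduces to the ordinary transpose. The state-error update uses $\mathbf{K}_n - \mathbf{F}^{-1}\mathbf{G}_n\mathbf{C}\mathbf{K}_n$, the one-step prediction form in [Hay02]. The names are hypothetical. The values of T, q1 and q2 in the usage are illustrative and are not taken from the thesis.

```python
# Plain nested-list matrix helpers keep the sketch dependency-free.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def madd(A, B, sign=1.0):
    """Element-wise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def scaled_identity(n, q):
    return [[q if i == j else 0.0 for j in range(n)] for i in range(n)]


class OneStepPredictor:
    """Algorithm 2 with the constant-velocity model (24)-(28)."""

    def __init__(self, x0, y0, T, q1, q2):
        self.F = [[1.0, 0.0, T, 0.0], [0.0, 1.0, 0.0, T],
                  [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]       # (25)
        self.F_inv = [[1.0, 0.0, -T, 0.0], [0.0, 1.0, 0.0, -T],
                      [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]   # (27)
        self.C = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]      # (26)
        self.Q1 = scaled_identity(4, q1)                             # (28)
        self.Q2 = scaled_identity(2, q2)
        self.x = [[x0], [y0], [0.0], [0.0]]   # initial velocity assumed zero
        self.K = scaled_identity(4, 0.0)      # K_0 = 0_4

    def step(self, y):
        """One pass of the loop for a new position sample y = (x, y)."""
        F, C, K = self.F, self.C, self.K
        Ct = transpose(C)
        S = madd(matmul(matmul(C, K), Ct), self.Q2)
        G = matmul(matmul(matmul(F, K), Ct), inv2(S))                  # gain G_n
        alpha = madd([[y[0]], [y[1]]], matmul(C, self.x), sign=-1.0)   # innovation
        self.x = madd(matmul(F, self.x), matmul(G, alpha))             # predicted state
        inner = madd(K, matmul(matmul(matmul(self.F_inv, G), C), K), sign=-1.0)
        self.K = madd(matmul(matmul(F, inner), transpose(F)), self.Q1)  # K_{n+1}
        return [row[0] for row in self.x]


if __name__ == "__main__":
    T = 0.1
    kf = OneStepPredictor(0.0, 1.5, T, q1=1e-3, q2=1e-2)
    for n in range(200):   # talker walking along x at 0.5 m/s
        state = kf.step((0.5 * n * T, 1.5))
    print(state)
```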

6 Experiments

6.1 Testing the angle of arrival

The algorithm for estimating the angle of arrival is evaluated with measurements that vary the type of sound, the room environment and the angle relative to the sensor array. The three scenarios are:
• Speech in a room with low echo.
• Speech in a room with moderate echo.
• White Gaussian noise in a room with low echo.
The speech used is pre-recorded speech of random phrases. The room is of size 4 × 5 m. One wall is covered by an acoustic damper, and the other walls are bare, giving a moderate echo. Along the walls are some tables with computer equipment, home entertainment systems, loudspeakers and some chairs. Figure 6 shows a general overview of the room, the placement of the sensor array and the placement of the source at the different angles. The source is placed at four angles: 0°, 22.5°, 45° and 67.5°. Figure 7 shows the same room, but with acoustic dampers placed along the walls around the sensor array to reduce the echo.
The sound is played through a loudspeaker placed at the angles shown in figures 6 and 7, at a distance of 2 m from the array and at normal speech level. Noise is present in the form of computer fans and ventilation, and the signal-to-noise ratio at the sensors is about 15 dB. The sample rate is 8 kHz. The array consists of 6 microphones with an inter-sensor distance of 4 cm.
6.1.1 Bias and variance

Bias is a systematic offset of the estimated parameter from its true value. Figure 8 shows the estimated angles for the different scenarios as a function of the number of subbands in the DFT filter bank.
White noise is located fairly accurately. The bias increases as the angle of arrival approaches the edge of the array (larger angles) and as the reverberation level increases. With a high number of subbands and a source not located at the edge of the sensor array, the bias can be kept below 5°. This corresponds to an offset of roughly 2.5 dm at a distance of 3 m from the array.
The standard deviation, the square root of the variance, measures how much individual estimates typically deviate from their mean value. Figure 9 shows the standard deviation measured at different angles for the different scenarios.
As with the bias, the variance for white noise is very low. For speech, the variance is about the same for low and moderate echo, as long as the source is not located near the edge of the sensor array.

6.2 Testing the localization and tracking

The localization and tracking algorithms are tested in the same room as before. Two scenarios are tested:
• Two fixed talkers having a conversation.
• A single talker moving in a circle.
In both scenarios, the sample rate is 8 kHz and a filter bank with 512 subbands is used.
6.2.1 Two fixed talkers

The scenario setup is given in figure 10. The distance between the two sensor subarrays is 1.5 m, and the two talkers are located 1.7 m out from the arrays.
The scenario simulates two talkers having a conversation. The test consists of three phases. First, the talkers speak one at a time for about 20 s each. Next, they take turns speaking for 5 s each to simulate more rapid changes in the location estimates. In the last phase, they talk at the same time to see how the algorithms handle two simultaneous sources.
Figure 11 shows the result of the evaluation after track association and filtering. Figure 11(a) shows the x and y position components over time. The first two phases pass without problems, and the sources are clearly separated and located. In the third phase, the algorithm finds two separate sources and tracks them independently, although tracks are sometimes lost and recreated. Figure 11(b) shows the positions of the sources as seen from above.
6.2.2 Single moving talker

The setup in this scenario is shown in figure 12. As in the previous scenario, the distance between the sensor subarrays is 1.5 m. The talker is now moving in a circle, about 1.8 m out from the arrays. The result of this evaluation is shown in figure 13. Figure 13(a) shows the x and y position components over time, and figure 13(b) shows the position from above.

7 Real time implementation

The algorithms were first implemented and evaluated in Matlab. Once they were working properly, the Matlab M-code was translated by hand to C++, and a user interface was built around the translated code. The program is written for the Windows platform and uses the ASIO standard for communication with the sound recording equipment. Because everything had been thoroughly tested in Matlab, the translation went smoothly. The general structure of the code is similar in Matlab and C++, so the translation was essentially line by line.
The main concern at the beginning was the available CPU time. It was later found that CPU time was not really the biggest problem in implementing the algorithms in real time. A standard-equipped Pentium 4 at 1.5 GHz could easily handle 2-3 arrays with 4-6 sensors per array and filter banks with 1024 subbands. It did so at sample rates up to 16 kHz, which is enough to sample speech at good quality. Newer computers have significantly more computing power, so CPU time is not a problem unless the arrays become too large or too many.

8 Conclusion and further development

Several algorithms were first evaluated for estimating the angle of arrival. Apart from the steered response power algorithm described in this thesis, the following algorithms were tried initially:
• Computing the cross correlation in the time domain and searching for its peak.
• Using an LMS filter, where the adaptive filter estimates the delay between the signal from a reference sensor and the signals from the other sensors. The slope of the phase response of the filter determines the delay. Ideally, the impulse response of the filter is a delayed δ-impulse, and the phase response is a straight line.
• Estimating the slope of the phase of the cross power spectrum, as described in [ADBS95]. Ideally, only a delay is present, and the cross power spectrum is of the form $e^{-j\omega\tau}$.
All of these except the first (the time-domain cross correlation) worked well on synthetic data. The time-domain cross correlation did not have enough resolution, because the delay could only be estimated in multiples of the sampling period. With real recorded data, the LMS filter and the cross power spectrum method were too inaccurate when estimating the slope of the phase.
For speech in reverberant rooms, only the SRP algorithm used in this thesis worked well enough to be used in practice. The SRP-PHAT algorithm combines it with the PHAT weighting function from the generalized cross correlation, and forms a robust method for estimating the angle of arrival for a sensor array. It is also a good choice for real time applications, because it requires little computing power compared to what a standard desktop computer offers.
The filter bank was also a major improvement compared to using only the DFT. The filter bank forms a time-averaged spectrum, which makes the important phase information less variable for the inter-sensor delay estimator. The computational complexity of the filter bank is higher, but well within the limits for real time applications, and the improved precision was well worth it.
The linear intersection, a closed-form algorithm, is computationally very efficient. The samples are associated with tracks, and the tracks are filtered spatially. With these steps, the location algorithms can quickly locate and track multiple sources. This works for alternating sources and, to some extent, for simultaneous sources.
The algorithms can be further improved with smart acoustic detectors and classifiers that classify sounds and locate (or ignore) only certain types of events, such as tracking speech only or locating noise sources. The method for detecting multiple sources can also be improved. The current implementation relies on the two sources having about the same signal power level at the subarrays.

References

[ADBS95] John E. Adcock, Joseph H. DiBiase, Michael S. Brandstein, and Harvey F. Silverman. Practical issues in the use of a frequency-domain delay estimator for microphone-array applications, January 1995.

[BAS97] Michael S. Brandstein, John E. Adcock, and Harvey F. Silverman. A closed-form location estimator for use with room environment microphone arrays. IEEE Transactions on Speech and Audio Processing, 5(1):45-50, January 1997.

[Hay02] Simon Haykin. Adaptive filter theory. Prentice Hall, fourth edition, 2002.

[KC76] Charles H. Knapp and G. Clifford Carter. The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech and Signal Processing, 24(4):320-327, August 1976.

[LRV01] Jan Lundgren, Mikael Rönnqvist, and Peter Varnblad. Linjär och icke-linjär optimering. Studentlitteratur, 2001.

[SBS97] Douglas E. Sturim, Michael S. Brandstein, and Harvey F. Silverman. Tracking multiple talkers using microphone-array measurements. IEEE Transaction on Acoustics, Speech and Signal Processing, 1:371-374, 1997.

Figure 6: Room with moderate echo. (Floor plan: sensor array with source positions at 200 cm for θ0 = 0°, θ1 = 22.5°, θ2 = 45° and θ3 = 67.5°.)

Figure 7: Room with low echo. (Floor plan: sensor array with source positions at 200 cm for θ0 = 0°, θ1 = 22.5°, θ2 = 45° and θ3 = 67.5°.)

Figure 8: Bias of estimated angles. (Estimated angle of arrival [degrees] versus number of subbands, 64 to 2048, for speech with moderate echo, speech with low echo, and noise with low echo. (a) Real angle is 0°. (b) Real angle is 22.5°. (c) Real angle is 45°. (d) Real angle is 67.5°.)

2048
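Figures 8 and 9 plot the estimates against the number of subbands. The filter-bank implementation itself is not reproduced here; as a hedged stand-in, the sketch below estimates an integer inter-sensor delay with a single-FFT generalized cross correlation [KC76] using the phase transform (PHAT) weighting. The PHAT choice, the zero-padding length and the function name `gcc_phat` are assumptions of this example, and the loop over frame lengths is only a loose analogy to the subband axis of the figures.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """Delay of x2 relative to x1 in whole samples, estimated with the
    PHAT-weighted generalized cross correlation (a single FFT stands in
    for the filter bank used in the thesis)."""
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    n = len(x1) + len(x2)          # zero padding avoids circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT: keep phase only
    r = np.fft.irfft(cross, n)
    # Lags 0..max_lag sit at the start of r, lags -max_lag..-1 at the end.
    r = np.concatenate((r[n - max_lag:], r[: max_lag + 1]))
    return int(np.argmax(r)) - max_lag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal(5000)
    true_lag = 3
    for frame in (64, 128, 256, 512, 1024, 2048):
        x1 = s[100:100 + frame]
        x2 = s[100 - true_lag:100 - true_lag + frame]  # x2[k] = x1[k - 3]
        x2 = x2 + 0.3 * rng.standard_normal(frame)     # sensor noise
        print(frame, gcc_phat(x1, x2, max_lag=10))
```

A positive result means the second channel lags the first; `max_lag` would in practice be set from the microphone spacing, since the physical delay cannot exceed d/c.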

Figure 9: Standard deviation of estimated angles. Panels (a) to (d) show the standard deviation [degrees] at 0°, 22.5°, 45° and 67.5°, plotted against the number of subbands for speech with moderate echo, speech with low echo, and noise with low echo.
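The bias in Figure 8 and the spread in Figure 9 can be computed from a set of angle estimates collected at a known test position. A minimal sketch, assuming bias is the mean estimate minus the true angle and the spread is the sample standard deviation; the thesis may normalize differently, and the estimates in the usage example are hypothetical.

```python
import statistics


def angle_bias_and_std(estimates_deg, true_deg):
    """Bias (mean estimate minus true angle) and sample standard
    deviation of a list of angle estimates, both in degrees."""
    if len(estimates_deg) < 2:
        raise ValueError("need at least two estimates")
    bias = statistics.fmean(estimates_deg) - true_deg
    spread = statistics.stdev(estimates_deg)
    return bias, spread


if __name__ == "__main__":
    # Hypothetical estimates for the 22.5 degree test position.
    estimates = [21.0, 23.5, 22.0, 24.0]
    bias, spread = angle_bias_and_std(estimates, 22.5)
    print(f"bias {bias:+.3f} deg, standard deviation {spread:.3f} deg")
```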

Figure 10: Two speakers having a conversation. (Setup diagram: speakers A and B with the x- and y-axes marked; dimensions 75 cm, 75 cm and 150 cm.)
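Position is estimated from the angles of arrival by intersecting bearing lines from the subarrays. The sketch below covers only the simplest planar case with two subarrays, solving p1 + t1*u1 = p2 + t2*u2 for the crossing point. It is a simplification, not the full linear intersection algorithm of [BAS97], and the subarray positions and angle convention (measured from the x-axis) are hypothetical.

```python
import numpy as np


def intersect_bearings(p1, theta1_deg, p2, theta2_deg):
    """Point where two bearing lines cross in the plane.

    Line i starts at subarray position p_i and points along
    u_i = (cos theta_i, sin theta_i), with angles measured from the
    x-axis (assumed convention). Solves p1 + t1*u1 = p2 + t2*u2.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    u1 = np.array([np.cos(np.radians(theta1_deg)), np.sin(np.radians(theta1_deg))])
    u2 = np.array([np.cos(np.radians(theta2_deg)), np.sin(np.radians(theta2_deg))])
    A = np.column_stack((u1, -u2))
    if abs(np.linalg.det(A)) < 1e-9:
        raise ValueError("bearing lines are (nearly) parallel")
    t1, _t2 = np.linalg.solve(A, p2 - p1)
    return p1 + t1 * u1


if __name__ == "__main__":
    # Hypothetical geometry: subarrays 1.5 m apart, source at (0.75, 1.0) m.
    a, b, src = (0.0, 0.0), (1.5, 0.0), (0.75, 1.0)
    th_a = np.degrees(np.arctan2(src[1] - a[1], src[0] - a[0]))
    th_b = np.degrees(np.arctan2(src[1] - b[1], src[0] - b[0]))
    print(intersect_bearings(a, th_a, b, th_b))
```

With more than two subarrays, or with noisy bearings that no longer meet in a single point, the lines have to be combined in a least-squares or weighted sense instead.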

Figure 11: Two speakers having a conversation. (a) x and y positions [m] as a function of time [s]. (b) x and y positions plotted against each other.
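With two speakers, each position estimate has to be assigned to one of the existing tracks before filtering. A simple nearest-neighbour rule with a distance gate is sketched below. The 0.5 m gate, the track positions and the function name `associate` are hypothetical, and the association logic actually used in the thesis may differ.

```python
import math


def associate(estimate, tracks, gate=0.5):
    """Index of the track nearest to a position estimate, or None.

    estimate: (x, y) in metres; tracks: list of predicted (x, y) track
    positions. Estimates farther than `gate` metres from every track
    return None and could start a new track.
    """
    best, best_dist = None, gate
    for i, (tx, ty) in enumerate(tracks):
        dist = math.hypot(estimate[0] - tx, estimate[1] - ty)
        if dist <= best_dist:
            best, best_dist = i, dist
    return best


if __name__ == "__main__":
    tracks = [(0.0, 1.5), (1.5, 1.5)]  # hypothetical speakers A and B
    for z in [(0.1, 1.4), (1.4, 1.6), (0.75, 0.2)]:
        print(z, "->", associate(z, tracks))
```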

Figure 12: Single speaker moving in a circle. (Setup diagram with the x- and y-axes marked; dimensions 75 cm, 75 cm and 150 cm.)

Figure 13: Single talker moving in a circle. (a) x and y positions [m] as a function of time [s]. (b) x and y positions plotted against each other.
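The tracks are filtered with a Kalman filter (see [Hay02] for the theory). The sketch below assumes a constant-velocity state [x, y, vx, vy] with white-acceleration process noise and position-only measurements. The time step, the noise levels `q` and `r`, and the simulated circular path are illustrative choices, not the parameters used in the experiments.

```python
import numpy as np


def cv_model(dt, q, r):
    """Constant-velocity model with state [x, y, vx, vy].

    q: white-acceleration noise intensity, r: measurement noise
    variance in m^2 (both hypothetical tuning values).
    """
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt
    H = np.hstack((np.eye(2), np.zeros((2, 2))))
    # Discretized white-acceleration noise; kron() repeats it for x and y.
    G = np.array([[dt ** 3 / 3, dt ** 2 / 2],
                  [dt ** 2 / 2, dt]])
    Q = q * np.kron(G, np.eye(2))
    R = r * np.eye(2)
    return F, H, Q, R


def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle for a position measurement z = [x, y]."""
    x = F @ x
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


def track(measurements, dt=0.1, q=0.05, r=0.01):
    """Filter a sequence of (x, y) position estimates; returns an
    array of filtered positions, one per measurement."""
    F, H, Q, R = cv_model(dt, q, r)
    z0 = np.asarray(measurements[0], dtype=float)
    x = np.array([z0[0], z0[1], 0.0, 0.0])   # start at rest
    P = np.diag([r, r, 1.0, 1.0])
    out = [x[:2].copy()]
    for z in measurements[1:]:
        x, P = kalman_step(x, P, np.asarray(z, dtype=float), F, H, Q, R)
        out.append(x[:2].copy())
    return np.array(out)


def rms_error(est, truth):
    """Root-mean-square Euclidean position error."""
    return float(np.sqrt(np.mean(np.sum((est - truth) ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(0.0, 30.0, 0.1)
    w = 2 * np.pi / 20                      # hypothetical: one lap per 20 s
    truth = 0.75 * np.column_stack((np.cos(w * t), np.sin(w * t)))
    meas = truth + rng.normal(scale=0.1, size=truth.shape)
    est = track(meas)
    print("raw RMS error     :", rms_error(meas, truth))
    print("filtered RMS error:", rms_error(est, truth))
```

A circular path violates the constant-velocity assumption, so `q` trades smoothing against lag: a larger value follows the turn more closely but suppresses less measurement noise.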
