University of Groningen: 10.1109/AVSS.2017.8078461
A real-time system for audio source localization with cheap sensor device
Saggese, Alessia; Strisciuglio, Nicola; Vento, Mario; Petkov, Nicolai
Published in:
2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2017
DOI:
10.1109/AVSS.2017.8078461
Document Version
Publisher's PDF, also known as Version of record
Publication date:
2017
978-1-5386-2939-0/17/$31.00 © 2017 IEEE. IEEE AVSS 2017, August 2017, Lecce, Italy.
[Figure 1: architectural overview of the proposed system — (a)-(f): Kinect sensor and data acquisition, gammatone filters, ILD/ITD estimator, GMM-based azimuth estimator with its training model (output θe), and results visualization.]
presented in [6]. The authors exploit redundancy of spatial information to scale the system linearly with the number of microphones. Beamforming techniques and Kalman filters have been used to reduce the errors in tracking multiple moving speakers in noisy and echoic environments [16]. Such methods are more accurate and robust to noise than binaural localization methods, but require larger hardware resources for processing the input signals.

In this paper, we propose a real-time architecture for sound source localization based on a cheap microphone array, namely the audio acquisition card provided together with the Kinect sensor. Our contribution is the design and implementation of a system for hardware-software integration of the frame-level estimates provided by the binaural sound source localization method proposed in [15]. The proposed architecture is modular and, in principle, can be used together with any binaural localization method. The real-time implementation of the system and the use of a cheap audio sensor make it possible to deploy large installations of the localization system at reduced cost. Furthermore, the proposed solution provides means for distributed analysis of sound sources and, eventually, for tracking them within the monitored areas. We evaluate the performance and the characteristics of the proposed architecture in different conditions by carrying out experiments on four data sets, namely the Surrey, Oldenburg, Aachen and MIVIA data sets (the final name will be released after anonymous submission and revision). In this work, we constructed and made publicly available for benchmarking purposes the OUR localization data set¹.

The paper is organized as follows: in Section 2 we present the proposed hardware and software architecture, while in Section 3 we describe the data sets that we used and discuss the results that we achieved in the experiments. Finally, we draw conclusions in Section 4.

¹The data set is available at the url http://OUR.–

2. System Architecture

In Figure 1, we depict the architectural overview of the proposed system. The presented framework integrates a cheap sensor device, namely the microphone array of the Kinect sensor, with a localization method based on probabilistic estimation of sound directions [15]. Together with the processing architecture, we propose an integration rule for the short-time estimations that improves the reliability of the overall system.

2.1. Localization method

The considered localization method is inspired by how the human auditory system processes sounds. The cochlear membrane in the inner ear vibrates according to the energy contained in frequency sub-bands of the input sound. In order to model this behavior, a Gammatone filterbank was adopted. As suggested in [5], the input signals to each microphone are decomposed into Nc = 32 auditory channels using a fourth-order gammatone filterbank, where the channel center frequencies are distributed on the equivalent rectangular bandwidth (ERB) scale between 80 Hz and 5 kHz. Successively, binaural cues are estimated using a rectangular window of 20 ms at a sampling frequency of fs = 44.1 kHz. The impulse response gfc(t) of a Gammatone filter at time t and central frequency fc is:

gfc(t) = D t^(η−1) cos(2πfc t + φ) e^(−2πt b(fc)) u(t),   (1)

where D is an arbitrary scaling constant, η is the order of the filter (fourth in our case), φ is the phase, u(t) is the unit step function (1 for t > 0 and 0 otherwise), and b(fc) is a function that determines the bandwidth for a given center frequency. It is formally defined as b(fc) = a(24.7 + 0.108 fc), where a is a proportionality constant.
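As an illustration, Equation (1) can be implemented directly. The sketch below (Python with NumPy) also spaces 32 center frequencies on the ERB-rate scale between 80 Hz and 5 kHz; the ERB-rate mapping and the value a = 1.019 are common choices from the psychoacoustics literature, assumed here rather than specified in the paper.

```python
import numpy as np

def gammatone_ir(fc, fs=44100, duration=0.02, eta=4, D=1.0, phi=0.0, a=1.019):
    """Sampled impulse response of a Gammatone filter, following Eq. (1).

    fc: center frequency in Hz; eta: filter order (fourth in the paper);
    a: proportionality constant of b(fc) (1.019 is an assumed common value).
    """
    t = np.arange(int(duration * fs)) / fs        # time axis, t >= 0, so u(t) = 1
    b = a * (24.7 + 0.108 * fc)                   # bandwidth b(fc)
    return D * t**(eta - 1) * np.cos(2 * np.pi * fc * t + phi) \
             * np.exp(-2 * np.pi * b * t)

def erb_space(f_lo=80.0, f_hi=5000.0, n=32):
    """n center frequencies equally spaced on the ERB-rate scale (assumed
    Glasberg-Moore form of the ERB-rate mapping)."""
    erb = lambda f: 21.4 * np.log10(1 + 0.00437 * f)       # Hz -> ERB-rate
    inv = lambda e: (10 ** (e / 21.4) - 1) / 0.00437       # ERB-rate -> Hz
    return inv(np.linspace(erb(f_lo), erb(f_hi), n))

ir = gammatone_ir(1000.0)          # 20 ms response of the 1 kHz channel
cf = erb_space()                   # 32 channel center frequencies
print(len(ir), cf[0].round(1), cf[-1].round(1))
```

Filtering a signal with each of the 32 responses (e.g. by convolution) yields the per-channel decomposition from which the binaural cues are computed.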
Given the response of the Gammatone filterbank for both microphones, the Interaural Level Difference (ILD) and the Interaural Time Difference (ITD) are computed. ILD and ITD contain complementary information about the sound source position with respect to the microphone array. Thus they are combined into a two-dimensional feature vector, called binaural vector, in order to represent a single frame within a specific Gammatone channel.

Finally, a Gaussian mixture model (GMM) is used to estimate the position of the sound source from the set of binaural feature vectors. In particular, considering that the binaural features corresponding to different channels tend to cluster in the feature space, the azimuth-dependent pdf is modeled by summing up the superimposed Gaussian components. During a preliminary training phase, a set of training sounds is used to learn the GMM model, which is then used during the operating phase for estimating the source direction. For more details about the localization method we refer the reader to [15].

2.2. Integration of azimuth estimation

The GMM provides an estimation of the direction of the sound source for small overlapped (by 50% of their length) frames of the input audio signal of duration 20 ms. The combination of the estimations for the N channels of the Gammatone filterbank provides a reliable measure of the short-time characteristics of the input signal. We integrate the short-time estimations on a larger time scale, using a sliding time window of length Wt that shifts forward on the audio signal by half of its size. This allows for estimations that are more robust to noise and outliers. The direction of the input sound is thus taken as that of the audio frame with the highest estimated GMM likelihood within the time window Wt.

2.3. System implementation

The proposed architecture acquires the audio signal from the microphone array of the Microsoft Kinect sensor (Figure 1a). The software module for hardware interface and data acquisition (Figure 1b) is developed in C++ and communicates, through a bridge interface, with a Matlab implementation of the method for the frame-level estimation of the sound source direction (Figure 1c,d,e,g,h). The final estimated direction is obtained by integration of the frame-level decisions (Figure 1f) and the result is made available by a visualization module (Figure 1i).

3. Experimental analysis

3.1. Data sets

We performed experiments on four publicly available data sets, namely the Surrey [10], Oldenburg [12], Aachen [11] and MIVIA data sets. The first three data sets contain sound sources at different angles and distances from the microphone array and sounds recorded in environments with different reverberation.

The Surrey data set was recorded by a Cortex Instruments Mk.2 Head and Torso Simulator (HATS). The loudspeakers were placed around the HATS on an arc with a 1.5 m radius between −90° and +90°. The sounds were recorded at intervals of 5° and four different room configurations were considered. For each room a reverberation time RT60 was estimated according to the BS EN ISO 3382 standard.

The Oldenburg data set [12] was recorded by a HATS in five semi-controlled environments.

The Aachen data set [11] was recorded in two rooms, with sound sources at a distance of up to 10 m from the microphones. The characteristics of the rooms for these data sets are reported in Table 1.

Surrey data set
S.AR   The room size was 17.04 × 14.53 × 6.5 m (l × w × h); anechoic conditions were simulated by truncating the first reflected waves.
S.A    The room is a small (5.72 × 6.64 × 2.31 m) office with seats for 8 persons. The reverberation time is RT60 = 320 ms.
S.B    Medium sized classroom (4.65 × 9.6 × 2.68 m) with RT60 = 470 ms.
S.C    Large room (23.5 × 18.8 × 2.31 m) used as a cinema hall for more than 400 people, with RT60 = 680 ms.
S.D    A medium sized room (8.72 × 8.02 × 4.25 m) for presentations, with a high ceiling and RT60 = 890 ms.

Table 1: Description of the rooms where the sounds contained in the data sets were recorded.

We recorded the MIVIA data set and made it available for research purposes. It was acquired with a double aim: (1) to evaluate the performance of localization systems with cheap and less accurate devices, instead of very expensive devices which are more difficult to use in real applications; (2) to evaluate the performance of the proposed approach by varying the distance between the microphones. We considered two different environments, namely a living room of 6 × 4 × 2.7 m (M.LR) and a laboratory of our university of 8 × 8 × 7 m (M.L). The sounds were recorded by using the microphone array of a Kinect sensor. The hardware set was expanded so as to build an array of four microphones, where the maximum distance between them is 138 cm, as shown in Figure 2. Note that the distance between the two central microphones was fixed to 23 cm, as in the original setup of the Kinect sensor. The recordings were made considering the following acquisition angles: −90°, −60°, −45°, −20°, 0°, 20°, 45°, 60°, 90°. As for M.LR, a man repeated the same word aloud for 10 seconds at a distance of 2 m from the microphones. In the second scenario, a Bose loudspeaker was used to reproduce part of a song (10 seconds for each recording) at two distances from the array center: 1 m and 3 m.

Figure 2: Setup of the kinect sensor for the realization of the MIVIA data set.

3.2. Performance evaluation

We organized the experiments in four groups, so as to evaluate the performance of the system when the sound source is (1) at different angles and (2) at different distances from the microphones, and under (3) different SNR values and (4) different configurations of the binaural microphone set. Each group of tests involves specific rooms from the four data sets, which we report in the following:

• TEST1: S.AR, S.A, S.B, S.C, S.D from the Surrey data set and the rooms O.AR and O.O1 from the Oldenburg data set.

• TEST2: A.MR and A.LR from the Aachen data set.

• TEST3: O.O2, O.C and O.CY from the Oldenburg data set.

• TEST4: M.LR and M.L from the MIVIA data set.

We measured the performance of the real-time sound localization system by computing the accuracy (Acc) and the mean absolute error (MAE) of the angle estimation, as follows:

Acc[%] = 100 − (Ne · 100) / N,   MAE = (1/N) Σi |xi − x̂i|,

where N is the number of considered sounds in a particular test, Ne is the number of wrongly estimated angles, and the MAE sum runs over the Ne wrongly estimated angles. For the computation of the MAE, x̂i is the estimated direction while xi is the nominal one.

3.3. Results discussion

In the analysis of the performance of the real-time localization method we considered several tolerance values for the localization error. Specifically, we considered different margins for the maximum acceptable error on the estimation of the sound source position. A sound source is correctly localized if the estimated angle falls into a tolerance interval around the nominal value of the sound direction. The value of the tolerance influences both accuracy and MAE. Indeed, only wrongly estimated directions are considered for the computation of the MAE.

For the experiments in the groups "TEST1" and "TEST2" we performed a quantitative analysis of the results by computing the accuracy and MAE. In Table 2 and Table 3 we report the results that we achieved for the different environments in "TEST 1" and "TEST 2", respectively. We observed generally high performance and precise estimations. The performance decreases in highly noisy environments.

In the experiment "TEST3", we evaluated the performance of the system in environments with several values of signal-to-noise ratio. For the rooms in the Oldenburg data set, precise measurements of the nominal sound source direction are not provided, but rather only an indication of the direction of the sound. In Table 5, we report the angle estimated
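A minimal sketch of the two metrics above, assuming that an estimate counts as wrong when it deviates from the nominal angle by more than the chosen tolerance, and that only the wrongly estimated angles contribute to the MAE sum (both quantities normalized by N, as in the definition):

```python
import numpy as np

def acc_and_mae(nominal, estimated, tolerance=5.0):
    """Accuracy (Acc, in %) and mean absolute error (MAE, in degrees)
    of angle estimates, as sketched from Sec. 3.2."""
    nominal = np.asarray(nominal, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    err = np.abs(nominal - estimated)
    wrong = err > tolerance                 # wrongly localized sounds
    N, Ne = len(nominal), int(wrong.sum())
    acc = 100.0 - Ne * 100.0 / N            # Acc[%] = 100 - Ne*100/N
    mae = err[wrong].sum() / N              # wrong-angle errors averaged over N
    return acc, mae

# Hypothetical example: one estimate off by 6 degrees, three within tolerance.
acc, mae = acc_and_mae([0, 20, 45, 90], [2, 19, 51, 90], tolerance=5.0)
print(acc, mae)  # -> 75.0 1.5
```

With this reading, a perfectly localized test set yields Acc = 100% and MAE = 0°, matching the anechoic rooms in Tables 2 and 3.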
Test 1
                     Tolerance
Room            5°        10°       15°
S.AR    Acc     100%      100%      100%
        MAE     0°        0°        0°
S.A     Acc     78.37%    100%      100%
        MAE     1.08°     0°        0°
S.B     Acc     81.08%    97.29%    100%
        MAE     1.08°     0°        0°
S.C     Acc     86.48%    94.59%    94.59%
        MAE     2.83°     2.43°     2.43°
S.D     Acc     51.35%    70.27%    70.27%
        MAE     14°       13°       13°
O.AR    Acc     97.29%    100%      100%
        MAE     0°        0°        0°
O.O1    Acc     48.64%    70.27%    75.67%
        MAE     11.41°    10.33°    9.79°

Table 2: Performance results for the group of experiments "TEST 1" with different values of the error tolerance interval.

Test 3
Room    Configuration    Expected direction    Estimated angle
O.O2    1C               Frontal               −5°
O.O2    1D               Frontal               5°
O.O2    1B               Right                 35°
O.O2    2A               Right                 15°
O.O2    2B               Left                  −10°
O.C     1A               Frontal               0°
O.C     2D               Frontal               5°
O.C     2E               Right                 60°
O.C     1D               Right                 90°
O.C     1B               Left                  −30°
O.C     1C               Left                  −90°
O.CY    1A               Right                 35°
O.CY    1B               Right                 10°
O.CY    1C               Left                  −20°
O.CY    1D               Left                  −50°
O.CY    1E               Left                  −60°

Table 4: Qualitative analysis of the results for the group of experiments "TEST 3".
Test 2
                     Tolerance
Room            5°        10°       15°
A.MR    Acc     100%      100%      100%
        MAE     0°        0°        0°
A.LR    Acc     50%       100%      100%
        MAE     3.53°     0°        0°

Table 3: Performance results for the group of experiments "TEST 2" with different values of the error tolerance interval.

Test 4
Room                   Expected Angle    CONF1    CONF2
Living Room (M.LR)     −90°              −58°     −45°
                       −60°              −45°     −33°
                       −45°              −40°     −37°
                       −20°              −26°     −21°
                       0°                0°       0°
                       20°               23°      19°
                       45°               30°      37°
                       60°               55°      40°
                       90°               56°      55°
Laboratory (M.L)       −90°              −62°     −62°
                       −60°              −55°     −52°