Three-Dimensional Sound Source Localization Using Inter-channel Time Difference Trajectory

ARTICLE

DOI: 10.5772/61652

© 2015 Author(s). Licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

Sound source localization is one of the basic and essential techniques for intelligent robots in terms of human-robot interaction and has been utilized in various engineering fields. This paper suggests a new localization method using an inter-channel time difference trajectory, which is a new localization cue for efficient 3-D localization. As one of the ways to realize the proposed cue, a two-channel rotating array is employed. Two microphones are attached on the left and right sides of the spherical head. One microphone is in a circular motion on the right side, while the other is fixed on the left side. According to the rotating motion of the array, the (source) direction-dependent characteristics of the trajectories are analysed using the Ray-Tracing formula extended for 3-D models. In simulation, the synthesized signals generated by the fixed and rotating microphone signal models were used as the output signals of the two microphones. The simulation showed that the localization performance is strongly dependent on the azimuthal position of a source, which is caused by the asymmetry of the trajectory amplitude. Additionally, the experimental results of the two experiments carried out in the room environment demonstrated that the proposed system can localize a Gaussian noise source and a voice source in 3-D space.

Keywords: Three-dimensional Sound Source Localization, Inter-channel Time Difference Trajectory, Rotating Microphone Array, Ray-tracing Formula, Human-Robot Interaction

1. Introduction

Recently, intelligent robots have been developed not only to support arduous human tasks, but also to interact with people in order to meet various human needs [1, 2]. As an independent object with its own intelligence [3], a robot needs to recognize environmental changes, such as the appearance of unidentified objects or the acoustic events required to complete its missions. For example, robots working in households should detect user voices and simultaneously be aware of other acoustic events, such as noises emitted from home appliances and other voices from electric devices. As a result, they can pay attention to speakers with more natural human-robot interaction (HRI) skills. In this situation, the technology of sound source localization (SSL) is employed to estimate the acoustic source direction using the acoustic signals from the microphone array; this is one of the most important building blocks of HRI. In addition,
where c is the speed of sound and \hat{\varphi}_S is the estimated azimuth. For example, SSL using ICTD maps [20, 21] was applied to the microphone array fixed on the robot head.

R_{x_i x_j}(\tau) = \int_{-\infty}^{\infty} \frac{G_{x_i x_j}(f)}{\left| G_{x_i x_j}(f) \right|} \, e^{j 2\pi f \tau} \, df \qquad (2)

where \theta_{\mathrm{shift}} \in [0, 2\pi] and is measured clockwise from the +Z_R axis. Also, \theta_{\mathrm{shift}} indicates the position of the microphone on the right side given a constant rotating radius (r_R).
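As a concrete illustration of equation (2), the sketch below computes a discrete GCC-PHAT function for two channels. It is a minimal NumPy example, not the authors' implementation; the white-noise test signal and the assumed 1 ms inter-channel delay are for demonstration only.

import numpy as np

def gcc_phat(x_i, x_j, fs):
    """Discrete counterpart of equation (2): inverse transform of the
    phase-weighted cross-spectrum G_{x_i x_j}(f) / |G_{x_i x_j}(f)|.
    A positive peak lag means x_i is delayed with respect to x_j."""
    n = len(x_i) + len(x_j)                  # zero-pad to avoid circular wrap-around
    X_i = np.fft.rfft(x_i, n=n)
    X_j = np.fft.rfft(x_j, n=n)
    G = X_i * np.conj(X_j)                   # cross-spectrum
    R = np.fft.irfft(G / (np.abs(G) + 1e-12), n=n)   # PHAT weighting
    max_lag = n // 2
    R = np.concatenate((R[-max_lag:], R[:max_lag]))  # centre lag zero
    lags = np.arange(-max_lag, max_lag) / fs
    return lags, R

# Demonstration with an assumed 1 ms inter-channel time difference.
fs = 44100
rng = np.random.default_rng(0)
s = rng.standard_normal(fs)                  # 1 s of white noise
x_j = s
x_i = np.roll(s, int(0.001 * fs))            # x_i lags x_j by 1 ms
lags, R = gcc_phat(x_i, x_j, fs)
print("estimated ICTD: %.3f ms" % (1e3 * lags[np.argmax(R)]))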
…non-dispersive medium. In order to derive the propagation distance, three direction vectors from C_H to the positions of the two microphones and the source need to be expressed as a function of the shift angle and the source direction:

\vec{d}_R = \left( r_H \cos\theta_R,\; r_R \sin\theta_{\mathrm{shift}},\; r_R \cos\theta_{\mathrm{shift}} \right) \qquad (5)

\vec{d}_F = \left( -r_H,\; 0,\; 0 \right) \qquad (6)

\vec{d}_S = r_S \left( \sin\varphi_S,\; \cos\varphi_S \cos\theta_S,\; \cos\varphi_S \sin\theta_S \right) \qquad (7)

where \vec{d}_R and \vec{d}_F are the direction vectors from C_H to the rotating and fixed microphones respectively, r_S is the distance between a source and C_H, and \vec{d}_S is the direction vector from C_H to the source.

The Ray-Tracing formula for the 3-D model includes the concept of the critical circle, which is the counterpart of the critical point in the 2-D model [22]. When the observation point is hidden by the head, the wave-front from the source initially propagates to the critical circle directly and then travels along the spherical surface. If we denote the direction vector to the ith microphone on the surface as \vec{d}_I, then the angle between the two direction vectors \vec{d}_I and \vec{d}_S determines which path applies. For the rotating and fixed microphones these angles are

\theta_{\vec{d}_R \vec{d}_S}(\theta_{\mathrm{shift}}) = \cos^{-1}\!\left( \cos\theta_R \sin\varphi_S + \sin\theta_R \cos\varphi_S \cos\theta_S \sin\theta_{\mathrm{shift}} + \sin\theta_R \cos\varphi_S \sin\theta_S \cos\theta_{\mathrm{shift}} \right) \qquad (9)

\theta_{\vec{d}_F \vec{d}_S} = \frac{\pi}{2} + \varphi_S \qquad (10)

If \theta_{\vec{d}_R \vec{d}_S} \le \theta_c, where \theta_c = \cos^{-1}\!\left( |\vec{d}_R| / |\vec{d}_S| \right) \approx 90^{\circ}, then the wave propagates along the direct path only. Otherwise, if the microphone is hidden by the spherical head, which mathematically indicates that \theta_{\vec{d}_R \vec{d}_S} \ge \theta_c, the propagation distance of the diffracted wave motion along the surface should also be considered. Here, equation (11) shows the propagation distance from the source to the rotating microphone, denoted by D_R, and equation (12) shows the distance from the source to the fixed microphone, represented by D_F:

D_R = \begin{cases} \sqrt{r_S^2 + r_H^2 - 2 r_S r_H \cos\theta_{\vec{d}_R \vec{d}_S}}, & \theta_{\vec{d}_R \vec{d}_S} \le \theta_c \\ r_S \sin\theta_c + r_H \left( \theta_{\vec{d}_R \vec{d}_S} - \theta_c \right), & \theta_c \le \theta_{\vec{d}_R \vec{d}_S} \le \pi \end{cases} \qquad (11)

D_F = \begin{cases} \sqrt{r_S^2 + r_H^2 - 2 r_S r_H \cos\theta_{\vec{d}_F \vec{d}_S}}, & \theta_{\vec{d}_F \vec{d}_S} \le \theta_c \\ r_S \sin\theta_c + r_H \left( \theta_{\vec{d}_F \vec{d}_S} - \theta_c \right), & \theta_c \le \theta_{\vec{d}_F \vec{d}_S} \le \pi \end{cases} \qquad (12)
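As an illustration of equations (5), (7), (9) and (11), the following sketch evaluates the propagation distance to the rotating microphone for a given source direction and shift angle. It is a simplified example under stated assumptions: the rotating radius is taken as r_R = r_H sin θ_R so that the microphone lies on the sphere, the source range of 2 m is arbitrary, and the direct-path term follows the law of cosines.

import numpy as np

def propagation_distance_rotating(phi_s, theta_s, theta_shift,
                                  r_h=0.15, theta_r=0.236, r_s=2.0):
    """Distance from a source at (azimuth phi_s, elevation theta_s, range r_s)
    to the rotating microphone at shift angle theta_shift, following the
    direct/diffracted split of equation (11)."""
    r_r = r_h * np.sin(theta_r)                      # assumed: microphone on the sphere
    d_r = np.array([r_h * np.cos(theta_r),
                    r_r * np.sin(theta_shift),
                    r_r * np.cos(theta_shift)])      # equation (5)
    d_s = r_s * np.array([np.sin(phi_s),
                          np.cos(phi_s) * np.cos(theta_s),
                          np.cos(phi_s) * np.sin(theta_s)])   # equation (7)
    # angle between the two direction vectors, equation (9)
    angle = np.arccos(np.dot(d_r, d_s) / (np.linalg.norm(d_r) * np.linalg.norm(d_s)))
    theta_c = np.arccos(np.linalg.norm(d_r) / np.linalg.norm(d_s))   # critical angle
    if angle <= theta_c:                             # direct path only
        return np.sqrt(r_s**2 + r_h**2 - 2.0 * r_s * r_h * np.cos(angle))
    # diffraction along the surface beyond the critical circle
    return r_s * np.sin(theta_c) + r_h * (angle - theta_c)

# Example: source 30 degrees to the right on the horizontal plane, 2 m away.
for shift in (0.0, np.pi / 2, np.pi):
    print(shift, propagation_distance_rotating(np.deg2rad(30), 0.0, shift))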
2.3 Characteristics of the ICTD Trajectory of the Rotating Microphone Array

In section 2.2, examples of the ICTD trajectories obtained using the Ray-Tracing formula were shown. In section 2.3, we describe the characteristics of the ICTD trajectories of the rotating microphone array. First, the relation between the mean of the ICTD trajectory and the azimuth angle of the source will be derived. Second, the relation between the phase shift of the trajectory and the elevation angle will be shown. In addition, the amplitude of the ICTD trajectory will be presented as a function of the azimuth angle only.

Nomenclature | Numerical value
r_H | 15 cm
ω_R = dθ_shift/dt | 600 rpm
θ_R | 0.236 rad
C_R | (0.146 m, 0 m, 0 m)
C_H | (0 m, 0 m, 0 m)

Table 1. The dimensions of the rotating microphone array and the angular speed of the rotating part

The mean of the ICTD trajectory over one rotation is defined as

\overline{\mathrm{ICTDT}} = \frac{1}{2\pi c} \int_0^{2\pi} D_d^{\infty}(\theta_{\mathrm{shift}}) \, d\theta_{\mathrm{shift}} \qquad (13)

The wave propagation from a source to a microphone is strongly dependent on the azimuth angle of the source; see Figure 4 and equation (10). For example, when a sound source is to the left of the head, only direct wave propagation occurs to the fixed microphone. On the other hand, consecutive propagation along the direct and indirect paths occurs from the source to the rotating microphone, because the rotating microphone is hidden by the head from the view of the source. If a source is to the right of the head, the wave propagation characteristics are reversed. In particular, for a source with an azimuth angle within [−θ_R, +θ_R], the propagation characteristic to the rotating microphone changes according to its shift angle. In order to represent the propagation characteristics more precisely, we divide them into three categories: (case 1) the wave propagation is along the direct path only; (case 2) the consecutive propagation is along the direct and indirect paths; and (case 3) there is a transition between case 1 and case 2, depending on the shift angle. In terms of these categories, Table 2 shows the propagation characteristics according to the azimuth angle of the source. For the sources with azimuth angles within [−θ_R, 0], the propagation to the rotating microphone corresponds to case 3. The transition from case 1 to case 2 occurs when the rotating microphone passes θ_b, and the subsequent transition from case 2 to case 1 occurs at π + θ_b, where θ_b is defined in equation (14). For the sources with azimuth angles within [0, +θ_R], the transition from case 1 to case 2 occurs at θ_b and the consecutive transition from case 2 to case 1 occurs at π − θ_b:

\theta_b = \min \arg_{\theta_{\mathrm{shift}}} \left[ \sin\theta_{\mathrm{shift}} \le \tan\varphi_S \cot\theta_R \right] \qquad (14)

where θ_shift ∈ [0, 2π].

Azimuth angle of the source | Fixed microphone | Rotating microphone
[−π/2, −θ_R] | Case 1 | Case 2
[−θ_R, 0] | Case 1 | Case 3
[0, +θ_R] | Case 2 | Case 3
[+θ_R, π/2] | Case 2 | Case 1

Table 2. The wave propagation characteristics from the source to the rotating and fixed microphones

Figure 6. The ICTD trajectories for sources on the median plane are shown. As the source elevation changes, the trajectory pattern shifts. For the source on the top of the head, the distance between the rotating microphone and the source is the shortest at θ_shift = 0°, which indicates that the ICTD is maximal. The distance increases up to θ_shift = 180° and goes back to the shortest distance within a single period.

With respect to the azimuth interval, the mean value of the ICTD trajectory is derived as a function of the azimuth angle of the source only, as shown in Figure 7. It is apparent that a one-to-one relationship exists between the mean value of the ICTD trajectory and the azimuth angle of the source. Therefore, it is possible to estimate the azimuth angle of the source once the mean value of the ICTD trajectory is obtained.

Figure 7. The mean value of the ICTD trajectory as a function of the azimuth angle only is shown. The one-to-one relationship between the mean value of the ICTD trajectory and the azimuth angle is clearly defined. The vertical dashed lines indicate the azimuth angles (i.e., ±θ_R, where θ_R = cos^{-1}(\sqrt{r_H^2 - r_R^2}/r_H) = 13.4934°) in the rotating microphone array with the dimensions given in Table 1.

On the other hand, the specific shift angles, which correspond to the maximal or minimal values of the ICTD trajectory, are useful for finding the elevation angle of the source. These specific shift angles are defined as below:

\theta_{\mathrm{shift}}^{\max} = \arg\max_{\theta_{\mathrm{shift}}} D_d^{\infty}(\theta_{\mathrm{shift}}), \qquad \theta_{\mathrm{shift}}^{\min} = \arg\min_{\theta_{\mathrm{shift}}} D_d^{\infty}(\theta_{\mathrm{shift}}) \qquad (15)

which implies that θ_shift^max and θ_shift^min are equal to π/2 − θ_S and 3π/2 − θ_S, respectively. It is obvious that the elevation angle increases from the +Y axis in an anticlockwise direction, while the shift angle of the rotating microphone increases from the +Z_R axis in a clockwise direction (see Figures 1 and 3). In summary, by finding the two parameters of the ICTD trajectory, i.e., \overline{\mathrm{ICTDT}} and θ_shift^{max/min}, the azimuth and elevation angles of the source can be found independently.
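A minimal sketch of how the two trajectory features of equations (13) and (15) could be read off a sampled ICTD trajectory is given below. The sinusoidal trajectory and the monotone mean-vs-azimuth lookup table used in the demonstration are assumed stand-ins for the Ray-Tracing model, and the function names are illustrative.

import numpy as np

def trajectory_features(ictd, theta_shift):
    """Equations (13) and (15): mean of the ICTD trajectory and the shift
    angles of its maximum and minimum, for uniformly spaced samples."""
    return float(np.mean(ictd)), theta_shift[np.argmax(ictd)], theta_shift[np.argmin(ictd)]

def estimate_direction(ictd, theta_shift, mean_table, azimuth_grid):
    """Azimuth from the one-to-one mean/azimuth relation of Figure 7 (here a
    simple table lookup against a precomputed mean_table); elevation from the
    relation theta_shift_max = pi/2 - theta_S discussed after equation (15)."""
    mean_ictd, shift_max, _ = trajectory_features(ictd, theta_shift)
    phi_hat = azimuth_grid[np.argmin(np.abs(mean_table - mean_ictd))]
    theta_hat = (np.pi / 2.0 - shift_max) % (2.0 * np.pi)
    return phi_hat, theta_hat

# Toy demonstration with an assumed sinusoidal trajectory shape and an
# assumed mean-vs-azimuth table (both stand in for the Ray-Tracing model).
theta_shift = np.deg2rad(np.arange(360.0))
true_azi, true_elev = np.deg2rad(30.0), np.deg2rad(40.0)
ictd = 1e-4 * np.sin(true_azi) + 2e-4 * np.cos(theta_shift - (np.pi / 2 - true_elev))
azimuth_grid = np.deg2rad(np.linspace(-90.0, 90.0, 181))
mean_table = 1e-4 * np.sin(azimuth_grid)
phi_hat, theta_hat = estimate_direction(ictd, theta_shift, mean_table, azimuth_grid)
print(np.rad2deg(phi_hat), np.rad2deg(theta_hat))   # approximately 30.0, 40.0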
In addition, as shown in Figure 5, the amplitude of the ICTD trajectory changes as the azimuth angle is varied. Naturally, we can expect the trajectory amplitude to be dependent on the azimuth angle only. Its definition is given below:

\mathrm{ICTDT}_{PP} = \begin{cases} r_H \cos^{-1}\!\left( \cos\theta_R \sin\varphi_S - \sin\theta_R \cos\varphi_S \right) - r_H \cos^{-1}\!\left( \cos\theta_R \sin\varphi_S + \sin\theta_R \cos\varphi_S \right), & \varphi_S \in \left[ -\tfrac{\pi}{2}, -\theta_R \right] \\ r_H \left( \cos^{-1}\!\big( \sin(\varphi_S - \theta_R) \big) - \tfrac{\pi}{2} + \sin(\varphi_S + \theta_R) \right), & \varphi_S \in \left[ -\theta_R, +\theta_R \right] \\ 2 r_H \sin\theta_R \cos\varphi_S, & \varphi_S \in \left[ +\theta_R, +\tfrac{\pi}{2} \right] \end{cases} \qquad (17)

The GCC-PHAT function between the fixed and rotating microphone signals, conditioned on the shift angle, is

R_{x_F x_R}(\tau \mid \theta_{\mathrm{shift}}) = \int_{-\infty}^{\infty} \frac{G_{x_F x_R}(f \mid \theta_{\mathrm{shift}})}{\left| G_{x_F x_R}(f \mid \theta_{\mathrm{shift}}) \right|} \, e^{j 2\pi f \tau} \, df \qquad (18)

where G_{x_F x_R}(f \mid \theta_{\mathrm{shift}}) is calculated using microphone signals that are collected while the rotating microphone is passing around \theta_{\mathrm{shift}}. Details about the measurement and the signal processing are presented in sections 4.1 and 4.2. Thus, G_{x_F x_R}(f \mid \theta_{\mathrm{shift}}) is strongly dependent on the shift angle of the rotating microphone. It should be noted that the relative motion between a sensor and a source is so small that the Doppler effect in the measured signals is negligible [24]. Therefore, it is reasonable to assume that R_{x_F x_R}(\tau \mid \theta_{\mathrm{shift}}) has time-varying peak positions. Based on this time- or (shift) angle-dependent feature, we can define the source direction estimator (SDE) as below:
\mathrm{SDE}(\varphi_S, \theta_S) = \frac{\displaystyle \int_0^{2\pi} R_{x_F x_R}\!\big( \tau_d(\varphi_S, \theta_S \mid \theta_{\mathrm{shift}}) \,\big|\, \theta_{\mathrm{shift}} \big) \, d\theta_{\mathrm{shift}}}{\displaystyle \int_0^{2\pi} d\theta_{\mathrm{shift}}} \qquad (19)

where \tau_d(\varphi_S, \theta_S \mid \theta_{\mathrm{shift}}) is the ICTD predicted for the candidate direction (\varphi_S, \theta_S) at shift angle \theta_{\mathrm{shift}}.
3. The Proposed SSL Algorithm

In this section, the proposed SSL algorithm is described. On the basis of the weak Doppler effect (due to the small relative motion), the collected signals of the fixed and rotating microphones within (at least) a single period are segmented into N_f frames, each including N_fft samples. In addition, the angle allocated to each frame is the shift angle, which is measured at the time the middle sample in the frame is collected. Sections 4.1 and 4.2 give more information about the segmentation process. In the real system, the shift angle is measured directly by the encoder signal; see Figures 17 and 18 for more details. Then, we can obtain R_{x_F x_R}(\tau \mid \theta_{\mathrm{shift}}) and \mathrm{SDE}(\varphi_S, \theta_S) for every possible direction using equations (18)-(19). The final decision is made by detecting the peak in the SDE; we assume that the number of dominant sources is given by the recognition group prior to the SSL process. If it is reported that a single source is recognized, then the estimation of the source direction can be done by equation (20):

\left( \hat{\varphi}_S, \hat{\theta}_S \right) = \arg\max_{\varphi_S, \theta_S} \mathrm{SDE}(\varphi_S, \theta_S) \qquad (20)

where \hat{\varphi}_S and \hat{\theta}_S are the estimated azimuth and elevation angles of a source, respectively. For multiple SSL, various peak detection strategies are applicable when multiple peaks are present in the SDE. However, since our research focused on single-source SSL, we used the simplest global peak detection using equation (20). Figure 9 shows the procedure of the proposed SSL algorithm.

Figure 9. The proposed SSL algorithm based on the SDE. Two measured time-domain signals are divided into the given number of frames, N_f, and each frame has N_fft samples. The shift angle corresponding to the middle sample in each frame is allocated to each frame. The framed signals are then used to calculate R_{x_F x_R}(\tau \mid \theta_{\mathrm{shift}}). Next, by using the constructed ICTD trajectory database, the SDE can be obtained for every direction. Finally, by finding the dominant peak(s) in the SDE in descending order, the source direction(s) can be estimated.
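The search described by equations (19)-(20) and Figure 9 can be sketched as below, assuming that the per-frame GCC functions R(τ | θ_shift) have already been computed (for example with a GCC-PHAT routine such as the one sketched earlier) and that an ICTD-trajectory database tau_db holding the predicted delay for every candidate direction and frame is available. The array names and the linear-interpolation lookup are illustrative choices, not the authors' code.

import numpy as np

def source_direction_estimator(R_frames, lags, tau_db):
    """Discrete version of equations (19)-(20).

    R_frames : (N_f, N_lags) GCC-PHAT functions, one per shift-angle frame.
    lags     : (N_lags,) lag axis of the GCC functions, in seconds.
    tau_db   : (N_dir, N_f) ICTD-trajectory database: predicted delay for
               every candidate direction at every frame's shift angle.
    Returns the index of the candidate direction maximizing the SDE and
    the SDE values themselves."""
    n_dir, n_f = tau_db.shape
    sde = np.zeros(n_dir)
    for d in range(n_dir):
        acc = 0.0
        for k in range(n_f):
            # evaluate R(tau | theta_shift_k) at the predicted delay (eq. 19)
            acc += np.interp(tau_db[d, k], lags, R_frames[k])
        sde[d] = acc / n_f                   # average over the shift angle
    return int(np.argmax(sde)), sde          # global peak, equation (20)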
4. Simulation

In section 4, we evaluate the performance of the proposed SSL algorithm using synthesized signals. To do this, signal models of the fixed and rotating microphones were needed. These models are given in section 4.1 and the results of the simulation for a single source are described in section 4.2. The localization performance is evaluated with respect to the localization error, which is defined as the angle between the true and perceived direction vectors. In this simulation, the physical dimensions of the rotating microphone array are given in Table 1.

4.1 Signal Models of the Fixed and Rotating Microphones

As shown in Figure 3, the rotating microphone array is installed on a spherical head with a radius of r_H. One of the two microphones is fixed at (−r_H, 0, 0) on the surface of the spherical head (this microphone is hereafter called the "fixed microphone" for convenience). Then, the output signal of the fixed microphone in the continuous time domain, denoted as x_F(t), can be modelled as below:

x_F(t) = h_S^{x_F}(t) * s(t \mid \varphi_S, \theta_S) \qquad (21)

where h_S^{x_F}(t) is the spherical impulse response [25] from the source position to the fixed microphone position on the spherical head, s(t \mid \varphi_S, \theta_S) is the source signal content, and * indicates the convolution operator. As shown in equation (21), h_S^{x_F}(t) is not a function of \theta_{\mathrm{shift}} because this microphone does not move. However, the other microphone (i.e., the rotating microphone) is located on the rotating plate and moves in a circular motion on the Y_R Z_R plane (see Figures 2 and 3). Then, the measured signal of the rotating microphone is strongly dependent on the shift angle. The signal model of the rotating microphone, denoted as x_R(t), can be defined as:

x_R(t) = h_S^{x_R}(t \mid \theta_{\mathrm{shift}}) * s(t \mid \varphi_S, \theta_S) \qquad (22)

where h_S^{x_R}(t \mid \theta_{\mathrm{shift}}) is the spherical impulse response from the source position to the rotating microphone position. In this case, h_S^{x_R} is a function of \theta_{\mathrm{shift}} due to the circular motion. The synthesized signal refers to the discrete-time domain signal. The generation of the synthesized signal of the fixed microphone, denoted as x_F[n], is carried out by simply discretizing x_F(t), as shown below:

x_F[n] = x_F(n \Delta t_S) \qquad (23)

where \Delta t_S is the sampling time and x_F[n] is the nth sample of the synthesized signal of the fixed microphone. On the other hand, the motion of the rotating microphone makes the generation of x_R[n] more complicated. For example, the output signal of the rotating microphone has to be conditioned on the shift angle:
M_{x_R}(\cdot) = \begin{bmatrix} x_R(t \mid \theta_{\mathrm{shift}} = 0) \\ x_R(t \mid \theta_{\mathrm{shift}} = \Delta\theta_N) \\ \vdots \\ x_R(t \mid \theta_{\mathrm{shift}} = (N_f - 1) \cdot \Delta\theta_N) \end{bmatrix} \qquad (25)

where x_R(t \mid \theta_{\mathrm{shift}}) is equal to x_R(t \mid \theta_{\mathrm{shift}} + 2\pi) due to the circular motion of the rotating microphone, which implies a cyclo-stationary process when the source content is stationary [26]. We assume that the other dimensions do not vary. In this simulation, we set the sampling frequency (f_S) and the number of frames (N_f) to 44.1 kHz and 360, respectively. Thus, \Delta\theta_N becomes 1°, and M_{x_R}(\cdot), which is the matrix of conditioned discretized signals, can be modelled as in equation (26):

M_{x_R}[N_f, \Delta\theta_N] = \begin{bmatrix} x_R(n\Delta t_S \mid \theta_{\mathrm{shift}} = 0) \\ x_R(n\Delta t_S \mid \theta_{\mathrm{shift}} = \Delta\theta_N) \\ \vdots \\ x_R(n\Delta t_S \mid \theta_{\mathrm{shift}} = (N_f - 1) \cdot \Delta\theta_N) \end{bmatrix} \qquad (26)

From equation (26), the synthesized signal x_R[n] along the shift angle axis can be represented as follows:

x_R[n] = x_R(n\Delta t_S \mid \theta_{\mathrm{shift}} = (n - 1) \cdot \Delta\theta_N) \qquad (27)

Figure 10. M_{x_R}(\cdot) of the conditioned and discretized output signals of the rotating microphone with respect to the time and shift angle axes.

The localization error is the angle between the true direction vector (tdv) and the perceived direction vector (pdv):

\vec{tdv} = \left( \sin\varphi_S,\; \cos\varphi_S \cos\theta_S,\; \cos\varphi_S \sin\theta_S \right), \qquad \vec{pdv} = \left( \sin\hat{\varphi}_S,\; \cos\hat{\varphi}_S \cos\hat{\theta}_S,\; \cos\hat{\varphi}_S \sin\hat{\theta}_S \right) \qquad (28)
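To make the signal models concrete, the sketch below replaces the spherical impulse responses h_S^{x_F} and h_S^{x_R} of equations (21)-(22) with pure propagation delays, which is an assumption made only for illustration. The per-sample delay array plays the role of the shift-angle-dependent response in equations (26)-(27), and the localization error is the angle between the vectors of equation (28).

import numpy as np

def synthesize_rotating_signal(s, fs, delays):
    """Sketch of equations (26)-(27): the nth output sample of the rotating
    microphone is taken from the source signal delayed by the propagation
    delay belonging to that sample's shift angle. delays[n] stands in for
    h_S^{x_R}(t | theta_shift) reduced to a pure delay (assumption)."""
    n = np.arange(len(s))
    idx = np.round(n - delays * fs).astype(int)          # delayed sample index
    x_r = np.where(idx >= 0, s[np.clip(idx, 0, len(s) - 1)], 0.0)
    return x_r

def localization_error(phi, theta, phi_hat, theta_hat):
    """Angle between the true and perceived direction vectors, equation (28)."""
    tdv = np.array([np.sin(phi), np.cos(phi) * np.cos(theta),
                    np.cos(phi) * np.sin(theta)])
    pdv = np.array([np.sin(phi_hat), np.cos(phi_hat) * np.cos(theta_hat),
                    np.cos(phi_hat) * np.sin(theta_hat)])
    return np.arccos(np.clip(np.dot(tdv, pdv), -1.0, 1.0))

# Example: a constant 0.5 ms delay applied to 10 ms of noise (assumed values),
# and the angular error for a 2-degree azimuth / 1-degree elevation mismatch.
fs = 44100
s = np.random.default_rng(1).standard_normal(int(0.01 * fs))
x_r = synthesize_rotating_signal(s, fs, np.full(len(s), 5e-4))
print(np.rad2deg(localization_error(0.0, 0.0, np.deg2rad(2.0), np.deg2rad(1.0))))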
4.2 Simulation Results

…The elevation angle varies from 0° to +330° (−30°) with 30° intervals. The number of source directions is 228. It is assumed that the rotating microphone array system was located in a free field. Figure 14 shows the localization error distribution for all of the source directions.

Figure 14. Localization errors for 228 directions are depicted. As we can see, an elevation-dependent feature was not found. However, it was quite visible that the SSL performance is strongly dependent on the azimuth angle of a source only.

Generally, the performance gets better as the source is closer to the left, opposite to the rotating microphone, due to the left-right asymmetry of the azimuth-dependent ICTD trajectory amplitude (see Figure 8). Also, it is reasonable that an elevation-dependent feature was not visible. The distribution of the mean errors along the azimuth angle is shown in Figure 15.

Figure 15. The mean error along the azimuth angle of a source (mean absolute error, in degrees, for sources on a sagittal plane). The localization errors of the left-sided sources are almost the same, except for the leftmost source. The right-sided sources tend to be estimated with worse resolution compared with the left-sided sources.

4.3 Computational Load Comparison

To be an efficient 3-D SSL method, the signal processing costs must be light. In this section, the computational load of the proposed localization method is compared with those of the delay-and-sum beamformer and the SRP-PHAT method. If the number of microphones is denoted as M, the computation of all the possible GCC-PHAT functions requires M(M−1)/2 phase transforms. For a discrete Fourier transform of size N_fft, a single FFT takes 5 N_fft log2 N_fft operations. The SRP-PHAT cost is therefore composed of:

1. DFT of all the microphones: M × (5 N_fft log2 N_fft)
2. Spectral processing: 7 N_fft × M(M−1)/2
3. Inverse DFT: M(M−1)/2 × (5 N_fft log2 N_fft)
4. SRP-PHAT calculation for the possible directions (N_φ N_θ): M(M−1)/2 × N_φ N_θ

Thus, the total SRP-PHAT processing cost is M(M+1)/2 × 5 N_fft log2 N_fft + M(M−1)/2 × (N_φ N_θ + 7 N_fft). In the same way, the cost of the proposed localization algorithm is (3M−1)/2 × 5 N_fft log2 N_fft + (M−1) × (N_φ N_θ + 7/2 N_fft), and the cost of the delay-and-sum beamformer is MN × (N_φ N_θ), where N is the frame length, 100. Therefore, the approximated costs are: (1) delay-and-sum beamformer, MN(N_φ N_θ); (2) SRP-PHAT method, M²(N_φ N_θ + 12 N_fft); (3) the proposed method, M(N_φ N_θ + 9/2 N_fft). It is noteworthy that if N_fft is smaller than N_φ N_θ, the signal processing cost of the proposed localization method is N or M times less than those of the delay-and-sum beamformer and SRP-PHAT methods. It is reasonable that the proposed localization cue can be computed by M microphone pairs when using the (M+1)-channel microphone array. However, when using the SRP-PHAT method, the M(M−1)/2 microphone pairs are utilized for a single localization process. On the other hand, it is known that the SRP-PHAT method can be applied to situations where the signal-to-noise ratio (SNR) is less than zero. However, in the proposed localization method, the TDE error has a direct effect on the localization performance because the proposed cue is based on the measured ICTD trajectory. Thus, it can be reasonably expected that the proposed localization performance will be significantly more degraded than that of the SRP-PHAT method when SNR < 0.
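The approximate counts quoted above can be tabulated directly. The short sketch below simply evaluates the three closed-form expressions for an assumed parameter set (M = 8 microphones, N = 100, N_fft = 1024, and a 360 × 180 search grid), so the resulting numbers depend entirely on these choices.

# Approximate operation counts from section 4.3 (parameter values assumed).
M, N, N_fft, N_phi, N_theta = 8, 100, 1024, 360, 180
grid = N_phi * N_theta
costs = {
    "delay-and-sum beamformer": M * N * grid,
    "SRP-PHAT":                 M**2 * (grid + 12 * N_fft),
    "proposed method":          M * (grid + 4.5 * N_fft),
}
for name, ops in costs.items():
    print(f"{name:>26s}: {ops:,.0f} operations")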
5. Experiment

We developed a rotating microphone array according to the proposed design (see Figure 2 and Table 1). It should be noted that the two microphone signals needed to be transmitted wirelessly for safety reasons. Thus, both a microphone and a transmitter needed to be placed inside the rotating block. An ultrasonic motor was chosen to make this block rotate inside the head. Details about the structure of the proposed array and the measurement process are provided in section 5.1. Section 5.2 shows the results of the two experiments for the feasibility test: one involving a Gaussian white noise source and the other involving a voice source.

Figure 16. A top view of the hemisphere showing the interior arrangement of the rotating and fixed blocks, two ultrasonic motors, one encoder, and one motor driver. The rotating block on the right side contains the electronic boards for transmitting the microphone signal. The shift angle of the rotating microphone is measured by using the encoder.
5.1 Experimental Set-up

For our proposed rotating microphone array, we chose a wireless system (Q240, RFQ) consisting of a dual-channel receiver, two transmitters, and two microphones (QB686, RFQ). In order to put a transmitter unit and a microphone together in a rotating block, the electronic boards inside the transmitter unit had to be rearranged and installed in a cylindrical plastic block. Figure 16 shows the interior arrangement of the necessary blocks and other units inside the spherical head. There are two cylindrical blocks, two ultrasonic motors, one encoder, and one motor driver. The cylindrical block on the right side is called the "rotating block"; this block consists of the rearranged electronic boards used to transmit the microphone signal (#. 1) and the pin-type microphone located 3 cm from the centre of the cap. This block is connected to the ultrasonic motor (USR-E3T/24V, SHINSEI), which is driven by the motor driver (D6060E, SHINSEI). Additionally, the encoder is attached to the motor. Thus, the shift angle of the microphone is measured using the encoder signal. The other block on the left side is hereafter called the "fixed block" for convenience. The pin-type microphone (#. 2) is attached at the centre of the cap. The transmitter unit is outside the block. The left and right side views are also presented in Figure 17. The physical dimensions are the same as those in Table 1, except the rotating radius r_R, which is 3 cm in the real array. Therefore, the ICTD trajectory database needed to be reconstructed.

Figure 17. The right-side view of the spherical head is shown on the left and the left-side view is presented on the right.

Figure 18. The spherical head equipped with the rotating microphone array is set up in the measurement room.

5.2 Experimental Results

In the first experiment, the SSL performance for a source in the median plane was evaluated. Only the elevation angle of a source was varied, from −30° to 210° with 10° intervals. The source content was a Gaussian white noise signal with frequency contents from 1.5 kHz to 20 kHz, generated by the random noise generator (SF-06, RION), and was produced for longer than one rotating period. The angular frequency was set to 54 rpm. For example, when the source is at (0°, 0°), the measured microphone signals and the z-phase encoder signal are depicted in Figure 19. The total measurement time was 3 seconds and the signal duration was set to 2 seconds. By using the encoder signal in the z-phase, we collected the samples within a single rotating period and allocated N_fft samples to each frame. For 25 directions in the median plane, the mean localization error was 1.75° and the standard deviation was 1.65°. Therefore, the experimental result showed that our proposed SSL algorithm is applicable to the SSL of a Gaussian noise source.
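The frame allocation described above can be sketched as follows, assuming the z-phase pulse marks the start of a rotation and that each of the N_f shift-angle frames is an N_fft-sample window centred on the sample of the corresponding shift angle (so neighbouring frames overlap); the variable names are illustrative.

import numpy as np

def frame_by_encoder(x, z_phase_idx, rev_samples, n_frames=360, n_fft=1024):
    """Allocate one shift-angle frame of n_fft samples to each of n_frames
    angular positions within a single rotation, centred on the sample whose
    shift angle the frame represents. Returns the frames and the shift angle
    of each frame, measured clockwise from the +Z_R axis."""
    centres = z_phase_idx + (np.arange(n_frames) + 0.5) * rev_samples / n_frames
    frames = np.zeros((n_frames, n_fft))
    for k, c in enumerate(centres.astype(int)):
        start = max(c - n_fft // 2, 0)
        seg = x[start:start + n_fft]
        frames[k, :len(seg)] = seg                       # zero-pad short frames
    theta_shift = 2.0 * np.pi * (np.arange(n_frames) + 0.5) / n_frames
    return frames, theta_shift

# Example: 54 rpm at 44.1 kHz gives one rotation of 44100 * 60 / 54 = 49000 samples.
x = np.zeros(2 * 49000)
frames, shifts = frame_by_encoder(x, z_phase_idx=100, rev_samples=49000)
print(frames.shape, np.rad2deg(shifts[0]))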
Figure 20. The output signals of the two microphones and the encoder signal (z-phase) in the time domain when the voice source is at (45°, 0°). As shown, the voice signal is non-stationary.

Figure 21. The GCC-PHAT functions along the shift angle of the rotating microphone. In the region with sufficient signal contents, the functions were obtained easily. This can be interpreted to mean that the peak location of each function is shifting up and down as the rotating microphone moves in a circular motion. The more smoothed peaks result from the comparatively narrow frequency band of the measured voice signals.

Figure 22. The source direction estimator when a source was at (45°, 0°). The estimated source direction was (39°, −1°) even though the silent region was included.

6. Discussion

The concept of the proposed localization cue, which is a (source) direction- and (microphone) position-dependent ICTD trajectory, can be applied to the circular microphone array as well. In general, if a microphone array is composed of (M+1) sensors, all the information from every possible microphone pair is taken into consideration in order to practically improve the SSL resolution. If the M-channel circular microphone array is located on the right side of the sphere and the one additional microphone is fixed on the other side, the position-dependent ICTD trajectory of each microphone in the M-channel circular array can be reproduced exactly the same as the proposed ICTD trajectory. Thus, the proposed localization cue-based 3-D SSL is also applicable to the circular microphone array. However, the more microphones that are used for SSL, the more costly it is to produce the microphone array, especially due to the price of the analog-to-digital converters (ADCs), which is proportional to the number of channels. However, sequential sampling and signal processing could be an alternative to reduce the production cost.

On the other hand, the source position was supposed to be outside the rotating microphone array. However, noises
As mentioned before, we assumed that a sound source is fixed. In daily life, a source moves slowly compared with the rotation period of the array. However, in a situation where there is a fast-moving source, the patterns of the peak and the side edges in the SDE would be quite different compared with those in Figures 12 and 21. Usually, the movement of the source occurs along the azimuth angle axis. Therefore, the peak shape in the SDE would be stretched along the time axis according to the direction of the source movement, and the magnitude of the peaks would be suppressed. In this case, without information about the initial direction of the fast-moving source, its direction cannot be estimated using a single measurement because the peak shape in the SDE is not a time-dependent feature. Even though it is possible to track a fast-moving source by increasing the angular velocity of the rotating part, a safety issue can arise.

Figure 23. Directivity patterns of the two microphones, i.e., the 1/4 inch microphone (B&K) and the pin-type microphone (RFQ). The asymmetry in the directivity of the pin-type microphone is clearly visible.

7. Conclusion

This paper proposed an ICTD trajectory as the new 3-D SSL cue and, as one of the possible ways to realize the proposed

8. Acknowledgements

9. References

[1] Kerstin D (2007) Socially intelligent robots: dimensions of human-robot interaction. Phil. Trans. R. Soc. B. 362:679-704.
[2] Fong T, Illah N, Kerstin D (2003) A survey of socially interactive robots. Robot Auton. Syst. 42:143-166.
[3] Anderson M. L (2003) Embodied cognition: a field guide. Artif. Intel. 149:91-130.
[4] Valin J. M, Michaud F, Rouat J, Létourneau D (2003) Robust sound source localization using a microphone array on a mobile robot. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems 2:1228-1233.
[5] Mumolo E, Massimiliano N, Gianni V (2003) Algorithms for acoustic localization based on microphone array in service robotics. Robot Auton. Syst. 42:69-88.
[6] Wang H, Peter C (1997) Voice source localization for automatic camera pointing system in videoconferencing. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 1:187-190.
[7] Valenzise G, Gerosa L, Tagliasacchi M, Antonacci F, Sarti A (2007) Scream and gunshot detection and