
International Journal of Advanced Robotic Systems

ARTICLE

Three-dimensional Sound Source Localization Using Inter-channel Time Difference Trajectory
Regular Paper

Sangmoon Lee1*, Youngjin Park1 and Youn-sik Park1

1 KAIST, Daejeon, Republic of Korea


*Corresponding author(s) E-mail: [email protected]

Received 04 July 2013; Accepted 28 September 2015

DOI: 10.5772/61652

© 2015 Author(s). Licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Sound source localization is one of the basic and essential techniques for intelligent robots in terms of human-robot interaction and has been utilized in various engineering fields. This paper suggests a new localization method using an inter-channel time difference trajectory, which is a new localization cue for efficient 3-D localization. As one of the ways to realize the proposed cue, a two-channel rotating array is employed. Two microphones are attached on the left and right sides of a spherical head. One microphone is in a circular motion on the right side, while the other is fixed on the left side. According to the rotating motion of the array, the (source) direction-dependent characteristics of the trajectories are analysed using the Ray-Tracing formula extended for 3-D models. In simulation, the synthesized signals generated by the fixed and rotating microphone signal models were used as the output signals of the two microphones. The simulation showed that the localization performance is strongly dependent on the azimuthal position of a source, which is caused by the asymmetry of the trajectory amplitude. Additionally, the experimental results of the two experiments carried out in a room environment demonstrated that the proposed system can localize a Gaussian noise source and a voice source in 3-D space.

Keywords: Three-dimensional Sound Source Localization, Inter-channel Time Difference Trajectory, Rotating Microphone Array, Ray-tracing Formula, Human-Robot Interaction

1. Introduction

Recently, intelligent robots have been developed not only to support arduous human tasks, but also to interact with people in order to meet various human needs [1, 2]. As an independent object with its own intelligence [3], a robot needs to recognize environmental changes, such as the appearance of unidentified objects or acoustic events relevant to its missions. For example, robots working in households should detect user voices and simultaneously be aware of other acoustic events, such as noises emitted from home appliances and other voices from electric devices. As a result, they can pay attention to speakers with more natural human-robot interaction (HRI) skills. In this situation, the technology of sound source localization (SSL) is employed to estimate the acoustic source direction using the acoustic signals from the microphone array; this is one of the most important building blocks of HRI. In addition,

Int J Adv Robot Syst, 2015, 12:171 | doi: 10.5772/61652


intelligent robots need to estimate the azimuth and elevation angles of a source together (i.e., 3-D SSL), due to the fact that a sound event occurs at an arbitrary direction in 3-D space. It is noteworthy, however, that a lot of techniques need to be carried out simultaneously with the given limited resources, and the computational power available for SSL is restricted. As a result, a computationally efficient 3-D SSL method is increasingly required. In addition, the source direction is defined by the inter-aural polar coordinates shown in Figure 1.

In the last few decades, many different SSL algorithms that are applicable to intelligent robots [4, 5] and also other engineering systems (e.g., teleconference systems [6] and surveillance units [7]) have been proposed. Even if the microphone array size, shape and number of microphones differ due to the constraints of various applications, certain localization cues (or direction estimation cues) are commonly used: inter-channel time difference (ICTD), inter-channel level difference (ICLD) and inter-channel spectral difference (ICSD). ICTD is defined as the time difference between the arrivals of a sound wave-front at the microphones; ICLD is defined as the difference between the sound pressure levels at the microphones; and ICSD means the difference between the spectral contents at the microphones. ICTD has been used as the most powerful localization cue [8-10] in almost all applications. Comparatively, ICLD and ICSD have not been used as frequently for practical applications, but are nevertheless employed in biomimetic research, including that of ear-based SSL systems [11, 12].

Figure 1. A sound source direction is defined in the inter-aural polar coordinates by both the azimuthal angle ($\varphi_S$) and the elevation angle ($\theta_S$). The sagittal plane is a vertical plane dividing the space into right and left halves. Sources on each sagittal plane share the same azimuth. The median plane is the mid-sagittal plane that bisects the space symmetrically from left to right. The horizontal plane is perpendicular to the sagittal plane and passes through the centre of the coordinates.

In general, most SSL systems use more than four microphones for 3-D SSL. Except for circumstances in which directional microphones are used, if only two microphones are used and their locations do not change with time, 3-D SSL is not possible, because front-back confusion occurs due to the existence of many directions sharing the same localization cues, even for a single source [13, 14]. This is called the cone-of-confusion in 3-D space. In the absence of an additional structure (e.g., a spiral-shaped structure), more than four microphones in different (imaginable) planes are necessary in order to solve the cone-of-confusion problem inherent to 3-D SSL [15].

However, in the situation where a two-channel moving array is used for 3-D SSL, a measured localization cue such as the ICTD will change according to the array motion. Then, from the changing pattern of ICTDs, it is expected that 3-D SSL can be achieved. Therefore, the (microphone) position-dependent ICTDs can be represented as below:

$$\tau(\varphi_S, \theta_S \mid \alpha) \qquad (1)$$

where $\alpha$ is the parameter that identifies the location or the motion of the moving microphone. Here, the (source) direction- and (microphone) position-dependent ICTDs are named the ICTD trajectory, which is a new concept for a localization cue for efficient 3-D SSL.

This paper proposes a 3-D SSL method using the ICTD trajectory induced by the circular motion of a two-channel rotating array. This array was selected as one of the possible ways to realize a specific ICTD trajectory. One of the two microphones is attached to the rotating plate on the right side of the spherical head, indicated by the red-coloured circle in Figure 2; the other microphone is fixed on the left side of the head, indicated by the blue-coloured circle. Figure 2 shows the schematic drawing of the suggested two-channel rotating microphone array installed on the spherical head. In this paper, in order to generate the known movement of the array, the circular motion is given to the right-sided plate.

This paper is organized as follows: in section 2, we introduce the new specific localization cue, the ICTD trajectory, relevant to the circular motion of the array. The mathematical derivation of the ICTD trajectory is presented using the extended Ray-Tracing formula for 3-D models. The relationship between the parameters of the ICTD trajectory and the source direction is also presented. Section 3 describes the proposed 3-D SSL algorithm based on the source direction estimator. In section 4, the localization performance of the proposed SSL algorithm is examined using simulations: the signal models of both rotating and fixed microphones are presented. Section 5 shows the experimental setup and the results using two kinds of sources. The discussion is presented in section 6 and the concluding remarks are given in section 7.
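The front-back (cone-of-confusion) ambiguity described above can be checked numerically. The sketch below is a minimal free-field illustration, not the paper's spherical-head model: the two fixed omnidirectional microphones, their 0.3 m spacing and the source positions are all assumed for the example.

```python
import numpy as np

# Minimal free-field sketch (not the paper's spherical-head model): two fixed
# omnidirectional microphones 0.3 m apart; positions and spacing are assumed.
C = 343.0                                        # speed of sound, m/s
MICS = np.array([[0.0, -0.15, 0.0], [0.0, 0.15, 0.0]])

def ictd(source):
    """ICTD: the gap between the wave-front arrival times at the two mics."""
    d1, d2 = (np.linalg.norm(source - m) for m in MICS)
    return (d1 - d2) / C

# A frontal source and its front-back mirror image (x negated) produce
# exactly the same ICTD, so two fixed microphones cannot tell them apart.
front = np.array([2.0, 1.0, 0.5])
back = np.array([-2.0, 1.0, 0.5])
assert abs(ictd(front) - ictd(back)) < 1e-12
```

Every source on the cone sharing this delay is indistinguishable from the others, which is why a static two-channel array needs either more microphones in different planes or, as proposed here, a moving microphone.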



2. Localization Cue: ICTD Trajectory

Most of the conventional localization algorithms have been developed using a microphone array fixed at given positions [4, 5, 8-12, 15] and with constant direction-dependent cues, on the assumption that a source position does not vary. The most commonly used time delay estimation (TDE) method is to use the generalized cross-correlation (GCC) function, which is employed to estimate the ICTD between a selected pair of microphones [16, 17]. Among the various GCC functions, the GCC-phase transform (PHAT) function is widely used because it is well known for its robust estimation of the ICTD in a reverberant field [18].

If we use N omnidirectional microphones, N(N−1)/2 GCC-PHAT functions are obtainable. The GCC-PHAT function between the ith and jth microphones is defined as below [16]:

$$R_{x_i x_j}(\tau) = \int_{-\infty}^{\infty} \frac{G_{x_i x_j}(f)}{\left|G_{x_i x_j}(f)\right|} \, e^{j 2\pi f \tau} \, df \qquad (2)$$

where $f$ is the frequency and $\tau$ is a time delay variable. $x_i$ and $x_j$ are defined as the ith and jth microphone output signals. $G_{x_i x_j}$ and $R_{x_i x_j}$ are the cross-spectral density function and the GCC-PHAT function between $x_i$ and $x_j$, respectively. The measured ICTD between $x_i$ and $x_j$, $\tau_{ij}$, is calculated by $\tau_{ij} = \arg\max_{\tau} R_{x_i x_j}(\tau)$. After that, the estimated direction of the sound source can be found from the relationship between the geometry of the microphone array and the multiple ICTDs. For example, when two microphones are placed on the horizontal plane, L m apart from each other, the azimuth of a source can be estimated using equation (3) [19]:

$$\hat{\varphi}_S = \cos^{-1}\!\left(\frac{c \cdot \tau_{12}}{L}\right) \qquad (3)$$

where $c$ is the speed of sound and $\hat{\varphi}_S$ is the estimated azimuth. For example, SSL using ICTD maps [20, 21] was applied to the microphone array fixed on the robot head.

Figure 2. Schematic of the rotating microphone array installed on the spherical head. One of the two microphones is attached to the rotating plate on the right side of the head (red-coloured circle). In this paper we call this component the "rotating microphone". The rotating part moves in a clockwise direction on the $Y_R Z_R$ plane and the shift angle of this part is measured with respect to the $+Z_R$ axis. The other microphone is fixed on the left side of the head (blue-coloured circle). It is called the "fixed microphone".

2.1 ICTD Trajectory

When using an immovable microphone array, the observation of constant localization cues (e.g., the ICTDs) can be used to efficiently estimate a direction, provided there are sufficient sensors. However, as with the two-channel array, the measurement of a single constant localization cue does not guarantee successful SSL due to the cone-of-confusion problem. Thus, it is concluded that a useful 3-D SSL cue should have other characteristics dependent on the source direction, i.e., the azimuth and elevation angles.

When using the two-channel rotating array, if the angular velocity is given as $w_R$, the position-dependent ICTD must be a periodic function with a period of $2\pi / w_R$. It is also noteworthy that the localization cue we want to suggest is similarly based on the ICTD concept. We assumed that the Doppler effect caused by the relative motion between the rotating microphone and a source can be ignored, because the speed of the rotating microphone, i.e., the radius of the rotating circle multiplied by the rotating angular speed, is quite small compared with the speed of sound. Furthermore, when considering the application scenario where a talking person as a target source is walking inside a room, the sound source is supposed to move slowly. Thus, the new specific localization cue, including the position-dependent feature of the circular motion of the rotating array, is defined in equation (4) for a source at $(\varphi_S, \theta_S)$:

$$\tau(\varphi_S, \theta_S \mid \theta_{shift}) \qquad (4)$$

where $\theta_{shift} \in [0, 2\pi)$ is measured clockwise from the $+Z_R$ axis. Also, $\theta_{shift}$ indicates the position of the microphone on the right side, given a constant rotating radius ($r_R$).

2.2 Extended Ray-Tracing Formula for 3-D Models

The well-known Ray-Tracing formula [22, 23] has been widely used for 2-D models to approximate the inter-aural time difference. However, in order to obtain the ICTD trajectories, this formula should be extended to 3-D models such as a two-channel rotating array installed on the sphere. Figure 3 shows the nomenclature required to model the propagation distance between the source and the sensor locations. Once the propagation distance is derived, depending on the shift angle of the rotating part, the ICTD trajectory is obtainable under the assumption that the speed of sound is independent of frequency in a non-dispersive medium. In order to derive the propagation distance, three direction vectors from $C_H$ to the positions of the two microphones and the source need to be expressed as a function of the shift angle and the source direction:

$$\vec{d}_R = \left(r_H \cos\theta_R,\; r_R \sin\theta_{shift},\; r_R \cos\theta_{shift}\right) \qquad (5)$$

$$\vec{d}_F = \left(-r_H,\, 0,\, 0\right) \qquad (6)$$

$$\vec{d}_S = r_S \left(\sin\varphi_S,\; \cos\varphi_S \cos\theta_S,\; \cos\varphi_S \sin\theta_S\right) \qquad (7)$$

where $\vec{d}_R$ and $\vec{d}_F$ are the direction vectors from $C_H$ to the rotating and fixed microphones respectively, $r_S$ is the distance between a source and $C_H$, and $\vec{d}_S$ is the direction vector from $C_H$ to the source.

Figure 3. Nomenclature related to the rotating microphone array on the spherical head. The centre of the head ($C_H$) is at the origin of the XYZ coordinates. The red circle represents the rotating microphone at the rotating part on the $Y_R Z_R$ plane. This microphone is rotating with a constant speed ($r_R \times w_R$), where $r_R$ and $w_R$ are the rotating radius and angular velocity, respectively. The rotation centre $C_R$ is located at $(\sqrt{r_H^2 - r_R^2}, 0, 0)$. The shift angle ($\theta_{shift}$) of the rotating plate is defined as the angle from the $+Z_R$ axis in a clockwise direction. The blue circle represents the fixed microphone that is located at $(-r_H, 0, 0)$. $r_H$ is the radius of the spherical head and $\theta_R$ is the angle between the X axis and the direction vector from $C_H$ to the rotating microphone.

Figure 4. Two steps of wave propagation from the sound source to the hidden observation point are illustrated by the green and purple lines. If the observation point (indicated by the red point) is hidden by the sphere from the view of the source, the wave-front approaches the critical circle directly. After that, the wave-front reaches the observation point along the surface.

The Ray-Tracing formula for the 3-D model includes the concept of the critical circle, which is the counterpart of the critical point in the 2-D model [22]. When the observation point is hidden by the head, the wave-front from the source initially propagates to the critical circle directly and then secondarily propagates along the surface to the observation point. These propagation steps are shown in Figure 4 and the critical circle is represented by the red-coloured line. If we denote the direction vector to the ith microphone on the surface as $\vec{d}_I$, then the angle between the two direction vectors (i.e., $\vec{d}_I$ and $\vec{d}_S$) is denoted as $\theta_{\vec{d}_I \vec{d}_S}$ and defined below:

$$\theta_{\vec{d}_I \vec{d}_S} = \cos^{-1}\!\left(\frac{\vec{d}_I \cdot \vec{d}_S}{|\vec{d}_I|\,|\vec{d}_S|}\right) \qquad (8)$$

Then, $\theta_{\vec{d}_R \vec{d}_S}$ and $\theta_{\vec{d}_F \vec{d}_S}$ can be expressed by equations (9) and (10) with the assumption that $r_S \gg r_H$:

$$\theta_{\vec{d}_R \vec{d}_S}(\theta_{shift}) = \cos^{-1}\!\left(\cos\theta_R \sin\varphi_S + \sin\theta_R \cos\varphi_S \cos\theta_S \sin\theta_{shift} + \sin\theta_R \cos\varphi_S \sin\theta_S \cos\theta_{shift}\right) \qquad (9)$$

$$\theta_{\vec{d}_F \vec{d}_S} = \frac{\pi}{2} + \varphi_S \qquad (10)$$

If $\theta_{\vec{d}_R \vec{d}_S} \le \theta_c$, where $\theta_c = \cos^{-1}\!\left(|\vec{d}_R| / |\vec{d}_S|\right) \approx 90^{\circ}$, then the wave propagates along the direct path only. Otherwise, if the microphone is hidden by the spherical head, which mathematically indicates that $\theta_{\vec{d}_R \vec{d}_S} \ge \theta_c$, the propagation distance of the diffracted wave motion along the surface should also be considered. Here, equation (11) shows the propagation distance from the source to the rotating microphone, denoted by $D_R$, and equation (12) shows the distance from the source to the fixed microphone, represented by $D_F$:

$$D_R = \begin{cases} \sqrt{r_S^2 + r_H^2 - 2 r_S r_H \cos\theta_{\vec{d}_R \vec{d}_S}}, & \theta_{\vec{d}_R \vec{d}_S} \le \theta_c \\[4pt] r_S \sin\theta_c + r_H \left(\theta_{\vec{d}_R \vec{d}_S} - \theta_c\right), & \theta_c \le \theta_{\vec{d}_R \vec{d}_S} \le \pi \end{cases} \qquad (11)$$

$$D_F = \begin{cases} \sqrt{r_S^2 + r_H^2 - 2 r_S r_H \cos\theta_{\vec{d}_F \vec{d}_S}}, & \theta_{\vec{d}_F \vec{d}_S} \le \theta_c \\[4pt] r_S \sin\theta_c + r_H \left(\theta_{\vec{d}_F \vec{d}_S} - \theta_c\right), & \theta_c \le \theta_{\vec{d}_F \vec{d}_S} \le \pi \end{cases} \qquad (12)$$

If we denote $D_{d\infty}$ as $\lim_{r_S \to \infty} (D_F - D_R)$, the ICTD trajectories can be obtained by dividing $D_{d\infty}$ by $c$. For example, when the physical dimensions of the rotating microphone array are those given in Table 1, the resulting ICTD trajectories of the


frontal sources on the horizontal plane are those shown in Figure 5. It is clear that the mean of the ICTD trajectories varies according to the change in the azimuth angle of the source, because the $Y_R Z_R$ plane, on which the circular motion of the rotating microphone occurs, is perpendicular to the X axis. Also, for the frontal sources, the resulting up-and-down pattern of the ICTD trajectories is expected because of the clockwise microphone motion from the $+Z_R$ axis. On the other hand, we can conjecture that the pattern of the ICTD trajectories of the rear sources will be reversed. In addition, for the laterally biased sources, no significant features are visible because the source is on the X axis, which is the symmetric axis of the array's rotating motion.

Figure 6 shows the ICTD trajectories for the sources on the median plane. The vertical axis has no physical meaning because we added a 0.3 ms offset to the original ICTD trajectories as the elevation angle of the source increases by 45°, in order to present all ICTD trajectories in a single graph. As shown in Figure 6, as the elevation angle is increased, the phase of the trajectory is shifted. The source elevation angle is defined on the sagittal plane and the shift angle of the rotating part is also defined on the sagittal plane, i.e., the $Y_R Z_R$ plane; see Figures 1 and 3. This is why the ICTD trajectory is shifted by as much as the change in the elevation angle of a source. Then, it seems possible to estimate a source's elevation angle by finding the phase shift of the trajectory. Therefore, we can expect that the mean value of the ICTD trajectory will be a useful cue for the azimuth estimation, and that the amount of shift of the ICTD trajectory can be used to efficiently estimate the elevation.

Figure 5. The ICTD trajectories of the frontal sources on the horizontal plane. The up-and-down motion of the ICTD trajectories is apparent because the shift angle of the rotating microphone increases in a clockwise direction from the $+Z_R$ axis. Additionally, for the left- and right-sided sources, the ICTD trajectories described by the cyan dotted lines have no significant features because the propagation from the laterally-biased source to the rotating microphone does not change as the shift angle varies.

Nomenclature | Numerical Value
$r_S$ | 10 m
$r_R$ | 3.5 cm
$r_H$ | 15 cm
$w_R = d\theta_{shift}/dt$ | 600 rpm
$\theta_R$ | 0.236 rad
$C_R$ | (0.146 m, 0 m, 0 m)
$C_H$ | (0 m, 0 m, 0 m)

Table 1. The dimensions of the rotating microphone array and the angular speed of the rotating part.

2.3 Characteristics of the ICTD Trajectory of the Rotating Microphone Array

In section 2.2, examples of the ICTD trajectories obtained using the Ray-Tracing formula were shown. In section 2.3, we describe the characteristics of the ICTD trajectories of the rotating microphone array. First, the relation between the mean of the ICTD trajectory and the azimuth angle of the source will be derived. Second, the relation between the phase shift of the trajectory and the elevation angle will be shown. In addition, the amplitude of the ICTD trajectory will be presented as a function of the azimuth angle only.

First of all, the mean value of the ICTD trajectory is defined as equation (13):

$$\overline{ICTDT} = \frac{1}{2\pi c} \int_0^{2\pi} D_{d\infty}(\theta_{shift}) \, d\theta_{shift} \qquad (13)$$

The wave propagation from a source to a microphone is strongly dependent on the azimuth angle of the source; see Figure 4 and equation (10). For example, when a sound source is to the left of the head, only direct wave propagation occurs to the fixed microphone. On the other hand, consecutive propagation along the direct and indirect paths occurs from the source to the rotating microphone, because the rotating microphone is hidden by the head from the view of the source. If a source is to the right of the head, the wave propagation characteristics are reversed. In particular, for a source with an azimuth angle within $[-\theta_R, +\theta_R]$, the propagation characteristic to the rotating microphone changes according to its shift angle. In order to represent the propagation characteristics more precisely, we divide them into three categories: (case 1) the wave propagation is along the direct path only, (case 2) the
consecutive propagation is along the direct and indirect paths, and (case 3) there is a transition between case 1 and case 2, depending on the shift angle. In terms of these categories, Table 2 shows the propagation characteristics according to the azimuth angle of the source. For the sources with azimuth angles within $[-\theta_R, 0]$, the propagation to the rotating microphone corresponds to case 3. The transition from case 1 to case 2 occurs when the rotating microphone passes $\theta_b$ and the subsequent transition from case 2 to case 1 occurs at $\pi + \theta_b$, where $\theta_b$ is defined in equation (14). For the sources with azimuth angles within $[0, +\theta_R]$, the transition from case 1 to case 2 occurs at $\theta_b$ and the consecutive transition from case 2 to case 1 occurs at $\pi - \theta_b$:

$$\theta_b = \min_{\theta_{shift}} \arg\!\left[\sin\theta_{shift} \le \tan\varphi_S \cot\theta_R\right] \qquad (14)$$

where $\theta_{shift} \in [0, 2\pi)$. With respect to the azimuth interval, the mean value of the ICTD trajectory is derived as a function of the azimuth angle of the source only, as shown in Figure 7. It is apparent that a one-to-one relationship exists between the mean value of the ICTD trajectory and the azimuth angle of the source. Therefore, it is possible to estimate the azimuth angle of the source once the mean value of the ICTD trajectory is obtained.

The Azimuth Intervals | Fixed Mic. | Rotating Mic.
$[-\pi/2, -\theta_R]$ | Case 1 | Case 2
$[-\theta_R, 0]$ | Case 1 | Case 3
$[0, +\theta_R]$ | Case 2 | Case 3
$[+\theta_R, \pi/2]$ | Case 2 | Case 1

Table 2. The wave propagation characteristics from the source to the rotating and fixed microphones.

Figure 6. The ICTD trajectories for sources on the median plane are shown. As the source elevation changes, the trajectory pattern shifts. For the source on the top of the head, the distance between the rotating microphone and the source is the shortest at $\theta_{shift} = 0°$, which indicates that the ICTD is maximal. The distance increases up to $\theta_{shift} = 180°$ and goes back to the shortest distance within a single period.

Figure 7. The mean value of the ICTD trajectory as a function of the azimuth angle only is shown. The one-to-one relationship between the mean value of the ICTD trajectory and the azimuth angle is clearly defined. The vertical dashed lines indicate the azimuth angles (i.e., $\pm\theta_R$; $\theta_R = \cos^{-1}\!\left(\sqrt{r_H^2 - r_R^2}/r_H\right) = 13.4934°$) in the rotating microphone array with the dimensions given in Table 1.

On the other hand, the specific shift angles, which correspond to the maximal or minimal values of the ICTD trajectory, are useful for finding the elevation angle of the source. These specific shift angles are defined as below:

$$\theta_{shift}^{max} = \arg\max_{\theta_{shift}} D_{d\infty}(\theta_{shift}), \qquad \theta_{shift}^{min} = \arg\min_{\theta_{shift}} D_{d\infty}(\theta_{shift}) \qquad (15)$$

which implies that $\theta_{shift}^{max}$ and $\theta_{shift}^{min}$ are equal to $\pi/2 - \theta_S$ and $3\pi/2 - \theta_S$, respectively. This holds because the elevation angle increases from the +Y axis in an anticlockwise direction, while the shift angle of the rotating microphone increases from the $+Z_R$ axis in a clockwise direction (see Figures 1 and 3). In summary, by finding the two parameters of the ICTD trajectory, i.e., $\overline{ICTDT}$ and $\theta_{shift}^{max/min}$, the azimuth and elevation angles of the source can be found independently.

In addition, as shown in Figure 5, the amplitude of the ICTD trajectory changes as the azimuth angle is varied. Naturally, we can expect the trajectory amplitude to be dependent on the azimuth angle only. Its definition is given below:


$$ICTDT_{PP} = \frac{1}{c}\left[D_{d\infty}\!\left(\theta_{shift}^{max}\right) - D_{d\infty}\!\left(\theta_{shift}^{min}\right)\right] \qquad (16)$$

We express the amplitude of the ICTD trajectory as its peak-to-peak value using the specific shift angles in equation (15). Figure 8 visualizes this amplitude as a function of the azimuth angle. It is notable that the ICTD trajectories of the left-sided sources have a larger $ICTDT_{PP}$ compared with those of the right-sided sources, except for the source at $(\varphi_S, \theta_S) = (-90°, 0°)$. The variation of the ICTD trajectory is caused by the motion of the rotating microphone only. When the sphere hides the entire trajectory of the rotating microphone's motion from the view of the source, the wave propagation of case 2 occurs and the variation of the propagation distances becomes the largest (see Table 2). Also, when the source moves from the left to the right, the portion of the direct wave propagation increases and the $ICTDT_{PP}$ decreases. Equation (17) shows the $ICTDT_{PP}$ according to the azimuth intervals:

$$c \cdot ICTDT_{PP} = \begin{cases} r_H \left[\cos^{-1}\!\left(\cos\theta_R \sin\varphi_S - \sin\theta_R \cos\varphi_S\right) - \cos^{-1}\!\left(\cos\theta_R \sin\varphi_S + \sin\theta_R \cos\varphi_S\right)\right], & \varphi_S \in \left[-\frac{\pi}{2}, -\theta_R\right] \\[4pt] r_H \left[\cos^{-1}\!\left(\sin(\varphi_S - \theta_R)\right) - \frac{\pi}{2} + \sin(\varphi_S + \theta_R)\right], & \varphi_S \in \left[-\theta_R, +\theta_R\right] \\[4pt] 2 r_H \sin\theta_R \cos\varphi_S, & \varphi_S \in \left[+\theta_R, +\frac{\pi}{2}\right] \end{cases} \qquad (17)$$

Figure 8. The values of $ICTDT_{PP}$ as a function of the azimuth angle. In particular, the left-sided sources within $[-\pi/2 + \theta_R, -\theta_R]$ have the same $ICTDT_{PP}$, which corresponds to the time taken for a wave-front to travel a length of $2 r_H \theta_R$; $2 r_H \theta_R / c$ is equal to 0.206 msec. $2 r_H \theta_R$ is the greatest length made by the rotating motion on the surface within a full revolution. As a source approaches the right, $ICTDT_{PP}$ decreases. Exceptionally, the $ICTDT_{PP}$s of the sources at (−90°, 0°) and (+90°, 0°) are zero, because these sources are located on the X axis, which is perpendicular to the $Y_R Z_R$ plane.

3. Localization Algorithm

The localization of a source can be achieved using the one-to-one relationship between the parameters of an ICTD trajectory and a source direction, as described in section 2.3. However, it is not easy to apply this approach to a real situation where a source and other noises are present simultaneously. In addition, the duration of a source varies and can be too short to calculate $\tau(\theta_{shift})$, even for a single-source case. Therefore, to apply practically feasible SSL in a real environment, a new SSL method is necessary. Section 3.1 presents the source direction estimator (SDE) based on the ICTD trajectory and section 3.2 summarizes the proposed 3-D SSL algorithm.

3.1 Source Direction Estimator

As mentioned before, we used the conventional GCC-PHAT function [16] to obtain the ICTD trajectories. Equation (18) redefines a GCC-PHAT function that is dependent on the shift angle of the rotating microphone:

$$R_{x_F x_R}(\tau \mid \theta_{shift}) = \int_{-\infty}^{\infty} \frac{G_{x_F x_R}(f \mid \theta_{shift})}{\left|G_{x_F x_R}(f \mid \theta_{shift})\right|} \, e^{j 2\pi f \tau} \, df \qquad (18)$$

where $G_{x_F x_R}(f \mid \theta_{shift})$ is calculated using microphone signals that are collected while the rotating microphone is passing around $\theta_{shift}$. Details about the measurement and the signal processing are presented in sections 4.1 and 4.2. Thus, $G_{x_F x_R}(f \mid \theta_{shift})$ is strongly dependent on the shift angle of the rotating microphone. It should be noted that the relative motion between a sensor and a source is so small that the Doppler effect in the measured signals is negligible [24]. Therefore, it is reasonable to assume that $R_{x_F x_R}(\tau \mid \theta_{shift})$ has time-varying peak positions. Based on this time- or (shift) angle-dependent feature, we can define the source direction estimator (SDE) as below:

$$SDE(\varphi_S, \theta_S) = \frac{\displaystyle\int_0^{2\pi} R_{x_F x_R}\!\left(\tau(\varphi_S, \theta_S \mid \theta_{shift}) \,\middle|\, \theta_{shift}\right) \left|\frac{d\tau(\varphi_S, \theta_S \mid \theta_{shift})}{d\theta_{shift}}\right| d\theta_{shift}}{\displaystyle\int_0^{2\pi} \left|\frac{d\tau(\varphi_S, \theta_S \mid \theta_{shift})}{d\theta_{shift}}\right| d\theta_{shift}} \qquad (19)$$

where $\tau(\varphi_S, \theta_S \mid \theta_{shift})$ is one of the constructed ICTD trajectory databases for a source at $(\varphi_S, \theta_S)$. The SDE at $(\varphi_S, \theta_S)$ is in the form of a line integral of $R_{x_F x_R}(\tau \mid \theta_{shift})$ along the line of $\tau(\varphi_S, \theta_S \mid \theta_{shift})$. For example, if $R_{x_F x_R}(\tau \mid \theta_{shift})$ is equal to 1 along the line of $\tau(\varphi_a, \theta_a \mid \theta_{shift})$ only, then, ideally, the SDE is 1 at $(\varphi, \theta) = (\varphi_a, \theta_a)$ and 0 at other directions. Thus, once the SDE is generated, it is possible to estimate the source direction via peak detection.
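To make equations (19) and (20) concrete, the sketch below discretizes the SDE as a weighted sum over shift-angle frames and recovers a direction by global peak detection. The closed-form trajectory model, the Gaussian stand-in for the per-frame GCC-PHAT values, and all numerical scales are illustrative assumptions; the paper instead builds the trajectory database from the extended Ray-Tracing formula.

```python
import numpy as np

# Toy ICTD-trajectory model (an illustrative stand-in for the ray-traced
# database): the mean encodes the azimuth, the phase encodes the elevation,
# and the amplitude shrinks towards lateral sources, mirroring section 2.3.
T0, A = 0.5e-3, 0.2e-3              # assumed scales in seconds, not the paper's

def traj(az, el, psi):
    """Candidate ICTD trajectory tau(az, el | shift angle psi)."""
    return T0 * np.sin(az) + A * np.cos(az) * np.sin(psi + el)

PSI = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)   # shift-angle frames

def sde(meas, az, el, sigma=2e-5):
    """Discretized SDE of equation (19): GCC values read along a candidate
    trajectory, weighted by |d tau / d psi| and normalized by the weights."""
    cand = traj(az, el, PSI)
    # A Gaussian bump centred on the measured trajectory stands in for the
    # per-frame GCC-PHAT value R(tau | psi) evaluated at the candidate delay.
    r = np.exp(-0.5 * ((cand - meas) / sigma) ** 2)
    w = np.abs(np.gradient(cand, PSI)) + 1e-12             # |d tau / d psi|
    return np.sum(r * w) / np.sum(w)

# "Measured" trajectory for a source at azimuth 30 deg, elevation 45 deg.
meas = traj(np.radians(30.0), np.radians(45.0), PSI)

# Global peak detection over a direction grid (equation (20)).
azs = np.radians(np.arange(-80, 81, 5))
els = np.radians(np.arange(0, 360, 5))
scores = np.array([[sde(meas, a, e) for e in els] for a in azs])
ia, ie = np.unravel_index(np.argmax(scores), scores.shape)
est_az, est_el = np.degrees(azs[ia]), np.degrees(els[ie])
print(round(est_az), round(est_el))    # -> 30 45
```

Because the toy trajectory is uniquely determined by (azimuth, elevation), the SDE reaches its maximum of 1 only at the true direction; any injective trajectory family behaves the same way under this estimator.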

3.2 Localization Algorithm for Rotating Microphone Array 4. Simulation

In this section, the proposed SSL algorithm is described. On the basis of the weak Doppler effect (due to the small relative motion), the collected signals of the fixed and rotating microphones within (at least) a single period are segmented into Nf frames, each including Nfft samples. In addition, the angle allocated to each frame is the shift angle, which is measured at the time the middle sample in the frame is collected. Sections 4.1 and 4.2 give more information about the segmentation process. In the real system, the shift angle is measured directly by the encoder signal; see Figures 17 and 18 for more details. Then, we can obtain RxFxR(τ | θshift) and SDE(φS, θS) for every possible direction using equations (18)-(19). The final decision is made by detecting the peak in the SDE; we assume that the number of dominant sources is given by the recognition group prior to the SSL process. If it is reported that a single source is recognized, then the source direction can be estimated by equation (20):

(φ̂S, θ̂S) = arg max over (φS, θS) of SDE(φS, θS)    (20)

where φ̂S and θ̂S are the estimated azimuth and elevation angles of the source, respectively. For multiple SSL, various peak detection strategies are applicable when multiple peaks are present in the SDE. However, since our research focused on single-source SSL, we used the simplest global peak detection using equation (20). Figure 9 shows the procedure of the proposed SSL algorithm.

In section 4, we evaluate the performance of the proposed SSL algorithm using synthesized signals. To do this, signal models of the fixed and rotating microphones were needed. These models are given in section 4.1 and the results of the simulation for a single source are described in section 4.2. The localization performance is evaluated with respect to the localization error, which is defined as the angle between the true and perceived direction vectors. In this simulation, the physical dimensions of the rotating microphone array are given in Table 1.

4.1 Signal Models of the Fixed and Rotating Microphones

As shown in Figure 3, the rotating microphone array is installed on a spherical head with a radius of rH. One of the two microphones is fixed at (−rH, 0, 0) on the surface of the spherical head (this microphone is hereafter called the "fixed microphone" for convenience). The output signal of the fixed microphone in the continuous time domain, denoted as xF(t), can then be modelled as:

xF(t) = hS^xF(t) * s(t | φS, θS)    (21)

where hS^xF(t) is the spherical impulse response [25] from the source position to the fixed microphone position on the spherical head, s(t | φS, θS) is the source signal content, and * indicates the convolution operator. As shown in equation (21), hS^xF(t) is not a function of θshift because this microphone does not move. However, the other microphone (i.e., the rotating microphone) is located on the rotating plate and moves in a circular motion on the YR-ZR plane (see Figures 2 and 3). The measured signal of the rotating microphone is therefore strongly dependent on the shift angle. The signal model of the rotating microphone, denoted as xR(t), can be defined as:

xR(t) = hS^xR(t | θshift) * s(t | φS, θS)    (22)

where hS^xR(t | θshift) is the spherical impulse response from the source position to the rotating microphone position. In this case, hS^xR is a function of θshift due to its circular motion.
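The synthesis described by equations (21)-(22) can be sketched numerically. In the minimal sketch below, random FIR taps stand in for the spherical-head impulse responses of [25] (so only the bookkeeping, not the acoustics, is meaningful), and the shift angle is assumed to advance by one degree per frame of samples; all variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 18000               # length of the synthesized record
n_frames = 360                  # Nf: one conditioned output per degree
s = rng.standard_normal(n_samples)          # source content: white noise

# Placeholder impulse responses (random FIR taps). A real simulation would use
# the spherical-head responses of [25], with the rotating-microphone response
# depending on the shift angle.
h_fixed = 0.1 * rng.standard_normal(32)
h_rot = 0.1 * rng.standard_normal((n_frames, 32))

# Equation (21): the fixed microphone needs a single convolution, then sampling.
x_f = np.convolve(s, h_fixed)[:n_samples]

# Equation (22): build the matrix of conditioned outputs (one row per shift
# angle), then read the rotating-microphone record out frame by frame,
# advancing the shift angle by one row per frame.
m_xr = np.stack([np.convolve(s, h_rot[k])[:n_samples] for k in range(n_frames)])
frame_len = n_samples // n_frames
x_r = np.concatenate([m_xr[k, k * frame_len:(k + 1) * frame_len]
                      for k in range(n_frames)])

print(x_f.shape, x_r.shape)  # -> (18000,) (18000,)
```

The row-stacked matrix `m_xr` plays the role of the matrix of conditioned outputs used later in the simulation section.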
Figure 9. The proposed SSL algorithm based on SDE. Two measured time-domain signals are divided into the given number of frames, Nf, and each frame has Nfft samples. The shift angle corresponding to the middle sample in each frame is allocated to that frame. The framed signals are then used to calculate RxFxR(τ | θshift). Next, by using the constructed ICTD trajectory database, the SDE can be obtained for every direction. Finally, by finding the dominant peak(s) in the SDE in descending order, the source direction(s) can be estimated.

The synthesized signal refers to the discrete-time domain signal. The synthesized signal of the fixed microphone, denoted as xF[n], is generated by simply discretizing xF(t), as shown below:

xF[n] = xF(nΔtS)    (23)

where ΔtS is the sampling time and xF[n] is the nth sample of the synthesized signal of the fixed microphone. On the other hand, the motion of the rotating microphone makes the generation of xR[n] more complicated. For example,

8 Int J Adv Robot Syst, 2015, 12:171 | doi: 10.5772/61652


when we assume that the rotating microphone is shifted +θN° in a clockwise direction from the +ZR axis and fixed during the measurement, then the output signal xR(t) of the rotating microphone is:

xR(t | θshift = θN) = hS^xR(t | θshift = θN) * s(t | φS, θS)    (24)

By using this notation, MxR(·), which is the matrix of conditioned (continuous) output signals, can be modelled as equation (25) and is composed of Nf output signals with ΔθN degree resolution:

MxR(·) = [ xR(t | θshift = 0)
           xR(t | θshift = ΔθN)
           ⋮
           xR(t | θshift = (Nf − 1)·ΔθN) ]    (25)

where xR(t | θshift) is equal to xR(t | θshift + 2π) due to the circular motion of the rotating microphone, which yields a cyclo-stationary process when the source content is stationary [26]. We assume that the other dimensions do not vary. In this simulation, we set the sampling frequency (fS) and the number of frames (Nf) to 44.1 kHz and 360, respectively. Thus, ΔθN becomes 1°, and MxR(·), the matrix of conditioned discretized signals, can be modelled as in equation (26):

MxR[Nf, ΔθN] = [ xR(nΔtS | θshift = 0)
                 xR(nΔtS | θshift = ΔθN)
                 ⋮
                 xR(nΔtS | θshift = (Nf − 1)·ΔθN) ]    (26)

From equation (26), the synthesized signal xR[n] along the shift-angle axis can be represented as follows:

xR[n] = xR(nΔtS | θshift = (n − 1)·ΔθN)    (27)

For instance, when the source is located in the direction (φS, θS) = (0°, 0°) and its signal content is a Gaussian white noise signal, the resulting values included in MxR(·) are presented in Figure 10. It is found that the amplitude of the synthesized signal increases as the shift angle of the rotating microphone approaches 90° and generally decreases as the shift angle approaches 270°. This is a reasonable result: when the rotating microphone approaches the source direction, the measured signal must be less attenuated by the spherical head. In this simulation model, the angular velocity of the rotating plate is 600 rpm, and the synthesized output signal of the rotating microphone is collected along the signal detection line with ωR of 600 rpm. In this case, the synthesized microphone outputs are presented in Figure 11.

Figure 10. MxR(·) of conditioned and discretized output signals of the rotating microphone with respect to the time and shift angle axes

Figure 11. The synthesized output signals of the rotating and fixed microphones are xR[n] (top) and xF[n] (bottom), respectively

4.2 Simulation Results

Various criteria to evaluate the SSL performance have been suggested by previous researchers [4, 6, 11-12, 15, 20]. One of the most commonly used criteria is based on the absolute error between the true and perceived directions, and it can be applied to the evaluation of azimuth or elevation angle estimations separately. However, for the evaluation of 3-D SSL performance, it is more reasonable to incorporate both azimuth and elevation together. If we express the perceived (or estimated) azimuth and elevation angles as φ̂S and θ̂S respectively, then the true and perceived direction vectors (tdv, pdv) are defined with respect to the inter-aural polar coordinate:

tdv = (sin φS, cos φS cos θS, cos φS sin θS)
pdv = (sin φ̂S, cos φ̂S cos θ̂S, cos φ̂S sin θ̂S)    (28)

Using these definitions, the localization error is defined as cos⁻¹⟨tdv, pdv⟩. For example, when a Gaussian white noise source is at (0°, 0°) and its duration is longer than the

Sangmoon Lee, Youngjin Park and Youn-sik Park: 9


Three-dimensional Sound Source Localization Using Inter-channel Time Difference Trajectory
rotating period, the GCC-PHAT functions corresponding to the circular motion of the rotating microphone are shown in Figure 12. Each GCC-PHAT function was calculated by using segments of the synthesized signals of the two microphones. These segments have 1,024 (i.e., Nfft) samples and there are 900 overlapping samples between adjacent frames. The meaningful region in the time domain is from −1.4·10⁻³ seconds to +1.4·10⁻³ seconds. It is found that the peak location of the GCC-PHAT function moves up and down in the time domain, as expected (see section 2.2). In the situation where one fixed sound source is at (0°, 0°), the peak position is highest in the time domain when θshift is equal to 90° and lowest when θshift is 270°. No other significant features were found because additional noise signals were not included in the synthesized signals.
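The per-frame cross-correlation RxFxR(τ | θshift) is a generalized cross-correlation with PHAT weighting; a self-contained frequency-domain sketch is given below. The small regularizer added to the magnitude and the factor-of-two zero-padding are implementation choices of ours, not taken from the paper:

```python
import numpy as np

def gcc_phat(x, y, fs, n_fft=1024):
    """GCC-PHAT of two equal-length frames; returns lags (s) and the correlation.
    The peak appears at a positive lag when y lags (is delayed relative to) x."""
    n = 2 * n_fft                          # zero-pad so the correlation is linear
    cross = np.conj(np.fft.rfft(x, n)) * np.fft.rfft(y, n)
    cross /= np.abs(cross) + 1e-12         # PHAT weighting: keep phase only
    r = np.fft.irfft(cross, n)
    r = np.concatenate((r[-n_fft:], r[:n_fft]))   # reorder to lags -n_fft..n_fft-1
    lags = np.arange(-n_fft, n_fft) / fs
    return lags, r

# Synthetic check: y is x delayed by 5 samples, so the peak sits at +5 samples.
rng = np.random.default_rng(1)
s = rng.standard_normal(4096)
x, y = s[100:1124], s[95:1119]             # y[n] = x[n - 5]
lags, r = gcc_phat(x, y, fs=44100)
print(round(lags[np.argmax(r)] * 44100))   # prints 5
```

Applying this to consecutive frames of the two microphone signals, one per shift angle, yields the family of correlation functions whose peaks trace the ICTD trajectory.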
Figure 13. SDE for the source at the front side of the rotating array is shown. It was found that the dominant peak is around the true direction of the source (0°, 0°). Additionally, the bell-shaped side edges originate from the peak due to the regional overlap of adjacent ICTD trajectories. The shape of the peak is stretched in the direction of the elevation angle axis due to the short up-and-down motion of the rotating microphone, compared with the width of the array.
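The peak-detection step of equation (20) over such an SDE grid can be sketched as follows; the grid resolutions and the synthetic Gaussian-shaped SDE are illustrative placeholders, not values computed by the paper's estimator:

```python
import numpy as np

def estimate_direction(sde, azimuths, elevations):
    """Global peak detection over a precomputed SDE grid, as in equation (20)."""
    i, j = np.unravel_index(np.argmax(sde), sde.shape)
    return azimuths[i], elevations[j]

# Illustrative candidate grid and a synthetic SDE with one dominant peak.
az = np.arange(-90, 91, 10)                 # azimuth candidates (degrees)
el = np.arange(0, 331, 30)                  # elevation candidates (degrees)
AZ, EL = np.meshgrid(az, el, indexing="ij")
sde = np.exp(-(AZ ** 2 + (EL - 90) ** 2) / 500.0)   # synthetic peak at (0, 90)

best_az, best_el = estimate_direction(sde, az, el)
print(best_az, best_el)  # -> 0 90
```

For multiple sources, the same grid would be scanned for several dominant local peaks in descending order, as noted in the text.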

By using equation (19), the SDE is obtained using the GCC-PHAT functions and the database of the approximated ICTD trajectories. Figure 13 shows the calculated SDE. The dominant peak is quite visible, and bell-shaped side edges originating from the peak are spread out primarily along the elevation angle axis. This result is due to several factors. First, if the time resolution were infinitesimally small, the bell-shaped edges would become invisible. However, the acquisition or processing system has its limitations, such as a finite fS. As a result, adjacent ICTD trajectories may overlap each other. More specifically, the locations of the peaks of the GCC-PHAT functions are partially matched with more than one ICTD trajectory in the time domain. Thus, the side edges become visible. Also, we can expect that as the time interval increases, the overlapped region will expand and the SDE values corresponding to the side edges will increase. Secondly, even if the SDE is calculated in the discrete-time domain with a denser time resolution, the side edges should still appear, because the signal bandwidth is limited. Thus, the calculated GCC-PHAT function cannot be expected to equal an ideal impulse. Besides, the effect of the rotational motion on the synthesized signals remains, although it is not remarkable. Therefore, the processing in the discrete-time domain and the motion of the rotating microphone cause the bell-shaped edges.

Figure 12. The GCC-PHAT functions when the source is located directly at the front side (0°, 0°). As expected, the up-and-down pattern of the peak location is clearly visible. In this noise-free simulation of a single source, there are no distinguishing local peaks along the time axis.

We examined the 3-D SSL performance of the proposed SSL algorithm for a Gaussian white noise source with respect to the localization error mentioned above. The range of the source direction is as follows: the azimuth angle spans from −90° to +90° with 10° intervals and the elevation angle varies from 0° to +330° (−30°) with 30° intervals. The number of source directions is 228. It is assumed that the rotating microphone array system is located in a free field. Figure 14 shows the localization error distribution for all of the source directions. Generally, the performance gets better as the source is close to the left side, opposite the rotating microphone, due to the left-right asymmetry of the azimuth-dependent ICTD trajectory amplitude (see Figure 8). Also, it is reasonable that an elevation-dependent feature was not visible. The distribution of the mean errors along the azimuth angle is shown in Figure 15.

4.3 Computational load comparison

To be an efficient 3-D SSL method, the signal processing costs must be light. In this section, the computational load of the proposed localization method is compared with those of the delay-and-sum beamformer [27] and the steered response power (SRP)-PHAT method [28]. For example, SRP-PHAT requires frequency-domain processing to carry out the phase transform (PHAT). Here, if the



number of microphones is denoted as M, the computation of all the possible GCC-PHAT functions requires M(M−1)/2 phase transforms. For a discrete Fourier transform size of Nfft, a single FFT takes 5Nfft·log2(Nfft) operations.

1. DFT of all the microphones: M × (5Nfft·log2(Nfft))
2. Spectral processing: 7Nfft·M(M−1)/2
3. Inverse DFT: M(M−1)/2 × (5Nfft·log2(Nfft))
4. SRP-PHAT calculation for the possible directions (NφNθ): M(M−1)/2 × NφNθ

Thus, the total SRP-PHAT processing cost is M(M+1)/2 × 5Nfft·log2(Nfft) + M(M−1)/2 × (NφNθ + 7Nfft). In the same way, the cost of the proposed localization algorithm is (3M−1)/2 × 5Nfft·log2(Nfft) + (M−1) × (NφNθ + 7/2·Nfft), and the cost of the delay-and-sum beamformer is MN × (NφNθ), where N is the frame length, 100. Therefore, the approximated costs are computed as: (1) delay-and-sum beamformer, MN(NφNθ); (2) SRP-PHAT method, M²(NφNθ + 12Nfft); (3) the proposed method, M(NφNθ + 9/2·Nfft). It is noteworthy that if Nfft is smaller than NφNθ, the signal processing cost of the proposed localization method is N or M times less than those of the delay-and-sum beamformer and SRP-PHAT methods. It is reasonable that the proposed localization cue can be computed with M microphone pairs when using an (M+1)-channel microphone array, whereas the SRP-PHAT method utilizes M(M−1)/2 microphone pairs for a single localization process. On the other hand, it is known that the SRP-PHAT method can be applied to situations where the signal-to-noise ratio (SNR) is less than zero. In the proposed localization method, however, the TDE error directly affects the localization performance because the proposed cue is based on the measured ICTD trajectory. Thus, it can reasonably be expected that the proposed localization performance will be degraded significantly more than that of the SRP-PHAT method when SNR < 0.

Figure 14. Localization errors for 228 directions are depicted. As we can see, an elevation-dependent feature was not found. However, it was quite visible that the SSL performance is strongly dependent on the azimuth angle of a source only.

Figure 15. The mean error along the azimuth angle of a source. The localization errors of the left-sided sources are almost the same, except for the leftmost source. The right-sided sources tend to be estimated with worse resolution compared with the left-sided sources.

5. Experiment

We developed a rotating microphone array according to the proposed design (see Figure 2 and Table 1). It should be noted that the two microphone signals needed to be transmitted wirelessly for safety reasons. Thus, both a microphone and a transmitter needed to be placed inside the rotating block. An ultrasonic motor was chosen to make this block rotate inside the head. Details about the structure of the proposed array and the measurement process are provided in section 5.1. Section 5.2 shows the results of the two experiments for the feasibility test: one involving a Gaussian white noise source and the other involving a voice source.

Figure 16. A top view of the hemisphere showing the interior arrangement of the rotating and fixed blocks, two ultrasonic motors, one encoder, and one motor driver. The rotating block on the right side contains the electronic boards for transmitting the microphone signal. The shift angle of the rotating microphone is measured by using the encoder.
5.1 Experimental Set-up

For our proposed rotating microphone array, we chose a wireless system (Q240, RFQ) consisting of a dual-channel receiver, two transmitters, and two microphones (QB686, RFQ). In order to put a transmitter unit and a microphone together in a rotating block, the electronic boards inside the transmitter unit had to be rearranged and installed in a cylindrical plastic block. Figure 16 shows the interior arrangement of the necessary blocks and other units inside the spherical head. There are two cylindrical blocks, two ultrasonic motors, one encoder, and one motor driver. The cylindrical block on the right side is called the "rotating block" and consists of the rearranged electronic boards used to transmit the microphone signal (#1) and the pin-type microphone located 3 cm from the centre of the cap. This block is connected to the ultrasonic motor (USR-E3T/24V, SHINSEI), which is driven by the motor driver (D6060E, SHINSEI). Additionally, the encoder is attached to the motor. Thus, the shift angle of the microphone is measured using the encoder signal. The other block, on the left side, is hereafter called the "fixed block" for convenience. The pin-type microphone (#2) is attached at the centre of the cap. The transmitter unit is outside the block. The left and right side views are also presented in Figure 17. The physical dimensions are the same as those in Table 1, except the rotating radius rR, which is 3 cm in the real array. Therefore, the ICTD trajectory database needed to be reconstructed.

Figure 17. The right-side view of the spherical head is shown on the left and the left-side view is presented on the right.

Figure 18. The spherical head equipped with the rotating microphone array is set up in the measurement room.

5.2 Experimental Results

The experiments for the feasibility test were carried out in a room environment: the room size was 3.2 × 5.5 × 2.8 m³ (width × length × height) and the reverberation time (t60) was 0.26 seconds. The input signal was produced through a full-range speaker (TC9FSD13, VIFA) on the speaker jig. Figure 18 shows the rotating array system placed in the room. Two experiments were conducted in order to check the feasibility: one involving a Gaussian white noise signal and the other using a male voice as a source signal.

5.2.1 Gaussian white noise source

First, the experiment using a Gaussian noise source as an input signal to the speaker was conducted. In this experiment, the SSL performance for a source in the median plane was evaluated. Only the elevation angle of the source was varied, from −30° to 210° with 10° intervals. The source content was a Gaussian white noise signal with frequency contents from 1.5 kHz to 20 kHz, generated by a random noise generator (SF-06, RION), and was produced for longer than one rotating period. The angular frequency was set to 54 rpm. For example, when the source is at (0°, 0°), the measured microphone signals and the z-phase encoder signal are depicted in Figure 19. The total measurement time was 3 seconds and the signal duration was set to 2 seconds. By using the encoder signal in the z-phase, we collected the samples within a single rotating period and allocated Nfft samples to each frame. For the 25 directions in the median plane, the mean localization error was 1.75° and the standard deviation was 1.65°. Therefore, the experimental results showed that our proposed SSL algorithm is applicable to the SSL of a Gaussian noise source.

Figure 19. The output signals of two microphones and the encoder signal (z-phase) when the source is at (0°, 0°)
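The framing step used in this experiment (collect the samples of one rotating period starting at the z-phase pulse, split them into Nf overlapping frames of Nfft samples, and tag each frame with the shift angle of its middle sample) can be sketched as below. The hop computation and the constant-angular-velocity assumption are ours; the real system reads the shift angle from the encoder directly:

```python
import numpy as np

def frames_with_shift_angles(signal, z_phase_index, rotation_samples, n_frames, n_fft):
    """Split one rotation (starting at the z-phase pulse) into overlapping frames,
    tagging each frame with the shift angle of its middle sample. Assumes a
    constant angular velocity, so the shift angle is proportional to the sample
    index within the rotation."""
    period = signal[z_phase_index:z_phase_index + rotation_samples]
    hop = (rotation_samples - n_fft) // (n_frames - 1)
    frames = np.stack([period[k * hop:k * hop + n_fft] for k in range(n_frames)])
    mids = np.arange(n_frames) * hop + n_fft // 2
    angles = 360.0 * mids / rotation_samples          # shift angle of middle sample
    return frames, angles

# 54 rpm at 44.1 kHz gives 49,000 samples per rotation; placeholder signal.
sig = np.zeros(60000)
frames, angles = frames_with_shift_angles(sig, z_phase_index=1000,
                                          rotation_samples=49000,
                                          n_frames=360, n_fft=1024)
print(frames.shape, angles[0] < angles[-1])  # -> (360, 1024) True
```

Each row of `frames` would then be paired with the corresponding fixed-microphone frame to compute one GCC-PHAT function per shift angle.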

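The mean and standard deviation reported above are computed with the localization error of equation (28), i.e., the angle between the true and perceived direction vectors; a direct transcription:

```python
import numpy as np

def direction_vector(az_deg, el_deg):
    """Direction vector in the inter-aural polar coordinate, equation (28)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([np.sin(az), np.cos(az) * np.cos(el), np.cos(az) * np.sin(el)])

def localization_error_deg(true_dir, est_dir):
    """Angle (degrees) between the true and perceived direction vectors."""
    d = np.dot(direction_vector(*true_dir), direction_vector(*est_dir))
    return np.degrees(np.arccos(np.clip(d, -1.0, 1.0)))

print(round(localization_error_deg((45, 0), (39, -1)), 1))
```

As a check, the (45°, 0°) versus (39°, −1°) pair reported for the voice experiment evaluates to an error of roughly 6°.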


5.2.2 Male voice source

The previous experiment employed a Gaussian white noise signal as a source. In this experiment, a male voice was used as the sound source, without using a speaker jig. The speaker's position was fixed during the measurement so that his mouth was at (45°, 0°) while speaking. The angular frequency of the rotating block was reduced to 21 rpm in order to involve the silent region. The output signals of the two microphones and the encoder signal are depicted in Figure 20. It is known that voice signals are not stationary with time. Also, the spectral modification is strongly dependent on the relative position of the sensor and the source. If the microphones were not attached to an object such as a sphere, but located in the free field, the spectral contents of the measured microphone signals would be the same. Figure 21 shows the GCC-PHAT functions along the shift angle of the rotating microphone. In the region where sufficient signal contents were collected, the GCC functions were obtained quite reasonably, because the peak location seemed to change in a sinusoidal form. The empty black-coloured circles show the estimated ICTDs.

Figure 22 shows the SDE for all possible directions with 2° resolution in both the azimuth and elevation directions. Consequently, the dominant peak in the SDE was found. As we examined earlier, bell-shaped side edges originate from the peak. Negative values were found in some regions. This result seems reasonable because a GCC function can have a negative value, which indicates that considerable contents in the measured signals are out-of-phase with each other. The final step, finding the location of the (positive) peak in the SDE, was carried out to estimate the direction of the source as in equation (20).

Figure 22. The source direction estimator when a source was at (45°, 0°). The estimated source direction was (39°, −1°) even though the silent region was included.
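The sinusoidal up-and-down pattern of the estimated ICTDs can be summarized with a least-squares sinusoid fit. This post-processing is our own illustration (the paper's estimator instead integrates GCC-PHAT values along stored ICTD trajectories), but it shows how the mean, the amplitude, and the shift angle of the maximum ICTD, the quantities the localization cue is built on, can be read off a measured trajectory:

```python
import numpy as np

def fit_ictd_sinusoid(shift_angles_deg, ictds):
    """Least-squares fit of ictd ~ m + a*cos(theta) + b*sin(theta).
    Returns the mean, the amplitude, and the shift angle (deg) of the maximum."""
    th = np.radians(shift_angles_deg)
    A = np.column_stack([np.ones_like(th), np.cos(th), np.sin(th)])
    m, a, b = np.linalg.lstsq(A, ictds, rcond=None)[0]
    return m, np.hypot(a, b), np.degrees(np.arctan2(b, a)) % 360.0

# Synthetic trajectory: mean 0.1 ms, amplitude 0.2 ms, maximum at 90 degrees.
theta = np.arange(0.0, 360.0, 1.0)
ictd = 1e-4 + 2e-4 * np.cos(np.radians(theta - 90.0))
m, amp, peak_angle = fit_ictd_sinusoid(theta, ictd)
print(round(m * 1e3, 4), round(amp * 1e3, 4), round(peak_angle, 1))  # -> 0.1 0.2 90.0
```

In the terms of the conclusion, the fitted mean relates to the azimuth angle of the source and the fitted peak angle to its elevation.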

6. Discussion
Figure 20. The output signals of two microphones and the encoder signal (z-phase) in the time domain when the voice source is at (45°, 0°). As shown, the voice signal is non-stationary.

Figure 21. The GCC-PHAT functions along the shift angle of the rotating microphone. In the region with sufficient signal contents, the functions were obtained easily. This can be interpreted to mean that the peak location of each function shifts up and down as the rotating microphone moves in a circular motion. The smoother peaks result from the comparatively narrow frequency band of the measured voice signals.

The concept of the proposed localization cue, which is a (source) direction- and (microphone) position-dependent ICTD trajectory, can be applied to the circular microphone array as well. In general, if a microphone array is composed of (M+1) sensors, all the information from every possible microphone pair is taken under consideration in order to practically improve the SSL resolution. If the M-channel circular microphone array is located on the right side of the sphere and the one additional microphone is fixed on the other side, the position-dependent ICTD trajectory of each microphone in the M-channel circular array can be reproduced exactly the same as the proposed ICTD trajectory. Thus, the proposed localization cue-based 3-D SSL is also applicable to the circular microphone array. However, the more microphones that are used for SSL, the more costly it is to produce the microphone array, especially due to the price of the analog-to-digital converters (ADCs), which is proportional to the number of channels. However, sequential sampling and signal processing could be an alternative to reduce the production cost.

On the other hand, the source position was supposed to be outside the rotating microphone array. However, noises

emitted by the (ultrasonic) motor and its driver inside the sphere could act as interior noise sources. Thus, we needed to suppress the propagation of these noises into the microphone by combining the microphone and the electronic boards in a cylindrical block, as shown in Figures 16 and 17. In addition, the directivity of the pin-type microphone (QB686, RFQ) utilized in the research was compared with that of the omnidirectional 1/4 inch microphone (4178, B&K). It is generally known that the remote microphone is used for public speaking, i.e., the primary source is a speaker's voice. Thus, this type of microphone needs to have directionality. For comparison, the two directivity patterns were measured and are shown in Figure 23. The omni-directionality of the B&K microphone is clearly visible and the directivity pattern of the pin-type microphone is asymmetric with respect to the 90° direction. If we consider that the microphones face outward through the block cap and that the directivity pattern of the microphone is asymmetric, the interior noises are not a serious problem.

Figure 23. Directivity patterns of the two microphones, i.e., the 1/4 inch microphone (B&K) and the pin-type microphone (RFQ). The asymmetry in the directivity of the pin-type microphone is clearly visible.

As mentioned before, we assumed that a sound source is fixed. In daily life, a source moves slowly compared with the rotation period of the array. However, in a situation where there is a fast-moving source, the patterns of the peak and the side edges in the SDE would be quite different compared with those in Figures 12 and 21. Usually, the movement of a source occurs along the azimuth angle axis. Therefore, the peak shape in the SDE would be stretched along the time axis according to the direction of the source movement, and the magnitude of the peaks would be suppressed. In this case, without information about the initial direction of the fast-moving source, its direction cannot be estimated using a single measurement, because the peak shape in the SDE is not a time-dependent feature. Even though it is possible to track a fast-moving source by increasing the angular velocity of the rotating part, a safety issue can arise.

7. Conclusion

This paper proposed an ICTD trajectory as the new 3-D SSL cue and, as one of the possible ways to realize the proposed cue concept, the two-channel rotating microphone array was discussed. The characteristics of the ICTD trajectory induced by the circular motion of the rotating array were presented by the ray-tracing method: the mean value of the ICTD trajectory is dependent on the azimuth angle of a source only, and the shift angle corresponding to the maximum (or minimum) ICTD is directly related to the elevation angle of a source. Also, the amplitude of the ICTD trajectory is asymmetric with respect to the front side, which is caused by the circular motion of the rotating microphone on the right side of the sphere. The simulation results demonstrated that the amplitude of the ICTD trajectory is the essential factor for the SSL performance. The results of the two experiments carried out in the room environment demonstrated that the 3-D SSL method using the ICTD trajectory of the two-channel rotating microphone array can effectively localize a Gaussian white noise source and a voice source in 3-D space. It is noteworthy that the estimator is in the form of a line integral of GCC-PHAT functions, similar to the steered response power (SRP)-PHAT method [27, 28].

8. Acknowledgements

This work was supported by the second stage of the Brain Korea 21 Project, the Intelligent Robotics Development Program, one of the Frontier R&D Programs funded by the Ministry of Knowledge Economy (MKE) in 2012, and the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (No. 2010-0028680).

9. References

[1] Kerstin D (2007) Socially intelligent robots: dimensions of human-robot interaction. Phil. Trans. R. Soc. B. 362:679-704.
[2] Fong T, Illah N, Kerstin D (2003) A survey of socially interactive robots. Robot Auton. Syst. 42:143-166.
[3] Anderson M.L (2003) Embodied cognition: A field guide. Artif. Intel. 149:91-130.
[4] Valin J.M, Michaud F, Rouat J, Létourneau D (2003) Robust sound source localization using a microphone array on a mobile robot. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems 2:1228-1233.
[5] Mumolo E, Massimiliano N, Gianni V (2003) Algorithms for acoustic localization based on microphone array in service robotics. Robot Auton. Syst. 42:69-88.
[6] Wang H, Peter C (1997) Voice source localization for automatic camera pointing system in videoconferencing. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 1:187-190.
[7] Valenzise G, Gerosa L, Tagliasacchi M, Antonacci F, Sarti A (2007) Scream and gunshot detection and localization for audio-surveillance systems. In: Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance pp. 21-26.
[8] Chen J, Jacob B, Yiteng A.H (2006) Time delay estimation in room acoustic environments: an overview. EURASIP J. Adv. Sig. Pr. 2006:1-19.
[9] Chen J.C, Kung Y, Ralph E.H (2002) Source localization and beamforming. IEEE Signal Proc. Mag. 19:30-39.
[10] Georgiou P.G, Chris K, Panagiotis T (1997) Robust time delay estimation for sound source localization in noisy environments. In: Proceedings of the IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics.
[11] Hwang S, Park Y, Park Y (2011) Sound direction estimation using an artificial ear for robots. Robot Auton. Syst. 59:208-217.
[12] Kim U, Kazuhiro N, Hiroshi G.O (2013) Improved sound source localization and front-back disambiguation for humanoid robots with two ears. In: Moonis A, Tibor B, Koen V.H, Mark H, Catholijn M.J, Jan T, editors. Recent Trends in Appl. Artif. Intell. Springer Berlin Heidelberg pp. 282-291.
[13] Cheng C.I, Gregory H.W (1999) Introduction to head-related transfer functions (HRTFs): Representations of HRTFs in time, frequency, and space. J. Audio. Eng. Soc. 49:231-249.
[14] Shinn-Cunningham B.G, Scott S, Norbert K (2000) Tori of confusion: Binaural localization cues for sources within reach of a listener. J. Acoust. Soc. Am. 107:1627-1636.
[15] Huang J, Ohnishi N, Sugie N (1998) Spatial localization of sound sources: Azimuth and elevation estimation. In: Proceedings of the IEEE Instrumentation and Measurement Technology Conference pp. 330-333.
[18] Gustafsson T, Rao B.D, Trivedi M (2003) Source localization in reverberant environments: Modeling and statistical analysis. IEEE Speech Audio Proc. 11:791-803.
[19] Brandstein M.S, Harvey F.S (1997) A practical methodology for speech source localization with microphone arrays. Computer Speech and Language 11:91-126.
[20] Kwon B, Park Y, Park Y (2009) Multiple sound source localization using the spatially mapped GCC functions. In: Proceedings of ICROS-SICE pp. 1773-1776.
[21] Lee S, Park Y (2014) Estimation of multiple sound source directions using artificial robot ears. Appl. Acoust. 77:49-58.
[22] Woodworth R.S, Schlossberg S (1954) Experimental Psychology. New York: Holt.
[23] Blauert J (1997) Spatial hearing: the psychophysics of human sound localization. Cambridge: MIT Press.
[24] Knapp C.H, Carter G.C (1977) Estimation of time delay in the presence of source or receiver motion. J. Acoust. Soc. Am. 61:1545-1549.
[25] Duda R.O, William L.M (1998) Range dependence of the response of a spherical head model. J. Acoust. Soc. Am. 5:3048-3058.
[26] Giannakis G.B (1998) Cyclostationary signal analysis. In: Madisetti V.K, Williams D.B, editors. Digital Signal Processing Handbook. Boca Raton, FL: CRC.
[27] Cai W, Wang S, Wu Z (2010) Accelerated steered response power method for sound source localization using orthogonal linear array. Appl. Acoust. 71:134-139.
[28] DiBiase J.H., Silverman H.F., Brandstein M.S.
[16] Knapp C, Glifford C (1976) The generalized corre‐ (2001) Robust localization in reverberant rooms in
lation method for estimation of time delay. IEEE Microphone Arrays. Springer Berlin Heidelberg.
Acoust. Speech Signal Proc. 24:320-327.
[17] Azaria M, David H (1984) Time delay estimation by
generalized cross correlation methods. IEEE Acoust.
Speech Signal Proc. 32:280-285.

