Direction of Arrival Estimation Using Microphone Array
Junu Jahana C, M.Tech scholar, Dept. of ECE, GEC Thrissur
Dr. Sinith MS, Asst. Professor, Dept. of ECE, GEC Thrissur
Dr. Lalu PP, Asst. Professor, Dept. of ME, GEC Thrissur
Abstract—Human beings can tell the direction of sound by utilising both their ears: they instinctively determine the direction of sound by combining the slightly different impulses that arrive at each ear. Similarly, an array of microphones connected to a computer may be used to create a sound localisation system. The basic idea behind utilising microphone arrays to estimate the Direction Of Arrival (DOA) is to leverage the phase information in signals picked up by spatially separated sensors (microphones). Because the microphones are spatially separated, the acoustic signals arrive at them with temporal delays, and for a known array geometry these time delays are determined by the signal's DOA. The audio signal is recorded using a miniDSP UMA-16 microphone array with a plug-and-play USB audio connection. For linear arrays, the angle between the array's orientation and the sound source is calculated here. Given that the sound signal arrives at each microphone at a different time, corresponding to different propagation paths, the recorded signals in each microphone exhibit a Time Difference Of Arrival (TDOA), which is an important factor in microphone array processing. With the aid of the UMA-16 microphone array, more visible and accurate sound source localization was achievable at lower sound power intensities, which is considered a significant innovation in the field of sound source localization. In addition, the DOA was estimated using an SVM classifier, which coarsely categorises audio signals as coming from the left, right or front, and the performance metrics including accuracy, specificity and sensitivity are analysed.

Index Terms—Direction Of Arrival, Sound Source Localization, Time Delay Estimate, SVM

I. INTRODUCTION

On a daily basis, humans are exposed to a variety of sound sources coming from diverse angles. Since human beings have two ears, we can determine the position of a sound source. We can also tell the difference between intermixed speech sources without much effort. Although our ears pick up every sound source, our brain manages to focus quickly on the person we are conversing with, while ignoring the rest to some extent.

This project focuses on a simple approach to detecting the direction of a source with the use of a microphone array. Two such approaches are presented and evaluated in this work to assess their dependability. The system modelling is done using the computer tools Matlab and Anaconda Spyder (Python).

Microphone arrays provide spatial information about incoming acoustic waves by gathering data that would be hard to obtain with single microphones[1][2]. The time delay of arrival between microphone array channels, which may be estimated using generalised cross-correlation or least squares, is a good basis for DOA estimation[3]. The DOA may then be calculated simply from the array structure and the known TDOA[4]. Another method, as in the MUSIC algorithm, is to use the signal subspace[5]. These approaches are highly successful when working conditions are constrained. They do not, however, perform well in highly reverberant and noisy settings, especially when signal sources are placed very close together. Researchers have recently applied modern machine learning approaches, such as classification networks, to speech DOA estimation in order to improve performance in noisy, realistic settings[6].

The cross-correlation approach, which cross-correlates the signals of two microphones to calculate the Time Difference Of Arrival (TDOA) between them, is the simplest technique used in the time domain[4]. For the estimation of DOA, Time Delay Estimation (TDE)-based techniques have proven the most prevalent; they are popular because of their simplicity and minimal computing needs[7]. Signals in array systems are processed with regard to the spatial geometry of the microphones and sources. As a result, in addition to the standard time and frequency characterisations of audio sources and receivers, locations and spatial paths must be understood and accounted for in the processing[4].

The other method is based on support vector machines (SVMs). The main characteristics of SVMs are their mathematically rigorous formulation and high resilience[8]. It is vital to choose a method that allows quick calculation when selecting an approach for locating a sound source in real time.

The source location will be determined using an SVM-based approach and an enhanced TDE-based DOA estimation methodology. With the use of a Uniform Linear Array (ULA) of microphones, this research focuses on a straightforward approach to detecting the direction of a source.

II. DOA ESTIMATION OF AUDIO: AN OVERVIEW

The objective of Sound Source Localization (SSL) is to estimate the location of sound sources automatically. The computation is very simple when there is just one sound source, but it gets considerably more complicated when there are numerous sound sources.
The primary goal of this research is to use the signals
recorded by microphone arrays to automatically determine
the location of an audio source identified in a particular
environment.
A. Microphone Array
A microphone array is a group of microphones that work together. The use of a microphone array for recording speech signals has a number of advantages over using a single microphone: for example, it captures spatial characteristics of the speech signal, increases the SNR, and may be used to steer the response in multiple directions, among other things[9]. Microphone arrays are utilised in a variety of cutting-edge acoustic signal processing techniques, including beamforming, automatic speech recognition, and speech signal separation.

The array's response is also influenced by the shape of the microphone array, and microphone arrays of various geometrical forms are utilised in many applications[10]. Linear, circular, triangular, and spherical arrays are the most frequent geometrical configurations. The form of a microphone array is determined by the geometric pattern employed, and as a result several microphone arrays have been developed, such as the linear microphone array, circular microphone array, triangular microphone array, spherical array, and so on, as shown in Fig. 1. Here (a) shows a ULA, in which all of the components are equally spaced and collinear, (b) shows a CMA, in which the components are arranged around a circle's perimeter, and (c) shows a TMA, in which the components or microphones are positioned at the triangle's vertices.

Fig. 1. Microphone array with a variety of geometries. (a) ULA (b) CMA (c) TMA

B. miniDSP UMA-16 microphone array

The UMA-16 is a sixteen-channel microphone array with a plug-and-play USB audio connection. The acoustic camera is made up of the 16 microphones and a camera that is pointed in the direction of the scanning region. 16 Knowles SPH1668LM4H MEMS microphones are laid out in a Uniform Rectangular Array (URA) on the microphone array PCB. An optional USB camera, for applications like the acoustic camera, can be inserted into the centre hole.

C. Direction Of Arrival (DOA)

The direction of arrival (DOA) in signal processing refers to the direction from which a propagating wave arrives at a certain point of incidence. The propagating sound is deemed planar if it is produced by a source that is far away (far field). These planar wavefronts, together with the array, create an angle of incidence, which is specified by the angle θ. We may compute the DOA if we know the TDOA for each microphone as well as its spatial coordinates.

D. Audio recording using microphone array

The linear array was used to make the measurements. Each recording is 20-25 seconds long and sampled at 44.1 kHz. The microphone array was mounted on a tripod, making it very easy to position it towards the reference sound source.

Measurement Setup: The measurement equipment consists of:
• One miniDSP UMA-16 microphone array, of which 4 mic elements arranged along a straight line with uniform distances (-0.012 m, 0.004 m, 0.012 m, 0.004 m) are considered.
• One laptop PC, running MS Windows 10, MATLAB R2017a and Python 3.7.3.
• One reference sound source.
• Audacity sound recording software, used to record the sound signal from the microphone array (Audacity is a free and open-source digital audio editor and recording application).

The audio signals were recorded and analysed using Audacity, a free and open-source digital audio editing and recording program. We can create our recording interface inside Audacity, and it may be used with any microphone array with ease.

III. SVM IN DOA ESTIMATION

Experimental data are used to evaluate an effective DOA estimation technique based on support vector regression. For additional processing and classification using the SVM, the Anaconda Spyder environment is utilised with Python. SciPy, tkinter, TensorFlow, matplotlib, and other Python libraries are utilised in the overall system.

A. Microphone Array Signal Processing

Figure 2 shows a 4-element linear array of microphones and a sound source in the far field of the array.
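To make the far-field relation concrete: for one pair of microphones separated by a distance d, a plane wave arriving from angle θ (measured from broadside) produces a time difference τ = d·sin(θ)/c, so the angle can be recovered as θ = arcsin(cτ/d). The following Python sketch illustrates this mapping; the spacing and delay values are assumptions for illustration, not the exact MATLAB routine used in this work.

# Illustrative mapping from a time difference of arrival (TDOA) to a far-field DOA
# for one pair of a uniform linear array. Spacing and delay are assumed values.
import numpy as np

C = 343.0   # approximate speed of sound in air (m/s)
D = 0.008   # assumed spacing between adjacent microphones (m)

def doa_from_tdoa(tau, d=D, c=C):
    """Broadside angle (degrees) implied by a TDOA of tau seconds."""
    s = np.clip(c * tau / d, -1.0, 1.0)   # guard against |c*tau/d| > 1 caused by noise
    return np.degrees(np.arcsin(s))

print(doa_from_tdoa(10e-6))   # a 10 microsecond delay across 8 mm gives about 25.4 degrees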
The array signals are recorded using Audacity, a free and open-source digital audio editing and recording program.

The four-channel data was merged into a single wave file, which was then used in the Matlab code to compute the DOA. Frame-by-frame audio reading into the workspace is performed with dsp.AudioFileReader, a Matlab System object. The audio file reader keeps track of the sample rate of the audio file.

In fact, if a multichannel audio input interface is available, the script can be changed to set sourceChoice to 'live' so that live audio input signals may be used as well. The code utilises audioDeviceReader to collect four live audio channels from the mic array when sourceChoice = 'live'. Since we are using recorded audio signals, sourceChoice is set to 'recorded' in our case.

After selecting the source choice (live or recorded; here we are using recorded), set the duration of live processing and also set how many samples per channel to acquire and process in each iteration. Then we have to define the array geometry.

Fig. 3. Input test audio signal (amplitude vs. samples)
Actually, the technique consists of computing the time delay estimates (TDE) between all pairs of microphones and then combining them with knowledge of the array geometry to produce the DOA estimate. TDE-based techniques are the most efficient in terms of computing needs, since they do not require an exhaustive search over all potential angles.
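A generic version of this pairwise time-delay estimation step can be sketched with the GCC-PHAT estimator; the code below is an illustration under the far-field assumption, not the exact routine used in this work.

# Sketch of pairwise time-delay estimation with GCC-PHAT and conversion of each
# pair's delay into an angle (illustrative; constants as in the earlier sketches).
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """PHAT-weighted GCC estimate of the relative delay (seconds) between x and y."""
    n = 2 * len(x)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau) + 1, n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

def pairwise_doas(frame, fs, mic_pos, c=343.0):
    """DOA estimate (degrees) from every microphone pair of one frame."""
    angles = []
    for i in range(frame.shape[1]):
        for j in range(i + 1, frame.shape[1]):
            d = abs(mic_pos[j] - mic_pos[i])
            tau = gcc_phat_tdoa(frame[:, i], frame[:, j], fs, max_tau=d / c)
            angles.append(np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0))))
    return angles

With spacings of a few millimetres to centimetres and a 44.1 kHz sampling rate, the integer-sample delay resolution is coarse, so practical implementations interpolate the correlation peak or favour the wider microphone pairs; combining the pairwise estimates by taking their median, as done later in this work, also suppresses outliers.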
A plotting object, DOA Display, is built as an aid in the application. As seen in the results, this displays the estimated DOA live with an arrow on a polar plot. The recorded input test signal is shown in Fig. 3.
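A comparable live display can be sketched with matplotlib; the snippet below is only an assumed Python counterpart of the plotting aid described above.

# Minimal polar-plot DOA pointer, analogous to the plotting aid described above.
import numpy as np
import matplotlib.pyplot as plt

def show_doa(angle_deg):
    """Draw an arrow from the origin towards the estimated DOA (in degrees)."""
    ax = plt.figure().add_subplot(111, projection="polar")
    ax.annotate("", xy=(np.radians(angle_deg), 1.0), xytext=(0.0, 0.0),
                arrowprops=dict(arrowstyle="->", linewidth=2))
    ax.set_rticks([])                      # only the direction is of interest
    ax.set_title("Estimated DOA: %.1f degrees" % angle_deg)
    plt.show()

show_doa(44.5)   # e.g. close to the median estimate reported in the results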
Using the feature extraction approach, the mean value of the sampled signal, the filtered signal and the DWT features are retrieved from each frame of the audio. The wavelet coefficients provide a concise representation of the signal's energy distribution in time and frequency. The approximation and detail outputs of the DWT are depicted in Fig. 4 and Fig. 5 respectively.

Fig. 5. Detailed output of DWT
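The per-frame feature extraction can be sketched as follows; PyWavelets (pywt) is assumed for the DWT, and the wavelet and filter settings are illustrative rather than those used in this work.

# Sketch of the per-frame features described above: mean of the raw frame, mean of a
# filtered version, and means of the DWT approximation and detail coefficients.
# The 'db4' wavelet and the 4 kHz low-pass cut-off are assumptions for illustration.
import numpy as np
import pywt
from scipy.signal import butter, lfilter

def frame_features(frame, fs):
    b, a = butter(4, 4000.0 / (fs / 2.0), btype="low")   # assumed low-pass filter
    filtered = lfilter(b, a, frame)
    cA, cD = pywt.dwt(frame, "db4")                      # single-level DWT of the frame
    return np.array([frame.mean(), filtered.mean(), cA.mean(), cD.mean()])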
After fitting the model and predicting, the audio signal is categorised as front-sided, left microphone, or right microphone, and the output is presented in the Spyder console. Along with this, the performance parameters obtained from the Spyder console are given in Table I.
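The classification stage can be sketched with scikit-learn's SVC; this is an assumed implementation detail, since the text only states that an SVM classifier is trained in the Spyder environment and that the coarse classes are left, front and right.

# Illustrative SVM training and prediction for the coarse left / front / right decision.
# scikit-learn is assumed; X holds one feature vector per labelled recording.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

LABELS = {0: "left microphone", 1: "front-sided", 2: "right microphone"}

def train_and_classify(X, y, X_new):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = SVC(kernel="rbf", C=1.0)           # assumed kernel and regularisation settings
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return [LABELS[int(p)] for p in clf.predict(X_new)]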
TABLE I: Performance parameters obtained from the Spyder console

Performance parameter   Obtained Value (%)
Accuracy                96.0784313
Specificity             98.039215
Sensitivity             94.11764

This demonstrates that our microphone array system has the capability of accurately estimating the location of a sound source.
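For reference, accuracy, sensitivity and specificity of the kind listed in Table I can be obtained from a confusion matrix in a one-vs-rest fashion; the sketch below (using scikit-learn, an assumption) only illustrates the definitions and does not reproduce the exact evaluation script.

# Generic accuracy / sensitivity / specificity computation from a confusion matrix,
# treating one chosen class as "positive" (one-vs-rest).
import numpy as np
from sklearn.metrics import confusion_matrix

def binary_rates(y_true, y_pred, positive_label):
    y_t = np.asarray(y_true) == positive_label
    y_p = np.asarray(y_pred) == positive_label
    tn, fp, fn, tp = confusion_matrix(y_t, y_p, labels=[False, True]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)             # true-positive rate (recall)
    specificity = tn / (tn + fp)             # true-negative rate
    return accuracy, sensitivity, specificity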
Fig. 6. Amplitude vs. time relationship for a multiple test sound recorded by the 4-channel linear microphone array

In Fig. 6, data1 to data4 represent the audio signals recorded by Mic1 to Mic4 of the mic array respectively. Each channel is represented by a different colour.

The algorithm operates with separate pairs of microphones. The different DOA estimates are then combined to provide a single live DOA output.

After combining the DOA estimates across pairs by selecting the median value, the DOA pointer is directed towards the median value of 44.529 degrees to the right of the origin, as shown in Fig. 7.

Consider another recorded audio signal. The DOA of the signal with respect to the mic array is given in Fig. 8 and Fig. 9.

Fig. 8. DOA pointer showing the audio signal coming from 1.214 degrees to the right of the origin.

We can analyse the angular positional error by comparing the measured angle (calculated from the opposite side, adjacent side and hypotenuse) with the calculated angle, as shown in the following table:
Fig. 9. DOA pointer showing the audio signal coming from 28.273 degrees to the right of the origin.

VII. ACKNOWLEDGEMENT

The first author is thankful to all the faculty of the Dept. of ECE, Govt. Engineering College, Thrissur, for their support, and is extremely grateful to the second and third authors for their valuable guidance, support and suggestions throughout the project. The first author is also thankful to the NCRAI lab of GEC Thrissur for providing the required facilities throughout the project work.

REFERENCES

[1] McCowan, I., 2001. Microphone arrays: A tutorial. Queensland University, Australia, pp. 1-38.
[2] Alexandridis, A., Griffin, A. and Mouchtaris, A., 2013. Capturing and reproducing spatial audio based on a circular microphone array. Journal of Electrical and Computer Engineering, 2013.
[3] Fan, J., Luo, Q. and Ma, D., 2010. Localization estimation of sound source by microphones array. Procedia Engineering, 7, pp. 312-317.
[4] Davidsson, J., Postema, L., LTH, M.T. and Smith, D., 2019. Beamforming and Blind Signal Separation for Far-field Voice Capture using a Microphone Array.
[5] Randazzo, A., Abou-Khousa, M.A., Pastorino, M. and Zoughi, R., 2007. Direction of arrival estimation based on support vector regression: Experimental validation and comparison with MUSIC. IEEE Antennas and Wireless Propagation Letters, 6, pp. 379-382.
[6] Faye, A., Ndaw, J.D. and Sène, M., 2018. SVM-based DOA estimation with classification optimization. 2018 26th Telecommunications Forum (TELFOR), pp. 1-4, doi: 10.1109/TELFOR.2018.8611827.
[7] Varma, K.M., 2002. Time delay estimate based direction of arrival estimation for speech in reverberant environments (Doctoral dissertation, Virginia Tech).
[8] McCowan, I., 2001. Microphone arrays: A tutorial. Queensland University, Australia, pp. 1-38.
[9] McCowan, I., 2001. Microphone arrays: A tutorial. Queensland University, Australia, pp. 1-38.
[10] Nordholm, S., Abhayapala, T., Doclo, S., Gannot, S., Naylor, P. and Tashev, I., 2010. Microphone array speech processing.