
Direction Of Arrival Estimation using Microphone Array

Junu Jahana C, M.Tech scholar, Dept. of ECE, GEC Thrissur, Kerala, India
Dr. Sinith MS, Asst. Professor, Dept. of ECE, GEC Thrissur, Kerala, India
Dr. Lalu PP, Asst. Professor, Dept. of ME, GEC Thrissur, Kerala, India

2021 Fourth International Conference on Microelectronics, Signals & Systems (ICMSS) | 978-1-6654-4885-7/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICMSS53060.2021.9673617

Abstract—Human beings can tell the direction of sound by utilising both their ears. They can instinctively determine the direction of sound by combining the slightly different impulses that arrive at their ears. Similarly, an array of microphones connected to a computer may be used to create a sound localisation system. The basic idea behind utilising microphone arrays to estimate Direction Of Arrival (DOA) is to leverage phase information in signals picked up by spatially separated sensors (microphones). The acoustic signals arrive at the microphones with temporal delays when the microphones are spatially separated. These time delays are determined by the signal's DOA for a known array geometry. The audio signal is recorded using a miniDSP UMA-16 microphone array with a plug-and-play USB audio connection. For linear arrays, the angle between the array's orientation and the sound source is calculated here. Given that the sound signal arrives at each microphone at various times, corresponding to different propagation paths, it is reasonable to infer that the recorded signals at each microphone have a Time Difference Of Arrival (TDOA), which is an important factor in microphone array processing. With the aid of the UMA-16 microphone array, more visible and accurate sound source localization was achievable at lower sound power intensities, which is considered to be a significant contribution in the field of sound source localization. In addition, the DOA was calculated using an SVM classifier, which can coarsely categorise audio signals as left, right or front, and the performance metrics including accuracy, specificity and sensitivity are analysed.

Index Terms—Direction Of Arrival, Sound Source localization, Time Delay Estimate, SVM

I. INTRODUCTION

On a daily basis, humans are exposed to a variety of sound sources coming from diverse angles. Since human beings have two ears, we can determine the position of a sound source. We can also tell the difference between intermixed speech sources without much effort. Even though our ears pick up every sound source, our brain manages to focus quickly on a conversation with someone, while ignoring the rest to some extent.

This project focuses on a simple approach to detect the direction of a source with the use of a microphone array. Two such approaches are presented and evaluated in this work to assess their dependability. The system modelling is done using the computer tools Matlab and Anaconda Spyder (Python).

Microphone arrays provide spatial information about incoming acoustic waves by gathering important data that would be hard to get with single microphones[1][2]. The time delay of arrival between microphone array channels, which may be estimated using generalised cross correlation or least squares, provides a good basis for DOA estimation[3]. The DOA may be calculated simply from the array structure and the known TDOA[4]. Another method, as in the MUSIC algorithm, is to use the signal subspace[5]. These approaches are highly successful when working conditions are constrained. They do not, however, function well in extremely reverberant and loud settings, especially when signal sources are placed very close together. Researchers have recently applied modern machine learning approaches, such as classification networks, to speech DOA estimation in order to improve performance in noisy, realistic settings[6].

The Cross-Correlation approach, which cross-correlates the signals of two microphones to calculate the Time Difference Of Arrival (TDOA) between them, is the simplest fundamental technique utilised in the time domain[4]. For the estimation of DOA, Time Delay Estimation (TDE)-based techniques have proven the most prevalent. Because of their simplicity and minimal computing needs, they are popular[7]. Signals are processed in array systems with regard to the spatial geometry of the microphones and sources. As a result, in addition to the standard time and frequency characterizations of audio sources and receivers, locations and spatial pathways must be understood and accounted for in the processing[4].

The other method is based on support vector machines (SVMs). The SVMs' main characteristics are their mathematically rigorous formulation and high resilience[8]. It is vital to choose a method that allows quick calculation time when selecting an appropriate approach for finding a sound source in real time.

The source location will be determined using an SVM-based approach and an enhanced TDE-based DOA estimation methodology. With the use of a Uniform Linear Array (ULA) of microphones, this research focuses on a straightforward approach to detect the direction of a source.
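To make the cross-correlation idea concrete, the short Python sketch below estimates the TDOA between two microphone channels as the lag of the cross-correlation peak; it is a minimal sketch, and the signals and sampling rate used here are synthetic placeholders rather than data from this work.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_tdoa(x, y, fs):
    """Time delay (seconds) of channel y relative to channel x via cross-correlation."""
    xc = correlate(y, x, mode="full")                 # cross-correlate the two channels
    lags = correlation_lags(len(y), len(x), mode="full")
    return lags[np.argmax(np.abs(xc))] / fs           # lag of the correlation peak -> seconds

# Synthetic check: a broadband signal delayed by 10 samples on the second channel
fs = 44100
rng = np.random.default_rng(0)
ch1 = rng.normal(size=2048)
ch2 = np.concatenate([np.zeros(10), ch1])[:2048]      # simulate a 10-sample arrival delay
print(estimate_tdoa(ch1, ch2, fs))                    # ~10 / 44100 ≈ 2.3e-4 s
```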
II. DOA ESTIMATION OF AUDIO: AN OVERVIEW

The objective of Sound Source Localization (SSL) is to estimate the location of sound sources automatically. The computation is very simple when there is just one sound source, but it gets considerably more complicated when there are numerous sound sources.

The primary goal of this research is to use the signals recorded by microphone arrays to automatically determine the location of an audio source in a particular environment.

A. Microphone Array

A microphone array is a group of microphones that work together. The use of a microphone array for recording speech signals has a number of advantages over using a single microphone. For example, it captures spatial characteristics of the speech signal, increases the SNR, and may be used to steer the response in multiple directions, among other things[9]. Microphone arrays are utilised in a variety of cutting-edge acoustic signal processing techniques, including beamforming, automatic speech recognition, and speech signal separation. The array's response is also influenced by the shape of the microphone array. Microphone arrays of various geometrical forms are utilised in many applications[10]. Linear, circular, triangular, and spherical arrays are the most frequent geometrical configurations of microphones. The form of a microphone array is determined by the geometric pattern employed, and as a result, several microphone arrays have been developed, such as the linear microphone array, circular microphone array, triangular microphone array, spherical array, and so on, as shown in Fig. 1. Here (a) shows a ULA, in which all of the components are equally spaced and collinear; (b) shows a CMA, in which the components are arranged around a circle's perimeter; and (c) shows a TMA, in which the components or microphones are positioned at a triangle's vertices.

Fig. 1. Microphone array with a variety of geometries: (a) Uniform Linear Array (ULA), (b) Circular Microphone Array (CMA), (c) Triangular Microphone Array (TMA).

B. miniDSP UMA-16 microphone array

The UMA-16 is a sixteen-channel microphone array with a plug-and-play USB audio connection. The acoustic camera is made up of 16 microphones and a camera that is pointed in the direction of the scanning region. 16 SPH1668LM4H Knowles MEMS microphones are laid out in a Uniform Rectangular Array (URA) on the microphone array PCB. An optional USB camera, for applications like the acoustic camera, can be inserted into the centre hole.

C. Direction Of Arrival (DOA)

The direction of arrival (DOA) in signal processing refers to the direction from which a propagating wave arrives at a certain point of incidence. The propagating sound is deemed planar if it is produced by a source that is far away (far-field). This wavefront, together with the array, defines an angle of incidence, denoted θ. We may compute the DOA if we know the TDOA for each microphone as well as its spatial coordinates.

D. Audio recording using microphone array

The linear array was used to make the measurements. Each recording is 20-25 seconds long and sampled at 44.1 kHz. The microphone array was mounted on a tripod, making it very easy to position it towards the reference sound source.

Measurement Setup: The measurement equipment consists of:

• One miniDSP UMA-16 microphone array, of which 4 mic elements arranged along a straight line with uniform distances (-0.012 m, 0.004 m, 0.012 m, 0.004 m) are considered.
• One laptop PC, running MS Windows 10, MATLAB R2017a and Python 3.7.3.
• One reference sound source.
• Audacity sound recording software, used to record the sound signal from the microphone array (Audacity is a free and open-source digital audio editor and recording application).

The audio signals were recorded and analysed using Audacity. We can create our recording interface inside Audacity, and it may be used with any microphone array with ease.
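As a minimal illustration of how such a multichannel recording can be brought into Python for further processing (the system also uses MATLAB for this, as described in Section IV), the sketch below loads a 4-channel '.wav' export and separates the channels; the file name is a hypothetical placeholder, not a file supplied with the paper.

```python
import numpy as np
from scipy.io import wavfile

# Hypothetical file name for a 4-channel, 44.1 kHz export of one 20-25 s recording.
fs, data = wavfile.read("uma16_linear_4ch.wav")

print("sample rate:", fs)                 # expected: 44100
print("samples x channels:", data.shape)  # SciPy returns shape (num_samples, num_channels)

# Normalise 16-bit integer samples to floats in [-1, 1] for further processing.
if data.dtype == np.int16:
    data = data.astype(np.float32) / 32768.0

mic1, mic2, mic3, mic4 = (data[:, ch] for ch in range(4))
```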

III. SVM IN DOA ESTIMATION

Experimental data are used to evaluate an effective DOA estimation technique based on support vector regression. For the additional processing and classification using the SVM, the Anaconda Spyder environment is utilised with Python. SciPy, tkinter, TensorFlow, matplotlib, and other Python libraries are utilised in the overall system.

A. Preprocessing and Feature extraction

Numerous audio files in '.wav' format, from the left, right, and front directions, are in the collection. If the signal contains noise, a filtering process is used to eliminate the noise from the input signal. In this case, the audio signal is preprocessed with a 3rd-order Butterworth filter. A Butterworth filter is a signal processing filter with a frequency response in the passband that is as flat as possible; as a result, it is sometimes referred to as a 'maximally flat magnitude filter'. A Butterworth filter may be used to create an effective audio noise reduction tool. The DWT, which is based on subband coding, computes the wavelet transform in less time. The original signal may be reconstructed using a linear combination of wavelet functions and appropriate processing of the wavelet coefficients.

In this work, the Haar wavelet (db1) is used to extract features. The mean value of the sampled signal, the filtered signal, sound pressure level information, time of arrival, and the DWT feature are extracted from each frame of the audio using the feature extraction technique. The wavelet coefficients describe the signal's energy distribution in time and frequency in a simple manner. To further reduce the dimensionality of the produced feature vectors, statistics over the collection of wavelet coefficients are used. Thus temporal as well as spectral properties of non-stationary signals like audio can be efficiently extracted using the DWT.
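A minimal sketch of the preprocessing and feature-extraction steps described above, assuming SciPy and PyWavelets are available; the low-pass cutoff and the particular coefficient statistics are illustrative choices, not values specified in this work.

```python
import numpy as np
import pywt
from scipy.signal import butter, filtfilt

def preprocess(frame, fs, cutoff_hz=4000.0):
    """Zero-phase 3rd-order low-pass Butterworth filtering of one audio frame."""
    b, a = butter(3, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, frame)

def extract_features(frame, fs):
    """Means of the raw and filtered frame plus statistics of the Haar (db1) DWT coefficients."""
    filtered = preprocess(frame, fs)
    cA, cD = pywt.dwt(filtered, "db1")          # single-level Haar decomposition
    return np.array([
        np.mean(frame),                         # mean of the sampled signal
        np.mean(filtered),                      # mean of the filtered signal
        np.mean(np.abs(cA)), np.std(cA),        # statistics over the approximation coefficients
        np.mean(np.abs(cD)), np.std(cD),        # statistics over the detail coefficients
    ])
```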
B. SVM for audio signal classification

The main characteristics of SVMs are their rigorous mathematical formulation and excellent robustness, which means that they perform well even on input signals that were not included in the training set, after a training phase in which several known input/output mappings are used to determine the parameters of the SVMs.

We use the scikit-learn package in Python to implement the SVM. To begin, import the SVM module and use the SVC() method to build a support vector classifier object. After that, use fit() to fit the model on the training set, then predict on the test set using predict(). As a result of the SVM analysis, we can predict the direction of audio arrival, i.e. front, left or right sided, which is presented as the output in the Spyder console. Finally, the performance metrics including accuracy, specificity and sensitivity are determined, as explained in the results section.
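The workflow just described can be sketched as follows; the feature matrix and labels below are random placeholders standing in for the DWT feature vectors and the recorded front/left/right directions, so this is a schematic illustration rather than the actual training script.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Random placeholders standing in for the DWT feature vectors and the
# front/left/right labels of the recorded signals.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))                         # 150 frames x 6 features
y = rng.choice(["front", "left", "right"], size=150)  # coarse direction labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC()                    # build a support vector classifier object
clf.fit(X_train, y_train)      # fit the model on the training set
y_pred = clf.predict(X_test)   # predict on the test set
print(y_pred[:5])              # predicted directions, as printed in the Spyder console
```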
IV. TIME DELAY BASED DIRECTION OF ARRIVAL ESTIMATION

Because of their ease of use and minimal computational demands, time delay estimation (TDE)-based techniques have proven the most prevalent for the estimation of direction of arrival (DOA).

A. Microphone Array Signal Processing

Figure 2 shows a 4-element linear array of microphones and a sound source in the far field of the array. The array is made up of four microphones arranged in a straight line with a constant distance, d, between them. It is assumed that the sound source is in the array's far field, which means that the distance between the microphones and the source, S, is significantly larger than the array dimensions. Under this assumption, the spherical wavefront that originates from the source may be approximated as a plane wavefront, as illustrated in the figure. As a result, it is fair to say that the sound waves approaching each microphone are parallel. The broadside direction, often known as the broadside of the array, is the direction perpendicular to the array. The source's signal arrives at the microphones at various times, because each sound wave must travel a different distance to reach each microphone. In comparison to the signal incident on microphone M4, the signal incident on microphone M3 must travel an additional distance of d sin θ.

Fig. 2. Microphone Array with Far Field Source

The signal at microphone M3 is therefore a time-delayed replica of the signal at microphone M4. The same reasoning can be extended to the array's additional microphones. If we regard the microphones pairwise, the pair-wise time delays associated with the source will be the same. This assumes that the microphones are omnidirectional, meaning that the gain of the microphone does not depend on the orientation of the acoustic wavefront. This also means that a linear array can only tell at what angle the source lies relative to the array's line, but not where exactly it is around that line; this is referred to as array front-back ambiguity. A ULA may discern angles between -90° and +90° with respect to the array's broadside.
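Since the extra path length d sin θ corresponds to a time delay τ = d sin θ / c, a single microphone pair yields θ = arcsin(cτ/d). A minimal sketch of this conversion, assuming a speed of sound of 343 m/s and an illustrative 8 mm pair spacing (not a value taken from the measurement setup):

```python
import numpy as np

def doa_from_tdoa(tdoa_s, spacing_m, c=343.0):
    """Broadside DOA (degrees) of a far-field source from one microphone pair's TDOA.

    tdoa_s    : time difference of arrival between the pair, in seconds
    spacing_m : distance d between the two microphones, in metres
    c         : speed of sound in air (assumed 343 m/s)
    """
    sin_theta = np.clip(c * tdoa_s / spacing_m, -1.0, 1.0)  # guard against noisy estimates
    return float(np.degrees(np.arcsin(sin_theta)))

# Example with an assumed 8 mm pair spacing: a 10-microsecond delay
print(doa_from_tdoa(10e-6, 0.008))   # ≈ 25.4 degrees off broadside
```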

B. Methodology

A song was played once the UMA-16 was linked to the available USB connection. The song was recorded for 20-25 seconds. The audio signals were recorded and analysed using Audacity, a free and open-source digital audio editing and recording program.

The four-channel data was merged into a single wave file, which was then used in the Matlab code to compute the DOA. Frame-by-frame audio reading into the workspace is done with dsp.AudioFileReader, a Matlab function. The audio file reader keeps track of the sample rate of the audio file.

In fact, if a multichannel audio input interface is available, the script can be changed to set sourceChoice to 'live' so that live audio input signals may be used as well. The code utilises audioDeviceReader to collect four live audio channels using the mic array when sourceChoice = 'live'. Since we are using recorded audio signals, sourceChoice is set to 'recorded' in our case.

After selecting the source choice (live or recorded; here we use recorded), set the duration of live processing and also set how many samples per channel to acquire and process in each iteration. Then we have to define the array geometry.

The technique consists of computing the time delay estimates (TDE) between all pairs of microphones, and then integrating them with knowledge of the array geometry to produce the DOA estimate. TDE-based techniques are the most efficient in terms of computing needs since they do not require a thorough search over all potential angles.

A plotting object, DOA Display, is built as an aid in the application. As seen in the results, this displays the estimated DOA live with an arrow on a polar plot. Subsequently, the input samples are reordered based on the choice of microphone pairs. The cross-correlation function of two signals is computed first to determine the temporal delay between the signals from any pair of microphones. The temporal delay between the two signals is calculated as the lag at which the cross-correlation function reaches its maximum. To provide a finer DOA resolution, the cross-correlator is employed in combination with an interpolator. The program operates with separate pairs of microphones. The different DOA estimates are then combined to provide a single live DOA output; i.e., the median value is used to estimate the DOA across pairings. Finally, the result is displayed on a custom polar representation with arrows.
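The per-pair estimation and median fusion described above can be sketched in Python as follows (the system's own implementation of this step is in MATLAB); estimate_tdoa() and doa_from_tdoa() are the helper functions sketched in earlier sections, and the peak-interpolation refinement mentioned above is omitted for brevity.

```python
import numpy as np

def frame_doa(frame, mic_positions, fs, c=343.0):
    """Median-fused DOA estimate (degrees) for one multichannel frame.

    frame         : array of shape (num_samples, num_mics)
    mic_positions : element positions along the array axis, in metres
    Uses estimate_tdoa() and doa_from_tdoa() sketched in earlier sections.
    """
    num_mics = frame.shape[1]
    estimates = []
    for i in range(num_mics):
        for j in range(i + 1, num_mics):
            tdoa = estimate_tdoa(frame[:, i], frame[:, j], fs)   # per-pair time delay
            spacing = abs(mic_positions[j] - mic_positions[i])   # per-pair spacing
            estimates.append(doa_from_tdoa(tdoa, spacing, c))    # per-pair angle
    return float(np.median(estimates))                           # fuse with the median
```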
V. RESULTS AND ANALYSIS

A. Results and plots from SVM Method using Python

After selecting the audio data as input, it was preprocessed. If the signal contains noise, then filtering is undertaken to remove the initial noise in the input signal. A Butterworth filter is used in the preprocessing here. The input audio signal is shown in Fig. 3.

Fig. 3. Input test audio signal

Using the feature extraction approach, the mean value of the sampled signal, the filtered signal, and the DWT feature are retrieved from each frame of the audio. The wavelet coefficients are used to provide a concise representation of the signal's energy distribution in time and frequency. The approximation and detailed outputs of the DWT are depicted in Fig. 4 and Fig. 5, respectively.

Fig. 4. Approximation output of DWT

Fig. 5. Detailed output of DWT
After fitting the model and predicting, the audio signal is categorised as front-sided, left, or right, and the output is presented in the Spyder console. Along with that, the performance analysis of the SVM classifier is also done, as shown in Table I. Parameters like accuracy, sensitivity and specificity are measured.

TABLE I: Performance parameters obtained from the Spyder console

Performance parameter    Obtained Value (%)
Accuracy                 96.0784313
Specificity              98.039215
Sensitivity              94.11764
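For a three-class problem, accuracy, sensitivity and specificity can be derived from the confusion matrix; the sketch below shows one common macro-averaged formulation using scikit-learn, which is an assumption about the exact averaging used to produce Table I rather than the paper's own script.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

def classification_metrics(y_true, y_pred, labels=("front", "left", "right")):
    """Accuracy plus macro-averaged sensitivity and specificity from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    tp = np.diag(cm)                       # true positives per class
    fn = cm.sum(axis=1) - tp               # false negatives per class
    fp = cm.sum(axis=0) - tp               # false positives per class
    tn = cm.sum() - (tp + fn + fp)         # true negatives per class
    sensitivity = np.mean(tp / (tp + fn))  # per-class recall, averaged
    specificity = np.mean(tn / (tn + fp))  # per-class specificity, averaged
    return accuracy_score(y_true, y_pred), sensitivity, specificity
```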
B. TDE based DOA Result Analysis

Initially, the recorded '.wav' files were loaded. The Amplitude vs. Time relationship for a test sound recorded by the 4-channel linear microphone array was then plotted, as shown in Fig. 6.

Fig. 6. Amplitude vs. Time relationship for a multiple test sound recorded by 4 channel linear microphone array

In Fig. 6, data1 to data4 represent the audio signals recorded by Mic1 to Mic4 of the mic array, respectively. Each channel is represented by a different colour.

The algorithm operates with separate pairs of microphones, and the different DOA estimates are then combined to provide a single live DOA output. After combining the DOA estimates across pairs by selecting the median value, the DOA pointer is directed towards the median value of 44.529 degrees to the right of the origin, as shown in Fig. 7.

Fig. 7. DOA pointer showing audio signal coming from 44.529 degree towards right to the origin.

Consider two other recorded audio signals. The DOA of each signal with respect to the mic array is given in Fig. 8 and Fig. 9.

Fig. 8. DOA pointer showing audio signal coming from 1.214 degree towards right to the origin.

We can analyse the angular positional error by comparing the measured angle (calculated from the opposite side, adjacent side and hypotenuse) with the calculated angle, as shown in the following table:

TABLE II: Measured angle vs Calculated angle

Sl No.   Measured angle (Degree)   Calculated angle (Degree)   Angular Difference (Degree)   Percentage angular error (%)
1.       12.814                    13.32                       0.506                         3.94
2.       27.420                    28.273                      0.853                         3.11
3.       38.234                    36.720                      1.514                         3.95
4.       45.120                    44.529                      0.591                         1.30
5.       54.580                    56.530                      1.950                         3.570

This demonstrates that our microphone array system has the capability of accurately estimating the location of a sound source.
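For reference, the percentage angular error in Table II appears to be the angular difference taken relative to the measured angle, i.e. error(%) = |measured - calculated| / measured × 100; for the first row, 0.506 / 12.814 × 100 ≈ 3.9%, which matches the tabulated value.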

Fig. 9. DOA pointer showing audio signal coming from 28.273 degree towards right to the origin.

VI. CONCLUSION AND FUTURE SCOPE

An effective approach for detecting the DOA was implemented and analysed in this paper. The arrival direction was represented on a custom arrow-based polar representation (using the Audio Toolbox in Matlab). The microphone array employed is quite good at estimating the angle of arrival precisely. DOA estimation of the sound signal was also done effectively with an ML classification technique, using Python with Anaconda, by identifying right, left, and front-sided audio signals. For the supplied test signals, the direction of arrival was detected and shown appropriately in the console. The accuracy, sensitivity, and specificity performance characteristics were analysed and displayed. A greater number of microphones improves SSL quality, but it also increases the computing load, making real-time response difficult.

More microphone elements can provide a more precise position (including azimuth and elevation). By inserting a USB camera into the miniDSP UMA-16's centre hole, the DOA may be mapped onto captured images. Also, by using sophisticated acoustic techniques, DOA estimation for multiple sound sources may be integrated. Using a higher sample rate, upsampling, or altering the microphone configuration might enhance performance even further.

VII. ACKNOWLEDGEMENT

The first author is thankful to all the faculty of the Dept. of ECE, Govt. Engineering College, Thrissur, for their support, and extremely grateful to the second and third authors for their valuable guidance, support and suggestions throughout the project. The first author is also thankful to the NCRAI lab of GEC Thrissur for providing the required facilities throughout the project work.

REFERENCES

[1] McCowan, I., 2001. Microphone arrays: A tutorial. Queensland University, Australia, pp. 1-38.
[2] Alexandridis, A., Griffin, A. and Mouchtaris, A., 2013. Capturing and reproducing spatial audio based on a circular microphone array. Journal of Electrical and Computer Engineering, 2013.
[3] Fan, J., Luo, Q. and Ma, D., 2010. Localization estimation of sound source by microphones array. Procedia Engineering, 7, pp. 312-317.
[4] Davidsson, J., Postema, L., LTH, M.T. and Smith, D., 2019. Beamforming and Blind Signal Separation for Far-field Voice Capture using a Microphone Array.
[5] Randazzo, A., Abou-Khousa, M.A., Pastorino, M. and Zoughi, R., 2007. Direction of arrival estimation based on support vector regression: Experimental validation and comparison with MUSIC. IEEE Antennas and Wireless Propagation Letters, 6, pp. 379-382.
[6] Faye, A., Ndaw, J.D. and Sène, M., 2018. SVM-Based DOA Estimation with Classification Optimization. 2018 26th Telecommunications Forum (TELFOR), pp. 1-4, doi: 10.1109/TELFOR.2018.8611827.
[7] Varma, K.M., 2002. Time delay estimate based direction of arrival estimation for speech in reverberant environments (Doctoral dissertation, Virginia Tech).
[8] McCowan, I., 2001. Microphone arrays: A tutorial. Queensland University, Australia, pp. 1-38.
[9] McCowan, I., 2001. Microphone arrays: A tutorial. Queensland University, Australia, pp. 1-38.
[10] Nordholm, S., Abhayapala, T., Doclo, S., Gannot, S., Naylor, P. and Tashev, I., 2010. Microphone array speech processing.

