Report Horn Detection
Submitted by
Arun Kumar Sahoo
19CS4162
December 2020
Contents
ABSTRACT
1. Introduction
2. Literature Survey
3. Methods Used
   3.1 Short Time Fourier Transform
   3.2 Datasets Used
Conclusion
References
ABSTRACT:
The aim of this paper is to introduce a horn detection system that can recognize a horn sound and its direction, for safer driving. This system can help deaf persons, as well as drivers who find it difficult to tell from which side a vehicle is overtaking them. Flashing lights are the visual cues deaf people rely on while driving, but a honking horn provides no visual cue. For the safety of deaf people, and to reduce society's prejudice against deaf people driving, there is a need for a device that will alert deaf drivers and also indicate from which side the sound is coming. For this, we propose a Short Time Fourier Transform (STFT) technique, which splits the audio into several segments of the same length and detects horn sounds by matching them against the background noise. We match only the segment frequencies of the sample data and the test data. When a segment matches within some threshold value, the method assigns it a score and reports the time period at which the horn actually sounds. The implementation uses several Python libraries and modules, including FFmpeg, an audio conversion tool used for loading the audio files.
1. Introduction:
Many research studies have been conducted by professional researchers to collect data about the importance of hearing for safe driving. Some people believe that hearing plays only a minor role in the overall driving task. Yet it is very difficult for many drivers to detect the presence of emergency vehicles, in particular when all the windows are closed and the car radio is running. This makes it difficult for the driver to react in time and free up space, and thus not obstruct a passing ambulance. This opens up an area important for maintaining safety on the road: detecting rescue vehicles during normal operation.
Research from 1999 dealt with the soundproof insulation of modern cars and the resulting attenuation of ambulance sirens. An ambulance was equipped with a siren that produced a noise level of 115 dB. The first test was carried out on a car moving at 60 km/h with the car radio and ventilation system turned off; the siren could be heard at a distance of 100 meters from the moving car. Turning on the air conditioning and radio sharply reduced the range of audibility, to 50 meters from the car. Once the radio volume was increased to a level of 90 dB, the siren, measured at about 82 dB, was nearly inaudible: it could only be heard up to a distance of 2 meters from the vehicle.
This project will help deaf persons find from which side a vehicle is overtaking. A deaf driver, or a driver who keeps the windows closed with loud music playing, finds it difficult to stay aware of the surroundings. To overcome this, sensors are used to sense horn sounds outside the vehicle and make the driver aware of the surroundings. The literature survey in this report states that the sound of a horn can be differentiated from other sounds within a range of distances, so the horn sound picked up by the sensor can be detected using the following methods. Our method, the Short Time Fourier Transform (STFT), gives us the times at which the horn sounds, and more information can be obtained by modifying the technique. I have taken a few samples for training purposes, and for our test data I obtained both the horn sounds and the time durations at which they occur. Our method essentially matches the frequency of each sample against the test data.
2. Literature Survey:
• From GeeksforGeeks we learned about the Fast Fourier Transform algorithm, which plays a very important role in computing the Discrete Fourier Transform of a sequence: it converts a space- or time-domain signal into a frequency-domain signal. The datasets I used were collected from different websites on the internet, and some datasets I recorded myself. From the paper by Cooley, James W., and John W. Tukey [1], and from other research papers, I got the idea of how to apply the Fourier transform in Python. From GitHub I got some idea of how audio detection is coded for different methods and of their working procedures.
• Other methods for detecting emergency vehicles are based on the measurement of sound or optical signals. One example is an integrated-circuit solution for the detection of acoustic signals in emergency road traffic, which performs sound detection using basic mathematical operations such as convolution and the FFT. The ASIC implementation showed good qualities in terms of size, power, current consumption, and SNR, but the development group warned that further testing under real conditions was needed. A Polish group of scientists from the Technical University of Lodz used basic signal processing methods to detect sirens: the FFT was applied, and then basic spectral parameters (the maximum, minimum, and average frequency) were calculated. The algorithm was deployed on an embedded platform. The published results of this method were not sufficient, because the authors did not account for the Doppler effect, and signal filtering was also applied. Another approach, acoustic-based emergency vehicle detection for intelligent transport systems, uses a microphone array arranged crosswise, together with mathematical methods such as correlation, the time delay of sound, the least squares method, and adaptive filtering, with a subsequent comparison of the detection methods and of how the sound source is located.
3. Methods Used:
3.1 Short Time Fourier Transform:
I have used the Short Time Fourier Transform (STFT) for detecting horn sounds. This method divides both the test data and the sample data into many segments according to the sample rate. It then compares the matching_score (a frequency-based similarity measure) of each segment of the test data against the segments of the sample data. Only if a test segment matches a sample segment does it assign a value, which is displayed in the result. I have taken a threshold value of 0.15 as matching_minscore, i.e. no matching_score is assigned until a segment matches at least 85 percent. The following description covers the algorithm and the working procedure of the STFT.
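A minimal sketch of how such segment matching might look, assuming the score is a normalized spectral distance so that a score of at most 0.15 corresponds to roughly an 85% match. The function name and scoring rule here are illustrative, not the report's actual code:

```python
import numpy as np

def match_segments(test_spec, sample_spec, min_score=0.15):
    """Compare STFT magnitude segments of test audio against a horn sample.

    A segment counts as a horn only when its normalized spectral distance
    to the sample is at most `min_score` (0.15 in the report, i.e. roughly
    an 85% match). Illustrative sketch, not the report's implementation.
    """
    matches = []
    for i, seg in enumerate(test_spec):
        # normalized distance between this segment's spectrum and the sample's
        score = np.linalg.norm(seg - sample_spec) / (np.linalg.norm(sample_spec) + 1e-12)
        if score <= min_score:
            matches.append((i, score))  # segment index and its matching score
    return matches
```

Matching segment indices can then be converted back to time spans using the segment length and overlap.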
The Short Time Fourier Transform (STFT) is a special form of the Fourier transform with which you can see how the frequencies in your signal change through time. It works by slicing your signal into many small segments and taking the Fourier transform of each of them. The result is usually plotted as a graph of frequency against time. I have used Python libraries here to implement the STFT technique.
STFT Algorithm:
1. Pick the current segment from the overall data using a sliding window.
2. Multiply the segment by a half cosine to fade its edges in and out.
3. Pad the end of the segment with zeros to twice its length.
4. Take the Fourier transform of the padded segment.
5. Compute the autopower spectrum, keeping only the lower half of the bins.
6. Scale the resulting spectrum into dB for easier viewing.
7. Clip the signal to remove noise past the noise floor which we don't care about.
Step 1 – Pick segment:
We need to find our current segment to process from the overall data set. We use the concept of
a ‘sliding window’ to help us visualize what is happening throughout the method. The data
inside the window is the current segment to be processed.
The window’s length remains same during the processing of the data, but the offset changes
with each step of the algorithm. Usually when processing the STFT, the change in offset will be
less than one window length, meaning that the last window and the current window overlap. If
we define the window size, and the percentage of overlap, we know all the information we need
about how the window moves throughout the processing.
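The sliding-window segmentation described above can be sketched as follows. This is a minimal illustration; the function name and the 50% default overlap are assumptions, not the report's code:

```python
import numpy as np

def segments(data, window_len, overlap=0.5):
    """Yield successive sliding windows over `data`. Consecutive windows
    overlap by `overlap`, a fraction of the window length."""
    hop = int(window_len * (1 - overlap))        # offset between windows
    for start in range(0, len(data) - window_len + 1, hop):
        yield data[start:start + window_len]

audio = np.arange(10)                            # stand-in for audio samples
windows = list(segments(audio, window_len=4))    # 50% overlap by default
```

With a window of 4 samples and 50% overlap, each window shares its last two samples with the next one.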
Step 2 – Multiply by a half cosine:
Multiplying by a half cosine helps to alleviate a problem created by segmenting the data. When the data is cut into pieces, the edges make a sharp transition that didn't exist before. Multiplying by a half cosine function fades the signal in and out so that the transitions at the edges do not affect the Fourier transform of the data.
[Figure: data multiplied by a half cosine]
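As a sketch, assuming the half cosine is the Hann window, as is common in STFT practice (the segment here is a stand-in, not real audio):

```python
import numpy as np

n = 1024                       # segment length, illustrative
segment = np.ones(n)           # stand-in for one segment of audio data

# A half-cosine (Hann) taper fades the segment in and out, so the cut
# edges don't introduce sharp transitions that distort the transform.
window = np.hanning(n)
tapered = segment * window
```

The taper is exactly zero at both edges and close to one in the middle, so only the boundaries of the segment are attenuated.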
Step 3 – Pad with zeros:
We pad the end of the current segment with a number of zeros equal to the length of the window; in other words, the new segment will be twice as long as the original segment.
Step 4 – Take the Fourier transform:
Here we finally take the Fourier transform of the data. The Fourier transform takes a signal and extracts the frequency content from that signal. These are some facts about the Fourier transform:
• The number of samples of data in is equal to the number of frequency bins out.
• The maximum frequency that can be represented is the Nyquist frequency, or half the sampling frequency of the data.
• In the Discrete Fourier Transform (what we are using), the order of the frequency bins is 0 Hz (the DC component), positive frequencies, the Nyquist frequency, then negative frequencies.
• The data from the Fourier transform needs to be scaled by the number of samples in the transform to maintain equal energy.
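The padding and transform steps above can be sketched with NumPy's FFT. The sampling rate and segment contents here are assumptions for illustration:

```python
import numpy as np

fs = 8000                                     # assumed sampling rate (Hz)
segment = np.hanning(1024)                    # stand-in for a tapered segment
padded = np.concatenate([segment, np.zeros(len(segment))])   # pad to 2x length

spectrum = np.fft.fft(padded) / len(padded)   # scale by the sample count
freqs = np.fft.fftfreq(len(padded), d=1.0 / fs)
# Bin order: 0 Hz (DC), positive frequencies up to the Nyquist frequency
# (fs / 2), then the negative frequencies.
```

Note that the number of frequency bins equals the number of (padded) samples, matching the first fact above.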
Step 5 – Autopower:
The autopower spectrum throws away the upper half of the array of data. This is why we padded the segment to twice its size: when the result is cut in half by the autopower transform, we still have the same number of frequency bins as samples in our segment. Note that having the segment size equal the number of frequency bins isn't a strict requirement of the algorithm; we could have chosen our pad length to be any number from zero (no padding) to infinity.
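A sketch of the autopower step, assuming it is computed as the squared magnitude of the lower half of the bins (the segment is again a stand-in):

```python
import numpy as np

segment = np.hanning(1024)                    # stand-in tapered segment
padded = np.concatenate([segment, np.zeros(len(segment))])
spectrum = np.fft.fft(padded) / len(padded)

# Autopower: squared magnitude, keeping only the lower half of the bins.
# Because the segment was padded to twice its length, the retained half
# still has as many bins as the original segment had samples.
autopower = np.abs(spectrum[:len(spectrum) // 2]) ** 2
```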
Step 6 – Scale into dB:
This is another transform to make the data easier to look at. It turns out that people's ears work on a logarithmic scale, so the ear can detect much finer changes in amplitude at low amplitudes than at high amplitudes. We transform the data into decibels (dB), a logarithmic scale, so that we can see the energy content of a signal more as our ears would detect it. Converting the data to decibels has the effect of stretching peaks downwards towards the average sound level and bringing troughs upwards. This allows us to compare content at all amplitude levels. If we didn't do this and scaled the data so that the peaks were visible, it would be impossible to see any shape in the quieter values; conversely, if we scaled for the quieter values, the peaks would explode off the top of the screen.
Step 7 – Clip the signal:
This also makes the data easier to look at. We know that everything below -40 dB is well below the noise floor and is probably just numerical error in the algorithm. Therefore, we can clip the data so that everything below -40 dB is set to -40 dB exactly. This gives us more color range to apply to the significant portions of our data.
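Steps 6 and 7 together, as a small NumPy sketch. The -40 dB floor is the value given above; the power values are illustrative:

```python
import numpy as np

autopower = np.array([1.0, 0.1, 1e-6, 1e-9])   # illustrative power values

db = 10 * np.log10(autopower)        # Step 6: decibel (logarithmic) scale
db_clipped = np.clip(db, -40, None)  # Step 7: clip below the -40 dB noise floor
```

The two quietest values, at -60 dB and -90 dB, both end up pinned at exactly -40 dB.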
SciPy (Scientific Python) package:
• Extends the functionality of NumPy.
• Uses NumPy arrays as the basic data structure.
• Provides modules for:
▪ Linear algebra
▪ Integration
▪ Ordinary differential equations
▪ Signal processing
SciPy – Fast Fourier Transform (fftpack):
The fftpack module allows us to check the signal's behavior in the frequency domain.
Applications:
▪ Signal and noise processing
▪ Image processing
▪ Audio processing
Here we have used audio processing with the STFT model.
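As an illustration of STFT-based audio processing with SciPy, the sketch below runs `scipy.signal.stft` on a synthetic tone. The sample rate, tone frequency, and segment length are assumptions for the example, not the project's actual settings:

```python
import numpy as np
from scipy import signal

fs = 22050                                    # assumed sample rate (Hz)
t = np.arange(fs) / fs                        # one second of samples
audio = np.sin(2 * np.pi * 440 * t)           # synthetic 440 Hz tone as a stand-in

# scipy's STFT returns the frequency bins, the segment times, and a
# frequencies-by-times matrix of complex coefficients.
freqs, times, Zxx = signal.stft(audio, fs=fs, nperseg=1024)
peak_bin = int(np.abs(Zxx).mean(axis=1).argmax())
```

The strongest frequency bin, averaged over time, lands on the bin closest to the 440 Hz tone, which is exactly the kind of time-frequency evidence the horn matcher relies on.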
Librosa:
Librosa is a powerful Python library built to work with audio and perform analysis on it. It is the starting point for working with audio data at scale, for a wide range of applications ranging from detecting a person's voice to finding personal characteristics from audio. It helps us load and analyze the audio files used in this project.
FFmpeg:
It is a free and open-source software project consisting of a large suite
of libraries and programs for handling video, audio, and other multimedia files and streams.
3.2 Datasets Used:
I have used 20 mp3 files as training datasets, of which I recorded 10 myself. Each dataset contains only horn sounds of short duration. Most of the datasets I used are short intervals of 2 to 5 seconds, to get higher accuracy in matching; short-interval audio files are best for training and for getting output by matching segments. Some audio files cover longer intervals, for testing purposes, because from them I can get output over more intervals. The algorithm matches each segment of the test data against the sample data and then gives the time duration of the matching interval. So when you run the program, it first loads the test data, then loads all the sample data and matches each segment.
Conclusion:
For the STFT segment matching I have used two mp3 files as test data, the first one containing a horn sound and the second one without a horn sound. In the results I observed both the presence and the absence of the horn sound. Due to the small number of training datasets I have not achieved the desired accuracy in detecting horns: some segments that contain only noise are also detected as horn sound, though with a lower matching score. This can be resolved by taking a larger number of datasets. The result is reported in the following form:
#human-start=0:00:15
#human-end=0:00:19
#sample=horn2.mp3
References
[1] Cooley, James W., and John W. Tukey, 1965, "An algorithm for the machine calculation of complex Fourier series," Math. Comput. 19: 297-301.
[2] J. Makhoul, 1980, "A Fast Cosine Transform in One and Two Dimensions," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28(1), pp. 27-34.
[3] Y. Cao, T. Iqbal, Q. Kong, Y. Zhong, W. Wang, M. D. Plumbley, "Event-Independent Network for Polyphonic Sound Event Localization and Detection," Detection and Classification of Acoustic Scenes and Events 2020.
[4] J. Bai, R. Wu, M. Wang, D. Li, D. Li, X. Han, Q. Wang, Q. Liu, B. Wang, Z. Fu, "CIAIC-BAD System for DCASE2018 Challenge Task 3," Detection and Classification of Acoustic Scenes and Events 2018.