
CSCI E-81 Machine Learning & Data Mining Final Project Fall 2016

Urban Sound Classification:


With Random Forest, SVM, DNN, RNN, and CNN Classifiers

Chih-Wei Chang
Harvard University
Cambridge, MA, USA
[email protected]

Benjamin Doran
Harvard University
Cambridge, MA, USA
[email protected]

ABSTRACT

In this paper we describe two methods of extracting features from sound data, one focusing on maintaining the time-series nature of the audio sequence, the other expanding signal characteristics. We go on to detail the effectiveness of different models on each method, including tests of Random Forests, Naïve Bayes, Support Vector Machines, and neural network architectures such as the deep neural network, convolutional neural network, and recurrent neural network. Additional steps such as feature engineering, and future research directions, are touched on briefly.

CCS Concepts
• Computing Methodologies → Machine Learning → Machine Learning Algorithms → Feature Selection
• Computing Methodologies → Machine Learning → Machine Learning Approaches → Neural Networks

Keywords
Mel Frequency Cepstral Coefficients (MFCC); Filterbanks; Sound Classification; Feature Extraction; Neural Networks

1. INTRODUCTION
Automatic sound classification has been a field of growing research. In particular, the sonic analysis of environmental sounds has generated increasing research because of its many applications to large-scale content-based multimedia indexing and retrieval. However, sonic analysis research has focused mostly on music or speech recognition. Not only is there little work on environmental sound, there are also very few databases of labeled environmental audio data. One of the few free large sound datasets is the UrbanSound dataset created by Justin Salamon, Christopher Jacoby, and Juan Pablo Bello in 2014. The UrbanSound dataset is unique in that its classification is not based only on the auditory scene type, such as nature, human, or animal, but on the source of the sound, such as dog bark or car horn.

One of the main challenges facing sound data classification is feature extraction. The features of sound data cannot be expressed in vector form as readily as those of other data types such as images and text, so feature extraction for sound data is less straightforward. In this project, we applied different feature extraction techniques and compared model performance on the resulting feature sets. We applied two categories of feature extraction techniques: signal characteristic feature extraction and time series feature extraction. The first category of extraction approaches is the expansion and isolation of important "characteristics" from each sample, such as the Mel Frequency Cepstral Coefficients (MFCC), spectrograms, spectral contrasts, and tonal centroid features. The second category of feature extraction is the filterbank and log-filterbank methods. These extraction techniques filter important information in each of the smaller time segments (frames) of the original signal, thus preserving the time-series format of the raw data.

We applied general classifiers such as Random Forest, Naïve Bayes, and Support Vector Machine, as well as neural network models such as the deep neural network, convolutional neural network, and recurrent neural network.

Section 2 of this paper describes the UrbanSound dataset in more detail. Section 3 discusses the different feature extraction techniques. Section 4 presents the models, and Section 5 concludes this paper.

2. DATASET
We use the UrbanSound8K dataset by Justin Salamon, Christopher Jacoby, and Juan Pablo Bello [1]. The UrbanSound8K dataset contains 8732 real-field recording samples from 10 classes of different sound sources: air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music. Most of the samples have durations of 4 seconds, but some of them can be as short as 2 seconds. The total length of the 8732 samples is 8.75 hours. Counts of samples in each class are displayed in Figure 1.

Figure 1. Number of occurrences of sounds in the different classes.

The creators of UrbanSound8K have pre-divided the 8732 samples into 10 subgroups for 10-fold cross validation. Because the 8732 sound excerpts are cropped from a smaller number of longer recordings, randomly partitioning the 8732 samples for cross validation can produce over-optimistic results: excerpts from the same original recording can appear in both the training and validation/test sets. The pre-divided 10 folds avoid this issue and keep the classes in each fold roughly balanced.

The downloaded UrbanSound dataset contains 8732 .wav files of different lengths and resolutions. We parsed all the .wav files and used the Python SoundFile library (https://pypi.python.org/pypi/SoundFile/) to convert the .wav files into matrices of numbers.
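As an illustrative sketch of this loading step (not necessarily the exact code used in this project; the file path is a placeholder), a clip can be read into a numeric array as follows:

```python
import soundfile as sf

# Read one UrbanSound8K clip (illustrative path) into a numeric array.
# `data` has shape (n_samples,) for mono clips or (n_samples, n_channels)
# for multi-channel clips; `rate` is the clip's native sample rate in Hz.
data, rate = sf.read("UrbanSound8K/audio/fold1/example.wav")
print(data.shape, rate)
```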

Figure 2. Plot of the raw data for a sample from each class.

3. FEATURE EXTRACTION
Feature extraction for sound data is less straightforward than for other data formats. The parsed raw data of a 4 second sample contains 44,100 values. Although each signal from the different classes shows its own distinct patterns, it would be prohibitively expensive to handle such a long sample directly. Feature extraction from the raw data is hence critical for building classifiers.

The first category of our feature extraction approaches is to extract characteristics from each sample, so that the number of features is fixed regardless of the shape of the raw data.

One commonly used feature extraction technique in speech recognition is isolating the Mel Frequency Cepstral Coefficients (MFCC). It is also widely used in environmental sound analysis and has become a competitive baseline for benchmarking new techniques. The steps to extract MFCCs, as summarized from James Lyons's Mel Frequency Cepstral Coefficient (MFCC) tutorial [2], are:

1. Frame the signal into short frames of 20 to 40 milliseconds, with each frame overlapping its neighboring frames by 50%.
2. For each frame, calculate the periodogram estimate of the power spectrum. This step records which frequencies are present in each frame.
3. Apply the mel filterbank to the power spectra and sum the energy in each filter. This step takes clumps of periodogram bins and sums them up to further reduce the number of features.
4. Take the logarithm of all filterbank energies. This step is motivated by the fact that humans do not perceive loudness on a linear scale.
5. Take the discrete cosine transform (DCT) of the log filterbank energies. This step de-correlates the overlapping frames. It is essential for an HMM classifier to work but is not as relevant for other classifiers.
6. Keep DCT coefficients 2-13 and discard the rest. Experimental results show that dropping the coefficients above the 13th improves performance in automatic speech recognition.

There are other commonly used feature extraction methods, such as the Mel-scaled spectrogram, chromagram, spectral contrast, and the tonal centroid features. The Librosa library has functions to extract all of the above characteristics. We ended up extracting a total of 193 characteristics (features) for each sample using these methods.
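As an illustration of how such a 193-value characteristic vector can be assembled with Librosa, consider the sketch below. This is a reconstruction rather than the exact code used in the project: the split into 40 MFCCs, 128 mel bands, 12 chroma bins, 7 spectral-contrast bands, and 6 tonal-centroid dimensions (each averaged over time) is an assumption that happens to sum to 193.

```python
import numpy as np
import librosa

def extract_characteristics(path):
    """Collapse one clip into a fixed-length vector of summary characteristics."""
    y, sr = librosa.load(path)             # librosa resamples to 22,050 Hz by default
    stft = np.abs(librosa.stft(y))

    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)           # 40
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)             # 128
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)          # 12
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr), axis=1)  # 7
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr),
                      axis=1)                                                     # 6

    return np.hstack([mfcc, mel, chroma, contrast, tonnetz])                      # 193 values
```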
The second category of our feature extraction approaches is filterbanks. This approach allows us to keep the time-series attribute of the raw data (i.e., the extracted series of features are themselves time series).

Computing filterbanks and MFCCs involves largely the same procedure: in both cases filterbank energies are computed, and with a few extra steps the MFCCs can be obtained [3]. To obtain MFCCs, a discrete cosine transform (DCT) is applied to the filterbanks, retaining a number of the resulting coefficients.

Figure 3. Log spectrogram filterbanks.

Because of the way we partition each sample into overlapping frames, each filterbank we extract is highly correlated with its neighboring filterbanks. This autocorrelation can be problematic for some classifiers, such as the hidden Markov model, but since we will be implementing classifiers from the neural network family, the autocorrelation is not an issue. It might even be beneficial for the classifiers to learn directly from the signal in the time domain.
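A minimal sketch of this frame-based extraction is shown below, using the python_speech_features package as one possible implementation (the 25 ms window, 10 ms step, 26 filters, and 13 cepstral coefficients are common defaults, not values taken from this project's experiments):

```python
import soundfile as sf
from python_speech_features import logfbank, mfcc

signal, rate = sf.read("example.wav")        # illustrative path
if signal.ndim > 1:                          # mix multi-channel clips down to mono
    signal = signal.mean(axis=1)

# Log filterbank energies: one row per 25 ms frame (10 ms step), 26 filters,
# so the time-series structure of the clip is preserved.
fbanks = logfbank(signal, samplerate=rate, winlen=0.025, winstep=0.01,
                  nfilt=26, nfft=2048)

# MFCCs: the same framing plus the DCT and coefficient-selection steps above.
mfccs = mfcc(signal, samplerate=rate, winlen=0.025, winstep=0.01,
             numcep=13, nfft=2048)

print(fbanks.shape, mfccs.shape)             # (n_frames, 26) and (n_frames, 13)
```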

The challenge we faced in extracting the filterbank features is the differently sized raw data: our samples include sound excerpts of different lengths, resolutions, and numbers of channels, so the shape of the filterbank features also varies across samples. Because neither Sklearn nor Tensorflow allows varying data shapes, we needed to make the size of the extracted features identical across samples. Our first solution is zero padding, making the shorter signals as long as the longest ones. Our second solution is to cut each sample into the same number of windows (frames), so that each sample has the same number of windows regardless of the length of the original sound; the frames of shorter samples simply overlap each other more.
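Both workarounds can be sketched in a few lines of NumPy (illustrative only; the frame counts and window lengths are placeholders rather than the exact values used in this project):

```python
import numpy as np

def zero_pad(fbank, max_frames):
    """Solution 1: pad a (frames x filters) feature matrix with rows of zeros
    so that every sample reaches the frame count of the longest sample."""
    missing = max_frames - fbank.shape[0]
    return np.pad(fbank, ((0, missing), (0, 0)), mode="constant")

def fixed_windows(signal, n_windows=41, win_len=1024):
    """Solution 2: cut a 1-D signal into a constant number of windows.
    Shorter clips get more heavily overlapping windows, so the result is
    always (n_windows, win_len) regardless of clip length (assumes the
    clip is at least win_len samples long)."""
    starts = np.linspace(0, len(signal) - win_len, n_windows).astype(int)
    return np.stack([signal[s:s + win_len] for s in starts])
```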
4. MODELS
4.1 Comparison of Neural Networks across Data Sets
We tested three different neural network architectures: a Recurrent Neural Network (RNN), a Deep Neural Network (DeepNN), and a Convolutional Neural Network (CNN). We kept each model simple initially, using at most 3 hidden layers for our DeepNN and 2 for our RNN and CNN. As we see in the comparison plot (Figure 4), our const_MFCC and features193 datasets gave us the best accuracy across all models, with features193 beating const_MFCC by an average of about 5 percentage points.

Figure 4. The features193 and const_MFCC sets gave us the best performance. Among those sets our CNN model gave the best accuracy, with 58% on const_MFCC and 64% on features193 (we later optimized it to 72.8% accuracy on features193).

The only model we were unable to get working on a dataset was the CNN on the flat0pad training set, which has 27,600 features per sample. Our laptop (i5 CPU, 8 GB RAM) persistently froze and/or gave memory errors when attempting to run that particular model.

A large part of why flat0pad did not do well is simply that it took too long to run models on it. Even for our DeepNN, it could take over two hours to train for 5,000 epochs; the RNN could take over seven hours. This meant that we simply did not have as much time to tune these models to the extent we could the others. To illustrate the point more clearly, the fastest training time we got was 30 minutes, with our DeepNN and CNN on the features193 training set. The longest was over 25 hours, for our models on the flat0pad training set. This difference means that we had roughly 50 more chances to turn over and tune our models on the features193 training set than on the flat0pad set. The difference in time is understandable considering that the flat0pad training set is over 140 times larger than the features193 set; its size, however, did not bring enough new information about the samples to compensate for the slowness of training.

We did not see as much advantage to maintaining the time-series nature of the data as expected. In the comparison between const_MFCC (a filterbank method) and features193 (the signal characteristic isolation method), the features193 training set gave us moderately better performance. One factor is model turn-over rate: computing on 193 features versus 820 total features. Another factor may be the shape of the const_MFCC training set. The filterbank extraction split each audio file into smaller segments, meaning that any model learning on that time series was learning on the smaller patterns it contains and not on the overall pattern of the audio sequence. In contrast, for the flat0pad data (also a filterbank method) we flattened the data into a single time series, which let our RNN learn on the overall pattern instead of the smaller segments. And as we can see in Figure 4, the RNN gave better performance relative to the DeepNN on the flattened dataset (flat0pad) than on the split dataset (const_MFCC).

The feature engineering we attempted with the const_LogMFCC dataset failed because the const_MFCC data, from which the const_LogMFCC dataset is produced, contains negative numbers, meaning that we were losing half the data when converting the dataset. We tried to correct that by taking the log of the absolute values, yet we were still losing half the scale, which is the result we see in the figure above. Unfortunately, even correctly scaling the data by shifting it all to positive values before taking the log, log(data − min + 1), did not give results that were any better, because the function condenses the upper bounds into the middle. While our manual attempts at feature engineering were failures, we did have success using Sci-Kit Learn's standard scaler, which added about another 3-4 percentage points of accuracy for our CNN and DeepNN (though we only had time to test this result on the features193 dataset).

With only 8732 samples in total, overfitting was a major issue with this dataset. In our RNN and DeepNN models especially, we had to control it with high rates of dropout (as much as 80% on each layer) and L2 regularization (a lambda of 0.01 on each layer). Even with those controls we were unable to completely prevent the RNN from overfitting, as shown in Figure 5. The complexity of the RNN made it the slowest and hardest to tune. Indeed, the RNN "cell" in Tensorflow acted much like a black box, making it challenging to add regularization inside. The RNN also needed as much as 80% dropout, and actually required a higher level of regularization than the DeepNN, using a lambda of 0.04.

Figure 5. All models reach their peak very early, before 5,000 epochs. While the DeepNN and CNN plateau, the RNN begins to overfit despite high dropout and regularization.
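For reference, a DeepNN along the lines described above can be sketched with tf.keras as below. This is an illustrative reconstruction rather than the original Tensorflow code; the layer width of 256 units is an assumption, while the 3 hidden layers, 80% dropout, and L2 lambda of 0.01 follow the settings discussed above.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_deep_nn(n_features=193, n_classes=10):
    """3 hidden layers with heavy dropout and L2 regularization."""
    reg = regularizers.l2(0.01)
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(256, activation="relu", kernel_regularizer=reg),
        layers.Dropout(0.8),
        layers.Dense(256, activation="relu", kernel_regularizer=reg),
        layers.Dropout(0.8),
        layers.Dense(256, activation="relu", kernel_regularizer=reg),
        layers.Dropout(0.8),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```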

We had the best results with our CNN, as can be seen in Figures 4 and 5 above. While the DeepNN was comparable, the CNN consistently beat it by a few percentage points across all datasets. Part of this success stems from the CNN sharing its weights across multiple features, so it was not as susceptible to overfitting as the other models we tried. Indeed, while both the RNN and DeepNN needed as much as 80% dropout, our CNN worked just fine with 50%. We did also add regularization to the CNN; however, it did not give us as much improvement as we found with the other models.

Remarkably, even though the features of our features193 dataset are not related to one another the way the pixels of an image (which CNNs are known to be good at classifying) are, treating the data like an image worked to an extent. We were able to use the patch size of the convolution to control how much detail from the features we wanted. We found that a patch size that combined 10 features into a single set of weights and biases provided the best results.
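A rough tf.keras sketch of this idea is given below (again a reconstruction, not the original code): the characteristic vector is treated as a one-dimensional "image" and convolved with a patch size of 10. Apart from that patch size and the 50% dropout mentioned above, the filter counts are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn(n_features=193, n_classes=10):
    """Convolve 10-feature patches of the 193-value characteristic vector."""
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features, 1)),                    # each sample reshaped to (193, 1)
        layers.Conv1D(32, kernel_size=10, activation="relu"),   # one weight set per 10-feature patch
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(64, kernel_size=10, activation="relu"),
        layers.Flatten(),
        layers.Dropout(0.5),                                    # 50% dropout was enough for the CNN
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```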
This analogy of our features being connected like an image goes only so far. In an attempt to artificially increase our data size, we tried some of Tensorflow's image distortion functions on our features193 training set, to limited effect. Unsurprisingly, these functions lengthened the time it takes to train to an accuracy similar to that reached without the distortions. However, the accuracy curves, as shown below, do not seem to indicate that training longer would improve accuracy.

Figure 6. The CNN with no distortion on the training samples maintains an accuracy consistently higher than the models using randomized brightness and contrast distortions.

While the CNN does have better performance and is more robust, we should also note that it can be as much as three times slower than the DeepNN. The CNN is simply not as scalable as the DeepNN. If one needs to work with a larger sound dataset and a small reduction in accuracy is acceptable, the DeepNN serves well as the more easily modified and faster model.

5. FINAL RESULTS
5.1 Training Curve Results
Although the image distortions did not help much with "increasing" our amount of data, an actual increase in data samples would. We plotted a training curve using the features193 dataset in increments of 1,000 samples, recording the training and test accuracies at each increment after 15,000 epochs with our DeepNN model. While it is not definitive, the plot does trend toward a collision of the training and test accuracies, a trend that appears to continue beyond the sample sizes we are able to capture.

Figure 7. Training set accuracy is the top line, test set accuracy the bottom line. We see a trend toward convergence between the two lines toward the right, indicating that more data samples would increase accuracy.

5.2 Optimization Results
As we have written earlier, in the context of this dataset, which is prone to overfitting, the simpler models that gave us the most control to add restrictions won out. As such, our CNN model in Tensorflow gave both the highest accuracy and the highest AUC score. Our final best predictions are:

1st  CNN            73% acc.  93% AUC
2nd  DeepNN         68% acc.  91% AUC
3rd  Random Forest  61% acc.  -
4th  SVM            59% acc.  -
5th  RNN            56% acc.  87% AUC
6th  Naïve Bayes    23% acc.  -
Table 1. Final accuracy results for models on the features193 dataset.

Looking at the confusion maps of our models on the features193 test set, we can see that our models are making the same types of mistakes that a human might: air conditioners for drilling, and drilling for jackhammering. This similarity indicates that our feature extraction techniques validly represent the data, and that we are predicting on the data in a way useful to human interpretation, albeit with less accuracy than we hoped for.
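The confusion maps referred to above can be produced with Sci-Kit Learn; a minimal sketch is shown below, where the label arrays are placeholders standing in for the true and predicted class indices on the features193 test fold.

```python
from sklearn.metrics import confusion_matrix

# Placeholder stand-ins for the true labels and model predictions (class indices 0-9).
y_true = [0, 0, 4, 4, 7, 7]
y_pred = [0, 4, 4, 0, 7, 4]

# Rows are true classes, columns are predicted classes; off-diagonal counts show
# confusions such as air conditioner vs. drilling.
print(confusion_matrix(y_true, y_pred, labels=list(range(10))))
```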

6. ACKNOWLEDGMENTS
Our thanks to Justin Salamon, Christopher Jacoby, and Juan Pablo Bello for creating the UrbanSound8K dataset.

7. REFERENCES
[1] Justin Salamon, Christopher Jacoby, and Juan Pablo Bello. 2014. A Dataset and Taxonomy for Urban Sound Research. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14). ACM, New York, NY, USA, 1041-1044. DOI=http://doi.acm.org/10.1145/2647868.2655045
[2] James Lyons. 2013. Mel Frequency Cepstral Coefficient (MFCC) tutorial. In Practical Cryptography. Online. URL=http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/
[3] Haytham Fayek. 2016. Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between. Online. URL=http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
[4] Aaqib Saeed. 2016. Urban Sound Classification, Part I and Part II. Online. URL=https://aqibsaeed.github.io/2016-09-03-urban-sound-classification-part-1/
URL=http://www.practicalcryptography.com/miscellaneous/
