
Proiect ME

This document summarizes a process for analyzing music data using spectral analysis and k-means clustering. It first describes accessing a dataset of 10,000 songs from the Million Song Dataset in HDF5 format. It then explains performing feature extraction on the data using fast Fourier transforms (FFT) to project the high-dimensional input to a lower dimension. Finally, it outlines applying k-means clustering to group the features into K clusters, iterating until cluster centroids converge. Potential weaknesses of k-means clustering are also noted.

Uploaded by

Razvan Mazilu

Million Song Dataset

Feature extraction with Spectral Analysis


Classification with k-Means algorithm
Data Set used (1)

MillionSongSubset from https://labrosa.ee.columbia.edu: 10,000 songs (1% of the
Million Song Dataset), selected at random.
The data are in HDF5 format, a format designed for organizing large data
arrays.
I used a MATLAB wrapper to access the data in the HDF5 files. This
wrapper was also found at https://labrosa.ee.columbia.edu.
Data Set used (2)

• The data for each song is stored in a .h5 file. It looks like the pictures below:
• There is no audio signal data, only metadata such as year,
artist…
Input set

1000 arrays, like the one in the picture, holding the ASCII codes of the song names
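A minimal sketch of how a song name can be turned into such an array. It is written in Python rather than the MATLAB used in the project. The helper `encode_title`, the fixed row length of 12 and the zero padding are assumptions made for illustration, and the sample titles are made up rather than taken from the dataset.

```python
def encode_title(title, length):
    """Return the character codes of `title`, truncated or zero-padded to `length`."""
    codes = [ord(ch) for ch in title[:length]]      # ASCII code of each character
    return codes + [0] * (length - len(codes))      # pad short titles with zeros


titles = ["Yesterday", "Hey Jude"]                  # made-up sample titles
rows = [encode_title(t, 12) for t in titles]        # one fixed-length row per song

for row in rows:
    print(row)
```

Padding every row to the same length gives a rectangular input matrix, which is what a row-wise transform needs.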
Feature extraction using Spectral Analysis

• Feature extraction means creating a projection from an M-dimensional
space of input features to an N-dimensional space (N < M). The new
features in the N-dimensional space should be uncorrelated.
• Spectral analysis can be done using the FFT, which is already implemented in
MATLAB as the function fft().
Apply fft to input data

• We observe that only the first element has a
significant value.
• We therefore select only the 1st element from
each row of the FFT of the input data.
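The observation above can be reproduced with a small sketch. It is in Python rather than MATLAB, and a naive discrete Fourier transform stands in for fft(); both follow the same definition. The function name `dft` and the sample row are assumptions for illustration. The first output element is the plain sum of the inputs, and because ASCII codes are all non-negative, no other element can have a larger magnitude.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


row = [72, 101, 121, 32, 74, 117, 100, 101]   # ASCII codes of "Hey Jude" (made-up sample)
spectrum = dft(row)
magnitudes = [abs(c) for c in spectrum]

# Index 0 here corresponds to element 1 in MATLAB's 1-based indexing.
print(magnitudes[0], sum(row))

feature = magnitudes[0]                        # the single feature kept per row
```

Keeping only this element reduces each row to one number, so in effect the feature is the sum of the character codes of the song name.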
Classification using K-means algorithm

• Classification using the K-means algorithm means grouping the input features into K
clusters using an iterative method.
• The steps of the K-means algorithm are:
• Place K centroids at random in the input feature space.
• Calculate the distance from each feature to every centroid and assign the feature to the
closest one.
• Recalculate the centroids based on the features in each cluster.
• Repeat until convergence (no feature changes its cluster any more).
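The steps above can be sketched as follows for one-dimensional features, such as the single FFT element kept per song. This is a Python illustration, not the MATLAB code used in the project. The function name `kmeans_1d`, the choice to draw the initial centroids from the data points, the fixed seed and the sample values are all assumptions.

```python
import random


def kmeans_1d(values, k, seed=0, max_iter=100):
    """Cluster scalar features into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)           # step 1: random initial centroids
    labels = [None] * len(values)
    for _ in range(max_iter):
        # step 2: assign each feature to its closest centroid
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:                # step 4: no feature changed cluster
            break
        labels = new_labels
        # step 3: recompute each centroid as the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:                         # an empty cluster keeps its old centroid
                centroids[c] = sum(members) / len(members)
    return centroids, labels


features = [700.0, 710.0, 705.0, 1500.0, 1510.0, 1495.0]   # made-up feature values
centroids, labels = kmeans_1d(features, k=2)
print(sorted(centroids), labels)
```

On this well-separated sample the two groups are recovered whichever points are drawn as initial centroids; on less cleanly separated data the result can depend on the seed.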
K Means Clustering

http://rossfarrelly.blogspot.ro/2012/12/k-meansclustering.html
Weakness of K-means Algorithm

• It is not robust to outliers. Data very far from the centroid will pull the centroid away from
the real one.
• The resulting clusters are circular in shape because the algorithm is based on distance.
• It is sensitive to initial conditions. Different initial conditions may produce different
clusters. The algorithm may get trapped in a local optimum.
• When there are only a few data points, the initial grouping will determine the clusters
significantly.

http://people.revoledu.com/kardi/tutorial/kMean/Weakness.htm
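The first weakness is easy to see in a small Python sketch. The helper `centroid` and the numbers are made up for illustration. Because a centroid is the mean of its members, a single distant point moves it well away from the bulk of the cluster.

```python
def centroid(points):
    """Mean of a list of scalar features, as used in the k-means update step."""
    return sum(points) / len(points)


cluster = [10.0, 11.0, 12.0]            # made-up, tightly grouped values
with_outlier = cluster + [100.0]        # the same cluster plus one far-away point

print(centroid(cluster))                # 11.0
print(centroid(with_outlier))           # 33.25
```

With the outlier included, the centroid no longer lies near any of the original three points.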
Thank you!
