Song Classification Using Machine Learning
https://doi.org/10.22214/ijraset.2023.50890
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
Abstract: The classification of music by genre is crucial in the modern world because the number of music tracks, both online and
offline, is growing quickly. These tracks must be indexed appropriately so that they are easier to access, and automatic music genre
classification is crucial for retrieving music from a vast collection. Most current methods for categorising music genres rely on
machine learning. In this article, we use a music dataset with ten distinct genres. The system is trained and performs classification
using a deep learning technique; specifically, convolutional neural networks are employed for both training and classification. For
audio analysis, feature extraction is the most important step, and Mel Frequency Cepstral Coefficients (MFCCs) are used as the
feature vector for the sound samples. The proposed technique categorises music into different genres based on these extracted
feature vectors. Our findings indicate that the system achieves an accuracy of approximately 76%, a result that can help make the
automatic classification of musical genres easier.
Keywords: Music genre classification, deep learning, convolutional neural networks, neural networks.
I. INTRODUCTION
With the abundance of music at consumers' fingertips throughout the globe, there is a growing need to classify music automatically
for indexing and easier retrieval, a task that is frequently done manually by specialists in the field. Audio processing is one of the
more difficult data science problems compared with image processing and other classification tasks. One such application is music
genre classification, which seeks to place audio files in the sound groups to which they belong. Because classifying music manually
requires listening to each song in its entirety, the task needs automation to reduce manual error and time. Therefore, we employ
machine learning and deep learning techniques to automate the procedure.
In a nutshell, the problem statement for our project is as follows: given a number of audio files, classify each audio file into a
specific category, such as a genre (disco, hip-hop, etc.) or a mood (happy, sad, etc.).
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 3760
A classification algorithm takes a dataset of labelled examples as input and builds a model that can automatically categorise new,
unlabelled examples. A binary classification problem is one with just two labels (such as "calm" and "rock"). A multi-class
classification problem arises when the label set contains three or more labels. Because our label set contains a variety of genres,
we are looking at a multi-class problem.
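A minimal Python sketch of the distinction above, assuming string labels: `problem_type` counts the distinct labels, and `one_hot` turns multi-class genre labels into the one-hot target vectors that a softmax classifier usually expects. Both function names and the example genres are hypothetical illustrations, not part of the system described in this paper.

```python
from typing import List, Sequence, Tuple


def problem_type(labels: Sequence[str]) -> str:
    """Return 'binary' for exactly two distinct labels, 'multi-class' for three or more."""
    n = len(set(labels))
    if n < 2:
        raise ValueError("classification needs at least two distinct labels")
    return "binary" if n == 2 else "multi-class"


def one_hot(labels: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Map each label to a vector with a single 1 at its class index."""
    classes = sorted(set(labels))              # fixed, reproducible class order
    index = {c: i for i, c in enumerate(classes)}
    vectors = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1
        vectors.append(row)
    return classes, vectors


# Hypothetical example labels
genres = ["disco", "hiphop", "rock", "disco"]
print(problem_type(genres))    # multi-class
classes, y = one_hot(genres)
print(classes)                 # ['disco', 'hiphop', 'rock']
print(y[0])                    # [1, 0, 0]
```

With two labels such as "calm" and "rock", `problem_type` returns "binary", and a single output unit with a sigmoid is the usual alternative to one-hot targets.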
III. METHODOLOGY
Convolutional neural networks (CNNs) are fed data from these visual representations [14]. Lidy and Rauber (2005) [13] discuss the
use of psychoacoustic properties for classifying musical genres, particularly the significance of the short-time Fourier transform
(STFT) measured on the Bark scale (Zwicker and Fastl, 1999). Among the features used by Tzanetakis and Cook (2002) [4] were
spectral contrast, spectral roll-off, and mel-frequency cepstral coefficients (MFCCs). In Nanni et al. (2016), SVM and AdaBoost
classifiers are trained using a combination of audio and visual features.
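Because MFCCs are the feature vector used in this work, the following NumPy sketch shows the standard textbook pipeline for computing them: framing with a Hamming window, an STFT power spectrum, a mel-spaced triangular filterbank, log compression, and a DCT. It uses the mel scale rather than the Bark scale mentioned above, and the parameter values (13 coefficients, 26 mel bands, a 512-point FFT, a hop of 256 samples) are common defaults we assume, not settings reported in this paper. Libraries such as librosa (`librosa.feature.mfcc`) provide more refined implementations.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising slope of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis; row k gives cepstral coefficient k."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] *= np.sqrt(1.0 / n_in)
    basis[1:] *= np.sqrt(2.0 / n_in)
    return basis


def mfcc(signal, sr, n_mfcc=13, n_mels=26, n_fft=512, hop=256):
    # 1) Overlapping frames, each multiplied by a Hamming window
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    # 2) Power spectrum of each frame (the short-time Fourier transform)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3) Mel filterbank energies, log-compressed (small constant avoids log(0))
    log_energies = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # 4) DCT decorrelates the log energies; keep the first n_mfcc coefficients
    return log_energies @ dct_matrix(n_mfcc, n_mels).T


sr = 22050
t = np.arange(sr) / sr                          # one second of audio
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)      # hypothetical test signal
features = mfcc(tone, sr)
print(features.shape)                           # (85, 13): frames x coefficients
```

The result is a frames-by-coefficients matrix, one 13-dimensional feature vector per frame.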
A. Common ML Algorithms
A few of the algorithms are described below.
1) Artificial Neural Network (ANN): ANNs are effective parallel-processing mathematical models that simulate biological neural
networks using interconnected neuron units. They are among the most widely used learning algorithms in ML and are known for their
adaptability, efficiency, high fault tolerance, and ability to approximate complex, nonlinear relationships precisely. As a result,
ANNs are regarded as trustworthy data-driven tools for building black-box models; in hydrology, for example, they model the
intricate, nonlinear interactions between rainfall and flooding and forecast river flow and discharge. ANNs have already been
applied successfully to numerous flood prediction tasks, such as streamflow forecasting, river flow, rainfall-runoff and
precipitation-runoff modelling, water quality, evaporation, river stage prediction, low-flow estimation, flood mapping and
susceptibility, and river time series. One of the main drawbacks of ANNs is that their parameters must be adjusted iteratively.
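For a typical feed-forward ANN, the iterative parameter adjustment mentioned above is gradient descent with backpropagation. Below is a minimal NumPy sketch, assuming one ReLU hidden layer, a softmax output, and cross-entropy loss; the layer sizes, learning rate, and toy 2-D clusters are hypothetical and are not the network used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(n_in, n_hidden, n_out):
    # Small random weights break the symmetry between hidden units
    return {"W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out)}


def forward(p, X):
    h = np.maximum(0.0, X @ p["W1"] + p["b1"])  # ReLU hidden layer
    return h, softmax(h @ p["W2"] + p["b2"])    # class probabilities


def cross_entropy(probs, y):
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))


def train_step(p, X, y, lr=0.1):
    """One iterative parameter adjustment: backpropagation plus a gradient-descent update."""
    h, probs = forward(p, X)
    d_logits = probs.copy()
    d_logits[np.arange(len(y)), y] -= 1.0       # gradient of softmax + cross-entropy
    d_logits /= len(y)
    d_h = (d_logits @ p["W2"].T) * (h > 0)      # back-propagate through the ReLU
    grads = {"W2": h.T @ d_logits, "b2": d_logits.sum(axis=0),
             "W1": X.T @ d_h, "b1": d_h.sum(axis=0)}
    for name, g in grads.items():
        p[name] -= lr * g


# Toy data: three well-separated 2-D clusters standing in for feature vectors
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
X = np.vstack([c + rng.normal(0.0, 0.3, (20, 2)) for c in centers])
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardise each feature
y = np.repeat(np.arange(3), 20)

params = init_params(2, 8, 3)
loss_before = cross_entropy(forward(params, X)[1], y)
for _ in range(500):
    train_step(params, X, y)
loss_after = cross_entropy(forward(params, X)[1], y)
print(f"training loss: {loss_before:.3f} -> {loss_after:.3f}")
```

The loop makes the drawback concrete: the weights only reach a useful setting after many small updates, and the learning rate and number of iterations must themselves be chosen.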
2) Support Vector Machine (SVM): The SVM is a supervised learning machine based on the principles of structural risk minimisation
and statistical learning theory, and it is used extensively in flood modelling. Training an SVM produces a non-probabilistic binary
linear classifier that assigns new examples to one of two classes while minimising the empirical classification error and
maximising the geometric margin. Trained on historical data, SVMs (and their regression variant, support vector regression or SVR)
can also predict future values of a quantity. SVMs are now recognised as reliable and effective ML flood prediction systems, and SVM
and SVR have gained appeal among hydrologists as ML alternatives to ANNs. They have been used to predict floods in a variety of
situations, such as extreme rainfall, precipitation, rainfall-runoff, reservoir inflow, streamflow, flood quantiles, flood time
series, and soil moisture, with promising results, superior generalisation ability, and higher performance than ANNs.
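The margin-maximising objective described above can be sketched in a few lines. This is a hedged illustration, not the formulation of any particular library: it minimises the standard soft-margin objective (a margin term plus a hinge loss weighted by `C`) by full-batch sub-gradient descent on hypothetical 2-D data. It is a binary classifier; a ten-genre task would need a multi-class scheme such as one-vs-rest, which is not shown.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.001, epochs=1000):
    """Soft-margin linear SVM trained in the primal by full-batch sub-gradient descent.

    Minimises 0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w . x_i + b))).
    Labels y must be -1 or +1.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0                  # points inside the margin or misclassified
        grad_w = w - C * (X[active].T @ y[active])
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    return np.where(X @ w + b >= 0.0, 1, -1)


# Hypothetical, linearly separable toy data
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w, b = train_linear_svm(X, y)
print("training accuracy:", np.mean(predict(w, b, X) == y))
```

Only the points that violate the margin contribute to each update, which is why the final classifier depends on a small set of boundary points (the support vectors).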
3) K-Nearest Neighbours (KNN): This approach can solve both classification and regression problems, although in data science
practice it is more frequently applied to classification. It is a straightforward algorithm that stores all available cases and
classifies a new instance by a majority vote of its k nearest neighbours, where nearness is measured with a distance function. KNN is
easy to relate to everyday life: if you want to learn more about a person, it makes sense to talk to their friends and co-workers.
Before using the K-Nearest Neighbours algorithm, keep the following points in mind:
KNN is computationally expensive.
Variables should be normalised so that variables with larger ranges do not dominate the distance.
Data still needs to be pre-processed.
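The points above can be seen in a small pure-Python sketch: every stored case is scanned for each prediction, and features are min-max normalised before distances are computed. The feature choice (tempo in BPM and zero-crossing rate), the values, and the genre labels are hypothetical, chosen only to show the effect of normalisation.

```python
import math
from collections import Counter


def min_max_fit(rows):
    """Per-feature minimum and maximum of the stored training cases."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_apply(row, lo, hi):
    # Rescale by the training range (training values land in [0, 1]);
    # a constant feature (hi == lo) maps to 0.0 to avoid division by zero
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest stored cases."""
    lo, hi = min_max_fit(train_X)
    scaled = [min_max_apply(r, lo, hi) for r in train_X]
    q = min_max_apply(query, lo, hi)
    # Euclidean distance to every stored case: the cost grows with the training set
    neighbours = sorted((math.dist(q, r), label) for r, label in zip(scaled, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: [tempo in BPM, zero-crossing rate]
train_X = [[120, 0.05], [125, 0.06], [90, 0.20], [95, 0.22]]
train_y = ["disco", "disco", "rock", "rock"]
print(knn_predict(train_X, train_y, [100, 0.07], k=3))  # disco
```

Without the rescaling step, tempo differences dominate the distance, and the same query's three nearest neighbours would be two rock cases and one disco case.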
4) Convolutional Neural Network (CNN): A convolutional neural network is a deep learning method designed primarily for working with
images and videos. It takes images as input, extracts and learns their features, and then classifies the images using the learned
features. The design takes its cues from the visual cortex, the region of the human brain that processes visual data from the
outside world. The visual cortex has many layers, each of which functions independently and extracts different information from
images or other visuals; once the information from the various layers has been merged, the picture or visual is evaluated or
classified.
A convolutional neural network (CNN or ConvNet) is a type of neural network that is particularly adept at processing input with a
grid-like structure, such as an image. A digital image is a binary representation of visual data: a grid-like arrangement of pixels,
each of which has a value indicating how bright it is and what colour it should be. In a CNN, unlike in a fully connected layer, each
neuron in a layer is connected only to a small region of the previous layer.
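The local connectivity described above can be sketched with NumPy: each output value of a convolution depends only on a small patch of the input, and the same kernel weights are reused at every position. The image, the vertical-edge kernel, and the helper names `conv2d_valid` and `max_pool` are hypothetical illustrations; a 2-D MFCC matrix treated as a one-channel image could be processed the same way, but that input layout is an assumption, not a detail reported here.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation CNN libraries call convolution.

    Each output value depends only on one kernel-sized patch of the input."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(x, size=2):
    # Non-overlapping max pooling; rows/columns that do not fill a window are dropped
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


image = np.zeros((6, 6))
image[:, 3:] = 1.0                               # dark left half, bright right half
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)        # responds to vertical edges

feature = np.maximum(0.0, conv2d_valid(image, kernel))  # convolution + ReLU
print(feature[0])                                # [0. 3. 3. 0.]
print(max_pool(feature))                         # [[3. 3.] [3. 3.]]
```

The 3 x 3 kernel has 9 shared weights, whereas a fully connected layer mapping the 6 x 6 input to the same 4 x 4 output would need 36 x 16 = 576 weights.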
The image classification model is created, trained, and tested using the Python programming language. The workflow can be divided
roughly into the following steps:
a) Importing libraries and preparing the data
b) Model definition
c) Classification report
d) Confusion matrix
e) Final classified images
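Steps c) and d) can be illustrated with a dependency-free sketch of what a confusion matrix and a classification report compute; scikit-learn offers the same via `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report`. The labels and predictions below are hypothetical, not results from this paper. Overall accuracy, the kind of figure reported in the abstract, is the sum of the matrix diagonal divided by the number of test examples.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def classification_report(y_true, y_pred, labels):
    """Per-class (precision, recall, F1, support), plus overall accuracy."""
    cm = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)   # column sum: predicted as this class
        actual = sum(cm[i])                     # row sum: truly this class (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1, actual)
    accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
    return report, accuracy


# Hypothetical test-set labels and model predictions
labels = ["disco", "hiphop", "rock"]
y_true = ["disco", "disco", "hiphop", "hiphop", "rock", "rock"]
y_pred = ["disco", "hiphop", "hiphop", "hiphop", "rock", "disco"]

for row in confusion_matrix(y_true, y_pred, labels):
    print(row)
report, acc = classification_report(y_true, y_pred, labels)
for label, (p, r, f, s) in report.items():
    print(f"{label:8s} precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
print(f"accuracy={acc:.2f}")
```

Off-diagonal cells show which genres are confused with each other, which a single accuracy number hides.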
V. CONCLUSION
Our application successfully uses machine learning to categorise playlists according to mood, giving users a categorised playlist.
Listening to a playlist that suits their mood can help listeners feel more at ease and emotionally engaged, which can lift their mood
and improve their mental state. Marilyn Manson once said that "Music is the strongest form of magic"; music has the power to heal
people and transform their emotions, much like any form of magic.
Music that does not match your mood can make you feel stressed and unhappy, which can lead to low energy or inappropriate actions.
This application therefore aims to provide a playlist that matches the user's mood. The right music energises and inspires people to
confront or handle their current predicament.
REFERENCES
[1] N. Pelchat, "Neural Network Music Genre Classification."
[2] H. Bahuleyan, "Music Genre Classification using Machine Learning Techniques."
[3] N. Parab, S. Das, G. Goda, and A. Naik, "Music Genre Classification Using Deep Learning."
[4] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.
[5] Y. M. Costa, L. S. Oliveira, and C. N. Silla, "An evaluation of convolutional neural networks for music classification using spectrograms," Appl. Soft Comput.,
vol. 52, pp. 28–38, Mar. 2017. Accessed: Dec. 16, 2018. [Online]
[6] C. L. R. S. Ueno and D. F. Silva, "On combining diverse models for lyrics-based music genre classification," in Proc. 2019 8th Brazilian Conference on
Intelligent Systems (BRACIS), 2019.
[7] J. Despois, "Finding the Genre of a Song With Deep Learning: A.I. Odyssey Part 1." Accessed: Dec. 27, 2018. [Online]. Available:
https://hackernoon.com/finding-the-genre-of-a-song-with-deep-learningda8f59a61194
[8] F. Pachet and D. Cazaly, "A classification of musical genre," in Proc. RIAO Content-Based Multimedia Information Access Conf., Paris, France, Mar. 2000.
[9] S. Gollapudi, Practical Machine Learning. Birmingham, U.K.: Packt, 2016.
[10] T. O'Brien, "Learning to Understand Music From Shazam," 2017. Accessed: Dec. 19, 2018. [Online]. Available:
https://blog.shazam.com/learning-to-understand-music-from-shazam-56a60788b62
[11] T. Feng, "Deep learning for music genre classification," 2014.
[12] R. Panda and R. P. Paiva, "MIREX 2012: Mood classification tasks submission," Machine Learning, vol. 53, no. 1-2, pp. 23–69, 2003.