
Spotify Genre Prediction

Nandini Chaganti, Manya Mallikarjun, Keerthi Reddy Sure
Luddy School of Informatics, Computing and Engineering
Indiana University Bloomington
Bloomington, Indiana
[email protected], [email protected], [email protected]

Keywords: Genre prediction, One-Hot Encoding, MinMax Scaler, LGBM Classifier, Decision Tree Classifier
I. ABSTRACT

Genres vary from song to song based on different properties. Proper classification of these features helps predict the genre, which can be quite useful for listeners interested in hearing songs from a particular genre. A dataset consisting of these features was collected to perform the classification. Before that, preprocessing steps were employed to clean the data, including techniques to remove null values, negative values, and unidentified characters ("?"). Different modeling techniques[1] were then applied to predict the genre, and the accuracy of each model was computed. The results indicate that the LGBM classifier performs better than other classifiers such as KNN and Ridge.

II. INTRODUCTION

Music is a very subjective topic, and nowadays everyone includes music in their daily lives. There are various kinds of songs, and music tastes vary from person to person: the type of music a person prefers depends on their mood as well as their interests or preferences[6]. If a lot of people enjoy a song, it is regarded as a hit because it is popular and frequently played. Based on the preferences of listeners, music can be grouped using genre prediction. Spotify can categorize songs from all genres with ease and can even suggest musical subgenres to its customers.

This project aims at predicting the genre of songs, where the genre depends on various factors like danceability, energy, liveness, tempo, acousticness, etc.

III. DATASET

The dataset contains a varied set of columns; Fig A describes these columns. The dataset has 50,000 rows, and the columns refer to the different properties that distinguish one song from another.

Fig A. Analyzing dataset

The collected dataset contains 10 genres: Electronic, Anime, Jazz, Alternative, Country, Rap, Blues, Rock, Classical, and Hip-Hop. All of these are considered popular genres around the world.

Fig B. Types of Music Genre

The dataset has an equal composition of all genres: each genre makes up 10% of the data, together constituting the entire dataset.

Fig C. Ratio of data for every genre in the dataset
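As a quick sanity check on the balanced composition described above, a minimal sketch using pandas. The report does not name the actual CSV file or its genre column, so a tiny synthetic stand-in frame (with a hypothetical "music_genre" column) is built in memory instead:

```python
import pandas as pd

GENRES = ["Electronic", "Anime", "Jazz", "Alternative", "Country",
          "Rap", "Blues", "Rock", "Classical", "Hip-Hop"]

# Synthetic stand-in for the 50,000-row dataset (5,000 rows per genre);
# "music_genre" is a hypothetical column name, not confirmed by the report.
df = pd.DataFrame({"music_genre": GENRES * 5000})

# Each genre should account for exactly 10% of the rows.
share = df["music_genre"].value_counts(normalize=True)
print(len(df), share.min(), share.max())  # 50000 0.1 0.1
```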

IV. DATA PREPROCESSING

The data is cleaned using the different preprocessing techniques explained below. These steps helped to obtain better accuracy.

1. Removing columns with zero correlation:
The track_name and instance_id columns are dropped, as they contain only unique values and have no correlation with the other columns.

2. Removing NaN and duplicate values:
Duplicates and NaN values are dropped using drop_duplicates and dropna.

3. One-Hot Encoding:
It is necessary to transform categorical data into numerical form and to map model predictions back to categorical form. One-Hot Encoding is used to convert the categorical data into numeric data. With this method, 16 new columns based on key, mode, and obtained_data are generated and filled with 0's and 1's; the original key, mode, and obtained_data columns are then dropped.

4. Replacing -1 and "?" values with the mean value:
The duration_ms and tempo columns contain -1 and "?" values, which are replaced with the mean values of the same columns.

5. MinMax Scaler:
It is a form of normalization that maps all the available data into the range between the min and max values, which in most cases are the defaults of 0 and 1. MinMax scaling is performed on the duration_ms column, as its values have a very high range compared to the other columns, so normalization is used on this column. Fig D shows the skewed distribution of the duration_ms column.

Fig D. Normalization graph for duration_ms

6. Feature Selection:

Fig D. Heat Map for columns in the data set

After observing the heat map, columns with a correlation above the 0.80 threshold were removed: the energy and 4-Apr columns were dropped, as they exceeded this threshold when compared with the loudness and 3-Apr columns respectively. The Python packages used for this step are SequentialFeatureSelector from mlxtend.feature_selection and LinearRegression from sklearn.linear_model; linear regression is used as the estimator for feature selection. After a successful run, the code selected the best 13 features in order to reduce dimensionality, ignoring the extra columns. These 13 features strike a good balance between the accuracy and efficiency of the model. The selected features are: popularity, acousticness, danceability, duration_ms, instrumentalness, liveness, loudness, speechiness, tempo, valence, F, Minor, 3-Apr.

V. MODELS

After preprocessing the music genre data using the multiple techniques[4] discussed above,
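The cleaning steps above (dropping columns, removing duplicates and NaNs, replacing -1/"?" placeholders with column means, one-hot encoding, MinMax scaling) can be sketched as follows. This is an illustrative reconstruction on a tiny synthetic frame, not the authors' actual code; column names are taken from the report, the values are invented, and pd.get_dummies stands in for the one-hot encoding step:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Tiny synthetic frame using column names from the report (values invented).
df = pd.DataFrame({
    "instance_id": [1, 2, 2, 3],
    "track_name": ["a", "b", "b", "c"],
    "key": ["A", "B", "A", "C"],
    "mode": ["Major", "Minor", "Major", "Minor"],
    "duration_ms": [200000.0, -1.0, 200000.0, 400000.0],
    "tempo": ["120.0", "?", "120.0", "90.0"],
})

# Steps 1-2: drop zero-correlation columns, then duplicates and NaN rows.
df = df.drop(columns=["instance_id", "track_name"])
df = df.drop_duplicates().dropna()

# Step 4: replace "?" and -1 placeholders with the column mean.
df["tempo"] = pd.to_numeric(df["tempo"].replace("?", np.nan))
df["duration_ms"] = df["duration_ms"].replace(-1.0, np.nan)
for col in ["duration_ms", "tempo"]:
    df[col] = df[col].fillna(df[col].mean())

# Step 3: one-hot encode the categorical columns and drop the originals.
df = pd.get_dummies(df, columns=["key", "mode"])

# Step 5: scale duration_ms into [0, 1] with MinMaxScaler.
df[["duration_ms"]] = MinMaxScaler().fit_transform(df[["duration_ms"]])
print(df["duration_ms"].min(), df["duration_ms"].max())  # 0.0 1.0
```

For step 6, the report's feature selection uses mlxtend's SequentialFeatureSelector with a LinearRegression estimator to keep the best 13 features; the same idea is also available as sklearn.feature_selection.SequentialFeatureSelector.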
the models are trained on 80% of the data and predictions are made on the remaining 20%. All the genres in the data have equal composition. Different classification methods are applied to this data, as listed below, along with a basic definition of each classifier used to obtain the accuracy.

A. LGBM Classifier
It is a gradient boosting[2] framework that uses tree-based learning algorithms and is regarded as one of the most powerful computation-based algorithms, known for its fast processing speed. Using this classifier, the obtained accuracy is comparatively higher than that of the other classifiers.

B. KNN
The k-nearest neighbors algorithm, often known as KNN or k-NN, is a supervised learning classifier that makes predictions about how a single data point should be grouped. Although it can be used to solve classification or regression problems, it is most frequently used for classification, as it is predicated on the notion that similar points can be found near one another.

C. Ridge Classifier
The Ridge Classifier, based on the Ridge regression methodology, transforms the label data into the range [-1, 1]. Multiple-output regression is used for multiclass data, and the predicted class is the one with the greatest prediction value.

D. Random Forest
Random forest is a widely used machine learning technique that combines the output of multiple decision trees to reach a single conclusion. Its adaptability has boosted its popularity, since it can solve both classification and regression problems.[3]

E. Decision Tree
Decision tree classifiers are supervised machine learning models: an algorithm is trained to make predictions using pre-labeled data. Regression problems can also be solved with decision trees.

F. Support Vector Classifier
The training examples are plotted in space such that there is an apparent gap between the classes. A hyperplane splitting the two classes is predicted; maximizing the distance from the hyperplane to the closest data point of either class is the main goal while drawing the hyperplane. Such a hyperplane is referred to as a maximum-margin hyperplane.

G. Logistic Regression
Logistic regression becomes a classification approach only when a decision threshold is included; the threshold value, a crucial component of logistic regression, is determined by the classification problem itself. Based on a given dataset of independent factors, logistic regression calculates the likelihood that an event will occur. Since the result is a probability, the dependent variable ranges from 0 to 1.

H. Naive Bayes
The Naive Bayes classifier is a simple model frequently employed in classification problems. Its core principles are easy to understand, and the arithmetic that supports them is also quite approachable. Nevertheless, this model performs surprisingly well in many situations, and it and its variants are used to solve many problems.

VI. RESULTS

The accuracy of each of the defined classifier models is obtained after running them on the given dataset. The classifiers have to predict the correct genre from the set of ten genres, and the results vary considerably between classifiers. The one that stands out, with the highest accuracy of 62.8%, is the LGBM Classifier; KNN takes the next place with 51.98% accuracy. Eight different classifiers are used in order to compare and obtain the best accuracy possible. These accuracies are tabulated in Table A.

Classifiers          | ROC AUC Score | Accuracy (%)
---------------------|---------------|-------------
LGBM Classifier      | 0.9822        | 62.80
Logistic Regression  | 0.7640        | 29.14
KNN                  | 0.9647        | 51.98
Support Vector       | 0.561         | 17.99
Decision Tree        | 0.999         | 43.81
Random Forest        | 0.884         | 47.12
Naive Bayes          | 0.853         | 41.81
Ridge                | 0.465         | 46.84

Table A. Accuracy results of genre prediction
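The 80/20 training and evaluation loop described above can be sketched as follows. This uses a synthetic stand-in for the preprocessed 13-feature matrix and a subset of the scikit-learn classifiers; the report's headline LGBMClassifier (from the third-party lightgbm package) would be added to the dict in the same way. Per-model ROC AUC, as in Table A, can be computed with roc_auc_score(..., multi_class="ovr") on predict_proba for models that expose it:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed data: 13 features, 10 genre classes.
X, y = make_classification(n_samples=2000, n_features=13, n_informative=10,
                           n_classes=10, random_state=42)

# 80/20 train/test split, as described in the report.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    # "LGBM": lightgbm.LGBMClassifier(),  # report's best model (3rd-party pkg)
    "KNN": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "Ridge": RidgeClassifier(),
}

# Fit each classifier and record its test-set accuracy.
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {results[name]:.2%}")
```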
Fig E represents the confusion matrix constructed from the predicted and actual labels when the LGBM classifier is used. The confusion matrix clearly shows the predictions for all ten genres.

Fig E. Confusion Matrix for LGBM Classifier

VII. DISCUSSION

From the results, it is clear that the LGBM classifier provides better accuracy than the other models: Logistic Regression, KNN, Decision Tree, Support Vector, Random Forest, Naive Bayes, and Ridge. It provides an accuracy of 62.8%, while KNN and Random Forest provide accuracies of 51.98% and 47.12% respectively. For this particular dataset, using the LGBM classifier, the music genre 'Anime' is the best predicted[5], followed by 'Rock', and the least well predicted is 'Country'.

VIII. REFERENCES

[1] Luo, Kehan. "Machine Learning Approach for Genre Prediction On Spotify Top Ranking Songs." (2018).

[2] Pham, Bang-Dang, Minh-Triet Tran, and Hoang-Long Pham. "Hit Song Prediction Based on Gradient Boosting Decision Tree." 2020 7th NAFOSTED Conference on Information and Computer Science (NICS), IEEE, 2020, pp. 356-361.

[3] Breiman, Leo. "Random Forests." Machine Learning 45.1 (2001): 5-32.

[4] Adragna, Robert, and Yuan Hong Bill Sun. "Music Genre Classification." MIE324 Project Report (2019).

[5] Huang, Derek A., Arianna A. Serafini, and Eli J. Pugh. "Music Genre Classification." CS229 Stanford (2018).

[6] Dawson Jr, Christopher E., et al. "Spotify: You have a Hit!." SMU Data Science Review 5.3 (2021): 9.
