A Song Classifier For Predicting User Preference Based On Spotify Song Attributes

Yong Yang Boon, Siew Mooi Lim, Annebel Yun Ying Choong
Faculty of Computing and Information Technology
Tunku Abdul Rahman University of Management and Technology
Kuala Lumpur, Malaysia
boonyy-[email protected], [email protected], annebelcyy-[email protected]
Abstract—This study addresses the challenges of predicting user preferences for songs by utilizing machine learning algorithms. Existing research in this area has primarily focused on user-based collaborative filtering or content-based approaches, neglecting the potential of utilizing song attributes for personalized song recommendations. Several algorithms are evaluated in this study, including Random Forest Classifier, Logistic Regression, Gaussian Naive Bayes, Extreme Gradient Boosting, Dummy Classifier, and Stacking Classifier. The Stacking Classifier model was chosen as the best model due to its consistently high accuracy, precision, recall, and F1 score. The Spotify API is used in the deployment process to retrieve song attributes, encode them, and input them into the model for prediction. In addition, the model is evaluated on two different playlists, with predicted results indicating which songs the user would like or dislike. Overall, the study suggests that the Stacking Classifier model is suitable for predicting song preferences on Spotify. Furthermore, the deployment process outlined in this study offers a convenient tool for users to predict their preferences for individual songs or playlists. This can empower users to curate their music collections more effectively and help music streaming platforms like Spotify to further improve their recommendation systems.

Keywords: Spotify API, Song Attributes, Predictive Modeling, Music Preference, Stacking Classifier

I. INTRODUCTION

Digital music streaming services have revolutionized how people interact with and discover music. Spotify is one of the most renowned platforms among these services, giving users access to an expansive collection of music and audio content. With millions of daily users across the globe, Spotify collects data to optimize the customer experience and better meet user needs. This data includes various attributes of songs and podcasts, such as acousticness, danceability, duration, energy, instrumentalness, liveness, loudness, speechiness, tempo, and valence. The availability of this data has led to several data science applications in the music industry, including recommender systems and song classifiers.

One example of such an application is the Spotify Song Attributes dataset, which contains data on many songs, podcasts, and their attributes [1]. Using this dataset, it is possible to gain insights into what factors influence a song's popularity and how different songs compare in terms of their attributes.

In this study, we developed a song classifier to predict whether a user likes or dislikes a song based on its attributes. The approach uses the following machine learning techniques to classify songs based on their attributes: Random Forest Classifier, Extreme Gradient Boosting (XGB) Classifier, Logistic Regression, Gaussian Naive Bayes, Dummy Classifier, and Stacking Classifier. For each technique, hyperparameters were selected based on previous literature and cross-validation results.

The dataset used is the Spotify Song Attributes dataset. We preprocessed the dataset to handle missing values, outliers, and categorical features. Feature engineering techniques such as normalization and feature selection were also applied.

Model evaluation metrics used in this project include accuracy, precision, recall, F1-score, and area under the curve (AUC). Additionally, we explored the deployment of the model using Spotify's API and discussed its potential applications in the music industry.

The study aims to contribute to the growing research on music recommendation systems and song classifiers. The results could have practical implications for music streaming services such as Spotify, improving the accuracy of their recommendation algorithms and offering users more personalized content. The results could also be significant for the music industry in understanding the factors that influence a song's popularity and creating content that resonates with listeners.
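To make the preprocessing and feature engineering steps mentioned above concrete, the following Python sketch (using pandas and scikit-learn) illustrates one way to handle missing values, outliers, a categorical feature, normalization, and feature selection. The toy data frame, the median imputation, the percentile clipping, and the choice of k are illustrative assumptions rather than the exact steps used in this study.

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.feature_selection import SelectKBest, f_classif

    # Toy frame standing in for the real dataset: two numeric attributes,
    # one categorical attribute, a missing value, and the like/dislike target.
    df = pd.DataFrame({
        "energy": [0.8, 0.6, None, 0.9, 0.2, 0.4],
        "tempo":  [120, 125, 90, 250, 60, 128],   # 250 acts as an outlier
        "mode":   ["major", "minor", "major", "minor", "major", "minor"],
        "target": [1, 1, 0, 0, 0, 1],
    })

    # Missing values: fill numeric gaps with the column median.
    df["energy"] = df["energy"].fillna(df["energy"].median())

    # Outliers: clip numeric attributes to the 1st-99th percentile range.
    low, high = df["tempo"].quantile([0.01, 0.99])
    df["tempo"] = df["tempo"].clip(low, high)

    # Categorical features: encode as integers (0 = major, 1 = minor here).
    df["mode"] = df["mode"].map({"major": 0, "minor": 1})

    # Normalization: rescale every feature to [0, 1].
    X = MinMaxScaler().fit_transform(df[["energy", "tempo", "mode"]])
    y = df["target"]

    # Feature selection: keep the k features most associated with the target.
    X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)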
II. RELATED WORKS AND ALGORITHMS

Numerous studies have explored machine learning techniques for predicting song popularity. For instance, [2] concluded that social influence significantly determines a song's popularity and quality. [3] employed the Million Song Dataset to evaluate various classification and regression algorithms to forecast song popularity and determine the features with the highest predictive power. This underscores the significance of analyzing acoustic and metadata features to forecast song popularity, which our project aims to accomplish.

[4] also explored the impact of social influence on song popularity by predicting Billboard success based on peer-to-peer networks, utilizing multiple regression and classification algorithms to make their predictions. Finally, [5] demonstrated that machine learning techniques can label songs based on acoustic features, using AdaBoost and FilterBoost to predict social tags from acoustic features in an extensive music dataset [5].

[6] explored the automatic classification of audio signals into a hierarchy of musical genres, proposing three feature sets representing timbral texture, rhythmic content, and pitch content. The paper highlights the potential of automatic musical genre classification as a valuable addition to music information retrieval systems and as a framework for developing and evaluating features for content-based analysis of musical signals. The study achieves a classification accuracy of 61% for ten musical genres, comparable to results reported for human musical genre classification.

[7] pointed out the importance of categorizing music into genres to satisfy the needs of different listeners and cultures. The proposed research compares various classification models, including a new Convolutional Neural Network (CNN) model, which outperforms previous models in accuracy. The study uses the GTZAN dataset and achieves an accuracy of 91%, comparable to human understanding of genre. However, some genres, such as country and rock, are confused with other styles, while traditional and blues are quickly identified, suggesting potential issues with the models' ability to distinguish between specific genres.

[8] compared deep learning and traditional machine learning algorithms for music genre classification using the GTZAN dataset. They found that both approaches had similar accuracy and that more training data could improve model performance. Their CNN model was expected to outperform traditional models, but they acknowledged the need for further testing. Overall, the research contributed to the use of CNN architectures for music genre classification and showed the potential for improved accuracy with more data.

[9] proposed a method for music genre classification based on the visual Mel spectrum, using YOLOv4 as the neural network architecture. The feasibility of the proposed method is evaluated through ten experiments, with mAP as the scoring criterion. The results show a highest mAP of 99.26% and an average mAP of 97.93%. The study demonstrated the effectiveness of using the visual Mel spectrum for music genre classification and highlights the advantages of the graphical spectrum diagram, including high generalization and not requiring a professional audio model to be built. However, a high hardware cost is needed; future work is suggested to improve the model's performance and reduce the hardware cost.

[10] used four machine learning techniques: K-Nearest Neighbor (KNN), Support Vector Machine, Logistic Regression, and Random Forest for classifying audio music based on acoustic and spectrogram features. Initially, the models had low performance, but the authors applied techniques such as GridSearchCV and RandomizedSearchCV and found the optimal K value for the KNN model, greatly improving the models' accuracy. The authors also found that selecting relevant features and training the model with those features could significantly increase model performance. Future work includes exploring deep learning approaches and optimizing model parameters to improve predictions.

[11] compared the performance of two models for categorizing music files into genres. The first model used a deep learning approach in which a CNN is trained end-to-end to predict the genre label of an audio signal from its spectrogram. The second model used hand-crafted features from the time and frequency domains to train four traditional machine learning classifiers [11]. The study was conducted on an audio dataset. An area under the ROC curve (AUC) value of 0.894 was reported for an ensemble classifier that combined both approaches. The CNN model outperformed the feature-engineered models, and ensembling the CNN and XGBoost models proved beneficial.

The studies above offer valuable insights into the elements that influence song popularity and demonstrate the potential of machine learning methods in forecasting it. Our project aims to build upon these studies by considering acoustic and metadata features to create a more accurate prediction model for song preferences using Spotify's song attributes.

III. DATASET AND FEATURES

We used the Spotify Song Attributes dataset in this project [1]. The dataset contains more than 2,000 songs, each characterized by the following 13 attributes:

Fig. 1. List of attributes of the dataset

1. Acousticness: a measure of the presence of acoustic instruments in a track, ranging from 0 (least acoustic) to 1 (most acoustic).
2. Danceability: a measure of how suitable a track is for dancing, based on its tempo, rhythm stability, and beat strength.
3. Duration: track duration in milliseconds.
4. Energy: a measure of the intensity and activity of a track, ranging from 0 (least energetic) to 1 (most energetic).
5. Instrumentalness: a measure of the likelihood that a track contains no vocals, ranging from 0 (least instrumental) to 1 (most instrumental).
6. Key: the musical key of a track, represented as an integer from 0 to 11, where 0 corresponds to C, 1 to C#/Db, 2 to D, and so on.
7. Liveness: a measure of the presence of a live audience in a track, ranging from 0 (least live) to 1 (most live).
8. Loudness: a measure of the perceived loudness of a track, in decibels (dB).
9. Mode: indicates whether a track is in a major or minor key. A value of 0 indicates a minor key, and 1 indicates a major key.
10. Speechiness: a measure of the presence of spoken word in a track, ranging from 0 (little spoken word) to 1 (much spoken word).
11. Tempo: the tempo of a track in beats per minute (BPM).
12. Time signature: the time signature of a track, represented as an integer indicating the number of beats per measure.
13. Valence: a measure of a track's positive or negative mood, ranging from 0 (least positive) to 1 (most positive).

Moreover, each song in the dataset is labeled with a target of 0 or 1, indicating whether the dataset's author dislikes or likes the song. The only other available attributes for each song are the song's title and artist.
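As a minimal sketch, the dataset and the attributes listed above can be loaded and separated into a feature matrix and the like/dislike target as follows; the CSV file name and the exact column names are assumptions for illustration:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical file name for the Spotify Song Attributes dataset.
    df = pd.read_csv("spotify_song_attributes.csv")

    # The 13 audio attributes described above (names assumed).
    feature_cols = [
        "acousticness", "danceability", "duration_ms", "energy",
        "instrumentalness", "key", "liveness", "loudness", "mode",
        "speechiness", "tempo", "time_signature", "valence",
    ]
    X = df[feature_cols]
    y = df["target"]   # 1 = liked by the dataset's author, 0 = disliked

    # Hold out a test set for the evaluation metrics reported later.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)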
A. Data Visualization and Analysis

● The majority of songs having a duration between 2.5 and 4.5 minutes may indicate that this is a typical length for popular songs, or that it is a length particularly well suited to the preferences of the dataset's author.
● The concentration of songs around a tempo of 125 BPM might indicate that this tempo is more favorable to listeners or is commonly used in popular music genres.
● Low instrumentalness and acousticness might suggest that the user prefers more electronically produced songs that are less dependent on traditional acoustic instruments.
● Low liveness and speechiness might suggest that the user prefers songs more focused on the musical aspect and less on live performances or spoken word elements.

Fig. 3(a). The distributions of discrete features
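Distributions such as those summarized above (and shown in Fig. 3(a)) can be inspected with pandas and matplotlib; this sketch assumes the data frame df from the previous listing:

    import matplotlib.pyplot as plt

    # Histograms of selected continuous attributes, split by the target label.
    for col in ["duration_ms", "tempo", "instrumentalness", "liveness"]:
        plt.figure()
        df[df["target"] == 1][col].plot(kind="hist", bins=30, alpha=0.5, label="liked")
        df[df["target"] == 0][col].plot(kind="hist", bins=30, alpha=0.5, label="disliked")
        plt.xlabel(col)
        plt.legend()
        plt.title(f"Distribution of {col} by target")
    plt.show()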
Fig. 9. The playlists used for like or dislike song prediction on Spotify

As shown in Fig. 8, all the song IDs are retrieved and separated with commas. The next step is to copy all the retrieved song IDs and paste them into the next step, which fetches all the attributes from the Spotify API and places them into a data frame as a new dataset.

The deployment process involved the pickle module, which is used for serializing and deserializing a Python object structure. It allows Python objects to be pickled and saved to disk. Pickling converts a Python object (a list, dictionary, etc.) into a character stream containing all the information necessary to reconstruct the object in another Python script. In this case, the Stacking Classifier model was pickled and saved as a file, which was later loaded for prediction.

The output of this deployment is a prediction of whether the songs or playlist will be liked or disliked by the user. The Stacking Classifier model, combined with the Spotify API, makes the prediction process accurate and efficient, providing a convenient tool for Spotify users. Once the song or playlist ID is retrieved, the API retrieves all the song attributes. We then encoded them as integers before they were fed into the Stacking Classifier model for prediction.
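The workflow described in the last three paragraphs can be sketched in a few lines of Python. The example below uses the spotipy client for the Spotify API and pickle to load the saved model; the client credentials, playlist ID, pickle file name, and feature column list are placeholders and assumptions, and the integer encoding step mentioned above is not reproduced here.

    import pickle
    import pandas as pd
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # Spotify client credentials are placeholders.
    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
        client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET"))

    # Load the previously pickled Stacking Classifier model (file name assumed).
    with open("stacking_classifier.pkl", "rb") as f:
        model = pickle.load(f)

    # Collect the track IDs of a playlist (playlist ID is a placeholder).
    playlist = sp.playlist_items("PLAYLIST_ID", limit=100)
    track_ids = [item["track"]["id"] for item in playlist["items"] if item["track"]]

    # Retrieve the audio attributes for the tracks and build a data frame.
    features = pd.DataFrame(sp.audio_features(track_ids))
    feature_cols = [
        "acousticness", "danceability", "duration_ms", "energy",
        "instrumentalness", "key", "liveness", "loudness", "mode",
        "speechiness", "tempo", "time_signature", "valence",
    ]

    # Predict like (1) or dislike (0) for every song in the playlist.
    predictions = model.predict(features[feature_cols])
    print(list(zip(track_ids, predictions)))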
The model prediction is based on the user's past preferences, as the model has been trained on a dataset of the user's song preferences on Spotify. The model analyzes the attributes of the input song or playlist and compares them with the dataset to predict the likelihood of the user liking or disliking the songs.

A. Model Results

Using the stacking model on the new dataset, the model produces predictions for the playlist the user entered. We tested the model with two different playlists, the Epic Evil Dramatic Music Playlist and the K-Pop playlist shown in Fig. 9. For the Epic Evil Dramatic Music Playlist, the model predicted that the user would like 71 songs and dislike 11 songs, as shown in Fig. 10. For the K-Pop playlist, it predicted that only 15 songs would be liked and 35 songs would be disliked, as shown in Fig. 10. The names of all the liked and disliked songs are displayed.

Fig. 10. Comparison between the two playlists' predicted liked or disliked songs.

V. MODELING RESULTS AND ANALYSIS

We explored several machine learning algorithms for predicting whether a user will like or dislike a song on Spotify. The algorithms examined include Random Forest Classifier, Logistic Regression, Gaussian Naive Bayes, Extreme Gradient Boosting, Dummy Classifier, and Stacking Classifier.
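The per-model results below report accuracy, precision, recall, F1-score, a confusion matrix, and ROC AUC. As a minimal sketch, these metrics can be computed with scikit-learn as follows, assuming a fitted classifier clf and the held-out X_test and y_test from the earlier split:

    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, confusion_matrix, roc_auc_score)

    y_pred = clf.predict(X_test)               # hard like/dislike labels
    y_prob = clf.predict_proba(X_test)[:, 1]   # probability of "like"

    print("Accuracy :", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))
    print("Recall   :", recall_score(y_test, y_pred))
    print("F1-score :", f1_score(y_test, y_pred))
    print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
    print("ROC AUC  :", roc_auc_score(y_test, y_prob))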
The Random Forest Classifier achieved an accuracy of 76.675%, with a precision of 0.78, a recall of 0.76, and an F1-score of 0.77. The confusion matrix showed 152 true positives and 157 true negatives, while the ROC curve revealed an AUC value of 0.767. After applying GridSearchCV and RandomizedSearchCV for hyperparameter tuning, the original model's accuracy remained the best at 76.675%.

Logistic Regression's performance was notably lower, with an accuracy of 58.809%, precision and recall values of 0.59, and an F1-score of 0.59. The confusion matrix showed 122 true positives and 115 true negatives, while the ROC curve displayed an AUC value of 0.589. GridSearchCV tuning improved the accuracy to 65.757%, suggesting successful tuning for Logistic Regression.

The Gaussian Naive Bayes model achieved an accuracy of 65.757%, with precision and recall values of 0.66 and an F1-score of 0.66. The confusion matrix presented 113 true positives and 152 true negatives, and the ROC curve had an AUC value of 0.655. GridSearchCV tuning did not change the model's accuracy, while RepeatedStratifiedKFold tuning increased it to 68.238%.

Extreme Gradient Boosting achieved the highest accuracy of 77.419% among the individual algorithms. The precision, recall, and F1-score values were 0.77, and the confusion matrix showed 152 true positives and 160 true negatives. The ROC curve presented an AUC value of 0.774. GridSearchCV tuning slightly decreased the accuracy to 77.171%, so the original model was retained.
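The GridSearchCV tuning referred to in these results can be reproduced along the following lines; the Random Forest parameter grid shown here is an illustrative assumption, not the grid used in the study, and X_train/y_train are assumed from the earlier split:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Illustrative parameter grid (not the study's actual grid).
    param_grid = {
        "n_estimators": [100, 300, 500],
        "max_depth": [None, 10, 20],
        "min_samples_split": [2, 5, 10],
    }

    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid=param_grid,
        scoring="accuracy",
        cv=5,
        n_jobs=-1,
    )
    search.fit(X_train, y_train)

    print("Best parameters:", search.best_params_)
    print("Best CV accuracy:", search.best_score_)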
The Dummy Classifier, which serves as a baseline, had an accuracy of 47.643%, with precision, recall, and F1-score values of 0.48. The confusion matrix showed 97 true positives and 95 true negatives, while the ROC curve had an AUC value of 0.477, indicating poor performance.

The Stacking Classifier model achieved an accuracy of 77.419%, with precision, recall, and F1-score values of 0.77. In addition, the confusion matrix presented 148 true positives and 164 true negatives, and the ROC curve showed an AUC value of 0.774, indicating this model's suitability for predicting Spotify song preferences.
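A sketch of how such a Stacking Classifier can be assembled over the base learners evaluated above, using scikit-learn and xgboost. The exact composition of base learners, the choice of Logistic Regression as the meta-learner, and the default hyperparameters are assumptions, since the precise configuration is not specified here:

    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from xgboost import XGBClassifier

    # Base learners drawn from the models compared above.
    base_learners = [
        ("rf", RandomForestClassifier(random_state=42)),
        ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
        ("gnb", GaussianNB()),
    ]

    # Logistic Regression combines the base learners' predictions (assumed meta-learner).
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    stack.fit(X_train, y_train)
    print("Test accuracy:", stack.score(X_test, y_test))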
The attribute patterns observed in the data exploration are consistent with the model's prediction for the Epic Evil Dramatic Music Playlist, which received a high number of predicted liked songs. Additionally, the low instrumentalness and acousticness might suggest that the user prefers songs that are more electronically produced and less dependent on traditional acoustic instruments, which is also reflected in the model's prediction.

For the K-Pop playlist, the model's prediction of only 15 liked songs aligns with the observation that low liveness and speechiness might suggest that the user prefers songs more focused on the musical aspect and less on live performances or spoken word elements. K-Pop songs typically have a high speechiness level, with a significant emphasis on vocals and spoken elements. This could explain the lower number of predicted liked songs in the K-Pop playlist. The model's predictions for the two playlists align with the characteristics of the songs in the dataset, suggesting that the machine learning model is heading in the right direction.

This study highlights the potential of machine learning in predicting music preference, providing a convenient tool for Spotify users to discover new songs and enhance their listening experience. Future research could explore additional song attributes or incorporate user demographic data to further improve the model's accuracy.