Predicting Spotify Song Popularity

Uploaded by

FabioSantos


Towards Data Science | OPINION

Ranking every Machine Learning algorithm to build the best Data Science model using PyCaret.

Matt Przybyla · Feb · 7 min read

Photo by Cezar Sampaio on Unsplash [1].

Table of Contents

1. Introduction
2. Model Comparison
3. Summary
4. References

Introduction

Because Spotify and other music streaming services are incredibly popular and widely used, I wanted to apply Data Science techniques with Machine Learning algorithms to this product to predict song popularity. I personally use this product, and what I apply here could be applied to other services as well. I will examine every popular Machine Learning algorithm and pick the best one based on success metrics or criteria; oftentimes, this is some sort of calculated error. The goal of the best model is to predict a song's popularity based on various current and historical features. Keep reading if you would like to follow a tutorial on how to use Data Science to predict the popularity of a song.

Model Comparison

Photo by Markus Spiske on Unsplash [2].

I will be discussing the Python library that I used, along with the data, parameters, models compared, results, and code below.

Library

Using the power of PyCaret [3], you can now test every popular Machine Learning algorithm against one another (or at least most of them).
For this problem, I will be comparing MAE, MSE, RMSE, R2, RMSLE, MAPE, and TT (Sec), which is the time it takes for the model to be completed. Some of the benefits of using PyCaret overall, as stated by the developers, are increased productivity, ease of use, and business readiness, all of which I can personally attest to.

Data

The dataset [4] that I am using is from Kaggle. You can download it easily and quickly. It is about 17 MB and contains Spotify data from the years 1921 to 2020, covering 160,000+ tracks. It consists of 174,389 rows and 19 columns. Below is a screenshot of the first few rows along with the first columns:

[Screenshot: output of spotify.head(), showing the first rows for the first columns (acousticness, artists, danceability, duration_ms, energy, explicit, id, instrumentalness).]

Data Sample. Screenshot by Author [5].

Columns

After we eventually pick the best model, we can look at the most important features.
I am using the interpret_model() function of PyCaret, which is based on the popular SHAP library. Here are all of the possible features:

    ['acousticness', 'artists', 'danceability', 'duration_ms', 'energy',
     'explicit', 'id', 'instrumentalness', 'key', 'liveness', 'loudness',
     'mode', 'name', 'popularity', 'release_date', 'speechiness', 'tempo',
     'valence', 'year']

Here are the most important features using SHAP:

[Figure: SHAP summary plot. Features from most to least important: year, instrumentalness, loudness, duration_ms, acousticness, liveness, release_date_month_1, speechiness, danceability, release_date_month_12, valence, tempo, energy, release_date_is_month_start_0, key_0, release_date_weekday_1, release_date_month_6, key_4, explicit_1, key_9. X-axis: SHAP value (impact on model output). Color: feature value, from low to high.]

SHAP Feature Importance. Screenshot by Author [6].

All columns are used as features, except for the target variable, which is the column popularity. As you can see, the top three features are year, instrumentalness, and loudness. As a future improvement, it would be better to keep each categorical feature in a single column instead of breaking it out into tens of one-hot columns. As a next step, those columns could then be fed into the CatBoost model so that target encoding is applied instead of one-hot encoding. To do this, we would confirm or change the key column, and any other similar columns, to be categorical.

Parameters

These are the parameters that I used in the setup() of PyCaret. The Machine Learning problem is a regression one, using data from Spotify, with the target variable being the popularity field.
For reproducibility, you can establish a session_id. There are a ton more parameters, but these are the ones that I used, and PyCaret does a great job of automatically detecting information from your data, such as which features are categorical, and it will confirm that with you in setup().

Models Compared

I will be comparing 19 Machine Learning algorithms. Some are incredibly popular, while others I have actually not heard of, so it will be interesting to see which one wins with this dataset. For the success criteria, I am comparing all of the metrics MAE, MSE, RMSE, R2, RMSLE, MAPE, and TT (Sec), which PyCaret automatically ranks. Here are all of the models that I compared:

* Linear Regression
* Lasso Regression
* Ridge Regression
* Elastic Net
* Orthogonal Matching Pursuit
* Bayesian Ridge
* Gradient Boosting Regressor
* Decision Tree Regressor
* CatBoost Regressor
* Light Gradient Boosting Machine
* Extra Trees Regressor
* AdaBoost Regressor
* K Neighbors Regressor
* Lasso Least Angle Regression
* Huber Regressor
* Passive Aggressive Regressor
* Least Angle Regression

Results

It is important to note that I am just using a sample of the data, so the order of these algorithms may change if you test this code yourself on all of the data. I used only 1,000 rows instead of the total of roughly 170,000 rows. As you can see, CatBoost was ranked first, having the best MSE, RMSE, and R2. However, it did not have the best MAE, RMSLE, or MAPE, and it was not the fastest. Therefore, you should establish what you mean by success in terms of these metrics. For example, if time is essential, you will want to rank that higher, or if MAE matters more to you, you might pick Extra Trees Regressor as the winner instead.
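To make the point about defining success concrete, here is a minimal sketch in plain Python that picks a winner from a leaderboard under different metrics. The model names come from the list above, but every number is invented for illustration and is not taken from the article's results; the helper best_model and the HIGHER_IS_BETTER set are hypothetical names, not part of PyCaret.

```python
# Hypothetical leaderboard: model names are from the article, but all numbers
# are made up for illustration and are NOT the article's actual results.
leaderboard = [
    {"model": "CatBoost Regressor", "MAE": 10.2, "RMSE": 13.0, "R2": 0.62, "TT (Sec)": 2.5},
    {"model": "Extra Trees Regressor", "MAE": 10.0, "RMSE": 13.4, "R2": 0.60, "TT (Sec)": 0.4},
    {"model": "Linear Regression", "MAE": 12.1, "RMSE": 15.8, "R2": 0.45, "TT (Sec)": 0.02},
]

# Metrics where a larger value is better; every other metric is lower-is-better.
HIGHER_IS_BETTER = {"R2"}


def best_model(rows, metric):
    """Return the name of the model that wins on the given metric."""
    pick = max if metric in HIGHER_IS_BETTER else min
    return pick(rows, key=lambda row: row[metric])["model"]


# The winner changes depending on which metric defines success.
for metric in ("RMSE", "R2", "MAE", "TT (Sec)"):
    print(f"{metric:>8}: {best_model(leaderboard, metric)}")
```

With these made-up numbers, CatBoost wins on RMSE and R2, Extra Trees wins on MAE, and Linear Regression wins on training time, which mirrors the trade-off described above. In PyCaret itself, compare_models() takes a sort argument that, as far as I know, controls which metric orders the leaderboard.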
[Screenshot: compare_models() leaderboard, ranking the models by MAE, MSE, RMSE, R2, RMSLE, MAPE, and TT (Sec).]

Model Comparison. Screenshot by Author [7].

Overall, you can see that, even with a small sample of the dataset, we fared pretty well. The popularity target variable has a range of 0 to 91. Therefore, taking MAE as an example, our average error is 9.7 popularity units. On a scale of 0 to 91 that is not too bad: on average, we would be off by roughly 10 popularity units. However, a model trained this way would probably not generalize that well, since we are just using a sample. With the full dataset, you can expect all of the error metrics to decrease significantly (which is good), but unfortunately, you will see the training time increase dramatically. One of the neat features of PyCaret is the ability to remove algorithms from your compare_models() training. I would start on a small sample of the dataset, see which algorithms generally take longer, and then remove those when you compare with all of the original data, since some of these could take hours to train depending on the dataset.

In the screenshot below, I am printing the dataframe with the predictions and the actual values. For example, we can see that popularity, the original value, is compared side by side with Label, which is the prediction. You can see that some predictions were better than others. The last prediction was quite poor, while the first two predictions were great.

    In [15]: predictions[['popularity', 'Label']].head(3)
    Out[15]:
       popularity      Label
    0        25.0  19.695469
    1        28.0  26.412759
    2        73.0  37.664414

Predictions. Screenshot by Author [8].
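As a small worked example of how the absolute error behind MAE is computed, the sketch below uses only the three rows shown in the predictions output above. It is plain Python and does not call PyCaret; the variable names are my own.

```python
# The three (actual, predicted) pairs shown in the predictions output above.
actual = [25.0, 28.0, 73.0]
predicted = [19.695469, 26.412759, 37.664414]

# Absolute error per row: how far each prediction is from the true popularity.
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

# Mean absolute error (MAE) over just these three rows.
mae = sum(abs_errors) / len(abs_errors)

for i, err in enumerate(abs_errors):
    print(f"row {i}: absolute error = {err:.2f}")
print(f"MAE over these three rows = {mae:.2f}")

# The article's reported MAE of 9.7, expressed as a share of the 0 to 91 range.
print(f"9.7 out of 91 = {9.7 / 91:.1%}")
```

The first two rows are off by about 5.3 and 1.6 popularity units, while the third is off by about 35.3, which is why the last prediction reads as poor. Three rows are far too few to say anything about the model as a whole; the 9.7 figure in the text comes from the full leaderboard, not from this snippet.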
Code

The code covers reading in your data, sampling your data (only if you want), setting up your regression, comparing models, creating your final model, making predictions, and visualizing feature importance [9]:

    # import libraries
    from pycaret.regression import *
    import pandas as pd

    # read in your Spotify data
    spotify = pd.read_csv('file location of your data on your computer.csv')

    # use a sample of the dataset (you can use any amount)
    spotify_sample = spotify.sample(1000)

    # set up your regression parameters
    regression = setup(data=spotify_sample,
                       target='popularity',
                       session_id=100)

    # compare models
    compare_models()

    # create a model
    catboost = create_model('catboost')

    # predict on the test set
    predictions = predict_model(catboost)

    # interpret the model
    interpret_model(catboost)

Summary

Photo by bruce mars on Unsplash [10].

Using Data Science models to predict a variable can be quite overwhelming, but we have seen how, with a few lines of code, we can compare several Machine Learning algorithms efficiently. We have also shown how easy it is to set up different types of data, including numeric and categorical data. For the next steps, I would apply this to the entire dataset, confirm data types, and make sure to remove inaccurate models as well as models that take too long to train.

In summary, we now know how to perform the following to determine song popularity:

* import libraries
* read in data
* set up your model
* compare models
* pick and create the best model
* predict using the best model
* interpret feature importance

I want to give thanks and admiration to Moez Ali for developing this awesome Data Science library. I hope you found my article both interesting and useful.
Please feel free to comment down below if you applied this library to a dataset or if you use other techniques. Do you prefer one over the other? What do you think about automatic Data Science?

References

[1] Photo by Cezar Sampaio on Unsplash, (2020)
[2] Photo by Markus Spiske on Unsplash, (2020)
[3] Moez Ali, PyCaret, (2021)
[4] Yamac Eren Ay on Kaggle, Spotify Dataset, (2021)
[5] M. Przybyla, Dataframe Screenshot, (2021)
[6] M. Przybyla, SHAP Feature Importance Screenshot, (2021)
[7] M. Przybyla, Model Comparison Screenshot, (2021)
[8] M. Przybyla, Predictions Screenshot, (2021)
[9] M. Przybyla, Python Code, (2021)
[10] Photo by bruce mars on Unsplash, (2018)
