The Application of Machine Learning and Deep Learning in Sport: Predicting NBA Players' Performance and Popularity
To cite this article: Nguyen Hoang Nguyen, Duy Thien An Nguyen, Bingkun Ma & Jiang
Hu (2021): The application of machine learning and deep learning in sport: predicting NBA
players’ performance and popularity, Journal of Information and Telecommunication, DOI:
10.1080/24751839.2021.1977066
1. Introduction
The National Basketball Association (NBA) is one of the most popular sports leagues in America and the most well-known basketball league in the world. The NBA's business has been worth billions of dollars over the last few decades, with millions of viewers and extensive commercial activity. In the 2018–2019 season, the combined revenue of NBA teams reached a record of over 8.7 billion dollars, according to Statista. Forbes magazine reported that the top five NBA
teams have a combined net worth of $16.8 billion. Game broadcasts and advertisements
during the streams generate the most profit for the NBA (Forbes, 2019).
The purpose of our study is to evaluate the effectiveness of applying Machine Learning
(ML) and Deep Learning (DL) in the sport domain, particularly basketball, from the perspective of
manpower (players). Most prior studies paid more attention to predicting the outcomes
of games, or to using ML or DL alongside advanced technical tools to collect and analyse
players' data. Our approach is to target widely available data tracking
players' primary statistics, usable without any complex technical tools, and to apply advanced ML and DL to extract useful information and evaluate the performance of various ML and DL models on a basketball dataset.
On this basis, our two primary objectives were to predict players' future performance
and popularity by modelling players' statistics collected from their regular games.
Regarding our second goal of forecasting players' popularity, we concentrated on
predicting whether players are chosen to play in the next season's NBA All-Star game. The NBA All-Star
game is an annual February exhibition event in which 24 NBA star players are divided into
two teams to compete. The procedure for choosing players to participate in the NBA All-Star game
involves a fan voting poll, which has a strong influence on the selection outcome.
Thus, 'being selected for the NBA All-Star roster' is a good indicator for evaluating
players' public popularity. In contrast to the effect of a player's performance on the
court, the benefit of a player's popularity is harder to recognize, so we briefly summarize two essential aspects of the business value created by popular
players.
First of all, popular or star players can contribute significantly to their franchises'
brands among the teams in the league (Pifer et al., 2015). Athletes with strong on-court
performance are better able to convert their star attributes into realized
equity for a team's brand, thus raising awareness of the team, attracting public attention and helping reach new markets. Secondly, through different economic
models, star players generate externalities that increase attendance and other revenue
sources beyond their individual contributions to team success (Humphreys & Johnson,
2020; Berri & Schmidt, 2006). In other words, they not only lead their own teams to win
games but also attract more fans and public attention, increasing the overall league's
business revenue and media coverage, even for their opponents. Historical data
show this superstar effect on the league's economy across different eras.
With the rapid development of data science in recent years, Machine Learning (ML) and
Data Mining (DM) have been applied in various fields. As a result of this movement, Sport
Analytics, a field in which ML methods and their implementations are used to gain useful
insights from sport data (Apostolou & Tjortjis, 2019), has been emerging as a
favourable area for both business and academic research. As Sport Analytics has
become more prominent and attainable, sport teams, coaches, players and companies
are increasingly likely to use its applications to improve their performance and operations on
and off the court (Tichy, 2016). Within the sport domain, many academic
studies and systematic frameworks have been developed for basketball using ML and DM techniques,
with purposes spanning long-term strategy, daily operations and prediction in professional
leagues and college/high school settings (Thabtah et al., 2019; Miljković et al., 2010; Zuccolotto
et al., 2018).
This study extends the advanced technological methodologies of our previous paper, presented at the 12th International Conference on Computational Collective Intelligence (ICCCI 2020, Da Nang, Vietnam, November 30 – December 3, 2020), in which we applied data mining with ML models to identical basketball datasets and
achieved good results: RMSE of 2.1969 and MAE of 1.6465 for the first objective,
and Recall of 0.9657 and ROC AUC of 0.9096 for the second objective
(Nguyen et al., 2020). Building on that study, we extend the discussion of each ML model and its mechanism to explain why it was chosen for our study. Moreover, we spent additional time on
data collection and preparation for each variable (feature) to ensure accuracy and
to better understand the relationships among basketball players' attributes.
In addition, besides the traditional ML applied in our previous study, the potential of deep
learning was also covered. Deep learning has recently become one of the most popular techniques in computer vision, natural language processing and sentiment analysis. Deep
learning is a statistical technique using neural networks with multiple layers, essentially
for pattern classification (Marcus, 2018). Although deep learning has the limitation that it
works like a 'black box', so we cannot directly observe how each predictor variable
affects the final prediction, it still draws attention from both the scientific and
business communities through its extraordinary predictive competence. Our study aims
to examine how effectively deep learning deals with structured and relatively small datasets and to compare its performance against traditional machine
learning on our basketball dataset.
The CRISP-DM methodology is used as the reference to construct the ML and DL models. Its
greatest benefit is providing a common method of communication that helps
connect a variety of technical tools and people with different skills and backgrounds to
carry out an efficient and effective project (Wirth & Hipp, 2000). As in a typical CRISP-DM process,
our study was divided into six phases: (1) Business Understanding, (2) Data Understanding, (3) Data Preparation, (4) Modelling, (5) Evaluation and (6) Deployment.
The study concentrated on the first five phases, while the final phase informed the
limitations and directions for future improvement.
2. Literature review
There have been several published research studies applying machine learning and deep
learning to predict results in a variety of sports. The application of data mining in basketball began in the 1990s with an IBM tool named Advanced Scout (Colet & Parker, 1997). Its
purpose was to help NBA management teams discover hidden patterns in basketball statistics using data mining techniques. The system used a data
mining technique called Attribute Focusing, which compared the overall distribution of an attribute to its distributions over different subsets of the data. If any
subset showed a characteristically different distribution, the combination of attributes
describing that subset was marked as 'interesting'. However, the technique mostly raised
awareness (similar to anomaly detection today) of unusual data distributions, with limited explanation or interpretation of players' statistics.
Hidden Markov Models (HMMs) have more recently been used to model the progression of
match results (wins/losses) over time, using advanced statistics from
NBA games as features, and were able to predict match results with an accuracy of 73% (Madhavan, 2016). Rue and Salvesen used a Bayesian approach, combined with Markov chains and the Monte Carlo method, to predict football game
results (Rotshtein et al., 2015). However, as it used a neural network to enhance predictive
ability, the approach lacked interpretability and hence could not be used
for performance analysis or feedback. In another paper, a Bayesian hierarchical model
was also applied to predict football results based on scoring intensity data determined
by the attack and defence strengths of the two teams involved (Maher, 1982).
Leung and Joseph (2014) proposed a data mining technique to predict the outcomes
of sport games and discover useful insights. Tested on real data from college football games, the proposed technique achieved high prediction accuracy. Their technique is based on a combination of four different measures over the
historical results of the games. The core concept is to predict the outcome of the game
between two teams by identifying the set of teams most similar to each of
the competing teams, finding the results of the games between the teams in each of
the two sets, and using those game results for prediction.
Zifan Shi, Sruthi Moorthy and Albrecht Zimmermann used machine learning techniques specialized in classifier learning to predict the outcomes of
NCAAB matches (Zimmermann et al., 2013). The research uncovered some findings
that were unexpected in the scientific community. In the context of 2013, the multi-layer perceptron, an ML technique that was not widely used at the time, proved to be the
most effective in the explored settings. Moreover, explicitly modelling the differences
between NCAAB teams' attributes did not improve the models' predictive accuracy.
Most strikingly, there appeared to be a ceiling of about 74%
predictive accuracy that could not be exceeded by ML or statistical techniques.
Deep learning (artificial neural networks) has also become more popular in sport prediction. Kahn's model for predicting National Football League (NFL) game winners
reached an accuracy of 75%, nearly 10% higher than predictions by
domain experts in the NFL. The model was treated as a classification model and improved on the earlier model from Purucker's study (Kahn,
2003). In Kahn's model, data were collected from 208 games in the 2003 season, and a
10-3-2 neural network (NN) (10 input nodes, 3 hidden nodes and 2
output nodes) was used to achieve the result.
Similarly, a 20-10-1 NN model designed by McCabe and Trevathan was able to predict
results in four different sports (Rugby League, Australian Rules Football, Rugby Union and
English Premier League Football) using previous-season data, achieving an
average accuracy of 67.5% (McCabe & Trevathan, 2008). The same variables were used across
the different sports to build the model.
Across these studies of both traditional ML and DL, the focus has mostly been on predicting the outcomes of games, while there has been little research on individual players'
performance and popularity, which, as noted in the Introduction, have a
significant influence on a team's outcomes and revenue. Thus, our study pays more attention
to this aspect, providing analytical models to evaluate and forecast individual players with
ML and DL at high accuracy. Moreover, the comparison among the models' results
gives a brief overview of the relative performance of traditional ML and DL on a relatively small (basketball) dataset for prediction purposes.
3. Materials
3.1. Data sources
The first dataset is NBA players' stats since 1950 from the website basketball-reference.com. Since many variables are unavailable or inconsistent in earlier years, only data
from 1979 onward were used, including players' information and their basketball stats,
which can be divided into two categories: 'cumulative' variables, such as number of games
played (G), total minutes played (MP) and total points (PTS), and their 'percentage' counterparts, such as the percentage of 2-point attempts made in a given season (%2P) and the percentage of 3-point attempts made (%3P). The dataset also includes our target variable for the first objective, WS.
The second dataset, from the website basketball.realgm.com, archives the NBA All-Star roster for each year. It was merged with the first dataset in the Data Preparation
step on two mutual variables, year and player name, to determine whether a player was
chosen to play in All-Star games, coded as a binary target variable: 1 if selected for the NBA All-Star roster and 0 if not.
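A minimal sketch of this merge, assuming the two sources have been loaded as pandas DataFrames `stats` and `allstar` (the frame and column names here are illustrative, not the paper's exact schema):

```python
# Merge the All-Star roster archive into the player-season stats table,
# deriving the binary target: 1 if on the roster that year, else 0.
import pandas as pd

allstar = allstar.assign(AllStar=1)  # mark every rostered player-season
merged = stats.merge(
    allstar[["Year", "Player", "AllStar"]],
    on=["Year", "Player"],  # the two mutual variables
    how="left",
)
# player-seasons absent from the roster file were not selected
merged["AllStar"] = merged["AllStar"].fillna(0).astype(int)
```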
Two sets of evaluation metrics were used:
(1) Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) for the regression models;
(2) Accuracy, Precision, especially Recall, the Receiver operating characteristic area under the curve (ROC AUC) and F1 scores for the classification models.
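These metrics are standard; as a hedged illustration, they could be computed with scikit-learn as follows (variable names such as `y_true_reg` and `y_prob_cls` are placeholders, not the paper's code):

```python
# Compute the regression and classification metrics listed above.
import numpy as np
from sklearn.metrics import (
    mean_squared_error, mean_absolute_error,
    accuracy_score, precision_score, recall_score,
    roc_auc_score, f1_score,
)

# (1) regression metrics
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
mae = mean_absolute_error(y_true_reg, y_pred_reg)

# (2) classification metrics; ROC AUC needs predicted probabilities
acc = accuracy_score(y_true_cls, y_pred_cls)
prec = precision_score(y_true_cls, y_pred_cls)
rec = recall_score(y_true_cls, y_pred_cls)
auc = roc_auc_score(y_true_cls, y_prob_cls)
f1 = f1_score(y_true_cls, y_pred_cls)
```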
…% and PF. For our first objective, the original dataset was first partitioned
into training and test sets at an 80–20 ratio. The training set was then divided again into
train and validation sets at an 80–20 ratio. The train set was used to train the models with
cross-validation. The train and validation sets were then used to evaluate models for
potential over-fitting, and the test set was reserved for our Evaluation phase to estimate how effective the final model is on unseen data. To avoid players' data from particular periods being sampled into the train/validation/test sets
unevenly, a stratified sampling technique based on the Year ratio was adopted.
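A minimal sketch of this two-stage stratified split, assuming the prepared data sit in a pandas DataFrame `df` with a `Year` column and the target `WS` (the paper does not publish its code, so these names are illustrative):

```python
# Two 80-20 splits, each stratified by season so every subset
# preserves the proportion of player-seasons per Year.
from sklearn.model_selection import train_test_split

X, y = df.drop(columns=["WS"]), df["WS"]

# first split: 80% train+valid, 20% test
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, stratify=X["Year"], random_state=42)

# second split: the training portion into 80% train, 20% validation
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, test_size=0.2,
    stratify=X_train_full["Year"], random_state=42)
```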
The primary difference in data engineering between the first and second objectives, and between our first paper and this one, is that we included whether a player was selected for the current season's All-Star game as a predictor variable when predicting the probability of being selected for the next season's All-Star game. The same train-valid-test splitting process was used for our second objective. As mentioned in the Data
Summary section, one issue with the original data is class imbalance, which can
affect model evaluation and performance, so two popular solutions, over-sampling and under-sampling, were applied in this study to ease the
effect of the imbalanced data distribution on the learning process (Batista et al., 2004;
Chawla et al., 2002; Chawla et al., 2004). Besides random over-sampling, the synthetic
minority over-sampling technique (SMOTE) was also used, which is widely adopted in many application domains, such
as network intrusion detection (Cieslak et al., 2006), breast cancer detection (Fallahi
& Jafari, 2011) and biotechnology (Batuwita & Palade, 2009). SMOTE creates new minority-class examples by randomly choosing one (or more,
depending on the defined over-sampling ratio) of the k nearest neighbours (kNN)
of a minority-class instance and generating the new instance's values by
random interpolation between the two instances (Galar et al., 2011), which helps
reduce the over-fitting risk of random over-sampling. The other
sampling technique used in our study is under-sampling, which balances the
class distribution through random elimination of majority-class instances
and has been shown in many studies to outperform SMOTE or random over-sampling in
most situations for both low- and high-dimensional data (Hulse et al., 2007; Blagus
& Lusa, 2013; Drummond & Robert, 2003).
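As an illustration, the three resampling strategies could be set up with the imbalanced-learn package; this is an assumption about tooling, since the paper does not name its implementation. Resampling is applied to the training data only, after the split above:

```python
# Random over-sampling, SMOTE, and random under-sampling of the
# training data; each returns a rebalanced (X, y) pair.
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

samplers = {
    "random_over": RandomOverSampler(random_state=42),    # duplicate minority rows
    "smote": SMOTE(k_neighbors=5, random_state=42),       # interpolate between kNN
    "random_under": RandomUnderSampler(random_state=42),  # drop majority rows
}
resampled = {
    name: s.fit_resample(X_train, y_train) for name, s in samplers.items()
}
```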
Figure 1. Distributions of cross-validation results for train data from six candidate regression models.
Figure 2. MAE and RMSE results for valid datasets from candidate regression models.
…acceptable for our data mining's first objective. (3) The Gradient Boosting Machine is less susceptible to over-fitting and runs much faster than polynomial Support Vector
Machines and the neural net.
For model tuning, a manual grid search over 135 combinations of the following four parameters was used: (1) learning rate: [0.01, 0.1, 0.3]; (2) depth of trees: [1, 2, 3, 4, 5]; (3)
minimum number of observations in the terminal nodes of the trees: [5, 10, 15]; (4) subsampling: [0.65, 0.8, 1]. The model was tuned on the combined train and validation data with 5-fold cross-validation to find
the optimal number of trees (maximum 1000), with RMSE as the primary
metric. With the lowest RMSE of 2.1467, the parameters learning rate = 0.1, depth
of trees = 4, minimum number of observations = 15 and subsampling = 0.8, with an
optimal number of trees of 358, were used to train our final regression model.
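A sketch of this 135-combination grid search, mapped onto scikit-learn's GradientBoostingRegressor (an assumed implementation; the parameter names below correspond to the four parameters listed above, and `n_iter_no_change` stands in for the optimal-number-of-trees search):

```python
# 3 * 5 * 3 * 3 = 135 parameter combinations, scored by RMSE with 5-fold CV.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "learning_rate": [0.01, 0.1, 0.3],  # (1)
    "max_depth": [1, 2, 3, 4, 5],       # (2)
    "min_samples_leaf": [5, 10, 15],    # (3)
    "subsample": [0.65, 0.8, 1.0],      # (4)
}
# early stopping lets the effective number of trees settle below 1000
gbm = GradientBoostingRegressor(n_estimators=1000, n_iter_no_change=10,
                                random_state=42)
search = GridSearchCV(gbm, param_grid, cv=5,
                      scoring="neg_root_mean_squared_error")

# tune on the combined train + validation data, as described above
X_tv = pd.concat([X_train, X_valid]); y_tv = pd.concat([y_train, y_valid])
search.fit(X_tv, y_tv)
print(search.best_params_, -search.best_score_)
```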
Figure 3. Top 10 Relative features for final regression model by traditional machine learning.
…Naïve Bayes (NB), were trained under both random and SMOTE over-sampling and
compared by Recall and ROC AUC scores. Two representative boosting algorithms,
AdaBoost (AB) and Gradient Boosting Machine (GBM), were also employed for comparison. As the under-sampling technique has the drawback of potentially removing
useful information (Galar et al., 2011), the under-sampling methodology was applied
through two bagging-related algorithms, the balanced bagging classifier (BB) and the balanced
random forest classifier (BRF).
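A sketch of the two under-sampling ensembles, as implemented in imbalanced-learn (assumed tooling; default hyperparameters shown for brevity):

```python
# Balanced bagging and balanced random forest: each base learner is
# trained on a bootstrap sample that is randomly under-sampled.
from imblearn.ensemble import (BalancedBaggingClassifier,
                               BalancedRandomForestClassifier)
from sklearn.metrics import recall_score, roc_auc_score

models = {
    "BB": BalancedBaggingClassifier(random_state=42),
    "BRF": BalancedRandomForestClassifier(random_state=42),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name,
          recall_score(y_valid, clf.predict(X_valid)),
          roc_auc_score(y_valid, clf.predict_proba(X_valid)[:, 1]))
```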
Table 3. Results on valid data from candidate classification models without predictor variable ‘being
selected for this season all-star game’.
Models Precision Recall ROC AUC F1
RF 0.4881 0.6308 0.7953 0.5503
GBM 0.3518 0.8308 0.8689 0.4943
BB 0.3652 0.8231 0.8680 0.5059
BRF 0.2977 0.9000 0.8855 0.4474
RU 0.2418 0.2846 0.6152 0.2615
Table 4. Results on valid data from candidate classification models with predictor variable ‘being
selected for this season all-star game’.
Models Precision Recall ROC AUC F1
RF 0.5525 0.7042 0.8343 0.6192
GBM 0.3927 0.8380 0.8785 0.5348
BB 0.3980 0.8521 0.8858 0.5426
BRF 0.3155 0.9155 0.8957 0.4693
RU 0.3524 0.7817 0.8459 0.4858
…parameters: (1) maximum number of levels in each decision tree: [10, 20, 30, 40, 50, 60, 70, 80,
90, 100, 110]; (2) minimum number of data points allowed in a leaf node: [2, 3, 4, 5]; (3) minimum
number of data points required in a node before it is split: [6, 8, 10, 12]; (4)
number of trees in the forest: [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000], with
default bootstrapping. With the highest Recall score of 0.9085, the parameters max depth
= 20, min samples per leaf = 4, min samples per split = 8 and number of trees = 700 were
used to train our final model. Compared to our previous study, this final model has a
different ‘max depth’ parameter, 20 rather than the 10 of our previous final model.
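A sketch of this tuning step, assuming scikit-learn's RandomForestClassifier with Recall as the selection metric (the paper's exact library is not stated):

```python
# Grid search over the four parameter lists above, scored by Recall.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "max_depth": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110],  # (1)
    "min_samples_leaf": [2, 3, 4, 5],                             # (2)
    "min_samples_split": [6, 8, 10, 12],                          # (3)
    "n_estimators": [100, 200, 300, 400, 500,
                     600, 700, 800, 900, 1000],                   # (4)
}
rf = RandomForestClassifier(bootstrap=True, random_state=42)  # default bootstrap
search = GridSearchCV(rf, param_grid, cv=5, scoring="recall")
search.fit(X_train, y_train)
print(search.best_params_)
```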
Figure 4. Top 10 Relative features for final classification model by traditional machine learning.
…difference in performance on this particular dataset between traditional ML and
modern ML (deep learning).
Due to the significant imbalance in the dataset, models were fit using a large batch size of
512. This ensures that each batch has a chance of containing the rare cases (high-achieving basketball players). This is particularly important for the classification models,
as the classifiers would likely miss the special cases they need to learn from if the batch size
were too small.
In the regression models, Mean Squared Error (MSE) was selected to evaluate performance. ReLU and Adam were used as the activation function and optimizer, and the
Adam learning rate was left at its default value, learning_rate = 0.001.
In classification, the 2-layer and 3-layer NN configurations had structures similar to the NNs
used in the regression models. Accuracy, Precision, Recall and AUC (Area Under the ROC
Curve) metrics were used to evaluate the classifiers. For the 3-layer NN (model_2), a special NN
structure known as an autoencoder layer was built on top of the 3-layer NN to reconstruct the raw input data. These layers act as unsupervised feature extractors that
help the hidden layers learn the data better.
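A minimal Keras sketch of this training setup (layer sizes are illustrative, not the paper's exact architectures, which appear in its figures):

```python
# Small feed-forward regressor: ReLU activations, Adam at the default
# learning rate, MSE loss, and the large batch size of 512.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_regressor(n_features: int) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1),  # predicts WS
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="mse")
    return model

model = build_regressor(X_train.shape[1])
# large batches so each batch likely contains the rare high-WS players
model.fit(X_train, y_train,
          validation_data=(X_valid, y_valid),
          batch_size=512, epochs=50,
          callbacks=[tf.keras.callbacks.EarlyStopping(patience=5)])
```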
5.2. Results
5.2.1. Regression analysis
It appeared that the 3-layer model with complex activation nodes (Model_3) showed the best
performance. Comparing the three charts (Figure 5), Model_3 had the lowest validation loss. The average loss (MSE) on the test set was around 4.32, which is
nearly 0.4 smaller than Model_1 and Model_2. While the gap between validation
loss and training loss in Model_3 was small, the training loss curve sat above the validation loss, indicating some under-fitting in Model_3. The
loss curves show that Model_3's training and validation losses stabilized significantly
faster than the others'. Early stopping was applied once training reached 50 epochs; the
stopping function halts model training when there are no signs of
improvement in the training process.
Figure 5. Loss function results for train and valid data and evaluation loss for test data from each
candidate deep learning model.
…negative cases of 'All-Star Next Year'. The 2-layer ANN (Classi_model_1) and 3-layer ANN
(Classi_model_2) had nearly the same performance. Both models performed slightly
better than the 3-layer ANN with autoencoder (Classi_model_3) (Figure 6).
The average prediction accuracy was above 96%. On average, Classi_model_1 and Classi_model_2 achieved over 80% precision. The loss curves
showed the models were fitting well: the training and validation loss curves were
close together in all three models (Figure 7).
Overall, Classi_model_1 appeared the most suitable NN classifier candidate for detecting 'All-Star Next Year'. Based on Classi_model_3's confusion matrix, out of 190
actual 'All-Star Next Year' cases, the model correctly predicted 108 cases – a
76% chance of predicting 'All-Star Next Year' correctly (Figure 7).
Figure 6. ROC curve for train data from each candidate deep learning model.
Figure 7. Loss function results for train and valid data and confusion matrix results for test data from
each candidate deep learning model.
6. Limitations
As a team-oriented sport, there are some other factors influencing the players’ perform-
ance and popularity: team’s tactical style, coach decision, team chemistry … , which are
not included in our study and would be more accessible to study to improve the
models in the future. Although NBA season includes 2 periods: regular season and
playoff when the best 16 teams in a regular season compete for the championship, all
data used in our study are regular season’s stats. Thus, these ML and DL models may
not be suitable to predict players’ performance in playoff and the effect of prior playoff
performance on players’ popularity is also overlooked in this study. Popularity is also
affected by external non-sporting factors, such as celebrity status in media, charisma,
social media compared to the pure quality of the game (Adler, 1985). As these factors
are perceived distinctively by public in terms of time and geography, it is a complicated
issue and further research is needed.
7. Conclusion
Machine learning offers many advantages in the sport domain through its capability to predict
future outcomes, as seen in our study's results: RMSE of 2.1969 and
MAE of 1.6465 for the regression analysis, and Recall of 0.9368 and ROC AUC of 0.9152 for the classification analysis. Moreover, our result is consistent with many prior studies showing the
superior capability of under-sampling compared to over-sampling for handling
imbalanced data. Additionally, deep learning was applied to both the regression
and classification analyses. Our study showed that deep learning's performance was not as
good as traditional machine learning's, plausibly because our data are relatively small-scale and structured, with few predictor variables; this limits the efficiency deep learning
achieves on big data, as universally recognized in the computer vision and natural
language processing fields. Because our study relied intensively on pure basketball statistics
for its models, it possibly neglects the critical influence of external factors on popularity.
Thus, we suggest that further studies in this domain with more external variables would
improve predictive ability and provide a more comprehensive understanding of
the relative importance of different factors.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes on contributors
Nguyen Hoang Nguyen has been Senior Business Intelligence at ShopeePay, Vietnam (Sea Group – Singapore) since May 2021. He received a Master's degree in Data Science from Texas Tech University in 2019.
His research interests are business intelligence based on big data, machine learning and data privacy.
Recently, he has focused on developing intelligent marketing forecasting systems using machine
learning and deep learning methodologies.
Duy Thien An Nguyen has been undertaking a Master by Research at the University of Southern Queensland since 2021.
He received a Master's degree in Data Science from the University of Southern Queensland in 2021. His
research interest is the application of data science in different industries. He is focusing on developing a data ecosystem for an automated trading platform.
Bingkun Ma received a Master's degree in Data Science from Texas Tech University in 2019. His research
interest is the application of data science in finance and accounting.
Jiang Hu received a PhD from Texas Tech University.
References
Adler, M. (1985). Stardom and talent. American Economic Review, 75(1), 208–212.
Apostolou, K., & Tjortjis, C. (2019). Sports Analytics algorithms for performance prediction. In 10th
International Conference on Information, Intelligence, Systems and Applications (IISA), Patras,
Greece. https://doi.org/10.1109/IISA.2019.8900754
Batista, G. E. A. P. A., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods
for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20–29.
https://doi.org/10.1145/1007730.1007735
Batuwita, R., & Palade, V. (2009). MicroPred: Effective classification of pre-miRNAs for human miRNA
gene prediction. Bioinformatics, 25(8), 989–995. https://doi.org/10.1093/bioinformatics/btp107
Berri, D. J., & Schmidt, M. B. (2006). On the road with the National Basketball Association’s superstar
externality. Journal of Sports Economics, 7(4), 347–358. https://doi.org/10.1177/
1527002505275094
Blagus, R., & Lusa, L. (2013). SMOTE for high-dimensional class-imbalanced data. BMC
Bioinformatics, 14(1), 106. https://doi.org/10.1186/1471-2105-14-106
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992, July 27 - 29). A training algorithm for optimal margin
classifiers. Proceedings of the 5th Annual Workshop on Computational Learning Theory
(COLT’92), Pittsburgh.
Bottou, L., & Bousquet, O. (2012). The tradeoffs of large scale learning. In S. Sra, S. Nowozin, & S. J.
Wright (Eds.), Optimization for machine learning (pp. 351–368). MIT Press. ISBN 978-0-262-01646-9.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140. https://doi.org/10.1007/
BF00058655
Bromley, J., Bentz, J. W., Bottou, L., Guyon, I., Lecun, Y., Moore, C., Säckinger, E., & Shah, R. (1993).
Signature verification using a “siamese” time delay neural network. International Journal of
Pattern Recognition and Artificial Intelligence, 7(04), 669–688.
Brownlee, J. (2016). XGBoost with Python: Gradient boosted trees with XGBoost and Scikit-Learn (pp.
10–11). Machine Learning Mastery.
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining
and Knowledge Discovery, 2(2), 121–167. https://doi.org/10.1023/A:1009715923555
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-
sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.
1613/jair.953
Chawla, N. V., Japkowicz, N., & Kolcz, A. (2004). Special issue learning imbalanced datasets, SIGKDD
Explor. Newsl, 6, 1–6. https://doi.org/10.1145/1007730.1007733
Chen, C., Liaw, A., & Breiman, L. (2004). Using random forest to learn imbalanced data. University of
California. 110: pp.1–12.
Cieslak, D. A., Chawla, N. W., & Striegel, A. (2006). Combating imbalance in network intrusion data-
sets. In Proceedings of the IEEE International Conference on Granular Computing, Atlanta, Georgia,
USA.
Colet, E., & Parker, J. (1997). Advanced scout: Data mining and knowledge discovery in NBA data.
Data Mining and Knowledge Discovery, 1(1), 121–125. https://doi.org/10.1023/A:1009782106822
Drucker, H., Burges, C. J., Kaufman, L., Smola, A., & Vapnik, V. (1997). Support vector regression
machines. Advances in Neural Information Processing Systems, 9, 155–161.
Drummond, C., & Robert, C. H. (2003). C4. 5, class imbalance, and cost sensitivity: Why under-
sampling beats over-sampling. In Workshop on Learning from Imbalanced Datasets II, 11. Citeseer.
Fallahi, A., & Jafari, S. (2011). An expert system for detection of breast cancer using data pre-proces-
sing and Bayesian network. International Journal Advanced Science Technology, 34, 65–70.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an
application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. https://doi.
org/10.1006/jcss.1997.1504
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., & Herrera, F. (2011). A review on ensembles for
the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE
Transactions on systems, Man, and Cybernetics. Part C (Applications and Reviews), 42(4), 463–
484. https://doi.org/10.1109/TSMCC.2011.2161285
Guo, H., & Herna, L. V. (2004). Learning from imbalanced data sets with boosting and data gener-
ation. ACM Sigkdd Explorations Newsletter, 6(1), 30–39. https://doi.org/10.1145/1007730.1007736
He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and
Data Engineering, 21(9), 1263–1284. https://doi.org/10.1109/TKDE.2008.239
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression. Wiley-Interscience.
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference
framework. Journal of Computational and Graphical Statistics, 15(3), 651–674. https://doi.org/10.
1198/106186006X133933
Hulse, J. V., Khoshgoftaar, T. M., & Napolitano, A. (2007). Experimental perspectives on learning from
imbalanced data. In Proceedings of the 24th International Conference on Machine Learning (pp.
935–942). Oregon State University.
Humphreys, B. R., & Johnson, C. (2020). The effect of superstars on game attendance: Evidence from
the NBA. Journal of Sports Economics, 21(2), 152–175. https://doi.org/10.1177/1527002519885441
Kahn, J. (2003). Neural network prediction of NFL Football Games.
Kubat, M., & Matwin, S. (1997). Addressing the curse of imbalanced data sets: One-sided sampling. In
Proceedings of the 14th International Conference on Machine Learning (pp. 179–186). Morgan
Kaufmann.
Langley, P., Iba, W., & Thompson, K.. (1992). An analysis of Bayesian classifiers. The Tenth National
Conference on Artificial Intelligence, 223–228. AAAI Press. https://doi.org/10.5555/1867135.
1867170
Leung, C. K., & Joseph, K. W.. (2014). Sports data mining: Predicting results for the college football
games. Procedia Computer Science, 35, 710–719. https://doi.org/10.1016/j.procs.2014.08.153
Ling, C., & Li, C. (1998). Data mining for direct marketing: Problems and solutions.
Madhavan, V. (2016). Predicting NBA game outcomes with hidden Markov models. Berkeley University.
Maher, M. J. (1982). Modelling association football scores. Statistica Neerlandica, 36(3), 109–118.
https://doi.org/10.1111/j.1467-9574.1982.tb00782.x
Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv, 1801.00631.
McCabe, A., & Trevathan, J.. (2008). Artificial intelligence in sports prediction. Fifth International
Conference on Information Technology: New Generations (itng 2008), 1194–1197. https://doi.org/
10.1109/ITNG.2008.203
Miljković, D., Gajić, L., Kovačević, A., & Konjović, Z. (2010). The use of data mining for basketball
matches outcomes prediction. In IEEE 8th International Symposium on Intelligent Systems and
Informatics, Subotica. https://doi.org/10.1109/SISY.2010.5647440.
Nguyen, N., Ma, B., & Hu, J. (2020). Predicting National Basketball Association players performance
and popularity: A data mining approach. Computational Collective Intelligence. ICCCI 2020, Da
Nang, Nov 28 - Dec 3. Lecture Notes in Computer Science, vol 12496. Springer, Cham. https://
doi.org/10.1007/978-3-030-63007-2_23.
Pifer, N. D., Mak, J., Bae, W., & Zhang, J. (2015). Examining the relationship between star player
characteristics and brand equity in professional sport teams. Marketing Management Journal,
25, 88–106.
Forbes Press Releases. (2019, February 6). Forbes releases 21st annual NBA team valuations. Forbes.
Retrieved May 26, 2021, from www.forbes.com/sites/forbespr/2019/02/06/forbes-releases-21st-
annual-nba-team-valuations/?sh=72543d3511a7
Rotshtein, P., Posner, M., & Rakityanskaya, A. B. (2015). Football predictions based on a fuzzy model
with genetic and neural tuning. Cybernetics and Systems Analysis, 41(4). https://doi.org/10.1007/
s10559-005-0098-4
Schwenk, H., & Bengio, Y.. (1997). Artificial Neural Networks — ICANN’97. ICANN 1997. Lecture Notes
in Computer Science, Vol. 1327. Berlin: Springer. https://doi.org/10.1007/BFb0020278
Thabtah, F., Zhang, L., & Abdelhamid, N. (2019). NBA game result prediction using feature analysis
and machine learning. Annals of Data Science, 6(1), 103. https://doi.org/10.1007/s40745-018-
00189-x
Tichy, W. (2016). Changing the Game: ‘Dr. Dave’ Schrader.
Wirth, R., & Hipp, J. (2000, April). CRISP-DM: Towards a standard process model for data mining. In
Proceedings of the 4th International Conference on the Practical Applications of Knowledge
Discovery and Data Mining, Citeseer.
Yan, X., & Su, X. (2009). Linear regression analysis: Theory and computing (pp. 2–3). https://doi.org/10.
1142/6986
Yanofsky, N. (2015). Probably approximately correct: Nature’s algorithms for learning and prosper-
ing in a complex world. Common Knowledge, 21(2), 340–340. https://doi.org/10.1215/0961754X-
2872666
Zimmermann, A., Moorthy, S., & Shi, Z. (2013). Predicting college basketball match outcomes using
machine learning techniques: some results and lessons learned. arXiv preprint arXiv:1310.3607.
Zuccolotto, P., Manisera, M., & Sandri, M. (2018). Big data analytics for modelling scoring probability
in basketball: The effect of shooting under high-pressure conditions. International Journal of
Sports Science & Coaching, 13(4), 569–589. https://doi.org/10.1177/1747954117737492