Predicting NBA Games Using Neural Networks

Bernard Loeffelholz, Earl Bednar, and Kenneth W. Bauer
Air Force Institute of Technology
([email protected], [email protected], [email protected])

Journal of Quantitative Analysis in Sports, Volume 5, Issue 1, 2009, Article 7
Copyright 2009, The Berkeley Electronic Press. All rights reserved.
Abstract
In this paper we examine the use of neural networks as a tool for predicting the success of
basketball teams in the National Basketball Association (NBA). Statistics for 620 NBA games
were collected and used to train a variety of neural networks such as feed-forward, radial basis,
probabilistic and generalized regression neural networks. Fusion of the neural networks is also
examined using Bayesian belief networks and probabilistic neural network fusion. Further, we
investigate which subset of features input to the neural nets is most salient for prediction.
We explored subsets obtained from signal-to-noise ratios and from expert opinion to identify
such a subset. Results obtained from these networks were compared to
predictions made by numerous experts in the field of basketball. The best networks were able
to correctly predict the winning team 74.33 percent of the time (on average) as compared to the
experts who were correct 68.67 percent of the time.
KEYWORDS: feed-forward neural networks, radial basis functions, probabilistic neural network,
generalized regression neural networks, Bayesian belief networks, fusion, signal-to-noise ratio,
basketball
1. Introduction
Since the winter of 1891, when Dr. James Naismith nailed a peach basket to a
gym wall, basketball has evolved into a true American game (NMHOF, 2008).
Nearly 270,000 people attend basketball arenas around the country each game
day to watch the best of the best sweat, hustle, and entertain (Ibisworld, 2008).
Along with watching the games, millions of fans are involved in the ever-growing
arena of fantasy basketball leagues and other gambling alternatives. Involvement
in these leagues and in gambling creates the desire to know which team will win
before the teams even take the court.
In this paper we examine the use of neural networks as a tool for predicting
the success of basketball teams in the National Basketball Association (NBA).
Further, we investigate which subset of features input to the neural nets is
most salient for prediction.
2. Previous Research
The research performed in this paper is similar in nature to that found in the
works of Purucker (1996), Kahn (2003), and Calster et al. (2008). Purucker
(1996) applied back-propagation, self-organizing maps (SOMs) and other neural
structures to predict the winners of games in the National Football League
(NFL). Purucker investigated different training methods and determined that back-propagation
was the best choice for developing a model with greater predictive accuracy
than various experts in the field of football predictions. Purucker achieved 61%
accuracy, compared with the experts' 72%.
Kahn (2003) extended the work of Purucker (1996) to strengthen the
model developed to predict NFL football games. Kahn also employed
back-propagation, but with a different learning mechanism and network
structure. Kahn attained 75% accuracy, performing far
better than Purucker and slightly better than the experts in the field. Both authors
used differential statistics from the box scores rather than raw statistics.
Calster et al. (2008) applied Bayesian Belief Networks (BBNs) in the area
of professional soccer. They investigated the causes of the unwanted draws that
occur in soccer and confirmed that Bayesian networks are beneficial when
prior information or probabilities are known.
3.1 Data
The database used in this research consists of box scores from NBA games played
in the 2007-2008 season. 620 games of the season are used for the training and
test samples and 30 games are used as the validation set to represent “un-played”
games. With 30 teams in the league, and two teams per game, this amounts to each
team having played roughly 43 games, or about half of the 82-game season. This
number of games per team should be representative of how each team can be
expected to perform in a future game. The first 650 games of the season were
chosen because the fewest transactions and injuries occur during this period; the
trade deadline does not arrive until around mid-season, so the average statistics
should reflect each team accurately. Staying within the season was also important
in order to capture each team's averages while avoiding any off-season trades or
improvements that might drastically change a particular team's performance. The
box scores were downloaded from www.espn.com, and Microsoft Excel was used to
organize and separate the data. Figure 1
displays the typical box score downloaded.
The information extracted from the box score came from the bottom three
lines. Table 1 provides clarification for each statistic. The team totals for the
game, together with the home/away indicator, were the only features used to
conduct the neural network analysis. Table 2 displays an example set of six
games where each exemplar consists of the away team statistics, the home team
statistics, and the winner of the game (1 for away and 2 for home).
Each team's overall current season average statistics are used as the inputs for
games in the prediction, or validation, set of the neural network. A specific
team's average statistics typically provide insight into its performance. This
approach also allows the model to be easily updated as the season is played:
once games have been played, they are entered into the training set, and the new
current season averages are used for the games not yet played.
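Purely as an illustrative sketch of this bookkeeping (the authors used Excel; the file name and column labels below are hypothetical), running current-season averages can be computed so that the features for an "un-played" game use only games already completed:

import pandas as pd

# Hypothetical layout: one row per team per game, in chronological order,
# with box-score columns mirroring Table 1 (FG, 3P, FT, ..., TO, PTS).
box = pd.read_csv("box_scores_2007_2008.csv")
stat_cols = ["FG", "3P", "FT", "REB", "AST", "TO", "PTS"]

# Expanding mean shifted by one game, so a game's own box score never
# leaks into the averages used to predict that game.
averages = box.groupby("team")[stat_cols].transform(
    lambda s: s.expanding().mean().shift(1)
)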
Two other schemes, with respect to predicting the “un-played” games,
were also applied to the model but were found to be no more beneficial than the
overall current season averages. The first technique employed the current season
average of each team with respect to being either at home or away. This
information provided insight into a team’s performance in the home stand as well
as in an opponent’s arena. The second technique used only the average of the
previous five games played by each team. This data set allowed us to consider
teams that might be on hot or cold streaks. It also attempted to capture the
effect of any devastating injuries or illnesses.
The final data collected were the experts’ opinions on who would win.
Numerous experts exist in the field and any could be chosen. If one were
interested in betting on the game, the Las Vegas Line might be considered the
expert to beat. Rather than betting on the line, this research used the experts’
opinions as to who was favored to win. The experts’ opinions were derived from
USA Today (2008) by examining the favorite to win each game. Figure 2
displays a typical layout of the sports information odds page provided by USA
Today (2008). The matchup is printed with the home team always on the bottom.
As discussed previously, other alternatives in measuring the success of an expert’s
opinion exist, but using the favorite and underdog proved to be the most common
method. Any one of the five betting sources could be used, but in this research,
the team favored by a majority of the experts was considered the favorite. The
item of interest is the highlighted number, of which only the sign was
considered. A negative sign indicates that the home team is the favorite to win,
while a positive number indicates that the away team is the favorite. For
instance, Figure 2 reflects that Detroit and San Antonio are the experts' picks
to win.
[Figure 2: Excerpt of the USA Today (2008) odds page across five betting sources. The away rows (Boston, L.A. Lakers) list the game totals; the home rows list the point spreads (Detroit -5.5 to -6.5, San Antonio -4 to -4.5), so Detroit and San Antonio are the favorites.]
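As a small illustration of this rule (the input format is assumed), the majority-vote favorite can be read off the signs of the five home-team spreads:

def expert_pick(home_spreads):
    # A negative home spread means that source favors the home team.
    home_votes = sum(1 for s in home_spreads if s < 0)
    return "home" if home_votes > len(home_spreads) / 2 else "away"

expert_pick([-5.5, -6.5, -6.0, -6.5, -6.0])  # Detroit row -> "home"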
3.2 Neural Networks

Neural networks are a powerful tool in pattern recognition (Verikas et al., 2002).
Computers have advanced tremendously in recent years, allowing quicker results to
be obtained from neural networks trained on massive data sets. The elements upon
which a neural network is constructed are threefold: the structure of the network,
the training method, and the activation function (Purucker, 1996).
Four different neural networks are examined in this research. The first
neural network examined was the feed-forward network. We applied a log-
sigmoid transfer function, given as Equation 1 below, as the activation function
(Matlab©).
\mathrm{logsig}(n) = \frac{1}{1 + e^{-n}} \qquad (1)
To construct the network, one hidden layer was created. Selecting the
number of hidden neurons within this layer is an “art”, but some research has
been accomplished in this area; see, for instance, Steppe et al. (1996). Most
researchers appear to resort to simple trial and error, as did we, given the
relatively small dimensionality of our problem. A large disadvantage of the feed-
forward network is its relatively slow training. Figure 3 provides a graphical
representation of a typical neural network. A noise node is added to the input
layer, which is primarily used in the SNR feature selection method explained
below.
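The networks themselves were built with Matlab's neural network toolbox; the following NumPy sketch (all names hypothetical, squared-error back-propagation assumed) illustrates the same structure: one log-sigmoid hidden layer, plus an appended noise input for the SNR screening described below.

import numpy as np

def logsig(n):
    # Log-sigmoid activation function of Equation 1.
    return 1.0 / (1.0 + np.exp(-n))

def train_ffnn(X, y, hidden=10, lr=0.05, epochs=2000, seed=0):
    # X: one row per game (away and home statistics); y: 0 = away win, 1 = home win.
    rng = np.random.default_rng(seed)
    X = np.hstack([X, rng.normal(size=(len(X), 1))])   # noise node input
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, 1))
    y = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        H = logsig(X @ W1)                  # hidden-layer activations
        out = logsig(H @ W2)                # estimated P(home win)
        d2 = (out - y) * out * (1 - out)    # output delta
        d1 = (d2 @ W2.T) * H * (1 - H)      # hidden delta, back-propagated
        W2 -= lr * H.T @ d2 / len(X)
        W1 -= lr * X.T @ d1 / len(X)
    return W1, W2                           # rows of W1 include the noise node

After training, the rows of W1, including the final noise row, supply exactly the first-layer weights that Equation 2 compares.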
\mathrm{SNR}_i = 10 \log_{10} \left( \frac{\sum_{j=1}^{J} w_{i,j}^{2}}{\sum_{j=1}^{J} w_{\mathrm{Noise},j}^{2}} \right) \qquad (2)
where SNR_i is the value of the saliency measure for feature i and J is the
number of hidden nodes. All of the weights are first-layer weights, running
either from input node i to hidden node j or from the noise node to hidden node
j. Taking 10 log10 of the ratio of summations places the saliency measure on a
decibel scale. After training, the SNR is calculated for each feature, and the
feature with the smallest SNR is eliminated from the feature set. This process
is repeated until a smaller subset of the original feature set is obtained
without losing too much information.
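A minimal sketch of this screening loop, reusing the hypothetical train_ffnn above (whose last input row of W1 belongs to the noise node); the four-feature stopping point is borrowed from the results reported later, since the cutoff is otherwise left to the analyst:

def snr(W1):
    # Equation 2 on a decibel scale; the noise node is the last input row.
    power = (W1 ** 2).sum(axis=1)
    return 10 * np.log10(power[:-1] / power[-1])

def snr_screen(X, y, keep=4):
    features = list(range(X.shape[1]))
    while len(features) > keep:
        W1, _ = train_ffnn(X[:, features], y)
        worst = int(np.argmin(snr(W1)))   # smallest-SNR (least salient) feature
        features.pop(worst)               # eliminate it and retrain
    return features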
Radial basis function (RBF) networks are newer and more powerful than
the feed-forward approach; however, the two are similar (Looney, 1997: 96).
Neurons in this network are used to create a smooth function that represents the
data set. A network is produced with as many hidden neurons as there are feature
vectors in the training set. These networks require some user-provided
information, such as a parameter called the “spread”. The spread, and for that
matter the number of neurons in the network, is usually determined by trial and
error or by heuristic procedures. Probabilistic neural networks (PNNs) and
generalized regression neural networks (GRNNs) are types of radial basis
networks and were also applied in this work.
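The paper relied on Matlab's radial basis implementations; as a non-authoritative sketch, a Gaussian-kernel PNN classifier fits in a few lines, with spread playing the same smoothing role as the Matlab parameter:

def pnn_predict(X_train, y_train, X_new, spread=1.0):
    # One Gaussian kernel per training exemplar, summed per class (0 = away, 1 = home).
    preds = []
    for x in X_new:
        k = np.exp(-((X_train - x) ** 2).sum(axis=1) / (2 * spread ** 2))
        scores = [k[y_train == c].sum() for c in (0, 1)]
        preds.append(int(np.argmax(scores)))
    return np.array(preds)

A GRNN uses the same kernel activations but outputs the kernel-weighted average of the training targets instead of a class vote.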
3.4. Fusion
A single neural network can provide useful and powerful results in classifying
new exemplars based on a training set. However, the use of fusion can help one
neural network complement another neural network. The idea is to integrate the
contributions of each neural network to obtain a new decision context that one
would hope to be more accurate than the networks operating separately. Figure 4
provides a layout of a typical fusion technique.
There are a variety of fusion rules available (Rolli, 2002). In this research,
two different fusion rules were applied. The first fusion method was the
Bayesian belief network (BBN). Rodriguez et al. (2008) provide an explanation
of how Bayesian model averaging (the term here refers to a BBN) merges several
multi-class classifiers. The Bayes Net Toolbox (Murphy, 2007) for Matlab© was
used to perform the computations. The user specifies the local conditional
probability distribution (CPD) for a classification model, Mk, where k is one of K
classifiers and M is the set of all classifiers. The CPD of each model Mk is
p(Mk|T), which represents the probability that a classification model will classify a
target instance T. For example, given a target that is a home team win (or an
away team win), p(Mk|T = Home Team Win) represents the probability
distribution over all of the possible classifications Mk could make, i.e., home team
win or away team win. In our implementation, the confusion matrix, which
represents the correct and incorrect classifications for a multi-class classifier,
provides this information for each classifier.
The fusion process uses the classifications from the classification models
(M), in conjunction with Bayes Rule, to compute the posterior probability for each
target classification T = c (c = home team win or away team win):
p(T = c \mid M) = \prod_{k=1}^{K} p(M_k \mid T = c)\, p(T = c)
The final classification is designated as the target classification T = c with the
highest probability. The prior probabilities p(T) are calculated based on the
number of home/away team wins used in the testing.
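The authors performed this computation with the Bayes Net Toolbox; the following sketch (an assumed data layout, not the toolbox API) applies the same Bayes rule product directly to row-normalized confusion matrices:

def bbn_fuse(confusions, predictions, prior):
    # confusions[k][c][m]: p(classifier k outputs m | true class c), from the test set.
    # predictions[k]: class output by classifier k for the new game.
    # prior[c]: p(T = c) from the home/away win counts in the test set.
    posterior = np.array(prior, dtype=float)
    for k, m in enumerate(predictions):
        posterior *= [confusions[k][c][m] for c in range(len(prior))]
    return int(np.argmax(posterior))   # target class with the highest probability

With two classes (0 = away win, 1 = home win) and the four networks as classifiers, bbn_fuse returns the fused pick for a single validation game.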
[Figure 4: Typical fusion layout. The individual classifiers feed a Bayesian model whose fused output is Home Team Win / Away Team Win.]
The second fusion scheme in this research was probabilistic neural
network (PNN) fusion (Leap et al., 2007). In this technique, posterior
probabilities are obtained from each neural network and then fed, as features,
into a new PNN.
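Reusing the hypothetical pnn_predict above, PNN fusion amounts to stacking the component networks' posteriors as a new feature matrix (networks here stands for any callables returning P(home win); it is an assumption, not the authors' code):

# Columns are the posterior P(home win) from each component network.
posts_train = np.column_stack([net(X_train) for net in networks])
posts_new = np.column_stack([net(X_new) for net in networks])
fused = pnn_predict(posts_train, y_train, posts_new, spread=0.1)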
To conduct the analysis for this research, the neural network toolbox
provided in Matlab© was used as the primary tool, along with user-developed code
for the fusion techniques. Matlab© provides further documentation on the
workings of each neural network and instructions for implementation.
4. Results
The four neural networks discussed in the previous section were applied to the
entire data set to establish a baseline for comparison. These networks were the
FFNN, the RBF, the PNN, and the GRNN. 620 games were used for training, and 30
“un-played” games formed the validation set. This differs from the subsequent
fusion techniques, for which the 620 games were split into a training set and a
test set; the test set was necessary to create posterior probabilities for the
fusion rules.
All four neural networks were used as classifiers for both fusion methods.
For purposes of the analysis, 10 different training and validation sets were
created to obtain a fairly accurate estimate of each neural network’s performance.
The first training set and validation set were constructed deliberately to represent
the first 620 games and the next 30 games, respectively. This reflects the idea
that one would predict the next week's game winners based on the previous two
months' worth of games played. The next nine training and validation sets were
constructed randomly from the same 650 games; however, no game was used more
than once in the validation sets. This 10-fold cross
validation should provide accurate estimates of the neural network performance.
Results from the 10-fold average will be presented as well as results from the first
validation set.
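A sketch of this protocol under stated assumptions (model_fn is any train-then-classify callable, such as pnn_predict; the paper's additional guarantee that no game repeats across validation sets is omitted for brevity):

def evaluate(model_fn, X, y, n_folds=10, n_val=30, seed=0):
    rng = np.random.default_rng(seed)
    accs = []
    for fold in range(n_folds):
        # Fold 0 is the chronological split; the rest are random shuffles.
        idx = np.arange(len(y)) if fold == 0 else rng.permutation(len(y))
        tr, va = idx[:-n_val], idx[-n_val:]
        preds = model_fn(X[tr], y[tr], X[va])
        accs.append((preds == y[va]).mean())
    return float(np.mean(accs))   # average accuracy over the 10 validation sets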
For each validation set, the experts' accuracy was also computed to
determine how well the neural networks compare against expert opinion. For the
first validation set, the experts correctly picked 70% of the ultimate winners. This
means that the experts incorrectly “guessed” 9 games out of the possible 30. In
terms of all 10 experiments run, the experts were correct 68.67% of the time.
The following sections report the results for three different sets of
analyses. The first set details the neural network analysis, including fusion, using
all the statistics collected from the box scores as the feature set. The second
analysis considers the feature set as suggested by the SNR method. The final
analysis considers two different subsets of the feature set, each consisting only of
shooting statistics, as suggested by experts. One subset will use all six shooting
statistics while the second subset only uses four of the six statistics.
Using the entire feature set (all 22 game statistics as features), a maximum
(average) accuracy of 71.67% was achieved for the baseline testing, while the
first validation set achieved the same results as the experts. Table 3
summarizes each of the individual neural network results as well as the fusion
results, presented in percentage values. Once again, all 620 games were used to
train each network and the 30 “un-played” games were used as a validation set to
determine the accuracy. For the fusion rules, 400 games were used for the
training set and the remaining 120 games were used as the test set to determine
posterior probabilities; the same 30 “un-played” games were used for the
validation set.
Feature selection was investigated next, with the goal of reducing the size of
the feature set while maintaining similar accuracy, or of reducing the data set
size while increasing the overall accuracy of the prediction model. The first
technique applied was the SNR method. After implementing SNR on the data, a new
subset consisting of TO and PTS for both the away team and the home team was
suggested. This reduced the number of features to only four variables, placing
the emphasis on turnovers and points per game. Overall, the average
accuracy experienced a slight decline from the previous feature set, but each
neural network and fusion method tested proved to be better than the expert
opinion. Also, in terms of validation set 1, every neural network now achieved
the same results as expert opinion. Results for the SNR tests are shown below in
Table 4.
The next feature set consisted of the six shooting statistics suggested by
experts (FG, 3P, and FT for each team). As seen in the results in Table 5,
every neural network and fusion method,
with the exception of RBF, saw around a four percent increase in accuracy over
the 10 validation sets compared to the experts. Also, each of these networks and
fusion methods saw an increase compared to using all the features or the SNR
feature set. It should also be noted that in validation set 1, all the neural networks
and fusion methods performed better than the experts (most by 10%).
Further reduction was performed on the six-variable data set. Factor
analysis was conducted, and it appeared that the true dimensionality of this
subset was four: FG and 3P for both the home and away teams can be thought of as
one variable, with the emphasis placed on FG due to its heavier loading. This is
expected, since the field goal percentage is affected by the three-point
percentage. Using this information, the data set was further reduced to include
only FG and FT for each team. In this case, all the neural networks and fusion
methods saw an
increase in performance, most notably the RBF. Table 6 presents the results
obtained using only the four shooting variables in the feature set.
5. Conclusions
As seen in the results sections, FFNNs, PNNs and GRNNs appear to be good
neural network models to use in predicting the outcome of future NBA games.
Overall, all four networks were able to match or perform better than the experts’
opinions/models.
Table 8 summarizes the neural network models, coupled with fusion, across
all tested data sets in terms of the average over all 10 validation sets, and
Table 9 summarizes the results using validation set 1. Even though fusion did
not yield better accuracy than the networks alone, it could still be applied in
future research. The fusion models may not have provided better results due to
the small size of the validation set, as well as the smaller amount of data
available to train them. Very few games were misclassified using the optimal
data set, and fusion was unable to correctly classify these remaining games
given so few examples. It is also important to note that the best accuracy in
this research was obtained using only four statistics: an average accuracy of
74.33% (using the FFNN) and, on validation set 1, an optimum of 83.33%.
The neural networks in this research were capable of using common box
score statistics to accurately classify the outcome of an “un-played” game.
Extensions to this research can be made from the baseline model by adding or
deleting features. The models created could also be adjusted to determine
whether the classification can beat the Las Vegas Line rather than simply
classify which team will win. Overall, these models have shown that they can
achieve up to 5.66 percent better (average) accuracy, using an FFNN with only
four variables, or 13.33 percent better accuracy (in validation set 1) than the
experts in the sport of basketball.
6. References
Bauer, K.W., Alsing, S.G., and K.A. Greene, Feature Screening Using Signal-to-
Noise Ratios, Neurocomputing, 2000, 31, 29-44.
Calster, B.V., Smits, T., and S.V. Huffel, The Curse of Scoreless Draws in
Soccer: The Relationship with a Team’s Offensive, Defensive, and
Overall Performance, Journal of Quantitative Analysis in Sports, 2008, 4,
1, article 4.
Ibisworld Inc., United States Basketball League, Incorporated - Company Report
US, 2008, available at
http://www1.ibisworld.com/enterprise/retail.aspx?entid=51530&chid=1&rcid=1
(accessed May 25, 2008).
Kahn, J., Neural Network Prediction of NFL Football Games, 2003, available at
http://homepages.cae.wisc.edu/~ece539/project/f03/kahn.pdf (accessed
April 1, 2008).
Leap, N.J., Clemans, P.P., Bauer, K.W., and Oxley, M.E., An Investigation of the
Effects of Correlation and Autocorrelation on Classifier Fusion and
Optimal Classifier Ensembles, International Journal of General Systems-
Intelligent Systems Design, 2007, forthcoming.
Looney, C.G., Pattern Recognition Using Neural Networks, 1997, Oxford
University Press, New York.