Basketball Prediction using Multiple Regression as a Data Model in Predicting the Outcome of Game

John Ian Paulo I. Lumbao¹, Kristen Kaye V. Sindac², Fatima Corrine Angel B. Paed³,
Juan Miguel M. Gonzaga⁴, Danna Jade Y. Canoza⁵, Eduardo P. Pablo Jr.⁶, Herminiño C.
Lagunzad⁷, Jake M. Libed⁸, Mary Rose V. Hilarion⁹ and Arnold S. De Guzman¹⁰

¹⁻⁶ Student, College of Information Technology, The National Teachers College, Quiapo, Manila
¹ [email protected], ² [email protected], ³ [email protected], ⁴ [email protected], ⁵ [email protected], ⁶ [email protected]

⁷⁻¹⁰ IT Instructor, College of Information Technology, The National Teachers College, Quiapo, Manila
⁷ [email protected], ⁸ [email protected], ⁹ [email protected], ¹⁰ [email protected]

Abstract
In basketball, analyzing team performance is one of the more difficult tasks for coaches. Identifying the
factors that lead to winning games is crucial for every team. In this study, the researchers use multiple
regression to identify the relevant variables for successfully predicting the outcome of a basketball game.

Keywords
data mining, multiple regression, classification, basketball prediction

1. Introduction
Technology and application software are advancing rapidly, and one such technology is sports data
mining. In sports, it is used as a basis for analyzing the statistics and capabilities of players and for
predicting the outcomes of games. Data mining (DM) is the process of finding useful associations,
patterns and trends in large amounts of data. It is also the process of extracting implicit, previously
unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy,
fuzzy and random data [1]. DM systems aim to assist coaches, players and other sports enthusiasts not only
in prediction, but also in evaluating player performance. Predicting game results has become widely
popular among sports fans, especially basketball fans, throughout the world [2].

Classification is a data mining technique that assigns items in a collection to target categories or classes.
The aim of classification is to predict the target class for each case in the data accurately. Classification is
important when a data repository contains samples that can be used as the basis for future decision making.
There are several classification techniques that can be used for decision support systems [3].

Regression is a statistical technique to determine the linear relationship between two or more variables.
Regression is primarily used for prediction and causal inference. In its simplest (bivariate) form, regression
shows the relationship between one independent variable (X) and a dependent variable (Y), as in the
formula [4]:

Y = β0 + β1X + u

where β0 is the intercept, β1 is the slope coefficient and u is the error term.

2. Review of Related Literature and Studies
Multiple Regression is a statistical tool that allows you to examine how multiple independent variables are
related to a dependent variable. Once you have identified how these multiple variables relate to your
dependent variable, you can take information about all the independent variables and use it to make much
more powerful and accurate predictions about why things are the way they are. This latter process is called
“Multiple Regression” [5].

Equation 1 gives the formula used in our computation.

Y = a + b1X1 + b2X2    (1)

where
Y = the predicted value of Y (the dependent variable)
a = the “Y intercept”
b1 = the change in Y for each one-unit change in X1
b2 = the change in Y for each one-unit change in X2

Lopez, M. and Matthews, G. (2014) used data from the 2014 NCAA men's basketball tournament and more
than 400 predictions of game outcomes submitted to a contest hosted by the website Kaggle. The researchers
built a prediction model for men's basketball tournament outcomes under the binomial log-likelihood loss
function. Under different sets of true underlying game probabilities, they simulated tournament outcomes
and imputed pool standings to determine how much of an entry's success can be attributed to luck. While
one of their two submissions finished first in the Kaggle contest, the researchers estimate that this winning
entry had no more than about a 12% chance of doing so, even under the most optimistic of game probability
scenarios [6].

Arnold, T. and Godbey, J. (2012) conducted a study to determine the key factors that affect a basketball
game. They gathered data on the 12 players of the Georgia State University basketball team and used
multiple regression as a tool for analyzing the players' data. The variables they used in the analysis were
points per game, minutes per game and rebounds per game. Based on the analyzed data, they obtained
good results in predicting the outcome of a basketball game [7].

Yuan, L.H., et al. (2015) aimed to forecast the outcome of the NCAA men’s basketball tournament, which
spans 63 games over 3 weeks. Statistical prediction of game outcomes involves a multitude of possible
covariates and information sources, large performance variations from game to game, and a scarcity of
detailed historical data. In this paper, the researchers considered the use of logistic regression and present
the results of a team of modelers working together to forecast the 2014 NCAA men’s basketball
tournament. They present not only the methods and data used, but also several novel ideas for
post-processing statistical forecasts and decontaminating data sources [8].

3. Research Work
3.1 The Basketball Game System Architecture
The system architecture used in this study is shown in Figure 1. It consists of three major phases:
(1) Data Preprocessing; (2) Apply Multiple Regression; and (3) Prediction.

Figure 1. Basketball Game Prediction System Architecture

3.1.A Data Preprocessing

Data preprocessing consists of (1) Historical Data and (2) Data Cleaning. This phase filters out the data needed for the study.
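
A minimal sketch of this filtering step, assuming each game arrives as a record with named fields. The field names (points, ft_pct, rebounds, assists), the sample values, and the rule of dropping a game with a missing value are illustrative assumptions, not details given in the study:

```python
# Hypothetical raw game records; field names and values are illustrative only.
raw_games = [
    {"game": 1, "points": 80, "ft_pct": 70.0, "rebounds": 50, "assists": 15},
    {"game": 2, "points": 65, "ft_pct": None, "rebounds": 44, "assists": 12},  # missing value
    {"game": 3, "points": 72, "ft_pct": 58.5, "rebounds": 47, "assists": 18},
]

# Only these fields are needed for the regression (assumed names).
NEEDED = ("points", "ft_pct", "rebounds")


def clean(records, needed=NEEDED):
    """Keep only the needed fields and drop any game with a missing value."""
    cleaned = []
    for rec in records:
        row = {key: rec.get(key) for key in needed}
        if all(value is not None for value in row.values()):
            cleaned.append(row)
    return cleaned


games = clean(raw_games)
for row in games:
    print(row)
```

Dropping incomplete games is only one possible cleaning rule; a real data set might instead call for correcting or re-entering the missing statistic.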

3.1.A.a Historical Data

Table 1 shows sample raw data and team statistics for one previous game of National Collegiate
Athletic Association (NCAA) Season 92 in 2016 [9].

Table 1. Sample Raw Data for One Game only

3.2 Data Cleaning
Table 2 shows the complete statistics of one team for NCAA Season 92 in 2016, covering
18 games [9].

Table 2. Statistics for the NCAA Season 92 Basketball Game

3.2.A Variable Table

The variables needed for this study are shown in Table 3.

Table 3. Variable Name and Description

Variable        Description
Points          Total number of points
Free Throw %    Total free throw percentage
Rebound         Total number of rebounds

3.2.A.a The Dependent Variable (y) and Independent Variables (x1) and (x2)

Table 4 displays the data selected for the dependent variable, Points, and the independent variables,
Free Throw % and Rebound.

Table 4. Dependent and Independent Variable

3.2.A.b Points and Free Throw % Scatterplot

As shown in Figure 2, the dependent variable (points) and the independent variable (free throw %) have a
very strong linear relationship.

Figure 2. Points(y) and Free Throw %(x1) Variable Scatterplot

3.2.A.c Points and Rebound Scatterplot

Figure 3 shows that the dependent variable (points) and the independent variable (rebound) have a slight
linear relationship.

Figure 3. Points(y) and Rebound(x2) Variable Scatterplot
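
One way to put a number on the relationships read from Figures 2 and 3 is the Pearson correlation coefficient. The sketch below is not part of the original study; it assumes the corrected sums reported in Equation 5 and applies the standard formula r = Σxy / sqrt(Σx² Σy²). The function name pearson_r is illustrative.

```python
import math

# Corrected sums as reported in Equation 5 of the paper.
SUM_Y_SQ = 2722.5
SUM_X1_SQ = 4492.83964
SUM_X2_SQ = 942.4444444
SUM_X1Y = 1469.866667
SUM_X2Y = 398.6667


def pearson_r(sum_xy, sum_x_sq, sum_y_sq):
    """Pearson r from corrected sums of cross products and squares."""
    return sum_xy / math.sqrt(sum_x_sq * sum_y_sq)


r_free_throw = pearson_r(SUM_X1Y, SUM_X1_SQ, SUM_Y_SQ)
r_rebound = pearson_r(SUM_X2Y, SUM_X2_SQ, SUM_Y_SQ)
print(f"r(points, free throw %) = {r_free_throw:.3f}")
print(f"r(points, rebound)      = {r_rebound:.3f}")
```

On the Equation 5 figures this gives roughly 0.42 for free throw % and 0.25 for rebound, which can be compared with the visual impression from the two scatterplots.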

3.3 Apply Multiple Regression

Equation 2 gives the general form of the model, from which the intercept and the coefficient of each
variable are computed step by step [10].

Y = a + b1x1 + b2x2 + b3x3 + ... + bixi    (2)

where
Y = dependent variable
x = independent variable
a = intercept
b = coefficient

With two independent variables, the fitted model is:

ŷ = a + b1x1 + b2x2
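
For the general form with any number of independent variables, the coefficients can be obtained by solving the least-squares normal equations. The sketch below is a generic illustration, not the procedure followed in this study: it assumes Gauss-Jordan elimination in plain Python, and the function name fit_linear and the sample data (generated from a known formula with a third, hypothetical predictor) are invented for the example.

```python
def fit_linear(rows, targets):
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk via the normal equations.

    rows: list of predictor tuples; targets: list of y values.
    Returns [a, b1, ..., bk].
    """
    design = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(design[0])
    # Augmented system (X^T X | X^T y).
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical data generated from y = 30 + 0.5*x1 + 0.4*x2 + 2*x3 (not the paper's data).
rows = [(60, 40, 5), (70, 45, 3), (55, 50, 4), (80, 38, 6), (65, 52, 2), (75, 47, 7)]
targets = [30 + 0.5 * x1 + 0.4 * x2 + 2 * x3 for x1, x2, x3 in rows]
coefficients = fit_linear(rows, targets)
print([round(c, 4) for c in coefficients])
```

Because the sample targets follow the generating formula exactly, the fit should recover the intercept and the three coefficients used to build them.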

The coefficients resulting from the computation are shown in Equation 3 [10].

ŷ = a + b1x1 + b2x2    (3)

Values of b1, b2 and a:
b1 = 0.3269129
b2 = 0.42211
a = 32.0068

Table 5 shows the computed values of x1y, x2y, x1x2, x1², x2² and y². For each game, these are
computed as (x1*y), (x2*y), (x1*x2), (x1)², (x2)² and (y)².

Table 5. Computed Values of x1y, x2y, x1x2, x1², x2² and y²

The sums and averages are shown in Table 6. Each column (points, free throw %, rebound, x1y, x2y,
x1x2, x1², x2² and y²) is summed over all games, and the average of each column is also computed
for use in the next step.

Table 6. Sum and Average
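
The product columns of Table 5 and the sums and averages of Table 6 can be sketched as follows. The three rows are hypothetical placeholders, not the study's 18 games, and the column names are chosen only for this example.

```python
# Hypothetical rows of (points y, free throw % x1, rebounds x2); not the paper's data.
games = [
    (80, 70.0, 50),
    (72, 58.5, 47),
    (68, 61.0, 55),
]


def product_columns(y, x1, x2):
    """Return one Table 5 style row: raw values plus products and squares."""
    return {
        "y": y, "x1": x1, "x2": x2,
        "x1y": x1 * y, "x2y": x2 * y, "x1x2": x1 * x2,
        "x1_sq": x1 ** 2, "x2_sq": x2 ** 2, "y_sq": y ** 2,
    }


rows = [product_columns(*game) for game in games]
n = len(rows)

# Table 6 style totals and averages for every column.
sums = {key: sum(row[key] for row in rows) for key in rows[0]}
averages = {key: total / n for key, total in sums.items()}

for key in sums:
    print(f"{key:>6}: sum = {sums[key]:.2f}, average = {averages[key]:.2f}")
```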

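The formulas in Equation 4 convert the raw sums for the dependent variable (y) and the independent
variables (x1) and (x2) into corrected sums of squares and cross products [10]. The bold terms on the
left are the corrected sums; the terms on the right are raw sums over the N games.

Two (2) Independent Variables:

$$
\begin{aligned}
\boldsymbol{\sum y^2} &= \sum y^2 - \frac{(\sum y)^2}{N} \\
\boldsymbol{\sum x_1 y} &= \sum x_1 y - \frac{(\sum x_1)(\sum y)}{N} \\
\boldsymbol{\sum x_1^2} &= \sum x_1^2 - \frac{(\sum x_1)^2}{N} \\
\boldsymbol{\sum x_2 y} &= \sum x_2 y - \frac{(\sum x_2)(\sum y)}{N} \\
\boldsymbol{\sum x_2^2} &= \sum x_2^2 - \frac{(\sum x_2)^2}{N} \\
\boldsymbol{\sum x_1 x_2} &= \sum x_1 x_2 - \frac{(\sum x_1)(\sum x_2)}{N}
\end{aligned}
\tag{4}
$$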
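
As a brief check of why the formulas in Equation 4 have this shape, the sum of squared deviations from the mean can be expanded directly. This is a standard identity rather than a step shown in the source; it writes ȳ = Σy / N and x̄1 = Σx1 / N, and the cross-product case follows the same pattern:

```latex
\begin{aligned}
\sum (y - \bar{y})^2
  &= \sum y^2 - 2\bar{y}\sum y + N\bar{y}^2 \\
  &= \sum y^2 - \frac{2(\sum y)^2}{N} + \frac{(\sum y)^2}{N}
   = \sum y^2 - \frac{(\sum y)^2}{N}, \\[4pt]
\sum (x_1 - \bar{x}_1)(y - \bar{y})
  &= \sum x_1 y - \bar{x}_1 \sum y - \bar{y} \sum x_1 + N\bar{x}_1\bar{y} \\
  &= \sum x_1 y - \frac{(\sum x_1)(\sum y)}{N}.
\end{aligned}
```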

Equation 5 shows the results of these computations for the dependent variable (y) and the independent
variables (x1) and (x2).

Σy² = 100847 - 1329 * 1329 / 18 = 2722.5
Σx1² = 72298.0984 - 1104.76 * 1104.76 / 18 = 4492.83964
Σx2² = 48786 - 928 * 928 / 18 = 942.4444444    (5)
Σx1y = 83037.98 - 1104.76 * 1329 / 18 = 1469.866667
Σx2y = 68916 - 928 * 1329 / 18 = 398.6667
Σx1x2 = 56959.12 - 1104.76 * 928 / 18 = 2.604444444
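
The arithmetic in Equation 5 can be reproduced with a few lines of code. The sketch below takes the raw sums and N = 18 from the right-hand sides of Equation 5; the helper name corrected and the dictionary keys are illustrative.

```python
N = 18  # number of games

# Raw sums taken from the right-hand sides of Equation 5.
raw = {
    "y": 1329, "x1": 1104.76, "x2": 928,
    "y_sq": 100847, "x1_sq": 72298.0984, "x2_sq": 48786,
    "x1y": 83037.98, "x2y": 68916, "x1x2": 56959.12,
}


def corrected(sum_ab, sum_a, sum_b, n):
    """Corrected sum of products: sum(ab) - sum(a) * sum(b) / n."""
    return sum_ab - sum_a * sum_b / n


dev = {
    "y_sq": corrected(raw["y_sq"], raw["y"], raw["y"], N),
    "x1_sq": corrected(raw["x1_sq"], raw["x1"], raw["x1"], N),
    "x2_sq": corrected(raw["x2_sq"], raw["x2"], raw["x2"], N),
    "x1y": corrected(raw["x1y"], raw["x1"], raw["y"], N),
    "x2y": corrected(raw["x2y"], raw["x2"], raw["y"], N),
    "x1x2": corrected(raw["x1x2"], raw["x1"], raw["x2"], N),
}

for name, value in dev.items():
    print(f"{name:>6}: {value:.6f}")
```

A single helper covers all six lines because a corrected sum of squares is the cross-product formula with both arguments equal.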

The final formulas, for the intercept a and the coefficients b1 and b2, are shown in Equation 6 [10].
The sums are the left-hand quantities defined in Equation 4.

$$
\begin{aligned}
b_1 &= \frac{(\sum x_2^2)(\sum x_1 y) - (\sum x_1 x_2)(\sum x_2 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \\
b_2 &= \frac{(\sum x_1^2)(\sum x_2 y) - (\sum x_1 x_2)(\sum x_1 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \\
a &= \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2
\end{aligned}
\tag{6}
$$
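
As a short sketch of where Equation 6 comes from (standard ordinary least squares, not a derivation given in the source): with the sums from Equation 4, minimizing the squared error leads to two normal equations in b1 and b2, and solving them by Cramer's rule gives the two fractions of Equation 6. The symbol D for the common denominator is introduced here only for convenience.

```latex
\begin{aligned}
b_1 \sum x_1^2 + b_2 \sum x_1 x_2 &= \sum x_1 y \\
b_1 \sum x_1 x_2 + b_2 \sum x_2^2 &= \sum x_2 y \\[4pt]
D &= \Big(\sum x_1^2\Big)\Big(\sum x_2^2\Big) - \Big(\sum x_1 x_2\Big)^2 \\
b_1 &= \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_1 x_2)(\sum x_2 y)}{D},
\qquad
b_2 = \frac{(\sum x_1^2)(\sum x_2 y) - (\sum x_1 x_2)(\sum x_1 y)}{D}
\end{aligned}
```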

Equation 7 shows the computation of the coefficient b1.

b1 = ((942.4444444 * 1469.866667) - (2.604444444 * 398.6667)) / ((4492.83964 * 942.4444444) - (2.604444444 * 2.604444444))
b1 = 0.3269129    (7)

Equation 8 shows the computation of the coefficient b2.

b2 = ((4492.83964 * 398.6667) - (2.604444444 * 1469.866667)) / ((4492.83964 * 942.4444444) - (2.604444444 * 2.604444444))
b2 = 0.42211    (8)

Equation 9 shows the result of the computation for the intercept.

a = 73.83333333 – (0.3269129 * 61.37555556) – (0.42211 * 51.55555556)

a = 32.0068 (9)
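
The computations in Equations 7 to 9 can be checked with a small function that applies Equation 6 to the sums from Equation 5. The function name two_predictor_coefficients and the constant names are illustrative; the numbers are those given in the paper.

```python
# Corrected sums from Equation 5 and the column means used in Equation 9.
SUM_X1_SQ = 4492.83964
SUM_X2_SQ = 942.4444444
SUM_X1Y = 1469.866667
SUM_X2Y = 398.6667
SUM_X1X2 = 2.604444444
MEAN_Y, MEAN_X1, MEAN_X2 = 73.83333333, 61.37555556, 51.55555556


def two_predictor_coefficients(sx1_sq, sx2_sq, sx1y, sx2y, sx1x2,
                               mean_y, mean_x1, mean_x2):
    """Apply Equation 6: return (a, b1, b2) for a two-predictor regression."""
    denom = sx1_sq * sx2_sq - sx1x2 ** 2
    b1 = (sx2_sq * sx1y - sx1x2 * sx2y) / denom
    b2 = (sx1_sq * sx2y - sx1x2 * sx1y) / denom
    a = mean_y - b1 * mean_x1 - b2 * mean_x2
    return a, b1, b2


a, b1, b2 = two_predictor_coefficients(
    SUM_X1_SQ, SUM_X2_SQ, SUM_X1Y, SUM_X2Y, SUM_X1X2,
    MEAN_Y, MEAN_X1, MEAN_X2,
)
print(f"a = {a:.4f}, b1 = {b1:.7f}, b2 = {b2:.5f}")
```

Both coefficients come out positive, in agreement with the values used in Equations 9 and 11.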

Table 7 shows the resulting values of the intercept a and the coefficients b1 (Free Throw %) and b2 (Rebound).

Table 7. Coefficient Value

3.4 Prediction
This is the final step, in which points are predicted from the free throw % and rebound values. The
prediction formula is shown in Equation 10.

Predicted Points = a + b1 * x1 + b2 * x2 (10)

Equation 11 shows the predicted points computed for five sample inputs.

Predicted Points = 32.0068 + 0.3269129 * 63 + 0.42211 * 40 = 69.4867
Predicted Points = 32.0068 + 0.3269129 * 50 + 0.42211 * 45 = 67.3474
Predicted Points = 32.0068 + 0.3269129 * 100 + 0.42211 * 65 = 92.1352    (11)
Predicted Points = 32.0068 + 0.3269129 * 45 + 0.42211 * 35 = 61.4917
Predicted Points = 32.0068 + 0.3269129 * 25 + 0.42211 * 23 = 49.8881

Table 8 summarizes the free throw %, rebound and predicted points for each case.

Table 8. Simulation Result
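
The simulation can be reproduced with a short function that applies Equation 10 to the five input pairs of Equation 11. The function name predict_points is illustrative; the coefficients are the ones computed in Equations 7 to 9.

```python
# Intercept and coefficients from Equations 7 to 9.
A, B1, B2 = 32.0068, 0.3269129, 0.42211


def predict_points(free_throw_pct, rebounds):
    """Equation 10: predicted points from free throw % and rebounds."""
    return A + B1 * free_throw_pct + B2 * rebounds


# (free throw %, rebound) pairs used in Equation 11.
cases = [(63, 40), (50, 45), (100, 65), (45, 35), (25, 23)]
predictions = [predict_points(ft, rb) for ft, rb in cases]

for (ft, rb), pts in zip(cases, predictions):
    print(f"free throw % = {ft:>3}, rebound = {rb:>2} -> predicted points = {pts:.4f}")
```

The printed values should agree with Equation 11 up to rounding in the last decimal place.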

4. Conclusion and Recommendation

Based on the results, the researchers arrived at a prediction model that can help basketball coaches
and players in their next season's games. Combining historical data with multiple regression made this
research more interesting. The results are as follows: (1) if free throw % is 63 and rebound is 40, the
predicted points are approximately 69; (2) if free throw % is 50 and rebound is 45, the predicted points
are approximately 67; (3) if free throw % is 100 and rebound is 65, the predicted points are
approximately 92; (4) if free throw % is 45 and rebound is 35, the predicted points are approximately 61;
and (5) if free throw % is 25 and rebound is 23, the predicted points are approximately 50. For future
work, the researchers suggest using more than two independent variables to improve the predictions,
because basketball offers many other statistics to consider, such as field goal percentage, assists, blocks
and turnovers.

5. References

[1] Tian, L. and Wang, F. 2013. Design and Application of Basketball Tactics Analysis Based on the Database
and Data Mining Technology. Proceedings of the 2nd International Conference on Computer Science and
Electronics Engineering (ICCSEE 2013)
[2] Haghighat, M., Rastegari, H. and Nourafza, N. 2013. A Review of Data Mining Techniques for Result
Prediction in Sports. ACSIJ Advances in Computer Science: an International Journal. Volume 2, Issue 5,
Number 6, ISSN: 2322-5157
[3] Vaghela, C., Bhatt, N. and Mistry, D. 2015. A Survey on Various Classification Techniques for Clinical
Decision Support System. International Journal of Computer Applications (0975 – 8887). Volume 116,
Number 23
[4] Campbell, D. and Campbell, S. Statlab Workshop: Introduction to Regression and Data Analysis.
http://statlab.stat.yale.edu/workshops/IntroRegression/StatLab-IntroRegressionFa08.pdf [retrieved:
February, 2017]
[5] Introduction to Multiple Regression. http://www.biddle.com/documents/bcg_comp_chapter4.pdf [retrieved:
February, 2017]
[6] Lopez, M. and Matthews, G. 2014. Building an NCAA men's basketball predictive model and quantifying its
success. https://arxiv.org/pdf/1412.0248.pdf [retrieved: February, 2017]
[7] Arnold, T. and Godbey, J. 2012. Introducing Linear Regression: An Example Using Basketball Statistics.
Journal of Economics and Finance Education, Volume 11, Number 2
[8] Yuan, LH., Liu, A., Yeh, A., Kaufman, A., Reece, A., Bull, P., Franks, A., Wang, S., Illushin, D. and Bornn,
L. 2015. A mixture-of-modelers approach to forecasting NCAA tournament outcomes. Journal of
Quantitative Analysis in Sports, Volume 11, Issue 1, 2015, pp 13–27
[9] “National Collegiate Athletics Association Basketball Statistics” http://ncaaph.org/basketball/statistics/
[retrieved: February, 2017]
[10] “Elements of Multiple Regression Analysis: Two Independent Variables”
http://jonathantemplin.com/files/regression/ersh8320f07/ersh8320f07_06.pdf [retrieved: February, 2017]

Authors
John Ian Paulo I. Lumbao
4th Year Bachelor of Science in Information Technology
Former President, Computer Society Club
The National Teachers College