
BASKETBALL GAME PREDICTOR


Machine Learning Course Project

V Semester, Bachelor of Technology, Information Technology


Indian Institute of Information Technology, Allahabad, Prayagraj

Tushar Kumar (IIT2021203), Yuvraj Jindal (IIT2021161), Parth Garg (IIT2021116), Jinam
Jain (IIT2021180), Rishika Rajput (IIT2021117), Sakshi Khokhar (IIT2021108)

Abstract— The Basketball Game Prediction using machine learning is an application that harnesses advanced algorithms to forecast the winner of games during National Basketball Association (NBA) tournaments. This cutting-edge system analyses a wide variety of factors from historical data, which allows it to make informed predictions about the winner.
This predictor makes use of key inputs including field goals, assists, rebounds, points per game, and many other statistics by incorporating machine learning techniques. These components are important markers for comprehending the dynamics of an NBA game and aid in the development of outcome-estimating predictive models.
By carefully examining past match data, our model's forecast accuracy has been refined. The algorithms have been adjusted to find patterns and correlations between the aforementioned variables and the final result. The model is constantly learning and adapting to new data and trends, improving its accuracy and dependability with every iteration.


1 INTRODUCTION

Predicting the results of sporting events is a logical use case for machine learning. Many professional sports have readily available, generally random, and predictably large data sets. Because basketball is perceived as a sport that is primarily player driven, forecasting the results of NBA games is particularly interesting. The prevailing stigma is that winning games requires a superstar. Teams are adopting and utilizing increasingly sophisticated statistics. In this project, we employ the Random Forest classifier algorithm to attempt to forecast the outcome of a game between two teams, as well as to identify the factors that are truly most crucial to determining the result of a game, without taking individual player statistics into account.


2 BACKGROUND

1. Decision tree: A decision tree is a visual model for decision-making, presenting choices and their potential outcomes in a tree-like structure. Nodes represent decisions or tests, branches depict possible outcomes, and leaves indicate final results. Widely used in machine learning, decision trees offer a clear and interpretable approach to classification and regression tasks across diverse domains.

2. Random Forest: A random forest classifier is an ensemble learning method that combines multiple decision trees for improved accuracy and robustness. It constructs a forest of trees, each trained on a random subset of the data and features. By aggregating predictions, it enhances predictive performance and mitigates overfitting in diverse machine learning applications.
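As a concrete illustration of the ensemble idea above, here is a minimal scikit-learn sketch on randomly generated placeholder data (not the project's NBA data):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: 500 fake games described by 8 fake statistics.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # fake win/loss label

# Each tree is fit on a bootstrap sample of the rows and considers a random
# subset of features at each split; class predictions are aggregated by vote.
model = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
model.fit(X, y)
print(model.predict(X[:5]))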
3. OBJECTIVE

The primary objective of a Basketball Game Prediction model using a Random Forest classifier is to accurately forecast the final winner. This predictive tool aims to leverage historical match data and crucial match-specific parameters to provide insights into the potential outcome. By achieving these objectives, the Basketball Game Predictor model endeavors to be a valuable asset in the basketball domain, offering actionable insights and aiding stakeholders in making informed decisions during matches. The model can be useful in betting, fantasy sports, and improving team performance by identifying areas of improvement and developing targeted training strategies.


4. DATASET

We used the Kaggle NBA Games dataset for this project. The dataset was gathered for the purpose of collecting data on NBA matches and was created from the NBA Stats website. We used the Games.csv file, which contains all games from 2004 to the latest update in 2022, with dates, teams, and other data such as the number of points scored.

Dataset link:
https://www.kaggle.com/datasets/nathanlauga/nba-games
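A minimal sketch of loading this file with pandas is shown below; the path and column names are assumptions about the downloaded CSV, not details given in this report, so they should be checked after downloading.

import pandas as pd

# Path below is an assumption; point it at the downloaded Kaggle file.
games = pd.read_csv("games.csv")
print(games.shape)             # number of games and columns
print(games.columns.tolist())  # inspect the available box-score columns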
5. FEATURE SELECTION

The standard NBA box score includes 14 statistics measuring each team's performance over the course of a game. These statistics are:
- Field Goals Made (FGM)
- Field Goals Attempted (FGA)
- 3 Point Field Goals Made (3PM)
- 3 Point Field Goals Attempted (3PA)
- Free Throws Made (FTM)
- Free Throws Attempted (FTA)
- Offensive Rebounds (OREB)
- Defensive Rebounds (DREB)
- Assists (AST)
- Turnovers (TOV)
- Steals (STL)
- Blocks (BLK)
- Personal Fouls (PF)
- Points (PTS)

Using the statistics contained in the box score, we constructed a feature vector for each game, containing the difference in the competing teams' net: [win-lose record, points scored, points allowed, field goals made and attempted, 3-pt made and attempted, free throws made and attempted, offensive and defensive rebounds, turnovers, assists, steals, blocks, and personal fouls].
Initially, we trained and tested all of our learning models on the aforementioned feature vectors. We quickly realized, however, that besides logistic regression, which performed well, all of the other models suffered from overfitting and poor test accuracies. In order to curb our overfitting, we decided to instead construct our models using a small subset of our original features, consisting of the features that best captured a team's ability to win. In choosing a specific set of features to utilize in our learning models, we ran three separate feature selection algorithms in order to determine which features are most indicative of a team's ability to win. Two of the feature selection algorithms used were forward and backward search, in which we utilize 10-fold cross validation and add or remove features one by one in order to determine which features result in the highest prediction accuracies. In addition, we ran a heuristic feature selection algorithm to verify that the features selected tended to be those that are most informative about whether a team will win. The results of the three methods are shown in the table below.

Forward Search         | Backward Search        | Heuristic
-----------------------+------------------------+-----------------------
Points Scored          | Points Scored          | Points Scored
Points Allowed         | Field Goals Attempted  | Field Goals Attempted
Field Goals Attempted  | Defensive Rebounds     | Free Throws Made
Defensive Rebounds     | Assists                | Defensive Rebounds
Assists                | Turnovers              | Assists
Blocks                 | Overall Record         | Overall Record
Overall Record         | Recent Record          | Recent Record

The features selected by backward search were almost the exact same features as those selected by heuristic search. This indicated that the backward search features captured the aspects of a team's play that best indicated whether that team would win, and thus that these features would likely yield good results. Our preliminary results showed that backward search did indeed result in the best cross-validation accuracy. The features selected by backward search also agree with the experts' view of the game: prediction is most accurate when considering the offensive and scoring potential of a team compared to its opponent. Each of the selected statistics is related to scoring, even turnovers and defensive rebounds, as they essentially give the team possession of the ball.
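The report does not show the selection code itself; the following is a minimal sketch of how backward search with 10-fold cross-validation could be run with scikit-learn's SequentialFeatureSelector, using placeholder data in place of the real per-game difference vectors.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

# Placeholder arrays standing in for the per-game difference features (X) and
# win/loss labels (y); the real arrays would come from the box-score features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 15))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Backward search: start from all features and drop them one by one, scoring
# each candidate subset with 10-fold cross-validation; direction="forward"
# gives the forward-search variant. Seven features mirrors the table above.
selector = SequentialFeatureSelector(
    rf, n_features_to_select=7, direction="backward", scoring="accuracy", cv=10
)
selector.fit(X, y)
print("selected feature mask:", selector.get_support())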
6. LITERATURE REVIEW

A literature review of a Basketball Game Predictor model utilizing a Random Forest classifier would investigate existing research on forecasting basketball game winners. It would explore the application of machine learning in sports analytics, with a focus on basketball, emphasizing crucial features such as points per game, rebounds, assists, field goal percentage, and many other statistics for predictive modeling. The review would analyze and contrast various machine learning algorithms, discussing their accuracy and limitations in predicting basketball scores. Furthermore, it would consider the feasibility of real-time predictions during NBA matches, addressing challenges and proposing potential avenues for future research in basketball score prediction.


Feature Extraction

This includes the following steps (a short sketch follows the list):

- Identify the features (independent variables) that are likely to have predictive power.
- Remove irrelevant or redundant features that do not contribute to the predictive task.
- Convert categorical variables into a format suitable for machine learning models. This may involve one-hot encoding, label encoding, or other methods depending on the nature of the data.
- Split the dataset into training and test sets.
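A minimal sketch of these steps with pandas and scikit-learn; the column names (HOME_TEAM, AWAY_TEAM, HOME_TEAM_WINS, and the *_DIFF features) are hypothetical placeholders, not names taken from this report.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame: column names are placeholders for the real dataset's.
games = pd.DataFrame({
    "HOME_TEAM": ["LAL", "BOS", "MIA", "LAL"],
    "AWAY_TEAM": ["BOS", "MIA", "LAL", "MIA"],
    "PTS_DIFF": [5.0, -2.0, 3.5, 1.0],
    "REB_DIFF": [2.0, -1.0, 0.5, 4.0],
    "HOME_TEAM_WINS": [1, 0, 1, 1],
})

# One-hot encode the categorical team columns; numeric columns pass through.
X = pd.get_dummies(games.drop(columns=["HOME_TEAM_WINS"]),
                   columns=["HOME_TEAM", "AWAY_TEAM"])
y = games["HOME_TEAM_WINS"]

# Hold out a test set for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)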
Decision tree Algorithm:

The decision tree splits a node into sub-nodes, thereby increasing the purity of the nodes with respect to the target variable. A decision tree is similar to a flowchart where each node represents a test on an attribute (independent variable). A decision tree is basically a graphical representation of every possible solution to a decision-based problem, based on certain conditions.
There are some important terms related to forming a decision tree:

● Entropy: a measure of randomness or unpredictability in the dataset.

● Information Gain: the decrease in the entropy of the dataset after splitting it on the basis of an attribute.

● Root Node: represents the entire population or sample, which further gets divided into two or more homogeneous sets.

● Leaf Node: a node that cannot be further segregated.

● Pruning: the opposite of splitting; removing unwanted branches from the tree.
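As an illustration of the entropy and information gain terms above, here is a small self-contained helper (not taken from the report) that computes both for a candidate split:

import numpy as np

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the classes present in `labels`.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, left, right):
    # Decrease in entropy after splitting `parent` into `left` and `right`.
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Example: splitting 6 wins / 4 losses into two purer groups.
parent = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
print(information_gain(parent, parent[:5], parent[5:]))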

Random Forest:

A random forest is made of decision trees. Each decision tree can be thought of as a representation of the training data that is split into subpopulations based on a strong differentiating variable. Because each tree is built on a different subset of the training observations, a random forest can easily handle outliers and can prevent overfitting by randomizing new trees during learning. Our data has plenty of features, and a random forest can help unravel complex unknown interactions between predictor variables.

In terms of feature importance, the popular method for random forests, and decision trees in general, is based on the mean decrease in Gini. This is based on the Gini impurity index, which is computed by:

GI = Σ_{i=1}^{n_c} p_i (1 - p_i)

where n_c is the number of classes present in the output (in our case, two) and p_i represents the probability of class i in the training set. The Gini impurity index is a metric of misclassification error.
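A small self-contained illustration (not from the report) of the Gini impurity formula above for the two-class win/loss case:

import numpy as np

def gini_impurity(labels):
    # GI = sum over classes of p_i * (1 - p_i).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float((p * (1.0 - p)).sum())

# A node with 7 wins and 3 losses: GI = 0.7*0.3 + 0.3*0.7 = 0.42.
print(gini_impurity(np.array([1] * 7 + [0] * 3)))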
7. METHODOLOGY

The methodology involves data extraction, feature engineering to create relevant predictors, model training using a Random Forest classifier, evaluating model performance, and saving the best-performing model for deployment or further analysis. The code implements a pipeline for developing predictive models that forecast the game winner based on various game-related features and machine learning algorithms. Adjustments to hyperparameters, feature selection, or data processing methods can be made to further enhance model performance.
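The report does not reproduce the pipeline code; the following is a condensed sketch of the steps described above, with placeholder data standing in for the selected features so the snippet runs on its own.

import numpy as np
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder feature matrix and labels standing in for the selected features.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 7))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train the random forest on the training split.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out games and persist the fitted model.
print("test accuracy:", model.score(X_test, y_test))
joblib.dump(model, "basketball_rf_model.joblib")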
8. RESULTS

▪ We use classification accuracy, which measures the percentage of correct predictions made by the model, to evaluate its performance.

▪ We developed a machine learning model using the random forest algorithm to predict whether a basketball team will win or lose a game. Our model achieved 76.74% accuracy, demonstrating its effectiveness in predicting game outcomes.

▪ Confusion matrix: A confusion matrix is a performance evaluation tool in machine learning that tabulates the true positive, true negative, false positive, and false negative outcomes of a classification algorithm. It provides a clear snapshot of model accuracy, precision, recall, and F1 score, aiding in the assessment of predictive performance and error analysis.

▪ Feature importance refers to the significance of different input factors in determining game outcomes. Analyzing feature importance helps identify key elements like player performance, team statistics, or game context that heavily influence the model's ability to predict basketball winners, aiding in refining the predictive model.
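A short sketch of how these metrics could be produced with scikit-learn; the data and variable names are placeholders, not the project's actual code or results.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Placeholder data again; in the project these would be the selected features.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 7))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))            # share of correct predictions
print("confusion matrix:\n", confusion_matrix(y_test, pred))
print("feature importances:", model.feature_importances_)   # mean decrease in impurity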

9. CONCLUSION

We found that a basketball team's win record plays a central role in determining its likelihood of winning future games. Winning teams win more because they have the ingredients for success already on the team. However, we were surprised that removing the winning record significantly changed classification accuracy. If we consider a team's win record as representative of that team's ability to win, then this implies that the score statistics fail to completely represent a team's success on the court. This result points to the need for advanced statistics that go beyond the score in order to potentially improve prediction accuracy for close games and upsets. This need explains the growing popularity of advanced-statistics sports conferences like the MIT Sloan conference.

