Cricket Players Performance Prediction and Evaluation Using Machine Learning Algorithms
3 authors:
Rajkamal Murugesan
Birla Institute of Technology and Science Pilani
All content following this page was uploaded by Prabu Selvam on 09 January 2024.
Abstract—Machine learning (ML) techniques are used to complete difficult tasks in a timely manner. Presently, ML models are used for decision making in different sectors such as healthcare, agriculture, weather forecasting, transportation, and sports. Sports play a vital role in human life and involve crores of investment. Hence, player performance analysis is an essential task in the sports sector. In the proposed system, the performance of cricket players is analyzed to determine the performance of specific athletes, to form a team, and to plan training. The performance of cricket players is analyzed using linear regression, K-means, and random forest models. Cricket players’ performance can be predicted and regressed with a linear line using linear regression. K-means clustering divides the variables into ‘n’ clusters based on shared player characteristics. The clusters’ accuracy over test data is then validated using random forest based classification. Based on this analysis, the best players on the list are selected for team formation, which increases the likelihood of winning matches. This work will aid in the preparation of the player rank for game-related applications.

Keywords—Performance Evaluation, Linear Regression, K-Means, Random Forests, Sports, Cricket Prediction.

I. INTRODUCTION

In the sports sector, player performance analysis is essential to selecting the best players for a team. Conventionally, a player’s existing performance is analyzed manually. The manual process consumes more time and in some cases leads to illegal activities. To avoid these issues, ML-based performance analysis is proposed in this work. The player’s existing performance is analyzed by various ML algorithms, and a rank list of the players is prepared. The highest-ranked player is chosen for team formation and training [1].

Compared to other games, cricket is a major game and generates substantial economic activity. Hence, cricket team formation is a challenging and highly sensitive task. Thus, modern technology is used to predict the performance of individual players accurately in different aspects. A wrong prediction leads to a high economic loss [2].

Nowadays, ML models are used for result prediction, training analysis, and team and individual performance analysis. Hence, ML models are preferable in sports analysis. In cricket, a player’s batting and bowling performance is analyzed with simple statistical measures such as average run rate and scores, number of wickets, and average run rate per over. In addition to these parameters, other factors need to be analyzed to predict accurate results. An ML regression model helps to predict the run rate based on weather, toss, pitch, and lighting conditions [3]. The major objective of ML models in game player prediction is to analyze the player’s performance on the basis of their specialization (batting, bowling, and fielding). Based on their performance, the rank list of the players is prepared, and the best 11 members are selected from the list. Clustering is a good choice for identifying players in the sports field. Clusters are typically formed using similarity values derived from the distance between data points and the centroid value. Compared to other clustering techniques, K-means clustering is the preferred method for locating similar high-performing players [4].

The objectives of the proposed work are as follows:
• To collect a cricket dataset from data sources and apply preprocessing to the dataset to get the required factors.
• To apply the linear regression technique to predict the performance of the individual players.
• To cluster the predicted values by using the K-means algorithm to find players with similar performance.
• To prepare the rank list and find the top players by using the random forest technique.

The remainder of this article is structured as follows: Section 2 discusses previous research on game analysis and prediction using ML models. Section 3 addresses the architecture and methods for the proposed ML models, and Section 4 discusses the experimental findings of the proposed system. Section 5 ends with a conclusion and future development.

II. RELATED WORK

The existing research on player prediction in various games using ML models is covered in this section, along with its merits and limitations.

H. Wagner et al. evaluated players’ and teams’ handball game performances. Data were gathered from sports statistics and performance analysis of individuals. This work focuses more on the performance of individual players than on team performance [5]. M. Bilge et al. analyzed the Olympic Games and the European championships in men’s handball tournaments. A 2D graph representing stamina and training hours is created using linear regression. However, test data cannot be used with this technique [6]. L. T. Ronglan et al. studied female handball players based on neuromuscular fatigue and the recovery process. The strength of the player is categorized using classification procedures. This model groups the top players based on their performance using a filtering algorithm [7]. T. Frantisek et al. examined handball matches with competitive loading of elite players [8]. Ren, Y. et al. discussed diagnosing tuberculosis using AI and ML algorithms. This method examined different ML algorithms. The best algorithms
Authorized licensed use limited to: SASTRA. Downloaded on May 29,2023 at 03:34:23 UTC from IEEE Xplore. Restrictions apply.
should be prepared for a perfect model that is less effective than other models [9].

G. R. Sena et al. developed ML algorithms using classification and regression techniques to classify the players according to their performance in previous matches. The performance of each player was examined in this study without consideration for team classification [10]. Passi et al. analyzed the players’ performances based on the batsman’s run score and the bowler’s wickets. The players are divided into different ranges by the categorization algorithm, which also evaluates each match’s performance. The best players among the list of players are selected using the random forest and decision tree models [11]. Parag Shah et al. addressed logistic regression based prediction. Considerations include the game plan, timing of the match, home or away field, team average score, etc. This work examines team performance rather than individual players [12]. Sushant et al. used a random forest model to analyse the performance of the Indian Premier League. Instead of individual players, team performance is evaluated. The ranker method is used to prepare a team’s rank list based on various processes such as data collection, cleaning, attribute selection, mining, and result analysis. Deep learning models, in addition to ML models, are used to predict player performance [13]. Grossberg et al. proposed a prediction technique based on an artificial neural network (ANN). The prediction accuracy rate in an ANN model is determined by neuron weight. As a result, the neuron weights are constantly changed to find the most accurate prediction result [14].

Table 1 shows the recent works related to player performance prediction by using ML algorithms.
where y is the linear function and a, b, and x are its parameters.
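As an illustration of such a linear function, the following minimal sketch fits a straight line by ordinary least squares. It assumes the common form y = a + b·x, with a as the intercept and b as the slope; the exact equation used in this work is not reproduced here. The function name `fit_line` and the innings and runs values are hypothetical and chosen only for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_mean - b * x_mean  # intercept
    return a, b


# Hypothetical data: innings played vs. total runs (not from the dataset)
innings = [10, 20, 30, 40, 50]
runs = [310, 590, 920, 1180, 1500]

a, b = fit_line(innings, runs)
predicted_runs = a + b * 60  # extrapolated total runs after 60 innings
```

Here the fitted slope b can be read as the average number of runs added per innings, which is the kind of per-player trend a regression line summarises.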
random forest algorithm easily corrects over-fitting and under-fitting problems. Algorithm 3 demonstrates the random forest model’s operation in the proposed work.

Algorithm 3: Random Forest Algorithm
Input: Number of Features
Output: Rank list
Procedure:
1. Choose ‘k’ features at random from a total of ‘m’ features, where k<m.
2. Calculate the node ‘d’ using the best split point among the ‘k’ features.
3. Using the best split, divide the node into daughter nodes.
4. Repeat steps 1 to 3 until ‘l’ number of nodes is reached.
5. Build a forest by repeating steps 1 to 4 ‘n’ times to produce ‘n’ trees.
6. Determine the majority vote and prepare the rank list.

The best-performing players are accurately predicted using linear regression, K-Means clustering, and the random forest algorithm.

IV. EXPERIMENTAL RESULTS

In the experimental setup, Google Colab is opened and the relevant dataset is imported. Python code is used to perform preprocessing steps and visualise the results in various graphs. The dataset is based on a Kaggle dataset (https://www.kaggle.com/datasets/mahendran1/icc-cricket) with 2500 records and 15 attributes such as player name, span, the number of matches, innings, total runs, highest score, average score, and so on. The dataset description is shown in Figure 2.

Preprocessing Data:
Irrelevant records and records with missing values are removed during preprocessing. Following preprocessing, 304 records with 13 attributes are chosen for further processing. The preprocessed data is depicted in Figure 3.

Figure 3. Preprocessed Data

Linear Regression Analysis
The preprocessed data is used for linear regression analysis of player performance. The scatter plot for the linear regression analysis results is shown in Figure 4. The linear plot clearly shows the visibility of the proposed system’s data and helps remove non-correlated data from further processing. The values that are closest to the regression line are considered for the following process, while the remaining points are considered irrelevant.
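One way to select the points closest to a fitted line is sketched below. This is an illustration under stated assumptions and not the code used in this work: it fits a line with NumPy and keeps the points whose absolute residual is within a tolerance. The function name `keep_points_near_line`, the tolerance `max_residual`, and the sample values are hypothetical, since the text does not state how closeness to the line is measured.

```python
import numpy as np


def keep_points_near_line(x, y, max_residual):
    """Fit y = a + b*x and return a boolean mask selecting the points
    whose absolute residual is at most max_residual (assumed threshold)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns slope first, then intercept
    residuals = np.abs(y - (a + b * x))
    return residuals <= max_residual


# Hypothetical data: matches played vs. total runs, with one point
# (30 matches, 1900 runs) lying far from the overall trend.
matches = np.array([10, 20, 30, 40, 50, 60])
runs = np.array([300, 610, 1900, 1200, 1490, 1810])

mask = keep_points_near_line(matches, runs, max_residual=300.0)
kept_matches = matches[mask]  # records retained for the next step
```

The choice of tolerance controls how aggressively records are discarded; a value that is too small can also remove valid records, because a single distant point pulls the fitted line toward itself.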
The values of each attribute are depicted in the figure above. Clusters are formed based on these values. Figure 6 depicts the proposed work’s cluster formation. The scatter plot shows the relationship between clusters and runs. It demonstrates how the players are grouped based on their total runs. 13 clusters are formed based on each attribute. Cluster number one has the most players out of all of these clusters. This cluster also includes the fastest runner. Concentrating on cluster number one therefore helps to find the best players.

making process. Individual player performance is analysed in the proposed system to predict the top scorer and form the best team. In this paper, different ML techniques are evaluated to predict the performance of a specific player. Linear regression, K-Means, and random forest models are used to predict the performance of a male cricket player. The performance of cricket players is predicted and regressed with linear lines using linear regression to select the relevant attributes for performance analysis. K-Means clustering divides the variables into ‘n’ clusters based on shared player attributes. These models are used to classify the best players on the list, which increases the likelihood of winning matches. The best cluster is identified among the various clusters by the players with the highest run score. A total of 14 clusters are formed, with cluster ‘1’ being identified as the best of the bunch. Following that, random forest based classification is used to validate the clusters’ accuracy on test data. The team with the highest ranking would select a final 20 players. This work will aid in the preparation of the player rank for game-related applications. In the future, the same procedure will be applied to various gaming datasets.
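The clustering and validation steps summarized above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the data are synthetic stand-ins for player statistics, the two columns (total runs, batting average) and the choice of three clusters are made only to keep the example small, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for player statistics (assumption): columns are
# (total runs, batting average), drawn around three well-separated tiers.
tiers = [(1000, 20), (5000, 35), (10000, 50)]
X = np.vstack([
    rng.normal(loc=center, scale=(300, 2), size=(40, 2))
    for center in tiers
])

# Step 1: group players with similar statistics using K-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Step 2: validate the cluster assignment on held-out data by training
# a random forest on one part and scoring it on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0, stratify=labels
)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
accuracy = accuracy_score(y_test, forest.predict(X_test))

# Step 3: take the cluster whose centre has the highest total runs
# and list the row indices of the players assigned to it.
best_cluster = int(np.argmax(kmeans.cluster_centers_[:, 0]))
top_players = np.flatnonzero(labels == best_cluster)
```

Because the features are not rescaled here, total runs dominate the distance computation; on real data with attributes of very different ranges, standardising the columns before clustering is usually worth considering.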
[15] S. Murdeshwar, “Data Mining on Cricket Data Set for predicting the results”, report in December 2016.
[16] S. Grossberg, “Nonlinear neural networks: principles, mechanisms, and architectures”, Neural Networks, 1988.
[17] M. Sumathi, S. Prabu, “Random forest based classification of user data and access protection”, International Journal of Recent Technology and Engineering, Vol. 8, 2019.
[18] Abraham Garcia-aliaga, Moises Marquina, Javier Coteron, Asier Rodriguez-Gonzalez, and Sergio Luengo-Sanchez, “In-game behaviour analysis of football players using machine learning techniques based on player statistics”, International Journal of Sports Science & Coaching, 2021, Vol. 16(1), 148-157.
[19] Sait Can Yucebas, “A deep learning analysis for the effect of individual player performances on match results”, Neural Computing and Applications, 2022, 34: 12967-12984.
[20] Luca Pappalardo, Paolo Cintia, Emanuele Massucco, Dino Pedreschi, Fosca Giannotti, “PlayeRank: Data-driven performance evaluation and player ranking in soccer via a machine learning approach”, ACM Trans. Intell. Syst. Technol., Vol. 10, No. 5, 2019.
[21] Wei Gu, Krista Foster, Jennifer Shang, Lirong Wei, “A game-predicting expert system using big data and machine learning”, Expert Systems with Applications, 130, (2019), 293-305.
[22] Kalpdrum Passi and Niravkumar Pandey, “Increased prediction accuracy in the game of cricket using machine learning”, International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 8, No. 2, 2018, 1-18.
[23] Mat Herold, Floris Goes, Stephan Nopp, Pascal Bauer, Chris Thompson, Tim Meyer, “Machine learning in men’s professional football: current applications and future directions for improving attacking play”, International Journal of Sports Science & Coaching, 2019, pp. 1-20.