Cricket Players Performance Prediction and Evaluation Using Machine Learning Algorithms

Conference Paper · April 2023


DOI: 10.1109/ICNWC57852.2023.10127503

3 authors: M. Sumathi (SASTRA University), Prabu Selvam (VIT University), Rajkamal Murugesan (Birla Institute of Technology and Science Pilani)

Cricket Players Performance Prediction and Evaluation Using Machine Learning Algorithms

Sumathi M1
Assistant Professor, School of Computing
SASTRA Deemed to be University
Thanjavur, Tamil Nadu, India
[email protected]

Prabu S2
Research Assistant
SASTRA Deemed to be University
Thanjavur, Tamil Nadu, India
[email protected]

Rajkamal M3
Application Developer
IBM, Bangalore, Karnataka
[email protected]

2023 International Conference on Networking and Communications (ICNWC) | 979-8-3503-3600-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICNWC57852.2023.10127503

Abstract: Machine learning (ML) techniques help complete difficult tasks in a timely manner. ML models are currently used for decision making in sectors such as healthcare, agriculture, weather forecasting, transportation, and sports. Sport plays a vital role in human life and involves crores of investment. Hence, analysing player performance is an essential task in the sports sector. The proposed system analyses the performance of cricket players and determines the performance of specific athletes in order to form teams and plan training. The analysis uses linear regression, K-means, and random forest models. Linear regression predicts cricket players' performance by fitting a straight line to the data. K-means clustering divides the players into 'n' clusters based on shared player characteristics. The accuracy of the clusters on test data is then validated using random forest based classification. Based on this analysis, the best players on the list are selected for team formation, which increases the likelihood of winning matches. This work will aid in the preparation of the player rank for game-related applications.

Keywords: Performance Evaluation, Linear Regression, K-Means, Random Forests, Sports, Cricket Prediction.

I. INTRODUCTION
In the sports sector, player performance analysis is essential for selecting the best players for a team. Conventionally, a player's existing performance is analysed manually. The manual process is time-consuming and, in some cases, leads to illegal activities. To avoid these issues, ML based performance analysis is proposed in this work. The player's existing performance is analysed by various ML algorithms, which then prepare the rank list of the players. The highest-ranked players are chosen for team formation and training [1].
Compared to other games, cricket is a major game with a large economic impact. Hence, cricket team formation is a challenging and highly sensitive task. Modern technology is therefore used to predict the performance of individual players accurately in different aspects. A wrong prediction leads to a high economic loss [2].
Nowadays, ML models are used for result prediction, training analysis, and team and individual performance analysis. Hence, ML models are preferable in sports analysis. In cricket, player performance in batting and bowling is analysed with simple statistical measures such as average run rate and scores, number of wickets, and average run rate per over. In addition to these parameters, other factors need to be analysed to predict accurate results. An ML regression model helps to predict the run rate based on weather, toss, pitch, and lighting conditions [3]. The major objective of ML models in game player prediction is to analyse the player's performance on the basis of their specialization (batting, bowling and fielding). Based on their performance, the rank list of the players is prepared, and the best 11 members are selected from the list. Clustering is a good choice for identifying players in the sports field. Clusters are typically formed using similarity values derived from the distance between data points and the centroid value. When compared to other clustering techniques, K-means clustering is the preferred method for locating similar high-performing players [4].
The objectives of the proposed work are listed as follows:

• To collect a cricket dataset from data sources and apply preprocessing on the dataset to get the required factors.
• To apply the linear regression technique to predict the performance of the individual players.
• To cluster the predicted values by using the K-means algorithm to find players with similar performance.
• To prepare the rank list and find the top players by using the random forest technique.

The remainder of this article is structured as follows: In section 2, previous research on game analysis and prediction using ML models is discussed. The necessary architecture and methods for the proposed ML models are addressed in section 3, and the experimental findings of the proposed system are discussed in section 4. Section 5 ends with a conclusion and future development.

II. RELATED WORK
The existing research on player prediction in various games using ML models is covered in this section, along with its merits and limitations.
H. Wagner et al. evaluated players' and teams' handball game performances. Data was gathered from sports statistics and performance analysis of the individual. This work focuses more on the performance of individual players than on team performance [5]. M. Bilge et al. analysed Olympic and European championship games in men's handball tournaments. A 2D graph representing stamina and training hours is created using linear regression. However, test data cannot be used with this technique [6]. L. T. Ronglan et al. studied female handball players based on neuromuscular fatigue and the recovery process. The strength of the player is categorized using classification procedures. This model groups the top players based on their performance using a filtering algorithm [7]. T. Frantisek et al. examined handball matches with competitive loading of elite players [8]. Ren et al. discussed diagnosing tuberculosis using AI and ML algorithms. This method examined different ML algorithms. The best algorithms

Authorized licensed use limited to: SASTRA. Downloaded on May 29,2023 at 03:34:23 UTC from IEEE Xplore. Restrictions apply.
should be prepared for a perfect model that is less effective than other models [9].
G. R. Sena et al. developed ML algorithms using classification and regression techniques to classify the players according to their performance in previous matches. The performance of each player was examined in this study without consideration for team classification [10]. Passi et al. analysed the players' performances based on the batsman's run score and the bowler's wickets. The players are divided into different ranges by the categorization algorithm, which also evaluates each match's performance. The best players among the list of players are selected using the random forest and decision tree models [11]. Parag Shah et al. addressed prediction based on the logistic regression technique. Considerations include the game plan, timing of the match, home or away field, team average score, etc. This work examines team performance rather than individual players [12]. Sushant et al. used a random forest model to analyse performance in the Indian Premier League. Instead of individual players, team performance is evaluated. The ranker method is used to prepare a team's rank list based on various processes such as data collection, cleaning, attribute selection, mining, and result analysis. Deep learning models, in addition to ML models, are used to predict player performance [13]. Grossberg et al. proposed a prediction technique based on an artificial neural network (ANN). The prediction accuracy rate in an ANN model is determined by the neuron weights. As a result, the neuron weights are repeatedly adjusted to find the most accurate prediction result [14].
Table 1 shows the recent works related to player performance prediction by using ML algorithms.

TABLE I. RECENT WORKS RELATED TO PLAYER PERFORMANCE PREDICTION


Author details | Technique used | Merits | Limitations
Abraham Garcia-Aliaga et al. [18] | For performance prediction, dimensionality reduction and the RIPPER algorithms are used. | The positions of the players are prepared based on their technical and tactical behaviour. | The styles of the players are not taken into account when evaluating them.
Sait Can Yucebas [19] | For performance prediction, the Rectified Linear Unit was used. | The relative function of Gedeon was used to rank players. This technique improves average performance prediction. | The performance of the players was evaluated solely on the basis of their pitch position, with no regard for the other factors.
Luca Pappalardo et al. [20] | The PlayeRank algorithm is used to find, evaluate and rank the players. | The PlayeRank algorithm performs well on various datasets and also aids in the efficient analysis of individual performance. | Difficult to implement.
Wei Gu et al. [21] | In addition to the nonparametric hypothesis test, various ML algorithms were used to analyse the dataset. | In comparison to other algorithms, SVM has an accuracy rate of more than 90%. | This model is applicable to a single season rather than multiple seasons.
Kalpdrum Passi et al. [22] | To predict the accuracy, the random forest, naïve Bayes, multiclass SVM, and decision tree algorithms are compared. | Random forest outperforms other ML algorithms in terms of accuracy. | Applies only to small datasets; does not apply to larger datasets.
Mat Herold et al. [23] | Various ML algorithms examine the time, amount of data, and location of the data. | Examines team adaptability, possession, communication and tempos. | Applicable to simple datasets only, not complex challenges.
Rory P. Bunker et al. [24] | For learning strategy, ML algorithms are used, and ANN is used for prediction. | The predefined features identified from historical results yield better results than manual predictions. | Only the predicted outcome of the analysis is applicable to team formation.

Based on the above analysis, the following points and gaps in the existing work have been identified:

• When compared to statistical analysis models, ML models produce more accurate player predictions.
• The majority of the work concentrated on team performance analysis rather than individual player performance analysis.
• Only a single ML model is used for performance analysis; none of the works use multiclass based analysis.

III. PROPOSED SYSTEM
For player performance analysis, the proposed system uses linear regression (the performance of cricket players is predicted by fitting a straight line), K-means (which groups the players into 'n' clusters based on shared characteristics), and random forest (which verifies the clusters' accuracy over test data). The proposed system architecture is shown in Figure 1. The dataset is collected from Kaggle. To remove noisy and missing data, the dataset is preprocessed before analysis. A multi-class prediction model performs performance analysis and results visualization.

Figure 1. Proposed System Architecture

• Data preparation: The dataset is imported from the internet and the required features are extracted, such as the number of matches played, player span, innings, total number of runs and not-outs, highest score, number of balls faced, total number of 100's, 50's and ducks, average run rate, and strike rate.
• Data preprocessing: In preprocessing, data is cleaned and normalized for model planning.
• Performance analysis and visualization: The player performance is analysed by the ML model in order to predict it accurately. The output is visualized with various graphs such as bar graphs, scatter plots, and line graphs.

A. ML algorithm – Linear Regression
Linear regression is a popular and simple ML model for prediction. The independent variables are taken as input features in linear regression, and the dependent variable is used as the output variable. Linear regression analysis uses equation 1. To provide an accurate prediction, this score calculation is repeated. Future values are calculated and predicted using this model. The working principle of linear regression is demonstrated in algorithm 1.

y = a + b * x (1)

where y is the predicted (dependent) value, x is the independent variable, a is the intercept, and b is the slope.
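The straight line y = a + b * x of equation 1 is conventionally fitted by ordinary least squares, using the same running sums (sumx, sumy, sumx2, sumxy) that equations 2 and 3 are built from. The following Python snippet is a minimal sketch of that calculation, not the authors' code. The function name fit_line and the sample innings and runs values are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) of the least-squares line y = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Running sums, named after the quantities in equations 2 and 3.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denom = n * sum_x2 - sum_x * sum_x
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom  # slope
    a = (sum_y - b * sum_x) / n               # intercept
    return a, b


# Hypothetical data (an assumption): innings played (x) against total runs (y).
innings = [10, 20, 30, 40]
runs = [250, 450, 650, 850]

a, b = fit_line(innings, runs)
predicted_runs_at_50 = a + b * 50  # extrapolate the fitted line to 50 innings
```

The explicit check on the denominator guards against the degenerate case in which every player has the same x value, which the closed-form expression cannot handle.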

Equations 2 and 3 are used in linear regression to calculate the constant values 'a' and 'b'.

b = (n * sumxy - sumx * sumy) / (n * sumx2 - sumx * sumx) (2)
a = (sumy - b * sumx) / n (3)

The calculated values are now used to solve the linear equation y = a + b * x. A similar calculation is used to analyse each player's performance. Following linear regression, the predicted features are fed into the K-means clustering process.

Algorithm 1: Linear Regression
Input: Cricket player score dataset
Output: Find value a and b
Procedure:
1. Read number of data
2. For i=1 to n:
      Read Xi and Yi
   Next i
   Initialize sumx, sumx2, sumy, sumxy = 0
3. Calculate required sum
4. For i=1 to n:
      sumx = sumx + Xi
      sumx2 = sumx2 + Xi * Xi
      sumy = sumy + Yi
      sumxy = sumxy + Xi * Yi
   Next i
5. Calculate required constant a and b of y = a + b * x
      b = (n * sumxy - sumx * sumy) / (n * sumx2 - sumx * sumx)
      a = (sumy - b * sumx) / n
6. Display value a and b

B. K-Means Clustering Algorithm
The K-means algorithm is a well-known technique for categorizing data into an 'n' number of groups based on similar values. K-means clustering is used in the proposed technique to group players based on their performance. Players with similar scores are grouped together in one cluster. During the clustering process, the Euclidean distance is measured. The points that are closest to a centroid are grouped together. Initially, the distance value is used to form the 'n' number of clusters, and the clusters are then re-formed based on the new centroid values. This process repeats until the clusters are stable or no more changes occur. Algorithm 2 shows the formation of K-means clusters in the proposed system.

Algorithm 2: K-Means Clustering
Input: Number of Clusters, Data points
Output: Form the clusters
Procedure:
1. Choose number of clusters (K) and data points
2. Place the centroids C1, C2, …, CK randomly.
3. Repeat steps 4 and 5 until convergence or end of fixed number of iterations.
4. For each data point Xi:
      Find the nearest centroid among C1, C2, …, CK
      Assign the point to that cluster
5. For each cluster j=1…K
      New centroid = Mean(cluster)
6. Return generated clusters

The clustered values are now used to prepare the rank and find the best solution using the random forest technique.

C. Random Forest Algorithm
The random forest technique supports both regression and classification. In a random forest, the majority vote is calculated from the results of the individual trees. The
random forest algorithm mitigates over-fitting and under-fitting problems. Algorithm 3 demonstrates the random forest model's operation in the proposed work.

Algorithm 3: Random Forest Algorithm
Input: Number of Features
Output: Rank list
Procedure:
1. Choose 'k' features at random from a total of 'm' features, where k<m.
2. Calculate the node 'd' using the best split point among the 'k' features.
3. Using the best split, divide the node into daughter nodes.
4. Repeat steps 1 to 3 until the 'l' number of nodes is reached.
5. Build a forest by repeating steps 1 to 4 an 'n' number of times to produce an 'n' number of trees.
6. Determine the majority vote and prepare the rank list.

The best performing players are predicted using linear regression, K-Means clustering and the random forest algorithm.

IV. EXPERIMENTAL RESULTS
The experimental setup uses Google Colab, into which the relevant dataset is imported. Python code performs the preprocessing steps and visualises the results in various graphs. The dataset is based on a Kaggle dataset with 2500 records and 15 attributes (https://www.kaggle.com/datasets/mahendran1/icc-cricket), such as player name, span, the number of matches, innings, total runs, highest score, average score, and so on. The dataset description is shown in Figure 2.

Figure 2. Dataset Description

The following issues were encountered, together with their solutions:
1. When the dataset is imported, all of its values are strings and cannot be processed further.
   Solution: Using Python commands, all data types are converted to Int data types.
2. It is difficult to find the right number of clusters using K-Means clustering.
   Solution: The number of clusters is found using the trial and error method.

Preprocessing Data:
Records that are irrelevant or contain missing values are removed during preprocessing. Following preprocessing, 304 records with 13 attributes are chosen for further processing. The preprocessed data is depicted in Figure 3.

Figure 3. Preprocessed Data

Linear Regression Analysis
The preprocessed data is used for linear regression analysis of player performance. The scatter plot for the linear regression analysis results is shown in Figure 4. The linear plot makes the structure of the data visible and is used to remove non-correlated data from further processing. The values that are closest to the fitted line are considered for the following process, while the remaining points are considered irrelevant.

Figure 4. Linear Regression Analysis

K-Means Analysis
Following linear regression, the result dataset is subjected to k-means clustering. Figure 5 depicts the output of the values prior to clustering.

Figure 5. Attribute values
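The K-means step applied to the regression output can be sketched as follows. This is a minimal NumPy illustration of the assign-then-update loop described for K-means clustering, not the authors' implementation. The function name k_means, the two features (total runs and strike rate), and the six sample rows are assumptions made only for the example. The sketch also initialises the centroids from randomly chosen data points, which is one common choice among several.

```python
import numpy as np


def k_means(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Cluster the rows of `points` into k groups; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points chosen at random.
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    labels = np.zeros(len(points), dtype=int)

    for _ in range(n_iter):
        # Euclidean distance from every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each point to its nearest centroid

        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)

        if np.allclose(new_centroids, centroids):  # converged: centroids stopped moving
            break
        centroids = new_centroids

    return labels, centroids


# Hypothetical player features (an assumption): [total runs, strike rate].
points = np.array([
    [500.0, 70.0], [520.0, 72.0], [480.0, 68.0],
    [5000.0, 90.0], [5100.0, 92.0], [4900.0, 88.0],
])
labels, centroids = k_means(points, k=2)
```

In this toy data the two groups are far apart, so the first three rows end up in one cluster and the last three in the other. Features on very different scales would normally be normalized first, and the trial and error search for the number of clusters amounts to calling the function with several values of k and comparing the results.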
The values of each attribute are depicted in Figure 5. Clusters are formed based on these values. Figure 6 depicts the proposed work's cluster formation. The scatter plot shows the relationship between clusters and runs. It demonstrates how the players are grouped based on their total runs. 13 clusters are formed based on each attribute. Cluster number one has the most players out of all of these clusters. This cluster also includes the fastest runner. Cluster number one is therefore the place to look for the best players.

Figure 6. Scatterplot for data

Random Forest Analysis
Following the determination of the number of clusters, the highest valued cluster (cluster one) is chosen for rank list preparation. Figure 7 depicts the rank calculation, while Figure 8 depicts individual player performance. It shows the data preview after clustering and adding the label as a column. Cluster number "one" has the highest run scorer here.

Figure 7. Players rank Calculation

Figure 8. Comparison of different player's performance

V. CONCLUSION
Machine learning models are useful for analysing data in various ways and can be used to improve the decision-making process. Individual player performance is analysed in the proposed system to predict the top scorers and form the best team. In this paper, different ML techniques are evaluated to predict the performance of a specific player. Linear regression, K-Means, and random forest models are used to predict the performance of a male cricket player. The performance of cricket players is predicted by fitting straight lines with linear regression, which also selects the relevant attributes for performance analysis. K-Means clustering divides the players into 'n' clusters based on shared attribute values. These models are used to identify the best players on the list, which increases the likelihood of winning matches. The best cluster is identified among the various clusters as the one containing the players with the highest run scores. A total of 14 clusters are formed, with cluster '1' being identified as the best of the bunch. Following that, random forest based classification is used to validate the clusters' accuracy on test data. The team with the highest ranking would select a final 20 players. This work will aid in the preparation of the player rank for game-related applications. In the future, the same procedure will be applied to various gaming datasets.

References

[1] H. Wagner, T. Finkenzeller, S. Wurth, and S. P. Von Duvillard, “Individual and team performance in team-handball: A review,” J. Sports Sci. Med., vol. 13, no. 4, p. 808, 2014.
[2] M. Bilge, “Game analysis of olympic, world and European championships in men’s handball,” J. Hum. Kinetics, vol. 35, no. 1, pp. 109–118, Dec. 2012.
[3] L. B. Michalsik, “Analysis of working demands of Danish handball players,” in What’s Going Gym. Copenhagen, Denmark: Forlaget Underskoven, 2004, pp. 321–330.
[4] B. Bideau, R. Kulpa, N. Vignais, S. Brault, F. Multon, and C. Craig, “Using virtual reality to analyze sports performance,” IEEE Comput. Graph. Appl., vol. 30, no. 2, pp. 14–21, Mar./Dec. 2010.
[5] L. T. Ronglan, T. Raastad, and A. Børgesen, “Neuromuscular fatigue and recovery in elite female handball players,” Scandin. J. Med. Sci. Sports, vol. 16, no. 4, pp. 267–273, 2006.
[6] P. O. Donoghue, Research Methods for Sports Performance Analysis. Evanston, IL, USA: Routledge, 2009.
[7] T. Frantisek, Competitive Loading in Top Team Handball. Vienna, Austria: European Handball Federation WEB PERIODICAL, 2011.
[8] B. Sekeroglu, K. Dimililer, and K. Tuncal, “Artificial Intelligence in Education: Application in student performance evaluation,” Dilemas Contemporaneos, Educacion, Politica Valores, vol. 7, no. 1, pp. 1–21, 2019.
[9] Z. Ren, Y. Hu, and L. Xu, “Identifying tuberculous pleural effusion using artificial intelligence machine learning algorithms,” Respiratory Res., vol. 20, no. 1, p. 220, Dec. 2019.
[10] G. R. Sena, T. P. F. Lima, M. J. G. Mello, L. C. S. Thuler, and J. T. O. Lima, “Developing machine learning algorithms for the prediction of early death in elderly cancer patients: Usability study,” JMIR Cancer, vol. 5, no. 2, Sep. 2019, Art. no. e12163.
[11] S. S. Mirzoyan, H. Khachatryan, G. Yegorian, and V. G. Gurzadyan, “Machine learning and kolmogorov analysis to reveal gravitational lenses,” Monthly Notices Roy. Astronomical Soc., Lett., vol. 489, no. 1, pp. L32–L36, Oct. 2019.
[12] L. Hendriks and C. Aerts, “Deep learning applied to the asteroseismic modelling of stars with coherent oscillation modes,” Astronomical Soc. Pacific, vol. 131, no. 1004, Oct. 2019, Art. no. 108001.
[13] K. Passi and N. Pandey, “Increased prediction accuracy in the game of cricket using machine learning,” International Journal of Data Mining Knowledge Management Process, vol. 8, pp. 19–36, 2018.
[14] Parag Shah and Mitesh Shah, “Predicting ODI Cricket Result,” Journal of Tourism, Hospitality and Sports, vol. 5, 2015.
[15] S. Murdeshwar, “Data Mining on Cricket Data Set for predicting the results,” report, December 2016.
[16] S. Grossberg, “Nonlinear neural networks: principles, mechanisms, and architectures,” Neural network, 1988.
[17] M. Sumathi and S. Prabu, “Random forest based classification of user data and access protection,” International Journal of Recent Technology and Engineering, vol. 8, 2019.
[18] Abraham Garcia-Aliaga, Moises Marquina, Javier Coteron, Asier Rodriguez-Gonzalez, and Sergio Luengo-Sanchez, “In-game behaviour analysis of football players using machine learning techniques based on player statistics,” International Journal of Sports Science & Coaching, vol. 16, no. 1, pp. 148–157, 2021.
[19] Sait Can Yucebas, “A deep learning analysis for the effect of individual player performances on match results,” Neural Computing and Applications, vol. 34, pp. 12967–12984, 2022.
[20] Luca Pappalardo, Paolo Cintia, Emanuele Massucco, Dino Pedreschi, and Fosca Giannotti, “PlayeRank: Data-driven performance Evaluation and player ranking in soccer via a machine learning approach,” ACM Trans. Intell. Syst. Technol., vol. 10, no. 5, 2019.
[21] Wei Gu, Krista Foster, Jennifer Shang, and Lirong Wei, “A game-predicting expert system using big data and machine learning,” Expert Systems with Applications, vol. 130, pp. 293–305, 2019.
[22] Kalpdrum Passi and Niravkumar Pandey, “Increased prediction accuracy in the game of cricket using machine learning,” International Journal of Data Mining & Knowledge Management Process (IJDKP), vol. 8, no. 2, pp. 1–18, 2018.
[23] Mat Herold, Floris Goes, Stephan Nopp, Pascal Bauer, Chris Thompson, and Tim Meyer, “Machine learning in men’s professional football: current applications and future directions for improving attacking play,” International Journal of Sports Science & Coaching, pp. 1–20, 2019.
