Machine Learning With Applications
1. Introduction

Cricket is a bat and ball game played in 106 member states of the International Cricket Council (ICC), and it has become a multi-billion dollar business (Kampakis & Thomas, 2015). There are three major formats: Test Cricket, One Day International Cricket (ODI), and the Twenty–Twenty Cricket (T20). Cricket has rapidly become a popular team sport mainly due to the dynamic nature of both ODI and T20 formats (Bandulasiri, Brown, & Wickramasinghe, 2016). Franchise T20 leagues have emerged in various world regions with this increased popularity.

Cricket data analysis has become an integral part of any successful cricket team. The findings of cricket analytics provide better insight into the players and the game, which is very helpful to people involved in the game, such as current players, technical staff, and managers, and for educating the next generation of players (Morgulev, Azar, & Lidor, 2018; Sarlis & Tjortjis, 2020). Due to the rapid evolution of cricket, administrators frequently search for innovative ideas to enhance the performance of cricketers and give them a competitive edge. Intense and stressful situations characterize the sports environment at a competitive level. Performers must handle the pressure and their psychological responses to fulfill their potential (Devonport, 2015). Player-performance management is concerned with optimizing player performance while minimizing the risk of injury involved in various sports (Naglah et al., 2018; Xu & Tang, 2021). In this process, cricket data analysis has an important role to play.

Sports data analytics cannot rely solely on traditional statistical procedures due to numerous limitations. Usually, the conventional statistical techniques are based on assumptions about the data, and cricket data may not satisfy the required assumptions. Furthermore, events related to cricket are not independent and are influenced by various human factors; therefore, the selection of data analysis techniques is of paramount importance (Horvat & Job, 2019; Karlis & Ntzoufras, 2003). In addition, most traditional metrics fail to address modern-day game hypotheses due to a massive volume of related variables. The emergence of machine learning (ML) has influenced sports data analytics tremendously amid the limitations shown by conventional statistical data analysis techniques.

Machine learning (ML), a branch of artificial intelligence, is a collection of computer algorithms that give systems the ability to learn automatically and improve with experience. The main aim of ML techniques is to provide a higher level of automation in the process of knowledge engineering by replacing time-consuming human activities. The abundance of computational power and data has given enormous popularity to ML techniques used in sports data analytics.

The ML process and its applications have matured tremendously during the past two decades. Furthermore, sports data analytics has reached a higher level due to the wealth of sport-related data and the development of machine learning techniques. ML techniques differ from conventional computer systems by allowing these systems to learn from data without imposing rules or being explicitly programmed (Grues, 2015).

The introduction of electronic devices to gather sports data, together with the influence of ML, has given a tremendous boost to sports data analytics. The power of ML techniques has proliferated due to their impact on the devices used to collect data (Jamil, Iqbal, Ahmad, & Kim, 2020), extract information from the devices (Morris, Mundt,
The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility
Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.
E-mail address: [email protected].
https://doi.org/10.1016/j.mlwa.2022.100435
Received 30 April 2022; Received in revised form 10 November 2022; Accepted 11 November 2022
Available online xxxx
2666-8270/© 2022 Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
I. Wickramasinghe Machine Learning with Applications 10 (2022) 100435
Goldacre, & Jacqueline, 2020; Weir, Alderson, Smailes, Elliott, & Donnelly, 2019), and process the information gathered through the devices to enhance the understanding of the ultimate users (Ahmadi et al., 2014; Rommers et al., 2020). Most of these electronic devices come as wearable devices. The amalgamation of these wearables and ML algorithms has opened the gate for developing AI expert systems used in sports (Acikmese, Ustundag, & Golubovic, 2017). These ML-based applications were quickly adopted in sports data analytics to construct models from existing sports data and deliver accurate predictions that support better decisions in sports. The development of these ML systems requires domain-specific knowledge. To optimize ML algorithms, practitioners use a process called feature engineering, whose purpose is to extract features from the raw data with the help of domain knowledge. Though sports data are abundant, applying ML to big data is challenging, as it demands expert knowledge of the domain, of the learning algorithms used, and of software engineering (Koseler & Stephan, 2017).

Systematic reviews help collect empirical evidence about the continuous growth of research using ML in the cricket domain. Furthermore, to the best of the author’s knowledge, there is no such systematic review of two decades of research findings combining cricket and ML. Therefore, this work intends to fill this gap in the literature, and the findings of this study will be beneficial for players, coaches, and sports administrators. Finally, this effort will give researchers a concise overview of the existing research areas in cricket and help them identify research gaps for their future work.

2. Methodology

This manuscript systematically studies how machine learning techniques have been applied in cricket in contemporary literature. As a reference, the standardized guidelines for systematic reviews proposed by Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA; Moher, Liberati, Tetzlaff, Altman, & Prisma Group*, 2009) were followed. The entire review protocol comprises three phases: identification, screening, and final selection. Fig. 1 outlines these three phases of the review protocol.

First, we identified all published work using four major databases, namely, Google Scholar, Science Direct, Scopus, and Web of Science. This search using broader research domains was conducted in February 2022. The electronic search was conducted using the keywords “Applications of Machine Learning in Cricket”. All published research work satisfying the above search criteria was preliminarily identified. Secondly, we selected the most relevant publications based on the following criteria.

• Studies directly related to the game of cricket, including but not limited to player performance, the game’s outcome, team performance, score prediction, pitch-related topics, cricket commentary, and video recording of cricket games, were considered.
• Only the human game of cricket was considered; robot or electronic versions of the game were discarded.
• Studies utilizing at least one ML technique were selected.
• Studies published in a peer-reviewed journal from 2001 to 2021 were considered.
• Both conference and journal versions of publications were included.

Finally, duplicates were removed for the final selection. After removing the duplicates from each database, only 59 publications remained for the final selection.

3. Results

This section presents and discusses the findings of this systematic review. First, a discussion identifies the research areas in cricket where applications of ML can be seen. Then, some descriptive statistics are given about the reviewed publications. Next, in the research question section, all the selected articles are reviewed to identify which studies have addressed the research questions concerning the type of feature extraction, the ML technique, and the accuracy estimation technique used. The following subsection reviews the data sets and the number of attributes used in each study. Then, findings on the data reduction and feature extraction techniques used are presented. After that, the frequently used ML techniques are reviewed. The final section aims to quantify the accuracy of the ML techniques used.

3.1. Study areas in cricket

According to the findings, studies involving cricket and ML techniques have increased exponentially since 2001. Among these publications, the focus on the ODI game has increased significantly compared to the other formats. Fig. 4 displays the number of studies conducted using ML techniques from 2001 to 2021 related to each format of cricket.
Table 1
Research areas in cricket.
Research area                                  % of studies
Game outcome prediction                        35
Player’s performance classification            16
Batting style/Stroke classification            9
Bowling action/performance classification      9
Other                                          9
Umpire’s decision/gestures                     6
Score prediction                               5
Cricket commentary/media                       4
Pitch behavior prediction                      4
Team selection/performance                     3

Further investigations show that, out of all areas in cricket, about 35% of the studies have focused on predicting the game’s outcome. About 16% of the studies have addressed player performance classification. More information about the researched areas is listed in Table 1. The following sections summarize the contribution of the reviewed articles corresponding to each area.

• Game outcome prediction: With the increased popularity and commercialization of the game, outcome prediction of a cricket game has become of the utmost importance. In this direction, researchers use performance indicators representing various aspects of the game and a wide range of ML techniques to predict the game’s outcome. Batsman, bowler, and fielder attributes (Hasanika, Dilhara, Liyanage, Bandaranayake, & Deegalla, 2021; Karthik et al., 2021; Modani, Kilaru, Kaur, Sinha, & Khetan, 2020) are the most commonly used performance measures in the prediction. In addition, other attributes such as home game advantage (Kaluarachchi & Aparna, 2010; Kumar, Santhadevi, & Barnabas, 2019), the outcome of the toss (Pathak & Wadhwa, 2016), and the behavior of the pitch (Tekade, Markad, Amage, & Natekar, 2020) have been utilized in the prediction. More work in outcome prediction can be seen in Basit et al. (2020), Hatharasinghe and Poravi (2019), Shakil, Abdullah, Momen, and Mohammed (2020), and Vistro, Rasheed, and David (2019).
• Player’s performance classification: Though batting, bowling, and fielding are considered the three main departments of the game, batting and bowling are the most popular statistics used to quantify players’ performance (Wickramasinghe, 2014b). Individual players’ skills have a strong impact on the game’s outcome. Therefore, assessing individual skill levels is imperative in many ways to the game; for this, ML techniques can be used together with player-level parameters (Wickramasinghe, 2020a). Numerous such instances can be seen, such as classifying players into various skill levels (Aburas, Mehtab, & Mehtab, 2018; Manage, Kafle, & Wijekularathna, 2020; Wickramasinghe, 2020b), evaluating or ranking them (Ahmad et al., 2021; Premkumar, Chakrabarty, & Chowdhury, 2020), identifying or predicting the player performance level (Anik, Yeaser, Hossain, & Chakrabarty, 2018; Mody, Malathi, & Jayaseeli, 2021; Rupai, Mukta, & Islam, 2020), and predicting the best batsman/bowler (Rani et al., 2020) or the rising stars (Ahmad et al., 2017). By identifying the best performers, a strong team can be formed, as the performance of the individual players has a direct impact on the rank of the team (Wickramasinghe, 2014a). Evidence of the use of ML techniques to select the best team can be seen in Ishi and Patil (2020) and Mahbub, Miah, Islam, Sorna, Hossain, and Biswas (2021).
• Batting style and bowling action classification: Cricket is considered a batsmen’s game requiring the ability to make shot selection and execution. ML techniques coupled with body-worn inertial measurement units (IMUs) have been used to identify and assess the shot-making skills of batsmen (Dias, Mitchell, & Harland, 2020). Furthermore, ML and wearable devices can be used to study the bowler’s characteristics such as bowling action (Salman, Qaisar, & Qamar, 2017), bowling volume, ball release speed, and intensity zones (McGrath, Neville, Stewart, Clinning, & Cronin, 2021; McGrath, Neville, Stewart, Clinning, Thomas, & Cronin, 2021; McGrath, Neville, Stewart, & Cronin, 2019; Ranaweera & Silva, 2019). Moreover, the bowler’s workload can be accurately monitored and predicted with the use of ML, which will, in turn, minimize possible future player injuries (Jowitt, Durussel, Brandon, & King, 2020).
• Other areas: In addition to the above areas, ML has been used in media coverage of cricket, umpire-related activity prediction, pitch behavior prediction, and score prediction.

Most international cricketers actively engage with their fans through social media platforms. This review reveals the use of numerous media platforms, including social media, together with ML in predicting cricket-related information (Dubey, Suri, & Gupta, 2021; Mustafa, Nawaz, Lali, Zia, & Mehmood, 2017; Wickramasinghe & Yapa, 2018; Zakzouk & Mathkour, 2012). Video analytics in cricket has evolved into an exciting area with the development of ML technology. Applications of ML techniques in cricket video studies can be seen in Gupta and Muthiah (2020), Sen, Deb, Dhar, and Koshiba (2021), and Kumar, Kumar, and Kumar (2018). With the enhancement of the game’s competitiveness, umpires are at times under pressure, as their decisions can hugely impact the outcome of the game. With technology coupled with ML techniques, systems can be developed to mitigate human errors by umpires. Such efforts can be seen in Aftab, Hussain, Waleed, Ashfaq, and Umair (2019), Iyer, Bala, Sohan, Dharmesh, and Raman (2020), Mustafa et al. (2017), and Samaraweera, Premaratne, and Dharmaratne (2020).

According to the reviewed articles, score prediction (Kamble, 2021; Singh, Singla, & Bhatia, 2015; Srinivas, Bhat, & Revanasiddappa, 2021) is another area of study using ML techniques. Furthermore, a significant number of studies have been conducted on various aspects of the game such as Duckworth–Lewis (Abbas & Haider, 2019), injuries (Dias et al., 2020; Gupta & Muthiah, 2020), cricket data analytics (Jayalath, 2018; Kamble, 2021; Kapadia, Abdel-Jaber, Thabtah, & Hadi, 2020; Parameswaran, 2013; Rahman, Shamim, & Ismail, 2018; Raja, Manasa, Reddy, & Sundari, 2021; Shahjalal, Ahmad, Rayan, & Alam, 2017; Srinivas et al., 2021), and predicting the behavior of the pitch (Kanhaiya, Gupta, & Sharma, 2019).

3.2. Descriptive statistics about the studies

This section summarizes the descriptive statistics of the reviewed articles. According to the findings, 56% of the studies were published in peer-reviewed journals, and the remainder were published in conferences. Fig. 2 illustrates further information about the type of publication and the published year. Since 2010, there has been a rapid increase in the number of studies in cricket using ML. Only 10% of the publications appeared from 2010 to 2016. After that, 8%, 12%, 14%, 30%, and 25% of the publications were recorded in each year from 2017 to 2021, respectively. More information can be seen in Fig. 3.

3.3. Research questions

This section focuses on three main research questions as listed below. We identify whether each reviewed study has addressed each of the following research questions, and a summary is given in Table 2.
Table 2
Study and the addressed research questions.
S/N Reference RQ1 RQ2 RQ3 S/N Reference RQ1 RQ2 RQ3
1. Abbas and Haider (2019) N Y N 31. McGrath, Neville, Stewart, Clinning, Thomas et al. (2021) Y N Y
2. Aburas et al. (2018) N N N 32. Modani et al. (2020) Y Y Y
3. Aftab et al. (2019) Y Y Y 33. Mody et al. (2021) Y Y Y
4. Ahmad et al. (2017) Y Y Y 34. Mustafa et al. (2017) N Y Y
5. Ahmad et al. (2021) Y Y Y 35. Nandyal and Kattimani (2021) Y N Y
6. Anik et al. (2018) Y Y Y 36. Panda, Sathya, Mishra, and Satpathy (2019) N Y N
7. Awan et al. (2021) N N Y 37. Parameswaran (2013) N N Y
8. Basit et al. (2020) N Y Y 38. Pathak and Wadhwa (2016) N Y Y
9. Deval, Hamid, and Goel (2021) N Y N 39. Premkumar et al. (2020) N N N
10. Dias et al. (2020) Y Y Y 40. Rahman et al. (2018) N Y N
11. Dubey et al. (2021) N Y N 41. Raja et al. (2021) N Y N
12. Goggins et al. (2021) N Y Y 42. Ranaweera and Silva (2019) Y N Y
13. Gupta and Muthiah (2020) Y Y N 43. Rani et al. (2020) N Y N
14. Hasanika et al. (2021) Y Y N 44. Rupai et al. (2020) Y Y Y
15. Hatharasinghe and Poravi (2019) Y Y N 45. Salman et al. (2017) Y Y Y
16. Ishi and Patil (2020) N Y N 46. Samaraweera et al. (2020) Y Y Y
17. Iyer et al. (2020) N Y Y 47. Sen et al. (2021) Y Y Y
18. Jowitt et al. (2020) Y N N 48. Shakil et al. (2020) Y Y Y
19. Jayalath (2018) N Y N 49. Shahjalal et al. (2017) N N Y
20. Kamble (2021) N Y N 50. Singh et al. (2015) N Y N
21. Kaluarachchi and Aparna (2010) N Y Y 51. Somaskandhan, Wijesinghe, Wijegunawardana, Bandaranayake, and Deegalla (2017) Y Y N
22. Kanhaiya et al. (2019) N N N 52. Srinivas et al. (2021) Y Y N
23. Kapadia et al. (2020) Y Y Y 53. Tekade et al. (2020) Y Y N
24. Karthik et al. (2021) N Y Y 54. Tyagi, Kumari, Makkena, Mishra, and Pendyala (2020) Y Y Y
25. Kumar et al. (2018) N Y Y 55. Vistro et al. (2019) Y Y N
26. Kumar et al. (2019) Y N N 56. Wickramasinghe and Yapa (2018) Y Y Y
27. Mahbub et al. (2021) N Y N 57. Wickramasinghe (2020a) N N Y
28. Manage et al. (2020) N Y Y 58. Wickramasinghe (2020b) N Y N
29. McGrath et al. (2019) N Y Y 59. Zakzouk and Mathkour (2012) Y Y Y
30. McGrath, Neville, Stewart, Clinning et al. (2021) Y Y Y
Table 3
Descriptive statistics of the used data set.
Format Mean SD Min Q1 Q2 Q3 Max
First Class 861.8 774.3 44 281.0 909.5 1490.2 1584
ODI 27239.6 93191.0 128 385.5 1301.5 5603.8 350899
T20 537502.0 1400273.0 140 994.0 16720.0 123514.0 4000000
Test 1666.0 1856.2 354 1010.0 1666.0 2323.0 2979
• Research Question #1, RQ1: Which standard feature extraction techniques have been used and discussed? RQ1 aims to identify the types of feature extraction techniques that have been used in cricket data analysis.
• Research Question #2, RQ2: Which ML techniques have been used? RQ2 aims to identify ML techniques that have been used in cricket-related research activities.
• Research Question #3, RQ3: How to estimate the accuracy of the used ML model(s)? RQ3 deals with identifying different ways to quantify the overall accuracy of the used ML model(s).

Fig. 2. Publication type (CP-Conference Paper; JP-Journal Paper) vs year.

The quality and the accuracy of a study directly depend on using a representative sample. In addition, bias in the inference can be minimized using a larger sample. Furthermore, selecting a good representation of features also directly impacts the quality of the outcome of the ML-based data analysis. Therefore, selecting a good data set is essential. This section summarizes the sizes of data sets used in the selected studies. According to the reviewed studies, the average number of instances of data used in research in the T20 format is 537,502, which is the highest compared to the other formats of cricket. In ODI-related studies, an average of approximately 27,240 instances have been used. For Test and First-Class cricket, the averages are approximately 1666 and 862, respectively (see Table 3).

Feature selection is an integral part of using an ML technique, as it directly impacts the accuracy of the ML technique. Due to the popularity of the game and the advancement in technology, cricket data has become easy to access. Further, the feature set of the cricket game has expanded accordingly. According to the findings, studies conducted using First-Class games have used 144 features on average, which
is the highest compared to the other game formats. Secondly, the studies conducted using Test cricket have used 115 features on average, while for ODI and T20, the average number of features is 40 and 12, respectively. Fig. 2 illustrates the distribution of the number of features used in each of the formats of the game.

3.5. Used data reduction and feature extraction techniques

Feature selection is a critical step in ML. The focus of the feature selection approach is to choose a subset of variables from the input data that describes the input data efficiently while reducing effects from noise or irrelevant variables and still providing good prediction results (Guyon & Elisseeff, 2003). Based on the reviewed studies, the following techniques were identified as the primary feature selection techniques.

• Chi-Square
• Correlation Based Feature Selection
• Graphics Based Feature Selection
• Principal Component Analysis
• Recursive Feature Elimination
• Regression

Among the above list, Correlation Based Feature Selection, Recursive Feature Elimination, and Chi-Square were the most frequently used.

3.6. Frequently used ML techniques

This section studies the frequently used ML techniques in cricket. According to the reviewed studies, the following ML techniques have been popular for analyzing cricket data.

• Regression
• Naïve Bayes (NB)
• K-Means
• Random Forest (RF)
• Decision Trees (DT)
• k-Nearest Neighbors (kNN)
• Artificial Neural Networks (ANN)
• Support Vector Machine (SVM)
• XGBoost

Among the ML techniques used in the reviewed papers, SVM (45%) is the most frequently used, followed by RF (42%) and NB (36%). Of the studies that used SVM, 26% and 24% were published in 2019 and 2021, respectively. Of the studies that used RF, 29% were published in 2020 and 2021. NB was the third most popular ML technique in the works published from 2010 to 2021; of the studies that used NB, 26% were published in 2019 and 22% in 2020. More details, including some of the research areas in which the above ML techniques have been used, can be seen in Table 4. According to the findings, the number of studies using ML techniques with cricket data has increased during the last two decades. Fig. 5 shows the distribution of publications using the above-stated ML techniques from 2001 to 2021.

Table 4
Used ML techniques.
ML technique   Applications                                        % of studies
SVM            Umpire’s gesture modeling                           45
               Fast bowler’s action modeling
               Game’s outcome prediction
               Player-performances classification
RF             Predicting the winner of the tournament             42
               Predicting the winner of the game
               Classification of all-rounders
NB             Predicting the winner of the game                   36
               Predicting the batting and bowling performance
               Pitch behavior prediction
Regression     Identification of game’s influential factors        31
               Innings’ score prediction
DT             Identification of game’s influential factors        26
               Winner prediction
               Batting and bowling performance prediction
NN             Game outcome prediction                             21
               Cricket-shot classification
               Game’s score prediction
kNN            Classification of all-rounders                      17
               Player-performances classification
Other          Predicting the player-ranking                       14
               Cricket and social media
XGBoost        Bowler’s workload prediction                        5
K-Means        Batsmen classification                              3
Predicting the duration of a game Player performance modeling

3.7. Quantifying the accuracy of ML technique

Since all ML models are data-driven, it is essential to evaluate the accuracy of the ML model. Perhaps this is the most important element of ML models. The reviewed studies show that confusion matrix-based accuracy measures are the most popular (about 58%) method to test the accuracy of the ML model. The other popular techniques are the F-score (about 16%), the Receiver Operating Characteristic (ROC) curve (about 9%), and the Root Mean Square Error (RMSE) (about 6%); see Table 5. As stated before, the main accuracy testing technique is the confusion matrix-based technique. Accuracy, Balanced Accuracy, Custom Accuracy, Precision, Recall, Sensitivity, and Specificity are some of the confusion matrix-based measures researchers have used.

Table 5
Used accuracy testing techniques.
ML accuracy technique      % of studies
Confusion matrix based     57.7
F-score                    15.5
RMSE                       6.2
ROC                        9.3
Other techniques           3.1
Cohen’s kappa statistic    2.1
MAE                        2.1
MCC                        2.1
R²-statistic               2.1

3.8. Some of the addressed problems

This section takes a brief look at some of the studies conducted to solve existing issues in cricket with the help of ML techniques.

The team selection process is considered tedious, and picking the most suitable eleven players for an upcoming game involves many brainstorming sessions. Mahbub et al. (2021) addressed this problem through a player strength–weakness-ranking mechanism. After collecting batsman and bowler characteristics, they constructed two novel performance indicators for batsmen and bowlers as the basis for classifying players. They trained several ML models (SVM, NB, and RF), and the SVM model outperformed the rest.
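Comparisons between competing classifiers, such as the SVM, NB, and RF models above, typically rest on the confusion matrix-based measures named in Section 3.7. The following is a minimal sketch of how those measures follow from the four cell counts of a binary confusion matrix. The class name, function name, and the win/loss labels are hypothetical and chosen only for illustration; none of them come from the reviewed studies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cell counts of a binary confusion matrix (hypothetical helper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @staticmethod
    def _ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return self._ratio(self.tp + self.tn, total)

    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        # Recall is the same quantity as sensitivity.
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return self._ratio(self.tn, self.tn + self.fp)

    def balanced_accuracy(self) -> float:
        # Mean of sensitivity and specificity.
        return (self.recall() + self.specificity()) / 2

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return self._ratio(2 * p * r, p + r)


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int],
                          positive: int = 1) -> ConfusionMatrix:
    """Count the four cells from paired actual and predicted labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


if __name__ == "__main__":
    # Illustrative labels only: 1 = "win", 0 = "loss".
    actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    cm = confusion_from_labels(actual, predicted)
    print(cm)
    print(f"accuracy={cm.accuracy():.3f} "
          f"balanced={cm.balanced_accuracy():.3f} f1={cm.f1():.3f}")
```

When classes are imbalanced, accuracy alone can be misleading, which may be one reason balanced accuracy and the F-score appear alongside it in the reviewed studies.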
Identifying the legality of a bowling action is a daunting task due to the complicated bio-mechanical movements of the bowler’s arm, and the on-field umpire is unable to observe the bowler’s action accurately. Salman et al. (2017) used inertial measurement units (IMUs) and ML techniques to classify the legality of a bowler’s delivery action. In the first stage, the authors collected data about various bowling actions. In the profiling stage, they identified critical events of the legal bowling action. Next, features gathered using inertial sensors fixed on the bowler’s arm were extracted. Finally, several ML techniques (SVM, kNN, NB, RF, and ANN) were trained to classify the bowling action with high classification accuracy.

Wickramasinghe (2020a) proposed an ML approach to predict the winner of an ODI game. Instead of the conventional regression approach, the author used an ML technique because too few instances were available to run a regression analysis. After collecting a dataset representing batting, bowling, and team-related features, univariate selection, recursive elimination, and principal component analysis (PCA) were used to select the most influential features for the prediction. An NB model trained on the finalized data achieved a high level of prediction accuracy. In another study, Wickramasinghe (2020b) used NB, kNN, and RF models to predict all-rounders in the game accurately. These models were trained on a dataset representing various aspects of bowling, batting, and fielding performance indicators, and RF turned out to be the best predictor.

Making the third innings declaration is a tricky decision due to the various factors that must be considered. Deval et al. (2021) studied the declaration of the third innings of a Test cricket match. In particular, they constructed a decision support system to predict the result of a Test match at different phases of the match so that the optimal time to declare the innings can be predicted. After gathering a large collection of variables covering the game’s aspects, they constructed three models to predict the game’s outcome before it started, after the second innings, and after the third innings. Though they used several ML techniques such as ANN, Regression, RF, SVM, and XGBoost, SVM appeared to be the most effective one.

Distinguishing the gestures of the umpire in cricket videos is considered a tiresome process. Nandyal and Kattimani (2021) used SVM to classify umpire action and non-action gestures, which is vital in video summarization and automatic sports highlights generation. They developed a system for the identification of umpire action and non-action gestures. In this process, the authors trained an SVM model using two datasets representing the umpire’s performing and non-performing activities relating to specific occasions, such as showing boundaries, illegal bowling, and player dismissals, to classify each action as an Umpire Action or Non-Action.

Fast bowlers are more prone to injuries compared to slow bowlers and batters. Unfortunately, no concrete criteria exist to quantify the maximum tolerated workload of bowlers. McGrath, Neville, Stewart, Clinning et al. (2021) used IMUs and ML models to quantify bowling volume (BV), ball release speed (BRS), and perceived intensity zone (PIZ). Using IMUs, data were collected from bowlers in a cross-sectional study setting. Though they used multiple ML techniques, XGBoost turned out to be the most accurate in predicting the BV, and both RF and XGBoost showed the best performances with BRS.

4. Discussion and conclusion

This study reveals that most cricket-player-related studies relied on a conventional set of features associated with batting and bowling. With the introduction of technology, the game has evolved and gained novel features. Therefore, future research should search for these novel features while continuing to use the frequently used attributes of the game. Moreover, with the introduction of novel rules in cricket, such as fielding restrictions, the number of balls used, the type of balls used, and player substitution, the nature of the current game has deviated from the past. Therefore, if we gather cricket data that spans several decades, there is an issue with the compatibility of the data, which can result in inconsistent outcomes.

Cricket comprises three disciplines: batting, bowling, and fielding, which are equally crucial for the game’s outcome. According to the reviewed articles, the studies have not been distributed evenly among these three departments. Even within studies on bowlers, the focus has been mainly on fast bowlers. It is rare to see any research conducted on medium pacers and spinners. When considering studies related to player performance, one of the factors that could impact performance is the type of cricket ball the game uses. There are several brands of cricket balls, namely, Kookaburra, Dukes, and SG. Unfortunately, the literature does not indicate any significant research work studying the impact of the type of cricket ball on player performance. Furthermore, there is a significant number of studies related to batting, though there are hardly any studies related to fielders. Fielding plays an essential role in the game’s outcome, so the lack of studies on the fielding aspect is surprising.

From an ML point of view, some reviewed studies have not revealed vital information related to the ML process. The nature of the data (size, number of attributes, correlation structure, and sparsity), the data pre-processing technique, the feature selection technique, the accuracy quantifying technique, and the level of accuracy the ML technique reaches are some of the missing details.

The reviewed articles showcased the application of various ML techniques in different research areas in cricket. It is hard to recommend a single ML technique for a given application in cricket, as the success of the ML technique depends on various aspects. Based on these articles, there has been an increase in the use of kNN, Regression, and RF compared to the other ML techniques during the last two decades. Other ML techniques such as neural networks and Naive Bayes do not show increased popularity, while SVM remains popular in cricket applications. One of the significant observations was the lack of use of some popular ML techniques, such as deep learning, reinforcement learning, and natural language processing. Future research is expected to use the latest ML techniques, such as automated machine learning, multi-modal learning, multi-objective models, and tiny ML, to address more sophisticated issues in cricket.

We identify several limitations of this systematic review. First, the study considered only four databases. Furthermore, due to limited resources, this study could not consider another version of cricket, called T-10. This review identified various research activities conducted in cricket using ML and other areas that require researchers’ attention. As cricket is a rapidly changing game, more research studies are required in the future. In this direction, ML will make a significant contribution, which will eventually help to develop the game further.
Declaration of competing interest
Based on the conducted systematic review, it is evident that the
volume of research in cricket using ML technology has been increasing The authors declare that they have no known competing finan-
since 2001. This tendency is a positive sign for the game, as the findings cial interests or personal relationships that could have appeared to
of these research outcomes will eventually help the game’s develop- influence the work reported in this paper.
ment. Improved ML techniques enable the investigation of various
cricket aspects that were inconceivable to conduct before using the Acknowledgments
existing procedures. Furthermore, the amalgamation of ML techniques
and electronic wearable devices in cricket research provides numer- The author would like to thank the reviewers for their insightful
ous benefits. Research findings allow cricketers to deliver accelerated comments and the funding support given by the Prairie View A&M
performances by lowering injury risks. university’s Faculty Enhancement Program (FEP).
I. Wickramasinghe Machine Learning with Applications 10 (2022) 100435
References

Abbas, K., & Haider, S. (2019). Duckworth-Lewis-Stern method comparison with machine learning approach. In 2019 international conference on frontiers of information technology (pp. 197–1975). IEEE.
Aburas, A. A., Mehtab, M., & Mehtab, Y. (2018). Cricket world cup predictions using KNN intelligent bigdata approach. In Proceedings of the 2018 international conference on computing and big data (pp. 18–22).
Acikmese, Y., Ustundag, B. C., & Golubovic, E. (2017). Towards an artificial training expert system for basketball. In 10th international conference on electrical and electronics engineering. Bursa Turkey: IEEE, Nov 30–Dec 2.
Aftab, K. H. A. N., Hussain, S. Q., Waleed, M., Ashfaq, K. H. A. N., & Umair, K. H. A. N. (2019). An automated snick detection and classification scheme as a cricket decision review system. Turkish Journal of Electrical Engineering and Computer Science, 27(6), 4118–4133.
Ahmad, H., Ahmad, S., Asif, M., Rehman, M., Alharbi, A., & Ullah, Z. (2021). Evolution-based performance prediction of star cricketers. CMC-Computers Materials & Continua, 69(1), 1215–1232.
Ahmad, H., Daud, A., Wang, L., Hong, H., Dawood, H., & Yang, Y. (2017). Prediction of rising stars in the game of cricket. IEEE Access, 5, 4104–4124.
Ahmadi, A., Mitchell, E., Richter, C., Destelle, F., Gowing, M., O’Connor, N. E., et al. (2014). Toward automatic activity classification and movement assessment during a sports training session. IEEE Internet of Things Journal, 2(1), 98–103. http://dx.doi.org/10.1109/BSN.2014.29.
Anik, A. I., Yeaser, S., Hossain, A. I., & Chakrabarty, A. (2018). Player’s performance prediction in ODI cricket using machine learning algorithms. In 2018 4th international conference on electrical engineering and information & communication technology (pp. 500–505). IEEE.
Awan, M. J., Gilani, S. A. H., Ramzan, H., Nobanee, H., Yasin, A., Zain, A. M., et al. (2021). Cricket match analytics using the big data approach. Electronics, 10(19), 2350.
Bandulasiri, A., Brown, T., & Wickramasinghe, I. (2016). Characterization of the result of one day format of cricket. Operations Research and Decisions, 26(4), 21–32.
Basit, A., Alvi, M. B., Jaskani, F. H., Alvi, M., Memon, K. H., & Shah, R. A. (2020). ICC T20 cricket world cup 2020 winner prediction using machine learning techniques. In 2020 IEEE 23rd international multitopic conference (pp. 1–6). IEEE.
Deval, G., Hamid, F., & Goel, M. (2021). When to declare the third innings of a test cricket match? Annals of Operations Research, 303(1), 81–99.
Devonport, T. (2015). Understanding stress and coping among competitive athletes in sport. Sport Exercise Psychology, 127.
Dias, P., Mitchell, S. R., & Harland, A. R. (2020). Novel experimental protocol to capture movement data and predict shot execution in cricket batting. Multidisciplinary Digital Publishing Institute Proceedings, 49(1), 41.
Dubey, P. K., Suri, H., & Gupta, S. (2021). Naïve Bayes algorithm based match winner prediction model for T20 cricket. In Intelligent computing and applications (pp. 435–446). Singapore: Springer.
Goggins, L., Warren, A., Osguthorpe, D., Peirce, N., Wedatilake, T., McKay, C., et al. (2021). Detecting injury risk factors with algorithmic models in elite women’s pathway cricket. International Journal of Sports Medicine.
Grues, J. (2015). Data science from scratch: first principles with python. O’Reilly.
Gupta, A., & Muthiah, S. B. (2020). Viewpoint constrained and unconstrained cricket stroke localization from untrimmed videos. Image and Vision Computing, 100, Article 103944.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
Hasanika, D., Dilhara, R., Liyanage, D., Bandaranayake, A., & Deegalla, S. (2021). Data mining system for predicting a winning cricket team. In 2021 IEEE 16th international conference on industrial and information systems (pp. 92–97). IEEE.
Hatharasinghe, M. M., & Poravi, G. (2019). Data mining and machine learning in cricket match outcome prediction: missing links. In 2019 IEEE 5th international conference for convergence in technology (pp. 1–4). IEEE.
Horvat, T., & Job, J. (2019). Importance of the training dataset length in basketball game outcome prediction by using naïve classification machine learning methods. Elektrotehniški Vestnik, 86, 197–202.
Ishi, M. S., & Patil, J. B. (2020). A study on impact of team composition and optimal parameters required to predict result of cricket match. In Social networking and computational intelligence (pp. 389–399). Singapore: Springer.
Iyer, G. N., Bala, V. S., Sohan, B., Dharmesh, R., & Raman, V. (2020). Automated third umpire decision making in cricket using machine learning techniques. In 2020 4th international conference on intelligent computing and control systems (pp. 1216–1221). IEEE.
Jamil, F., Iqbal, N., Ahmad, S., & Kim, D. H. (2020). Toward accurate position estimation using learning to prediction algorithm in indoor navigation. Sensors (Switzerland), 20(16), 1–27. http://dx.doi.org/10.3390/s20164410.
Jayalath, K. P. (2018). A machine learning approach to analyze ODI cricket predictors. Journal of Sports Analytics, 4(1), 73–84.
Jowitt, H. K., Durussel, J., Brandon, R., & King, M. (2020). Auto detecting deliveries in elite cricket fast bowlers using microsensors and machine learning. Journal of Sports Sciences, 38(7), 767–772.
Kaluarachchi, A., & Aparna, S. V. (2010). Cricai: A classification based tool to predict the outcome in ODI cricket. In 2010 fifth international conference on information and automation for sustainability (pp. 250–255). IEEE.
Kamble, R. R. (2021). Cricket score prediction using machine learning. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 12(1S), 23–28.
Kampakis, S., & Thomas, W. (2015). Using machine learning to predict the outcome of English county twenty over cricket matches. arXiv preprint arXiv:1511.05837.
Kanhaiya, K., Gupta, R., & Sharma, A. K. (2019). Cracked cricket pitch analysis (CCPA) using image processing and machine learning. Global Journal on Application of Data Science and Internet of Things [ISSN: 2581-4370 (Online)], 3(1).
Kapadia, K., Abdel-Jaber, H., Thabtah, F., & Hadi, W. (2020). Sport analytics for cricket game results using machine learning: An experimental study. Applied Computing and Informatics.
Karlis, D., & Ntzoufras, I. (2003). Analysis of sports data by using bivariate Poisson models. Journal of the Royal Statistical Society: Series D (the Statistician), 52(3), 381–393.
Karthik, K., Krishnan, G. S., Shetty, S., Bankapur, S. S., Kolkar, R. P., Ashwin, T. S., et al. (2021). Analysis and prediction of fantasy cricket contest winners using machine learning techniques. In Evolution in computational intelligence (pp. 443–453). Singapore: Springer.
Koseler, K., & Stephan, M. (2017). Towards the realization of a DSML for machine learning: A baseball analytics use case. In International summer school on domain-specific modeling theory and practice. https://sc.lib.miamioh.edu/handle/2374.MIA/6224.
Kumar, J., Kumar, R., & Kumar, P. (2018). Outcome prediction of ODI cricket matches using decision trees and MLP networks. In 2018 first international conference on secure cyber computing and communication (pp. 343–347). IEEE.
Kumar, R., Santhadevi, D., & Barnabas, J. (2019). Outcome classification in cricket using deep learning. In 2019 IEEE international conference on cloud computing in emerging markets (pp. 55–58). IEEE.
Mahbub, M. K., Miah, M. A. M., Islam, S. M. S., Sorna, S., Hossain, S., & Biswas, M. (2021). Best eleven forecast for Bangladesh cricket team with machine learning techniques. In 2021 5th international conference on electrical engineering and information & communication technology (pp. 1–6). IEEE.
Manage, A. B., Kafle, R. C., & Wijekularathna, D. K. (2020). Classification of all-rounders in limited over cricket - a machine learning approach. Journal of Sports Analytics, 6(4), 295–306.
McGrath, J., Neville, J., Stewart, T., Clinning, H., & Cronin, J. (2021). Can an inertial measurement unit (IMU) in combination with machine learning measure fast bowling speed and perceived intensity in cricket? Journal of Sports Sciences, 39(12), 1402–1409.
McGrath, J. W., Neville, J., Stewart, T., Clinning, H., Thomas, B., & Cronin, J. (2021). Quantifying cricket fast bowling volume, speed and perceived intensity zone using an apple watch and machine learning. Journal of Sports Sciences, 1–8.
McGrath, J. W., Neville, J., Stewart, T., & Cronin, J. (2019). Cricket fast bowling detection in a training setting using an inertial measurement unit and machine learning. Journal of Sports Sciences, 37(11), 1220–1226.
Modani, N., Kilaru, M., Kaur, A., Sinha, R., & Khetan, H. (2020). Predicting outcomes in limited-overs cricket matches. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (pp. 65–72).
Mody, K., Malathi, D., & Jayaseeli, J. D. (2021). An artificial neural network approach for classifying cricket batsman’s performance by adam optimizer and prediction by derived attributes. In 2021 smart technologies, communication and robotics (pp. 1–7). IEEE.
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Annals of Internal Medicine, 151(4), 264–269.
Morgulev, E., Azar, O. H., & Lidor, R. (2018). Sports analytics and the big-data era. International Journal of Data Science and Analytics, 5, 213–222. http://dx.doi.org/10.1007/s41060-017-0093-7.
Morris, C., Mundt, M., Goldacre, M., & Jacqueline, A. (2020). Predict ground reaction forces from 2D video. https://twitter.com/JacquelineUWA/status/1327971555029073924.
Mustafa, R. U., Nawaz, M. S., Lali, M. I. U., Zia, T., & Mehmood, W. (2017). Predicting the cricket match outcome using crowd opinions on social networks: A comparative study of machine learning methods. Malaysian Journal of Computer Science, 30(1), 63–76.
Naglah, A., Khalifa, F., Mahmoud, A., Ghazal, M., Jones, P., Murray, T., et al. (2018). Athlete-customized injury prediction using training load statistical records and machine learning. In IEEE international symposium on signal processing and information technology (pp. 459–464). Louisville, KY, USA.
Nandyal, S., & Kattimani, S. L. (2021). Umpire gesture detection and recognition using HOG and non-linear support vector machine (NL-SVM) classification of deep features in cricket videos. Journal of Physics: Conference Series, 2070(1), Article 012148.
Panda, S. K., Sathya, A. R., Mishra, M., & Satpathy, S. (2019). A supervised learning algorithm to forecast weather conditions for playing cricket. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 9(1).
Parameswaran, K. (2013). Vector quantization, density estimation and outlier detection on cricket dataset. In 2013 international conference on computer communication and informatics (pp. 1–5). IEEE.
Pathak, N., & Wadhwa, H. (2016). Applications of modern classification techniques to predict the outcome of ODI cricket. Procedia Computer Science, 87, 55–60.
Premkumar, P., Chakrabarty, J. B., & Chowdhury, S. (2020). Key performance indicators for factor score based ranking in one day international cricket. IIMB Management Review, 32(1), 85–95.
Rahman, M. M., Shamim, M. O. F., & Ismail, S. (2018). An analysis of Bangladesh one day international cricket data: a machine learning approach. In 2018 international conference on innovations in science, engineering and technology (pp. 190–194). IEEE.
Raja, M. A. M., Manasa, V. V. L., Reddy, D. S. N., & Sundari, K. S. (2021). Applying data science for cricket predictions. Annals of the Romanian Society for Cell Biology, 185, 3–1863.
Ranaweera, J., & Silva, P. (2019). Analysis of sensor locations on human body for wearable sensor based activity classification during fast bowling in cricket. In IcSPORTS (pp. 21–31).
Rani, P. J., Kamath, A. V., Menon, A., Dhatwalia, P., Rishabh, D., & Kulkarni, A. (2020). Selection of players and team for an Indian premier league cricket match using ensembles of classifiers. In 2020 IEEE international conference on electronics, computing and communication technologies (pp. 1–6). IEEE.
Rommers, N., Rössler, R., Verhagen, E., Vandecasteele, F., Verstockt, S., Vaeyens, R., et al. (2020). A machine learning approach to assess injury risk in elite youth football players. Medicine & Science in Sports & Exercise, 52(12), 1745–1751. http://dx.doi.org/10.1249/mss.0000000000002305.
Rupai, A. A. A., Mukta, M. S. H., & Islam, A. N. (2020). Predicting bowling performance in cricket from publicly available data. In Proceedings of the international conference on computing advancements (pp. 1–6).
Salman, M., Qaisar, S., & Qamar, A. M. (2017). Classification and legality analysis of bowling action in the game of cricket. Data Mining and Knowledge Discovery, 31(6), 1706–1734.
Samaraweera, W. J., Premaratne, S. C., & Dharmaratne, A. T. (2020). Deep learning for classification of cricket umpire postures. In International conference on neural information processing (pp. 563–570). Cham: Springer.
Sarlis, V., & Tjortjis, C. (2020). Sports analytics–evaluation of basketball players and team performance. Information Systems, 93, Article 101562.
Sen, A., Deb, K., Dhar, P. K., & Koshiba, T. (2021). CricShotClassify: An approach to classifying batting shots from cricket videos using a convolutional neural network and gated recurrent unit. Sensors, 21(8), 2846.
Shahjalal, M. A., Ahmad, Z., Rayan, R., & Alam, L. (2017). An approach to automate the scorecard in cricket with computer vision and machine learning. In 2017 3rd international conference on electrical information and communication technology (pp. 1–6). IEEE.
Shakil, F. A., Abdullah, A. H., Momen, S., & Mohammed, N. (2020). Predicting the result of a cricket match by applying data mining techniques. In Proceedings of the computational methods in systems and software (pp. 758–770). Cham: Springer.
Singh, T., Singla, V., & Bhatia, P. (2015). Score and winning prediction in cricket through data mining. In 2015 international conference on soft computing techniques and implementations (pp. 60–66). IEEE.
Somaskandhan, P., Wijesinghe, G., Wijegunawardana, L. B., Bandaranayake, A., & Deegalla, S. (2017). Identifying the optimal set of attributes that impose high impact on the end results of a cricket match using machine learning. In 2017 IEEE international conference on industrial and information systems (pp. 1–6). IEEE.
Srinivas, S., Bhat, N. N., & Revanasiddappa, M. (2021). Data analysis of cricket score prediction. In Proceedings of international conference on recent trends in machine learning, iot, smart cities and applications (pp. 465–472). Singapore: Springer.
Tekade, P., Markad, K., Amage, A., & Natekar, B. (2020). Cricket match outcome prediction using machine learning. International Journal of Advance Scientific Research and Engineering Trends, 5(7).
Tyagi, S., Kumari, R., Makkena, S. C., Mishra, S. S., & Pendyala, V. S. (2020). Enhanced predictive modeling of cricket game duration using multiple machine learning algorithms. In 2020 international conference on data science and engineering (pp. 1–9). IEEE.
Vistro, D. M., Rasheed, F., & David, L. G. (2019). The cricket winner prediction with application of machine learning and data analytics. International Journal of Scientific & Technology Research, 8(09).
Weir, G., Alderson, J., Smailes, N., Elliott, B., & Donnelly, C. (2019). A reliable video-based ACL injury screening tool for female team sport athletes. International Journal of Sports Medicine, 40(3), 191–199. http://dx.doi.org/10.1055/a-0756-9659.
Wickramasinghe, I. P. (2014a). Predicting the performance of batsmen in test cricket. Journal of Human Sport & Exercise, 9(4), 744–751.
Wickramasinghe, I. (2014b). Bowlers’ performances in 2013 champions trophy. Annals of Applied Sport Science, 2(1), 1–10.
Wickramasinghe, I. (2020a). Naive Bayes approach to predict the winner of an ODI cricket game. Journal of Sports Analytics, 6(2), 75–84.
Wickramasinghe, I. (2020b). Classification of all-rounders in the game of ODI cricket: Machine learning approach. Athens Journal of Sports, 7(1), 21–34.
Wickramasinghe, A. N., & Yapa, R. D. (2018). Cricket match outcome prediction using tweets and prediction of the man of the match using social network analysis: Case study using IPL data. In 2018 18th international conference on advances in ICT for emerging regions (p. 1). IEEE.
Xu, T., & Tang, L. (2021). Adoption of machine learning algorithm-based intelligent basketball training robot in athlete injury prevention. Frontiers in Neurorobotics, 14, Article 620378. http://dx.doi.org/10.3389/fnbot.2020.620378.
Zakzouk, T. S., & Mathkour, H. I. (2012). Comparing text classifiers for sports news. Procedia Technology, 1, 474–480.