Comprehensive Data Analysis and Prediction On IPL Using Machine Learning Algorithms Valarmathi B 2113j1
Amala Kaviya et al., International Journal on Emerging Technologies 11(3): 218-228(2020) 221
(ii) Batsman MVPI ranking: This ranking improves on the batting average because it takes both the batting average and the batting strike rate into account. It therefore gives a better measure for ranking batsmen in limited-overs IPL T20 cricket [20].
$$\mathrm{MVPI} = \left(\frac{MR}{TMR} + \frac{MSR}{TMSR}\right) \times TR$$
where
– MR is the batting average of the batsman
– TMR is the mean batting average of all batsmen in the IPL
– MSR is the mean strike rate of the batsman
– TMSR is the mean strike rate of all batsmen in the IPL
– TR is the total runs scored by the batsman
(iii) Batsman PRI (Player Ranking Index): This is the best of the batting rankings and an improvement over the MVPI ranking. It takes into account five parameters, the ones that matter most in T20 cricket: how hard a batsman hits the ball (being good at hitting 4s and 6s), his ability to stay not out, his ability not to waste deliveries, consistent performance and, finally, his running between the wickets. Using these measures as predictors and MVPI as the outcome, we train a random forest model, generate the PRI score and rank the batsmen.
The five parameters for batsmen are:
– Hard-Hitter $= \dfrac{4 \times \text{Fours} + 6 \times \text{Sixes}}{\text{Balls faced}}$
– Finisher $= \dfrac{\text{Innings not out}}{\text{Innings played}}$
– Fast-Scorer $=$ batting strike rate
– Consistent $=$ batting average
– Running-Between-Wickets (RBW) $= \dfrac{\text{Runs scored} - (4 \times \text{Fours} + 6 \times \text{Sixes})}{\text{Balls faced without a boundary}}$
(iv) Bowling average ranking: Here the bowlers are ranked in descending order of their bowling average, which is defined here as wickets per match:
$$\mathrm{BOA} = \frac{TW}{TM}$$
where
– TW is the total wickets taken by the bowler
– TM is the total matches played by the bowler
(v) Bowling MVPI ranking: This ranking improves on the bowling average because it takes both the bowling average and the bowling economy rate into account. It therefore gives a better measure for ranking bowlers in limited-overs IPL T20 cricket [20].
$$\mathrm{MVPI} = \left(\frac{MW}{TMW} + \frac{TMER}{MER}\right) \times TW$$
where
– MW is the mean wickets taken by the bowler
– TMW is the mean wickets taken by all bowlers in the tournament
– TMER is the mean economy rate of all bowlers in the tournament
– MER is the mean economy rate of the bowler
– TW is the total wickets taken by the bowler
The economy-rate ratio is inverted (TMER/MER) so that a bowler who concedes fewer runs per over receives a higher index.
(vi) Bowling PRI (Player Ranking Index): This is the best of the bowling rankings and an improvement over the MVPI ranking. It takes into account five parameters, the ones that matter most in T20 cricket for bowlers: economy, wicket taking, consistency, big-wicket taking and short performance. Using these measures as predictors and MVPI as the outcome, we train a random forest model, generate the PRI score and rank the bowlers.
The five parameters for bowlers are:
– Economy $= \dfrac{\text{Runs conceded}}{\text{Balls bowled} / 6}$
– Wicket-Taker $= \dfrac{\text{Balls bowled}}{\text{Wickets taken}}$
– Consistent $= \dfrac{\text{Runs conceded}}{\text{Wickets taken}}$
– Big-Wicket-Taker $= \dfrac{\text{Number of four-, five- or six-wicket hauls}}{\text{Innings played}}$
– Short-Performance $= \dfrac{\text{Wickets} - 4 \times \text{4W hauls} - 5 \times \text{5W hauls} - 6 \times \text{6W hauls}}{\text{Innings played} - (\text{4W} + \text{5W} + \text{6W hauls})}$
H. Prediction
For prediction we use the PRIs generated in the previous sections, which are computed separately for batsmen and bowlers. Every player who has played in the IPL has a PRI; a player with no batting or bowling record is assigned the last rank. The basic idea behind the predictions is the rank difference between the playing 11s of the two rival teams.
Predictions are made on two data sets:
– Training data: IPL Season 1 to Season 8. Tested on Season 9.
– Training data: IPL Season 1 to Season 9. Tested on Season 10.
The first set is used to show the significant improvement over existing models. The second is used to predict the matches of the most recent IPL.
The steps of the predictive model used with the second training set are as follows:
– For a particular match, find the batting PRI and the bowling PRI of every player of both teams.
– Separately for batting and bowling, compute the differences between the PRI ranks of corresponding players of the two teams.
– This gives 22 columns per match (11 batting and 11 bowling PRI differences), to which we add a 23rd column containing the match result: 1 if team 1 wins and 2 if team 2 wins.
– Various models, discussed in the following sections, are then trained on this data set to obtain the prediction model.
To predict a match's outcome, we generate the same 22 features for that match and predict whether team 1 or team 2 will win. The test set consists of these 22 features for 58 matches of Season 10.
The algorithms used for training include support vector machine (SVM), sequential minimal optimization (SMO), instance-based learning with parameter k (IBk), random forest, the JRip rule learner (reduced-error pruning), the J48 decision tree, flexible discriminant analysis, mixture discriminant analysis, the C5.0 decision tree and the naïve Bayes classifier.
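The batting measures in (ii) and (iii) are ratios over a batsman's career totals. The following is a minimal Python sketch, not the authors' implementation: `BattingRecord` and all function names are hypothetical, MSR is approximated by the career strike rate, "balls faced without a boundary" is assumed to be balls faced minus fours and sixes, and a batsman who has never been dismissed is given his total runs as his average. It computes the batsman MVPI and the five batting PRI parameters; the random forest step that turns those parameters into a PRI score is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class BattingRecord:
    # Hypothetical career totals for one batsman (field names are illustrative).
    runs: int
    balls: int
    fours: int
    sixes: int
    innings: int
    not_outs: int


def batting_average(r: BattingRecord) -> float:
    # Runs per dismissal; assumption: a never-dismissed batsman's average is his runs.
    dismissals = r.innings - r.not_outs
    return r.runs / dismissals if dismissals else float(r.runs)


def strike_rate(r: BattingRecord) -> float:
    # Runs per 100 balls faced.
    return 100.0 * r.runs / r.balls if r.balls else 0.0


def batting_mvpi(r: BattingRecord, tmr: float, tmsr: float) -> float:
    # MVPI = (MR/TMR + MSR/TMSR) * TR, with MSR approximated by the career strike rate.
    return (batting_average(r) / tmr + strike_rate(r) / tmsr) * r.runs


def batting_pri_parameters(r: BattingRecord) -> dict:
    boundary_runs = 4 * r.fours + 6 * r.sixes
    # Assumption: balls faced without a boundary = balls faced minus boundary balls.
    non_boundary_balls = r.balls - r.fours - r.sixes
    return {
        "hard_hitter": boundary_runs / r.balls if r.balls else 0.0,
        "finisher": r.not_outs / r.innings if r.innings else 0.0,
        "fast_scorer": strike_rate(r),
        "consistent": batting_average(r),
        "rbw": (r.runs - boundary_runs) / non_boundary_balls if non_boundary_balls else 0.0,
    }


def rank_descending(scores: dict) -> dict:
    # Rank 1 goes to the highest score, e.g. the highest MVPI.
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Usage with made-up numbers (not data from the paper).
example = BattingRecord(runs=500, balls=400, fours=40, sixes=20, innings=14, not_outs=4)
mvpi = batting_mvpi(example, tmr=25.0, tmsr=125.0)  # (50/25 + 125/125) * 500
params = batting_pri_parameters(example)
```

Passing the league means TMR and TMSR as arguments keeps each batsman's calculation independent of how the league-wide averages are computed.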
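The bowling measures in (iv), (v) and (vi) can also be computed directly from a bowler's career totals. This is a minimal Python sketch under stated assumptions, not the authors' code: `BowlingRecord` and the function names are hypothetical, MW is taken to be the bowler's wickets per match, and the Short-Performance parameter is read as wickets taken outside four-, five- and six-wicket hauls divided by innings played outside those hauls (an interpretation assumed here). Economy, Wicket-Taker and Consistent are "lower is better" measures, unlike the other two. The random forest step that produces the bowling PRI is omitted.

```python
from dataclasses import dataclass


@dataclass
class BowlingRecord:
    # Hypothetical career totals for one bowler (field names are illustrative).
    # Assumes at least one match played and one ball bowled.
    matches: int
    innings: int
    balls: int
    runs_conceded: int
    wickets: int
    four_hauls: int  # innings with exactly four wickets
    five_hauls: int  # innings with exactly five wickets
    six_hauls: int   # innings with exactly six wickets


def boa(b: BowlingRecord) -> float:
    # BOA = TW / TM (wickets per match); bowlers are ranked in descending order.
    return b.wickets / b.matches


def economy(b: BowlingRecord) -> float:
    # Runs conceded per six-ball over.
    return b.runs_conceded / (b.balls / 6)


def bowling_mvpi(b: BowlingRecord, tmw: float, tmer: float) -> float:
    # MVPI = (MW/TMW + TMER/MER) * TW, taking MW as the bowler's wickets per match.
    return (boa(b) / tmw + tmer / economy(b)) * b.wickets


def bowling_pri_parameters(b: BowlingRecord) -> dict:
    hauls = b.four_hauls + b.five_hauls + b.six_hauls
    wickets_outside_hauls = b.wickets - 4 * b.four_hauls - 5 * b.five_hauls - 6 * b.six_hauls
    innings_outside_hauls = b.innings - hauls
    no_wickets = float("inf")  # balls or runs per wicket are unbounded with zero wickets
    return {
        "economy": economy(b),  # lower is better
        "wicket_taker": b.balls / b.wickets if b.wickets else no_wickets,  # lower is better
        "consistent": b.runs_conceded / b.wickets if b.wickets else no_wickets,  # lower is better
        "big_wicket_taker": hauls / b.innings,
        "short_performance": (wickets_outside_hauls / innings_outside_hauls
                              if innings_outside_hauls else 0.0),
    }


# Usage with made-up numbers (not data from the paper).
example = BowlingRecord(matches=20, innings=20, balls=480, runs_conceded=560,
                        wickets=30, four_hauls=2, five_hauls=1, six_hauls=0)
params = bowling_pri_parameters(example)
```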
IV. RESULTS AND DISCUSSION
A. Website Interface
G. Ranking
In Table 1, batsmen are ranked by their batting average. Chris Gayle is ranked first with a batting average of 36.25. Only the top 10 batsmen are shown.
In Table 2, batsmen are ranked by the MVPI (Most Valuable Player Index) formula. David Warner is ranked first and Virat Kohli second. Only the top 10 batsmen are shown.
In Table 3, batsmen are ranked by the PRI (Player Ranking Index) formula. Here too, David Warner and Virat Kohli are ranked 1 and 2 respectively. Only the top 10 batsmen are shown.
In Table 4, bowlers are ranked by their mean wickets. SL Malinga tops this table with an average of 1.56. Only the top 10 bowlers are shown.
In Table 5, bowlers are ranked by the MVPI (Most Valuable Player Index) formula. Here too, SL Malinga is ranked first. Only the top 10 bowlers are shown.
In Table 6, bowlers are ranked by the PRI (Player Ranking Index) formula. SL Malinga again retains rank 1. Only the top 10 bowlers are shown.
H. Prediction
The feature table of batsman and bowler rank differences used for prediction is given in Table 7. Bat1 to Bat11 are the batting rank differences, Bowl1 to Bowl11 are the bowling rank differences, and W stands for the winner. The training set contains approximately 550 rows. The first 22 rows of predicted outcomes for IPL Season 10 are shown as a sample in Table 7.
Table 7: Feature table for prediction.
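Each row of the feature table compares the two playing 11s. Below is a minimal Python sketch of building one such row from player-to-rank dictionaries (rank 1 = best); the function names, player names and ranks are hypothetical. Two details the text leaves open are filled in as assumptions: a player without a record gets rank len(ranking) + 1 (below every ranked player), and "corresponding" players are paired by sorted rank rather than by batting order.

```python
def rank_of(ranking: dict, player: str) -> int:
    # Assumption: a player with no record is placed just below the lowest-ranked player.
    return ranking.get(player, len(ranking) + 1)


def feature_row(team1, team2, bat_rank, bowl_rank, winner=None):
    """Build one Table 7 style row: Bat1..Bat11, Bowl1..Bowl11 and optionally W.

    Assumption: players are paired by sorted rank (best vs best, and so on),
    because the pairing rule for "corresponding players" is not spelled out.
    """
    if len(team1) != 11 or len(team2) != 11:
        raise ValueError("each playing 11 must list exactly 11 players")
    row = {}
    for prefix, ranking in (("Bat", bat_rank), ("Bowl", bowl_rank)):
        ranks1 = sorted(rank_of(ranking, p) for p in team1)
        ranks2 = sorted(rank_of(ranking, p) for p in team2)
        for i, (r1, r2) in enumerate(zip(ranks1, ranks2), start=1):
            row[f"{prefix}{i}"] = r1 - r2  # negative: team 1's player ranks higher
    if winner is not None:
        if winner not in (1, 2):
            raise ValueError("W is 1 if team 1 wins and 2 if team 2 wins")
        row["W"] = winner
    return row


# Hypothetical players and ranks, for illustration only.
team_a = [f"A{i}" for i in range(1, 12)]
team_b = [f"B{i}" for i in range(1, 12)]
bat_rank = {**{f"A{i}": i for i in range(1, 12)}, **{f"B{i}": 11 + i for i in range(1, 12)}}
bowl_rank = {f"B{i}": i for i in range(1, 12)}  # team A players have no bowling records
row = feature_row(team_a, team_b, bat_rank, bowl_rank, winner=1)
```

A training row carries the W column; a row for a match still to be predicted is built the same way with `winner` left as `None`, leaving the 22 features only.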
Table 8 shows the predictions for IPL Season 10 made by the JRip algorithm, which was found to perform far better than the others. The table displays the predictions for 10 matches (sample output). DMP (Deep Mayo Predictor) is the existing model proposed by Prakash et al. [21], and CDAI (Comprehensive Data Analysis on IPL) is the proposed system.
Our first set of predictions targets the Season 9 IPL results, for which a similar attempt was made by Prakash et al. [21]; they built their training set from international T20 and IPL Season 1-8 data. Their model predicted 39 out of 58 matches correctly, with an accuracy of 69.68%. In contrast, our model, built only with the IPL Season 1-8 dataset, correctly predicted 47 out of 58 match outcomes. The accuracy of the two models is compared below.
Fig. 16 compares the accuracy of the predictions made by the existing system and by the proposed system, using the SVM algorithm for our first set of predictions. The existing system has an accuracy of 69.64%, in contrast to 81.03% for the proposed system.
In our second set of predictions, we predicted Season 10 using various algorithms. This is the first attempt for IPL 10. In this case the model is built using the IPL Season 1 to Season 9 dataset, and its results are compared with the actual IPL 10 match outcomes. Fig. 17 shows the accuracy of each algorithm in predicting the Season 10 match results.
Table 8: Prediction.
| Match | Winner | Prediction | Result |
| --- | --- | --- | --- |
| Delhi Daredevils-Gujarat Lions-2017-05-04.RData | Delhi Daredevils | Delhi Daredevils | TRUE |
| Delhi Daredevils-Kings XI Punjab-2017-04-15.RData | Delhi Daredevils | Delhi Daredevils | TRUE |
| Delhi Daredevils-Kolkata Knight Riders-2017-04-17.RData | Kolkata Knight Riders | Kolkata Knight Riders | TRUE |
| Delhi Daredevils-Mumbai Indians-2017-05-06.RData | Mumbai Indians | Mumbai Indians | TRUE |
| Delhi Daredevils-Rising Pune Supergiants-2017-05-12.RData | Delhi Daredevils | Delhi Daredevils | TRUE |
| Delhi Daredevils-Royal Challengers Bangalore-2017-05-14.RData | Royal Challengers Bangalore | Delhi Daredevils | FALSE |
| Delhi Daredevils-Sunrisers Hyderabad-2017-05-02.RData | Delhi Daredevils | Sunrisers Hyderabad | FALSE |
| Gujarat Lions-Delhi Daredevils-2017-05-10.RData | Delhi Daredevils | Gujarat Lions | FALSE |
| Gujarat Lions-Kings XI Punjab-2017-04-23.RData | Kings XI Punjab | Kings XI Punjab | TRUE |
| Gujarat Lions-Kolkata Knight Riders-2017-04-07.RData | Kolkata Knight Riders | Kolkata Knight Riders | TRUE |
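The Result column of Table 8 is an equality check between the actual and the predicted winner, and accuracy is the percentage of TRUE entries. The small Python sketch below, with hypothetical names, re-derives that column for the ten sample rows transcribed from Table 8 and applies the same percentage to the 47 correct predictions out of 58 reported for the first set of predictions. The 70% obtained for the ten rows describes only this sample, not the full Season 10 test set.

```python
# (actual winner, predicted winner) for the ten sample rows of Table 8.
TABLE_8_SAMPLE = [
    ("Delhi Daredevils", "Delhi Daredevils"),
    ("Delhi Daredevils", "Delhi Daredevils"),
    ("Kolkata Knight Riders", "Kolkata Knight Riders"),
    ("Mumbai Indians", "Mumbai Indians"),
    ("Delhi Daredevils", "Delhi Daredevils"),
    ("Royal Challengers Bangalore", "Delhi Daredevils"),
    ("Delhi Daredevils", "Sunrisers Hyderabad"),
    ("Delhi Daredevils", "Gujarat Lions"),
    ("Kings XI Punjab", "Kings XI Punjab"),
    ("Kolkata Knight Riders", "Kolkata Knight Riders"),
]


def result_column(rows):
    # TRUE in Table 8 means the predicted winner matches the actual winner.
    return [actual == predicted for actual, predicted in rows]


def accuracy_pct(correct: int, total: int) -> float:
    # Percentage of correct predictions, rounded to two decimals as in the text.
    return round(100.0 * correct / total, 2)


results = result_column(TABLE_8_SAMPLE)
sample_accuracy = accuracy_pct(sum(results), len(results))  # 7 of 10 sample rows
first_set_accuracy = accuracy_pct(47, 58)  # first set of predictions (Season 9)
```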
V. CONCLUSION
The approach analyses and visualizes many aspects of IPL matches and presents useful results to the user. This information could help team owners (who buy players for their teams in the auction every year), captains and coaches in selecting the right playing 11, bettors in choosing the right team to back, and anyone curious about the IPL and its statistics.
VI. FUTURE SCOPE
With minor changes, the model could also be made to work with ODI and Test matches. International matches can be analysed in a similar way, and more visualizations can be added. The system could also be extended to accept more data file formats, allowing analysis of more varied data.
Conflict of Interest. There is no conflict of interest involving the content of this paper.
REFERENCES
[1]. Clarke, S. R. (1988). Dynamic programming in one day cricket - optimal scoring rates. Journal of the Operational Research Society, 50, 536-545.
[2]. Kimber, A. C., & Hansford, A. R. (1993). A Statistical Analysis of Batting in Cricket. Journal of the Royal Statistical Society, 156, 443-455.
[3]. Damodaran, U. (2006). Stochastic Dominance and Analysis of ODI Batting Performance: The Indian Cricket Team, 1989-2005. Journal of Sports Science and Medicine, 5, 503-508.
[4]. Barr, G. D. I., & Kantor, B. S. A Criterion for Comparing and Selecting Batsmen in Limited Overs Cricket. Journal of the Operational Research Society, 55, 1266-1274.
[5]. Borooah, V. K., & Mangan, J. E. (2010). The Bradman Class: An Exploration of Some Issues in the Evaluation of Batsmen for Test Matches 1877–2006. Journal of Quantitative Analysis in Sports, 6(3), 14-22.
[6]. Norman, J., & Clarke, S. R. (2004). Dynamic programming in cricket: Batting on sticky wicket. Proceedings of the 7th Australasian Conference on Mathematics and Computers in Sport, 226-232.
[7]. Ovens, M., & Bukeit, B. (2006). A mathematical modeling approach to one day cricket batting orders. Journal of Sports Science and Medicine, 5, 495-502.
[8]. Lewis, A. (2008). Extending the Range of Player-Performance Measures in One-Day Cricket. Journal of the Operational Research Society, 59, 729-742.
[9]. Parker, D., Burns, P., & Natarajan, H. (2008). Player valuations in the Indian Premier League. Frontier Economics Journal, 68, 68-76.
[10]. Lenten, L. J., Geerling, W., & Kónya, L. (2012). A hedonic model of player wage determination from the Indian Premier League auction: Further evidence. Sport Management Review, 15(1), 60-71.
[11]. Rastogi, S. K., & Deodhar, S. Y. (2009). Player pricing and valuation of cricketing attributes: exploring the IPL Twenty20 vision. Vikalpa, 34(2), 15-23.
[12]. Singh, S. (2011). Measuring the Performance of Teams in the Indian Premier League. American Journal of Operations Research, 1, 180-184.
[13]. Van Staden, P. (2009). Comparison of Cricketers' Bowling and Batting Performance using Graphical Displays. Current Science, 96, 764-766.
[14]. Lakkaraju, P., & Sethi, S. (2012). Correlating the Analysis of Opinionated Texts Using SAS® Text Analytics with Application of Sabermetrics to Cricket Statistics. Proceedings of SAS Global Forum, 1-10.
[15]. Lemmer, H. (2004). A Measure for the Batting Performance of Cricket Players. South African Journal for Research in Sport, Physical Education and Recreation, 26, 55-64.
[16]. Lemmer, H. (2008). An Analysis of Players' Performances in the First Cricket Twenty20 World Cup Series. South African Journal for Research in Sport, Physical Education and Recreation, 30, 71-77.
[17]. Lemmer, H. (2012). The Single Match Approach to Strike Rate Adjustments in Batting Performance Measures in Cricket. Journal of Sports Science and Medicine, 10, 630-634.
[18]. Saikia, H., & Bhattacharjee, D. (2011). A Bayesian Classification Model for Predicting the Performance of All-Rounders in the Indian Premier League. Vikalpa, 36(4), 51-66.
[19]. Khandelwal, M., Prakash, J., & Pradhan, T. (2015). An Analysis of Best Player Selection Key Performance Indicator: The Case of Indian Premier League (IPL). Advances in Intelligent Systems Technologies and Applications, 173-190.
[20]. http://www.rediff.com/
[21]. Prakash, C. D., Patvardhan, C., & Lakshmi, C. V. (2016). Data Analytics based Deep Mayo Predictor for IPL-9. International Journal of Computer Applications, 152(6), 6-10.
[22]. Nimmagadda, A., Kalyan, N. V., Venkatesh, M., Teja, N. N. S., & Raju, C. G. (2018). Cricket score and winning prediction using data mining. Int. J. Adv. Res. Development, 3(3), 299-302.
[23]. Kapadia, K., Abdel-Jaber, H., Thabtah, F., & Hadi, W. (2019). Sport analytics for cricket game results using machine learning: An experimental study. Applied Computing and Informatics, 1-6.
[24]. Rupai, A. A. A., Mukta, M., & Islam, A. K. M. N. (2020). Predicting Bowling Performance in Cricket from Publicly Available Data. International Conference on Computing Advancements, 1-6.
How to cite this article: Amala Kaviya, V. S., Mishra, A. S. and Valarmathi, B. (2020). Comprehensive Data Analysis and Prediction on IPL using Machine Learning Algorithms. International Journal on Emerging Technologies, 11(3): 218–228.