International Journal on Emerging Technologies 11(3): 218-228(2020)


ISSN No. (Print): 0975-8364
ISSN No. (Online): 2249-3255

Comprehensive Data Analysis and Prediction on IPL using Machine Learning Algorithms
Amala Kaviya V.S.1, Amol Suraj Mishra2 and Valarmathi B.3
1 Member of Technical Staff - Grade 2, VMware India Pvt. Ltd., Bangalore (Karnataka), India.
2 Member of Technical Staff - Grade 2, NetApp, Bangalore (Karnataka), India.
3 Associate Professor, Department of Software and Systems Engineering, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore (Tamilnadu), India.
(Corresponding author: Valarmathi B.)
(Received 28 January 2020, Revised 01 April 2020, Accepted 03 April 2020)
(Published by Research Trend, Website: www.researchtrend.net)
ABSTRACT: This paper presents a detailed analysis of the complete IPL dataset, with visualizations of the features needed to evaluate the IPL. Several machine learning algorithms are compared for predicting the winner between any two teams. A few existing models try to rank players using either simple formulae or a small number of mathematical models. Their efficiency was very low because large, valuable datasets were not available when these models were suggested. The T20 game also has its own requirements, which current models do not satisfy. In this paper, we present the results of using a detailed ball-by-ball dataset of all the matches played in the history of the IPL, together with a comprehensive analysis of various measures involved in the game and pragmatic visualizations. We faced issues with ranking the players, and we overcame them by modelling each player's strengths and weaknesses against a particular opponent, performance on a particular pitch, and similar details that can give a team a winning edge. We have also ranked the players based on the Player Ranking Index using machine learning techniques. The accuracy of predictions has increased up to 81% using the proposed system (Comprehensive data analysis on IPL (CDAI)), an increase of 12% compared to the existing system (Deep mayo predictor (DMP)).
Keywords: BA, IPL, MVPI, ODI, PRI, T20.
Abbreviations: BA, Batting Average; IPL, Indian Premier League; MVPI, Most Valuable Player Index; ODI, One Day International; PRI, Player Ranking Index; T20, Twenty-20.
I. INTRODUCTION
Cricket is a thrill both to play and to watch, and its importance is no less than that of any other sporting event. Particularly after the advent of the IPL, it gained huge popularity among people of all age groups throughout the world. On one hand it is said that cricket is totally unpredictable; on the other hand, it is also true that cricket match results rely heavily on past statistical data. Hence there is a need for an accurate prediction model that can provide comprehensive analysis of players (their standards, strengths and weaknesses) and teams, and can also predict which team has the higher chance of winning. It could be of great help to team owners (who purchase players for their teams), to captains and coaches making the right selection for the playing 11, to those investing in the right team for betting, and lastly to people who are curious about the IPL and its statistics. So far, no such application has been proposed or developed. This application is one attempt to fill that gap. Thus, an application that can analyze the existing data and also make predictions on future matches would be of great value as far as the IPL is concerned. It has recently been suggested that huge, web-scale datasets can succeed as a substitute for difficult models. We obtained the detailed 10-season IPL dataset, covering the 636 matches played so far in the IPL, from the cricsheet website. This dataset, if analyzed properly, can be very valuable. Just as data analysis has done wonders in fields such as the stock market, an application that does detailed analysis of players would be of great benefit. This motivated us to build an application that can do comprehensive analysis and visualization along with prediction, and give the user detailed information.
Few models exist that attempt to rank players, based either on simple formulae or on a few mathematical models. A few models try to predict the winner. Their efficiency is very low in the absence of sufficient data, because enough data was not available to train the models at the time they were suggested. Most of the models were built using ODI cricket datasets along with T20 datasets, as the T20 data alone would not have been enough for prediction. This had the shortcoming that the ODI performance of players was not equivalent or relevant to their performance in T20. The two formats and their requirements are very different. These variations create the need to rank players using the actual IPL/T20 data that are available now. The disadvantages of the existing systems include (i) low efficiency, (ii) incorrect prediction method, (iii) incomplete functionality, (iv) fewer options for analysis and (v) less use of graphs for output. In contrast to all the other attempts, which concentrated on just one aspect (either batsman characteristics or bowler characteristics), this paper will
Amala Kaviya et al., International Journal on Emerging Technologies 11(3): 218-228(2020) 218
do a comprehensive analysis of all possible aspects of the IPL. It will be a one-stop solution for any analysis needed. Besides, it will also be able to rank the players not only by their batting average but using many parameters, making the ranking much more accurate and in sync with their present form.
Initially, it reads the data and derives batsman-specific, bowler-specific, single-team detailed and two-team specific data separately, and saves them in separate files. Apart from these, it also contains special functions for batsman analysis, bowler analysis, single-team detailed analysis, two-team specific analysis and particular match analysis. All of these can be done using a web interface. Besides this, it also ranks players based on a combination of many factors. Through different rankings, we can analyze the same player's versatility. These rankings are applied to the players of the two opposing teams to predict the outcome of a match using our proposed approach CDAI and the Player Ranking Index.
Advantages of the proposed system include: (i) to analyze a player, it takes into account all the teams that he played for; (ii) it takes into account ball-by-ball details from all 636 matches of the 10 seasons; (iii) it offers both visualization and tabular output for a few functions; (iv) it can be used in future as well, if YAML data files for new seasons are made available; (v) it could be of great help to team owners (who purchase players for their teams at auction every year); (vi) it could be of great help to captains and coaches in making the right selection for the playing 11; (vii) it could be of great help in investing in the right team for betting; (viii) it could be of great help to people who are curious about the IPL and its statistics. In short, CDAI will be able to provide useful predictions of player performances and match results using the varied machine learning algorithms analysed in this paper.
II. LITERATURE SURVEY
A survey of prior work on cricket analysis gives the following insights. Dynamic programming models were used by Clarke to suggest optimal batting strategies [1]. He suggested that it is the ball-by-ball nature of cricket that makes it suitable for dynamic programming. As part of his findings, he was able to suggest a few computations at any stage of the innings, along with a few extra estimates such as the total runs to be scored. Normal batting averages face a drawback related to the player not being out in a match. To overcome that, Kimber and Hansford and also Damodaran came up with alternative batting average methods to deal with situations when the batsman has not been out in one-day matches [2, 3]. A method for predicting matches based on strike rates and batting averages was suggested by Kantor and Barr [4]. Test matches were explored on the basis of batting average by Borooah and Mangan [5]. To increase the efficiency of the batting order, an approach based on mathematical models was applied by Clark and Norman and by Bukeit and Ovens [6, 7]. The mathematical modelling method also finds other applications, such as estimating the likelihood of one team beating the other, besides finding the most efficient and effective batting order amongst the 11 players available in the team. Duckworth/Lewis percentage values were analyzed by Lewis [8]. The Duckworth-Lewis method is of great importance in cricket: it is used during rain interruptions to declare the outcome of a match and to set targets when only shorter durations are left.
After the immense popularity of Test and ODI cricket, the era of T20 began in 2005, in which each team plays a limited 20 overs. Since it came into existence, it has spread across the world and gained popularity very quickly because of the dynamic and unpredictable nature of the game. In this format of the game, selectors prefer slow, consistent, higher-average players rather than a faster strike-rate player. So, some new work was needed in this new dimension of cricket, from dynamic batsmen who can score most of their runs in boundaries to bowlers who can bring in quick wickets. New prediction models were therefore needed that would consider these factors. The IPL began in April 2008. The league, founded by the Board of Control for Cricket in India (BCCI), has come a long way, having played through 10 seasons and 637 matches as of 2017. It has gained a lot of popularity since it came into existence. The most interesting aspect of the IPL is its dynamic nature from season to season. Every season the teams go through an auction and the players keep changing. So, a lot of work has been done on team formation, in order to decide which players are better to bag at the auction. A generic model for the valuation of players based on their past record was suggested by Parker et al., [9]. Lenten et al., (2012) suggested a hedonic model to accomplish the same [10]. Many existing attributes were combined by Rastogi and Deodhar (2009) to suggest a pricing model indicating whether a bid would result in profit or loss for the owner [11]. But all the above work had a big drawback. All these analyses were done using the players' ODI profiles, as not much T20 data was available in the early days of the IPL. Strike rate, batting average, number of 4s and 6s, etc. were some common attributes used to rank players into different classes, and each class was given a certain valuation. It was seen that players' prices in the actual auction were very much consistent with the class into which the model classified them. Season by season those models kept improving as more data became available and better algorithms were proposed.
To fill the gaps that were prevalent in the existing models, some more work was done. Singh (2011) proposed a model to assess whether a player was actually worth the price he was bought for [12]. Input parameters for his model included the wage bill of the player, the wages of his support staff and other miscellaneous expenses borne by the team for the player. Output parameters were based on the points awarded to him by various rankings, his net run rate across the tournament, and the various profits and revenues that were collected. Graphical methods were used to analyze batsman and bowler performance in all forms of cricket by Van Staden [13].
A Sabermetrics-style approach to analysing batting performance in cricket was suggested by Lakkaraju and Sethi [14]. Cricket carries a lot of similarities
with baseball, on which a lot of work and discussion is already available. Sabermetrics deals essentially with the application of statistical methods to make predictions on the game of baseball. This paper tries to apply similar approaches and techniques to the game of cricket. Performance analysis using batting and bowling averages, strike rates and economy rates was suggested by Lemmer [15-17]. When dealing with strike rates we come across a peculiar anomaly. A particular player may have a better strike rate because his matches were played on easier pitches, while his counterpart played on difficult pitches. A normalization technique is needed before we compare them. All these factors were covered in the work above. All-round performances of players were evaluated by Saikia and Bhattacharjee [18]. A Bayesian classification approach was used to classify all-rounders in the IPL based on how good they were. It suggested classifying all-rounders as good performers, batting all-rounders, bowling all-rounders and below-average performers, as all-rounders are very valuable assets. A strategy to find the most valuable player (MVP) of the tournament using a decision tree approach was suggested by Khandelwal et al.
In the initial stages, the models could not give very efficient predictions. Most of the models were built using ODI cricket datasets along with T20 datasets, as the T20 data alone would not have been enough for prediction [16-19, 21]. This had the loophole that the ODI performance of players was not equivalent or relevant to their performance in T20. The two formats and their requirements are very different. These variations create the need to rank players using the actual IPL/T20 data that are available now. A new approach was also needed for IPL-specific prediction of matches.
Nimmagadda et al., (2018) proposed a model that predicts the score in each innings using Multiple Variable Linear Regression along with Logistic Regression, and the winner of the match using the Random Forest algorithm [22]. Kapadia et al., (2019) identified the significant features of the dataset using filter-based techniques including Correlation-based Feature Selection, Information Gain (IG), ReliefF and Wrapper [23]. AI techniques including Naïve Bayes, Random Forest, K-Nearest Neighbor (KNN) and Model Trees were used to build the prediction models. Rupai et al., (2020) used several classifiers to predict bowling performances from ODI matches [24].
This paper attempts to fulfil all those needs. It provides an interactive and user-friendly portal with advanced functionalities for detailed exploratory analysis on all dimensions of matches, batsmen, bowlers, etc. It can rank players using the novel ranking approach and another approach based on advanced techniques. It can also predict the outcome of a match based on the players who are part of the current playing 11. This paper will be beneficial for 4 categories of people:
– Team owners, who get a detailed idea of a player's history and ranking to help decide how far it is worth going to purchase him, and how to make the right selection and combination for the team.
– Coaches and team captains, who can gain a good understanding of their opponents and plan the right combination for their playing 11 (at a particular venue) to beat them.
– People who are betting on IPL matches, to help them decide which team is stronger and has the higher chance of winning a match, so that they invest in the right team and maximize their profit.
– Last but not least, people who are interested in IPL cricket and are curious to explore its statistics as a pastime.
III. THE PROPOSED SYSTEM
The complete work has been organized into the following architecture. It begins with processing the datasets and loading them in the backend. A user interface then provides different functionalities that can be performed on a player or match. It can also be used to perform prediction.
We have implemented the following modules for analysis, prediction, ranking and visualization.
– Processing of datasets
– Batsmen performance analysis
– Bowler performance analysis
– Match analysis
– Head-on-head analysis of teams
– Team overall performance analysis
– Ranking of teams
– Match prediction
– User interface creation
The modules of our proposed system are illustrated in Fig. 1.
Fig. 1. Modules of the system.
A. Processing of datasets
This module's function is to get the IPL data ready and correctly formatted, with apt data types, for the rest of the project to function. We use the dataset obtained from the cricsheet website, which is presently in YAML (a human-readable data serialization format) and contains complete ball-by-ball detail. The module reads each match's YAML data file, processes it, and saves the match-wise complete ball-by-ball details in a native R data frame with the correct data types assigned. A native R data frame is used because it makes further data reading and processing much faster and more efficient. Next, from each match-wise data
frame, it also extracts and generates separate data frames containing all the batsman-related (team-wise), bowler-related (team-wise) and inter-team details, as well as a particular team's entire career details. This module constitutes the heart of the project. It is very important for all the data frames to be generated and placed in the right location for the rest of the modules to work properly.
B. Batsmen performance analysis
This module provides the analyst with the ability to do a comprehensive analysis of a batsman's profile. Initially, it extracts details of all the IPL teams a particular batsman played for (as it is highly probable for a player to have played for more than 1 team). After establishing the complete batsman profile, it can perform a wide variety of analyses and visualizations. A subset of them includes plotting the batsman's runs against deliveries faced, analysis of the various ways he got out, analysis of his batting average and strike rates, and runs scored by him venue-wise.
C. Bowler performance analysis
This module provides the analyst with the ability to do a comprehensive analysis of a bowler's profile. Initially, it extracts details of all the IPL teams a particular bowler played for (as it is highly probable for a player to have played for more than 1 team). After establishing the complete bowler profile, it can perform a wide variety of analyses and visualizations. A subset of them includes the bowler's mean economy rate, mean runs conceded, a plot of his wicket types, how well he has performed against a particular opposition, and how well he has performed at a particular venue.
D. Match analysis
This module's objective is to analyze a single match completely. Apart from the basic functionalities to view the batting and bowling scorecards of a match, it also offers advanced analysis and visualization functionalities. A subset of them includes analysis of the best batting partnership of each team in that match, how well particular batsmen performed against a particular bowler and vice versa, a few batsman- and bowler-specific functions, and the match worm graph of the two teams showing how they played.
E. Head on head analysis of teams
This module is used to compare and contrast two teams by analyzing all the matches they have played against each other in the past. This feature would be of great help in decision making for both teams whenever they face one another. It also offers a wide variety of functionalities, a subset of which includes team-wise best batting partnerships in their past meetings, the detailed batting and bowling scorecards, how well particular batsmen performed against particular bowlers when those two teams played, and win-loss analysis.
F. Team overall performance analysis
This module is used to analyse a team's performance as a whole. It does a comprehensive analysis of all the matches played by a particular team in its entire history by applying a wide variety of functions. This feature would be an important deciding factor when assessing the standard of a team as a whole and choosing favorites. A subset of the functions includes best batting partnerships in the history of the team, overall batting and bowling scorecards of the team, best batsmen of the team versus best bowlers of the tournament, and best bowlers of the team versus best batsmen of the tournament.
As far as ranking is concerned, 3 modes of ranking each are available for batsmen and bowlers. The first and most basic one uses the batting average. The second uses the MVPI (most valuable player index) ranking score, proposed by Rediff sports to give useful insights about players. For the third kind of ranking, we use the parameters listed below to generate the PRI of batsmen and bowlers and rank them. More details about rank generation are discussed in later sections.
For batsmen, they are:
– Hard-hitter
– Finisher
– Fast-scorer
– Consistent
– Running-between-wickets
For bowlers, they are:
– Economy
– Wicket-taker
– Consistent
– Big-wicket-taker
– Short-performance-index
Now, a very important aspect is the ability to predict which of the 2 playing teams would win a match. A likelihood value would be of great use for a variety of purposes, as discussed in previous sections. In the IPL, players are not constantly part of a single team, because they keep changing based on each season's auction. The only thing that remains with a player is his performance: how well he played across his previous seasons, no matter which team he was in. Based on this, we use his PRI to perform the computation. More details about match prediction are discussed in later sections.
We have created a user interface for all the modules. It is an interactive shiny web app whose front end and back end are written purely in R. It performs all the functionalities mentioned in the previous modules. It contains 3 input fields: the first selects the module to analyze, the second selects the particular functionality, and the last selects the particular player to be analyzed. The computation runs in the backend, and the output is displayed in graph or tabular form in the front end.
G. Ranking
Ranking is done in 3 ways each for batsmen and bowlers, as mentioned in previous modules. They are explained below in detail.
(i) Batting average ranking: Here the batsmen are ranked in descending order according to their batting average.
BA = (TR/TM)
– TR is the total runs scored by the batsman
– TM is the total matches played by the batsman.
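The batting average ranking can be illustrated with a small Python sketch. The paper's own implementation is in R and is not shown, so the function name and the (batsman, match_id, runs) record layout below are assumptions made only for illustration.

```python
from collections import defaultdict


def batting_average_ranking(innings):
    """Rank batsmen by BA = TR / TM, highest first.

    `innings` is an iterable of (batsman, match_id, runs) tuples; this
    record layout is a hypothetical stand-in for the per-batsman data frame.
    """
    total_runs = defaultdict(int)   # TR per batsman
    matches = defaultdict(set)      # distinct matches per batsman, for TM
    for batsman, match_id, runs in innings:
        total_runs[batsman] += runs
        matches[batsman].add(match_id)

    averages = [(name, total_runs[name] / len(matches[name]))
                for name in total_runs]
    # Descending BA; ties are broken alphabetically (an arbitrary choice).
    averages.sort(key=lambda row: (-row[1], row[0]))
    return [(rank, name, ba)
            for rank, (name, ba) in enumerate(averages, start=1)]


# Made-up sample records, not taken from the IPL dataset.
sample = [("Player A", 1, 30), ("Player A", 2, 50), ("Player B", 1, 20)]
ranking = batting_average_ranking(sample)
```

Note that BA here is runs per match, as defined above, rather than the conventional runs per dismissal. With the Table 1 figures for CH Gayle (3626 runs in 100 matches) the formula gives 36.26.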

(ii) Batsman MVPI ranking: This is a better model of ranking than batting average, as it takes into consideration both batting average and batting strike rate. So, we get a better measure of ranking for limited-over IPL-T20 cricket [20].
MVPI = ((MR/TMR) + (MSR/TMSR)) * TR
where
– MR is the batting average of the particular batsman
– TMR is the average of all batsmen in the IPL
– MSR is the mean strike rate of the particular batsman
– TMSR is the mean strike rate of all the batsmen in the IPL
– TR is the total runs of the batsman
(iii) Batsman PRI (Player ranking index): This is the best model of ranking, an improvement over MVPI ranking. It takes into account 5 different parameters, all of which matter the most in T20 cricket. For a batsman, the measures are how hard he can hit the ball (being good at hitting 4s and 6s), his capability of staying not out, his capability of not wasting any deliveries, consistent performance and finally his running between the wickets. Using all these measures, we train a random forest model with these measures as predictors and MVPI as the outcome, and we generate the PRI score and rank the batsmen.
The PRI is found using five parameters for batsmen. The parameters for batsmen include:
– Hard-Hitter = ((4*Fours + 6*Sixes) / Balls played by batsman)
– Finisher = (Count of matches being not out / Total count of innings played)
– Fast-Scorer = (Player batting strike rate)
– Consistent = (Player batting average)
– Running-Between-Wickets (RBW) = ((Runs scored by the player - (4*Fours + 6*Sixes)) / Number of balls faced without boundary)
(iv) Bowling average ranking: Here the bowlers are ranked in descending order according to their bowling average.
BOA = (TW/TM)
– TW is the total wickets taken by a bowler
– TM is the total matches played by a bowler
(v) Bowling MVPI ranking: This is a better model of ranking than bowling average, as it takes into account both bowling average and bowling economy rate. So, we get a better measure of ranking for limited-over IPL-T20 cricket [20].
MVPI = ((MW/TMW) + (TMER/MER)) * TW
where
– MW is the mean wickets taken by the bowler
– TMW is the mean wickets taken by all the bowlers in the tournament
– TMER is the mean economy rate of all the bowlers in the tournament
– MER is the average economy rate of the bowler
– TW is the total wickets taken by the bowler.
(vi) Bowling PRI (Player ranking index): This is the best model of ranking, an improvement over MVPI ranking. It takes into account 5 different parameters, all of which matter the most in T20 cricket. For bowlers, the measures are economy, wicket taker, consistent, big wicket taker and short performance. Using all these measures, we train a random forest model with these measures as predictors and MVPI as the outcome, and we generate the PRI score and rank the bowlers.
The PRI is found using five parameters for bowlers. The parameters for bowlers include:
– Economy = (Runs conceded by player / (Count of balls bowled / 6))
– Wicket-Taker = (Count of balls bowled / Count of wickets taken)
– Consistent = (Runs conceded by the bowler / Count of wickets taken)
– Big-Wicket-Taker = (Count of four-, five- or six-wicket hauls / Count of innings played)
– Short-Performance = ((Count of total wickets - 4*Count of four-wicket hauls - 5*Count of five-wicket hauls - 6*Count of six-wicket hauls) / (Count of total innings played / Count of four-, five- or six-wicket hauls in total))
H. Prediction
For prediction we make use of the PRI generated in the previous sections. PRI are generated separately for batsmen and bowlers. Every player who has ever played in the history of the IPL has a PRI. In the absence of corresponding batting or bowling records, he is assigned the last rank. The rank differences of the playing 11 in the rival teams are the basic idea used to make the predictions.
Prediction is made on two sets of data.
– Training data: Season 1 to Season 8 IPL data. Tested on: Season 9.
– Training data: Season 1 to Season 9 IPL data. Tested on: Season 10.
The first set is used to show the significant difference compared to the existing models. The second is used to predict the matches in the most recent IPL.
The steps for the predictive model used with the second training set are as follows:
– For a particular match, for both the teams separately, we find the batting PRI and bowling PRI of each player.
– For batting and bowling PRI separately, we find the differences between the corresponding players' PRI in the two teams.
– So, apart from the 22 columns (11 batting PRI and 11 bowling PRI) for a particular match, we add a 23rd column containing the match result, 1 if team-1 wins and 2 if team-2 wins.
– Now we train various models over the dataset constructed, as discussed in the sections below. After this we have our prediction model ready.
Now when we are predicting a match's outcome, we generate the same 22 features for that match and predict which of team 1 or team 2 would be the winner. We prepare a test set with these 22 features for 58 matches of season 10.
We use various algorithms for training, which include support vector machine, sequential minimal optimization, instance-based learning with parameter k, Random Forest, the JRIP reduced error pruning algorithm, the J48 decision tree algorithm, Flexible Discriminant Analysis, Mixture Discriminant Analysis, the C5.0 decision tree algorithm and the naïve Bayes classifier.
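The MVPI formulas and the five batsman PRI parameters in the Ranking section can be sketched as follows. This is a Python illustration only (the paper's implementation is in R and is not shown); the function names, the argument layout, the zero-denominator fallback and the reading of "balls faced without boundary" as balls minus fours minus sixes are all assumptions. The random forest step that turns these parameters into a PRI score is not reproduced here.

```python
def batting_mvpi(mr, msr, tr, tmr, tmsr):
    """Batsman MVPI = ((MR/TMR) + (MSR/TMSR)) * TR."""
    return ((mr / tmr) + (msr / tmsr)) * tr


def bowling_mvpi(mw, mer, tw, tmw, tmer):
    """Bowler MVPI = ((MW/TMW) + (TMER/MER)) * TW."""
    return ((mw / tmw) + (tmer / mer)) * tw


def _ratio(numerator, denominator):
    """Divide, returning 0.0 for a zero denominator (an assumed fallback)."""
    return numerator / denominator if denominator else 0.0


def batsman_pri_features(runs, balls, fours, sixes, innings, not_outs, matches):
    """Compute the five batsman PRI parameters from career totals."""
    boundary_runs = 4 * fours + 6 * sixes
    # Assumption: "balls faced without boundary" = balls - fours - sixes.
    non_boundary_balls = balls - fours - sixes
    return {
        "hard_hitter": _ratio(boundary_runs, balls),
        "finisher": _ratio(not_outs, innings),
        "fast_scorer": 100 * _ratio(runs, balls),   # strike rate per 100 balls
        "consistent": _ratio(runs, matches),        # BA = TR / TM
        "rbw": _ratio(runs - boundary_runs, non_boundary_balls),
    }


# Made-up career totals, used only to exercise the functions.
features = batsman_pri_features(runs=100, balls=80, fours=10, sixes=5,
                                innings=4, not_outs=1, matches=4)
```

In the bowling MVPI the economy ratio is inverted (TMER/MER) because a lower economy rate is better, so a bowler who concedes fewer runs than the tournament mean scores above 1 on that term.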
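The construction of one training row for match prediction (22 rank differences plus a result column) can be sketched as below. This is a Python illustration under stated assumptions, not the authors' R code: the paper does not say how "corresponding" players are paired, so the sketch pairs them by their position in each team list, and it takes "last rank" to mean one more than the number of ranked players. The function name and the dictionary inputs are hypothetical.

```python
def match_feature_row(team1, team2, bat_rank, bowl_rank, winner=None):
    """Build the 22 rank-difference features for one match.

    team1, team2 : lists of 11 player names (pairing by list position
                   is an assumption made for this sketch).
    bat_rank, bowl_rank : dicts mapping player name -> PRI rank (1 = best).
    winner : optional label, 1 if team 1 won and 2 if team 2 won.
    """
    if len(team1) != 11 or len(team2) != 11:
        raise ValueError("each team must list exactly 11 players")

    # Players with no batting/bowling record are assigned the last rank.
    last_bat = len(bat_rank) + 1
    last_bowl = len(bowl_rank) + 1

    bat_diff = [bat_rank.get(a, last_bat) - bat_rank.get(b, last_bat)
                for a, b in zip(team1, team2)]
    bowl_diff = [bowl_rank.get(a, last_bowl) - bowl_rank.get(b, last_bowl)
                 for a, b in zip(team1, team2)]

    row = bat_diff + bowl_diff          # 11 batting + 11 bowling columns
    if winner is not None:
        if winner not in (1, 2):
            raise ValueError("winner must be 1 or 2")
        row.append(winner)              # 23rd column: match result
    return row


# Made-up teams and ranks, used only to exercise the function.
team_a = [f"a{i}" for i in range(11)]
team_b = [f"b{i}" for i in range(11)]
row = match_feature_row(team_a, team_b,
                        bat_rank={"a0": 1, "b0": 3},
                        bowl_rank={"a1": 1, "b2": 2},
                        winner=1)
```

Rows built this way for past matches would form the training table; calling the function without `winner` gives the 22 features for a match to be predicted.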
IV. RESULTS AND DISCUSSION
A. Website Interface
Fig. 2. Website Web view.
Fig. 2 shows the website when it opens. The user needs to select 3 things: the module, the function and the batsman.
Fig. 3. Website Mobile view.
Fig. 3 shows how the website looks when it is launched on mobile phones.
B. Batsman Analysis Module
In Fig. 4, in the batsman analysis tab, we have selected the function ‘dismissals of batsman’ and selected ‘MS Dhoni’. It produces a pie chart of his various dismissals throughout his career.
Fig. 4. Type of dismissal – MS Dhoni.
Using this chart, we can conclude that most of his dismissals have been caught. In Fig. 5, we analysed runs vs balls faced for MS Dhoni and plotted a regression line through the points. We can observe that as the number of balls faced increases, Dhoni's strike rate goes higher and higher.
Fig. 5. Runs vs Balls faced – MS Dhoni.
In Fig. 6 we use a decision tree to predict the runs scored by the batsman, with the balls faced as the predictor.
Fig. 6. Runs vs required no of deliveries – MS Dhoni.
C. Bowler Analysis Module
In Fig. 7, the bowler's average wickets can be seen as a function of time throughout his career.
Fig. 7. Moving average of wickets in a career – R Ashwin.
Fig. 8. Wickets by venue – R Ashwin.
In Fig. 8 we analyse the average wickets of a bowler at particular venues. We can see that R Ashwin has the highest average of 2.5 wickets at the ACA-VDCA stadium in Visakhapatnam.
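The two bowler summaries plotted in Figs. 7 and 8 (a moving average of wickets over a career, and mean wickets per venue) can be sketched in Python as below. The paper does not state the window length or the exact averaging used in its R code, so the trailing window, the function names and the (venue, wickets) record layout are assumptions.

```python
from collections import defaultdict


def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]   # drop the value leaving the window
        averaged.append(running / min(i + 1, window))
    return averaged


def mean_wickets_by_venue(records):
    """Mean wickets per match at each venue from (venue, wickets) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for venue, wickets in records:
        totals[venue] += wickets
        counts[venue] += 1
    return {venue: totals[venue] / counts[venue] for venue in totals}


# Made-up per-match wicket counts, not taken from the IPL dataset.
career = moving_average([1, 3, 2, 4], window=2)
by_venue = mean_wickets_by_venue([("Venue X", 2), ("Venue X", 3), ("Venue Y", 1)])
```

A venue with very few matches can top such a table on a single good performance, so the match count per venue is worth reporting alongside the mean.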
Fig. 9. No. of deliveries to wicket – R Ashwin.
In Fig. 9 we use a decision tree to predict the wickets taken by a bowler, with the deliveries bowled as the predictor.
D. Match Analysis Module
Fig. 10. Match scorecard – RCB vs PWI.
Fig. 10 shows the match scorecard for a particular match. We have selected the historic match in which Chris Gayle made a knock of 175 in 63 deliveries.
Fig. 11. Batsmen vs Bowlers – RCB vs PWI.
In Fig. 11, for a particular match, we analysed how the batsmen of one team played against the bowlers of the opposite team. It is observed that Gayle spared none of the bowlers and scored as many as 48 runs against AG Murtaza.
E. Two team analysis module
In Fig. 12, we did a head-on-head analysis for 2 arch rivals, Chennai Super Kings and Mumbai Indians. We can see that Suresh Raina has the highest score and has made the highest partnership with MS Dhoni.
Fig. 12. CSK batting partnership against MI.
Fig. 13. CSK batsman vs MI bowlers.
In Fig. 13, between Chennai Super Kings and Mumbai Indians, we analysed the best batsmen of CSK vs the best bowlers of MI. We can see that Suresh Raina has scored the most against Harbhajan Singh, who is also the top bowler on the opposition side.
F. Team overall performance module
Fig. 14. CSK batting partnerships.
In Fig. 14, we looked at the top batsmen of Chennai Super Kings and the partners with whom they shared their best partnerships. In Fig. 15, we looked at the performance of the top batsman of Chennai Super Kings, Suresh Raina, against the top bowlers of the IPL.

G. Ranking
In Table 1, batsmen are ranked according to their batting averages. Chris Gayle is ranked first with a batting average of 36.26. Only the top 10 batsmen are shown.
In Table 2, players are ranked according to the MVPI (most valuable player index) formula. Here David Warner is ranked first and Virat Kohli second. Only the top 10 batsmen are shown.
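The batting-average ranking of Table 1 can be reproduced from the Matches and Total runs columns. The sketch below assumes, as those two columns suggest, that "Mean runs" is total runs divided by matches played (for example, 3626 / 100 = 36.26 for CH Gayle). The function name `rank_by_mean_runs` is hypothetical, and only five rows of Table 1 are used.

```python
# (batsman, matches, total runs) copied from Table 1, deliberately unsorted.
rows = [
    ("V Kohli", 140, 4331),
    ("CH Gayle", 100, 3626),
    ("DA Warner", 114, 4014),
    ("ML Hayden", 30, 1077),
    ("SE Marsh", 67, 2320),
]


def rank_by_mean_runs(rows):
    """Return (rank, batsman, mean runs) tuples, best average first."""
    scored = [(name, round(total / matches, 2)) for name, matches, total in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(rank, name, avg) for rank, (name, avg) in enumerate(scored, start=1)]


ranking = rank_by_mean_runs(rows)
for rank, name, avg in ranking:
    print(rank, name, avg)
```

The MVPI ranking of Table 2 uses a different score, whose formula is not restated in this section, so it is not sketched here.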

Fig. 15. SK Raina performance against all bowlers.

Table 1: Batting average Ranking.


Batsman Matches Total runs Mean runs Mean SR Rank
CH Gayle 100 3626.00 36.26 133.73 1
ML Hayden 30 1077.00 35.90 128.97 2
DA Warner 114 4014.00 35.21 126.24 3
SE Marsh 67 2320.00 34.63 117.61 4
MEK Hussey 57 1930.00 33.86 105.81 5
V Kohli 140 4331.00 30.94 115.66 6
AM Rahane 97 2895.00 29.85 102.81 7
SR Tendulkar 76 2221.00 29.22 108.15 8
AB de Villiers 117 3393.00 29.00 133.71 9
S Dhawan 123 3544.00 28.81 113.70 10

Table 2: Batsman MVPI Ranking.


Batsman Matches Total runs Mean runs Mean SR MVPI Rank
DA Warner 114 4014.00 35.21 126.24 17186.33 1
V Kohli 140 4331.00 30.94 115.66 16504.32 2
SK Raina 154 4408.00 28.62 122.96 16266.30 3
CH Gayle 100 3626.00 36.26 133.73 16127.22 4
RG Sharma 151 4109.00 27.21 113.48 14270.49 5
G Gambhir 144 4010.00 27.85 109.31 13970.35 6
RV Uthappa 141 3744.00 26.55 123.91 13196.52 7
AB de Villiers 117 3393.00 29.00 133.71 13004.87 8
S Dhawan 123 3544.00 28.81 113.70 12796.71 9
MS Dhoni 134 3394.00 25.33 131.70 11883.46 10

Table 3: Batsman PRI ranking.


Batsman Hard hitter Finisher Fast scorer Consistent RBW MVPI PRI Rank
DA Warner 0.94 -0.57 0.75 2.73 0.41 17186.33 13633.77 1
V Kohli 0.47 -0.55 0.45 2.18 0.23 16504.32 13238.66 2
CH Gayle 1.66 -0.66 0.93 2.77 -0.48 16127.22 12533.84 3
S Dhawan 0.40 -0.57 0.42 1.92 0.29 12796.71 12523.88 4
SK Raina 0.67 -0.49 0.64 1.91 0.66 16266.30 12119.40 5
G Gambhir 0.37 -0.59 0.31 1.92 0.33 13970.35 11856.81 6
RG Sharma 0.55 -0.54 0.41 1.79 0.36 14270.49 11313.47 7
RV Uthappa 0.66 -0.67 0.67 1.67 0.38 13196.52 11194.82 8
AB de Villiers 0.91 -0.08 0.92 1.95 0.71 13004.87 10086.89 9
MEK Hussey 0.32 -0.54 0.21 2.50 0.36 7636.65 9172.40 10

Table 4: Bowling mean wickets ranking.


Bowler Matches Total wickets Mean wickets Mean ER Rank
SL Malinga 108 169.00 1.56 6.72 1
A Nehra 87 121.00 1.39 7.72 2
MJ McClenaghan 39 54.00 1.38 8.64 3
Sandeep Sharma 55 75.00 1.36 7.82 4
SP Narine 80 109.00 1.36 6.33 5
DJ Bravo 103 137.00 1.33 8.08 6
YS Chahal 55 72.00 1.31 8.07 7
MG Johnson 46 60.00 1.30 8.01 8
B Kumar 89 116.00 1.30 7.13 9
P Awana 33 43.00 1.30 8.33 10
Table 5: Bowler MVPI ranking.
Bowler Matches Total wickets Mean wickets Mean ER MVPI Rank
SL Malinga 108 169.00 1.56 6.72 563.44 1
DJ Bravo 103 137.00 1.33 8.08 385.04 2
A Nehra 87 121.00 1.39 7.72 355.65 3
Harbhajan Singh 131 134.00 1.02 7.09 344.54 4
SP Narine 80 109.00 1.36 6.33 344.30 5
B Kumar 89 116.00 1.30 7.13 339.22 6
A Mishra 123 133.00 1.08 7.70 338.77 7
R Vinay Kumar 102 125.00 1.23 8.24 331.82 8
PP Chawla 126 132.00 1.05 8.04 324.05 9
Z Khan 94 112.00 1.19 7.39 306.38 10
Table 6: Bowler PRI ranking.
Bowler Pbwer Pbwa Pbwsr Bwt Shortperf MVPI PRI Rank
SL Malinga -0.80 -0.42 -0.57 1.57 1.47 563.44 343.81 1
DJ Bravo -0.24 -0.43 -0.41 0.01 1.34 385.04 284.66 2
A Nehra -0.37 -0.37 -0.40 0.38 1.36 355.65 272.20 3
Harbhajan Singh -0.72 0.07 -0.18 0.10 0.54 344.54 230.01 4
SP Narine -0.98 -0.23 -0.50 1.97 0.74 344.30 228.34 5
B Kumar -0.67 -0.27 -0.42 0.36 1.15 339.22 233.35 6
A Mishra -0.49 -0.08 -0.21 0.33 0.61 338.77 279.15 7
R Vinay Kumar -0.18 -0.27 -0.26 0.26 0.99 331.82 245.40 8
PP Chawla -0.47 -0.10 -0.21 0.12 0.61 324.05 276.52 9
Z Khan -0.49 -0.12 -0.24 0.32 0.91 306.38 227.67 10

In Table 3, batsmen are ranked according to the PRI (player ranking index) formula. Here also, David Warner and Virat Kohli are ranked 1 and 2 respectively. Only the top 10 batsmen are shown.
In Table 4, bowlers are ranked according to their mean wickets. SL Malinga tops this table with an average of 1.56. Only the top 10 bowlers are shown.
In Table 5, bowlers are ranked according to the MVPI (most valuable player index) formula. Here also, SL Malinga is ranked first. Only the top 10 bowlers are shown.
In Table 6, bowlers are ranked according to the PRI (player ranking index) formula. Here also, SL Malinga retains rank 1. Only the top 10 bowlers are shown.
H. Prediction
The feature table of batsman and bowler rank differences (which is used for prediction) is shown in Table 7. Bat1 to Bat11 are the batting rank differences, Bowl1 to Bowl11 are the bowling rank differences, and W stands for the winner. The training set contains approximately 550 rows. The first 22 rows of predicted outcomes for IPL season 10 are shown as samples in Table 7.
Table 7: Feature table for prediction.
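One row of such a feature table could be assembled as sketched below. The text does not state how the differences are formed, so this sketch assumes that player i of the first team is paired with player i of the second team and that each feature is the first team's rank minus the second team's rank, with W encoded as 1 when the first team wins. The function name `feature_row` and all rank values are hypothetical.

```python
def feature_row(bat_a, bat_b, bowl_a, bowl_b, winner_is_a):
    """Build one feature row: Bat1..Bat11, Bowl1..Bowl11 and the label W.

    Assumption: each feature is (team A rank) - (team B rank) for the
    players in the same position of the two 11-player lists.
    """
    if not all(len(ranks) == 11 for ranks in (bat_a, bat_b, bowl_a, bowl_b)):
        raise ValueError("each team needs 11 batting and 11 bowling ranks")
    row = {}
    for i, (a, b) in enumerate(zip(bat_a, bat_b), start=1):
        row[f"Bat{i}"] = a - b
    for i, (a, b) in enumerate(zip(bowl_a, bowl_b), start=1):
        row[f"Bowl{i}"] = a - b
    row["W"] = 1 if winner_is_a else 0
    return row


# Made-up ranks for two playing elevens (not taken from the paper's tables).
team_a_bat = [3, 5, 12, 20, 31, 44, 58, 70, 85, 90, 102]
team_b_bat = [1, 9, 10, 25, 30, 50, 61, 66, 80, 95, 110]
team_a_bowl = [2, 7, 15, 22, 40, 55, 63, 71, 88, 97, 120]
team_b_bowl = [4, 6, 18, 21, 35, 60, 64, 75, 81, 99, 115]

row = feature_row(team_a_bat, team_b_bat, team_a_bowl, team_b_bowl, winner_is_a=True)
```

Each match then contributes one 23-column row (22 rank differences plus W), which is the shape a classifier such as JRIP or SVM would be trained on.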

Table 8 shows the predictions for IPL season 10. These are the predictions made by the JRIP algorithm, which performed best among the algorithms tried. The table displays predictions for 10 matches (sample output). DMP (Deep Mayo Predictor) is the existing model proposed by Prakash et al. [21]; CDAI (Comprehensive Data Analysis on IPL) is the proposed system.
In the first set of predictions, we repeated the attempt made by Prakash et al. [21] to predict the Season 9 IPL results. Their training set combined international T20 matches with the IPL Season 1-8 dataset, and their model predicted 39 out of 58 matches, a reported accuracy of 69.64%. Our model, which was built only with the IPL Season 1-8 dataset, successfully predicted 47 out of 58 match outcomes. Fig. 16 shows the accuracy comparison of the predictions made by the existing system and the proposed system using the SVM algorithm in this first set of predictions. The existing system has an accuracy of 69.64%, in contrast to the proposed system, which has an accuracy of 81.03%.
In the second set of predictions, we predicted Season 10 using various algorithms. This is the first attempt for IPL 10. Our model is built using the Season 1 to Season 9 IPL dataset in this case, and its results are compared with the actual IPL 10 match outcomes. The accuracy comparison between the algorithms used for predicting the Season 10 match results is shown in Fig. 17.
Table 8: Prediction.
Match  Winner  Prediction  Result
Delhi Daredevils-Gujarat Lions-2017-05-04.RData  Delhi Daredevils  Delhi Daredevils  TRUE
Delhi Daredevils-Kings XI Punjab-2017-04-15.RData  Delhi Daredevils  Delhi Daredevils  TRUE
Delhi Daredevils-Kolkata Knight Riders-2017-04-17.RData  Kolkata Knight Riders  Kolkata Knight Riders  TRUE
Delhi Daredevils-Mumbai Indians-2017-05-06.RData  Mumbai Indians  Mumbai Indians  TRUE
Delhi Daredevils-Rising Pune Supergiants-2017-05-12.RData  Delhi Daredevils  Delhi Daredevils  TRUE
Delhi Daredevils-Royal Challengers Bangalore-2017-05-14.RData  Royal Challengers Bangalore  Delhi Daredevils  FALSE
Delhi Daredevils-Sunrisers Hyderabad-2017-05-02.RData  Delhi Daredevils  Sunrisers Hyderabad  FALSE
Gujarat Lions-Delhi Daredevils-2017-05-10.RData  Delhi Daredevils  Gujarat Lions  FALSE
Gujarat Lions-Kings XI Punjab-2017-04-23.RData  Kings XI Punjab  Kings XI Punjab  TRUE
Gujarat Lions-Kolkata Knight Riders-2017-04-07.RData  Kolkata Knight Riders  Kolkata Knight Riders  TRUE
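The Match and Result columns of Table 8 can be handled as in the following sketch. It assumes that match file names follow the pattern `<team 1>-<team 2>-<YYYY-MM-DD>.RData` seen in the table, that team names contain no hyphens, and that Result is simply whether Winner equals Prediction. The helper names `parse_match_file` and `result_flag` are hypothetical.

```python
import re
from datetime import date

# Assumed pattern: "<team 1>-<team 2>-<YYYY-MM-DD>.RData", no hyphens in team names.
MATCH_FILE = re.compile(r"^([^-]+)-([^-]+)-(\d{4}-\d{2}-\d{2})\.RData$")


def parse_match_file(name):
    """Split a match file name into (team 1, team 2, match date)."""
    m = MATCH_FILE.match(name)
    if m is None:
        raise ValueError(f"unrecognised match file name: {name}")
    return m.group(1), m.group(2), date.fromisoformat(m.group(3))


def result_flag(winner, prediction):
    """Result column: True when the predicted team actually won."""
    return winner == prediction


# (winner, prediction) pairs for the ten sample rows of Table 8, in table order.
sample = [
    ("Delhi Daredevils", "Delhi Daredevils"),
    ("Delhi Daredevils", "Delhi Daredevils"),
    ("Kolkata Knight Riders", "Kolkata Knight Riders"),
    ("Mumbai Indians", "Mumbai Indians"),
    ("Delhi Daredevils", "Delhi Daredevils"),
    ("Royal Challengers Bangalore", "Delhi Daredevils"),
    ("Delhi Daredevils", "Sunrisers Hyderabad"),
    ("Delhi Daredevils", "Gujarat Lions"),
    ("Kings XI Punjab", "Kings XI Punjab"),
    ("Kolkata Knight Riders", "Kolkata Knight Riders"),
]

flags = [result_flag(winner, prediction) for winner, prediction in sample]
first_match = parse_match_file("Delhi Daredevils-Gujarat Lions-2017-05-04.RData")
```

In this ten-row sample, `sum(flags)` counts the correctly predicted matches. The sample is not the full season, so its hit rate is not the season accuracy.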

As it is a binary classification problem, random forest and other tree-based algorithms are outperformed by the likes of JRIP and SVM. Amongst all the algorithms we applied, JRIP seems the most promising, with an accuracy of 75.86% (44 out of 58 matches of IPL 10 predicted correctly). SVM and FDA also gave good results, each with an accuracy of 72.41% (42 out of 58 matches of IPL 10 predicted correctly). The results of the remaining algorithms are also shown.

Fig. 16. CDAI system's proposed algorithms accuracy comparison.

Fig. 17. SVM accuracy comparison for both approaches of existing and proposed system.
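The accuracies quoted for the proposed system follow directly from the counts of correctly predicted matches out of 58. A minimal sketch of that arithmetic is given below; the function name `accuracy_percent` is hypothetical, and only the counts stated in the text are used.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


TOTAL_MATCHES = 58

# Correct predictions for IPL season 10, as stated in the text.
season10_correct = {"JRIP": 44, "SVM": 42, "FDA": 42}
season10_accuracy = {
    name: accuracy_percent(correct, TOTAL_MATCHES)
    for name, correct in season10_correct.items()
}

# Proposed system on IPL season 9 (first set of predictions): 47 of 58.
season9_accuracy = accuracy_percent(47, TOTAL_MATCHES)

for name, acc in sorted(season10_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc}%")
```

This gives 75.86% for JRIP, 72.41% for SVM and FDA, and 81.03% for the proposed system on season 9, matching the figures in the text.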

V. CONCLUSION
The approach provides analysis and visualization of many aspects of IPL matches and gives useful results to the user. This information could be of great help to team owners (who purchase players for their teams in the auction every year), to captains and coaches in making the right selection for the playing 11, to those deciding which team to back when betting, and lastly to people who are curious about the IPL and its statistics.
VI. FUTURE SCOPE
In future, with minor changes, the model can also be made to work with ODI and Test matches. International matches can be analysed in a similar way, and more visualizations can be added to the functions. The system can also be adapted to accept more data file formats, for better analysis of the varied forms of data collected.
Conflict of Interest. There is no conflict of interest involving the content enlisted in the given paper.

REFERENCES
[1]. Clarke, S. R. (1988). Dynamic programming in one day cricket - optimal scoring rates. Journal of the Operational Research Society, 50, 536–545.
[2]. Kimber, A. C., & Hansford, A. R. (1993). A Statistical Analysis of Batting in Cricket. Journal of Royal Statistical Society, 156, 443–455.
[3]. Damodaran, U. (2006). Stochastic Dominance and Analysis of ODI Batting Performance: The Indian Cricket Team, 1989-2005. Journal of Sports Science and Medicine, 5, 503–508.
[4]. Barr, G. D. I., & Kantor, B. S. A Criterion for Comparing and Selecting Batsmen in Limited Overs Cricket. Journal of the Operational Research Society, 55, 1266-1274.
[5]. Borooah, V. K., & Mangan, J. E. (2010). The Bradman Class: An Exploration of Some Issues in the Evaluation of Batsmen for Test Matches 1877–2006. Journal of Quantitative Analysis in Sports, 6(3), 14-22.
[6]. Norman, J., & Clarke, S. R. (2004). Dynamic programming in cricket: Batting on sticky wicket. Proceedings of the 7th Australasian Conference on Mathematics and Computers in Sport, 226–232.
[7]. Ovens, M., & Bukiet, B. (2006). A mathematical modeling approach to one day cricket batting orders. Journal of Sports Science and Medicine, 5, 495-502.
[8]. Lewis, A. (2008). Extending the Range of Player-Performance Measures in One-Day Cricket. Journal of Operational Research Society, 59, 729-742.
[9]. Parker, D., Burns, P., & Natarajan, H. (2008). Player valuations in the Indian Premier League. Frontier economics Journal, 68, 68-76.
[10]. Lenten, L. J., Geerling, W., & Kónya, L. (2012). A hedonic model of player wage determination from the Indian Premier League auction: Further evidence. Sport Management Review, 15(1), 60-71.
[11]. Rastogi, S. K., & Deodhar, S. Y. (2009). Player pricing and valuation of cricketing attributes: exploring the IPL Twenty20 vision. Vikalpa, 34(2), 15-23.
[12]. Singh, S. (2011). Measuring the Performance of Teams in the Indian Premier League. American Journal of Operations Research, 1, 180-184.
[13]. Van Staden, P. (2009). Comparison of Cricketers' Bowling and Batting Performance using Graphical Displays. Current Science, 96, 764-766.
[14]. Lakkaraju, P., & Sethi, S. (2012). Correlating the Analysis of Opinionated Texts Using SAS® Text Analytics with Application of Sabermetrics to Cricket Statistics. Proceedings of SAS Global Forum, 1-10.
[15]. Lemmer, H. (2004). A Measure for the Batting performance of Cricket Players. South African Journal for Research in Sport, Physical Education and Recreation, 26, 55-64.
[16]. Lemmer, H. (2008). An Analysis of Players' Performances in the First Cricket Twenty20 World Cup Series. South African Journal for Research in Sport, Physical Education and Recreation, 30, 71-77.
[17]. Lemmer, H. (2012). The Single Match Approach to Strike Rate Adjustments in Batting Performance Measures in Cricket. Journal of Sports Science and Medicine, 10, 630-634.
[18]. Saikia, H., & Bhattacharjee, D. (2011). A Bayesian Classification Model for Predicting the Performance of All-Rounders in the Indian Premier League. Vikalpa, 36(4), 51-66.
[19]. Khandelwal, M., Prakash, J., & Pradhan, T. (2015). An Analysis of Best Player Selection Key Performance Indicator: The Case of Indian Premier League (IPL). Advances in Intelligent Systems Technologies and Applications, 173-190.
[20]. http://www.rediff.com/
[21]. Prakash, C. D., Patvardhan, C., & Lakshmi, C. V. (2016). Data Analytics based Deep Mayo Predictor for IPL-9. International Journal of Computer Applications, 152(6), 6-10.
[22]. Nimmagadda, A., Kalyan, N. V., Venkatesh, M., Teja, N. N. S., & Raju, C. G. (2018). Cricket score and winning prediction using data mining. Int. J. Adv. Res. Development, 3(3), 299-302.
[23]. Kapadia, K., Abdel-Jaber, H., Thabtah, F., & Hadi, W. (2019). Sport analytics for cricket game results using machine learning: An experimental study. Applied Computing and Informatics, 1-6.
[24]. Rupai, A. A. A., Mukta, M., & Islam, A. K. M. N. (2020). Predicting Bowling Performance in Cricket from Publicly Available Data. International Conference on Computing Advancements, 1-6.

How to cite this article: Amala Kaviya V.S., Mishra, A. S. and Valarmathi B. (2020). Comprehensive Data Analysis
and Prediction on IPL using Machine Learning Algorithms. International Journal on Emerging Technologies, 11(3):
218–228.
