


International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-9 Issue-2S, December 2019

Predictive Analysis of IPL Match Winner using Machine Learning Techniques
Ch Sai Abhishek, Ketaki V Patil, P Yuktha, Meghana K S, MV Sudhamani

Revised Manuscript Received on December 15, 2019.
Chakka Sai Abhishek, pursued Bachelor of Technology in Information Science Engineering at RNS Institute of Technology.
Ketaki Vinod Patil, pursued Bachelor of Technology in Information Science Engineering at RNS Institute of Technology.
Yuktha P, pursued Bachelor of Technology in Information Science Engineering at RNS Institute of Technology.
Meghana K S, pursued Bachelor of Technology in Information Science Engineering at RNS Institute of Technology.
Dr. M V Sudhamani, Professor and HoD, Dept. of ISE, RNSIT.

Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: B10431292S19/2019©BEIESP
DOI: 10.35940/ijitee.B1043.1292S19

Abstract: Artificial intelligence (AI) can be implemented using Machine Learning, which allows a computer to learn automatically and improve from its previous experience without being explicitly programmed. Programs developed using Machine Learning can access data and use it to learn. This paper focuses on applying machine learning in the arena of sports to predict the winning team of an IPL match. Cricket is a popular and unpredictable sport; in the T-20 format in particular, the course of the whole game can change within a single over. Millions of spectators watch the Indian Premier League (IPL) every year, which makes devising a technique that forecasts the outcome of matches a real-world problem. Many features determine the result of a T20 cricket match, each with a weighted impact on the outcome, and this paper describes those features in detail. A multivariate regression-based approach is proposed to measure each team's points in the league. The past performance of every team determines its probability of winning a match against a particular opponent. Finally, a set of seven factors or attributes is identified that can be used to predict the IPL match winner. Various machine learning models were trained to predict the winner in the interval between the toss and the start of the match. The performance of the developed model is evaluated with various classification techniques, of which Random Forest and Decision Tree gave the best results.

Keywords: Cricket prediction, Decision Trees, KNN, Logistic Regression, Multivariate Regression, Random forest, SVM, Sports Analysis.

I. INTRODUCTION

The main aim is to use Machine Learning to develop computer programs that can retrieve data and use it for self-learning. The learning process begins with observations of data, such as examples, direct experience or instruction, in order to identify patterns in the data and make better decisions in the future based on the samples provided. Machine Learning primarily aims to eliminate human intervention or assistance by allowing computers to learn automatically and adjust their actions accordingly. Advances in computing in recent years have made it increasingly easy to acquire in-depth information. As a consequence, the availability of both live and historical data has made Machine Learning quite popular in the field of sports analytics [1–5]. Sports analytics is the practice of collecting and analyzing historical game information to derive essential knowledge from it, with the aim of promoting successful decision-making.

Machine learning can be used effectively in sports both off the field and on the field. A team's performance and its outcome against an opponent can be predicted efficiently using the proposed model. The model primarily serves the healthy growth and productivity of team owners and other investors in the industry. Here, the analysis is done using machine learning classification techniques such as Decision Trees, Logistic Regression, Support Vector Machine, K-Nearest Neighbors and Random Forest.

Sport Lisboa e Benfica [6–9], one of the most successful football clubs in Portugal, applies machine learning to its information processing for decision-making, which demonstrates the importance of machine learning in athletic analysis. The club not only tracks but also evaluates virtually every part of the game, together with its players' habits of resting, drinking and practicing. After raw player data is captured, different models are programmed to analyze it in order to maximize game preparation and create custom training schedules. The output of these models allows players to improve their performance continually through machine learning and predictive analytics. Decisions such as substituting a player, keeping a player in the lineup or leaving a player on the bench can be made by the coach on the basis of this analysis.

The dataset used in this work is a collection of match records: around 675 matches with complete information about the match winner, location, toss winner, team names and other important attributes. The matches are from 2007–2018. This dataset helped us achieve the main aim of our project.

II. LITERATURE REVIEW

In the last few years, Major League Baseball (MLB) has seen huge development in the sports technology domain [7,8]. Ball-by-ball information and simple insights about the game, which are typically not noticeable through human observation, can prove to be of great importance. Hence, professional MLB teams apply various machine learning techniques to obtain and utilize such data.

Predicting the result of a game, deciding whether a team should deliberately walk a batter, and classifying non-fastball pitches by type are a few of the classification problems solved with the help of machine learning in baseball [9].

Cricket also uses sports analytics to forecast a match's outcome while the match is in progress or even before it has begun [10–14]. Problems such as forecasting the runs a player will score or the wickets he or she will take in a match, based on previous performance, are also interesting issues that deserve attention. Practical tools used in cricket include WASP (Winning and Score Predictor) [15], a tool that forecasts the score and likely result of a limited-overs match, i.e. a one-day or T-20 match. This tool was first introduced by Sky Sports New Zealand during a T-20 match in 2012. Software such as Hawk-Eye [22–23], which records a ball's trajectory and visually displays its statistically most likely path, has also been formally used since 2009 as part of the Umpire Decision Review System. This computer-assisted technology is also used in other sports such as tennis, badminton and snooker. In the following sections, we briefly discuss the importance of machine learning in sports and the proposed methodology.

III. MACHINE LEARNING IN SPORTS REVIEW

Through applications of AI, machine learning is changing technology in every possible field and can help solve real-world problems in society. It has a profound impact on sports such as cricket, baseball and football. Organizations can use data captured during play to improve every aspect of their players' movement. From player recruitment to player performance to ticket sales, predictive analysis can help make targeted decisions and strategic changes that affect every area of a sports organization.

In the past few years, sport has emerged as an important element of our society, and sports, including the Olympics, contribute to community health and productivity. Powerful machine learning techniques can improve player involvement and identify the necessary improvements, improving on the results previously achieved without machine learning. Detailed data about players' performance can be analyzed, and areas for improvement can be predicted using machine learning. The study "Predicting the match outcome in one day international cricket matches" [17] proposed a basic solution for forecasting the winner of a cricket match and also discussed how machine learning algorithms can change the way players evaluate their play. This method has many benefits over the existing approach: it reduces the manual effort needed to find flaws in the game play, making it easier for computers to find an improvement strategy efficiently. As early as the 1960s, Arthur Samuel applied artificial intelligence methods to the game of checkers to find the moves by which a player can win against an opponent. The research paper [12] showed that Bayesian networks outperformed other machine learning techniques. That paper reduced manual effort and improved on previously obtained results: the Bayesian network prediction achieved an accuracy of 59.21%, higher than the other methods. Machine learning has consistently provided prominent solutions for monitoring games.

The paper [8] presents an analysis of baseball in which game play is evaluated using machine learning techniques such as SVM and K-nearest neighbors, improving on existing methodologies. The method extracts 75% of the highlights of a match and then carries out a detailed evaluation. This change in game-play evaluation helped players find new techniques to win against their opponents. The authors also proposed a way to interpret game-play results pictorially and to present the analysis of a match.

Supervised algorithms such as logistic regression, Random Forest, SVM and KNN were used for game prediction; they are explained in detail in this paper.

IV. PROPOSED WORK

From the literature survey, it can be seen that a machine learning model can be built to forecast the result of a match even before the match begins. Cricket is played in many formats, and the T20 format has so many turnarounds that it is very hard to foresee the winner until the last ball, which makes predicting the winner quite complex.

Most of the statistical work in sports is performed using regression and classification tasks, both of which are forms of supervised learning. Simply put, a predictive function $y = f(x)$ is learned from a training dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. Supervised learning can be divided by the type of output into two categories, classification and regression. Because the output here, the match winner, is a discrete class, this problem is a classification problem. We therefore apply several classification algorithms to the cricket dataset, evaluate the results and select the model that gives the highest accuracy.

Figure 1: Basic methodology of the paper

Figure 1 shows the basic methodology of our research work. It comprises data collection, data exploration, data cleaning, data preprocessing, model development and model evaluation. These steps play a major role in predicting the output of our work and are explained in detail in later sections. Various machine learning algorithms can be used to solve such real-world problems. They take a predefined input describing the game play, learn from previous experience and produce a prediction; they differ in the task and the operations involved in solving the problem. The algorithms used in this work are discussed here:


Decision tree

A decision tree can be used for both regression and classification. It depicts the sequence of decisions taken to reach a result, which yields a tree of selected choices. In this work, the decision tree gave more accurate results than most of the other machine learning algorithms. The splitting criterion is calculated using entropy, with the samples classified into two classes, "yes" or "no", as given in equation (1):

$$H(S) = \sum_{c \in C} -p(c)\,\log_2 p(c) \qquad (1)$$

where $C = \{\text{yes}, \text{no}\}$ and $p(c)$ is the proportion of samples in $S$ that belong to class $c$.

Random Forest

Random forest is one of the most prominent machine learning algorithms and consists of multiple decision trees. Each individual tree makes its own class prediction, and the class with the most votes becomes the model's prediction. In a classification problem the output is always a discrete value, and the possible values are distinct from one another. The main idea behind random forest is to divide the problem among multiple trees, each producing its own solution, and to take the majority outcome as the final prediction. This helps classify objects according to their behavior. The expected prediction error, also known as the test error, is given by equation (2):

[Equation (2)]

Here E represents the error and L represents the data values.

K Nearest Neighbors

This algorithm can be used for both classification and regression problems. It is one of the most prominent algorithms in machine learning because it is non-parametric: it makes no assumptions about the distribution of the data. In supervised learning, KNN is used in applications such as pattern recognition, data mining and intrusion detection. KNN is robust: it calculates the distance between a test sample and the training samples and makes its prediction from the nearest ones. One of the equations used to compute the distance between the input and the test data is shown in equation (3):

[Equation (3)]

Support Vector Machine

Support vector machines are widely used for supervised learning. They work well when there is a clear margin of separation between classes, and they are also effective in high-dimensional spaces. SVMs are memory efficient because the decision function uses only a subset of the training points, the support vectors. They are suitable for large datasets, work well on unstructured and semi-structured data such as text, images and trees, generalize well in practice, and are well suited to text classification.

V. METHODOLOGY

A. Data Collection

The Indian Premier League's official website [21] is the principal source of data for this project. The data was web-scraped from the website and stored in a suitable format using a Python library called BeautifulSoup. The dataset has columns for the match number, IPL season year, the city and stadium where the match was held, the match winner, the participating teams, the winning margin, the umpires and the player of the match. Because the Indian Premier League was only 11 years old, only 634 matches were available after pre-processing. Some columns may contain null values, and some attributes may not be needed for predicting the match winner; these issues are handled in data preprocessing.

B. Data Preprocessing

In this step, we explored the dataset to find any anomalies. Every dataset may have defects that must be corrected to bring it into a standard form for computation, such as null values in some attributes or empty values in required attributes. This step also gives a detailed understanding of the dataset and puts it into a structured format that is easy to process.

(i) Data cleaning

There are null values in several columns of the dataset, such as winner, city and venue. Because of these null values, classification cannot be done accurately, so we replaced the null values in the different columns with dummy values.

(ii) Choosing Required Attributes

This is the main step, in which we eliminate the columns of the dataset that are not useful for estimating the winning team. This is done using feature importance. The selected attributes have the following feature importance:

Attribute        Feature importance
Team2            0.254593
Team1            0.223901
Venue            0.168813
Toss Winner      0.164288
City             0.154104
Toss Decision    0.034301

C. Model Development and Evaluation

We developed a generic model and applied all the classification methods to it. The detailed procedure is as follows:

Algorithm: Model Evaluation Algorithm
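The Model Evaluation Algorithm itself is not reproduced in this copy. As a rough stand-in, the following Python sketch follows the steps the text describes: shuffle the samples, split them 70/30 into training and test data, predict the test data and compute the accuracy. It is a hypothetical illustration, not the authors' code. It uses only the standard library, plugs in a k-nearest-neighbours classifier, and assumes Euclidean distance for equation (3), a majority vote over k = 3 neighbours, a fixed seed and invented two-dimensional data; the function names are illustrative.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle sample indices reproducibly and split them into train/test sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


def euclidean(a, b):
    """Assumed form of equation (3): Euclidean distance between feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k training samples nearest to `query`."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: euclidean(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def evaluate(rows, labels, k=3, train_fraction=0.7, seed=0):
    """Split the data, predict every test sample with KNN and return the accuracy."""
    train_x, train_y, test_x, test_y = train_test_split(rows, labels, train_fraction, seed)
    # KNN has no separate fitting step: the stored training set is the model.
    predictions = [knn_predict(train_x, train_y, q, k) for q in test_x]
    correct = sum(p == t for p, t in zip(predictions, test_y))
    return correct / len(test_y)


# Invented toy data: two well-separated clusters labelled "team1" and "team2".
rows = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2),
        (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3), (4.9, 4.8)]
labels = ["team1"] * 5 + ["team2"] * 5
print(evaluate(rows, labels))  # → 1.0
```

In a real run each row would hold the encoded match attributes (Team1, Team2, Venue, Toss Winner, City, Toss Decision). Categorical values are usually one-hot encoded before a distance-based method such as KNN is applied, and the same split, predict and score loop would be repeated for every classifier being compared.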

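The entropy in equation (1), from the Decision tree subsection, can be evaluated directly. This is a minimal sketch; the function name `entropy` and the example label lists are illustrative assumptions. With the two classes C = {yes, no}, the value ranges from 0 for a pure set to 1 bit for an even split.

```python
import math
from collections import Counter


def entropy(labels):
    """Equation (1): H(S) = sum over classes c of -p(c) * log2 p(c), in bits."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


print(entropy(["yes", "yes", "no", "no"]))    # even split → 1.0
print(entropy(["yes", "yes", "yes", "yes"]))  # pure set   → 0.0
print(entropy(["yes", "no", "no", "no"]))     # 1:3 split, about 0.811
```

An entropy-based tree such as ID3 uses this quantity to choose splits: the information gain of a split is the parent set's entropy minus the size-weighted entropies of the child subsets, and the split with the largest gain is preferred.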

The Model Evaluation Algorithm describes the procedure in pseudocode. In each iteration, the data is split into training and test data, the model is trained on selected features and used to predict the test data, and the performance of the system is then calculated. In this way, the procedure evaluates each classification model and calculates its accuracy.

The classification models used are Logistic Regression, Gaussian Naïve Bayes, KNN (K-Nearest Neighbors), Support Vector Machines, Gradient Boosting, Decision Trees and Random Forest. Among these methods, Random Forest and Decision Tree gave the best results.

VI. RESULTS AND DISCUSSIONS

As discussed above, the IPL dataset, which includes the details of all matches from the launch of the Indian Premier League until 2018, was used to train different machine learning algorithms, and the highest accuracy was given by the Random Forest classifier and the Decision Tree. The Random Forest classifier and Decision Tree predicted the outcome with an accuracy of 89.151%, using 70% of the dataset for training and 30% for testing. The classification report consists of values for accuracy, precision, recall and F1-score, which are explained below. The confusion matrix can be represented graphically as:

[Figure: confusion matrix]

Precision:
Precision is the proportion of correctly predicted positive observations out of all observations predicted as positive. It indicates how precise the model is: of the observations predicted positive, how many are actually positive.

Recall:
Recall is the proportion of correctly predicted positive observations out of all actual positive observations.

F1-Score:
The F1 score is the harmonic mean of precision and recall.

Accuracy:
Accuracy is the proportion of all observations, positive and negative, that are predicted correctly.

The classification report of our proposed model is shown in Table 2 below. The accuracy for predicting the winner of the match is 90.1% and the accuracy for predicting the loser is 88.2%. The average accuracy for predicting the outcome of the match is 89.151%.

Table 2: Precision, recall and accuracy evaluation

A comparison of the accuracy of the different models is given in Table 3 below.

Table 3: Accuracy of various methods
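The precision, recall, F1-score and accuracy definitions given in this section can be checked with a short Python sketch for the binary case, taking one class (for example, "Team1 wins") as positive. The function name and the confusion-matrix counts are hypothetical and are not the values behind Table 2 or Table 3.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 score and accuracy from binary confusion-matrix counts.

    Assumes tp > 0, so that every denominator below is non-zero.
    """
    precision = tp / (tp + fp)                          # correct among predicted positives
    recall = tp / (tp + fn)                             # found among actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # correct among all predictions
    return precision, recall, f1, accuracy


# Hypothetical counts for 100 test matches.
p, r, f1, acc = classification_metrics(tp=45, fp=5, fn=10, tn=40)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} accuracy={acc:.3f}")
# → precision=0.900 recall=0.818 f1=0.857 accuracy=0.850
```

Because F1 is a harmonic mean, it is high only when precision and recall are both high; accuracy, by contrast, also counts the correctly predicted negatives (tn).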


As seen from Table 3, Random Forest and Decision Tree achieved the highest classification accuracy, followed by the Support Vector Machine and K-nearest neighbor classifiers. The Naive Bayes and Logistic Regression classifiers performed poorly in predicting the IPL match outcome. The bar graph below shows the accuracy of the different classification models.

[Figure: bar graph of the accuracy of the classification models]

VII. CONCLUSION

Predicting the winner in sports, and in cricket in particular, is a complex challenge, but incorporating machine learning can make it much simpler. In this study, the various factors that influence the outcome of Indian Premier League matches were identified. The factors that significantly influence the result of an IPL match are the playing teams, the match venue, the city, the toss winner and the toss decision.

A generic classifier function was designed to measure the points earned by each team based on its past performance, using team1, team2, match venue, toss winner, city and toss decision. Different classification-based machine learning algorithms were trained on the IPL dataset developed for this work. The methods used to obtain the final evaluation are Logistic Regression, Decision Trees, Random Forest and K-nearest neighbors. Among these techniques, the Random Forest classifier and Decision Tree gave the highest accuracy, 89.151%.

In future work, we plan to extend this study with more attributes, such as the previous match scores of the selected team and the opposing team, the number of skilled batsmen in the opposing team, and more. The machine learning methods used in this research can also be used to predict outcomes in other outdoor sports such as football and baseball.

REFERENCES

1. P. Halvorsen, S. Sægrov, A. Mortensen, A. Eichhorn, M. Stenhaug, S. Dahl, H. K. Stensland, V. R. Gaddam, C. Griwodz, et al., "Bagadus: an integrated system for arena sports analytics: a soccer case study," Proceedings of the 4th ACM Multimedia Systems Conference, pp. 48–59, ACM, 2013.
2. A. S. Forouhar, M. M. Kellogg, K. Ohiomoba, and E. Akhmetgaliyev, "Methods, systems and software programs for enhanced sports analytics and applications," May 14 2015.
3. K. Goldsberry, "Courtvision: New visual and spatial analytics for the NBA," in 2012 MIT Sloan Sports Analytics Conference, vol. 9, pp. 12–15, 2012.
4. M. Gowda, A. Dhekne, S. Shen, R. R. Choudhury, L. Yang, S. Golwalkar, and A. Essanian, "Bringing IoT to sports analytics," in 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2016.
5. R. M. Rodenberg and E. D. Feustel, "Forensic sports analytics: Detecting and predicting match-fixing in tennis," Journal of Prediction Markets, vol. 8, no. 1, 2014.
6. Wired, "The unlikely secret behind Benfica's fourth consecutive Primeira Liga title," May 2017.
7. T. A. Severini, Analytic Methods in Sports: Using Mathematics and Statistics to Understand Data from Baseball, Football, Basketball, and Other Sports. Chapman and Hall/CRC, 2014.
8. H. Ghasemzadeh and R. Jafari, "Coordination analysis of human movements with body sensor networks: A signal processing model to evaluate baseball swings," IEEE Sensors Journal, vol. 11, no. 3, pp. 603–610, 2010.
9. R. Rein and D. Memmert, "Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science," SpringerPlus, vol. 5, no. 1, p. 1410, 2016.
10. T. H. Davenport, "What businesses can learn from sports analytics," MIT Sloan Management Review, vol. 55, no. 4, p. 10, 2014.
11. G. Fried and C. Mumcu, Sport Analytics: A Data-Driven Approach to Sport Business and Management. Taylor & Francis, 2016.
12. T. A. Severini, Analytic Methods in Sports: Using Mathematics and Statistics to Understand Data from Baseball, Football, Basketball, and Other Sports. Chapman and Hall/CRC, 2014.
13. K. Koseler and M. Stephan, "Machine learning applications in baseball: A systematic literature review," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 745–763, 2017.
14. A. Bandulasiri, "Predicting the winner in one day international cricket," Journal of Mathematical Sciences & Mathematics Education, vol. 3, no. 1, pp. 6–17, 2008.
15. K. Koseler and M. Stephan, "Machine learning applications in baseball: A systematic literature review," Applied Artificial Intelligence, vol. 31, no. 9-10, pp. 745–763, 2017.
16. A. Bandulasiri, "Predicting the winner in one day international cricket," Journal of Mathematical Sciences & Mathematics Education, vol. 3, no. 1, pp. 6–17, 2008.
17. M. Bailey and S. R. Clarke, "Predicting the match outcome in one day international cricket matches, while the game is in progress," Journal of Sports Science & Medicine, vol. 5, no. 4, p. 480, 2006.
18. V. V. Sankaranarayanan, J. Sattar, and L. V. Lakshmanan, "Auto-play: A data mining approach to ODI cricket simulation and prediction," in Proceedings of the 2014 SIAM International Conference on Data Mining, pp. 1064–1072, SIAM, 2014.
19. A. Kaluarachchi and S. V. Aparna, "CricAI: A classification based tool to predict the outcome in ODI cricket," in 2010 Fifth International Conference on Information and Automation for Sustainability, pp. 250–255, IEEE, 2010.
20. E. Crampton and S. Hogan, "Cricket and the wasp: Shameless self promotion (wonkish)," 2016.
21. http://stats.espncricinfo.com/ci/engine/records/index
22. N. Owens, C. Harris, and C. Stennett, "Hawk-eye tennis system," in 2003 International Conference on Visual Information Engineering VIE 2016, pp. 182–185, IET, 2016.


AUTHORS PROFILE

Chakka Sai Abhishek pursued a Bachelor of Technology in Information Science Engineering at RNS Institute of Technology. He won 2nd prize in an IoT innovation event and has presented at various technical research conferences.

Ketaki Vinod Patil pursued a Bachelor of Technology in Information Science Engineering at RNS Institute of Technology. She presented a paper at the IETE Sponsored Second National Conference on Emerging Trends in Engineering, Science and Technology and also won 2nd prize in an IoT innovation event.

Yuktha P pursued a Bachelor of Technology in Information Science Engineering at RNS Institute of Technology. She has presented at various technical events and has also presented a paper at the IETE Sponsored Second National Conference on Emerging Trends in Engineering, Science and Technology.

Meghana K S pursued a Bachelor of Technology in Information Science Engineering at RNS Institute of Technology.

Dr. M V Sudhamani is currently Dean-R&D and Professor and HoD, Dept. of ISE, RNSIT. She has 25 years of teaching, research and industrial experience. She specializes in Image Processing, Content-based Image Retrieval, Advanced Algorithms and Databases, and has guided and is guiding candidates for the Ph.D. degree. She has carried out two research projects from VTU and AICTE. She has served as a member of the Board of Examiners (BOE) and the Board of Studies (BOS) in VTU and other autonomous institutions across India. She has organized two international conferences, ICDECS 2011 and 2015, and one more in December 2019.
