IPL Base Price Modelling and Visualization Using Linear Regression
Bachelor of Engineering
Institute of Engineering, Jiwaji University, Gwalior, India
Abstract
The Indian Premier League (IPL) is a professional Twenty20 cricket league in India contested during
April and May of every year by teams representing Indian cities. The aim of the research was to predict
the base price of players for the IPL auction. Performance data of players from different leagues and world
tournaments were collected, and important features including Runs Scored, Strike Rate, Average,
Wickets and Economy Rate were considered. The first step was to clean the data, for
which we used Python’s Pandas. Once the data was cleaned and organized in a structured way, we
calculated the Base Price Score, a weighted sum of the extracted features in which each feature had
its own weight depending on its impact in the IPL. The next step was to calculate the Average Score,
using weights assigned to the different leagues and tournaments. The machine
learning algorithm used was Multiple Linear Regression because we had multiple
independent variables for a single dependent variable, i.e. the Base Price Score. The Root Mean Square
Error (RMSE) was used to measure the prediction error of the model.
1. Introduction
The Indian Premier League (IPL) is a professional Twenty20 cricket league in India contested during
April and May of every year by teams representing Indian cities. The league was founded by the Board
of Control for Cricket in India (BCCI) in 2007, and is regarded as the brainchild of Lalit Modi, the
founder and former commissioner of the league. The IPL is the most-attended cricket league in the
world. In 2010, the IPL became the first sporting event in the world to be broadcast live on YouTube.
The league currently has eight teams, and each team plays every other team twice in a home-and-away
round-robin format in the league phase.
1.2 Objective
The aim of this project is to predict the base price of players for the upcoming season of the Indian Premier
League from their previous performances in various formats and leagues. The
performance data of players are collected and important features are extracted. The objective of the
work is to analyse the data and predict the base price of the players using the Linear Regression
machine learning algorithm. Since the input data on previous performances are unstructured, we perform
pre-processing, extract the features that are important for a player, calculate the base price score,
generate the base price from the range in which the score falls, and plot graphs of the results.
1.3 Dataset
The dataset contains performance data for players from different leagues, such as the Indian Premier League
and the Big Bash League, collected from ESPNcricinfo for the years 2015 to 2017. It
includes the following performance attributes: Runs Scored, Number of Matches, Strike Rate, Number of
Wickets, Economy Rate, Number of 0s, 4s, 6s, 50s, 100s, and Innings.
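The cleaning step that precedes feature extraction could look roughly like the following sketch. It assumes a current Python 3 with Pandas installed; the CSV layout, the column names, the sample rows, and the rule for filling missing values are all illustrative assumptions, not the actual ESPNcricinfo export or the cleaning rules used in this work.

```python
import io

import pandas as pd

# Hypothetical raw export; the column names and values are illustrative only.
raw_csv = io.StringIO(
    "Player,League,Runs,Matches,Strike Rate,Wickets,Economy Rate\n"
    "Player A,IPL,450,14,138.5,2,8.10\n"
    "Player A,IPL,450,14,138.5,2,8.10\n"  # exact duplicate row
    "Player B,BBL,-,10,-,15,7.25\n"       # '-' marks a missing value
    "Player C,IPL,310,12,121.0,,\n"       # empty cells
)

# Treat '-' as missing while parsing.
df = pd.read_csv(raw_csv, na_values=["-"])

# Normalise headers to snake_case so later steps can refer to them consistently.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Drop exact duplicates, then coerce the statistic columns to numbers.
df = df.drop_duplicates().reset_index(drop=True)
stat_cols = ["runs", "matches", "strike_rate", "wickets", "economy_rate"]
df[stat_cols] = df[stat_cols].apply(pd.to_numeric, errors="coerce")

# Assumption: a missing counting statistic means the player did not bat or bowl.
df[["runs", "wickets"]] = df[["runs", "wickets"]].fillna(0)

print(df)
```

Rate-type columns such as strike rate are left as missing here rather than filled with zero, since a zero rate would be read as a real (and very poor) performance.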
2. Methodology
Programming Language used: Python 3.4
Machine Learning and Data Analysis Libraries: scikit-learn and Pandas
scikit-learn is a free software machine learning library for the Python programming language. It
features various classification, regression and clustering algorithms and is designed to interoperate with
the Python numerical and scientific libraries NumPy and SciPy.
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures
and data analysis tools for the Python programming language.
The Base Price Score of each player for each league is generated by giving weights to the extracted
features.
For batsmen, weights of {0.4, 0.4, 0.2} were assigned to Runs Scored, Strike Rate and Average
respectively.
For bowlers, weights of {0.4, 0.3, 0.15, 0.15} were assigned to Economy, Wickets Taken, Average and
Strike Rate respectively.
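The weighting scheme above can be sketched as a plain weighted sum. The weights are the ones stated in the text; the dictionary keys, the helper name base_price_score and the sample figures are hypothetical and chosen only for illustration.

```python
# Weights as stated in the text; the dictionary keys are illustrative names.
BATSMAN_WEIGHTS = {"runs": 0.4, "strike_rate": 0.4, "average": 0.2}
BOWLER_WEIGHTS = {"economy": 0.4, "wickets": 0.3, "average": 0.15, "strike_rate": 0.15}


def base_price_score(stats, weights):
    """Return the weighted sum of the features named in `weights`."""
    return sum(weight * stats[feature] for feature, weight in weights.items())


# Hypothetical season figures for one batsman and one bowler.
batsman = {"runs": 500, "strike_rate": 140.0, "average": 35.0}
bowler = {"economy": 7.5, "wickets": 20, "average": 22.0, "strike_rate": 18.0}

batsman_bps = base_price_score(batsman, BATSMAN_WEIGHTS)
bowler_bps = base_price_score(bowler, BOWLER_WEIGHTS)
print(round(batsman_bps, 2), round(bowler_bps, 2))
```

The sketch applies the weights to raw values. Whether features were rescaled before weighting, or how features where a lower value is better (such as economy) were handled, is not stated here, so neither is assumed.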
Since the calculated Base Price Scores (BPS) were very large, we normalised them by taking the base-10 logarithm.
A column called “Avg L_Score” is then created, containing for each player the mean of the base
price scores across all leagues. These values are later binned according to the Base
Price Range given by the BCCI and IPL committee.
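The log normalisation, per-player averaging and binning steps could be sketched with Pandas as follows. The sample scores, the intermediate column name l_score, the cut-off values and the band labels are assumptions made for illustration; the actual Base Price Ranges are set by the BCCI and IPL committee and are not listed here. A plain mean across leagues is assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical per-league Base Price Scores (one row per player and league).
scores = pd.DataFrame({
    "player": ["Player A", "Player A", "Player B", "Player B", "Player C"],
    "league": ["IPL", "BBL", "IPL", "BBL", "IPL"],
    "bps": [1000.0, 100.0, 10000.0, 1000.0, 10.0],
})

# Compress the large raw scores with a base-10 logarithm.
scores["l_score"] = np.log10(scores["bps"])

# Mean of the log scores across all leagues a player appeared in.
avg = scores.groupby("player")["l_score"].mean().rename("Avg L_Score").reset_index()

# Assumed cut-offs and labels; the real price ranges are not given in the text.
bins = [-np.inf, 1.5, 3.0, np.inf]
labels = ["low", "medium", "high"]
avg["base_price_band"] = pd.cut(avg["Avg L_Score"], bins=bins, labels=labels)

print(avg)
```

In practice the labels would be replaced by the committee's actual base price slabs, and the cut-offs chosen so that the score ranges map onto those slabs.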
The data is then split into training and test sets: 80% of the data is used for training and 20% is held
out for testing, with stochastic (random) sampling used to perform the split.
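A minimal sketch of the split, the Multiple Linear Regression fit and the RMSE evaluation with scikit-learn is given below. It assumes a current Python 3 with NumPy and scikit-learn installed. The data are synthetic stand-ins generated only so the example runs, and the feature ranges, noise level and random seeds are arbitrary assumptions rather than values from this work.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 players, three features (e.g. runs, strike rate, average).
rng = np.random.default_rng(0)
X = rng.uniform(low=[0, 80, 10], high=[800, 180, 60], size=(200, 3))

# Synthetic target: a log-scaled weighted sum plus a little noise (illustrative only).
y = np.log10(0.4 * X[:, 0] + 0.4 * X[:, 1] + 0.2 * X[:, 2]) + rng.normal(0, 0.02, 200)

# 80/20 random split; random_state fixes the shuffle so the run is repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit one coefficient per feature plus an intercept.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# RMSE is the square root of the mean squared prediction error on the held-out set.
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print("coefficients={}, intercept={:.3f}, RMSE={:.3f}".format(model.coef_, model.intercept_, rmse))
```

Because RMSE is expressed in the same units as the target, a lower value on the held-out 20% indicates predictions that sit closer to the actual scores.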
3. Visualization
Figure 3.1: Distribution of Players from Different Countries Represented by a Heat Map
Figure 3.2: Distribution of Players from Different Categories in Various Base Prices Set by IPL
Governing Committee
Figure 3.4: Number of Players from Each Country Categorized by Bowlers, Batsmen and All-Rounders
Figure 3.7: Word Cloud Representing Number of Participations in Auction from Each Country
References
1. Data source: https://www.espncricinfo.com/
2. Linear Regression https://www.geeksforgeeks.org/ml-linear-regression/
3. General information from https://en.wikipedia.org/wiki/Indian_Premier_League
4. Hyun-Il Lim (2019). IEEE 43rd Annual Computer Software and Applications Conference
(COMPSAC) https://ieeexplore.ieee.org/xpl/conhome/8746989/proceeding
5. Sainathan Ganesh Iyer, Anurag Dipakkumar Pawar (2019). International Conference on Smart
Systems and Inventive Technology (ICSSIT)
https://ieeexplore.ieee.org/xpl/conhome/8966524/proceeding
6. B. Pavlyshenko (2016). IEEE International Conference on Big Data
https://ieeexplore.ieee.org/xpl/conhome/7818133/proceeding
7. Mengyu Huang (2020). International Conference on Computer Vision, Image and Deep Learning
(CVIDL) https://ieeexplore.ieee.org/xpl/conhome/9270288/proceeding
8. Lyn Bartram, Michael Correll, Melanie Tory (2021). Untidy Data: The Unreasonable Effectiveness
of Tables https://research.tableau.com/paper/untidy-data-unreasonable-effectiveness-tables
Acknowledgement
This research paper was written under the mentorship of Professor Alpana Sharma, Head of the
Department of Computer Science at the Institute of Engineering, Jiwaji University, Gwalior.