IJIRMPS | Volume 9, Issue 5, 2021 ISSN: 2349-7300

IPL Base Price Modelling and Visualization using Linear Regression
Rajat Chelani

Bachelor of Engineering
Institute of Engineering, Jiwaji University, Gwalior, India

Abstract
The Indian Premier League (IPL) is a professional Twenty20 cricket league in India contested during
April and May of every year by teams representing Indian cities. The aim of this research was to predict
the base price of players for the IPL auction. Player performance data from different leagues and world
tournaments was collected, and important features including Runs Scored, Strike Rate, Average,
Wickets and Economy Rate were considered. The first step was to clean the data using Python's Pandas
library. Once the data was cleaned and organized in a structured way, we calculated the Base Price
Score, a weighted sum of the extracted features in which each feature was weighted according to its
impact on the IPL. The next step was to calculate the Average Score, weighted according to the league
or tournament in which the performance took place. The machine learning algorithm used was Multiple
Linear Regression because there were multiple independent variables (the features) for one dependent
variable, the Base Price Score. The accuracy of the model was measured using the Root Mean Square
Error.

Keywords: IPL, Linear Regression, Price Prediction, T-20

1. Introduction
The Indian Premier League (IPL) is a professional Twenty20 cricket league in India contested during
April and May of every year by teams representing Indian cities. The league was founded by the Board
of Control for Cricket in India (BCCI) in 2007, and is regarded as the brainchild of Lalit Modi, the
founder and former commissioner of the league. The IPL is the most-attended cricket league in the
world. In 2010, the IPL became the first sporting event in the world to be broadcast live on YouTube.
Currently, with eight teams, each team plays every other team twice in a home-and-away round-robin
format in the league phase.

1.1 Auction Procedure


A team can acquire players through the annual player auction. Each player entering the auction sets a
base price, which is the opening price for bidding. Players are bought by the franchise that bids the
highest for them, and a player goes unsold if no team bids. The auctioneer then gives the franchises the
option of listing the unsold players they are interested in and reopens the bidding for those players
with the base price halved. Players who remain unsold after this second round are considered unsold in
the auction.

IJIRMPS2105009 Website : www.ijirmps.org Email : [email protected] 88



1.2 Objective
The aim of this project is to predict the base price of players for the upcoming season of the Indian
Premier League from their previous performances in various formats and leagues. The performance data
of the players is collected and the important features are extracted. The objective of the work is to
analyse the data and predict the base price of the players using the machine learning algorithm Linear
Regression. Since the raw performance data is unstructured, we pre-process it, extract the features
that are important for a player, calculate the Base Price Score, derive the base price from the range
in which the score falls, and plot graphs of the results.

1.3 Dataset
The dataset contains performance data of players from different leagues, such as the Indian Premier
League and the Big Bash League, collected from ESPNcricinfo for the years 2015 to 2017. It includes
the following performance attributes: Runs Scored, Number of Matches, Strike Rate, Number of Wickets,
Economy Rate, Number of 0s, 4s, 6s, 50s, 100s, and Innings.

Figure 1.3.1: Sample Dataset

1.4 Problem Definition


Based on the data described in the previous section, the following problems are to be solved:
1. To compute a Base Price Score (BP Score) for a given player based on his previous performances.
2. To find how many players fall into each of the base price ranges set by the BCCI.
3. To find the players with higher base prices within each category: All-Rounders, Batsmen and Bowlers.
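
Problems 2 and 3 amount to simple grouping operations once every player has a score and a base price range. The following is a minimal sketch using Pandas; the player names, roles, scores, and range labels are made-up placeholders and are not taken from the dataset.

```python
import pandas as pd

# Made-up results table; every value here is a placeholder for illustration.
players = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E", "F"],
    "role": ["Batsman", "Bowler", "All-Rounder", "Batsman", "Bowler", "All-Rounder"],
    "bp_score": [3.1, 2.4, 2.9, 2.2, 3.3, 2.0],
    "base_price_range": ["high", "low", "mid", "low", "high", "low"],
})

# Problem 2: number of players in each base price range.
players_per_range = players["base_price_range"].value_counts()

# Problem 3: the highest-scoring player within each role.
best_rows = players.groupby("role")["bp_score"].idxmax()
top_per_role = players.loc[best_rows, ["role", "player", "bp_score"]]
```

Here the top player per role is chosen by score; ranking by the assigned base price would work the same way with a numeric price column.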

2. Methodology
Programming Language used: Python 3.4
Machine Learning and Data Analysis Library: scikit-learn

scikit-learn is a free software machine learning library for the Python programming language. It
features various classification, regression and clustering algorithms and is designed to interoperate with
the Python numerical and scientific libraries NumPy and SciPy.


Reading the Dataset


The dataset consists of 120 Comma Separated Values (CSV) files, which are read using Pandas.

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures
and data analysis tools for the Python programming language.
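
Reading a folder of CSV files can be sketched as follows. This is an illustrative outline, not the code used in the study: the function name read_csv_folder and the added source_file column are assumptions introduced here.

```python
from pathlib import Path

import pandas as pd


def read_csv_folder(folder):
    """Read every *.csv file in `folder` and stack the rows into one DataFrame."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        frame = pd.read_csv(path)
        # Hypothetical helper column: remember which file each row came from.
        frame["source_file"] = path.stem
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    return pd.concat(frames, ignore_index=True)
```

Keeping the file name alongside each row makes it possible to group the rows by league or year later, assuming the file names encode that information.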

2.1 Pre-Processing and Feature Extraction


Pre-processing begins by removing unwanted rows from the files, which contained information such as
the date of the match, the venue, and the player's team name. Data were collected for 9 leagues over
different years, and each league's data was merged into one file. Although the dataset contained 13
features, only the few that were important for evaluation were kept.

The following features are considered for analysis:


For Batsman: {Runs Scored, Strike Rate, Average}
For Bowler: {Economy Rate, Number of Wickets, Average, Strike Rate}
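
The cleaning and merging step for batsmen might look like the sketch below. The column names are assumptions, since the file layout is not listed in the text, and the sketch treats the unwanted match information as columns, whereas the description above speaks of rows; the real files may need a different filter.

```python
import pandas as pd

# Assumed column names; the original files may label these fields differently.
METADATA_COLUMNS = ["Match Date", "Venue", "Team"]
BATTING_FEATURES = ["Player", "Runs", "Strike Rate", "Average"]


def clean_league(yearly_frames):
    """Merge one league's yearly tables and keep only the batting features."""
    merged = pd.concat(yearly_frames, ignore_index=True)
    # Drop match metadata if present; errors="ignore" skips missing columns.
    merged = merged.drop(columns=METADATA_COLUMNS, errors="ignore")
    # Discard rows that carry no player name.
    merged = merged.dropna(subset=["Player"])
    return merged[BATTING_FEATURES]
```

The same pattern would apply to bowlers with the four bowling features in place of BATTING_FEATURES.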

The Base Price Score of each player for each league is generated by giving weights to the extracted
features.
For Batsmen, weights of {0.4, 0.4, 0.2} were given to Runs Scored, Strike Rate and Average
respectively.
For Bowlers, weights of {0.4, 0.3, 0.15, 0.15} were given to Economy Rate, Wickets Taken, Average and
Strike Rate respectively.
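
As a worked illustration of the weighting, the sketch below computes a Base Price Score as a plain weighted sum. The weights are the ones stated above; the dictionary keys and the sample batting figures are made up for the example.

```python
# Weights as stated in the text; the dictionary keys are illustrative names.
BATSMAN_WEIGHTS = {"runs": 0.4, "strike_rate": 0.4, "average": 0.2}
BOWLER_WEIGHTS = {"economy": 0.4, "wickets": 0.3, "average": 0.15, "strike_rate": 0.15}


def base_price_score(stats, weights):
    """Weighted sum of a player's features for one league."""
    return sum(weight * stats[name] for name, weight in weights.items())


# Made-up batting figures, for illustration only.
example_batsman = {"runs": 500, "strike_rate": 140.0, "average": 35.0}
score = base_price_score(example_batsman, BATSMAN_WEIGHTS)
# 0.4 * 500 + 0.4 * 140 + 0.2 * 35, i.e. approximately 263
```

The text does not say whether features for which a lower value is better, such as Economy Rate, are inverted before weighting; the sketch applies the weights exactly as given.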

Since the calculated Base Price Scores (BPS) were very large, they were normalised by taking the
logarithm to base 10. A column called “Avg L_Score” was then created, containing the mean of each
player's base price scores across all leagues. These values are later split into ranges according to
the Base Price Range given by the BCCI and the IPL committee.
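
The normalisation, averaging, and range assignment can be sketched as follows. The scores are made up, and the cut points and range labels are hypothetical, since the actual base price ranges are not listed in the text. A plain mean across leagues is used, as described above.

```python
import numpy as np
import pandas as pd

# Made-up per-league Base Price Scores for two players.
scores = pd.DataFrame({
    "player": ["A", "A", "B", "B"],
    "league": ["IPL", "BBL", "IPL", "BBL"],
    "bp_score": [1000.0, 100.0, 10.0, 1000.0],
})

# Normalise with log base 10, then average across leagues for each player.
scores["log_score"] = np.log10(scores["bp_score"])
avg_l_score = scores.groupby("player")["log_score"].mean().rename("Avg L_Score")

# Hypothetical cut points and labels standing in for the official ranges.
bins = [0.0, 2.25, 10.0]
labels = ["lower range", "higher range"]
price_range = pd.cut(avg_l_score, bins=bins, labels=labels)
```

With these placeholder values, player A averages about 2.5 and player B about 2.0, so they land in different ranges.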

The data is then split into training and test sets using stochastic (random) sampling: 80% of the data
is used for training and the remaining 20% is kept for testing.
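
An 80/20 random split of this kind is commonly done with scikit-learn's train_test_split. The sketch below uses random placeholder data, and the seed values are arbitrary choices made here for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 50 players with 3 features each.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = rng.random(50)

# test_size=0.2 keeps 20% of the rows for testing; shuffle=True samples at random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)
```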

2.2 Training using Machine Learning Algorithm


• The Multiple Linear Regression model from scikit-learn is applied to the dataset.
• Multiple Linear Regression is used to explain the relationship between one continuous dependent
variable and two or more independent variables. The independent variables can be continuous or
categorical.
• The Avg_LScore column is the target variable.
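
The training and evaluation steps can be sketched with scikit-learn as shown below. The data is synthetic: the target is constructed as an exact linear combination of three random features, so the example only demonstrates the mechanics of fitting and computing the Root Mean Square Error, not the results of the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the target is an exact linear function of 3 features.
rng = np.random.default_rng(1)
X = rng.random((100, 3))
y = 0.4 * X[:, 0] + 0.4 * X[:, 1] + 0.2 * X[:, 2]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit multiple linear regression on the training portion.
model = LinearRegression()
model.fit(X_train, y_train)

# Root Mean Square Error on the held-out test portion.
predictions = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
```

Because the synthetic target is exactly linear, the fitted coefficients recover the generating weights and the error is close to zero; real performance data would give a larger error.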

2.3.1 IPL Data Cleaning Code


2.3.2 Interactive Python code for New Data Entries


2.3.3 IPL Data Merging Code for Batsmen


2.3.4 Linear Regression Code for Base Price Prediction


3. Visualization

Figure 3.1: Distribution of Players from Different Countries Represented by a Heat Map


Figure 3.2: Distribution of Players from Different Categories in Various Base Prices Set by IPL
Governing Committee

Figure 3.3: Tabular View of Players in Different Categories of Base Price


Figure 3.4: Number of Players from Each Country Categorized by Bowlers, Batsmen and All-Rounders

Figure 3.5: Number of Indian Players vs Overseas Players


Figure 3.6: Number of Capped vs Uncapped from India

Figure 3.7: Word Cloud Representing Number of Participations in Auction from Each Country

References
1. Data source: https://www.espncricinfo.com/
2. Linear Regression https://www.geeksforgeeks.org/ml-linear-regression/
3. General information from https://en.wikipedia.org/wiki/Indian_Premier_League
4. Hyun-Il Lim (2019). IEEE 43rd Annual Computer Software and Applications Conference
(COMPSAC) https://ieeexplore.ieee.org/xpl/conhome/8746989/proceeding
5. Sainathan Ganesh Iyer, Anurag Dipakkumar Pawar (2019). International Conference on Smart
Systems and Inventive Technology (ICSSIT)
https://ieeexplore.ieee.org/xpl/conhome/8966524/proceeding
6. B. Pavlyshenko (2016). IEEE International Conference on Big Data
https://ieeexplore.ieee.org/xpl/conhome/7818133/proceeding


7. Mengyu Huang (2020). International Conference on Computer Vision, Image and Deep Learning
(CVIDL) https://ieeexplore.ieee.org/xpl/conhome/9270288/proceeding
8. Lyn Bartram, Michael Correll, Melanie Tory (2021). Untidy Data: The Unreasonable Effectiveness
of Tables https://research.tableau.com/paper/untidy-data-unreasonable-effectiveness-tables

Acknowledgement
This research paper was written under the mentorship of Professor Alpana Sharma, Head of the
Department of Computer Science at the Institute of Engineering, Jiwaji University, Gwalior.
