The project report focuses on predicting the score of IPL cricket matches using machine learning techniques, specifically employing the Ridge Regression algorithm. It analyzes various factors such as current runs, wickets, and recent performance to forecast the first innings score before its completion. The dataset used spans IPL matches from 2008 to 2017, aiming to enhance prediction accuracy compared to existing methods.


A Project report on

CRICKET MATCH SCORE PREDICTION USING MACHINE LEARNING

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

Submitted by

Javvadi Ganesh 18B91B052
Yendamuri Sai Venkata Santosh 18B91B056
Chakka Chaitanya Srinivas 18B91B050
Valivarthi Venkata Srunjani 19B95B050

Under the Guidance of

P Neelima
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SRKR ENGINEERING COLLEGE (A)
Chinna Amiram, Bhimavaram, West Godavari Dist., A.P.

[2021 – 2022]

SRKR ENGINEERING COLLEGE (A)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Chinna Amiram, Bhimavaram, West Godavari Dist., A.P.

BONAFIDE CERTIFICATE

This is to certify that the Seminar report entitled “CRICKET MATCH SCORE
PREDICTION USING MACHINE LEARNING” is the bonafide work of
JAVVADI GANESH, YENDAMURI SAI VENKATA SANTOSH, CHAKKA
CHAITANYA SRINIVAS, and VALIVARTHI VENKATA SRUNJANI, bearing
18B91B0520, 18B91B0560, 18B91B0508, and 19B95B0505, who carried out
this work under my supervision in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology
in Computer Science and Engineering during 4th year II semester.

SUPERVISOR
P Neelima
Assistant Professor

HEAD OF THE DEPARTMENT
Dr. V. Chandra Sekhar
Professor

SELF DECLARATION

We hereby declare that the project work entitled “CRICKET MATCH SCORE
PREDICTION USING MACHINE LEARNING” is a genuine work carried out by
us in B.Tech. (Computer Science and Engineering) at SRKR Engineering
College (A), Bhimavaram, and has not been submitted either in part or in full for
the award of any other degree or diploma in any other institute or university.

1. J. Ganesh
2. Y. S. V. Santosh
3. C. C. Srinivas
4. V. V. Srunjani

TABLE OF CONTENTS

S NO DESCRIPTION

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
1 INTRODUCTION
2 LITERATURE SURVEY
3 PROBLEM STATEMENT
4 METHODOLOGY
5 IMPLEMENTATION
6 RESULT ANALYSIS
7 CONCLUSION
8 REFERENCES
9 APPENDIX

ABSTRACT

This project presents a model that forecasts the projected score of an IPL cricket match.
The model's output depends on features such as current runs, current wickets fallen, runs
scored in the last 5 overs, and wickets fallen in the last 5 overs. The dataset contains the
history of all IPL matches played between 2008 and 2017. The model can forecast the first
innings score of an IPL match before the first innings is completed. The Ridge regression
algorithm is used to forecast the score. The model focuses mainly on data from the last
5 overs to forecast the final score of the match.

LIST OF TABLES

S NO DESCRIPTION

1 COMPARISON BETWEEN DIFFERENT REGRESSORS

LIST OF FIGURES

DESCRIPTION

MODEL ARCHITECTURE
SCREENSHOT FOR DATA COLLECTION
SCREENSHOT FOR DATA CLEANING
SCREENSHOT FOR DATA EXTRACTION
SCREENSHOT FOR DATA PREPROCESSING
SCREENSHOT FOR DATA SPLITTING
SCREENSHOT FOR BUILDING THE MODEL
SCREENSHOT FOR WEB PAGE
SCREENSHOT FOR FILLING VALUES IN WEB PAGE
ERROR RATES BY USING LINEAR REGRESSION
ERROR RATES BY USING RIDGE REGRESSION
ERROR RATES BY USING LASSO REGRESSION
COMPARISON BETWEEN DIFFERENT FITTING MODELS
USING TRAIN DATA FOR TESTING
ERROR RATES FOR TRAINING DATA
OUTPUT SCREEN

ABBREVIATIONS

ML: Machine Learning.
IPL: Indian Premier League.
BCCI: Board of Control for Cricket in India.
CRR: Current Run-Rate.
KNN: K-Nearest Neighbors.
SVM: Support Vector Machine.

1 INTRODUCTION

Cricket was introduced to North America by English colonists in the 17th century, when
it was among the most popular games of its time, and it is now played in many countries.
Fantasy and prediction apps such as Dream11, launched in 2008, the same year the BCCI
started the IPL, have increased interest in forecasting outcomes such as the first innings
score of a cricket match. As a result, there is high demand for algorithms that predict the
first innings score, especially in high-stakes matches. Machine learning algorithms are a
natural way to make this prediction. They fall into three types: supervised, unsupervised,
and reinforcement learning. Which type applies depends on the application and on the
output required from the model.

Cricket is one of the most-watched sports on television, and it is extremely popular in
countries such as India, Australia, England, New Zealand, and South Africa. A recurring
problem is that the projected first innings score shown during a game often does not match
the actual first innings score. This creates the need for a model that accurately estimates
the first innings score of an IPL match, which will help viewers anticipate the final score
of the match in progress.

A typical Twenty20 match lasts three to four hours, with each innings lasting 75–90
minutes and a 10- to 20-minute break between innings. Each innings is played over 20 overs,
with 11 players on each team. This is far shorter than earlier formats of the game and
closer in length to other popular team sports. The format was introduced to create a
version of the game that would appeal to both spectators at the ground and television viewers.

Many strategies and prediction methods are used to predict the first innings score of a
cricket match, including IPL matches. The CRR method is widely used: the current run rate,
that is, the average number of runs scored per over so far, is multiplied by the total number
of overs in the innings. This approach considers only the runs scored per over and ignores
all other factors, so it estimates the first innings score from the current score alone.
We attempt to improve on the accuracy of this method by incorporating several features
when forecasting the first innings score of a cricket match. We focus on live cricket score
prediction and evaluate it on IPL matches.
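The CRR projection described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example match state are hypothetical, and overs are assumed to be given as a plain decimal number of overs completed.

```python
def crr_projected_score(runs, overs_completed, total_overs=20):
    """Project the final score by extending the current run rate over the full innings."""
    if overs_completed <= 0:
        raise ValueError("overs_completed must be positive")
    current_run_rate = runs / overs_completed  # average runs per over so far
    return current_run_rate * total_overs


# Hypothetical match state: 80 runs after 10 overs of a 20-over innings.
projected = crr_projected_score(runs=80, overs_completed=10)
print(projected)
```

The function takes no wicket information at all, which is exactly the limitation the paragraph above points out: a side that is 80 for 2 and a side that is 80 for 7 receive the same projection.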

The linear, lasso, and ridge regression machine learning techniques were employed to
forecast the IPL first innings score. Linear regression is a supervised technique: the model
is trained on labelled data, for which the correct outputs are already known. It is used to
predict continuous values rather than to classify objects. Ridge regression can be used to
handle multicollinearity in the data.

Supervised machine learning algorithms solve two kinds of problems: regression and
classification. In a classification problem the output is a category, whereas in a regression
problem the desired output is a real value. Recommendation and time series forecasting are
two other common kinds of problem built on classification and regression techniques.
Unsupervised learning, by contrast, aims to model the structure or distribution of the data
in order to understand it better.

2 LITERATURE SURVEY

A significant amount of work has been done over the decades on cricket score prediction
techniques and on machine learning for predicting the first innings final score. Several
recognition and detection algorithms for score prediction have evolved in this field.
The current techniques found in the literature are surveyed below.

Nikhil Dhonge, et al. [1] worked on the algorithms used to predict the IPL first innings
score. They aimed to find a computationally efficient method that predicts the projected
score of the first innings. The classifiers they used are the SVC classifier, decision tree
classifier, and Random Forest classifier. They collected data on all the IPL matches played
from 2008. The dataset consists of 76015 rows and 15 columns, over which they applied feature
selection methods and chose 8 features, of which 7 are input features and 1 is the target
variable. Regression analysis uses various algorithms for the computation and, on that basis,
predicts a continuous value. A set of variables is used as input, and the continuous-valued
quantity is the target variable. The algorithm achieves an accuracy of 80.92% with linear
regression, about 80.84% with ridge regression, and about 80.45% with lasso regression.
They conclude that linear regression gives the highest accuracy for score prediction.

Apurva Lawate, et al. [2] proposed Cricket Match Prediction of Projected Score and Winner
Prediction using machine learning. They compared the accuracy of different algorithms:
linear regression, ridge regression, and a multilayer perceptron neural network. Data from
the past 10 years of IPL matches is used to build the dataset. The data in the dataset is
dated from 2009 to 2019. The dataset is split into two parts: the data from 2008 to 2016 is
used to train the models, and the data from 2017 onwards is used to test the model. They
implemented an algorithm that can predict an accurate projected score in the middle of an
ongoing match, and deployed it as a web application using Flask. The model gives an accuracy
of 77.286% with linear regression and about 74.236% with ridge regression.

R. Kamble, et al. [3] studied Cricket Score Prediction Using Machine Learning. They
developed a model that can predict the score of a team after 20 overs from the current
situation. They implemented Naive Bayes, Random Forest, multiclass SVM, and decision tree
classifiers to build the prediction models for each of the problems, and found the Random
Forest classifier to be the most accurate for each problem. Data from the past 5 years of
IPL matches is used to build the dataset. The dataset is divided into training and testing
parts in an 80:20 ratio: 80 percent of the data is used for training and 20 percent for
testing. Linear regression is used in this model; it repeatedly performs the same task to
give a more accurate score. The model is used for predicting and calculating the final
values. They implemented a system that is useful for making key decisions. The database is
updated on each prediction, and the system works efficiently with a very large dataset.

T. Suvarna Kumari, et al. [4] worked on Cricket Match Score Prediction using the k-Nearest
Neighbors algorithm. They proposed a method for predicting the final score of the first
innings. They implemented the KNN algorithm to predict the match scores for the first innings
and second innings datasets, where the class attribute 'X' is the 'Score' and the input
attributes 'ai' (i = 1, 2, 3, …) are relative team strength, home/away, and venue average.
The dataset consists of complete matches, excluding all rain-interrupted and rain-abandoned
games, played between 2000 and 2018 among ODI-playing teams such as India, England, and
Australia. They calculated the error rate for both innings: 24% for the first innings and
16% for the second innings.

Kushooo, et al. [5] used data mining to forecast cricket scores and winners. They devised a
method for predicting and calculating the first innings score in a cricket match. Most sports
predictions are framed as regression or classification problems, both of which are supervised
learning tasks. The output in regression is a continuous value, whereas classification deals
with discrete output. Linear regression appeared to be quite effective for predicting
continuous values, and learning algorithms such as Naive Bayes, Logistic Regression, Neural
Networks, and Random Forests were found to have been used in many previous studies for
classification problems, such as predicting the outcome of matches or classifying players.
From the results, they concluded that Random Forest was the most accurate classifier for both
datasets, with an accuracy of 89.74% for predicting runs scored by a batsman and 93.27% for
predicting wickets taken by a bowler. SVM achieved an accuracy of only 52.35% for predicting
runs and 72.85% for predicting wickets.

Prasad Thorat, et al. [6] proposed efficient and robust algorithms for estimating the
projected score of a cricket match's first innings. The dynamic nature of the game is
generally overlooked when making predictions. They proposed CricFirst Predictor (CFP), a
method that can take the game's dynamic nature into account. The data comes from Kaggle
datasets and is stored in CSV format. The dataset is divided into two parts: training data
and test data for the model. The user inputs include the name of the batting team, the name
of the bowling team, the number of runs scored, the number of overs bowled, the number of
wickets taken, the number of runs scored in the previous 5 overs, and the number of wickets
taken in the previous 5 overs. The algorithm achieves an accuracy of 75.16% with linear
regression, about 67.36% with random forest regression, and about 74.10% with lasso
regression. They conclude that linear regression gives the highest accuracy for score
prediction.

Prateek Gupta, et al. [7] analyzed Cricket Score Forecasting using Neural Networks. They
proposed a system that overcomes the major weakness of manual work, which is time-consuming
and requires manpower to maintain the records and statistics of every player by hand. To
predict continuous values, an LSTM-based neural network is used, where the data from the
previous 18 deliveries serves as context and the score after 18 deliveries is the target.
They used a dataset containing data from more than 10 years (2010 - 2021), filtered for T20
matches, totalling over 2600 matches. The dataset includes data from IPL (2010-2021),
BBL (2011/12-2020/21), PSL (2015/16-2020/21), CPL (2013-2020), Lanka Premier League (2019/20),
and T20 International matches (2010-2021). The most accurate model uses the "ELU" activation
function on the output layer and Mean Squared Error as the loss function, and it beats the
score predicted by run rate. They concluded that the model produces a Root Mean Square Error
of 7.0, compared with 18.11 for the run rate-based prediction.

Sudhanshu Akarshe, et al. [8] worked on Cricket Score Prediction using Machine Learning
Algorithms, drawing on several machine learning algorithms. They proposed a model to predict
the match result and the performance of each player from historical data. They collected
data for all possible matches from various websites such as ESPN and Kaggle. Usually, a
single algorithm is used in such a prediction system and its performance is measured in
isolation. Instead, they aim to use multiple machine learning algorithms and compare their
performance.

The Indian Premier League was the subject of Harshit Barot, et al. [9]. They presented a
system that uses machine learning techniques such as SVM, Logistic Regression, Random Tree,
Random Forest, and Naive Bayes to predict the outcomes of IPL games. The best team is
determined by looking at the past five years' records, from 2015 to 2019. They used two
different datasets. The first gives ball-by-ball data for every IPL match ever played,
including the batsman, bowler, runs, wicket, and more, for every single ball of the game.
The second dataset contains a summary of each match, including the teams involved, the
winner, the toss winner, and other data for each match played in the IPL. They concluded
that the lowest accuracy obtained is 82.6% (Naive Bayes), rising to 86.9% with the decision
tree algorithm and 94.6% with logistic regression.

Arjun Singhvi and colleagues [10] studied the outcome of a Twenty20 cricket match. The T20
format of cricket is highly unpredictable, which is one reason for its current popularity.
They proposed a model that uses machine learning algorithms such as logistic regression,
support vector machines, Bayesian networks, decision trees, and random forests to predict the
outcomes of IPL games. Only data from international, domestic, and league T20 cricket matches
was included because it gave the most accurate picture of how a player would perform in a T20
contest. ESPN provided a list of 5390 Twenty20 matches. They concluded that AdaBoost has the
highest accuracy, 62%, while decision trees have the lowest.

Rameshwari, et al. [11] used a linear regression classifier to predict the live cricket score
and the winner. They built a model with two methods. The first estimates the score of the
first innings based on the current run rate as well as the number of wickets lost, the match
venue, and the batting and bowling teams. The second method uses the same attributes as the
first, plus the batting team's target, to estimate the outcome of the match in the second
innings. For the first and second innings, these two methods were developed using the Linear
Regression Classifier or a Q-Learning based decision tree approach and the Naive Bayes
Classifier, respectively. They obtained the data from http://cricsheet.org.

G. Sudhamathy, et al. [12] used machine learning techniques in the R package to predict IPL
data. They presented a model that focused on validation and measured the difference between
the models to predict which side would be the stronger in an IPL match. The algorithm first
analyzes the data to create a model that can be used to find patterns and trends. The model
is optimized by selecting parameters and iterating to build the mining model. The parameters
are then applied to the dataset to identify meaningful patterns and detailed statistics.
They used the functions of the R package to find meaningful information about the IPL teams.
They mainly used Decision Tree, Naive Bayes, K-Nearest Neighbor, and Random Forest as machine
learning algorithms. They collected the data.

Cricket Match Analytics Using the Big Data Approach was proposed by Mazhar Javed Awan,
et al. [13]. They proposed a linear regression model-based approach to predict team scores
without using big data or the Spark ML big data framework. The winner has been predicted
using techniques such as logistic regression, KNN, Naive Bayes, and SVM, among others. They
gathered data from websites such as Kaggle, Cricsheet, and others, and then applied
ball-by-ball details along with various constraints. The prepared data was divided into two
sets: training (80%) and testing (20%). They concluded that, after applying linear
regression, the accuracy, root mean square error (RMSE), mean square error (MSE), and mean
absolute error (MAE) are 95%, 30.2, 1350.34, and 28.2, respectively.

Aman Sahua, et al. [14] focused on a logistic regression based Predictive Analysis of
Cricket. They processed data and made recommendations using Google Colab, a data analysis
tool. Random Forest Classifier, multinomial logistic regression, and AdaBoost are some of
the algorithms they use. Improved models can help decision makers during cricket matches by
allowing them to compare a team's strengths with those of the opposition, along with
environmental factors. They used a label encoder for preprocessing the dataset and a random
forest classifier as the ensemble technique. The data was subjected to logistic regression,
which showed 74.9 percent accuracy in predicting game outcomes and 81 percent accuracy in
predicting the winning team. From the results, they concluded that the Random Forest
Classifier, Multinomial Logistic Regression, and Adaptive Boosting algorithms achieved
accuracies of 98.14%, 29.62%, and 1.0 for the training model and 89.47%, 27.63%, and 84.21%
for the testing model.

Akhil Nimmagadda, et al. [15] used data mining and the Random Forest technique to forecast
cricket score and winner. They devised an algorithm to predict the outcome of One-Day
International cricket matches, in which they evaluate the batting and bowling potential of
the 22 players taking part in the match using their career statistics and active
participation in recent games. They used player potential to express one team's relative
dominance over the other. They use supervised learning algorithms to predict the match
winner using certain other base features, such as run rate and the match venue, as well as
relative team strength. The model was built using Multiple Variable Linear Regression.

Wickramasinghe, et al. [16] focused on using a Naive Bayes classifier to predict the winner
of an ODI cricket match. They proposed an algorithm to predict the winner after the first
innings of an ODI match. They produce this prediction using the Naive Bayes (NB) method with
data from 15 features, which include variables related to batting, bowling, team
composition, and other factors. After building an initial model, they use feature selection
methods, namely univariate selection, recursive elimination, and principal component
analysis (PCA), to increase the accuracy of predicting the winner. Based on the results of
the experiments, they found that univariate feature selection with an 85:15 training to
testing sample size ratio achieves the highest accuracy in predicting the winning team. The
highest prediction accuracy is 86.71 percent (i.e., the lowest error rate is 13.49 percent),
which is excellent for an ODI cricket match.

Jalaz Kumar, et al. [17] proposed using decision trees and MLP networks to predict the
outcome of ODI cricket matches. They use a decision tree classifier and a multilayer
perceptron as algorithms. They gathered statistics from all ODI matches from January 5, 1971,
through October 29, 2017. There were 3933 ODI match results that were dropped. They presented
these methods as an improvement on classical statistics because, unlike statistics, which
uses mathematical equations to define relationships between variables, they need no prior
assumptions about the input variables and their underlying relationships.

D. Jyothsna, et al. [18] used Linear Regression, Decision Tree, K-means, and Logistic
Regression to analyze and predict the outcome of IPL cricket data. They gathered data from
the IPL over the most recent seven years, including player data, match venue data, team
data, and ball-by-ball data, and analyzed it to reach various conclusions that help improve
a player's performance. Other factors, such as how the venue or the toss decision affected
the match's outcome over the past seven years, are also determined. According to the
findings, Random Forest is the most accurate classifier, predicting the best player
performance with an accuracy of 89.15 percent.

Rameshwari Lokhande, et al. [19] focused on Live Cricket Score and Winning Prediction
using Linear Regression and Naive Bayes Classifier. They proposed an approach for building
a model that predicts the first innings final score and estimates the match result in the
second innings of a limited-overs cricket match. The toss, the teams' ODI rankings, and
home team advantage are factors in the projections. Based on earlier matches, two separate
models, one for the first innings and the other for the second innings, have been proposed
using the Linear Regression classifier and the Naive Bayes classifier, respectively. Instead
of linear regression, a reinforcement algorithm is used.

Predicting the Winner in One-Day International Cricket was a project undertaken by Ananda
Bandulasiri and colleagues [20]. They proposed a statistical analysis of the significance
of "home field" advantage in One-Day International cricket. They also noted the curious
result that winning the coin toss is a disadvantage in daytime matches. They found, however,
that winning the coin toss gives a team an advantage in "Day and Night" matches. The best
current method for revising targets in interrupted cricket matches is the Duckworth-Lewis
method, which appears to statistically favor the team most affected by the interruption.
They concluded that this is not a fair comparison because the DL approach does not have the
same amount of information to decide the true winner.

3 PROBLEM STATEMENT

The aim is to design a system that can predict the first innings score of a cricket
match. The system analyzes parameters such as batting team, bowling team, overs completed,
runs scored up to a given over, wickets fallen up to a given over, runs scored in the
previous 5 overs, and wickets fallen in the previous 5 overs. The goal is to predict the
result of an IPL match using machine learning algorithms such as Logistic Regression,
Linear Regression, Ridge Regression, SVM, Lasso Regression, and Random Forest. We used 15
features: mid, date, venue, batting team, bowling team, batsman, bowler, runs, wickets,
overs, runs scored in last 5 overs, wickets fallen in last 5 overs, striker, non-striker,
and total.

4 METHODOLOGY
4.1 DESCRIPTION OF THE DATASET:
IPL DATASET:
The dataset used in this project is ipl.csv, obtained from the Kaggle website. It
contains information about every ball of all IPL matches from 2008 to 2017. The dataset
contains 1140210 tuples. Each tuple contains 15 attributes. The target feature is the
15th feature, a real-valued attribute that ranges from 0 upward. The features in this
dataset are described as follows:

mid: the match id.
date: the date on which the match was held, in DD-MM-YYYY format.
venue: the place where the match is held.
bat_team: the name of the batting team.
bowl_team: the name of the bowling team.
batsman: the name of the batsman on strike.
bowler: the name of the bowler.
runs: the total number of runs scored up to that ball.
wickets: the number of wickets fallen.
overs: the over and the ball within that over being bowled.
runs_last_5: runs scored in the last 5 overs.
wickets_last_5: wickets fallen in the last 5 overs.
striker: the batsman who is ready to receive the ball.
non-striker: the batsman who is in but not receiving the ball.
total: the total number of runs scored after 20 overs in that match.
4.2 MODEL ARCHITECTURE:

We plan to build a model that can predict the first innings score of a live
IPL match accurately. The model should take into account the different
parameters that contribute to the score prediction.

DATA COLLECTION:
We take the dataset from those available on Kaggle, in CSV format. The data
collected from the site is cleaned in the next stage.

DATA CLEANING:
In the data cleaning step, we remove unwanted columns such as match id, venue, name of
the batsman, name of the bowler, the score of the striker, and the score of the
non-striker. These columns are not needed for prediction, so we drop them. In the IPL
dataset, a few teams are not consistent: they played for only a few years. We therefore
remove those teams from the dataset and consider only the consistent teams. We consider
only the data after 5 overs. The date column in the dataset is stored as a string, but we
need to apply some operations to it, so we convert the string to a date-time object.
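The cleaning steps above can be sketched as follows. This is a dependency-free illustration rather than the project's own code: rows are plain dictionaries, the set of "consistent" teams is a hypothetical two-team subset, and the sample rows (including "Stadium A" and "Team X") are made up. The date format follows the DD-MM-YYYY description given for the dataset.

```python
from datetime import datetime

# Columns dropped before modelling, as listed in the cleaning step.
DROPPED_COLUMNS = {"mid", "venue", "batsman", "bowler", "striker", "non-striker"}

# Hypothetical subset of "consistent" teams, for illustration only.
CONSISTENT_TEAMS = {"Chennai Super Kings", "Mumbai Indians"}


def clean_rows(rows):
    """Return cleaned copies of ball-by-ball rows (each row is a dict)."""
    cleaned = []
    for row in rows:
        # Keep only deliveries where both teams are consistent teams.
        if row["bat_team"] not in CONSISTENT_TEAMS or row["bowl_team"] not in CONSISTENT_TEAMS:
            continue
        # Keep only data from 5 overs onwards.
        if row["overs"] < 5.0:
            continue
        kept = {key: value for key, value in row.items() if key not in DROPPED_COLUMNS}
        # Convert the date string to a datetime object.
        kept["date"] = datetime.strptime(row["date"], "%d-%m-%Y")
        cleaned.append(kept)
    return cleaned


def make_row(**overrides):
    """Build one made-up ball-by-ball row; keyword arguments override the defaults."""
    row = {
        "mid": 1, "date": "18-04-2008", "venue": "Stadium A",
        "bat_team": "Chennai Super Kings", "bowl_team": "Mumbai Indians",
        "batsman": "A", "bowler": "B", "runs": 61, "wickets": 1, "overs": 7.2,
        "runs_last_5": 40, "wickets_last_5": 1, "striker": 20, "non-striker": 15,
        "total": 170,
    }
    row.update(overrides)
    return row


sample_rows = [
    make_row(),                   # kept
    make_row(overs=2.1),          # dropped: before 5 overs
    make_row(bat_team="Team X"),  # dropped: not a consistent team
]
cleaned_rows = clean_rows(sample_rows)
print(cleaned_rows)
```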

DATA PREPROCESSING:
After cleaning the data, we preprocess it. In the data preprocessing step, we perform
one-hot encoding, which is explained in detail in the implementation section. We also
rearrange the columns of the dataset in this step so that they are arranged in a
consistent order.
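As a rough illustration of one-hot encoding, the sketch below replaces a categorical column with one 0/1 column per category. The helper name, the column naming scheme, and the two-team category list are assumptions made for this example; the report's own implementation may differ (for instance, pandas offers `get_dummies` for the same purpose).

```python
def one_hot_encode(rows, column, categories):
    """Replace a categorical column with one 0/1 column per category."""
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical category list and a single made-up row.
teams = ["Chennai Super Kings", "Mumbai Indians"]
rows = [{"bat_team": "Mumbai Indians", "runs": 61}]
encoded_rows = one_hot_encode(rows, "bat_team", teams)
print(encoded_rows)
```

Passing the category list explicitly fixes the column order, which matters for the column rearrangement mentioned above: training data and user input must present the encoded columns in the same sequence.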

DATA SPLITTING:
After data preprocessing, we split the data so that IPL matches played before 2016
are used for training the model and IPL matches played after 2016 are used for
testing the model.
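A time-based split of this kind can be sketched as below. The text does not say on which side the 2016 season itself falls; this sketch assumes it goes into the training set, and the parameter name `last_training_year` and the sample rows are hypothetical.

```python
from datetime import datetime


def split_by_year(rows, last_training_year=2016):
    """Split rows into train and test sets by the year of the match date."""
    # Assumption: matches up to and including last_training_year are used for training.
    train = [row for row in rows if row["date"].year <= last_training_year]
    test = [row for row in rows if row["date"].year > last_training_year]
    return train, test


# Made-up rows with already parsed dates.
rows = [
    {"date": datetime(2015, 5, 1), "total": 160},
    {"date": datetime(2016, 4, 20), "total": 175},
    {"date": datetime(2017, 4, 9), "total": 182},
]
train_rows, test_rows = split_by_year(rows)
print(len(train_rows), len(test_rows))
```

Splitting by date rather than at random keeps every ball of a given match on one side of the split, so the model is never tested on a match it has partly seen during training.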

MODEL GENERATION:
We use the Linear Regression model, the Random Forest Regression model, and the Lasso
Regression model to predict the result from the input. These models are generated
using training data from the input dataset and tested using test data from the input
dataset. They are then used to predict the output on new data.

FINAL PREDICTION:
Finally, in order to predict the output, the input values are taken from the user, and
based on these inputs a range for the final score is displayed on the web page.
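One simple way to turn a single predicted total into a displayed range is to add and subtract a fixed margin. The sketch below assumes a margin of 5 runs; that value, the function name, and the example prediction are hypothetical and are not taken from the report.

```python
def score_range(predicted_total, margin=5):
    """Turn a point prediction into a (low, high) range for display."""
    center = round(predicted_total)
    return center - margin, center + margin


# Hypothetical model output of 167.4 runs.
low, high = score_range(167.4)
print(f"Predicted final score: {low} to {high}")
```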

The following is the model architecture for the CFP framework.

[Figure: model architecture diagram; the only label recovered from it is "Model Generation (Ridge Regression)"]

Fig 4.1 Model Architecture

4.3 MACHINE LEARNING ALGORITHM:

RIDGE REGRESSION:
Ridge regression is used to build a parsimonious model when the number of predictor
variables in a set exceeds the number of observations, or when a data set has
multicollinearity (correlations between predictor variables). Ridge regression belongs
to L2 regularization. L2 regularization adds an L2 penalty, which equals the sum of the
squared magnitudes of the coefficients. Unlike L1 regularization (used by lasso
regression), which sometimes eliminates coefficients altogether and can yield sparse
models, the L2 penalty shrinks coefficients without setting them exactly to zero.

Linear Regression assumes a linear relationship between the input variables and the
target variable. If the output depends on a single input variable, the fitted model is
a line; if it depends on multiple input variables, the fitted model is a hyperplane.
The coefficients of the model are found through an optimization process that seeks to
minimize the sum of squared errors between the predictions (yhat) and the expected
target values (y).
loss = sum i=0 to n (y_i – yhat_i)^2

In linear regression, the estimated coefficients of the model can sometimes become
large, making the model sensitive to its inputs and possibly unstable. This is
especially true for problems with few observations, or with fewer observations (n)
than input predictors (p).

One approach to address the stability of regression models is to change the loss
function to include additional costs for a model that has large coefficients. Linear
regression models that use these modified loss functions during training are referred
to collectively as penalized linear regression.

One popular penalty is to penalize a model based on the sum of the squared
coefficient values (beta). This is called an L2 penalty.

l2_penalty = sum j=0 to p beta_j^2

An L2 penalty minimizes the size of all coefficients, but it does not remove any
coefficient from the model, because it does not force coefficient values to become
exactly zero.

The effect of this penalty is that the parameter estimates are only allowed to
become large if there is a proportional reduction in the sum of squared errors (SSE).
In effect, this method shrinks the estimates towards 0 as the lambda penalty becomes
large (these techniques are sometimes called "shrinkage methods").

This penalty can be added to the cost function for linear regression and is referred
to as Tikhonov regularization (after the author), or Ridge Regression more generally.

A hyperparameter called "lambda" controls the weighting of the penalty in the loss
function. A default value of 1.0 fully weights the penalty; a value of 0 excludes the
penalty. Very small values of lambda, such as 1e-3 or smaller, are common.

ridge_loss = loss + (lambda * l2_penalty)
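
The three formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small made-up dataset. The helper names `ridge_loss` and `ridge_fit` and the data values are illustrative assumptions and are not part of the project code. The closed-form fit omits the intercept for brevity, which differs from the scikit-learn model used in the implementation.

```python
import numpy as np

def ridge_loss(X, y, beta, lam):
    """Return (loss, l2_penalty, ridge_loss) for the coefficients beta."""
    yhat = X @ beta                     # predictions
    loss = np.sum((y - yhat) ** 2)      # sum of squared errors
    l2_penalty = np.sum(beta ** 2)      # sum of squared coefficients
    return loss, l2_penalty, loss + lam * l2_penalty

def ridge_fit(X, y, lam):
    """Closed-form ridge solution without intercept: (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Made-up data (illustrative only): y is exactly 1*x1 + 2*x2.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([5.0, 4.0, 11.0, 10.0])

# A larger lambda shrinks the coefficients towards zero.
for lam in (0.0, 1.0, 100.0):
    beta = ridge_fit(X, y, lam)
    print(lam, beta, ridge_loss(X, y, beta, lam))
```

With lambda set to 0 the fit reduces to ordinary least squares; as lambda grows, the coefficient vector gets smaller while the squared error grows.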

5 IMPLEMENTATION

The implementation uses a web application built with Flask. The models are trained
in Jupyter Notebook using the scikit-learn (sklearn) library. Below are the steps for
implementing the machine learning model, along with the training of the model.

5.1 DATA COLLECTION:

Fig 5.1 Screenshot for Data Collection

Figure 5.1 shows a snapshot of the code used to load the data from an external
source. Any model needs input data, called a dataset, in order to be trained. There
are many ways to collect data from the web; here we use a dataset called ipl.csv,
collected from the Kaggle website. The dataset consists of 1140210 tuples and each
tuple contains 15 features.

5.2 DATA EXTRACTION:

Fig 5.2 Screenshot for Data Cleaning

Figure 5.2 shows a snapshot of the data that is needed for building the model.
The above code removes the unwanted features from the dataset. Here the unwanted
features are mid, venue, batsman, bowler, striker and non-striker.
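
The column removal can be sketched as follows. This is a minimal example on a made-up two-row frame that stands in for ipl.csv; the column names follow the appendix code, while the row values and the variable names `df` and `cleaned` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows standing in for ipl.csv (illustrative values only).
df = pd.DataFrame({
    "mid": [1, 1],
    "date": ["2008-04-18", "2008-04-18"],
    "venue": ["Stadium A", "Stadium A"],
    "bat_team": ["Kolkata_Knight_Riders"] * 2,
    "bowl_team": ["Royal_Challengers_Bangalore"] * 2,
    "batsman": ["Player A", "Player B"],
    "bowler": ["Player C", "Player C"],
    "runs": [61, 62],
    "wickets": [0, 0],
    "overs": [5.1, 5.2],
    "runs_last_5": [59, 59],
    "wickets_last_5": [0, 0],
    "striker": [40, 40],
    "non-striker": [10, 11],
    "total": [222, 222],
})

# Columns that are not used as model inputs.
cols_to_remove = ["mid", "venue", "batsman", "bowler", "striker", "non-striker"]
cleaned = df.drop(columns=cols_to_remove)
print(list(cleaned.columns))
```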

Fig 5.3 Screenshot for Data Extraction

Figure 5.3 shows a snapshot of the code used to keep only the consistent teams that
have played since the start of the IPL. The model cannot predict the score reliably
from the first 5 overs of data, so the above code removes the first 5 overs from the
dataset. The date column is initially in string format and is converted to a datetime
object.
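
The three extraction steps can be sketched on a small made-up frame. The team list is taken from the appendix code; the rows, the extra team name used as an example of a non-consistent team, and the variable names `keep` and `filtered` are assumptions for illustration. `pd.to_datetime` is used here in place of the `datetime.strptime` call in the appendix.

```python
import pandas as pd

# Made-up rows standing in for ipl.csv (illustrative values only).
df = pd.DataFrame({
    "date": ["2008-04-18", "2008-04-18", "2011-05-01", "2017-04-05"],
    "bat_team": ["Kolkata_Knight_Riders", "Kolkata_Knight_Riders",
                 "Kochi_Tuskers_Kerala", "Sunrisers_Hyderabad"],
    "bowl_team": ["Royal_Challengers_Bangalore", "Royal_Challengers_Bangalore",
                  "Delhi_Daredevils", "Royal_Challengers_Bangalore"],
    "overs": [4.5, 5.1, 10.2, 7.3],
})

consistent_teams = [
    "Kolkata_Knight_Riders", "Chennai_Super_Kings", "Rajasthan_Royals",
    "Mumbai_Indians", "Kings_XI_Punjab", "Royal_Challengers_Bangalore",
    "Delhi_Daredevils", "Sunrisers_Hyderabad",
]

# Keep rows where both teams are consistent and at least 5 overs are bowled.
keep = (
    df["bat_team"].isin(consistent_teams)
    & df["bowl_team"].isin(consistent_teams)
    & (df["overs"] >= 5.0)
)
filtered = df[keep].copy()

# Convert the date strings to datetime objects.
filtered["date"] = pd.to_datetime(filtered["date"], format="%Y-%m-%d")
print(filtered)
```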

5.3 DATA PREPROCESSING:

Fig 5.4 Screenshot for Data Preprocessing

Figure 5.4 shows a snapshot of the code used to convert the categorical features with
the one-hot encoding method. This type of encoding creates a new binary feature for
each possible category of the columns bat_team and bowl_team. The batting team entered
by the user is replaced by 1 and all the other batting-team features are replaced by 0.
Likewise, the bowling team entered by the user is replaced by 1 and all the other
bowling-team features are replaced by 0.
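
A minimal sketch of this encoding with `pandas.get_dummies`, assuming a made-up two-row frame with only two teams; the `dtype=int` argument is chosen here so that the output shows 0 and 1 rather than booleans.

```python
import pandas as pd

# Made-up rows (illustrative values only).
df = pd.DataFrame({
    "bat_team": ["Chennai_Super_Kings", "Mumbai_Indians"],
    "bowl_team": ["Mumbai_Indians", "Chennai_Super_Kings"],
    "runs": [45, 61],
})

# One binary column is created per team for each of the two team columns.
encoded = pd.get_dummies(df, columns=["bat_team", "bowl_team"], dtype=int)
print(encoded)
```

Each row has exactly one 1 among the bat_team columns and one 1 among the bowl_team columns.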

5.4 DATA SPLITTING:

Fig 5.5 Screenshot for Data Splitting

Figure 5.5 shows how the entire dataset is split into 2 parts. The original dataset
contains the IPL matches from 2008 to 2017. This code divides the dataset into training
and test sets, where the training data is taken from 2008 to 2016 and the test data is
taken from the year 2017. Train data contains 821260 tuples and test data contains
61116 tuples.
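
The season-based split can be sketched as follows, assuming a made-up frame whose date column is already a datetime; the variable names are illustrative and differ from those in the appendix.

```python
import pandas as pd

# Made-up rows (illustrative values only).
df = pd.DataFrame({
    "date": pd.to_datetime(["2008-04-18", "2016-05-29", "2017-04-05", "2017-05-21"]),
    "runs": [61, 80, 45, 70],
    "total": [222, 208, 207, 129],
})

# Seasons up to 2016 go to training, 2017 goes to testing.
is_train = df["date"].dt.year <= 2016

X_train = df.loc[is_train].drop(columns=["total", "date"])
y_train = df.loc[is_train, "total"].values
X_test = df.loc[~is_train].drop(columns=["total", "date"])
y_test = df.loc[~is_train, "total"].values

print(X_train.shape, X_test.shape)
```

Splitting by season rather than at random keeps later matches out of the training set, so the test error reflects prediction on unseen seasons.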

5.5 BUILDING THE MODEL:

Fig 5.6 Screenshot for Building the Model

Figure 5.6 shows a snapshot of the code that builds the model for score prediction.
The main step of the entire project is building a model to process the collected data.
Here we have used Ridge regression because it gives output about as accurate as linear
regression, while also helping to protect the model from overfitting. Grid search is a
hyperparameter tuning technique: it takes the candidate values from a dictionary, tries
each combination, and assesses the model for each one using cross-validation.
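
A minimal sketch of grid search over the ridge penalty, assuming scikit-learn and synthetic data in place of the IPL features. In scikit-learn the penalty weight (lambda in Section 4.3) is named `alpha`. The data, the shortened grid and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Candidate penalty weights; each one is scored with 5-fold cross-validation.
param_grid = {"alpha": [1e-3, 1e-2, 1, 5, 10, 20]}
search = GridSearchCV(Ridge(), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)

# best_score_ is a negated MSE, so flip the sign to report the error.
print(search.best_params_, -search.best_score_)
```

After fitting, the `search` object predicts with the best model found, which is how the appendix code uses it.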

5.6 FINAL PREDICTION:

A user interface is created by using flask and web templates to take inputs from the
user and to display the range of score predicted by the ridge regressor model.

Fig 5.7 Screenshot for web page

When the user fills in all the fields on the above web page, a list of 16 one-hot
variables is created for the two teams. For example, if the batting team is Bengaluru
and the bowling team is Rajasthan, the backend encodes them as
[0,0,0,0,0,0,1,0] + [0,0,0,0,0,1,0,0], following the team order in the Flask code in
the appendix. These 2 lists are combined and the remaining numeric inputs from the
HTML form (overs, runs, wickets, runs in the previous 5 overs and wickets in the
previous 5 overs) are appended. The forecast final score is then increased by 10 to get
the maximum expected score and decreased by 10 to get the minimum expected score in
that match. This range is displayed to the user as the forecast. For example, if the
model forecasts 180 as the final score for the first innings, then 170 to 190 is
displayed as the final output.
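
The input encoding and the displayed range can be sketched without Flask. The team order follows the Flask code in the appendix; the helper names `encode_team`, `build_features` and `score_range` and the sample inputs are assumptions for illustration.

```python
# Team order as used by the Flask code in the appendix.
TEAMS = [
    "Chennai_Super_Kings", "Delhi_Daredevils", "Kings_XI_Punjab",
    "Kolkata_Knight_Riders", "Mumbai_Indians", "Rajasthan_Royals",
    "Royal_Challengers_Bangalore", "Sunrisers_Hyderabad",
]

def encode_team(team):
    """One-hot list with a 1 at the position of the given team."""
    return [1 if t == team else 0 for t in TEAMS]

def build_features(bat_team, bowl_team, overs, runs, wickets,
                   runs_prev_5, wickets_prev_5):
    """Concatenate both one-hot lists and the five numeric inputs."""
    return (encode_team(bat_team) + encode_team(bowl_team)
            + [overs, runs, wickets, runs_prev_5, wickets_prev_5])

def score_range(prediction, margin=10):
    """Lower and upper limits shown to the user."""
    p = int(prediction)
    return p - margin, p + margin

# Made-up match state (illustrative values only).
features = build_features("Royal_Challengers_Bangalore", "Rajasthan_Royals",
                          10.2, 85, 2, 40, 1)
print(len(features), features[:16])
print(score_range(180.4))
```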

Fig 5.8 Screenshot for filling values in web page

6 RESULT ANALYSIS

This project is implemented using 3 different regression techniques: Linear
regression, Ridge regression and Lasso regression. The error values differ across
these 3 techniques. The error values of the 3 techniques are compared in the
following way.

6.1 BY USING LINEAR REGRESSION:

Fig 6.1 Error Rates by using Linear Regression

Figure 6.1 shows the 3 error measures obtained with the linear regression technique.
The mean absolute error is 12.11861, the mean squared error is 251.00792 and the
root mean squared error is 15.84322.

6.2 BY USING RIDGE REGRESSION:

Fig 6.2 Error Rates by using Ridge Regression

Figure 6.2 shows the 3 error measures obtained with the ridge regression technique.
The mean absolute error is 12.11729, the mean squared error is 251.03172 and the
root mean squared error is 15.84398.

6.3 BY USING LASSO REGRESSION:

Fig 6.3 Error Rates by using Lasso Regression

Figure 6.3 shows the 3 error measures obtained with the lasso regression technique.
The mean absolute error is 12.21405, the mean squared error is 262.37973 and the
root mean squared error is 16.19813.

6.4 COMPARISON BETWEEN 3 DIFFERENT REGRESSORS:

From the above 3 figures it is clear that the root mean squared error and the mean
squared error are lowest when the model is implemented with Linear regression, while
the mean absolute error is lowest when the model is implemented with Ridge regression.

On the test data, Linear regression therefore gives the most accurate result overall
compared with Ridge regression and Lasso regression, although its margin over Ridge
regression is very small.
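
The three error measures are related: the root mean squared error is the square root of the mean squared error. The following minimal sketch, assuming NumPy, computes the measures for made-up values and recomputes the root mean squared error from the mean squared error values reported in this chapter. The helper name `regression_errors` and the sample totals are assumptions for illustration.

```python
import math
import numpy as np

def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    return mae, mse, math.sqrt(mse)

# Made-up final scores and predictions (illustrative only).
print(regression_errors([160, 180, 150], [170, 175, 165]))

# Mean squared errors reported for the three regressors.
reported_mse = {
    "linear": 251.00792310417438,
    "ridge": 251.03172964112733,
    "lasso": 262.37973664007154,
}
for name, mse in reported_mse.items():
    print(name, math.sqrt(mse))
```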

Error Measure             Linear Regression     Ridge Regression      Lasso Regression

Mean Absolute Error       12.118617546193294    12.117294527005017    12.214053814850248

Mean Squared Error        251.00792310417438    251.03172964112733    262.37973664007154

Root Mean Squared Error   15.843229566732106    15.843980864704656    16.198139912967523

Comparison between different regressors

Fig 6.4 Comparison between different Fitting Models

OVERFITTING:

Overfitting occurs when a model works well on train data but does not give
accurate output for test data.

UNDERFITTING:

Underfit models are those which do not work well on either the training data or the
testing data.

GOOD FIT:

A good-fit model works well on both the training dataset and the test dataset and
gives accurate outputs (around 70 to 80%) for both datasets.

Fig 6.5 Using train data for testing

Fig 6.6 Error rates for training data

Figure 6.6 shows the error rates for the training data from 2010 to 2016. Here the
error rates are about the same as for the test data, so the models are not overfitting
our data. They give accurate outputs on both test and training data, and are therefore
a good fit for our dataset.
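
This check can be sketched by comparing the error on the training data with the error on held-out data. The example below is a minimal sketch assuming scikit-learn and synthetic data in place of the IPL features; the data, the split point and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic regression data with noise of standard deviation 1 (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=1.0, size=300)

# Hold out the last 60 rows for testing.
X_train, y_train = X[:240], y[:240]
X_test, y_test = X[240:], y[240:]

model = Ridge(alpha=1.0).fit(X_train, y_train)

rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Similar values suggest the model is not overfitting.
print(rmse_train, rmse_test)
```

A training error far below the test error would indicate overfitting; large errors on both would indicate underfitting.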

7 CONCLUSION

This project forecasts the first innings score of an IPL match using three machine
learning algorithms: ridge regression, linear regression and lasso regression. A
dataset named ipl.csv is used. This dataset contains 15 features, of which the main
ones used are batting team, bowling team, runs scored up to the current over, wickets
fallen up to the current over, overs completed, runs scored in the last 5 overs and
wickets fallen in the last 5 overs. The project takes these inputs and forecasts the
score of an IPL match as output. The root mean squared errors of the ridge regression,
linear regression and lasso regression models are 15.8439, 15.8432 and 16.1981
respectively. Linear regression gives a slightly lower root mean squared error than
ridge regression and lasso regression, but ridge regression helps protect the model
from overfitting.

8 REFERENCES

1. Nikhil Dhonge, Shraddha Dhole. "Ipl cricket score forecast utilizing AI methods" Research

Journal of Computer Science and Technology, Volume:05/Issue:04/May-2021

2. Apurva Lawate , Nomesh Katare. “Cricket Prediction of projected Score


and Winner Prediction” Journal of Computer and Communication Engineering
Vol. 12, Issue 4, February 2021
3. R. R. Kamble , Nidhi Koul. “IPL Score Prediction by using Machine Learning
Algorithm” Journal of Computer Science and Engineering Vol.10 , 2020
4. T. Suvarna Kumari, P.Narsaiah. “Match Score Prediction using k-Nearest
Neighbors Algorithm” IJRECE VOL. 9 ISSUE 5 Apr - July 2018
5. Kushooo , Nisha. “IPL Score and winner prediction by using data mining”
Journal of Multi-Disciplinary Volume 5, Issue 4, February 2020
6. Prasad Thorat, Vighnesh Buddhivant. “Cricket score prediction” IJCRT |
Volume 9, Issue 5 May 2021
7. Prateek Gupta, Navya Sanjna Joshi. “Cricket Score Forecasting using Neural
Networks” I Journal of Engineering and Technology, Volume-11 Issue-4, June 2020
8. Sudhanshu Akarshe, Rohit Khade. “Cricket Score Prediction using Machine Learning
Algorithms” GRD Journal for Engineering | Volume 8 | Issue 7 | September 2018

9. Navya Sanjna Joshi. “Prediction for the Indian Premier League” 2020
International Conference of Technology July, 2019
10. Arjun Singhvi, Ashish V Shenoy. “Prediction of a Twenty-20 Cricket Match”
GRD Journal for Engineering | Volume 7 | Issue 2 | September 2020
11. Ashish V Shenoy. “Prediction of Live Cricket Score and Winning
Prediction” Journal of Trend in Research and Development, Volume 5
12. Vighnesh Buddhivant. “Prediction of Score of an IPL using machine learning by using
R package” Journal of Computer Science, November 2019, Volume: 2, Issue: 04

13. Devang Kaushik. “Cricket Match Analytics by using Big Data Approach”
Journal of Computer Science and Technology, 26 September 2021

14. Aman Sahua. “Predictive Analysis of Cricket” Journal of Computer Science
and Mathematics Education Vol.14 2020
15. Akhil Nimmagadda , Nidamanuri Venkata Kalyan. “IPL score prediction and
winning prediction using data mining Approach” Journal of Advance Research
and Development Volume 6, Issue 4)
16. Venkata Kalyan. “Machine Learning approach to predict the score of an IPL
cricket game” Journal of Sports Analytics 2018
17. Jalaz. “Score Prediction of IPL Matches using Machine Learning Algorithms”
2018 International Conference of Cyber Computing and Communication
18. Jyothsna. “Predicting the outcome of IPL Cricket Match” Journal of Research
in Science, Engineering and Technology Vol. 6, Issue 4, June 2018
19. Rameshwari Lokhande. “Cricket Live Score Prediction and Winning Prediction”
Journal of Computer Research and Development, Volume 5, Issue 9
20. Nidamanuri Venkata Kalyan, “Predicting the Score in an IPL Cricket”,
Journal of Computer Sciences & Mathematics Education, Vol. 4.

9 APPENDIX
SAMPLE CODES:

CODE FOR BUILDING MACHINE LEARNING MODELS:

import pickle
from datetime import datetime

import pandas as pds

# Load the IPL dataset
dset = pds.read_csv('ipl.csv')

# Remove the columns that are not needed for prediction
col_to_rem = ['mid', 'venue', 'batsman', 'bowler', 'striker', 'non-striker']
dset.drop(labels=col_to_rem, axis=1, inplace=True)

# Keep only the consistent teams
c_t = ['Kolkata_Knight_Riders', 'Chennai_Super_Kings', 'Rajasthan_Royals',
       'Mumbai_Indians', 'Kings_XI_Punjab', 'Royal_Challengers_Bangalore',
       'Delhi_Daredevils', 'Sunrisers_Hyderabad']
dset = dset[(dset['bat_team'].isin(c_t)) & (dset['bowl_team'].isin(c_t))]

# Remove the first 5 overs of every innings
dset = dset[dset['overs'] >= 5.0]

# Convert the date column from string to datetime
dset['date'] = dset['date'].apply(lambda y: datetime.strptime(y, '%Y-%m-%d'))

# One-hot encode the batting and bowling teams
enc_dset = pds.get_dummies(data=dset, columns=['bat_team', 'bowl_team'])

# Rearrange the columns into a fixed order
enc_dset = enc_dset[['date',
                     'bat_team_Chennai_Super_Kings', 'bat_team_Delhi_Daredevils',
                     'bat_team_Kings_XI_Punjab', 'bat_team_Kolkata_Knight_Riders',
                     'bat_team_Mumbai_Indians', 'bat_team_Rajasthan_Royals',
                     'bat_team_Royal_Challengers_Bangalore', 'bat_team_Sunrisers_Hyderabad',
                     'bowl_team_Chennai_Super_Kings', 'bowl_team_Delhi_Daredevils',
                     'bowl_team_Kings_XI_Punjab', 'bowl_team_Kolkata_Knight_Riders',
                     'bowl_team_Mumbai_Indians', 'bowl_team_Rajasthan_Royals',
                     'bowl_team_Royal_Challengers_Bangalore', 'bowl_team_Sunrisers_Hyderabad',
                     'overs', 'runs', 'wickets', 'runs_last_5', 'wickets_last_5', 'total']]

# Seasons up to 2016 are used for training, 2017 for testing
X_training = enc_dset.drop(labels='total', axis=1)[enc_dset['date'].dt.year <= 2016]
X_testing = enc_dset.drop(labels='total', axis=1)[enc_dset['date'].dt.year >= 2017]
y_training = enc_dset[enc_dset['date'].dt.year <= 2016]['total'].values
y_testing = enc_dset[enc_dset['date'].dt.year >= 2017]['total'].values

# The date column is only needed for the split
X_training.drop(labels='date', axis=1, inplace=True)
X_testing.drop(labels='date', axis=1, inplace=True)

Code for Ridge Regressor:

import numpy as npy
from sklearn import metrics
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Tune the penalty weight (alpha) with 5-fold cross-validation
r = Ridge()
param = {'alpha': [1e-15, 1e-10, 1e-8, 1e-3, 1e-2, 1, 5, 10, 20, 30, 35, 40]}
reg = GridSearchCV(r, param, scoring='neg_mean_squared_error', cv=5)
reg.fit(X_training, y_training)
results = reg.predict(X_testing)

# Error measures on the test data
print("Mean_Absolute_Error:", metrics.mean_absolute_error(y_testing, results))
print("Mean_Squared_Error:", metrics.mean_squared_error(y_testing, results))
print("Root_Mean_Squared_Error:", npy.sqrt(metrics.mean_squared_error(y_testing, results)))

Code for Linear Regressor:

import numpy as npy
from sklearn import metrics
from sklearn.linear_model import LinearRegression

# Fit ordinary least squares on the training data
l_regressor = LinearRegression()
l_regressor.fit(X_training, y_training)
l_prediction = l_regressor.predict(X_testing)

# Error measures on the test data
print("Mean_Absolute_Error:", metrics.mean_absolute_error(y_testing, l_prediction))
print("Mean_Squared_Error:", metrics.mean_squared_error(y_testing, l_prediction))
print("Root_Mean_Squared_Error:", npy.sqrt(metrics.mean_squared_error(y_testing, l_prediction)))

Code for Lasso Regressor:

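import pickle

import numpy as npy
from sklearn import metrics
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Tune the penalty weight (alpha) with 5-fold cross-validation
lasso = Lasso()
parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-3, 1e-2, 1, 5, 10, 20, 30, 35, 40]}
lasso_regressor = GridSearchCV(lasso, parameters, scoring='neg_mean_squared_error', cv=5)
lasso_regressor.fit(X_training, y_training)
lasso_prediction = lasso_regressor.predict(X_testing)

# Error measures on the test data
print("Mean_Absolute_Error:", metrics.mean_absolute_error(y_testing, lasso_prediction))
print("Mean_Squared_Error:", metrics.mean_squared_error(y_testing, lasso_prediction))
print("Root_Mean_Squared_Error:", npy.sqrt(metrics.mean_squared_error(y_testing, lasso_prediction)))

# Save the ridge model for the web application
fn = 'first-innings-score-lr-model.pkl'
pickle.dump(reg, open(fn, 'wb'))

# Assumption: the linear and lasso models are saved under the file names
# that the Flask code loads; the original listing shows only the ridge dump.
pickle.dump(l_regressor, open('linear-first-innings-score-lr-model.pkl', 'wb'))
pickle.dump(lasso_regressor, open('Lasso-first-innings-score-lr-model.pkl', 'wb'))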

Flask Code:

from flask import Flask, render_template, request
import pickle
import numpy as np

# Load the trained ridge, linear and lasso models
fn = 'first-innings-score-lr-model.pkl'
regressor = pickle.load(open(fn, 'rb'))

filename2 = 'linear-first-innings-score-lr-model.pkl'
lr = pickle.load(open(filename2, 'rb'))

filename3 = 'Lasso-first-innings-score-lr-model.pkl'
lassor = pickle.load(open(filename3, 'rb'))

app = Flask(__name__)

# Team order must match the one-hot column order used in training
TEAMS = ['Chennai_Super_Kings', 'Delhi_Daredevils', 'Kings_XI_Punjab',
         'Kolkata_Knight_Riders', 'Mumbai_Indians', 'Rajasthan_Royals',
         'Royal_Challengers_Bangalore', 'Sunrisers_Hyderabad']


def one_hot(team):
    # 1 at the position of the selected team, 0 elsewhere
    return [1 if t == team else 0 for t in TEAMS]


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    # One-hot encode the batting team, then the bowling team
    t_a = one_hot(request.form['batting-team']) + one_hot(request.form['bowling-team'])

    # Numeric inputs from the HTML form
    overs = float(request.form['overs'])
    runs = int(request.form['runs'])
    wickets = int(request.form['wickets'])
    runs_in_prev_5 = int(request.form['runs_in_prev_5'])
    wickets_in_prev_5 = int(request.form['wickets_in_prev_5'])

    t_a = t_a + [overs, runs, wickets, runs_in_prev_5, wickets_in_prev_5]

    # Predict with all three models
    data = np.array([t_a])
    my_prediction = regressor.predict(data)[0]
    l_prediction = lr.predict(data)[0]
    lass_prediction = lassor.predict(data)[0]

    # Show the ridge prediction as a range of plus or minus 10 runs
    return render_template('result.html',
                           lower_limit=int(my_prediction) - 10,
                           upper_limit=int(my_prediction) + 10,
                           Ridge_result=my_prediction,
                           Linear_result=l_prediction,
                           Lasso_result=lass_prediction)


if __name__ == '__main__':
    app.run(debug=True)

IMPLEMENTING SCREEN:

Fig 9.1 Output Screen
