A Project report on

CRICKET MATCH SCORE PREDICTION


USING MACHINE LEARNING
in partial fulfillment for the award of the degree of
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING

Submitted by

ADDANKI SWARNA SRI 19B91A0504


DANGETI DHARMA SAI 19B91A0542
BHUPATHIRAJU DILEEP VARMA 19B91A0523
BASAVA JAYANTH 19B91A0520

Under the Guidance of

SRI CH. VINOD VARMA


Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SRKR ENGINEERING COLLEGE (A)
Chinna Amiram, Bhimavaram, West Godavari Dist., A.P.
[2022 – 2023]
BONAFIDE CERTIFICATE

This is to certify that the project work entitled “CRICKET MATCH SCORE PREDICTION USING
MACHINE LEARNING” is the bona fide work of “Addanki Swarna Sri (19B91A0504), Dangeti Dharma
Sai (19B91A0542), Bhupathiraju Dileep Varma (19B91A0523), Basava Jayanth (19B91A0520)” who
carried out the project work under my supervision in partial fulfilment of the requirements for the award of the
degree of Bachelor of Technology in Computer Science and Engineering.

SUPERVISOR HEAD OF THE DEPARTMENT

Sri.CH.Vinod Varma Dr. V Chandra Sekhar

Assistant Professor Professor


SELF DECLARATION

We hereby declare that the project work entitled “CRICKET MATCH SCORE PREDICTION USING
MACHINE LEARNING” is a genuine work carried out by us in B.Tech., (Computer Science and
Engineering) at SRKR Engineering College(A), Bhimavaram and has not been submitted either in part or
full for the award of any other degree or diploma in any other institute or University.

1. A.Swarna Sri 19B91A0504


2. D.Dharma Sai 19B91A0542
3. B.Dileep Varma 19B91A0523
4. B.Jayanth 19B91A0520
TABLE OF CONTENTS

S.NO DESCRIPTION

1 ABSTRACT
2 LIST OF FIGURES
3 INTRODUCTION
4 LITERATURE SURVEY
5 PROBLEM STATEMENT
6 EXISTING SYSTEM
7 PROPOSED SYSTEM
8 METHODOLOGY
9 SYSTEM DESIGN
10 IMPLEMENTATION
11 RESULT ANALYSIS
12 CONCLUSION
13 REFERENCES
14 APPENDIX
ABSTRACT

This project designs a model that forecasts the projected score of an IPL cricket match. The output of the
model depends on features such as current runs, current wickets fallen, runs scored in the last 5 overs and
wickets fallen in the last 5 overs. The dataset contains the history of all IPL matches played between 2008
and 2017. The model can forecast the first innings score of an IPL match before the first innings is
completed. The ridge regression algorithm is used to forecast the score, and the model relies mainly on data
from the last 5 overs to forecast the final score of the match.
LIST OF FIGURES

S.NO DESCRIPTION

1 MODEL ARCHITECTURE
2 USE CASE DIAGRAM
3 CLASS DIAGRAM
4 SEQUENCE DIAGRAM
5 ACTIVITY DIAGRAM
6 DATA COLLECTION
7 DATA CLEANING
8 DATA EXTRACTION
9 DATA PREPROCESSING
10 DATA SPLITTING
11 LINEAR REGRESSION
12 LASSO REGRESSION
13 RIDGE REGRESSION
14 SCREENSHOT FOR WEB PAGE
15 SCREENSHOT FOR FILLING VALUES IN WEB PAGE
16 ERROR RATES USING LINEAR REGRESSION
17 ERROR RATES USING RIDGE REGRESSION
18 ERROR RATES USING LASSO REGRESSION
19 COMPARISON BETWEEN DIFFERENT FITTING MODELS
20 ERROR RATES FOR TRAINING DATA
21 OUTPUT SCREEN
1.INTRODUCTION

Cricket was introduced to North America by English settlers in the 17th century, when it was among the most
popular games of the time, and it is now played in a majority of countries. Fantasy-sports and prediction
apps such as Dream 11 have appeared since 2008 and rely on forecasts such as the first innings score of a
cricket match. As a result, there is a high demand for algorithms that predict the first innings score,
especially for important matches. A practical way to predict the first innings score is to use machine
learning algorithms. Machine learning algorithms fall into three types: supervised, unsupervised and
reinforcement learning. The choice among them depends on the application and on the output expected from the
model.

Cricket is among the most-watched sports on television, and it is especially popular in countries
such as India, Australia, England, New Zealand and South Africa. One difficulty that has arisen recently is
that the projected first innings score often does not match the actual first innings score. This creates the
need for a model that accurately estimates the first innings score of an IPL match, which helps viewers
anticipate the final score of the match in progress.

A normal Twenty20 match lasts three to four hours, with each innings lasting 75–90 minutes and a
10- to 20-minute break in the middle of the game. Each innings is played over 20 overs, with 11 players on
each team. This is far shorter than previous versions of the game and more in line with other popular team
sports. It was introduced to establish a new version of the game that would appeal to both on-field spectators
and television viewers.

Several strategies are used to predict the first innings score of a cricket match, including IPL
matches. The Current Run Rate (CRR) method is the most widely used: the average number of runs scored per
over so far is multiplied by the total number of overs in the innings. This approach looks only at the runs
scored per over and ignores other parameters, such as wickets fallen and recent scoring. It can therefore
estimate the first innings score only from the current score. We attempt to improve on the accuracy of the
CRR method by incorporating several such features when forecasting the first innings score of a cricket
match. We focus on live cricket score prediction and evaluate the model on IPL matches.
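
The CRR projection described above can be written in a few lines. The sketch below is illustrative only: the function name `project_score_crr` and the choice to pass overs as a decimal number of completed overs are assumptions, not part of the original project code.

```python
def project_score_crr(runs: int, overs_completed: float, total_overs: int = 20) -> float:
    """Project the innings total from the current run rate (CRR).

    overs_completed is assumed to be a decimal number of overs
    (for example 10.5 for ten and a half overs), not cricket notation.
    """
    if overs_completed <= 0:
        raise ValueError("at least part of an over must have been bowled")
    current_run_rate = runs / overs_completed  # average runs per over so far
    return current_run_rate * total_overs


# 80 runs after 10 overs gives a run rate of 8.0 and a projection of 160.0
projected = project_score_crr(runs=80, overs_completed=10)
print(projected)
```

Because the projection uses only `runs` and `overs_completed`, two innings at 80 runs after 10 overs get the same forecast whether 1 or 7 wickets have fallen, which is the limitation the proposed model tries to address.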

Linear, lasso and ridge regression were used to forecast the first innings score of an IPL
match. All three are supervised techniques: the model is trained on labelled data, for which the correct
output is already known. Linear regression predicts continuous values rather than class labels. Ridge
regression is useful when the data exhibit multicollinearity.

There are two sorts of problems that supervised machine learning algorithms can solve: regression
and classification problems. In a classification problem the output is a category; in a regression problem
the desired output is a real value. Recommendation and time-series forecasting are two other common problem
types built on classification and regression techniques. Unsupervised learning aims to model the structure
or distribution of data in order to gain a better understanding of it.

2.LITERATURE SURVEY

A significant amount of work has been done over the years on cricket score prediction and on
machine learning for predicting the first innings final score. Several algorithms for score prediction have
evolved in this field. The techniques found in the literature survey are summarised below.

Nikhil Dhonge, et al. [1] worked on algorithms for predicting the IPL first innings score. Their aim
was a computationally efficient procedure that predicts the projected first innings score. The classifiers
they used are an SVC classifier, a decision tree classifier and a Random Forest classifier. They collected
data on all IPL matches played from 2008 onwards. The dataset consists of 76015 rows and 15 columns, to which
they applied feature selection and chose 8 features, of which 7 are input features and 1 is the target
variable. Regression analysis predicts a continuous value: a set of variables serves as input and a
continuous-valued quantity is the target variable. The approach achieves an accuracy of 80.92% with linear
regression, 80.84% with ridge regression and 80.45% with lasso regression. They conclude that linear
regression gives the highest accuracy for score prediction.

Apurva Lawate, et al. [2] proposed cricket match prediction of the projected score and the winner
using machine learning. They compared the accuracy of three algorithms: linear regression, ridge regression
and a multilayer perceptron neural network. The dataset is built from the past 10 years of IPL matches,
dated from 2009 to 2019. The dataset is split into two parts: data from 2008 to 2016 is used to train the
models and data from 2017 onwards is used to test them. They implemented an algorithm that can predict an
accurate projected score partway through a match in progress, and deployed it as a web application with
Flask. The model achieves an accuracy of 77.286% with linear regression and 74.236% with ridge regression.

R. Kamble, et al. [3] studied cricket score prediction using machine learning. They developed a
model that predicts a team's score after 20 overs from the current match situation. They implemented Naive
Bayes, Random Forest, multiclass SVM and decision tree classifiers to build prediction models for each
problem, and found the Random Forest classifier to be the most accurate for each one. The dataset is built
from the past 5 years of IPL matches and is split into training and testing parts in an 80:20 ratio: 80
percent of the data is used for training and 20 percent for testing. Linear regression is used in this
model; it is fitted iteratively to give a more accurate score and is used to predict and compute the final
values. The system they built supports key decisions. Its database is updated on each prediction, and the
system works efficiently with a large dataset of thousands of rows.

T. Suvarna Kumari, et al. [4] worked on cricket match score prediction using the k-Nearest Neighbors
(KNN) algorithm. They proposed a method for predicting the final score of the first innings. They applied
KNN to first innings and second innings datasets, where the class attribute 'X' is the 'Score' and the input
attributes 'Ai' (i = 1, 2, 3, …) are relative team strength, home/away and venue average. The dataset
consists of completed matches, excluding all rain-interrupted and rain-abandoned games, played between 2000
and 2018 among ODI-playing teams such as India, England and Australia. They computed the error rate for both
innings: 24% for the first innings and 16% for the second.

Kushooo, et al. [5] used data mining to forecast cricket scores and winners. They devised a
procedure for predicting the first innings score in a cricket match. Most sports predictions are framed as
regression or classification problems, both of which are supervised learning tasks. The output in regression
is a continuous value, whereas classification deals with discrete output. Linear regression appeared to be
effective for predicting continuous values, while learning algorithms such as Naive Bayes, Logistic
Regression, Neural Networks and Random Forests were used in many previous studies for classification
problems, for example predicting the outcome of matches or classifying players. From the results, they
concluded that Random Forest was the most accurate classifier for both datasets, with an accuracy of 89.74%
for predicting runs scored by a batsman and 93.27% for predicting wickets taken by a bowler. SVM achieved an
accuracy of only 52.35% for predicting runs and 72.85% for predicting wickets.

Akhil Nimmagadda, et al. [6] used data mining and the Random Forest technique to forecast cricket
scores and winners. They devised an algorithm to predict the outcome of One-Day International cricket
matches, in which they assess the batting and bowling potential of the 22 players taking part in the match
using their career statistics and active participation in recent games. They used player potential to
express one team's relative dominance over the other. They use supervised learning algorithms to predict the
match winner from other base features, such as run rate and the match venue, as well as relative team
strength. The model was built using multiple variable linear regression.

3.PROBLEM STATEMENT

The aim is to design a system that can predict the first innings score of a cricket match. The system
analyses parameters such as batting team, bowling team, overs completed, runs scored up to a given over,
wickets fallen up to a given over, runs scored in the previous 5 overs and wickets fallen in the previous 5
overs. The results of an IPL match can be predicted with machine learning algorithms such as Logistic
Regression, Linear Regression, Ridge Regression, SVM, Lasso Regression and Random Forest. We have used 15
features, as follows: mid, date, venue, batting team, bowling team, batsman, bowler, runs, wickets, overs,
runs scored in last 5 overs, wickets fallen in last 5 overs, striker, non-striker, total.

4.EXISTING SYSTEM

❖ Current Run Rate method is widely used to predict the score of a cricket match.
❖ In the CRR method, the average number of runs scored per over so far is multiplied by the total number
of overs in an innings.
❖ It is observed that the prediction done using CRR method does not consider the dynamic nature of the
game.

5.PROPOSED SYSTEM

❖ We propose a system that considers the dynamic nature of the game while predicting the result.
❖ To achieve high prediction accuracy, we record the accuracies obtained from several models,
including Linear Regression, Ridge Regression and Lasso Regression,
❖ and choose the best model based on these values.
❖ Our goal is to predict the score accurately.

6.METHODOLOGY

6.1 Description of the dataset

IPL DATASET:

The dataset used in this project is ipl.csv, which is obtained from the Kaggle website. The dataset contains
ball-by-ball information for all IPL matches from 2008 to 2017. It holds 1140210 values in total, that is,
76014 tuples with 15 attributes each. The target feature is the 15th feature, a non-negative real-valued
attribute. The features in this dataset are described as follows:

❖ mid: the match id.

❖ date: the date on which the match was held. Its format is DD-MM-YYYY.

❖ venue: the place at which the match is held.

❖ bat_team: the name of the team that is batting.

❖ bowl_team: the name of the team that is bowling.

❖ batsman: the name of the batsman on strike.

❖ bowler: the name of the bowler.

❖ runs: the total number of runs scored up to that ball.

❖ wickets: the number of wickets fallen.

❖ overs: the over and the ball within it that is being bowled.

❖ runs_last_5: runs scored in the last 5 overs.

❖ wickets_last_5: wickets fallen in the last 5 overs.

❖ striker: the batsman who is ready to receive the ball.

❖ non-striker: the batsman who is in but not ready to receive the ball.

❖ total: the total number of runs scored after 20 overs in that match.

6.2 MODEL ARCHITECTURE:

We aim to build a model that can predict the first innings score of a live IPL match accurately, taking into
account the various parameters that contribute to the score.

DATA COLLECTION:
The dataset is taken from Kaggle in CSV format. The collected data is cleaned in the next stage.

DATA CLEANING:
In the data cleaning step, we remove unwanted columns such as match id, venue, name of the batsman, name of
the bowler, the score of the striker batsman and the score of the non-striker batsman. These columns are not
needed for prediction, so they are dropped. In the IPL dataset, a few teams are not consistent: they played
for only a few years. We therefore remove those teams from the dataset and keep only the consistent teams.
We consider only the data after 5 overs. The date column is stored as a string, but to operate on dates we
must convert the string to a datetime object.

DATA PREPROCESSING:
After cleaning, the data is preprocessed. In this step we perform one-hot encoding, which is explained in
detail in the implementation section. We also rearrange the columns of the dataset so that they appear in a
fixed, well-defined order.

DATA SPLITTING:
After data preprocessing, we split the data so that IPL matches played up to and including 2016 are used for
training the model and IPL matches played after 2016 (the 2017 season) are used for testing the model.

MODEL GENERATION:
We use the Linear Regression, Ridge Regression and Lasso Regression models to predict the result from the
input. These models are trained on the training data from the input dataset, tested on the test data from
the input dataset, and then used to predict the output for new data.

FINAL PREDICTION:

Finally, to predict the output, the input values are taken from the user, and based on this input a range
for the final score is displayed as output on the web page.

❖ The following is the model design for CFP framework.

Fig 6.1 Model Architecture (model generation stage: Ridge Regression)

6.3 MACHINE LEARNING ALGORITHM:

RIDGE REGRESSION:

Ridge regression is used to build a parsimonious model when the number of predictor variables in a dataset
exceeds the number of observations, or when a dataset exhibits multicollinearity (correlations between
predictor variables). Ridge regression uses L2 regularization, which adds an L2 penalty equal to the sum of
the squared magnitudes of the coefficients. Unlike L1 regularization (used by lasso regression), which can
eliminate some coefficients altogether and so yield sparse models, the L2 penalty shrinks coefficients
without removing them.

Linear regression fits a linear relationship between the input variables and the target variable. If the
output depends on a single input variable, the fitted model is a line; if it depends on several input
variables, the fitted model is a hyperplane. The coefficients of the model are found through an optimization
process that seeks to minimize the sum of squared errors between the predictions (yhat) and the expected
target values (y).

loss = sum i=0 to n (y_i – yhat_i)^2

In linear regression, the estimated coefficients can sometimes become large, making the model sensitive to
its inputs and possibly unstable. This is especially true for problems with few observations, or with fewer
samples (n) than input predictors (p).

One approach to improving the stability of regression models is to change the loss function to include
additional costs for a model that has large coefficients. Linear regression models that use these modified
loss functions during training are referred to collectively as penalized linear regression.

One popular penalty is to penalize a model based on the sum of the squared coefficient values (beta).
This is called an L2 penalty.

l2_penalty = sum j=0 to p beta_j^2

An L2 penalty shrinks the size of all coefficients, but it does not drive coefficients to exactly zero, so
none is removed from the model.

The effect of this penalty is that the parameter estimates are only allowed to become large if there is a
proportional reduction in the sum of squared errors (SSE). In effect, this method shrinks the estimates
towards 0 as the lambda penalty becomes large (these techniques are sometimes called "shrinkage methods").
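
The shrinkage effect can be seen in the simplest possible case. The sketch below assumes a single predictor and no intercept, for which minimising sum (y_i - beta * x_i)^2 + lambda * beta^2 has the closed-form solution beta = sum(x*y) / (sum(x^2) + lambda). The function name `ridge_slope` and the toy data are illustrative assumptions, not taken from the project.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for one predictor and no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x_i * y_i
    sxx = sum(x * x for x in xs)              # sum of x_i squared
    return sxy / (sxx + lam)


# Toy data lying exactly on the line y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

for lam in (0.0, 1.0, 10.0, 100.0):
    # The estimate moves from 2.0 towards 0 as lambda grows
    print(lam, round(ridge_slope(xs, ys, lam), 4))
```

With lambda = 0 the estimate equals the ordinary least squares slope; larger values of lambda pull it towards zero but never make it exactly zero for this data.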

This penalty can be added to the cost function for linear regression and is referred to as Tikhonov
regularization (after the author), or more commonly as Ridge Regression.

A hyperparameter called "lambda" controls the weighting of the penalty in the loss function. A default value
of 1.0 fully weights the penalty; a value of 0 excludes the penalty. Very small values of lambda, such as
1e-3 or smaller, are common.

ridge_loss = loss + (lambda * l2_penalty)
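
The three formulas in this section translate directly into code. The sketch below is a plain-Python illustration; the function name and the sample numbers are made up for the example and do not come from the project's dataset.

```python
def ridge_loss(y_true, y_pred, coefficients, lam=1.0):
    """Sum of squared errors plus lambda times the L2 penalty."""
    loss = sum((y - yhat) ** 2 for y, yhat in zip(y_true, y_pred))
    l2_penalty = sum(beta ** 2 for beta in coefficients)
    return loss + lam * l2_penalty


y_true = [160.0, 175.0]    # actual totals (illustrative)
y_pred = [150.0, 180.0]    # model predictions (illustrative)
coefficients = [0.5, -2.0]  # model coefficients (illustrative)

print(ridge_loss(y_true, y_pred, coefficients, lam=1.0))  # 125 + 4.25
print(ridge_loss(y_true, y_pred, coefficients, lam=0.0))  # penalty excluded
```

Setting `lam=0.0` recovers the plain least squares loss, which matches the statement that a value of 0 excludes the penalty.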

7.SYSTEM DESIGN

UML DESIGN:

UML is a general-purpose modelling language. Its primary goal is to establish a uniform method of
visualising a system's design. It resembles the blueprints used in other fields.

7.1 USE CASE DIAGRAM:


The dynamic behaviour of a system is represented by a use case diagram. It incorporates use cases, actors and
their interactions to encapsulate the functionality of the system.

Fig 7.1 USE CASE DIAGRAM

7.2 CLASS DIAGRAM:

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure
diagram that describes the structure of a system by showing the system's classes, attributes, operations (or
methods), and the relationships among the classes. It explains which class contains information.

Fig 7.2 CLASS DIAGRAM

7.3 SEQUENCE DIAGRAM:

In the world of software engineering, a sequence diagram, also known as a system sequence diagram (SSD),
displays process interactions grouped in sequence.

Fig 7.3 SEQUENCE DIAGRAM

7.4 ACTIVITY DIAGRAM:

Activity diagrams display organisational procedures as well as the procedural flow of control between class
objects. These diagrams are constructed from specialised forms and joined together using arrows.

Fig 7.4 ACTIVITY DIAGRAM

8.IMPLEMENTATION

8.1 DATA COLLECTION:

Fig 8.1 DATA COLLECTION

Figure 8.1 shows a snapshot of the code used to load the data from an external source. Any model
to be trained needs input data, known as the dataset. There are many ways to collect data from the
web; here we use the dataset ipl.csv, obtained from the Kaggle website. The dataset consists of
1140210 values in total (76014 tuples, each with 15 features).
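
Since the figure itself is not reproduced here, the following is a minimal sketch of how such a file could be loaded with Python's standard `csv` module. It is not the project's code: the in-memory sample row, the helper name `load_rows` and all values in the row are invented for illustration; only the 15 column names follow the dataset description in section 6.1.

```python
import csv
import io

# One made-up row with the 15 columns described in section 6.1
SAMPLE_CSV = (
    "mid,date,venue,bat_team,bowl_team,batsman,bowler,runs,wickets,overs,"
    "runs_last_5,wickets_last_5,striker,non-striker,total\n"
    "1,18-04-2008,Stadium X,Team A,Team B,Batter A,Bowler B,61,1,7.2,40,1,30,12,222\n"
)


def load_rows(source):
    """Read CSV text from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "row(s) with", len(rows[0]), "features each")
```

For the real file, `open("ipl.csv", newline="")` could be passed to `load_rows` instead of the in-memory sample, assuming the file sits in the working directory.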

8.2 DATA CLEANING:

Fig 8.2 DATA CLEANING

Figure 8.2 shows a snapshot of the code used to remove the unwanted features from the dataset,
leaving only the data needed to build the model. The unwanted features are mid, batsman,
bowler, striker and non-striker.

8.3 DATA EXTRACTION:

Fig 8.3 DATA EXTRACTION

Figure 8.3 shows a snapshot of the code used to keep only the consistent teams, those that have
played since the start of the IPL. The model cannot predict the score reliably from the first 5
overs alone, so the code removes the first 5 overs of data from the dataset. The date column is
initially in string format and is converted into a datetime object.
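
The cleaning and extraction steps of sections 8.2 and 8.3 can be sketched together as follows. This is an illustration under stated assumptions rather than the code in the figures: rows are plain dicts, the team spellings in `CONSISTENT_TEAMS` are assumed (the eight teams are those listed in the appendix code), the sample rows are invented, and the date format follows the DD-MM-YYYY description in section 6.1.

```python
from datetime import datetime

# Assumed spellings of the eight consistent teams
CONSISTENT_TEAMS = {
    "Chennai Super Kings", "Delhi Daredevils", "Kings XI Punjab",
    "Kolkata Knight Riders", "Mumbai Indians", "Rajasthan Royals",
    "Royal Challengers Bangalore", "Sunrisers Hyderabad",
}
UNWANTED_COLUMNS = ("mid", "batsman", "bowler", "striker", "non-striker")


def clean_rows(rows):
    """Drop unwanted columns, inconsistent teams and the first 5 overs."""
    cleaned = []
    for row in rows:
        if row["bat_team"] not in CONSISTENT_TEAMS:
            continue
        if row["bowl_team"] not in CONSISTENT_TEAMS:
            continue
        if float(row["overs"]) < 5.0:
            continue  # too early in the innings to predict from
        kept = {k: v for k, v in row.items() if k not in UNWANTED_COLUMNS}
        kept["date"] = datetime.strptime(row["date"], "%d-%m-%Y")
        cleaned.append(kept)
    return cleaned


# Invented rows: only the first one should survive the filters
sample = [
    {"mid": "1", "date": "18-04-2008", "bat_team": "Mumbai Indians",
     "bowl_team": "Rajasthan Royals", "batsman": "A", "bowler": "B",
     "overs": "7.2", "striker": "30", "non-striker": "12", "total": "180"},
    {"mid": "1", "date": "18-04-2008", "bat_team": "Mumbai Indians",
     "bowl_team": "Rajasthan Royals", "batsman": "A", "bowler": "B",
     "overs": "2.1", "striker": "5", "non-striker": "3", "total": "180"},
    {"mid": "2", "date": "19-04-2008", "bat_team": "Deccan Chargers",
     "bowl_team": "Mumbai Indians", "batsman": "C", "bowler": "D",
     "overs": "9.0", "striker": "20", "non-striker": "8", "total": "150"},
]
cleaned = clean_rows(sample)
print(cleaned)
```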

8.4 DATA PREPROCESSING:

Fig 8.4 DATA PREPROCESSING

Figure 8.4 shows a snapshot of the code used to convert the categorical features with the one-hot
encoding method. This encoding creates a new binary feature for each possible category of the
columns bat_team and bowl_team. The batting team entered by the user is represented by 1 and
all other teams by 0. Likewise, the bowling team entered by the user is represented by 1 and all
other teams by 0.
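
A minimal sketch of one-hot encoding for a single team column is shown below. The team order is the one used in the Flask code in the appendix; the helper name `one_hot` is an assumption introduced for illustration.

```python
# Team order as it appears in the appendix Flask code
TEAMS = [
    "Chennai_Super_Kings", "Delhi_Daredevils", "Kings_XI_Punjab",
    "Kolkata_Knight_Riders", "Mumbai_Indians", "Rajasthan_Royals",
    "Royal_Challengers_Bangalore", "Sunrisers_Hyderabad",
]


def one_hot(team, categories=TEAMS):
    """Return a list with 1 at the team's position and 0 elsewhere."""
    return [1 if team == name else 0 for name in categories]


print(one_hot("Mumbai_Indians"))
print(one_hot("Sunrisers_Hyderabad"))
```

Each team maps to a vector of eight values containing exactly one 1, so the model never treats team names as ordered numbers.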

8.5 DATA SPLITTING:

Fig 8.5 DATA SPLITTING

Figure 8.5 shows the dataset being split into 2 parts. The original dataset contains the IPL matches
from 2008 to 2017. This code divides the dataset into training and test sets: training data is taken
from 2008 to 2016 and test data from the year 2017. The training data contains 821260 values and
the test data contains 61116 values.
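
The year-based split can be sketched as follows, assuming each row is a dict whose `date` field is already a datetime object. The function name `split_by_year` and the three sample rows are illustrative assumptions.

```python
from datetime import datetime


def split_by_year(rows, last_training_year=2016):
    """Rows up to last_training_year go to training, later rows to testing."""
    train = [r for r in rows if r["date"].year <= last_training_year]
    test = [r for r in rows if r["date"].year > last_training_year]
    return train, test


# Invented rows covering the first, last training and test seasons
rows = [
    {"date": datetime(2008, 4, 18), "total": 222},
    {"date": datetime(2016, 5, 29), "total": 208},
    {"date": datetime(2017, 4, 5), "total": 207},
]
train, test = split_by_year(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Splitting by season rather than at random keeps every ball of a match on the same side of the split, so the test set contains only matches the model has never seen.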

8.6 MODEL EVOLUTION:

The proposed system consists of various Machine Learning Algorithms like Linear Regression, Ridge
Regression, Lasso Regression.

8.6.1 LINEAR REGRESSION:

Fig 8.6.1 LINEAR REGRESSION

8.6.2 LASSO REGRESSION:

Fig 8.6.2 LASSO REGRESSION

8.6.3 RIDGE REGRESSION:

Fig 8.6.3 RIDGE REGRESSION

8.7 FINAL PREDICTION:

A user interface is created using Flask and web templates to take inputs from the user and to display the
range of scores predicted by the ridge regression model.

Fig 8.7.1 SCREENSHOT FOR WEB PAGE

When the user fills in all the fields on the webpage above, a list of input values is created. The batting
and bowling teams are one-hot encoded in the backend into 16 values. For example, if the batting team is
Royal Challengers Bangalore and the bowling team is Rajasthan Royals, they are encoded as
[0,0,0,0,0,0,1,0] + [0,0,0,0,0,1,0,0]. These 2 lists are combined and then extended with the remaining 5
numeric values from the HTML form (overs, runs, wickets, runs in the previous 5 overs and wickets in the
previous 5 overs). 10 is added to the forecast final score to obtain the maximum expected score, and 10 is
subtracted to obtain the minimum expected score. This range is displayed to the user as the forecast. For
example, if the model forecasts 180 as the final score for the first innings, then 170 to 190 is displayed
as the final output.
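
The input assembly and the plus or minus 10 range can be sketched without Flask or a trained model. The helper names `build_features` and `score_range` are assumptions introduced here; the team order and the five numeric inputs follow the Flask code in the appendix.

```python
TEAMS = [
    "Chennai_Super_Kings", "Delhi_Daredevils", "Kings_XI_Punjab",
    "Kolkata_Knight_Riders", "Mumbai_Indians", "Rajasthan_Royals",
    "Royal_Challengers_Bangalore", "Sunrisers_Hyderabad",
]


def one_hot(team):
    """Return an 8-element list with 1 at the team's position."""
    return [1 if team == name else 0 for name in TEAMS]


def build_features(bat_team, bowl_team, overs, runs, wickets,
                   runs_prev_5, wickets_prev_5):
    """Concatenate both team encodings with the numeric match inputs."""
    numeric = [overs, runs, wickets, runs_prev_5, wickets_prev_5]
    return one_hot(bat_team) + one_hot(bowl_team) + numeric


def score_range(predicted, margin=10):
    """Turn a point forecast into the displayed (lower, upper) range."""
    base = int(predicted)
    return base - margin, base + margin


features = build_features("Royal_Challengers_Bangalore", "Rajasthan_Royals",
                          10.2, 85, 2, 45, 1)
print(len(features), features)
print(score_range(180.4))
```

In the web application the feature list would be passed to the trained model's `predict` method; here a forecast of 180.4 is used directly to show how the displayed range is derived.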

Fig 8.7.2 SCREENSHOT FOR FILLING VALUES IN WEB PAGE

9.RESULT ANALYSIS

9.1 LINEAR REGRESSION:

Fig 9.1 ERROR RATE USING LINEAR REGRESSION

Figure 9.1 shows the 3 error measures obtained with the linear regression technique. The mean absolute error
is 10.2572, the mean squared error is 160.60052 and the root mean squared error is 12.67282.

9.2 RIDGE REGRESSION:

Fig 9.2 ERROR RATES USING RIDGE REGRESSION

Figure 9.2 shows the 3 error measures obtained with the ridge regression technique. The mean absolute error
is 12.43432, the mean squared error is 276.68004 and the root mean squared error is 16.63370.

9.3 LASSO REGRESSION:

Fig 9.3 ERROR RATES USING LASSO REGRESSION

Figure 9.3 shows the 3 error measures obtained with the lasso regression technique. The mean absolute error
is 12.21358, the mean squared error is 262.36538 and the root mean squared error is 16.19769.

9.4 COMPARISON BETWEEN 3 DIFFERENT REGRESSORS:

From the above 3 figures it is clear that the mean absolute error, the mean squared error and the root mean
squared error are all lowest when the model is implemented using Linear Regression.

Linear Regression therefore gives the most accurate result when compared with Ridge Regression and Lasso
Regression.
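
For reference, the three error measures compared above can be computed as in the sketch below. The function names and the three sample scores are illustrative assumptions; the reported figures in this section come from the project's own runs, not from this code.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


y_true = [150, 160, 170]  # actual totals (illustrative)
y_pred = [140, 165, 170]  # predicted totals (illustrative)

print(mean_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(root_mean_squared_error(y_true, y_pred))
```

Because the root mean squared error is the square root of the mean squared error, the reported pairs can be cross-checked: for linear regression, the square root of 160.60052 is approximately 12.67282.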

                          LINEAR REGRESSION     RIDGE REGRESSION      LASSO REGRESSION
MEAN ABSOLUTE ERROR       10.25729245359391     12.434327274896821    11.775248362875073
MEAN SQUARED ERROR        160.6005271696334     276.6800450229181     236.32988043635393
ROOT MEAN SQUARED ERROR   12.672826329182982    16.633702084109782    15.373024440114376

COMPARISON BETWEEN DIFFERENT REGRESSORS

Fig 9.4 COMPARISON BETWEEN DIFFERENT FITTING MODELS

OVERFITTING:
Overfitting occurs when a model works well on training data but does not give accurate output for test data.

UNDERFITTING:
Underfit models are those that do not work well even on the training data, and therefore also perform poorly
on test data.

GOOD FIT:
A good fit works well on both the training dataset and the test dataset, giving accurate outputs (around 70
to 80%) for both.

9.5 ERROR RATES FOR TRAINING DATA:

Fig 9.5 ERROR RATES FOR TRAINING DATA

10.CONCLUSION

This project forecasts the first innings score of an IPL match using three machine learning algorithms:
Linear Regression, Ridge Regression and Lasso Regression. A dataset named ipl.csv is used. It contains 15
different features, of which the ones mainly used are batting team, bowling team, runs scored up to the
current over, wickets fallen up to the current over, overs completed, runs scored in the last 5 overs and
wickets fallen in the last 5 overs. The project takes these inputs and forecasts the score of an IPL match
as output. The root mean squared errors of the Linear Regression, Ridge Regression and Lasso Regression
models are 12.67282, 16.63370 and 16.19769 respectively. Linear Regression gives a lower root mean squared
error than Ridge Regression and Lasso Regression, but Ridge Regression protects the model from overfitting.

10. REFERENCES

1. Nikhil Dhonge, Shraddha Dhole. “Ipl cricket score forecast utilizing AI methods” Research Journal of
Computer Science and Technology, Volume 05, Issue 04, May 2021

2. Apurva Lawate, Nomesh Katare. “Cricket Prediction of projected Score and Winner Prediction” Journal
of Computer and Communication Engineering, Vol. 12, Issue 4, February 2021

3. R. R. Kamble, Nidhi Koul. “IPL Score Prediction by using Machine Learning Algorithm” Journal of
Computer Science and Engineering, Vol. 10, 2020

4. T. Suvarna Kumari, P. Narsaiah. “Match Score Prediction using k-Nearest Neighbors Algorithm” IJRECE,
Vol. 9, Issue 5, Apr - July 2018

5. Kushooo, Nisha. “IPL Score and winner prediction by using data mining” Journal of Multi-Disciplinary,
Volume 5, Issue 4, February 2020

6. Prasad Thorat, Vighnesh Buddhivant. “Cricket score prediction” IJCRT, Volume 9, Issue 5, May 2021

7. Prateek Gupta, Navya Sanjna Joshi. “Cricket Score Forecasting using Neural Networks” I Journal of
Engineering and Technology, Volume 11, Issue 4, June 2020

8. Sudhanshu Akarshe, Rohit Khade. “Cricket Score Prediction using Machine Learning Algorithms” GRD
Journal for Engineering | Volume 8 | Issue 7 | September 2

11.APPENDIX

FLASK CODE:

from flask import Flask, render_template, request

import pickle
import numpy as np

# Load the three trained models: ridge (used for the displayed range), linear and lasso.
fn = 'first-innings-score-lr-model.pkl'
with open(fn, 'rb') as f:
    regressor = pickle.load(f)

filename2 = 'linear-first-innings-score-lr-model.pkl'
with open(filename2, 'rb') as f:
    lr = pickle.load(f)

filename3 = 'Lasso-first-innings-score-lr-model.pkl'
with open(filename3, 'rb') as f:
    lassor = pickle.load(f)

app = Flask(__name__)

# Team order used for one-hot encoding; it must match the order used in training.
TEAMS = [
    'Chennai_Super_Kings',
    'Delhi_Daredevils',
    'Kings_XI_Punjab',
    'Kolkata_Knight_Riders',
    'Mumbai_Indians',
    'Rajasthan_Royals',
    'Royal_Challengers_Bangalore',
    'Sunrisers_Hyderabad',
]


def one_hot(team):
    # 8-element list with 1 at the team's position; an unrecognised team gives all zeros.
    return [1 if team == name else 0 for name in TEAMS]


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    # One-hot encode the batting and bowling teams (16 values in total).
    t_a = one_hot(request.form['batting-team'])
    t_a = t_a + one_hot(request.form['bowling-team'])

    # Numeric match state entered in the HTML form.
    overs = float(request.form['overs'])
    runs = int(request.form['runs'])
    wickets = int(request.form['wickets'])
    runs_in_prev_5 = int(request.form['runs_in_prev_5'])
    wickets_in_prev_5 = int(request.form['wickets_in_prev_5'])

    t_a = t_a + [overs, runs, wickets, runs_in_prev_5, wickets_in_prev_5]

    # The models expect a 2-D array: one row holding all input values.
    data = np.array([t_a])
    my_prediction = regressor.predict(data)[0]
    l_prediction = lr.predict(data)[0]
    lass_prediction = lassor.predict(data)[0]

    # Display the ridge forecast as a range of plus or minus 10 runs.
    return render_template(
        'result.html',
        lower_limit=int(my_prediction) - 10,
        upper_limit=int(my_prediction) + 10,
        Ridge_result=my_prediction,
        Linear_result=l_prediction,
        Lasso_result=lass_prediction,
    )


if __name__ == '__main__':
    app.run(debug=True)

IMPLEMENTING SCREEN:

Fig 11.1 OUTPUT SCREEN
