Submitted by
[2022 – 2023]
BONAFIDE CERTIFICATE
This is to certify that the project work entitled “CRICKET MATCH SCORE PREDICTION USING
MACHINE LEARNING” is the bonafide work of “Addanki Swarna Sri (19B91A0504), Dangeti
Dharma Sai (19B91A0542), Bhupathiraju Dileep Varma (19B91A0523), Basava Jayanth
(19B91A0520)” who carried out the project work under my supervision in partial fulfilment of the
requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering.
We hereby declare that the project work entitled “CRICKET MATCH SCORE PREDICTION USING
MACHINE LEARNING” is a genuine work carried out by us in B.Tech., (Computer Science and
Engineering) at SRKR Engineering College(A), Bhimavaram and has not been submitted either in part or
full for the award of any other degree or diploma in any other institute or University.
1 ABSTRACT
2 LIST OF FIGURES
3 INTRODUCTION
4 LITERATURE SURVEY
5 PROBLEM STATEMENT
6 EXISTING SYSTEM
7 PROPOSED SYSTEM
8 METHODOLOGY
9 SYSTEM DESIGN
10 IMPLEMENTATION
11 RESULT ANALYSIS
12 CONCLUSION
13 REFERENCES
14 APPENDIX
ABSTRACT
The aim of this project is to build a model that forecasts the final first innings score of an IPL cricket
match. Current runs, current wickets, runs scored in the previous five overs, and wickets fallen in the
previous five overs are among the variables that drive the model's prediction. The dataset contains a
record of every IPL match played between 2008 and 2017. The model forecasts the first innings score of an
IPL match while the innings is in progress. Ridge regression is used to forecast the score. The model relies
primarily on data from the previous five overs to forecast the final score of the innings.
LIST OF FIGURES
1 MODEL ARCHITECTURE
3 CLASS DIAGRAM
4 SEQUENCE DIAGRAM
5 ACTIVITY DIAGRAM
6 DATA COLLECTION
7 DATA CLEANING
8 DATA EXTRACTION
9 DATA PREPROCESSING
10 DATA SPLITTING
11 LINEAR REGRESSION
12 LASSO REGRESSION
13 RIDGE REGRESSION
22 OUTPUT SCREEN
1. INTRODUCTION
English settlers brought cricket, the most popular game of the time, to North America in the 17th
century. Most nations now take part in the game. Interest in predicting the first innings score of a cricket
match has grown with fantasy-sports apps such as Dream11. Algorithms that predict the first innings score of
an important match are therefore highly sought after. Machine learning offers the most straightforward way
to predict the first innings score. Machine learning algorithms fall into three categories: supervised,
unsupervised, and reinforcement learning. The choice among them depends on the kind of output required from
the model and on the application.
Cricket is one of the most watched sports on television and is wildly popular in nations such as India,
Australia, England, New Zealand, and South Africa. A recurring issue is that the predicted score for the
first innings often does not match the actual first innings score. This is why a model that accurately
predicts the first innings score of an IPL match is needed. Such a model makes it easier for the audience to
anticipate the final score of the game in progress.
A typical Twenty20 game lasts three to four hours, with each innings lasting between 75 and 90 minutes
and a 10- to 20-minute interval between innings. Each team has 11 players, and each innings is played over
20 overs. This version of the game is considerably shorter than earlier formats and closer in duration to
other popular team sports. It was introduced to create a form of the game that would appeal to both
spectators at the ground and television viewers.
The first innings score of a cricket match can be predicted with a variety of methods, and many
strategies and prediction systems are used to forecast the score of IPL matches. The first innings score
is most often estimated using the current run rate (CRR) method. The CRR method multiplies the average
number of runs scored per over so far by the total number of overs in an innings. This approach ignores most
other factors and considers only the runs scored per over. The existing method can therefore only estimate
the first innings score from the current score and a few parameters. By combining a number of factors when
predicting the first innings score of a cricket match, we aim to improve on the accuracy of the existing
system. We concentrate on live cricket score prediction by analyzing IPL games.
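The CRR projection described above can be sketched in a few lines. This is only an illustration and is not code from the project: the function name `crr_projection` and the sample match state are assumptions, and overs are treated as a plain decimal number of completed overs.

```python
def crr_projection(current_runs, overs_completed, total_overs=20):
    """Project the final innings score from the current run rate (CRR)."""
    if overs_completed <= 0:
        raise ValueError("overs_completed must be positive")
    # Average runs per over so far
    current_run_rate = current_runs / overs_completed
    # Assume the same rate continues for the whole innings
    return current_run_rate * total_overs


# Hypothetical match state: 80 runs after 10 overs of a 20-over innings
projected = crr_projection(80, 10)
print(projected)  # 160.0
```

The projection uses nothing except the run rate, which is why it cannot react to wickets lost or to a recent change in scoring pace.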
To predict the IPL first innings score, the machine learning techniques of linear, lasso, and
ridge regression were used. Linear regression is a supervised technique: the data presented to the model is
labeled, so the target values are already known. The linear regression model is used to forecast continuous
values rather than to classify objects. Ridge regression is used to deal with multicollinearity in the data.
Supervised machine learning algorithms handle two categories of problems: regression and
classification. A problem is a classification problem when the desired output is a category, and a
regression problem when the desired output is a real value. Recommendation and time series forecasting are
two popular problem types built on classification and regression approaches. Unsupervised learning seeks
to model the distribution or structure of data in order to understand it better.
2. LITERATURE SURVEY
A lot of work has been done on machine learning and score prediction techniques for estimating the
final score of the first innings in cricket. Numerous recognition and detection algorithms have been
developed for score prediction in this area. The literature reviewed below covers a range of recent
approaches.
Nikhil Dhonge, et al. [1] examined the methods used to predict the IPL first innings score.
They looked for a computationally sound method that can forecast the final score of the first two innings.
They made use of the SVC classifier, decision tree classifier, and random forest classifier, among other
classifiers. They gathered data on every IPL game played since 2008. The dataset has 76015 rows and 15
columns; using feature selection procedures, they selected 8 columns, 7 of which are input features and 1
of which is the target variable. Regression analysis, which can be carried out with various algorithms, was
used because it predicts a continuous value. Certain combinations of variables are used as inputs, and the
output is continuous.
Apurva Lawate, et al. [2] proposed machine-learning-based projected score and winner prediction for a
cricket match. They compared the accuracy of several algorithms, including the Multilayer Perceptron Neural
Network, Ridge Regression, and Linear Regression. The dataset was created using data from IPL games played
over more than ten years, spanning 2009 to 2019. The dataset is divided into two parts: data from 2008 to
2016 is used to build the models, while data from 2017 onward is used to test them. They implemented an
algorithm intended to predict accurately while a match is in progress, and deployed it as a web application
using Flask. The model provides an accuracy of 77.286 percent.
R. Kamble, et al. [3] examined machine learning for predicting cricket scores. They developed a model
that can predict a team's score after 20 overs under the current match conditions. To generate the
prediction models for each problem, classifiers such as Naive Bayes, Random Forest, multiclass SVM, and
decision trees were used. They found the Random Forest classifier to be the most accurate for each problem.
The dataset was created using data from IPL games played over the past five years. The data is split 80-20
for training and testing: 80% is used for training and 20% for testing. In this approach, linear regression
is used, and the same procedure is performed repeatedly.
T. Suvarna Kumari, et al. [4] worked with the k-Nearest Neighbors algorithm for predicting the cricket
match score. They suggested a method for predicting the final score of the first innings. On the first
innings and second innings datasets, they used the KNN algorithm to predict the match scores, where the
class attribute "X" is the "Score" and the input attributes "Ai" (i = 1, 2, 3, ...) are the relative team
strength, home/away status, and venue average. The dataset consists of completed matches, excluding all
rain-affected cancellations and postponements, played between 2000 and 2018 among ODI-playing nations such
as India, England, Australia, and so on.
Kushooo, et al. [5] used data mining to predict cricket scores and winners. They devised a method for
estimating and predicting the first innings score in a cricket match. Most game predictions are framed as
regression or classification problems, both of which are supervised learning tasks. Regression produces a
continuous outcome, whereas classification deals with discrete outcomes. Linear Regression appeared to be
very effective for predicting continuous values, while learning algorithms such as Naive Bayes, Logistic
Regression, Neural Networks, and Random Forests had been used in many previous studies for classification
problems, such as predicting the outcome of matches or ranking players. They concluded from the results
that Random Forest was the most reliable classifier for both datasets.
Akhil Nimmagadda, et al. [6] used data mining and the Random Forest method to predict the cricket
score and the winner. They developed a method to predict the outcome of One-Day International cricket
matches, in which they evaluate the bowling and batting potential of the 22 players participating in the
match using their career statistics and their active participation in recent games. They used player
potential to show one team's relative superiority over the other. They employed supervised learning
algorithms to predict the match winner, taking into account a few other basic factors such as run rate, the
match venue, and relative team strength. The model was built using Multiple Variable Linear Regression.
Sudhanshu Akarshe, et al. [7] worked on predicting the cricket score using a variety of machine
learning algorithms. They suggested a model to predict the outcome of the game and the performance of each
player based on actual data. They gathered data for every available match from a variety of websites,
including ESPN, Kaggle, and others. In such prediction systems, a single algorithm is often used, and its
performance is estimated in isolation. In contrast, they aimed to compare performance across a variety of
machine learning algorithms.
Rameshwari, et al. [8] used a linear regression classifier to predict the winner and the live cricket
score. They created a model using two different approaches: the first method estimates the score of the
first innings based on the current run rate, the number of wickets lost, the match venue, and the batting
and bowling teams. The second approach uses the same features as the first, together with the batting
team's target, to predict the outcome of the game in the second innings. These two approaches were
developed separately for the first and second innings using a Linear Regression classifier or a
Q-learning-based decision tree approach and a Naive Bayes classifier.
Jalaz Kumar et al. [9] predicted the outcomes of ODI cricket matches using decision trees and MLP
networks. They used multilayer perceptrons and decision tree classifiers as their algorithms. They compiled
data from all ODI games played between January 5, 1971 and October 29, 2017. Results of 3933 ODI matches
were discarded. They argued that their methods are superior to statistical approaches because, unlike
statistics, which define relationships between variables using mathematical equations, these methods do
not require any prior assumptions about the input variables and their underlying relationships.
D. Jyothsna et al. [10] analyzed IPL cricket data and predicted match outcomes using Linear Regression,
Decision Tree, K-means, and Logistic Regression. They collected data from the IPL's most recent seven
seasons, including player, match, team, and ball-by-ball statistics, and analyzed it to come up with
several recommendations for enhancing a player's performance. The impact of several factors, such as the
venue or toss decision, on match outcomes during the previous seven years was also determined.
According to the findings, Random Forest is the most accurate classifier, predicting the best player
performance with an accuracy of 89.15 percent.
Rameshwari Lokhande, et al. [11] focused on live cricket score and winning prediction using a Naive
Bayes classifier and Linear Regression. For a limited-overs cricket match, they presented a way of building
a model for estimating the final score of the first innings and predicting the match outcome in the second
innings. The predictions take into account the toss, the teams' ODI rankings, and home team advantage.
Two distinct models trained on earlier matches are proposed: a Linear Regression classifier for the first
innings and a Naive Bayes classifier for the second innings. A reinforcement algorithm is employed rather
than regression in its pure form.
Prasad Thorat, et al. [12] presented useful methods for estimating the predicted score of the first
innings of a cricket match. When making predictions, the dynamic nature of a cricket match is typically
overlooked. They proposed CricFirst Predictor (CFP), a method that takes the game's key players into
account. The data is taken from Kaggle datasets and is stored in CSV format. The dataset is separated into
two parts: data used to train the model and data used to test the model. The user inputs include the name
of the batting team, the name of the bowling team, total runs scored, total overs bowled, total wickets
taken, total runs scored in the last five overs, and total wickets taken in the last five overs.
Prateek Gupta, et al. [13] analyzed and explained cricket score forecasting with neural networks. They
put forward a framework that overcomes the main drawback of manual work, which is tedious and requires
effort to maintain the records and statistics of each player by hand. An LSTM-based neural network is used
to predict continuous values, with the score after 18 deliveries serving as the target and the data from
the previous 18 deliveries as the context. They used a dataset of T20 matches spanning more than ten years
(2010–2021), totaling more than 2600 matches. The dataset includes the IPL (2010–2021), BBL (2011–2012 to
2020–2021), PSL (2015–2016 to 2020–2021), CPL (2013–2020), Lanka Premier League (2019–2020), and T20
international matches (2010–2021). The most reliable model uses the "ELU" activation function.
3. PROBLEM STATEMENT
Cricket is a popular sport worldwide, and predicting the outcome of a cricket match has always been a
fascinating and challenging task for enthusiasts. Cricket score prediction is a complex task that involves
various factors such as weather conditions, pitch type, team composition, and past performance. Therefore,
there is a need for a machine learning-based model that can accurately predict the score of a cricket match.
4. EXISTING SYSTEM
❖ To forecast the outcome of a cricket match, many people utilize the current run rate technique.
❖ In the CRR method, the average number of runs scored per over so far is multiplied by the total
number of overs in an innings.
❖ It has been noted that the prediction made using the CRR approach does not take the game's dynamic
character into account.
5. PROPOSED SYSTEM
❖ The proposed solution will help cricket fans and enthusiasts to predict the final score of the match,
which will not only enhance their experience but also help them to make informed decisions while
placing bets.
❖ Additionally, it can be used by cricket teams to analyze their performance, identify the areas of
improvement, and develop strategies accordingly.
❖ We keep track of all the accuracy results from several models, such as Linear Regression, Ridge
Regression, and Lasso Regression, in order to attain high prediction accuracy.
❖ The best model is chosen according to these accuracy values.
❖ Accurate score prediction is our aim.
6. METHODOLOGY
IPL DATASET:
The project's dataset, ipl.csv, was downloaded from the Kaggle website. The dataset includes data on each and
every ball played in every IPL game from 2008 to 2017. There are 1140210 tuples in the collection. 15 attributes
are contained in each tuple. The 15th characteristic, a real valued attribute with a range of 0 to infinity, is the
target feature. In this dataset, each feature is specified as follows:
❖ date: It indicates the day the match took place. It has the format DD-MM-YYYY.
❖ Non-striker: The batsman who is at the crease but is not facing the ball.
❖ Total: It represents the total runs scored in that innings after 20 overs.
6.2 MODEL ARCHITECTURE:
We intend to develop a model that can accurately predict the first innings score of a live IPL match by
taking into account several parameters that improve the score forecast.
DATA COLLECTION:
The dataset is taken from those available on Kaggle and is collected in CSV format. The collected data is
cleaned in the following stage.
DATA CLEANING:
The match ID, venue, batsman and bowler names, the score of the striker batsman, and the score of the
non-striker batsman are all removed during the data cleaning stage. These columns are not used for
forecasting. A few teams in the IPL dataset are inconsistent because they have only recently started
playing. We therefore keep only the consistent teams and remove the other teams from the dataset. We
consider only the data from after the fifth over. The dataset's date column is stored as a string, so it is
converted to a datetime object before date operations are applied to it.
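The cleaning steps can be sketched with pandas as follows. This is a minimal illustration, not the project's code: the small in-memory table stands in for ipl.csv, and the column names, team list, and sample rows are assumptions. The date format follows the DD-MM-YYYY format given in the dataset description.

```python
import pandas as pd

# Tiny stand-in for ipl.csv; column names and rows are assumed for illustration
df = pd.DataFrame({
    "mid": [1, 1, 2],
    "date": ["18-04-2008", "18-04-2008", "19-04-2008"],
    "bat_team": ["Kolkata Knight Riders", "Kolkata Knight Riders", "Kochi Tuskers Kerala"],
    "bowl_team": ["Royal Challengers Bangalore", "Royal Challengers Bangalore", "Mumbai Indians"],
    "batsman": ["A", "B", "C"],
    "bowler": ["X", "Y", "Z"],
    "overs": [3.2, 7.4, 9.1],
    "striker": [10, 25, 30],
    "non-striker": [4, 12, 8],
    "total": [222, 222, 150],
})

# Hypothetical list of consistent teams
consistent_teams = ["Kolkata Knight Riders", "Royal Challengers Bangalore", "Mumbai Indians"]

# 1. Drop the columns that are not used for forecasting
df = df.drop(columns=["mid", "batsman", "bowler", "striker", "non-striker"])

# 2. Keep only rows where both teams are consistent teams
df = df[df["bat_team"].isin(consistent_teams) & df["bowl_team"].isin(consistent_teams)]

# 3. Keep only data from after the fifth over
df = df[df["overs"] >= 5.0].copy()

# 4. Convert the date column from a string to a datetime object
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

print(df)
```

Of the three sample rows, only the second survives: the first falls inside the first five overs and the third involves a team outside the assumed consistent list.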
DATA PREPROCESSING:
After cleaning, the data must be preprocessed. One-hot encoding is performed as part of the data
preprocessing stage; it is explained in detail in the implementation section. The dataset's columns are
also rearranged during this stage so that they appear in a consistent order.
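One way to perform the encoding and column reordering is pandas' `get_dummies`, sketched below. The column names and the two sample rows are assumptions made for illustration, and the project's own code may differ.

```python
import pandas as pd

# Two hypothetical rows with categorical team columns
df = pd.DataFrame({
    "bat_team": ["Mumbai Indians", "Chennai Super Kings"],
    "bowl_team": ["Chennai Super Kings", "Mumbai Indians"],
    "runs": [61, 48],
    "total": [180, 165],
})

# One binary column per team, separately for the batting and bowling side
encoded = pd.get_dummies(df, columns=["bat_team", "bowl_team"], dtype=int)

# Reorder: encoded team columns first, then numeric features, target last
team_cols = sorted(c for c in encoded.columns if c.startswith(("bat_team_", "bowl_team_")))
encoded = encoded[team_cols + ["runs", "total"]]

print(encoded.columns.tolist())
print(encoded.iloc[0].tolist())
```

Fixing the column order matters because the trained model expects its inputs in the same positions at prediction time.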
DATA SPLITTING:
After preprocessing, the data is split so that IPL games played up to 2016 are used to train the model and
IPL games played from 2017 onward are used to test it.
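A season-based split of this kind, with seasons up to 2016 for training and the 2017 season for testing as in the implementation section, can be sketched as follows. The sample rows and column names are assumptions.

```python
import pandas as pd

# Hypothetical rows; "date" is already a datetime column
df = pd.DataFrame({
    "date": pd.to_datetime(["2015-05-01", "2016-05-20", "2017-04-10"]),
    "runs": [50, 72, 64],
    "total": [160, 175, 182],
})

# Split by season so that the test data comes strictly after the training data
train = df[df["date"].dt.year <= 2016]
test = df[df["date"].dt.year >= 2017]

# Separate the features from the target column
X_train, y_train = train.drop(columns=["date", "total"]), train["total"]
X_test, y_test = test.drop(columns=["date", "total"]), test["total"]

print(len(X_train), len(X_test))
```

Splitting by date rather than at random keeps later matches out of the training data, which mirrors how the model would be used on a future season.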
MODEL GENERATION:
To predict the score from the data, we use the Linear Regression, Ridge Regression, and Lasso Regression
models. These models are trained on the training portion of the input dataset and evaluated on its test
portion. The score is then predicted using fresh data.
FINAL PREDICTION:
❖ Finally, various data sources will be obtained from the client in order to forecast the result. Based on
the input, a range of final scores will be displayed as output on the website.
❖ The model design for the CFP framework is as follows.
[Figure: Model architecture; model generation uses Ridge Regression]
6.3 MACHINE LEARNING ALGORITHM:
Ridge regression is used to create a parsimonious model when the number of predictor variables in a set
exceeds the number of observations, or when a data set has multicollinearity (correlations between
predictor variables). Ridge regression uses L2 regularization. L1 regularization, by contrast, can
eliminate some coefficients altogether, which yields sparse models. L2 regularization adds an L2 penalty
equal to the square of the magnitude of the coefficients.
In a linear regression, the input variables and the target variable are assumed to have a linear
relationship. If the target variable depends on only one input variable, the model is a line; if it depends
on several input variables, the model is a hyperplane. The model's coefficients are determined by an
optimization process that seeks to minimize the sum of squared errors between the predictions (yhat) and
the expected target values (y).
In linear regression, the estimated coefficients can sometimes become extremely large, making the model
sensitive to changes in its inputs and possibly unstable. This applies especially to problems where there
are fewer examples (n) than input predictors (p).
One strategy for improving the stability of regression models is to modify the loss function so that it
includes an additional cost for a model with large coefficients. Linear regression models that use these
modified loss functions during training are referred to as penalized linear regression.
One well-known penalty is to penalize a model based on the sum of its squared coefficient values (beta).
This is referred to as an L2 penalty.
An L2 penalty shrinks the size of all coefficients, but it does not remove any coefficient from the model,
because in general it does not drive their values exactly to zero.
The result of this penalty is that the parameter estimates are only allowed to grow large if there is a
corresponding drop in SSE. In effect, as the lambda penalty increases, this method shrinks the estimates
toward 0 (such techniques are sometimes called "shrinkage methods").
This penalty can be added to the cost function for linear regression; the result is referred to as Ridge
Regression or Tikhonov regularization (after its author).
The "lambda" hyperparameter controls the weighting of the penalty relative to the loss function. A value
of 1 fully weights the penalty; a value of 0 excludes it. Small values of lambda, such as 1e-3 or smaller,
are common.
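The shrinkage effect can be seen in a small numerical sketch. It uses the closed-form ridge solution without an intercept on synthetic, nearly collinear data; the helper name `ridge_coefficients` and the data are assumptions made for illustration and are not part of the project code. In scikit-learn the same hyperparameter is called `alpha`.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution (no intercept): (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Synthetic data with two nearly collinear predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

# A larger lambda gives a smaller coefficient vector
for lam in (1e-3, 1.0, 100.0):
    beta = ridge_coefficients(X, y, lam)
    print(lam, beta, np.linalg.norm(beta))
```

The printed norm falls as lambda grows, while neither coefficient is set exactly to zero, which is the behavior described above for an L2 penalty.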
7. SYSTEM DESIGN
UML DESIGN:
UML is a general-purpose modeling language. The main objective of UML is to provide a standard way of
visualizing the design of a system. It is comparable to the blueprints used in other fields of engineering.
7.2 CLASS DIAGRAM:
A class diagram is a form of static structure diagram used in software engineering that displays the classes,
attributes, operations (or methods), and interactions between the classes to illustrate the structure of a system.
It describes what kind of information each class contains.
7.3 SEQUENCE DIAGRAM:
A sequence diagram, commonly referred to as a system sequence diagram (SSD), is a visual representation of
process interactions arranged sequentially in the field of software engineering.
7.4 ACTIVITY DIAGRAM:
Activity diagrams show both organizational processes and the flow of control between class objects.
These diagrams are built from specialized shapes that are connected with arrows.
8. IMPLEMENTATION
A screenshot of the code used to load data from an external source is shown in Figure 8.1. Any
model needs input data, also known as a dataset, in order to be trained. There are several ways to
gather data from the web; in this instance, we use the ipl.csv dataset, which was obtained from the
Kaggle website. There are 1140210 tuples in the dataset, and each tuple has 15 features.
8.2 DATA CLEANING:
A snapshot of the data required for creating the model is shown in Figure 8.2. The code in the figure
removes the dataset's unwanted features. Mid, batsman, bowler, striker, and non-striker are the
unwanted features in this case.
The code snapshot used to keep only the consistent teams that have competed since the beginning of the
IPL is shown in Figure 8.3. Since the first five overs of data cannot be used to predict the score, they
are excluded from the dataset in the same code. The date column is also converted from its string
format into a datetime object.
The code snapshot used to convert the categorical features using one-hot encoding is shown in
Figure 8.4. For the columns bat team and bowl team, this kind of encoding generates a new binary
feature for each possible category. The batting team that the user entered is set to 1 and all other
batting teams are set to 0. Likewise, the bowling team that the user entered is set to 1 and all other
bowling teams are set to 0.
8.5 DATA SPLITTING:
The complete dataset is divided into two parts, as shown in Figure 8.5. The original dataset contains
the IPL matches from 2008 to 2017. This code separates the dataset into training and test sets,
using data from 2008 to 2016 for training and data from 2017 for testing. The test data has 61116
tuples, while the training data has 821260 tuples.
8.6 MODEL EVOLUTION:
Multiple Machine Learning Algorithms, including Linear Regression, Ridge Regression, and Lasso
Regression, are included in the proposed system.
8.6.3 RIDGE REGRESSION:
8.7 FINAL PREDICTION:
A user interface built with Flask and web templates receives input from the user and presents the score
range predicted by the ridge regression model.
When a user fills in all of the fields on the homepage, a list with roughly 16 variables is generated. For
example, if the batting team is Bengaluru and the bowling team is Rajasthan, they are encoded in the
backend as [1,0,0,0,0,0,0,0] + [0,0,0,0,1,0,0,0]. These two lists are concatenated and combined with the
remaining 6 variables from the HTML form. To obtain a range, 10 is added to the predicted final score to
give the highest likely score and 10 is subtracted from it to give the lowest likely score in that game.
The user is then shown this range as the predicted value. For instance, if the model predicts a first
innings score of 180, the final score is shown as 170 to 190.
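The encoding and the score range can be sketched as below. The team ordering is taken from the Flask listing in the appendix, so the position of each 1 depends on that order; the helper names `one_hot_team` and `score_range` are assumptions introduced for illustration.

```python
# Team order as listed in the Flask code in the appendix
TEAM_NAMES = [
    "Chennai_Super_Kings", "Delhi_Daredevils", "Kings_XI_Punjab",
    "Kolkata_Knight_Riders", "Mumbai_Indians", "Rajasthan_Royals",
    "Royal_Challengers_Bangalore", "Sunrisers_Hyderabad",
]


def one_hot_team(team):
    """Return a list with 1 at the selected team's position and 0 elsewhere."""
    return [int(team == name) for name in TEAM_NAMES]


def score_range(predicted, margin=10):
    """Return (lowest, highest) by subtracting and adding the margin."""
    rounded = int(round(predicted))
    return rounded - margin, rounded + margin


# Batting and bowling encodings are concatenated into one list of 16 values
encoding = one_hot_team("Chennai_Super_Kings") + one_hot_team("Mumbai_Indians")
print(encoding)
print(score_range(180))  # (170, 190)
```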
Fig 8.7.2 SCREENSHOT FOR FILLING VALUES IN WEB PAGE
9. RESULT ANALYSIS
The three error measures obtained with the linear regression technique are displayed in Figure 9.1.
Linear regression resulted in a mean absolute error of 10.2572, a mean squared error of 160.60052, and a
root mean squared error of 12.67282.
The three error measures obtained with the ridge regression technique are shown in Figure 9.2. Ridge
regression resulted in a mean absolute error of 12.43432, a mean squared error of 276.68004, and a root
mean squared error of 16.63370.
Figure 9.3 shows the three error measures obtained with the lasso regression technique. Lasso regression
resulted in a mean absolute error of 12.21358, a mean squared error of 262.36538, and a root mean squared
error of 16.19769.
As the three figures show, linear regression gives the lowest mean absolute error, mean squared error, and
root mean squared error. Linear regression therefore produces the most accurate results when compared with
ridge regression and lasso regression.
METRIC                     LINEAR REGRESSION     RIDGE REGRESSION      LASSO REGRESSION
MEAN ABSOLUTE ERROR        10.25729245359391     12.434327274896821    11.775248362875073
MEAN SQUARED ERROR         160.6005271696334     276.6800450229181     236.32988043635393
ROOT MEAN SQUARED ERROR    12.672826329182982    16.633702084109782    15.373024440114376
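The three error measures reported above follow their standard definitions, which can be sketched with NumPy. The function name `regression_errors` and the sample totals are assumptions made for illustration; the project itself computes these values with scikit-learn's `metrics` module.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))   # mean absolute error
    mse = np.mean(residuals ** 2)      # mean squared error
    rmse = np.sqrt(mse)                # root mean squared error
    return mae, mse, rmse


# Hypothetical actual and predicted first innings totals
mae, mse, rmse = regression_errors([160, 175, 182], [150, 180, 170])
print(mae, mse, rmse)
```

Because the root mean squared error is the square root of the mean squared error, each column of the table can be cross-checked: the square root of 160.6005 is about 12.6728.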
OVERFITTING:
When a model performs well on training data but poorly on test data, this is known as overfitting.
UNDERFITTING:
A model that performs poorly on both the training data and the test data is said to be underfit.
GOOD FIT:
A model is a good fit when its accuracy is similar on the training and test datasets; here it is between
70 and 80 percent for both.
9.5 GRAPHICAL REPRESENTATION FOR ACTUAL AND PREDICTED VALUES:
9.6 ERROR RATES FOR TRAINING DATA:
10. CONCLUSION
This project focuses on predicting the first innings score of an IPL match using the Linear Regression,
Ridge Regression, and Lasso Regression machine learning algorithms. The ipl.csv dataset is used. This
dataset has 15 attributes, of which the most heavily used are the batting team, bowling team, runs scored
so far, wickets fallen so far, number of completed overs, runs scored in the last 5 overs, and wickets
taken in the last 5 overs. From these inputs the project outputs the forecasted score of an IPL match. The
root mean squared errors of the three models, Linear Regression, Ridge Regression, and Lasso Regression,
are 12.67282, 16.63370, and 16.19769 respectively. Linear regression gives the lowest error.
11. REFERENCES
1. Nikhil Dhonge, Shraddha Dhole. “IPL cricket score forecast utilizing AI methods” Research Journal
of Computer Science and Technology, Volume 05, Issue 04, May 2021
2. Apurva Lawate, Nomesh Katare. “Cricket Prediction of projected Score and Winner Prediction”
Journal of Computer and Communication Engineering, Vol. 12, Issue 4, February 2021
3. R. R. Kamble, Nidhi Koul. “IPL Score Prediction by using Machine Learning Algorithm” Journal
of Computer Science and Engineering, Vol. 10, 2020
4. T. Suvarna Kumari, P. Narsaiah. “Match Score Prediction using k-Nearest Neighbors Algorithm”
IJRECE, Vol. 9, Issue 5, Apr - July 2018
5. Kushooo, Nisha. “IPL Score and winner prediction by using data mining” Journal of
Multi-Disciplinary, Volume 5, Issue 4, February 2020
6. Akhil Nimmagadda, Nidamanuri Venkata Kalyan. “IPL score prediction and winning prediction using
data mining Approach” Journal of Advance Research and Development, Volume 6, Issue 4
7. Sudhanshu Akarshe, Rohit Khade. “Cricket Score Prediction using Machine Learning Algorithms”
GRD Journal for Engineering, Volume 8, Issue 7, September 2018
8. Ashish V Shenoy. “Prediction of Live Cricket Score and Winning Prediction” Journal of Trend in
Research and Development, Volume 5
9. Jalaz. “Score Prediction of IPL Matches using Machine Learning Algorithms” 2018 International
Conference of Cyber Computing and Communication
10. Jyothsna. “Predicting the outcome of IPL Cricket Match” Journal of Research in Science, Engineering
and Technology, Vol. 6, Issue 4, June 2018
11. Rameshwari Lokhande. “Cricket Live Score Prediction and Winning Prediction” Journal of Computer
Research and Development, Volume 5, Issue 9
12. Prasad Thorat, Vighnesh Buddhivant. “Cricket score prediction” IJCRT, Volume 9, Issue 5, May 2021
13. Prateek Gupta, Navya Sanjna Joshi. “Cricket Score Forecasting using Neural Networks” I Journal of
Engineering and Technology, Volume 11, Issue 4, June 2020
12. APPENDIX
SAMPLE CODES:
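import pandas as pd
import pickle
from sklearn import metrics
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import GridSearchCV
# Raw string keeps the backslashes in the Windows path from being read as escapes
df = pd.read_csv(r"D:\project\ipl.csv")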
# Convert date to datetime object
# Reorder columns
'venue_M Chinnaswamy Stadium', 'venue_MA Chidambaram Stadium, Chepauk',
'venue_Maharashtra Cricket Association Stadium',
# --- Model Building ---
import numpy as np
# Fit on the training data only; the test data is held out for evaluation
regressor = LinearRegression()
regressor.fit(X_train, y_train)
train_lr = regressor.score(X_train, y_train)   # R^2 on the training data
test_lr = regressor.score(X_test, y_test)      # R^2 on the test data
prediction1 = regressor.predict(X_test)
filename = 'inning_one-lr-model.pkl'
## Ridge Regression
ridge = Ridge()
parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-3, 1e-2, 1, 5, 10, 15, 20, 25, 30]}
# Search for the best alpha with 6-fold cross-validation on the training data
ridge_regressor = GridSearchCV(ridge, parameters, scoring='neg_mean_squared_error', cv=6)
ridge_regressor.fit(X_train, y_train)
print(ridge_regressor.best_params_)
print(ridge_regressor.best_score_)
# Fit the plain Ridge model on the training data only, then evaluate on the test data
ridge.fit(X_train, y_train)
train_ridge1 = ridge.score(X_train, y_train)
test_ridge = ridge.score(X_test, y_test)
prediction2 = ridge.predict(X_test)
filename = 'ridge-inning_one-score-lr-model.pkl'
print("\n")
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, prediction2)))
# Lasso Regression
lasso = Lasso()
parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-3, 1e-2, 1, 5, 10, 20, 30]}
# Search for the best alpha with 5-fold cross-validation on the training data
lasso_regressor = GridSearchCV(lasso, parameters, scoring='neg_mean_squared_error', cv=5)
lasso_regressor.fit(X_train, y_train)
print(lasso_regressor.best_params_)
print(lasso_regressor.best_score_)
# Fit the plain Lasso model on the training data only, then evaluate on the test data
lasso.fit(X_train, y_train)
train_lasso1 = lasso.score(X_train, y_train)
test_lasso = lasso.score(X_test, y_test)
prediction3 = lasso.predict(X_test)
print("\n")
# Predictions on the training data, used for the training error rates
predictionlr = regressor.predict(X_train)
predictionr = ridge_regressor.predict(X_train)
predictionl = lasso_regressor.predict(X_train)
# Error Rates
print("linear regression")
print("\n")
print("ridge regression")
print("\n")
print("lasso regression")
FLASK:
from flask import Flask, request
import numpy as np

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    # Get the team names and encode them as one-hot vectors
    bat_team = request.form['batting-team']
    bowl_team = request.form['bowling-team']
    team_names = ['Chennai_Super_Kings', 'Delhi_Daredevils', 'Kings_XI_Punjab',
                  'Kolkata_Knight_Riders', 'Mumbai_Indians', 'Rajasthan_Royals',
                  'Royal_Challengers_Bangalore', 'Sunrisers_Hyderabad']
    team_encoding = ([int(bat_team == team) for team in team_names]
                     + [int(bowl_team == team) for team in team_names])
    # Read the match state; the 'overs' and 'runs' field names are assumed,
    # since those lines are not shown in this listing
    overs = float(request.form['overs'])
    runs = int(request.form['runs'])
    wickets = int(request.form['wickets'])
    runs_in_prev_5 = int(request.form['runs_in_prev_5'])
    wickets_in_prev_5 = int(request.form['wickets_in_prev_5'])
    # Concatenate the input values and the team encoding into a single input vector
    input_vector = np.array(team_encoding + [overs, runs, wickets, runs_in_prev_5,
                                             wickets_in_prev_5]).reshape(1, -1)

if __name__ == '__main__':
    app.run(debug=True)
IMPLEMENTING SCREEN: