Ipl 2
Abstract— Score prediction is something fans and analysts attempt throughout their sporting lives. An early prediction helps the team management
adjust its plans quickly. Prediction is always biased to some degree, but this model attempts to predict the innings total in an ODI after the first 5
overs of the match. Player selection is one of the most important tasks in any sport, and cricket is no exception. A player's
performance depends on various factors such as current form, record against the opposition, nature of the pitch, format of the
game, venue, etc. The team management and captain select 11 players out of a squad of 15-16. This model classifies players,
based on their stats, into those who should play limited-overs cricket and those who should play the Test format of the game. Linear Regression
and MLPRegressor have been used for the predictive model, and KNN, SVM, Naïve Bayes, and MLPClassifier have been used for classification.
Keywords— Cricket, KNN, Linear Regression, MLP, Naïve Bayes, Prediction Model, SVM.
—————————— ——————————
1 INTRODUCTION
Cricket is like a religion in our country. Every fan tries to predict the score and wants the playing 11 to match their own choice. Cricket is increasingly popular among the statistical science community, but the unpredictable and inconsistent nature of the game makes it challenging to apply common probability models. However, numerous researchers have successfully applied various statistical methods to cricket data. I got inspiration from the WASP (Winning and Score Prediction) model that was developed by Mr. Samuel back in 2011. ICC used the same model in 2012, and I noticed it for the first time in a match between India and New Zealand, where WASP was predicting 282 according to the model, whereas the score would have been 226 if the run rate had been used (see the image).
I asked Google about it but didn't understand a single line when I was in class 8-9. Later, I studied it and came up with these models, and my implementation is still in progress on different cricket datasets (including Tests and T20s). There are numerous factors that can affect a cricket game's score. In cricket, it is believed that wickets in hand and current run rate are very important factors in reaching a good total.
Like many sports, ODI cricket has both controllable and uncontrollable variables. Playing combination and in-field and out-field tactics, including aggressive and offensive playing behaviors, may be considered controllable variables. However, the coin-toss result is the main uncontrollable variable in the ODI format.
2 DATA AND TOOLS
I obtained all the data from different sources on the internet,
such as www.cricinfo.com, www.cricbuzz.com and Wikipedia,
using a web-scraping application that I developed in
Node JS with the “Cheerio” module. The data (for predictive
modeling) contains information on matches played
between 2006 and 2017. The data includes many crucial factors
that are important for predicting the innings total.
IJSER © 2018
http://www.ijser.org
INTERNATIONAL JOURNAL OF SCIENTIFIC & ENGINEERING RESEARCH, VOLUME 9, ISSUE 8, AUGUST-2018 238
ISSN 2229-5518
we know that prediction is always biased and cricket is a game of uncertainty.
runs. If the predicted scores match the actual scores, then the model is acceptable. The function is as follows:
3.1.2. MULTILAYER PERCEPTRON
A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs.
Constructor Parameters
inputLayerFeatures (int) - the number of input layer features
hiddenLayers (array) - array with the hidden layers configuration; each value represents the number of neurons in each layer
classes (array) - array with the different training set classes (array keys are ignored)
iterations (int) - number of training iterations
learningRate (float) - the learning rate
activationFunction (ActivationFunction) - neuron activation function
Fig. 8: Multilayer Perceptron Algorithm Modeling
3.2. CLASSIFICATION MODEL
In Indian cricket, there are players who play only Test cricket and players who play only limited-overs cricket (with the exception of players who play both formats). Pujara is comfortable only in Test cricket because his strike rate is low, his patience is high, and his temperament suits that format. Virat Kohli can play all formats of the game, as his statistics show. This classification model classifies players, based on their stats, into those who should play Test cricket and those who should play limited-overs cricket.
Several factors classify the players (in both ODI and Test), as follows:
1. Total No. of Balls played in his career
2. Total No. of Centuries Scored
3. Total Average
4. Total Strike Rate
5. Total No. of Not outs
The above factors are self-explanatory for a cricket fan. A Test player typically faces more balls, scores more runs, and has a higher average, a lower strike rate, and more not outs (fewer in rare cases) than an ODI player.
The target has 3 classes, as follows:
1. Test only - 1
2. ODI only - 2
3. Test and ODI both - 3
For generating the classification models, we used supervised machine learning algorithms. In supervised learning algorithms, each training tuple is labeled with the class to which it belongs. We used Naïve Bayes, K-Nearest Neighbors, Multilayer Perceptron Classifier and Multiclass Support Vector Machines for our experiments. These algorithms are explained in brief.
3.2.1. NAÏVE BAYES
Bayesian classifiers are statistical classifiers that predict the probability with which a given tuple belongs to a particular class. The Naïve Bayes classifier assumes that each attribute has its own individual effect on the class label, independent of the values of other attributes. This is called class-conditional independence. Bayesian classifiers are based on Bayes' theorem.
Bayes Theorem: Let X be a data tuple and C be a class label. Then the probability that X belongs to class C is
P(C|X) = P(X|C)P(C) / P(X)
where:
• P(C|X) is the posterior probability of class C given predictor X.
• P(C) is the prior probability of the class.
• P(X|C) is the likelihood, i.e., the probability of X given the class C.
• P(X) is the prior probability of the predictor.
The classifier calculates P(Ci|X) for every class Ci for a given tuple X. It then predicts that X belongs to the class having the highest posterior probability, conditioned on X. That is, X belongs to class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i.
Fig. 10: Naïve Bayes Algorithm Modeling
3.2.2. SUPPORT VECTOR MACHINES
Vladimir Vapnik, Bernhard Boser and Isabelle Guyon introduced the concept of the support vector machine in their paper. SVMs are highly accurate and less prone to overfitting. SVMs can be used for both numeric prediction and classification. An SVM transforms the original data into a higher dimension using a nonlinear mapping. It then searches for a linear optimal hyperplane in this new dimension separating the tuples of one class from another. With an appropriate mapping to a sufficiently high dimension, tuples from two classes can always be separated by a hyperplane. The algorithm finds this hyperplane using support vectors and margins defined by the support vectors. The support vectors found by the algorithm provide a compact description of the learned prediction model. A separating hyperplane can be written as:
W∙X+b=0
where W is a weight vector, W = {w1, w2, w3,..., wn}, n is the number of attributes and b is a scalar often referred to as a bias. If we input two attributes A1 and A2, training tuples are 2-D (e.g., X = (x1, x2)), where x1 and x2 are the values of attributes A1 and A2, respectively. Thus, any points above the separating hyperplane belong to Class A1:
W∙X+b>0
and any points below the separating hyperplane belong to Class A2:
W∙X+b<0
Fig. 11: Support Vector Machines Algorithm Modeling
3.2.3. K-NEAREST NEIGHBORS
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.
To determine which of the k instances in the training dataset are most similar to a new input, a distance measure is used. For real-valued input variables, the most popular distance measure is Euclidean distance.
Euclidean distance is calculated as the square root of the sum of the squared differences between a new point (x) and an existing point (xi) across all input attributes j.
Euclidean Distance(x, xi) = sqrt( sum( (xj - xij)^2 ) )
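The Euclidean distance and the nearest-neighbor vote described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation used for the experiments: the function names, the choice of k = 3, and the (average, strike rate) values are hypothetical, and real player statistics would normally be scaled before distances are compared.

```python
import math
from collections import Counter


def euclidean_distance(x, xi):
    """Square root of the sum of squared differences across all attributes j."""
    return math.sqrt(sum((xj - xij) ** 2 for xj, xij in zip(x, xi)))


def knn_predict(train_points, train_labels, x, k=3):
    """Return the majority label among the k training points closest to x."""
    # Pair each training point's distance to x with its label, nearest first.
    neighbours = sorted(
        (euclidean_distance(x, xi), label)
        for xi, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (average, strike rate) pairs; 1 = Test only, 2 = ODI only.
train_points = [(48.0, 45.0), (52.0, 42.0), (45.0, 47.0),
                (33.0, 92.0), (36.0, 88.0), (30.0, 95.0)]
train_labels = [1, 1, 1, 2, 2, 2]

new_player = (50.0, 44.0)
predicted_class = knn_predict(train_points, train_labels, new_player, k=3)
```

In this toy data the three nearest neighbors of the new player all carry label 1, so the vote assigns the Test-only class.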
nearest neighbor. KNN is mostly applied in recommendation
systems.
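Returning to the Naïve Bayes rule of Section 3.2.1, the decision "X belongs to class Ci if P(Ci|X) is the highest" can be sketched as follows. The paper does not state which likelihood model was used for P(X|C), so this sketch assumes a Gaussian likelihood per attribute; the function names and the sample statistics are hypothetical. Because P(X) is the same for every class, the sketch compares log P(X|C) + log P(C) instead of the full posterior.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the prior P(C) and a per-attribute mean/variance for each class C."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def log_posterior_scores(model, x):
    """Return log P(X|C) + log P(C) for each class (P(X) is omitted)."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            # Class-conditional independence: per-attribute log-likelihoods add up.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, x):
    """Pick the class with the highest (unnormalised) posterior."""
    scores = log_posterior_scores(model, x)
    return max(scores, key=scores.get)


# Hypothetical (average, strike rate) pairs; 1 = Test only, 2 = ODI only.
rows = [(48.0, 45.0), (52.0, 42.0), (45.0, 47.0),
        (33.0, 92.0), (36.0, 88.0), (30.0, 95.0)]
labels = [1, 1, 1, 2, 2, 2]

model = fit_gaussian_nb(rows, labels)
predicted_class = predict(model, (50.0, 44.0))
```

Working in log space avoids numerical underflow when many attribute likelihoods are multiplied together.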
3.2.5. RESULT
4 CONCLUSION
The main limitation in carrying out this project was the limited
dataset at my disposal. The next logical step toward
improving prediction accuracy would be to test the approaches
and methodologies proposed in this paper on a larger and more
representative dataset. I would also like to extend the feature
set with weather conditions, nature of the pitch, and venue.
Accuracy may improve further if deep neural networks
(TensorFlow, Keras and Theano) are brought into the
implementation. Moreover, similar models can be
developed for other sports like Tennis, NBA and Football, and
newer sports like the Pro Kabaddi League.
ACKNOWLEDGMENT
REFERENCES