IPL - Prediction - Model - Training - Final - Ipynb - Colab
The dataset contains ball-by-ball information for the matches played between IPL teams in Seasons 1 to 10, i.e. from 2008 to 2017.
This Machine Learning model adopts a regression approach to predict the score of the first innings of an IPL match.
The dataset can be downloaded from Kaggle.
'1.25.2'
Mount your Google Drive and save the dataset in the Drive under the name "data.csv".
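A minimal sketch of the loading step, assuming the file sits at the top level of the mounted Drive. The path `/content/drive/MyDrive/data.csv` and the helper name `load_ball_by_ball` are assumptions, not taken from the notebook. The Drive mount only works inside Colab, so it is left as a comment, and a tiny in-memory CSV stands in for the real file.

```python
import io

import pandas as pd

# In Colab, the Drive is typically mounted first:
#   from google.colab import drive
#   drive.mount('/content/drive')
DATA_PATH = "/content/drive/MyDrive/data.csv"  # assumed location, adjust as needed


def load_ball_by_ball(source=DATA_PATH):
    """Read the ball-by-ball CSV from a path or file-like object."""
    return pd.read_csv(source)


# Tiny in-memory stand-in so the sketch runs without the real file.
sample_csv = io.StringIO(
    "mid,date,overs,runs,total\n"
    "1,2008-04-18,0.1,1,222\n"
    "1,2008-04-18,0.2,1,222\n"
)
sample = load_ball_by_ball(sample_csv)
print(sample.head())
```

In the notebook itself the call would simply be `data = load_ball_by_ball()` once the Drive is mounted.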
   mid        date                  venue           batting_team                 bowling_team      batsman   bowler  runs  wickets  overs  runs_last_5  wickets_last_5  striker  non-striker  total
0    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders  Royal Challengers Bangalore   SC Ganguly  P Kumar     1        0    0.1            1               0        0            0    222
1    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders  Royal Challengers Bangalore  BB McCullum  P Kumar     1        0    0.2            1               0        0            0    222
2    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders  Royal Challengers Bangalore  BB McCullum  P Kumar     2        0    0.2            2               0        0            0    222
3    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders  Royal Challengers Bangalore  BB McCullum  P Kumar     2        0    0.3            2               0        0            0    222
4    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders  Royal Challengers Bangalore  BB McCullum  P Kumar     2        0    0.4            2               0        0            0    222
                mid          runs       wickets         overs   runs_last_5  wickets_last_5       striker   non-striker         total
count  76014.000000  76014.000000  76014.000000  76014.000000  76014.000000    76014.000000  76014.000000  76014.000000  76014.000000
mean     308.627740     74.889349      2.415844      9.783068     33.216434        1.120307     24.962283      8.869287    160.901452
std      178.156878     48.823327      2.015207      5.772587     14.914174        1.053343     20.079752     10.795742     29.246231
min        1.000000      0.000000      0.000000      0.000000      0.000000        0.000000      0.000000      0.000000     67.000000
25%      154.000000     34.000000      1.000000      4.600000     24.000000        0.000000     10.000000      1.000000    142.000000
50%      308.000000     70.000000      2.000000      9.600000     34.000000        1.000000     20.000000      5.000000    162.000000
75%      463.000000    111.000000      4.000000     14.600000     43.000000        2.000000     35.000000     13.000000    181.000000
max      617.000000    263.000000     10.000000     19.600000    113.000000        7.000000    175.000000    109.000000    263.000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76014 entries, 0 to 76013
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 mid 76014 non-null int64
1 date 76014 non-null object
2 venue 76014 non-null object
3 batting_team 76014 non-null object
4 bowling_team 76014 non-null object
5 batsman 76014 non-null object
6 bowler 76014 non-null object
7 runs 76014 non-null int64
8 wickets 76014 non-null int64
9 overs 76014 non-null float64
10 runs_last_5 76014 non-null int64
11 wickets_last_5 76014 non-null int64
12 striker 76014 non-null int64
13 non-striker 76014 non-null int64
14 total 76014 non-null int64
dtypes: float64(1), int64(8), object(6)
memory usage: 8.7+ MB
mid 617
date 442
venue 35
batting_team 14
bowling_team 14
batsman 411
bowler 329
runs 252
wickets 11
overs 140
runs_last_5 102
wickets_last_5 8
striker 155
non-striker 88
total 138
dtype: int64
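The three summaries above have the shape of the standard pandas outputs for `describe()`, `info()` and `nunique()`. The cells that produced them are not preserved, so the following is only a sketch of the likely calls, run on a small made-up frame rather than the real `data`.

```python
import pandas as pd

# Toy frame standing in for the real `data` DataFrame.
data = pd.DataFrame({
    "batting_team": ["Mumbai Indians", "Mumbai Indians", "Rajasthan Royals"],
    "runs": [10, 25, 7],
    "overs": [1.2, 3.4, 0.5],
})

summary = data.describe()        # count/mean/std/min/quartiles/max of numeric columns
data.info()                      # prints index range, dtypes, non-null counts, memory usage
unique_counts = data.nunique()   # number of distinct values per column

print(summary)
print(unique_counts)
```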
# Datatypes of all Columns
data.dtypes
mid int64
date object
venue object
batting_team object
bowling_team object
batsman object
bowler object
runs int64
wickets int64
overs float64
runs_last_5 int64
wickets_last_5 int64
striker int64
non-striker int64
total int64
dtype: object
Here, we can see that the columns ['mid', 'date', 'venue', 'batsman', 'bowler', 'striker', 'non-striker'] won't provide any relevant information for training our model.
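A minimal sketch of this cleaning step. Dropping the listed columns follows the note above. The other two filters are assumptions inferred from the rest of the notebook: the eight teams are the ones `predict_score` handles later, and the 5-over cutoff is a guess based on the first remaining rows starting at over 5.1. The helper name `prepare` is hypothetical.

```python
import pandas as pd

IRRELEVANT = ["mid", "date", "venue", "batsman", "bowler", "striker", "non-striker"]
# The eight teams that predict_score handles later in the notebook.
CONSISTENT_TEAMS = [
    "Chennai Super Kings", "Delhi Daredevils", "Kings XI Punjab",
    "Kolkata Knight Riders", "Mumbai Indians", "Rajasthan Royals",
    "Royal Challengers Bangalore", "Sunrisers Hyderabad",
]


def prepare(df, min_overs=5.0):
    """Drop unused columns, keep the consistent teams, skip the early overs."""
    df = df.drop(columns=IRRELEVANT, errors="ignore")
    keep = (
        df["batting_team"].isin(CONSISTENT_TEAMS)
        & df["bowling_team"].isin(CONSISTENT_TEAMS)
        & (df["overs"] >= min_overs)
    )
    # .copy() returns an independent frame, so later column assignments
    # do not act on a slice of the original DataFrame.
    return df.loc[keep].copy()


# Made-up rows, only to exercise the filters.
toy = pd.DataFrame({
    "mid": [1, 1, 2],
    "venue": ["A", "A", "B"],
    "batting_team": ["Mumbai Indians", "Mumbai Indians", "Deccan Chargers"],
    "bowling_team": ["Rajasthan Royals", "Rajasthan Royals", "Mumbai Indians"],
    "overs": [4.6, 5.1, 7.2],
    "total": [160, 160, 150],
})
clean = prepare(toy)
print(clean)
```

Taking an explicit copy after filtering is one common way to avoid pandas' `SettingWithCopyWarning` when columns are reassigned afterwards.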
<ipython-input-37-6e893f89f1c2>:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
    batting_team  bowling_team  runs  wickets  overs  runs_last_5  wickets_last_5  total
32             3             6    61        0    5.1           59               0    222
33             3             6    61        1    5.2           59               1    222
34             3             6    61        1    5.3           59               1    222
35             3             6    61        1    5.4           59               1    222
36             3             6    61        1    5.5           58               1    222
<Axes: >
data.columns
data = np.array(columnTransformer.fit_transform(data))
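The definition of `columnTransformer` is not preserved in this export. The sketch below shows one way such a transformer could be built with scikit-learn's `ColumnTransformer` and `OneHotEncoder`. It is an assumption rather than the notebook's actual cell: it encodes the team names directly by column name, whereas the notebook appears to label-encode them first, and the toy rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up rows standing in for the cleaned DataFrame.
toy = pd.DataFrame({
    "batting_team": ["Mumbai Indians", "Rajasthan Royals", "Mumbai Indians"],
    "bowling_team": ["Rajasthan Royals", "Mumbai Indians", "Rajasthan Royals"],
    "runs": [61, 48, 90],
    "overs": [5.1, 6.0, 11.3],
})

column_transformer = ColumnTransformer(
    [("encoder", OneHotEncoder(), ["batting_team", "bowling_team"])],
    remainder="passthrough",  # keep the numeric columns unchanged
    sparse_threshold=0,       # always return a dense array
)
encoded = np.array(column_transformer.fit_transform(toy))
print(encoded.shape)
```

The one-hot columns come first, in alphabetical order of team name within each encoded column, followed by the passed-through numeric columns. This matches the column order that `predict_score` relies on later.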
Columns (display truncated): batting_team_Chennai Super Kings, batting_team_Delhi Daredevils, batting_team_Kings XI Punjab, batting_team_Kolkata Knight Riders, batting_team_Mumbai Indians, batting_team_Rajasthan Royals, batting_team_Royal Challengers Bangalore, batting_team_Sunrisers Hyderabad, bowling_team_Chennai Super Kings, ...
5 rows × 22 columns
DecisionTreeRegressor()
# Evaluate Model
train_score_tree = str(tree.score(train_features, train_labels) * 100)
test_score_tree = str(tree.score(test_features, test_labels) * 100)
print(f'Train Score : {train_score_tree[:5]}%\nTest Score : {test_score_tree[:5]}%')
models["tree"] = test_score_tree
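For scikit-learn regressors, `score` returns the coefficient of determination R², so the percentages printed here are R² × 100 rather than an accuracy. The sketch below recomputes R² by hand on made-up numbers (not the notebook's data) and formats it with an f-string format spec, which rounds the value instead of truncating a string with `[:5]`. The helper name `r_squared` is hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [150, 160, 170, 180]     # made-up innings totals
predicted = [152, 158, 171, 177]  # made-up predictions
score = r_squared(actual, predicted)
print(f"Test Score : {score * 100:.2f}%")
```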
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
LogisticRegression()
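The warning fragment above suggests either raising `max_iter` or scaling the features. A minimal sketch of both, assuming a `StandardScaler` placed in front of the model through `make_pipeline`. The data is synthetic and the name `scaled_logreg` is hypothetical. Note that `LogisticRegression` is a classifier, so it treats every distinct total as a separate class rather than as a number.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: two features on very different scales, three "total" classes.
features = np.column_stack([rng.uniform(0, 200, 60), rng.uniform(0, 20, 60)])
labels = rng.choice([140, 160, 180], size=60)

# Scaling happens inside the pipeline, so the same transformation is
# applied automatically at prediction time.
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_logreg.fit(features, labels)
print(scaled_logreg.predict(features[:3]))
```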
Ridge()
LinearRegression()
# Evaluate Model
train_score_linreg = str(linreg.score(train_features, train_labels) * 100)
test_score_linreg = str(linreg.score(test_features, test_labels) * 100)
print(f'Train Score : {train_score_linreg[:5]}%\nTest Score : {test_score_linreg[:5]}%')
models["linreg"] = test_score_linreg
RandomForestRegressor()
# Evaluate Model
train_score_forest = str(forest.score(train_features, train_labels)*100)
test_score_forest = str(forest.score(test_features, test_labels)*100)
print(f'Train Score : {train_score_forest[:5]}%\nTest Score : {test_score_forest[:5]}%')
models["forest"] = test_score_forest
LassoCV()
# Evaluate Model
train_score_lasso = str(lasso.score(train_features, train_labels)*100)
test_score_lasso = str(lasso.score(test_features, test_labels)*100)
print(f'Train Score : {train_score_lasso[:5]}%\nTest Score : {test_score_lasso[:5]}%')
models["lasso"] = test_score_lasso
From the test scores above, we can see that Random Forest performed the best, closely followed by Decision Tree and Neural Networks, so we will choose Random Forest for the final model.
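Because the evaluation cells store each test score in `models` as a string, picking the best entry programmatically needs a conversion to `float`; comparing the raw strings would order them lexicographically. A small sketch with placeholder keys and values (these are not the notebook's results):

```python
# Placeholder scores only; these are NOT the notebook's results.
models = {"model_a": "9.5", "model_b": "10.2", "model_c": "7.25"}

# Compare numerically: as strings, "9.5" would sort above "10.2".
best_name = max(models, key=lambda name: float(models[name]))
print(f"Best model: {best_name} ({float(models[best_name]):.2f}%)")
```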
Predictions
def predict_score(batting_team, bowling_team, runs, wickets, overs, runs_last_5, wickets_last_5, model=forest):
    prediction_array = []
    # Batting Team
    if batting_team == 'Chennai Super Kings':
        prediction_array = prediction_array + [1,0,0,0,0,0,0,0]
    elif batting_team == 'Delhi Daredevils':
        prediction_array = prediction_array + [0,1,0,0,0,0,0,0]
    elif batting_team == 'Kings XI Punjab':
        prediction_array = prediction_array + [0,0,1,0,0,0,0,0]
    elif batting_team == 'Kolkata Knight Riders':
        prediction_array = prediction_array + [0,0,0,1,0,0,0,0]
    elif batting_team == 'Mumbai Indians':
        prediction_array = prediction_array + [0,0,0,0,1,0,0,0]
    elif batting_team == 'Rajasthan Royals':
        prediction_array = prediction_array + [0,0,0,0,0,1,0,0]
    elif batting_team == 'Royal Challengers Bangalore':
        prediction_array = prediction_array + [0,0,0,0,0,0,1,0]
    elif batting_team == 'Sunrisers Hyderabad':
        prediction_array = prediction_array + [0,0,0,0,0,0,0,1]
    # Bowling Team
    if bowling_team == 'Chennai Super Kings':
        prediction_array = prediction_array + [1,0,0,0,0,0,0,0]
    elif bowling_team == 'Delhi Daredevils':
        prediction_array = prediction_array + [0,1,0,0,0,0,0,0]
    elif bowling_team == 'Kings XI Punjab':
        prediction_array = prediction_array + [0,0,1,0,0,0,0,0]
    elif bowling_team == 'Kolkata Knight Riders':
        prediction_array = prediction_array + [0,0,0,1,0,0,0,0]
    elif bowling_team == 'Mumbai Indians':
        prediction_array = prediction_array + [0,0,0,0,1,0,0,0]
    elif bowling_team == 'Rajasthan Royals':
        prediction_array = prediction_array + [0,0,0,0,0,1,0,0]
    elif bowling_team == 'Royal Challengers Bangalore':
        prediction_array = prediction_array + [0,0,0,0,0,0,1,0]
    elif bowling_team == 'Sunrisers Hyderabad':
        prediction_array = prediction_array + [0,0,0,0,0,0,0,1]
    prediction_array = prediction_array + [runs, wickets, overs, runs_last_5, wickets_last_5]
    prediction_array = np.array([prediction_array])
    pred = model.predict(prediction_array)
    return int(round(pred[0]))
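The two `if`/`elif` chains in `predict_score` build the same eight-slot one-hot vector twice. As an alternative sketch, the encoding can be driven by a single team list. The names `TEAMS`, `one_hot_team` and `build_features` are hypothetical and not part of the notebook. Unlike the chains, this version raises an error for an unknown team instead of silently producing a shorter feature row.

```python
TEAMS = [
    "Chennai Super Kings", "Delhi Daredevils", "Kings XI Punjab",
    "Kolkata Knight Riders", "Mumbai Indians", "Rajasthan Royals",
    "Royal Challengers Bangalore", "Sunrisers Hyderabad",
]


def one_hot_team(team):
    """Return an 8-element one-hot list; raise on an unknown team name."""
    if team not in TEAMS:
        raise ValueError(f"Unknown team: {team!r}")
    return [1 if name == team else 0 for name in TEAMS]


def build_features(batting_team, bowling_team, runs, wickets, overs,
                   runs_last_5, wickets_last_5):
    """Assemble the 21-element feature row in the order predict_score uses."""
    return (one_hot_team(batting_team) + one_hot_team(bowling_team)
            + [runs, wickets, overs, runs_last_5, wickets_last_5])


row = build_features("Delhi Daredevils", "Chennai Super Kings",
                     runs=68, wickets=3, overs=10.2,
                     runs_last_5=29, wickets_last_5=1)
print(len(row), row)
```

The resulting list could be wrapped as `np.array([row])` and passed to `model.predict`, exactly as the original function does.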
Test 1
Batting Team : Delhi Daredevils
Bowling Team : Chennai Super Kings
Final Score : 147/9
batting_team='Delhi Daredevils'
bowling_team='Chennai Super Kings'
score = predict_score(batting_team, bowling_team, overs=10.2, runs=68, wickets=3, runs_last_5=29, wickets_last_5=1)
print(f'Predicted Score : {score} || Actual Score : 147')
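The `overs` argument uses cricket notation, so `overs=10.2` is read here as 10 completed overs plus 2 deliveries, not as 10.2 decimal overs. The sketch below makes that explicit, assuming the digit after the decimal point is the delivery number within the current over. The helpers `overs_to_balls` and `current_run_rate` are hypothetical and are not used by the model, which receives `overs` as a raw float.

```python
def overs_to_balls(overs):
    """Convert cricket notation (e.g. 10.2 = 10 overs + 2 balls) to deliveries bowled."""
    completed = int(overs)
    extra_balls = round((overs - completed) * 10)
    return completed * 6 + extra_balls


def current_run_rate(runs, overs):
    """Runs per over, based on the number of deliveries bowled."""
    return runs / (overs_to_balls(overs) / 6)


print(overs_to_balls(10.2))
print(round(current_run_rate(68, 10.2), 2))
```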
Test 2
Batting Team : Mumbai Indians
Bowling Team : Kings XI Punjab
Final Score : 176/7
batting_team='Mumbai Indians'
bowling_team='Kings XI Punjab'
score = predict_score(batting_team, bowling_team, overs=12.3, runs=113, wickets=2, runs_last_5=55, wickets_last_5=0)
print(f'Predicted Score : {score} || Actual Score : 176')
# Live Test
batting_team="Kings XI Punjab"
bowling_team="Rajasthan Royals"
score = predict_score(batting_team, bowling_team, overs=14.0, runs=118, wickets=1, runs_last_5=45, wickets_last_5=0)
print(f'Predicted Score : {score} || Actual Score : 185')
# Live Test
batting_team="Kolkata Knight Riders"
bowling_team="Chennai Super Kings"
score = predict_score(batting_team, bowling_team, overs=18.0, runs=150, wickets=4, runs_last_5=57, wickets_last_5=1)
print(f'Predicted Score : {score} || Actual Score : 172')
batting_team='Delhi Daredevils'
bowling_team='Mumbai Indians'
score = predict_score(batting_team, bowling_team, overs=18.0, runs=96, wickets=8, runs_last_5=18, wickets_last_5=4)
print(f'Predicted Score : {score} || Actual Score : 110')
batting_team='Kings XI Punjab'
bowling_team='Chennai Super Kings'
score = predict_score(batting_team, bowling_team, overs=18.0, runs=129, wickets=6, runs_last_5=34, wickets_last_5=2)
print(f'Predicted Score : {score} || Actual Score : 153')
dump(forest, "forest_model.pkl")
dump(tree, "tree_model.pkl")
dump(neural_net, "neural_nets_model.pkl")
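Calling `dump` with an object and a filename matches the signature of `joblib.dump`, so the sketch below assumes `from joblib import dump` was executed in an earlier cell. It shows a save-and-load round trip on a tiny stand-in model; the file name `demo_model.pkl` and the toy data are made up.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Tiny stand-in model; the notebook persists its fitted forest, tree and neural_net.
features = np.array([[0.0], [1.0], [2.0], [3.0]])
labels = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(features, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    dump(model, path)      # same call shape as dump(forest, "forest_model.pkl")
    restored = load(path)  # reload the fitted estimator from disk

same = np.allclose(model.predict(features), restored.predict(features))
print(same)
```

A model saved this way should be loaded with the same scikit-learn version that produced it, since pickled estimators are not guaranteed to be compatible across versions.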