Mandal-21 / Flight-Price-Prediction / flight_price.ipynb
Flight Price Prediction

In [1]:
import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

sns.set()

Importing dataset
1. Since data is in form of excel file we have to
use pandas read_excel to load the data
2. After loading it is important to check the
complete information of data as it can
indication many of the hidden infomation
such as null values in a column or a row
3. Check whether any null values are there or
not. if it is present then following can be
done,
A. Imputing data using Imputation method
in sklearn
B. Filling NaN values with mean, median
and mode using fillna() method
4. Describe data --> which can give statistical
analysis
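As a quick illustration of options 3A and 3B, here is a minimal sketch (not part of the original notebook) on a toy DataFrame; the values are made up and the columns are only stand-ins for the flight data:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"Price": [3897, np.nan, 13882],
                   "Total_Stops": ["non-stop", None, "2 stops"]})

# Option A: sklearn imputation (mean of the numeric column)
imputer = SimpleImputer(strategy = "mean")
df[["Price"]] = imputer.fit_transform(df[["Price"]])

# Option B: fillna with a simple statistic (mode of the categorical column)
df["Total_Stops"] = df["Total_Stops"].fillna(df["Total_Stops"].mode()[0])

print(df)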

In [2]:
train_data = pd.read_excel(r"E:\MachineLearni

In [3]:
pd.set_option('display.max_columns', None)

In [4]:
train_data.head()

Out[4]:        Airline Date_of_Journey    Source Destination                  Route  ...
        0       IndiGo      24/03/2019  Banglore   New Delhi              BLR → DEL  ...
        1    Air India       1/05/2019   Kolkata    Banglore  CCU → IXR → BBI → BLR  ...
        2  Jet Airways       9/06/2019     Delhi      Cochin  DEL → LKO → BOM → COK  ...
        3       IndiGo      12/05/2019   Kolkata    Banglore        CCU → NAG → BLR  ...
        4       IndiGo      01/03/2019  Banglore   New Delhi        BLR → NAG → DEL  ...
In [5]:
train_data.info()

RangeIndex: 10683 entries, 0 to 10682

Data columns (total 11 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Airline 10683 non-null object

1 Date_of_Journey 10683 non-null object

2 Source 10683 non-null object

3 Destination 10683 non-null object

4 Route 10682 non-null object

5 Dep_Time 10683 non-null object

6 Arrival_Time 10683 non-null object

7 Duration 10683 non-null object

8 Total_Stops 10682 non-null object

9 Additional_Info 10683 non-null object

10 Price 10683 non-null int64

dtypes: int64(1), object(10)

memory usage: 918.2+ KB

In [6]:
train_data["Duration"].value_counts()

Out[6]: 2h 50m 550

1h 30m 386

2h 45m 337

2h 55m 337

2h 35m 329

...

42h 5m 1

28h 30m 1

36h 25m 1

40h 20m 1

30h 25m 1

Name: Duration, Length: 368, dtype: int64

In [7]:
train_data.dropna(inplace = True)

In [8]:
train_data.isnull().sum()

Out[8]: Airline 0

Date_of_Journey 0

Source 0

Destination 0

Route 0

Dep_Time 0

Arrival_Time 0

Duration 0

Total_Stops 0

Additional_Info 0

Price 0

dtype: int64

EDA
From the description we can see that Date_of_Journey is an object data type.

Therefore, we have to convert this column to a timestamp so that it can be used properly for prediction.

For this we use pandas to_datetime to convert the object data type to datetime dtype.

.dt.day extracts only the day of that date

.dt.month extracts only the month of that date

A toy example follows.
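As a toy illustration (made-up dates, not rows from the dataset), parsing day-first strings with an explicit format exposes the .dt accessors used below:

import pandas as pd

dates = pd.Series(["24/03/2019", "1/05/2019"])        # day/month/year strings
parsed = pd.to_datetime(dates, format = "%d/%m/%Y")   # object -> datetime64

print(parsed.dt.day.tolist())    # [24, 1]
print(parsed.dt.month.tolist())  # [3, 5]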

In [9]:
train_data["Journey_day"] = pd.to_datetime(tr

In [10]:
train_data["Journey_month"] = pd.to_datetime(

In [11]:
train_data.head()

Out[11]:        Airline Date_of_Journey    Source Destination                  Route  ...
         0       IndiGo      24/03/2019  Banglore   New Delhi              BLR → DEL  ...
         1    Air India       1/05/2019   Kolkata    Banglore  CCU → IXR → BBI → BLR  ...
         2  Jet Airways       9/06/2019     Delhi      Cochin  DEL → LKO → BOM → COK  ...
         3       IndiGo      12/05/2019   Kolkata    Banglore        CCU → NAG → BLR  ...
         4       IndiGo      01/03/2019  Banglore   New Delhi        BLR → NAG → DEL  ...

In [12]:
# Since we have converted the Date_of_Journey column into integers, we can drop it as it is of no use

train_data.drop(["Date_of_Journey"], axis = 1, inplace = True)

In [13]:
# Departure time is when a plane leaves the gate.
# Similar to Date_of_Journey we can extract values from Dep_Time

# Extracting Hours
train_data["Dep_hour"] = pd.to_datetime(train_data["Dep_Time"]).dt.hour

# Extracting Minutes
train_data["Dep_min"] = pd.to_datetime(train_data["Dep_Time"]).dt.minute

# Now we can drop Dep_Time as it is of no use
train_data.drop(["Dep_Time"], axis = 1, inplace = True)

In [14]:
train_data.head()

Out[14]:        Airline    Source Destination                  Route  Arrival_Time  ...
         0       IndiGo  Banglore   New Delhi              BLR → DEL  01:10 22 Mar  ...
         1    Air India   Kolkata    Banglore  CCU → IXR → BBI → BLR         13:15  ...
         2  Jet Airways     Delhi      Cochin  DEL → LKO → BOM → COK  04:25 10 Jun  ...
         3       IndiGo   Kolkata    Banglore        CCU → NAG → BLR         23:30  ...
         4       IndiGo  Banglore   New Delhi        BLR → NAG → DEL         21:35  ...

In [15]:
# Arrival time is when the plane pulls up to the gate.
# Similar to Date_of_Journey we can extract values from Arrival_Time

# Extracting Hours
train_data["Arrival_hour"] = pd.to_datetime(train_data["Arrival_Time"]).dt.hour

# Extracting Minutes
train_data["Arrival_min"] = pd.to_datetime(train_data["Arrival_Time"]).dt.minute

# Now we can drop Arrival_Time as it is of no use
train_data.drop(["Arrival_Time"], axis = 1, inplace = True)

In [16]:
train_data.head()

Out[16]:        Airline    Source Destination                  Route Duration  ...
         0       IndiGo  Banglore   New Delhi              BLR → DEL   2h 50m  ...
         1    Air India   Kolkata    Banglore  CCU → IXR → BBI → BLR   7h 25m  ...
         2  Jet Airways     Delhi      Cochin  DEL → LKO → BOM → COK      19h  ...
         3       IndiGo   Kolkata    Banglore        CCU → NAG → BLR   5h 25m  ...
         4       IndiGo  Banglore   New Delhi        BLR → NAG → DEL   4h 45m  ...

In [17]:
# Time taken by the plane to reach its destination is called Duration.
# It is the difference between the Departure Time and Arrival Time.

# Assigning and converting the Duration column into a list
duration = list(train_data["Duration"])

for i in range(len(duration)):
    if len(duration[i].split()) != 2:    # Check if duration contains only hours or only minutes
        if "h" in duration[i]:
            duration[i] = duration[i].strip() + " 0m"   # Add 0 minutes
        else:
            duration[i] = "0h " + duration[i]           # Add 0 hours

duration_hours = []
duration_mins = []

for i in range(len(duration)):
    duration_hours.append(int(duration[i].split(sep = "h")[0]))               # Extract hours from duration
    duration_mins.append(int(duration[i].split(sep = "m")[0].split()[-1]))    # Extract only minutes from duration
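An alternative to the manual string handling above (shown only as a sketch, not what the notebook actually uses) is to let pandas parse the "Xh Ym" strings as timedeltas and read hours and minutes from the components:

import pandas as pd

dur = pd.Series(["2h 50m", "19h", "5m"])    # same shapes as the Duration column
td = pd.to_timedelta(dur)                   # parse the strings to Timedelta

hours = td.dt.components.days * 24 + td.dt.components.hours   # keep >24h durations intact
mins = td.dt.components.minutes

print(hours.tolist())   # [2, 19, 0]
print(mins.tolist())    # [50, 0, 5]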

In [18]:
# Adding duration_hours and duration_mins lists to the train_data dataframe

train_data["Duration_hours"] = duration_hours
train_data["Duration_mins"] = duration_mins

In [19]:
train_data.drop(["Duration"], axis = 1, inpla

In [20]:
train_data.head()

Out[20]:        Airline    Source Destination                  Route Total_Stops  ...
         0       IndiGo  Banglore   New Delhi              BLR → DEL    non-stop  ...
         1    Air India   Kolkata    Banglore  CCU → IXR → BBI → BLR     2 stops  ...
         2  Jet Airways     Delhi      Cochin  DEL → LKO → BOM → COK     2 stops  ...
         3       IndiGo   Kolkata    Banglore        CCU → NAG → BLR      1 stop  ...
         4       IndiGo  Banglore   New Delhi        BLR → NAG → DEL      1 stop  ...

Handling Categorical Data

There are many ways to handle categorical data. The two main kinds of categorical data are:

1. Nominal data --> values are not in any order --> OneHotEncoder (or pd.get_dummies) is used in this case
2. Ordinal data --> values are in an order --> LabelEncoder (or an explicit mapping) is used in this case

A small sketch of both encodings follows.
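A minimal sketch of the two encodings on toy data (not the flight columns themselves; the notebook itself uses pd.get_dummies and an explicit replace mapping later on):

import pandas as pd
from sklearn.preprocessing import LabelEncoder

toy = pd.DataFrame({
    "Source": ["Delhi", "Kolkata", "Delhi"],            # nominal: no natural order
    "Total_Stops": ["non-stop", "2 stops", "1 stop"],   # ordinal: has a natural order
})

# Nominal -> one-hot columns (drop_first avoids the dummy-variable trap)
onehot = pd.get_dummies(toy["Source"], drop_first = True)

# Ordinal -> integer codes; note LabelEncoder assigns codes alphabetically,
# so an explicit mapping (as done later in the notebook) preserves the real order
codes_le = LabelEncoder().fit_transform(toy["Total_Stops"])
codes_map = toy["Total_Stops"].map({"non-stop": 0, "1 stop": 1, "2 stops": 2})

print(onehot)
print(codes_le)             # [2 1 0] (alphabetical order of the labels)
print(codes_map.tolist())   # [0, 2, 1] (true stop order)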

In [21]:
train_data["Airline"].value_counts()

Out[21]: Jet Airways 3849

IndiGo 2053

Air India 1751

Multiple carriers 1196

SpiceJet 818

Vistara 479

Air Asia 319

GoAir 194

Multiple carriers Premium economy 13

Jet Airways Business 6

Vistara Premium economy 3

Trujet 1

Name: Airline, dtype: int64

In [22]:
# From the plot we can see that Jet Airways Business has the highest prices.
# Apart from that airline, almost all airlines have a similar median price.

# Airline vs Price

sns.catplot(y = "Price", x = "Airline", data = train_data)

plt.show()

In [23]:
# As Airline is Nominal Categorical data we will perform OneHotEncoding

Airline = train_data[["Airline"]]

Airline = pd.get_dummies(Airline, drop_first = True)

Airline.head()

Out[23]:    Airline_Air India  Airline_GoAir  Airline_IndiGo  Airline_Jet Airways  ...
         0                  0              0               1                    0  ...
         1                  1              0               0                    0  ...
         2                  0              0               0                    1  ...
         3                  0              0               1                    0  ...
         4                  0              0               1                    0  ...
In [24]:
train_data["Source"].value_counts()

Out[24]: Delhi 4536

Kolkata 2871

Banglore 2197

Mumbai 697

Chennai 381

Name: Source, dtype: int64

In [25]:
# Source vs Price

sns.catplot(y = "Price", x = "Source", data = train_data)

plt.show()

In [26]:
# As Source is Nominal Categorical data we will perform OneHotEncoding

Source = train_data[["Source"]]

Source = pd.get_dummies(Source, drop_first = True)

Source.head()

Out[26]:    Source_Chennai  Source_Delhi  Source_Kolkata  Source_Mumbai
         0               0             0               0              0
         1               0             0               1              0
         2               0             1               0              0
         3               0             0               1              0
         4               0             0               0              0

In [27]:
train_data["Destination"].value_counts()

Out[27]: Cochin 4536

Banglore 2871

Delhi 1265

New Delhi 932

Hyderabad 697

Kolkata 381

Name: Destination, dtype: int64

In [28]:
# As Destination is Nominal Categorical data we will perform OneHotEncoding

Destination = train_data[["Destination"]]

Destination = pd.get_dummies(Destination, drop_first = True)

Destination.head()

Out[28]:    Destination_Cochin  Destination_Delhi  Destination_Hyderabad  ...
         0                   0                  0                      0  ...
         1                   0                  0                      0  ...
         2                   1                  0                      0  ...
         3                   0                  0                      0  ...
         4                   0                  0                      0  ...

In [29]:
train_data["Route"]

Out[29]: 0                    BLR → DEL
         1        CCU → IXR → BBI → BLR
         2        DEL → LKO → BOM → COK
         3              CCU → NAG → BLR
         4              BLR → NAG → DEL
                          ...
         10678                CCU → BLR
         10679                CCU → BLR
         10680                BLR → DEL
         10681                BLR → DEL
         10682    DEL → GOI → BOM → COK
         Name: Route, Length: 10682, dtype: object

In [30]:
# Additional_Info contains almost 80% no_info
# Route and Total_Stops are related to each other

train_data.drop(["Route", "Additional_Info"], axis = 1, inplace = True)

In [31]:
train_data["Total_Stops"].value_counts()

Out[31]: 1 stop 5625

non-stop 3491

2 stops 1520

3 stops 45

4 stops 1

Name: Total_Stops, dtype: int64

In [32]:
# As this is a case of Ordinal Categorical type we perform LabelEncoding here
# Here values are assigned with their corresponding keys

train_data.replace({"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}, inplace = True)

In [33]:
train_data.head()

Out[33]:        Airline    Source Destination  Total_Stops  Price  ...
         0       IndiGo  Banglore   New Delhi            0   3897  ...
         1    Air India   Kolkata    Banglore            2   7662  ...
         2  Jet Airways     Delhi      Cochin            2  13882  ...
         3       IndiGo   Kolkata    Banglore            1   6218  ...
         4       IndiGo  Banglore   New Delhi            1  13302  ...

In [34]:
# Concatenate dataframe --> train_data + Airline + Source + Destination

data_train = pd.concat([train_data, Airline, Source, Destination], axis = 1)

In [35]:
data_train.head()

Out[35]:        Airline    Source Destination  Total_Stops  Price  ...
         0       IndiGo  Banglore   New Delhi            0   3897  ...
         1    Air India   Kolkata    Banglore            2   7662  ...
         2  Jet Airways     Delhi      Cochin            2  13882  ...
         3       IndiGo   Kolkata    Banglore            1   6218  ...
         4       IndiGo  Banglore   New Delhi            1  13302  ...

In [36]:
data_train.drop(["Airline", "Source", "Destin

In [37]:
data_train.head()

Out[37]:    Total_Stops  Price  Journey_day  Journey_month  ...
         0            0   3897           24              3  ...
         1            2   7662            1              5  ...
         2            2  13882            9              6  ...
         3            1   6218           12              5  ...
         4            1  13302            1              3  ...

In [38]:
data_train.shape

Out[38]: (10682, 30)


Test set
In [39]:
test_data = pd.read_excel(r"E:\MachineLearnin

In [40]:
test_data.head()

Out[40]:              Airline Date_of_Journey    Source Destination  Route  ...
         0        Jet Airways       6/06/2019     Delhi      Cochin    ...
         1             IndiGo      12/05/2019   Kolkata    Banglore    ...
         2        Jet Airways      21/05/2019     Delhi      Cochin    ...
         3  Multiple carriers      21/05/2019     Delhi      Cochin    ...
         4           Air Asia      24/06/2019  Banglore       Delhi    ...

In [41]:
# Preprocessing

print("Test data Info")
print("-"*75)
print(test_data.info())

print()
print()

print("Null values :")
print("-"*75)
test_data.dropna(inplace = True)
print(test_data.isnull().sum())

# EDA

# Date_of_Journey
test_data["Journey_day"] = pd.to_datetime(test_data["Date_of_Journey"], format = "%d/%m/%Y").dt.day
test_data["Journey_month"] = pd.to_datetime(test_data["Date_of_Journey"], format = "%d/%m/%Y").dt.month
test_data.drop(["Date_of_Journey"], axis = 1, inplace = True)

# Dep_Time
test_data["Dep_hour"] = pd.to_datetime(test_data["Dep_Time"]).dt.hour
test_data["Dep_min"] = pd.to_datetime(test_data["Dep_Time"]).dt.minute
test_data.drop(["Dep_Time"], axis = 1, inplace = True)

# Arrival_Time
test_data["Arrival_hour"] = pd.to_datetime(test_data["Arrival_Time"]).dt.hour
test_data["Arrival_min"] = pd.to_datetime(test_data["Arrival_Time"]).dt.minute
test_data.drop(["Arrival_Time"], axis = 1, inplace = True)

# Duration
duration = list(test_data["Duration"])

for i in range(len(duration)):
    if len(duration[i].split()) != 2:    # Check if duration contains only hours or only minutes
        if "h" in duration[i]:
            duration[i] = duration[i].strip() + " 0m"   # Add 0 minutes
        else:
            duration[i] = "0h " + duration[i]           # Add 0 hours

duration_hours = []
duration_mins = []

for i in range(len(duration)):
    duration_hours.append(int(duration[i].split(sep = "h")[0]))               # Extract hours
    duration_mins.append(int(duration[i].split(sep = "m")[0].split()[-1]))    # Extract minutes

# Adding Duration columns to the test set
test_data["Duration_hours"] = duration_hours
test_data["Duration_mins"] = duration_mins
test_data.drop(["Duration"], axis = 1, inplace = True)

# Categorical data

print("Airline")
print("-"*75)
print(test_data["Airline"].value_counts())
Airline = pd.get_dummies(test_data["Airline"], drop_first = True)

print()
print("Source")
print("-"*75)
print(test_data["Source"].value_counts())
Source = pd.get_dummies(test_data["Source"], drop_first = True)

print()
print("Destination")
print("-"*75)
print(test_data["Destination"].value_counts())
Destination = pd.get_dummies(test_data["Destination"], drop_first = True)

# Additional_Info contains almost 80% no_info
# Route and Total_Stops are related to each other
test_data.drop(["Route", "Additional_Info"], axis = 1, inplace = True)

# Replacing Total_Stops
test_data.replace({"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}, inplace = True)

# Concatenate dataframe --> test_data + Airline + Source + Destination
data_test = pd.concat([test_data, Airline, Source, Destination], axis = 1)

data_test.drop(["Airline", "Source", "Destination"], axis = 1, inplace = True)

print()
print()
print("Shape of test data : ", data_test.shape)

Test data Info

---------------------------------------------------------------------------

RangeIndex: 2671 entries, 0 to 2670

Data columns (total 10 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Airline 2671 non-null object

1 Date_of_Journey 2671 non-null object

2 Source 2671 non-null object

3 Destination 2671 non-null object

4 Route 2671 non-null object

5 Dep_Time 2671 non-null object

6 Arrival_Time 2671 non-null object

7 Duration 2671 non-null object

8 Total_Stops 2671 non-null object

9 Additional_Info 2671 non-null object

dtypes: object(10)

memory usage: 208.8+ KB

None

Null values :

---------------------------------------------------------------------------

Airline 0

Date_of_Journey 0

Source 0

Destination 0

Route 0

Dep_Time 0

Arrival_Time 0

Duration 0

Total_Stops 0

Additional_Info 0

dtype: int64

Airline

---------------------------------------------------------------------------

Jet Airways 897

IndiGo 511

Air India 440

Multiple carriers 347

SpiceJet 208

Vistara 129

Air Asia 86

GoAir 46

Multiple carriers Premium economy 3

Jet Airways Business 2

Vistara Premium economy 2

Name: Airline, dtype: int64

Source

---------------------------------------------------------------------------

Delhi 1145

Kolkata 710

Banglore 555

Mumbai 186

Chennai 75

Name: Source, dtype: int64

Destination

---------------------------------------------------------------------------

Cochin 1145

Banglore 710

Delhi 317

New Delhi 238

Hyderabad 186

Kolkata 75

Name: Destination, dtype: int64

Shape of test data : (2671, 28)

In [42]:
data_test.head()

Out[42]:    Total_Stops  Journey_day  Journey_month  ...
         0            1            6              6  ...
         1            1           12              5  ...
         2            1           21              5  ...
         3            1           21              5  ...
         4            0           24              6  ...

Feature Selection
Finding out the best features, i.e. those which contribute most to and have a good relation with the target variable. Following are some of the feature selection methods:

1. heatmap
2. feature_importances_
3. SelectKBest (a sketch follows this list)
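The notebook goes on to use the heatmap and feature_importances_; as a sketch of the third option (SelectKBest is not actually run in the original), applied to the X and y built in the cells below (In [45] / In [46]):

# Sketch only: univariate feature scores with SelectKBest and a regression score function
from sklearn.feature_selection import SelectKBest, f_regression

selector = SelectKBest(score_func = f_regression, k = 10)   # keep the 10 best-scoring features
X_best = selector.fit_transform(X, y)

scores = pd.Series(selector.scores_, index = X.columns).sort_values(ascending = False)
print(scores.head(10))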

In [43]:
data_train.shape

Out[43]: (10682, 30)

In [44]:
data_train.columns

Out[44]: Index(['Total_Stops', 'Price', 'Journey_day', 'Journey_month', 'Dep_hour',
                'Dep_min', 'Arrival_hour', 'Arrival_min', 'Duration_hours',
                'Duration_mins', 'Airline_Air India', 'Airline_GoAir', 'Airline_IndiGo',
                'Airline_Jet Airways', 'Airline_Jet Airways Business',
                'Airline_Multiple carriers',
                'Airline_Multiple carriers Premium economy', 'Airline_SpiceJet',
                'Airline_Trujet', 'Airline_Vistara', 'Airline_Vistara Premium economy',
                'Source_Chennai', 'Source_Delhi', 'Source_Kolkata', 'Source_Mumbai',
                'Destination_Cochin', 'Destination_Delhi', 'Destination_Hyderabad',
                'Destination_Kolkata', 'Destination_New Delhi'],
               dtype='object')

In [45]:
X = data_train.loc[:, ['Total_Stops', 'Journey_day', 'Journey_month', 'Dep_hour',
       'Dep_min', 'Arrival_hour', 'Arrival_min', 'Duration_hours',
       'Duration_mins', 'Airline_Air India', 'Airline_GoAir', 'Airline_IndiGo',
       'Airline_Jet Airways', 'Airline_Jet Airways Business',
       'Airline_Multiple carriers',
       'Airline_Multiple carriers Premium economy', 'Airline_SpiceJet',
       'Airline_Trujet', 'Airline_Vistara', 'Airline_Vistara Premium economy',
       'Source_Chennai', 'Source_Delhi', 'Source_Kolkata', 'Source_Mumbai',
       'Destination_Cochin', 'Destination_Delhi', 'Destination_Hyderabad',
       'Destination_Kolkata', 'Destination_New Delhi']]

X.head()

Out[45]:    Total_Stops  Journey_day  Journey_month  ...
         0            0           24              3  ...
         1            2            1              5  ...
         2            2            9              6  ...
         3            1           12              5  ...
         4            1            1              3  ...

In [46]:
y = data_train.iloc[:, 1]

y.head()

Out[46]: 0 3897

1 7662

2 13882

3 6218

4 13302

Name: Price, dtype: int64

In [47]:
# Finds correlation between independent and dependent attributes

plt.figure(figsize = (18,18))

sns.heatmap(train_data.corr(), annot = True, cmap = "RdYlGn")

plt.show()

In [48]:
# Important feature using ExtraTreesRegressor

from sklearn.ensemble import ExtraTreesRegressor

selection = ExtraTreesRegressor()
selection.fit(X, y)

Out[48]: ExtraTreesRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                             max_depth=None, max_features='auto', max_leaf_nodes=None,
                             max_samples=None, min_impurity_decrease=0.0,
                             min_impurity_split=None, min_samples_leaf=1,
                             min_samples_split=2, min_weight_fraction_leaf=0.0,
                             n_estimators=100, n_jobs=None, oob_score=False,
                             random_state=None, verbose=0, warm_start=False)

In [49]:
print(selection.feature_importances_)

[1.95434120e-01 1.44021586e-01 5.38555703e-02 2.40539410e-02
 2.17430940e-02 2.73470116e-02 1.96095029e-02 1.27500039e-01
 1.74337150e-02 1.06741541e-02 1.87128973e-03 1.65718507e-02
 1.51364596e-01 6.81637760e-02 1.89902192e-02 8.47326514e-04
 3.10615675e-03 1.08644888e-04 5.29446496e-03 9.35369428e-05
 6.27777871e-04 1.23137157e-02 3.24867396e-03 8.91085147e-03
 1.41174133e-02 1.95895945e-02 7.68279226e-03 4.01014841e-04
 2.50235698e-02]

In [50]:
# plot graph of feature importances for better visualization

plt.figure(figsize = (12,8))

feat_importances = pd.Series(selection.feature_importances_, index = X.columns)
feat_importances.nlargest(20).plot(kind = 'barh')
plt.show()

Fitting model using Random Forest
1. Split the dataset into train and test sets in order to predict w.r.t X_test
2. If needed, scale the data
   (scaling is not needed for Random Forest)
3. Import the model
4. Fit the data
5. Predict w.r.t X_test
6. In regression, check the RMSE score
7. Plot graph

A compact sketch of the whole flow follows; the cells below then carry out each step.
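Compact sketch of steps 1-6 (the individual cells below do the same thing step by step); the test_size and random_state values here are assumptions, not necessarily the ones used originally:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size = 0.2, random_state = 42)

model = RandomForestRegressor()    # no feature scaling needed for tree ensembles
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
print("RMSE:", np.sqrt(metrics.mean_squared_error(y_te, pred)))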

In [51]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

In [52]:
from sklearn.ensemble import RandomForestRegressor
reg_rf = RandomForestRegressor()

reg_rf.fit(X_train, y_train)

Out[52]: RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                               max_depth=None, max_features='auto', max_leaf_nodes=None,
                               max_samples=None, min_impurity_decrease=0.0,
                               min_impurity_split=None, min_samples_leaf=1,
                               min_samples_split=2, min_weight_fraction_leaf=0.0,
                               n_estimators=100, n_jobs=None, oob_score=False,
                               random_state=None, verbose=0, warm_start=False)

In [53]:
y_pred = reg_rf.predict(X_test)

In [54]:
reg_rf.score(X_train, y_train)

Out[54]: 0.9539164511170628

In [55]:
reg_rf.score(X_test, y_test)

Out[55]: 0.798383043987616

In [56]:
sns.distplot(y_test-y_pred)

plt.show()

In [57]:
plt.scatter(y_test, y_pred, alpha = 0.5)

plt.xlabel("y_test")

plt.ylabel("y_pred")

plt.show()

In [58]:
from sklearn import metrics

In [59]:
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

MAE: 1172.5455945373583

MSE: 4347276.1614450775

RMSE: 2085.0122688955757

In [60]:
# RMSE/(max(DV)-min(DV))

2090.5509/(max(y)-min(y))

Out[60]: 0.026887077025966846

In [61]:
metrics.r2_score(y_test, y_pred)

Out[61]: 0.7983830439876158

In [ ]:

Hyperparameter Tuning
Choose one of the following methods for hyperparameter tuning:
1. RandomizedSearchCV --> fast
2. GridSearchCV --> exhaustive but slower (a sketch follows this list)
Assign the hyperparameters in the form of a dictionary
Fit the model
Check the best parameters and best score
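Sketch of the GridSearchCV alternative mentioned in point 2 (it is not run in this notebook); the grid values are illustrative only and it reuses X_train / y_train from In [51]:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

param_grid = {
    "n_estimators": [300, 700, 1100],
    "max_depth": [10, 20, 30],
}

grid = GridSearchCV(estimator = RandomForestRegressor(),
                    param_grid = param_grid,
                    scoring = "neg_mean_squared_error",
                    cv = 5, verbose = 2, n_jobs = -1)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.best_score_)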

In [62]:
from sklearn.model_selection import RandomizedSearchCV

In [63]:
# Randomized Search CV

# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 100, stop = 1200, num = 12)]

# Number of features to consider at every split
max_features = ['auto', 'sqrt']

# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(5, 30, num = 6)]

# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10, 15, 100]

# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 5, 10]

In [64]:
# Create the random grid

random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf}

In [65]:
# Random search of parameters, using 5 fold cross validation,
# searching across 10 different combinations

rf_random = RandomizedSearchCV(estimator = reg_rf, param_distributions = random_grid,
                               scoring = 'neg_mean_squared_error', n_iter = 10, cv = 5,
                               verbose = 2, random_state = 42, n_jobs = 1)

In [66]:
rf_random.fit(X_train,y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

[CV] n_estimators=900, min_samples_split=5, min_samples_leaf=5, max_features=sqrt, max_depth=10 (5 folds, ~11-13s each)
[CV] n_estimators=1100, min_samples_split=10, min_samples_leaf=2, max_features=sqrt, max_depth=15 (5 folds, ~17-19s each)
[CV] n_estimators=300, min_samples_split=100, min_samples_leaf=5, max_features=auto, max_depth=15 (5 folds, ~8-11s each)
[CV] n_estimators=400, min_samples_split=5, min_samples_leaf=5, max_features=auto, max_depth=15 (5 folds, ~15-17s each)
[CV] n_estimators=700, min_samples_split=5, min_samples_leaf=10, max_features=auto, max_depth=20 (5 folds, ~24-26s each)
[CV] n_estimators=1000, min_samples_split=2, min_samples_leaf=1, max_features=sqrt, max_depth=25 (5 folds, ~25s each)
[CV] n_estimators=1100, min_samples_split=15, min_samples_leaf=10, max_features=sqrt, max_depth=5 (5 folds, ~9s each)
[CV] n_estimators=300, min_samples_split=15, min_samples_leaf=1, max_features=sqrt, max_depth=15 (4 folds shown, ~4-5s each)
[CV] n_estimators=300, min_samples_split=15, ...
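The remaining step from the list above (checking the best parameters and best score) is not captured in this excerpt; with standard scikit-learn attributes it would look like the following sketch (attribute names are standard, but the concrete results are not shown here):

# Inspect the search results and evaluate the tuned model (sketch; outputs not captured above)
print(rf_random.best_params_)    # best hyperparameter combination found
print(rf_random.best_score_)     # best cross-validated score (negative MSE for the scoring used)

prediction = rf_random.predict(X_test)    # predictions from the refit best estimator
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, prediction)))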
