Hyperparameter Tuning For Machine Learning Models
Hyperparameter tuning is crucial because hyperparameters control the overall behavior of a
machine learning model. Every machine learning model has its own set of hyperparameters
that can be adjusted.
The Titanic dataset from Kaggle is used for the comparison. The purpose of this
article is to explore how the performance and the computational time of a
random forest model change with different hyperparameter tuning
methods. After all, machine learning is largely about finding the right balance
between computation time and the model’s performance.
We can get the default parameters used for the model with the
randomforest.get_params() method.
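As a minimal sketch (assuming the Titanic data has already been preprocessed and split into X_train and y_train, the names used later in this article), the baseline model and its default parameters can be inspected as follows; random_state=42 and cv=5 are illustrative choices, and the cross_val_score call is only one way the baseline score could have been obtained.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Baseline model with default hyperparameters (random_state is an illustrative choice)
randomforest = RandomForestClassifier(random_state=42)

# Inspect the default hyperparameters
print(randomforest.get_params())

# One possible way to estimate the baseline cross-validation score (cv=5 is assumed)
baseline_scores = cross_val_score(randomforest, X_train, y_train, cv=5)
print('Baseline CV score: ', baseline_scores.mean() * 100)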
Don't worry if you are not familiar with these parameters and how they are
used; detailed information about each of them can be found in the scikit-learn
documentation for the model. The following search space is used for tuning:
parameters = {'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
              'criterion': ['gini', 'entropy'],
              'max_features': [0.3, 0.5, 0.7, 0.9],
              'min_samples_leaf': [3, 5, 7, 10, 15],
              'min_samples_split': [2, 5, 10],
              'n_estimators': [50, 100, 200, 400, 600]}
Using sklearn’s GridSearchCV, we can define the search over this grid and then run
the grid search, as sketched below.
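The exact code for this step is not shown in the original text; the following is a minimal sketch of how the grid search could be set up, mirroring the randomized-search cell later in the article. The cv=5 setting and the %%time magic are assumptions, not values confirmed by the original experiment.
%%time
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Exhaustively evaluate every combination in the parameter grid
grid_search = GridSearchCV(estimator=RandomForestClassifier(),
                           param_grid=parameters,
                           cv=5, n_jobs=-1)
grid_result = grid_search.fit(X_train, y_train)

print('Best Score: ', grid_result.best_score_ * 100)
print('Best Params: ', grid_result.best_params_)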
Output
Our cross-validation score improves from 81.56% with the baseline model to 84.12%
with the Grid Search CV model, a gain of roughly 2.6 percentage points (about a
3% relative improvement). However, the computation takes almost 5 hours, which is
not feasible for a simple problem like this one.
Randomized Search
Randomized search samples a fixed number of hyperparameter combinations (n_iter)
from the grid instead of exhaustively trying every one, which greatly reduces the
computation time.
%%time
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Sample 200 random combinations from the parameter grid instead of trying them all
random_search = RandomizedSearchCV(estimator=RandomForestClassifier(),
                                   param_distributions=parameters,
                                   n_iter=200, n_jobs=-1)
random_result = random_search.fit(X_train, y_train)

print('Best Score: ', random_result.best_score_ * 100)
print('Best Params: ', random_result.best_params_)
Output