Jaswanth Badvelu · Oct 31, 2020 · 7 min read

Hyperparameter tuning for Machine learning models

Improving the accuracy of machine learning models using Random Search, Grid Search, and HyperOpt optimization methods

Photo by Joe Caione on Unsplash

Introduction

This article covers the comparison and implementation of random search, grid search, and Bayesian optimization methods, using the Scikit-learn and HyperOpt libraries, for hyperparameter tuning of a machine learning model. Hyperparameter tuning is crucial because hyperparameters control the overall behavior of a machine learning model. Every machine learning model has its own hyperparameters that can be set. A hyperparameter is a parameter whose value is set before the learning process begins. I will be using the Titanic dataset from Kaggle for comparison. The purpose of this article is to explore how the performance and the computational time of a random forest model change with various hyperparameter tuning methods. After all, machine learning is all about finding the right balance between computing time and the model's performance.

Baseline model with default parameters (a sketch of one possible data-preparation step that produces X_train, X_test, y_train, and y_test appears at the end of this section):

from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier(random_state=1).fit(X_train, y_train)
random_forest.score(X_test, y_test)

The accuracy of this model on the testing set is 81.56%. We can get the default parameters used for the model with the following command:

random_forest.get_params()

The default parameters are:

{'bootstrap': True, 'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'gini',
 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None,
 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1,
 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 100,
 'n_jobs': None, 'oob_score': False, 'random_state': 1, 'verbose': 0, 'warm_start': False}

No need to worry if you don't know about these parameters and how they are used. Information about all the parameters can be found in the Scikit-learn documentation of each model.

Some important parameters in Random Forest:

1. max_depth: int, default=None: This selects how deep you want to make each tree in the forest. The deeper the tree, the more splits it has and the more information it captures about the data.

2. criterion: {"gini", "entropy"}, default="gini": Measures the quality of each split. "gini" uses the Gini impurity, while "entropy" makes the split based on the information gain.

3. max_features: {"auto", "sqrt", "log2"}, int or float, default="auto": This represents the number of features that are considered when finding the best split at each node. Considering more features gives each tree node a higher number of options to choose from, which can improve the model's performance.

4. min_samples_leaf: int or float, default=1: This parameter helps determine the minimum required number of observations at the end (leaf) of each decision tree node in the random forest.

5. min_samples_split: int or float, default=2: This specifies the minimum number of samples that must be present in a node for a split to occur.

6. n_estimators: int, default=100: This is perhaps the most important parameter. It represents the number of trees you want to build within the random forest before calculating the predictions. Usually, the higher the number, the better, but more trees are also more computationally expensive.

More info about other parameters can be found in the random forest classifier model documentation.
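Since the article never shows how X_train, X_test, y_train, and y_test were created, the snippet below is a minimal sketch of one plausible preparation of the Kaggle Titanic data. The chosen feature columns, the imputation, and the encoding are assumptions on my part, not the author's actual preprocessing pipeline.

# Hypothetical data preparation for the Kaggle Titanic dataset.
# The feature selection, imputation, and encoding are assumptions;
# the original article does not show this step.
import pandas as pd
from sklearn.model_selection import train_test_split

titanic = pd.read_csv('train.csv')  # Kaggle Titanic training file
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']

X = pd.get_dummies(titanic[features], columns=['Sex'])  # one-hot encode 'Sex'
X['Age'] = X['Age'].fillna(X['Age'].median())           # fill missing ages
y = titanic['Survived']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

Any preprocessing that yields a purely numeric feature matrix would work equally well; the tuning methods below only assume that X_train, y_train, X_test, and y_test exist.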
Grid Search

One traditional and popular way to perform hyperparameter tuning is with an exhaustive grid search from Scikit-learn. This method tries every possible combination of the given hyperparameter values, so we can find the best set of values in the parameter search space. It usually uses more computational power and takes a long time to run, since it needs to try every combination in the grid. The parameter grid size is the product of the number of values for each parameter, i.e., for the following parameters in our model, the grid size will be 10 * 2 * 4 * 5 * 3 * 5.

parameters = {'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
              'criterion': ['gini', 'entropy'],
              'max_features': [0.3, 0.5, 0.7, 0.9],
              'min_samples_leaf': [3, 5, 7, 10, 15],
              'min_samples_split': [2, 5, 10],
              'n_estimators': [50, 100, 200, 400, 600]}

from sklearn.model_selection import ParameterGrid
param_size = ParameterGrid(parameters)
len(param_size)

Output: 12000

Using sklearn's GridSearchCV, we can define our search over this grid and then run the grid search.

%%time
from sklearn.model_selection import GridSearchCV

# exhaustive search over the parameter grid with 5-fold cross-validation
grid_search = RandomForestClassifier()
grid_search = GridSearchCV(grid_search, parameters, cv=5, scoring='accuracy', n_jobs=-1)
grid_result = grid_search.fit(X_train, y_train)
print('Best Params: ', grid_result.best_params_)
print('Best Score: ', grid_result.best_score_)

Output:

Best Params: {'criterion': 'gini', 'max_depth': 90, 'max_features': 'log2', 'min_samples_leaf': 5, 'min_samples_split': 10, 'n_estimators': 50}
Best Score: 0.8412567412587413
CPU times: user 5min 49s, sys: 6.52 s, total: 5min 56s
Wall time: 4h 40min 25s

Our cross-validation score improved from 81.56% to 84.12% with the grid search CV model compared with our baseline model. That is a 3.3% improvement. The computational time is almost 5 hrs, which is not feasible for a simple problem like this one. More information about different ways of implementing a grid search can be found here.

Randomized Search

The main difference between RandomizedSearchCV and GridSearchCV is that instead of trying every possible combination, it samples hyperparameter combinations randomly from the grid space. Because of this, there is no guarantee that we will find the best result, as with grid search. But this search can be extremely effective in practice because the computational time is much lower. The computational time and the model's performance depend mainly on the n_iter value, because this value specifies how many parameter settings are sampled. If this value is high, there is a better chance of getting better accuracy, but it also requires more computational power. We can implement RandomizedSearchCV using sklearn's library.

%%time
from sklearn.model_selection import RandomizedSearchCV

# randomly sample 200 parameter combinations from the same grid
random_search = RandomizedSearchCV(estimator=RandomForestClassifier(),
                                   param_distributions=parameters, verbose=1,
                                   n_jobs=-1, n_iter=200)
random_result = random_search.fit(X_train, y_train)
print('Best Score: ', random_result.best_score_*100)
print('Best Params: ', random_result.best_params_)

Output:

Best Score: 92.94221412300828
Best Params: {'n_estimators': 428, 'min_samples_split': 5, 'min_samples_leaf': 5, 'max_features': 0.3, 'max_depth': …, 'criterion': 'entropy', 'bootstrap': True}
CPU times: user 5.5 s, sys: 178 ms, total: 5.68 s
Wall time: 5min 20s

Our cross-validation score improved from 81.56% to 83.57% with the randomized search CV model compared with our baseline model. That is a 2.5% improvement, which is 0.8% less than grid search CV. But the computational time is less than 5 mins, which is almost 60 times faster. For most simple problems, randomized search will be the most feasible option for hyperparameter tuning. More information about different ways of implementing a randomized search can be found here.
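The scores printed above are cross-validation scores, while the 81.56% baseline was measured on the held-out test set. For an apples-to-apples comparison, the fitted search objects can be scored on that same test set. This is a minimal sketch assuming the grid_result and random_result objects from the snippets above; it is not part of the original article's code.

# Hypothetical follow-up: both searches refit the best estimator on the full
# training data by default (refit=True), so scoring the search object scores
# the model with the best hyperparameters found.
print('Grid search test accuracy:      ', grid_result.score(X_test, y_test))
print('Randomized search test accuracy:', random_result.score(X_test, y_test))

# The winning model can also be pulled out for later reuse:
best_rf = random_result.best_estimator_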
Bayesian model Optimization using HyperOpt

To formulate the optimization problem in Hyperopt, we need an objective function that takes in an input and returns a loss for the model to minimize, and a domain space for the hyperparameters: similar to grid search, we create a parameter space with the range of input values to evaluate. The objective function can be as simple as f(x) = sin(x), or it can be as complex as the error of a deep neural network. This method chooses hyperparameters based on previous evaluations. The process is simple:

1. Create a surrogate probability model of the objective function.

2. Find the hyperparameters that perform best on the surrogate model.

3. Use these values on the true model to evaluate the objective function and update the surrogate model.

%%time
import numpy as np
from sklearn.model_selection import cross_val_score
from hyperopt import hp, tpe, fmin, STATUS_OK, Trials

def accuracy_model(params):
    # mean cross-validation accuracy for a given set of hyperparameters
    clf = RandomForestClassifier(**params)
    return cross_val_score(clf, X_train, y_train).mean()

param_space = {'max_depth': hp.choice('max_depth', range(10, 100)),
               'max_features': hp.uniform('max_features', 0.1, 1),
               'n_estimators': hp.choice('n_estimators', range(50, 500)),
               'min_samples_leaf': hp.choice('min_samples_leaf', range(3, 5)),
               'min_samples_split': hp.choice('min_samples_split', range(2, 10)),
               'criterion': hp.choice('criterion', ["gini", "entropy"])}

best = 0
def f(params):
    global best
    acc = accuracy_model(params)
    if acc > best:
        best = acc
    # fmin minimizes the loss, so return the negative accuracy
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()
best_params = fmin(f, param_space, algo=tpe.suggest, max_evals=500, trials=trials)
print('New best:', best, best_params)
print(best_params)

Output:

100%|██████████| 500/500 [19:39<00:00, 2.36s/it, best loss: -0.8440460947503201]
New best: 0.8440460947503201 {'criterion': 0, 'max_depth': 89, 'max_features': 0.23803038476252658, 'min_samples_leaf': 1, 'min_samples_split': 6, 'n_estimators': 0}
CPU times: user 19min 28s, sys: 9.25 s, total: 19min 37s
Wall time: 19min 39s

Our cross-validation score using Bayesian optimization is 84.44%, which is better than random search and grid search. And the computational time is about 20 min, which is reasonable considering this method performed best. One more benefit of the Bayesian optimization model is that, unlike random search or grid search, it tracks all past evaluations and uses them to form a probabilistic model mapping hyperparameters to a probability of a score on the objective function. More information about installing and implementing the Hyperopt library can be found here.
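One thing to watch with fmin is that, for hp.choice entries, best_params reports the index of the chosen option rather than its value (hence 'criterion': 0 and 'n_estimators': 0 in the output above). Hyperopt's space_eval helper converts those indices back into actual parameter values. The sketch below, which assumes the param_space, X_train, y_train, X_test, and y_test from earlier, shows one way to decode the result and refit a final model; it is not from the original article.

# Hypothetical follow-up: decode hp.choice indices into real values and
# refit a final random forest with the winning hyperparameters.
from hyperopt import space_eval

best_hyperparams = space_eval(param_space, best_params)
print(best_hyperparams)  # decoded parameter values, not indices

final_model = RandomForestClassifier(**best_hyperparams, random_state=1)
final_model.fit(X_train, y_train)
print('Test accuracy:', final_model.score(X_test, y_test))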
Conclusion

Hyperparameter tuning can be very advantageous for improving the accuracy of machine learning models. In our case, the random forest model is already good at predicting the survival rate, so there was not much improvement in accuracy with the hyperparameter tuning methods. To conclude, using a grid search to choose optimal hyperparameters can be very time-consuming, and random search is high-speed but not reliable. However, even these methods are less efficient than Bayesian optimization because they do not choose the next hyperparameters to evaluate based on previous results, and because of this they consume more time evaluating useless parameters.

Additional Resources

Apart from Random Search, Grid Search, and Bayesian Optimization, there are some advanced methods like Hyperband and BOHB (which combines HyperBand and Bayesian Optimization) that are better for hyperparameter tuning. A detailed explanation of them is covered in an excellent blog by neptune.ai, which can be found here.

The complete data and code can be found in my GitHub repository.

I hope that's useful! Thank you so much for reading it till here. If you have any questions regarding this article or want to connect and talk, feel free to direct message me on LinkedIn. I will be more than happy to chat with you and help in any way I can.

Machine Learning · Data Science · Hyperparameter Tuning
