Hyperparameter Tuning For Machine Learning Models
JEREMY JORDAN
2 NOV 2017
• 8 MIN READ
Whereas the model parameters specify how to transform the input data
into the desired output, the hyperparameters define how our model is
actually structured. Unfortunately, there's no way to calculate “which
way should I update my hyperparameter to reduce the loss?” (ie.
https://www.jeremyjordan.me/hyperparameter-tuning/ 2/14
20.10.22, 22:20 Hyperparameter tuning for machine learning models.
1. Define a model
Model validation
Before we discuss these various tuning methods, I'd like to quickly
revisit the purpose of splitting our data into training, validation, and
test data. The ultimate goal for any machine learning model is to learn
from examples in such a manner that the model is capable of
generalizing the learning to new instances which it has not yet seen. At a
very basic level, you should train on a subset of your total dataset,
holding out the remaining data for evaluation to gauge the model's
ability to generalize - in other words, "how well will my model do on
data which it hasn't directly learned from during training?"
To mitigate this, we'll end up splitting the total dataset into three
subsets: training data, validation data, and testing data. The
introduction of a validation dataset allows us to evaluate the model on
different data than it was trained on and select the best model
architecture, while still holding out a subset of the data for the final
evaluation at the end of our model development.
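A three-way split can be sketched in a few lines of standard-library Python. This is only an illustration: the helper name `train_val_test_split` and the 60/20/20 proportions are assumptions chosen for the example, not something prescribed above.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the rows and cut them into three disjoint subsets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]                  # held out until the very end
    val = rows[n_test:n_test + n_val]     # used to compare candidate architectures
    train = rows[n_test + n_val:]         # used to learn the model parameters
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))    # 60 20 20
```

The test subset is touched only once, after the architecture has been chosen using the validation subset.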
You can also leverage more advanced techniques such as K-fold cross
validation in order to essentially combine training and validation data
for both learning the model parameters and evaluating the model
without introducing data leakage.
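To make the K-fold idea concrete, here is a minimal sketch in plain Python. The names `k_fold_indices` and `cross_val_mean` are hypothetical, and `fit` and `score` stand in for whatever training and evaluation routines you use. In practice a library such as scikit-learn provides this functionality.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

def cross_val_mean(fit, score, X, y, k=5):
    """Average the validation score over k folds.

    fit(X, y) -> model and score(model, X, y) -> float are assumed callables.
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)
```

Each model is always scored on rows it was not fit on, which is what keeps the validation data from leaking into training.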
Hyperparameter tuning methods
Recall that I previously mentioned that the hyperparameter tuning
methods relate to how we sample possible model architecture
candidates from the space of possible hyperparameter values. This is
often referred to as "searching" the hyperparameter space for the
optimum values. In the following visualization, the x and y dimensions
represent two hyperparameters, and the z dimension represents the
model's score (defined by some evaluation metric) for the architecture
defined by x and y.
Note: Ignore the axes values; I borrowed this image as noted, and the
axis values don't correspond with logical values for the
hyperparameters.
If we had access to such a plot, choosing the ideal hyperparameter
combination would be trivial. However, calculating such a plot at the
granularity visualized above would be prohibitively expensive. Thus, we
are left to blindly explore the hyperparameter space in hopes of locating
the hyperparameter values which lead to the maximum score.
For each method, I'll discuss how to search for the optimal structure of a
random forest classifier. Random forests are an ensemble model
comprised of a collection of decision trees; when building such a model,
two important hyperparameters to consider are n_estimators (the number
of trees in the forest) and max_depth (the maximum depth of each tree).
Grid search
Grid search is arguably the most basic hyperparameter tuning method.
With this technique, we simply build a model for each possible
combination of all of the hyperparameter values provided, evaluate
each model, and select the architecture which produces the best
results.
For example, we would define a list of values to try for both n_estimators
and max_depth and a grid search would build a model for each
possible combination.
RandomForestClassifier(n_estimators=10, max_depth=3)
RandomForestClassifier(n_estimators=10, max_depth=10)
RandomForestClassifier(n_estimators=10, max_depth=20)
RandomForestClassifier(n_estimators=10, max_depth=40)
RandomForestClassifier(n_estimators=50, max_depth=3)
RandomForestClassifier(n_estimators=50, max_depth=10)
RandomForestClassifier(n_estimators=50, max_depth=20)
RandomForestClassifier(n_estimators=50, max_depth=40)
RandomForestClassifier(n_estimators=100, max_depth=3)
RandomForestClassifier(n_estimators=100, max_depth=10)
RandomForestClassifier(n_estimators=100, max_depth=20)
RandomForestClassifier(n_estimators=100, max_depth=40)
RandomForestClassifier(n_estimators=200, max_depth=3)
RandomForestClassifier(n_estimators=200, max_depth=10)
RandomForestClassifier(n_estimators=200, max_depth=20)
RandomForestClassifier(n_estimators=200, max_depth=40)
Each model would be fit to the training data and evaluated on the
validation data. As you can see, this is an exhaustive sampling of the
hyperparameter space and can be quite inefficient.
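The enumeration above can be generated programmatically. The following is a rough sketch using only the standard library. The `evaluate` callable is an assumption: it would fit a model with the given hyperparameters on the training data and return its validation score.

```python
from itertools import product

param_grid = {
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [3, 10, 20, 40],
}

def grid_candidates(grid):
    """Return every combination of the listed values as a dict of keyword arguments."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]

def grid_search(grid, evaluate):
    """Score every candidate with evaluate(params) and return the best (params, score)."""
    scored = [(params, evaluate(params)) for params in grid_candidates(grid)]
    return max(scored, key=lambda pair: pair[1])

candidates = grid_candidates(param_grid)
print(len(candidates))   # 4 values x 4 values = 16 models
```

The number of models grows multiplicatively with every hyperparameter added to the grid, which is why exhaustive search becomes expensive quickly.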
Random search
Random search differs from grid search in that we no longer provide a
discrete set of values to explore for each hyperparameter; rather, we
provide a statistical distribution for each hyperparameter from which
values may be randomly sampled.
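from scipy.stats import expon as sp_expon, randint as sp_randint

# expon yields floats, so round sampled n_estimators to an integer before use
n_estimators = sp_expon(scale=100)
max_depth = sp_randint(1, 40)
Photo by SigOpt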
We can also define how many iterations we'd like to build when
searching for the optimal model. For each iteration, the hyperparameter
values of the model will be set by sampling the defined distributions
above. The scipy distributions above may be sampled with the rvs()
As you can see, this search method works best under the assumption
that not all hyperparameters are equally important. While this isn't
always the case, the assumption holds true for most datasets.
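As an illustration of the sampling loop, here is a small sketch that swaps in Python's standard `random` module for the scipy distributions: an exponential with mean 100 for n_estimators and a uniform integer in [1, 40) for max_depth. The names `sample_params`, `random_search`, and `evaluate` are hypothetical; `evaluate` is assumed to train a model and return its validation score.

```python
import random

def sample_params(rng):
    """Draw one candidate from the search distributions."""
    return {
        # Exponential with mean 100, rounded to a positive integer tree count.
        "n_estimators": max(1, round(rng.expovariate(1 / 100))),
        # Uniform integer from 1 through 39 (upper bound excluded).
        "max_depth": rng.randrange(1, 40),
    }

def random_search(evaluate, n_iter=20, seed=0):
    """Evaluate n_iter randomly sampled candidates and return the best (params, score)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_iter):
        params = sample_params(rng)
        trials.append((params, evaluate(params)))
    return max(trials, key=lambda trial: trial[1])
```

Unlike grid search, the budget here is set directly by `n_iter` rather than by the size of the grid, and every trial tries a fresh value of each hyperparameter.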
Photo by SigOpt
Bayesian optimization
The previous two methods performed individual experiments building
models with various hyperparameter values and recording the model
performance for each. Because each experiment was performed in
isolation, it's very easy to parallelize this process. However, because
each experiment was performed in isolation, we're not able to use the
information from one experiment to improve the next experiment.
Bayesian optimization belongs to a class of sequential model-based
optimization (SMBO) algorithms that allow us to use the results of
previous iterations to improve how we sample the next
experiment.
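The sequential structure can be sketched with a deliberately crude stand-in for the probabilistic model. Real Bayesian optimization typically fits something like a Gaussian process and maximizes an acquisition function such as expected improvement. The toy surrogate below (an inverse-distance-weighted mean plus a distance-based exploration bonus) is not that, and every name in it is hypothetical. It only shows how earlier results steer the choice of the next experiment.

```python
import math
import random

def surrogate(point, history):
    """Toy surrogate: inverse-distance-weighted mean of observed scores, plus the
    distance to the nearest observation as a crude proxy for uncertainty."""
    weights, total, nearest = 0.0, 0.0, math.inf
    for seen, score in history:
        d = math.dist(point, seen)
        if d == 0:
            return score, 0.0
        w = 1.0 / d ** 2
        weights += w
        total += w * score
        nearest = min(nearest, d)
    return total / weights, nearest

def smbo(evaluate, candidates, n_init=3, n_iter=10, kappa=0.05, seed=0):
    """Evaluate n_init random points, then repeatedly pick the untried candidate
    with the highest predicted score plus exploration bonus."""
    rng = random.Random(seed)
    remaining = list(candidates)
    rng.shuffle(remaining)
    history = [(p, evaluate(p)) for p in remaining[:n_init]]
    remaining = remaining[n_init:]
    for _ in range(n_iter):
        if not remaining:
            break
        def acquisition(p):
            mean, uncertainty = surrogate(p, history)
            return mean + kappa * uncertainty
        chosen = max(remaining, key=acquisition)
        remaining.remove(chosen)
        history.append((chosen, evaluate(chosen)))  # feeds the next iteration
    return max(history, key=lambda trial: trial[1])
```

Because each choice depends on all earlier results, the loop cannot be parallelized as freely as grid or random search. Candidates here are plain tuples such as `(n_estimators, max_depth)`; in a real setting the dimensions would need rescaling before distances between them are meaningful.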
Photo by SigOpt
Further reading
Random Search for Hyper-Parameter Optimization
Optuna
Hyperopt
Polyaxon
Talos
BayesianOptimization
Metric Optimization Engine
Spearmint
GPyOpt
Scikit-Optimize
SigOpt
Implementation examples: