Hyperparameter Optimization Based on Bayesian Optimization
Last Updated: 22 Feb, 2024
In this article, we explore what hyperparameter optimization is and how Bayesian Optimization can be used to tune the hyperparameters of machine learning models to obtain better prediction accuracy. Before we dive into how Bayesian Optimization is implemented, let us first understand what is meant by hyperparameters and hyperparameter optimization.
Hyperparameters
Machine/deep learning models involve two types of parameters: model parameters and hyperparameters. Hyperparameters are external configuration variables that we set before training to control the learning process. Examples of hyperparameters include the number of nodes and layers in a neural network, the learning rate, and the number of epochs. They have a major impact on the accuracy and efficiency of the trained model, so they need to be chosen carefully to obtain the best results. This leads us to the topic of hyperparameter optimization.
Hyperparameter Optimization
Hyperparameter optimization, or tuning, is the process of selecting optimal values for a machine learning model's hyperparameters. The goal is to find the tuple of hyperparameters that yields the best-performing model, i.e., the one that minimizes the loss (or maximizes the accuracy) returned by the objective function on a given dataset.
There are various techniques that can be used to tune hyperparameters, such as Grid Search, Random Search, and Bayesian Optimization; a minimal grid search sketch is shown below for contrast.
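For context, the sketch below runs an exhaustive grid search over a small, arbitrarily chosen SVC grid with scikit-learn's GridSearchCV. It is only an illustration of the technique; the grid values are not part of the workflow used later in this article.
Python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Exhaustive search: every combination in the grid is evaluated with 3-fold CV
X, y = load_breast_cancer(return_X_y=True)
grid = GridSearchCV(
    SVC(),
    {'C': [0.1, 1, 10], 'gamma': ['scale', 0.01, 0.001]},  # illustrative values
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)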
We are now going to dive deeper into what Bayesian Optimization is and how it can be used with machine learning models for hyperparameter tuning.
Bayesian Optimization
Bayesian Optimization is an automated technique for finding optimal hyperparameters by treating the search as an optimization problem. It aims to maximize an objective function f(x) and is particularly beneficial when the function is computationally expensive to evaluate and is treated as a "black box" whose internal structure is unknown.
One of the key features of Bayesian Optimization is its ability to consider previous evaluations when selecting the next set of hyperparameter combinations. This is achieved through the use of a probabilistic model, which estimates the probability of an objective function's result given a set of hyperparameters:
P(score | hyperparameters)
This model is called a "surrogate" for the objective function and is represented by P(y | x). The Bayesian Optimization algorithm involves several steps:
- Build a Probability Model: Develop a probability model of the objective function based on past evaluations.
- Find Optimal Hyperparameters: Identify hyperparameters that perform best according to the probability model.
- Apply Hyperparameters: Apply the selected hyperparameters to the actual objective function and evaluate its performance.
- Update Probability Model: Update the probability model with the latest results.
- Repeat: Iterate steps 2-4 until reaching the maximum number of iterations or time limit.
The surrogate model begins with a prior distribution over the objective function f(x), representing initial beliefs about it before observing any data. As more evaluations are performed, the surrogate learns from the data, updating its beliefs according to Bayes' rule to form a posterior distribution.
Sampling points in the search space is facilitated by acquisition functions, which balance exploitation and exploration. Exploitation involves sampling where the surrogate model predicts a high objective value, while exploration entails sampling at locations with high uncertainty. Popular acquisition functions include Maximum Probability of Improvement (MPI), Expected Improvement (EI), and Upper Confidence Bound (UCB).
Bayesian Optimization is efficient because it intelligently selects the next set of hyperparameters, reducing the number of calls made to the objective function. Surrogate models such as Gaussian processes, Random Forest Regression, and Tree-Structured Parzen Estimators (TPE) are commonly used in Bayesian Optimization due to their effectiveness.
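To make the algorithm concrete, here is a minimal, self-contained sketch of the loop described above, using a Gaussian process surrogate from scikit-learn and the Expected Improvement acquisition function. The 1-D objective function and all variable names are illustrative stand-ins, not part of the scikit-optimize workflow used later in this article.
Python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # Stand-in for an expensive black-box evaluation,
    # e.g. training a model and returning its validation score
    return -(x - 2.0) ** 2 + 3.0

def expected_improvement(candidates, gp, best_y):
    # Balances exploitation (high predicted mean) and exploration (high uncertainty)
    mu, sigma = gp.predict(candidates.reshape(-1, 1), return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X_obs = rng.uniform(-5, 5, size=3)              # a few initial random evaluations
y_obs = np.array([objective(x) for x in X_obs])

for _ in range(15):                                                    # step 5: repeat
    gp = GaussianProcessRegressor().fit(X_obs.reshape(-1, 1), y_obs)   # step 1: surrogate
    candidates = np.linspace(-5, 5, 500)
    ei = expected_improvement(candidates, gp, y_obs.max())             # step 2: acquisition
    x_next = candidates[np.argmax(ei)]
    y_next = objective(x_next)                                         # step 3: real evaluation
    X_obs = np.append(X_obs, x_next)                                   # step 4: update data
    y_obs = np.append(y_obs, y_next)

print("Best x found:", X_obs[np.argmax(y_obs)], "Best value:", y_obs.max())
In practice, libraries such as scikit-optimize wrap this loop (for example in gp_minimize and BayesSearchCV), so we rarely need to write it by hand.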
Hyperparameter Optimization Based on Bayesian Optimization
In this section, we will learn how to use the BayesSearchCV class from the scikit-optimize (skopt) library to improve the results of a Support Vector Classifier on the Breast Cancer dataset.
Install the scikit-optimize library using the following command:
pip install scikit-optimize
Import Packages
We import the required libraries, such as numpy, pandas and train_test_split, along with load_breast_cancer, which provides the popular Wisconsin Breast Cancer dataset from scikit-learn, and BayesSearchCV from scikit-optimize.
Python
import numpy as np
import pandas as pd
import gc
import warnings
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, make_scorer, accuracy_score, recall_score, f1_score
from datetime import timedelta
import time
from skopt import BayesSearchCV
Load the Dataset and Create the Train-Test Split
SVMs are sensitive to feature scales: when features span very different ranges, the dual coefficients or intercept may fail to converge to finite values and the model can appear to train indefinitely. To address this, preprocessing of the data is necessary. Here we use StandardScaler to standardize the features so that they all have a similar range.
Python
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1234)
scaler = StandardScaler()
# Fit the scaler on training data and transform both training and test data
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
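As an optional sanity check (not part of the original workflow), we can verify that standardization worked: each scaled training feature should have a mean close to 0 and a standard deviation close to 1.
Python
# Each column of the scaled training data should have mean ~0 and std ~1
print(np.round(X_train.mean(axis=0), 2))
print(np.round(X_train.std(axis=0), 2))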
Training a Machine Learning Model
Python
start_time = time.time()
svc_model = SVC(kernel="rbf")
svc_model.fit(X_train, y_train)
elapsed_time_secs = time.time() - start_time
msg = "Execution took: %s secs (Wall clock time)" % timedelta(seconds=round(elapsed_time_secs))
svc_pred = svc_model.predict(X_test)
print("Train Accuracy", accuracy_score(y_train, svc_model.predict(X_train)))
print("Test Accuracy", accuracy_score(y_test, svc_model.predict(X_test)))
print('\n')
print("Train Recall Score", recall_score(y_train, svc_model.predict(X_train)))
print("Test Recall Score", recall_score(y_test, svc_model.predict(X_test)))
print('\n')
print("Train F1 Score", f1_score(y_train, svc_model.predict(X_train)))
print("Test F1 Score", f1_score(y_test, svc_model.predict(X_test)))
Output:
Train Accuracy 0.9912087912087912
Test Accuracy 0.9473684210526315
Train Recall Score 1.0
Test Recall Score 1.0
Train F1 Score 0.9931740614334471
Test F1 Score 0.9565217391304348
Here we have fit the SVC model with the "rbf" kernel and obtained a test accuracy of about 94.7%, while also measuring the wall-clock training time and printing other performance metrics such as the recall and F1 score. We observe that there is still some scope for improvement.
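As an optional extra check (not shown in the original output), the already-imported confusion_matrix can be used to see where the baseline model makes its mistakes.
Python
# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, svc_pred))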
Define Hyperparameter Search Space
We have specified the hyperparameters we want to optimize for SVM. Common hyperparameters include the choice of kernel (linear, polynomial, radial basis function, etc.), the regularization parameter (C), and the kernel coefficient (gamma).
Python
param_space = {
    'C': (1e-6, 1e+6, 'log-uniform'),
    'gamma': (1e-6, 1e+1, 'log-uniform'),
    'degree': (1, 8),                     # integer-valued parameter
    'kernel': ['linear', 'poly', 'rbf'],  # categorical parameter
}
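The tuple/list shorthand above is convenient, but the same search space can also be written with skopt's explicit dimension objects, which makes the parameter types and priors easier to read. The dictionary name below is only illustrative.
Python
# Equivalent search space using explicit skopt.space dimension objects
from skopt.space import Real, Integer, Categorical

param_space_explicit = {
    'C': Real(1e-6, 1e+6, prior='log-uniform'),
    'gamma': Real(1e-6, 1e+1, prior='log-uniform'),
    'degree': Integer(1, 8),
    'kernel': Categorical(['linear', 'poly', 'rbf']),
}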
Bayesian Optimization
Initialize Bayesian Optimization
We now configure the Bayesian optimization search using BayesSearchCV, passing the estimator, the hyperparameter search space, the number of search iterations, and the cross-validation strategy. The surrogate model and acquisition function are handled internally by scikit-optimize.
Python
# Initialize Bayesian optimization over the SVC hyperparameter space
opt = BayesSearchCV(
    SVC(),
    param_space,
    n_iter=32,
    cv=3
)
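BayesSearchCV also accepts the usual scikit-learn search arguments. The variant below (with illustrative values and a separate variable name, so it does not replace the search object used in the rest of the article) adds an explicit scoring metric, parallel cross-validation, and a fixed seed for reproducibility.
Python
# Optional variant with explicit scoring, parallel CV folds and a fixed seed
opt_reproducible = BayesSearchCV(
    SVC(),
    param_space,
    n_iter=32,
    cv=3,
    scoring='accuracy',
    n_jobs=-1,
    random_state=1234,
)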
Run Bayesian Optimization
Python
opt.fit(X_train, y_train)
print("val. score: %s" % opt.best_score_)
print("test score: %s" % opt.score(X_test, y_test))
# Get best hyperparameters
best_params = opt.best_params_
print("Best Parameters:", best_params)
Output:
val. score: 0.9780411293133496
test score: 0.956140350877193
Best Parameters: OrderedDict([('C', 0.3317383202555499), ('degree', 8), ('gamma', 2.8889304722800495), ('kernel', 'linear')])
Here, we fit the Bayesian optimization search on the training data (BayesSearchCV performs 3-fold cross-validation internally) and then score the refit best model on the test set. The best set of hyperparameters found is: [('C', 0.3317383202555499), ('degree', 8), ('gamma', 2.8889304722800495), ('kernel', 'linear')]. Note that because the selected kernel is 'linear', the reported degree and gamma values do not actually affect the final model.
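Because BayesSearchCV refits the best configuration on the full training set by default (refit=True), the tuned model is also available directly from the search object; the variable name below is just for illustration.
Python
# The search object already holds a model refit with the best hyperparameters
best_svc_from_search = opt.best_estimator_
print("Refit model test accuracy:", best_svc_from_search.score(X_test, y_test))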
Implementing SVM with Best Hyperparameters
Python
# Get best hyperparameters
best_params = opt.best_params_
# Create an SVM classifier with the best parameters
best_svc_model = SVC(**best_params)
# Fit the classifier on the training data
best_svc_model.fit(X_train, y_train)
# Predict on the test data
best_svc_pred = best_svc_model.predict(X_test)
# Evaluate the performance of the model
print("Train Accuracy with best parameters:", accuracy_score(y_train, best_svc_model.predict(X_train)))
print("Test Accuracy with best parameters:", accuracy_score(y_test, best_svc_pred))
print('\n')
print("Train Recall Score with best parameters:", recall_score(y_train, best_svc_model.predict(X_train)))
print("Test Recall Score with best parameters:", recall_score(y_test, best_svc_pred))
print('\n')
print("Train F1 Score with best parameters:", f1_score(y_train, best_svc_model.predict(X_train)))
print("Test F1 Score with best parameters:", f1_score(y_test, best_svc_pred))
Output:
Train Accuracy with best parameters: 0.9868131868131869
Test Accuracy with best parameters: 0.9912280701754386
Train Recall Score with best parameters: 1.0
Test Recall Score with best parameters: 1.0
Train F1 Score with best parameters: 0.9895833333333333
Test F1 Score with best parameters: 0.993103448275862
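Comparing the two runs, Bayesian optimization raises the test accuracy from about 94.7% to about 99.1% and the test F1 score from about 0.957 to about 0.993, while needing only 32 evaluations of the cross-validated objective.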