Tune Hyperparameters with GridSearchCV
Rahul Shah, Published On June 29, 2024 and Last Modified On July 20th, 2022
This article was published as a part of the Data Science Blogathon
Data-driven decision-making relies heavily on Machine Learning algorithms. For a business problem, a professional rarely relies on a single algorithm. One applies several relevant algorithms and selects the model that shows the best performance metrics. But this is not the end: model performance can often be improved further by tuning hyperparameters. Finding the optimal hyperparameters thus helps us achieve the best-performing model. In this article, we will learn about Hyperparameters, Grid Search, Cross-Validation, GridSearchCV, and the tuning of Hyperparameters in Python.
Hyperparameters for a model can be chosen using several techniques such as Random Search, Grid Search, Manual Search, and Bayesian Optimization. In this article, we will learn about GridSearchCV, which uses the Grid Search technique to find the optimal hyperparameters and increase model performance.
Image by ThisIsEngineering from Pexels
Table of Contents
1. Hyperparameters vs Parameters
Hyperparameters vs Parameters
Parameters and Hyperparameters are both associated with the Machine Learning model, but they serve different purposes. Let's understand how they differ from each other in the context of Machine Learning.
Parameters are the variables that the Machine Learning algorithm uses to predict results based on the historical input data. They are estimated by the algorithm itself using an optimization procedure. Thus, these variables are not set or hardcoded by the user or professional; they are learned as part of model training. Example of Parameters: the coefficients of the independent variables in Linear Regression and Logistic Regression.
Hyperparameters are the variables that the user specifies, usually while building the Machine Learning model. Thus, hyperparameters are specified before the parameters are estimated, or we can say that hyperparameters control how the optimal parameters of the model are learned. The key point about hyperparameters is that their values are decided by the user who is building the model. For example, max depth in Random Forest algorithms, or k in the KNN Classifier.
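To make the distinction concrete, here is a minimal sketch. It assumes scikit-learn's built-in iris data and a LogisticRegression model, neither of which is used elsewhere in this article; C and max_iter are set by the user, while coef_ and intercept_ are estimated during fitting.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Illustrative data (assumption): the built-in iris dataset.
X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by the user before training.
model = LogisticRegression(C=1.0, max_iter=1000)
hyperparameters = model.get_params()

# Parameters: estimated from the data during fit().
model.fit(X, y)
learned_coefficients = model.coef_
learned_intercepts = model.intercept_

print("C (hyperparameter):", hyperparameters["C"])
print("coef_ shape (parameters):", learned_coefficients.shape)
```

Changing C changes how the coefficients are estimated, but C itself is never learned from the data.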
Understanding Grid Search
Now that we know what hyperparameters are, our goal should be to find the hyperparameter values that give the best prediction results from our model. But the question arises: how do we find these best sets of hyperparameters? One can try the Manual Search method, using a hit-and-trial process, but finding the best hyperparameters this way can take a huge amount of time for even a single model.
For this reason, methods like Random Search and Grid Search were introduced. Here, we will discuss how Grid Search is performed and how it is executed with cross-validation in GridSearchCV.
Grid Search tries every combination of the specified hyperparameters and their values, calculates the performance for each combination, and selects the best-performing values. This makes the process time-consuming and expensive, since the number of combinations grows multiplicatively with the number of hyperparameters involved.
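The idea can be sketched by hand before turning to GridSearchCV. The example below is an illustration under assumptions: it uses the iris data, a KNN classifier, and a single validation split rather than cross-validation, and the grid values are arbitrary.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Illustrative grid (assumption): 3 values x 2 values = 6 combinations.
grid = {"n_neighbors": [1, 5, 9], "weights": ["uniform", "distance"]}

results = []
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    model = KNeighborsClassifier(**params).fit(X_train, y_train)
    # Record the validation accuracy for this combination.
    results.append((model.score(X_val, y_val), params))

# Pick the combination with the highest validation accuracy.
best_score, best_params = max(results, key=lambda item: item[0])
print(len(results), "combinations tried")
print("best:", best_params, "validation accuracy:", round(best_score, 3))
```

Adding a third hyperparameter with four values would raise the count from 6 to 24 fits, which is why the cost grows so quickly.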
Grid Search across Two Parameters (Image by Alexander Elvers from WikiMedia)
As we know, before training the model with data, we divide the data into two parts: train data and test data. In cross-validation, the process divides the train data further into two parts: the train data and the validation data.
The most popular type of Cross-validation is K-fold Cross-Validation. It is an iterative process that divides the train data into k partitions. Each iteration keeps one partition for testing and the remaining k-1 partitions for training the model. The next iteration sets the next partition as test data and the remaining k-1 as train data, and so on. In each iteration, it records the performance of the model, and at the end it gives the average of all the performances. Thus, it is also a time-consuming process.
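The iteration described above can be sketched with scikit-learn's KFold splitter. This is a minimal illustration, assuming the iris data, a KNN classifier, and k = 5; none of these choices come from the article's own example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# k = 5 partitions; shuffling is an illustrative choice.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in kfold.split(X):
    # Train on k-1 partitions, evaluate on the held-out partition.
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

# The cross-validated score is the average over all k iterations.
mean_score = float(np.mean(fold_scores))
print("per-fold accuracy:", np.round(fold_scores, 3))
print("average accuracy:", round(mean_score, 3))
```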
Thus, Grid Search combined with cross-validation takes a huge amount of time cumulatively to evaluate the best hyperparameters. Now we will see how to use GridSearchCV in our Machine Learning problem.
k-Fold Cross Validation (Image by Gufosowa from WikiMedia)
How to Apply GridSearchCV?
GridSearchCV is a class available in scikit-learn's model_selection module. It is used by creating an object of GridSearchCV().
Primarily, it takes 4 arguments, i.e. estimator, param_grid, cv, and scoring. The description of the arguments is as follows:
1. estimator - A scikit-learn model
2. param_grid - A dictionary with parameter names as keys and lists of parameter values.
3. scoring - The performance measure. For example, 'r2' for regression models, 'precision' for classification models.
4. cv - An integer that is the number of folds for K-fold cross-validation.
GridSearchCV can be used on several hyperparameters to get the best values for the specified hyperparameters.
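As a rough sketch of how the four arguments fit together, the snippet below builds and fits a GridSearchCV object. The iris data, the small grid, and cv=3 are assumptions chosen to keep the run short; ParameterGrid is used only to count the combinations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

X, y = load_iris(return_X_y=True)

# Illustrative estimator and grid (assumptions).
estimator = RandomForestClassifier(n_estimators=20, random_state=0)
param_grid = {"max_depth": [2, 4], "max_features": [1, 2]}

clf = GridSearchCV(
    estimator=estimator,
    param_grid=param_grid,
    cv=3,
    scoring="accuracy",
)
clf.fit(X, y)

n_candidates = len(ParameterGrid(param_grid))  # 2 x 2 = 4 combinations
n_fits = n_candidates * 3                      # each evaluated on 3 folds
print(n_candidates, "candidates,", n_fits, "cross-validation fits")
```

The fit count is the number of grid combinations multiplied by the number of folds, which is where the cumulative cost of Grid Search with cross-validation comes from.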
Now let's apply GridSearchCV with a sample dataset:
Importing the Libraries & the Dataset
Python Code:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
# Load the dataset into a DataFrame (the file name 'heart.csv' is an assumption)
df = pd.read_csv('heart.csv')
Here we are going to use the HeartDiseaseUCI dataset.
Specifying Independent and Dependent Variables
X = df.drop('target', axis = 1)
y = df['target']
Splitting the data into train and test set
# random_state is cut off in the source; 42 is a placeholder value
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
Building Random Forest Classifier
rfc = RandomForestClassifier()
Here, we created the object rfc of RandomForestClassifier().
Initializing GridSearchCV() object and fitting it with hyperparameters
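# The max_features range is cut off in the source; range(1, 14) is an assumed placeholder
forest_params = [{'max_depth': list(range(10, 15)), 'max_features': list(range(1, 14))}]
clf = GridSearchCV(rfc, forest_params, cv = 5, scoring = 'accuracy')
clf.fit(X_train, y_train)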
Here, we passed the estimator object rfc, param_grid as forest_params, cv = 5 and scoring method as accuracy into
GridSearchCV() as arguments.
Getting the Best Hyperparameters
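Once a GridSearchCV object has been fitted, the results can be read from its best_params_ and best_score_ attributes. The sketch below is self-contained and uses synthetic stand-in data from make_classification and a small grid, both assumptions, rather than the heart disease dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption), not the heart disease dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    {"max_depth": [2, 4], "max_features": [2, 4]},
    cv=3,
    scoring="accuracy",
)
clf.fit(X, y)

print(clf.best_params_)  # best combination found in the grid
print(clf.best_score_)   # mean cross-validated accuracy of that combination
```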
Putting it all together
On executing the above code, we get:
{'max_depth': 13, 'max_features': …}
0.…
Best Params and Best Score of the Random Forest Classifier
Thus, clf.best_params_ gives the best combination of tuned hyperparameters, and clf.best_score_ gives the average cross-validated score of our Random Forest Classifier.
Conclusions
Thus, in this article, we learned about Grid Search, K-fold Cross-Validation, GridSearchCV, and how to make good use of GridSearchCV. GridSearchCV is a model selection step and should be done after the Data Processing tasks. It is always good to compare the performances of Tuned and Untuned Models. This will cost us time and expense, but it usually gives us better results. The scikit-learn API documentation is a great resource in case of any help. It's always good to learn by doing.
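One possible way to compare a tuned and an untuned model is to score both on the same held-out test set. The sketch below assumes synthetic data from make_classification and an arbitrary small grid; the numbers it prints say nothing about the heart disease results above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Untuned model: library defaults apart from the fixed seed and tree count.
untuned = RandomForestClassifier(n_estimators=20, random_state=0)
untuned.fit(X_train, y_train)

# Tuned model: grid search with 3-fold cross-validation on the train data.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    {"max_depth": [2, 4, None], "max_features": [2, 4]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)  # refits the best combination on all of X_train

untuned_acc = untuned.score(X_test, y_test)
tuned_acc = search.score(X_test, y_test)
print("untuned test accuracy:", round(untuned_acc, 3))
print("tuned test accuracy:  ", round(tuned_acc, 3))
```

The tuned model is not guaranteed to win on a given test set, which is exactly why the comparison is worth running.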
About the Author
Connect with me on LinkedIn.
Thanks for giving your time!
The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.
About the Author
Rahul Shah
IT Engineering Graduate currently pursuing Post Graduate Diploma in Data Science,