
Hyperparameter tuning for machine learning models

Hyperparameter tuning is crucial because hyperparameters control the overall
behavior of a machine learning model. Every machine learning model has its own
set of hyperparameters that can be set.

A hyperparameter is a parameter whose value is set before the learning process
begins.

The Titanic dataset from Kaggle is used for comparison. The purpose of this
article is to explore how the performance and the computational time of a
random forest model change with various hyperparameter tuning methods. After
all, machine learning is all about finding the right balance between computing
time and the model's performance.
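
The snippets below assume the data has already been split into X_train, X_test,
y_train and y_test. The original article does not show this step, so the
following is only a minimal preparation sketch; the file name train.csv and the
choice of features are assumptions.

# Minimal data-preparation sketch (assumed, not from the original article).
import pandas as pd
from sklearn.model_selection import train_test_split

titanic = pd.read_csv('train.csv')  # assumed path to the Kaggle Titanic training file

# Encode 'Sex' numerically and fill missing ages with the median.
titanic['Sex'] = titanic['Sex'].map({'male': 0, 'female': 1})
titanic['Age'] = titanic['Age'].fillna(titanic['Age'].median())

features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
X = titanic[features]
y = titanic['Survived']

# Hold out 20% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)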

Baseline model with default parameters:

from sklearn.ensemble import RandomForestClassifier

# Fit a random forest with default hyperparameters.
random_forest = RandomForestClassifier(random_state=1).fit(X_train, y_train)

random_forest.score(X_test, y_test)
The accuracy of this model, when used on the testing set, is 81.56%.

We can get the default parameters used for the model with the
command random_forest.get_params()

The default parameters are:


{'bootstrap': True, 'ccp_alpha': 0.0, 'class_weight': None, 'criterion':
'gini', 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes':
None, 'max_samples': None, 'min_impurity_decrease': 0.0,
'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2,
'min_weight_fraction_leaf': 0.0, 'n_estimators': 100, 'n_jobs': None,
'oob_score': False, 'random_state': 1, 'verbose': 0, 'warm_start': False}

No need to worry if you don't know these parameters and how they are used;
information about all of them can be found in the scikit-learn documentation
for the model.

Some important Parameters in Random Forest:

1. max_depth: int, default=None: This is used to select how deep you want to
make each tree in the forest. The deeper the tree, the more splits it has, and
it captures more information about the data.
2. criterion: {"gini", "entropy"}, default="gini": Measures the quality of
each split. "gini" uses the Gini impurity, while "entropy" makes the split
based on the information gain.
3. max_features: {"auto", "sqrt", "log2"}, int or float, default="auto": This
represents the number of features considered at each split when looking for
the best split. Allowing more features per split gives each tree node more
options to consider, which can improve the model's performance.
4. min_samples_leaf: int or float, default=1: The minimum number of samples
required to be at a leaf node of each decision tree in the random forest.
5. min_samples_split: int or float, default=2: The minimum number of samples
that must be present at a node for a split to occur.
6. n_estimators: int, default=100: This is perhaps the most important
parameter. It represents the number of trees built in the random forest before
the predictions are aggregated. Usually, the higher the number, the better,
but it is also more computationally expensive. A short sketch after this list
shows how a few of these parameters are set explicitly.
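
As a quick illustration (not part of the original tuning workflow), a few of
these parameters can be passed explicitly when creating the classifier; the
values below are arbitrary examples rather than tuned ones.

# Arbitrary example values, only to show how the parameters are passed.
custom_forest = RandomForestClassifier(
    n_estimators=200,      # build 200 trees instead of the default 100
    max_depth=10,          # limit how deep each tree can grow
    criterion='entropy',   # split on information gain instead of Gini impurity
    min_samples_leaf=5,    # each leaf must contain at least 5 samples
    random_state=1).fit(X_train, y_train)

custom_forest.score(X_test, y_test)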
Grid Search

One traditional and popular way to perform hyperparameter tuning is an
exhaustive grid search from scikit-learn. This method tries every possible
combination of the given sets of hyperparameter values, so we can find the
best set of values in the parameter search space. It usually uses more
computational power and takes a long time to run, since it needs to try every
combination in the grid.

GridSearchCV automates the process of performing hyperparameter tuning in
order to determine the optimal values for a given model. As mentioned above,
the performance of a model significantly depends on the values of its
hyperparameters. There is no way to know the best values in advance, so
ideally we would need to try all possible values to find the optimal ones.
Doing this manually could take a considerable amount of time and resources,
and thus we use GridSearchCV to automate the tuning of hyperparameters.

GridSearchCV comes in scikit-learn's model_selection package, so an important
point to note is that we need to have the scikit-learn library installed. It
loops through the predefined hyperparameter values and fits your estimator
(model) on the training set, so in the end we can select the best parameters
from the listed hyperparameters.

parameters = {'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
              'criterion': ['gini', 'entropy'],
              'max_features': [0.3, 0.5, 0.7, 0.9],
              'min_samples_leaf': [3, 5, 7, 10, 15],
              'min_samples_split': [2, 5, 10],
              'n_estimators': [50, 100, 200, 400, 600]}

Using sklearn's GridSearchCV, we can search over this grid and then run the
grid search. Note that the grid above contains 10 × 2 × 4 × 5 × 3 × 5 = 6,000
combinations, and with 5-fold cross-validation that means 30,000 model fits.

from sklearn.model_selection import GridSearchCV

# Search the parameter grid with 5-fold cross-validation, using all CPU cores.
forest = RandomForestClassifier()
grid_search = GridSearchCV(
    forest,
    parameters,
    cv=5,
    scoring='accuracy',
    n_jobs=-1)

grid_result = grid_search.fit(X_train, y_train)

print('Best Params: ', grid_result.best_params_)
print('Best Score: ', grid_result.best_score_)
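
The best_params_ and best_score_ above come from cross-validation on the
training set. As an additional step not shown in the original article, the
refit best model can also be scored on the held-out test set (GridSearchCV
refits it on the full training data by default):

# Evaluate the refit best model on the held-out test set.
best_model = grid_result.best_estimator_
print('Test accuracy: ', best_model.score(X_test, y_test))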

Output

Our cross-validation score improves from 81.56% to 84.12% with the grid search
model compared with our baseline, roughly a 3% improvement. However, the
computational time is almost 5 hours, which is not feasible for a simple
problem like this one.

Randomized Search

The main difference between RandomizedSearchCV and GridSearchCV is that
instead of trying every possible combination, it samples hyperparameter
combinations randomly from the grid space. For this reason, there is no
guarantee that we will find the best result as with grid search, but this
search can be extremely effective in practice because the computational time
is much lower.

The computational time and the model's performance mainly depend on
the n_iter value, which specifies how many parameter combinations the search
samples. The higher this value, the better the chance of finding a good set of
parameters, but it also requires more computational power.

We can implement RandomizedSearchCV using scikit-learn as well.

%%time
from sklearn.model_selection import RandomizedSearchCV

# Sample 200 random combinations from the same parameter grid.
random_search = RandomizedSearchCV(estimator=RandomForestClassifier(),
                                   param_distributions=parameters,
                                   n_jobs=-1,
                                   n_iter=200)
random_result = random_search.fit(X_train, y_train)

print('Best Score: ', random_result.best_score_*100)
print('Best Params: ', random_result.best_params_)

Output

Our cross-validation score improves from 81.56% to 83.57% with the randomized
search model compared with our baseline. That is roughly a 2.5% improvement,
slightly less than what grid search achieved, but the computational time is
under 5 minutes, which is almost 60 times faster. For most simple problems,
randomized search will be the most feasible option for hyperparameter tuning.
