Gradient Boosting - Hyperparameter Tuning Python
Complete Machine Learning Guide to Parameter Tuning in Gradient Boosting (GBM) in Python
Aarshay Jain
15 Jun, 2022 • 14 min read
Overview
Learn parameter tuning in the gradient boosting algorithm using Python
Understand how to adjust the bias-variance trade-off in machine learning for gradient boosting
Introduction
If you have been using GBM as a 'black box' till now, maybe it's time for you to open it and see how it actually works!

This article is inspired by Owen Zhang's (Chief Product Officer at DataRobot and Kaggle Rank 3) approach shared at NYC Data Science Academy. He delivered a ~2-hour talk, and I intend to condense it and present the most precious nuggets here.
Boosting algorithms play a crucial role in dealing with the bias-variance trade-off. Unlike bagging algorithms, which only control for high variance in a model, boosting controls both aspects (bias and variance) and is considered more effective. A sound understanding of GBM should give you the much-needed confidence to deal with such critical issues.
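To make that contrast concrete, here is a small illustrative sketch (not from the original article) comparing a bagged ensemble of shallow trees with a boosted one on synthetic data. The dataset, tree depth, and seed are arbitrary choices for demonstration:

```python
#Illustrative comparison: bagging vs. boosting over the same shallow trees.
#Bagging mainly averages away variance; boosting also chips away at bias
#by fitting each new tree to the current ensemble's mistakes.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=10)

#Bagging over depth-1 stumps: each stump is high-bias, and averaging
#many of them cannot remove that bias.
bag = BaggingClassifier(DecisionTreeClassifier(max_depth=1),
                        n_estimators=100, random_state=10)
#Boosting over the same depth-1 stumps: sequential fitting reduces bias too.
gbm = GradientBoostingClassifier(max_depth=1, n_estimators=100, random_state=10)

bag_auc = cross_val_score(bag, X, y, cv=5, scoring='roc_auc').mean()
gbm_auc = cross_val_score(gbm, X, y, cv=5, scoring='roc_auc').mean()
print("Bagging CV AUC:  %.3f" % bag_auc)
print("Boosting CV AUC: %.3f" % gbm_auc)
```

On data like this, the boosted stumps typically score noticeably higher, because boosting can correct the systematic errors that a single stump (and hence an average of stumps) is stuck with.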
In this article, I'll disclose the science behind using GBM with Python and, most importantly, how you can tune its parameters.
Source: https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/ (accessed 20/05/2024)
Table of Contents
1. How Boosting Works
2. Understanding GBM Parameters
3. Tuning Parameters (with Example)
Observations:
1. Box 1: Output of the First Weak Learner (from the left)
Initially, all points have the same weight (denoted by their size).
The decision boundary predicts 2 +ve and 5 -ve points correctly.
2. Box 2: Output of the Second Weak Learner
The points classified correctly in Box 1 are given a lower weight, and vice versa.
The model now focuses on the high-weight points and classifies them correctly, but the others are misclassified.

A similar trend can be seen in Box 3 as well. This continues for many iterations. In the end, each model is given a weight depending on its accuracy, and a consolidated result is generated.
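The reweighting loop described above can be sketched in a few lines. This is an illustrative AdaBoost-style implementation on synthetic data, not code from the article; the round count, dataset, and seed are arbitrary:

```python
#AdaBoost-style sketch: each round fits a decision stump on weighted data,
#then raises the weights of misclassified points so the next stump
#focuses on them. At the end, stumps vote with weights (alpha) that
#depend on their accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=10)
y_signed = np.where(y == 1, 1, -1)           #labels in {-1, +1}

n_rounds = 10
w = np.full(len(y), 1.0 / len(y))            #initially all points weigh the same
ensemble = []                                #list of (alpha, stump) pairs

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y_signed, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w[pred != y_signed]) / np.sum(w)       #weighted error rate
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))   #this stump's say in the vote
    w *= np.exp(-alpha * y_signed * pred)    #up-weight mistakes, down-weight hits
    w /= w.sum()                             #renormalize to a distribution
    ensemble.append((alpha, stump))

#Consolidated result: sign of the alpha-weighted vote of all stumps.
votes = sum(alpha * s.predict(X) for alpha, s in ensemble)
accuracy = np.mean(np.sign(votes) == y_signed)
print("Training accuracy after %d rounds: %.3f" % (n_rounds, accuracy))
```

Note how a single stump is a weak learner, yet the weighted committee typically classifies most points correctly, which is exactly the behaviour the box-by-box walkthrough illustrates.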
Did I whet your appetite? Good. Refer to these articles
(focus on GBM right now):
Learn Gradient Boosting Algorithm for better
predictions (with codes in R)
Quick Introduction to Boosting Algorithms in Machine
Learning
Getting smart with Machine Learning – AdaBoost and
Gradient Boost
#Import libraries:
import pandas as pd
import numpy as np

from sklearn.ensemble import GradientBoostingClassifier  #GBM algorithm
from sklearn.model_selection import GridSearchCV  #Performing grid search
from sklearn.model_selection import train_test_split

import matplotlib.pylab as plt
#%matplotlib inline (uncomment when running in a notebook)
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12, 4

train = pd.read_csv('train_modified.csv', encoding='ISO-8859-1')
target = 'Disbursed'
IDcol = 'ID'
from sklearn.model_selection import cross_val_score  #replaces the old cross_validation module

def modelfit(alg, dtrain, predictors, performCV=True, cv_folds=5):
    #Fit the algorithm on the data
    alg.fit(dtrain[predictors], dtrain['Disbursed'])

    #Predict training set:
    dtrain_predictions = alg.predict(dtrain[predictors])
    dtrain_predprob = alg.predict_proba(dtrain[predictors])[:, 1]

    #Perform cross-validation:
    if performCV:
        cv_score = cross_val_score(alg, dtrain[predictors], dtrain['Disbursed'],
                                   cv=cv_folds, scoring='roc_auc')
        print("CV Score : Mean - %.7g | Std - %.7g | Min - %.7g | Max - %.7g"
              % (np.mean(cv_score), np.std(cv_score), np.min(cv_score), np.max(cv_score)))
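Since the article's train_modified.csv is not included here, the fit / predict / cross-validate pattern above can be tried on synthetic data. The column names below are stand-ins of my own, not the article's real features:

```python
#Self-contained sketch of the fit / predict / cross-validate pattern
#(synthetic data stands in for the article's train_modified.csv).
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=10)
dtrain = pd.DataFrame(X, columns=['f%d' % i for i in range(10)])
dtrain['Disbursed'] = y
predictors = [c for c in dtrain.columns if c != 'Disbursed']

alg = GradientBoostingClassifier(random_state=10)
alg.fit(dtrain[predictors], dtrain['Disbursed'])

#Training-set predictions and class-1 probabilities:
dtrain_predictions = alg.predict(dtrain[predictors])
dtrain_predprob = alg.predict_proba(dtrain[predictors])[:, 1]

#5-fold cross-validated AUC, reported like the article's modelfit helper:
cv_score = cross_val_score(alg, dtrain[predictors], dtrain['Disbursed'],
                           cv=5, scoring='roc_auc')
print("CV Score : Mean - %.7g | Std - %.7g | Min - %.7g | Max - %.7g"
      % (np.mean(cv_score), np.std(cv_score), np.min(cv_score), np.max(cv_score)))
```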
General Approach for Parameter Tuning
2. min_samples_leaf = 50 : Can be selected based on intuition. This is just used for preventing overfitting, and again a small value because of imbalanced classes.
3. max_depth = 8 : Should be chosen (5-8) based on the number of observations and predictors. This dataset has 87K rows and 49 columns, so let's take 8 here.
4. max_features = 'sqrt' : It's a general rule of thumb to start with the square root.
5. subsample = 0.8 : This is a commonly used starting value.
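As a sketch, the initial estimates above can be collected straight into a classifier. Only the values listed above are set; everything else stays at the library defaults, and random_state is an assumption of mine for reproducibility, not a value from the text:

```python
#Baseline GBM built from the initial estimates discussed above;
#these are just starting points and will be tuned later.
from sklearn.ensemble import GradientBoostingClassifier

baseline_gbm = GradientBoostingClassifier(
    learning_rate=0.1,      #default starting point, tuned with n_estimators later
    min_samples_leaf=50,    #intuition-based guard against overfitting
    max_depth=8,            #5-8 range; 8 chosen for ~87K rows and 49 columns
    max_features='sqrt',    #rule of thumb: start with the square root
    subsample=0.8,          #commonly used starting value
    random_state=10)        #assumed seed, not from the article's text

print(baseline_gbm.get_params()['max_depth'])
```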
Please note that all of the above are just initial estimates and will be tuned later. Let's take the default learning rate of 0.1 here and check the optimum number of trees for it. For this purpose, we can do a grid search and test values from 20 to 80 in steps of 10.
#Choose all predictors except target & IDcols
predictors = [x for x in train.columns if x not in [target, IDcol]]
param_test1 = {'n_estimators': range(20, 81, 10)}
gsearch1 = GridSearchCV(
    estimator=GradientBoostingClassifier(learning_rate=0.1, min_samples_leaf=50,
        max_depth=8, max_features='sqrt', subsample=0.8, random_state=10),
    param_grid=param_test1, scoring='roc_auc', n_jobs=4, cv=5)
    #the old iid argument has been removed from recent scikit-learn versions
gsearch1.fit(train[predictors], train[target])
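Because train_modified.csv is not bundled with this excerpt, here is a self-contained version of the same n_estimators search on synthetic data, reading off the best tree count afterwards. The data and seed are placeholders:

```python
#Runnable sketch of the n_estimators grid search on synthetic data
#(stands in for the article's train_modified.csv).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=10)

param_test1 = {'n_estimators': range(20, 81, 10)}
gsearch1 = GridSearchCV(
    estimator=GradientBoostingClassifier(learning_rate=0.1, min_samples_leaf=50,
        max_depth=8, max_features='sqrt', subsample=0.8, random_state=10),
    param_grid=param_test1, scoring='roc_auc', n_jobs=-1, cv=5)
gsearch1.fit(X, y)

#The tuned tree count and its cross-validated AUC:
print(gsearch1.best_params_, "%.4f" % gsearch1.best_score_)
```

After fitting, `best_params_` holds the winning n_estimators and `cv_results_` holds the per-candidate scores, which is how the article reads off the optimum before moving on to tune the tree-specific parameters.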
End Notes
This article walked through developing a GBM ensemble learning model end-to-end. We started with an introduction to boosting, followed by a detailed discussion of the various parameters involved. The parameters were divided into 3 categories, namely tree-specific, boosting, and miscellaneous parameters, depending on their impact on the model.

Finally, we discussed the general approach towards tackling a problem with GBM and also worked out the AV Data Hackathon 3.x problem through that approach.
I hope you found this useful and now feel more confident applying GBM to solve a data science problem.
Aarshay Jain
Aarshay graduated with an MS in Data Science from Columbia University in 2017 and is currently an ML Engineer at Spotify New York. He works at the intersection of applied research and engineering while designing ML solutions to move product metrics in the required direction. He specializes in designing ML system architecture, developing offline models, and deploying them in production for both batch and real-time prediction use cases.
anurag
22 Feb, 2016
Great Article!! Can you do this for SVM, XGBoost, deep learning and neural networks?
Jignesh Vyas
23 Feb, 2016
Wow great article, pretty much detailed and easy to
understand. Am a great fan of articles posted on this
site. Keep up the good work !
Pallavi
23 Feb, 2016
Absolutely fantastic article. Loved the step-by-step approach. Would love to read more of these on SVM, deep learning. Also it would be fantastic to have R