
Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python

Introduction
If you have been using GBM as a black box till now, maybe it's time for you to open it and see how it actually works!
This article is inspired by Owen Zhang's (Chief Product Officer at DataRobot and Kaggle Rank 3) approach shared at NYC Data Science Academy. He delivered a ~2-hour talk and I intend to condense it and present the most precious nuggets here.
Boosting algorithms play a crucial role in dealing with the bias-variance tradeoff. Unlike bagging algorithms, which only control for high variance in a model, boosting controls both aspects (bias & variance) and is considered to be more effective. A sincere understanding of GBM here should give you the much-needed confidence to deal with such critical issues.
In this article, I'll disclose the science behind using GBM with Python and, most importantly, how you can tune its parameters and obtain incredible results.
Special Thanks: Personally, I would like to acknowledge the timeless support provided by Mr. Sudalai Rajkumar, currently AV Rank 2. This article wouldn't be possible without his guidance. I am sure the whole community will benefit from the same.


Table of Contents
1. How Boosting Works?
2. Understanding GBM Parameters
3. Tuning Parameters (with Example)

1. How Boosting Works?
Boosting is a sequential technique which works on the principle of ensemble. It combines a set of weak learners and delivers improved prediction accuracy. At any instant t, the model outcomes are weighed based on the outcomes of the previous instant t-1. The outcomes predicted correctly are given a lower weight and the ones misclassified are weighted higher. This technique is followed for a classification problem, while a similar technique is used for regression.
Let's understand it visually:

Observations:
1. Box 1: Output of the First Weak Learner (from the left)
Initially all points have the same weight (denoted by their size).
The decision boundary predicts 2 +ve and 5 -ve points correctly.
2. Box 2: Output of the Second Weak Learner
The points classified correctly in Box 1 are given a lower weight and vice versa.
The model now focuses on the high-weight points and classifies them correctly. But others are misclassified now.


A similar trend can be seen in Box 3 as well. This continues for many iterations. In the end, all models are given a weight depending on their accuracy and a consolidated result is generated.
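To make this loop concrete, here is a minimal sketch of the reweighting idea on a toy dataset, using decision stumps as the weak learners; the dataset and the x2 / x0.5 weight multipliers are illustrative assumptions, not the exact updates used by AdaBoost or GBM:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((100, 2))                      # toy data: two features
y = (X[:, 0] + X[:, 1] > 1).astype(int)       # toy binary target

weights = np.full(len(y), 1 / len(y))         # Box 1: all points start with equal weight
learners = []
for t in range(3):                            # three weak learners, as in the boxes above
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)    # the learner focuses on high-weight points
    missed = stump.predict(X) != y
    weights[missed] *= 2.0                    # misclassified points get a higher weight
    weights[~missed] *= 0.5                   # correctly classified points get a lower weight
    weights /= weights.sum()                  # renormalize into a distribution
    learners.append(stump)

A consolidated result would then combine the three stumps, each weighted by its accuracy.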
Did I whet your appetite? Good. Refer to these articles (focus on GBM right now):
Learn Gradient Boosting Algorithm for better predictions (with codes in R)
Quick Introduction to Boosting Algorithms in Machine Learning
Getting smart with Machine Learning - AdaBoost and Gradient Boost

2. GBM Parameters
The overall parameters can be divided into 3 categories:
1. Tree-Specific Parameters: These affect each individual tree in the model.
2. Boosting Parameters: These affect the boosting operation in the model.
3. Miscellaneous Parameters: Other parameters for overall functioning.

I'll start with tree-specific parameters. First, let's look at the general structure of a decision tree:

The parameters used for defining a tree are further explained below. Note that I'm using scikit-learn (Python) specific terminology here, which might be different in other software packages like R, but the idea remains the same.
1. min_samples_split
Defines the minimum number of samples (or observations) required in a node for it to be considered for splitting.
Used to control overfitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
Too high values can lead to underfitting; hence, it should be tuned using CV.
2. min_samples_leaf
Defines the minimum samples (or observations) required in a terminal node or leaf.
Used to control overfitting, similar to min_samples_split.
Generally, lower values should be chosen for imbalanced class problems, because the regions in which the minority class will be in the majority will be very small.
3. min_weight_fraction_leaf
Similar to min_samples_leaf but defined as a fraction of the total number of observations instead of an integer.
Only one of #2 and #3 should be defined.
4. max_depth
The maximum depth of a tree.
Used to control overfitting, as a higher depth will allow the model to learn relations very specific to a particular sample.
Should be tuned using CV.
5. max_leaf_nodes
The maximum number of terminal nodes or leaves in a tree.
Can be defined in place of max_depth. Since binary trees are created, a depth of n would produce a maximum of 2^n leaves.
If this is defined, GBM will ignore max_depth.
6. max_features
The number of features to consider while searching for the best split. These will be randomly selected.
As a thumb rule, the square root of the total number of features works great, but we should check up to 30-40% of the total number of features.
Higher values can lead to overfitting, but it depends on the case.
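To see where these knobs sit in code, here is a minimal sketch with scikit-learn's GradientBoostingClassifier; the dataset and every value below are illustrative starting points, not tuned results:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=10)

gbm = GradientBoostingClassifier(
    max_depth=8,              # controls how specific the learned relations can get
    min_samples_split=50,     # higher values guard against overfitting
    min_samples_leaf=20,      # lower values suit imbalanced class problems
    max_features='sqrt',      # thumb rule: square root of the total features
    random_state=10,
)
gbm.fit(X, y)

Each of these should still be tuned with CV, as noted above.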

Before moving on to other parameters, let's see the overall pseudo-code of the GBM algorithm for 2 classes:

1. Initialize the outcome


2. Iterate from 1 to the total number of trees
  2.1 Update the weights for targets based on the previous run (higher for the ones misclassified)
  2.2 Fit the model on a selected subsample of data
  2.3 Make predictions on the full set of observations
  2.4 Update the output with current results, taking into account the learning rate
3. Return the final output.
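Here is a minimal from-scratch sketch of this pseudo-code. One hedge: in gradient boosting, step 2.1's weight update takes the form of fitting each new tree to the residuals of the current prediction; this sketch uses squared-error residuals for simplicity, whereas scikit-learn's GBM minimizes deviance for classification:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def simple_gbm(X, y, n_trees=100, learning_rate=0.1, subsample=0.8, seed=0):
    rng = np.random.default_rng(seed)
    pred = np.full(len(y), float(np.mean(y)))      # 1. initialize the outcome
    trees = []
    for _ in range(n_trees):                       # 2. iterate over the trees
        residual = y - pred                        # 2.1 targets based on the previous run
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X[idx], residual[idx])            # 2.2 fit on a subsample of the data
        update = tree.predict(X)                   # 2.3 predict on all observations
        pred += learning_rate * update             # 2.4 update, scaled by the learning rate
        trees.append(tree)
    return trees, pred                             # 3. return the final output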

This is an extremely simplified (probably naive) explanation of how GBM works. The parameters which we have considered so far will affect step 2.2, i.e. model building. Let's consider another set of parameters for managing boosting:
1. learning_rate
This determines the impact of each tree on the final outcome (step 2.4). GBM works by starting with an initial estimate which is updated using the output of each tree. The learning parameter controls the magnitude of this change in the estimates.
Lower values are generally preferred as they make the model robust to the specific characteristics of each tree, thus allowing it to generalize well.
Lower values also require a higher number of trees to model all the relations, which is computationally expensive.
2. n_estimators
The number of sequential trees to be modeled (step 2).
Though GBM is fairly robust at a higher number of trees, it can still overfit at a point. Hence, this should be tuned using CV for a particular learning rate.
3. subsample
The fraction of observations to be selected for each tree. Selection is done by random sampling.
Values slightly less than 1 make the model robust by reducing the variance.
Typical values of ~0.8 generally work fine but can be fine-tuned further.
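Putting this into practice, here is a minimal sketch of tuning n_estimators with CV for a fixed learning rate, in line with the advice above; the grid values and the roc_auc scoring choice are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, random_state=10)

# Fix a moderately low learning rate, then let CV pick the number of trees.
search = GridSearchCV(
    GradientBoostingClassifier(learning_rate=0.1, subsample=0.8, random_state=10),
    param_grid={'n_estimators': range(20, 201, 20)},
    scoring='roc_auc', cv=5)
search.fit(X, y)
print(search.best_params_)   # the tree count where the CV score peaks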

Apart from these, there are certain miscellaneous parameters which affect overall functionality:
1. loss
It refers to the loss function to be minimized in each split.
It can have various values for the classification and regression cases. Generally, the default values work fine. Other values should be chosen only if you understand their impact on the model.
2. init
This affects initialization of the output.
This can be used if we have made another model whose outcome is to be used as the initial estimates for GBM.
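As an illustration, here is a minimal sketch of seeding GBM with another model's outcome via init; treat it as an assumption-laden example, since recent scikit-learn versions expect the init estimator to expose predict_proba for classification and the exact requirements have varied across versions:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=10)

# GBM starts from the logistic regression outcome instead of a constant.
gbm = GradientBoostingClassifier(init=LogisticRegression(), random_state=10)
gbm.fit(X, y)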
