
DataGiri

Presented 17th November


By: Himanshu Shrivastava

1
Objective of Machine Learning?

• Solving real-life problems using data

• Finding a function from data that can make accurate predictions on future, unseen data

• This function/model should be an implementable and interpretable solution

 https://www.wired.com/2012/04/netflix-prize-costs/

 https://www.risk.net/asset-management/6119616/blackrock-shelves-unexplainable-ai-liquidity-models

2
Good Visualizations!!!
Tree-based models: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
Bias-variance tradeoff: http://www.r2d3.us/visual-intro-to-machine-learning-part-2/
Bias-Variance Trade-off: Model Complexity vs. Error

3
Additive Modeling
 Foundation of bagging and boosting algorithms

 Add a bunch of simple terms together to create a more complicated expression

 A useful technique because we can often conjure up the simple terms more easily than cracking the overall function in one go

4
Illustration:

Mathematically,
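As a sketch of the general additive form (the specific illustration on the original slide may differ):

F(x) = f1(x) + f2(x) + ... + fM(x)

where each fm is a simple term, such as a shallow regression tree, and their sum approximates the overall function.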

5
Random Forest Classifier – Uses Bagging

[Diagram: Training Data → Building Decision Trees → Bagging]

 For each test example, every decision tree votes.

 The example is labeled as the class having the majority of the votes.
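A minimal sketch of the bagging-and-voting idea (bootstrap samples, one tree per sample, majority vote); it omits the per-split feature subsampling that a full random forest also performs, and the helper names are illustrative:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_forest_fit(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), len(y))            # bootstrap sample (bagging)
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagged_forest_predict(trees, X):
    # every decision tree votes; each example gets the majority class
    votes = np.stack([t.predict(X) for t in trees])      # shape: (n_trees, n_samples)
    votes = votes.astype(int)                            # assumes non-negative integer class labels
    return np.array([np.bincount(votes[:, j]).argmax() for j in range(votes.shape[1])])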

6
AdaBoost

 Fit an additive model (ensemble) in a forward stage-wise manner

 In each stage, introduce a weak learner to compensate for the shortcomings of the existing weak learners

 “Shortcomings” are identified by high-weight data points
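As a one-line sketch of how those weights arise (standard AdaBoost, not detailed on the slide): after stage m, each training point's weight is multiplied by exp(alpha_m) if the stage-m learner misclassified it and then renormalized, where alpha_m = 0.5 * ln((1 − err_m) / err_m); misclassified points therefore carry high weight into the next stage.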

7
8
Building Intuition – Gradient Boosting Machines
• You are given (x1, y1), (x2, y2), ..., (xn, yn), and the task is to fit a model F(x) to minimize squared loss.
• Suppose your friend wants to help you and gives you a model F. You check the model and find that it is good but not perfect.
• There are some mistakes: F(x1) = 0.8 while y1 = 0.9, and F(x2) = 1.4 while y2 = 1.3 ...

• How can you improve this model?

Rules of the game:

• You are not allowed to remove anything from F or change any parameter in F
• You can add an additional model (regression tree) h to F, so the new prediction will be F(x) + h(x).
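One step of reasoning makes the answer concrete: if we want F(xi) + h(xi) = yi for every example, then h should fit the residuals yi − F(xi). Using the numbers above:

h(x1) = y1 − F(x1) = 0.9 − 0.8 = 0.1
h(x2) = y2 − F(x2) = 1.3 − 1.4 = −0.1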

9
10
 yi − F(xi) are called residuals

 These are the parts that the existing model F cannot do well. The role of h is to compensate for the shortcomings of the existing model F

 If the new model F + h is still not satisfactory, we can add another regression tree ...

 We are improving the predictions on the training data; is the procedure also useful for test data?

 Yes! Because we are building a model, and the model can be applied to test data as well.
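A minimal sketch of this single correction step, assuming scikit-learn's DecisionTreeRegressor as the weak learner (the data and parameter values here are illustrative, not from the slides):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy data and an imperfect initial model F (here: just the mean of y)
X = np.random.rand(100, 1)
y = np.sin(3 * X[:, 0]) + 0.1 * np.random.randn(100)
F = np.full_like(y, y.mean())

# residuals are the parts F currently gets wrong
residuals = y - F

# fit a small regression tree h to the residuals ...
h = DecisionTreeRegressor(max_depth=2).fit(X, residuals)

# ... and the improved model is F(x) + h(x)
F_new = F + h.predict(X)
print("MSE before:", np.mean((y - F) ** 2), "after:", np.mean((y - F_new) ** 2))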

11
Gradient Boosting
• Gradient Boosting = Gradient Descent + Boosting

• Fit an additive model (ensemble) in a forward stage-wise manner

• In each stage, introduce a weak learner to compensate for the shortcomings of the existing weak learners

• In Gradient Boosting, “shortcomings” are identified by residuals/gradients

• Recall that, in AdaBoost, “shortcomings” are identified by high-weight data points

• Both high-weight data points and gradients tell us how to improve our model.
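The link to gradient descent, written out as a one-line derivation (a standard fact, not spelled out on the slide): for squared loss L(yi, F(xi)) = (yi − F(xi))^2 / 2, the negative gradient with respect to the current prediction is

−∂L/∂F(xi) = yi − F(xi)

i.e. exactly the residual. Fitting the next tree to the residuals is therefore a gradient-descent step in function space; other loss functions simply change what plays the role of the “residual”.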

12
GBM / Residual Modeling?

13
GBM (Gradient Boosting Machine)
A sequence of simple trees is built in succession, each fitted to the prediction residuals of the preceding tree, so as to improve on the prediction.

Algorithm:
• The first tree is fitted to the data; mean(y) can be used as the initial prediction
• Residuals are computed: Rmi = yi − Fm−1(xi)
• A regression tree is fitted to the residuals from the preceding tree
• The process is repeated through a chain of successive trees: Fm = Fm−1 + last regression tree fitted to the residuals
• The final predicted value is the sum of the weighted contributions of each tree

Notation:
• Rmi is the residual of the ith observation at the mth tree
• i indexes the observations
• Fm−1 is the weighted sum of all previous m−1 regression trees

Features:
• They often have a degree of accuracy that cannot be obtained using a large, single-tree model.
• Can handle hundreds or thousands of potential predictor variables.
• Irrelevant predictor variables are identified automatically and do not affect the predictive model.
• They are invariant under all (strictly) monotone transformations of the predictor variables, so transformations such as a*x + b, log(x) or exp(x) do not affect the model.
• Can handle both continuous and categorical predictor and target variables.
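A compact from-scratch sketch of this chain of trees, with a shrinkage (learning-rate) weight on each tree's contribution; the weak learner, data, and parameter values are illustrative assumptions, not taken from the slides:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=2):
    f0 = y.mean()                      # initial prediction: the mean of y, as on the slide
    F = np.full(len(y), f0)
    trees = []
    for m in range(n_trees):
        residuals = y - F              # Rmi = yi − Fm−1(xi)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F = F + learning_rate * tree.predict(X)   # Fm = Fm−1 + weighted new tree
        trees.append(tree)
    return f0, trees

def gbm_predict(X, f0, trees, learning_rate=0.1):
    # final prediction = initial value + weighted contribution of each tree
    return f0 + learning_rate * np.sum([t.predict(X) for t in trees], axis=0)

# usage on toy data
X = np.random.rand(200, 1)
y = np.sin(4 * X[:, 0]) + 0.1 * np.random.randn(200)
f0, trees = gbm_fit(X, y)
print("train MSE:", np.mean((y - gbm_predict(X, f0, trees)) ** 2))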

14
Illustration of Iterative Boosting
In the case of Gaussian regression, gradient boosting is equivalent to iteratively re-fitting the residuals of the model.

mstop (the number of boosting iterations) is chosen by cross-validation so that it maximizes accuracy.

Simulated example: y = (0.5 − 0.9·e^(−50·x²))·x + 0.02·ε
[Figure: data, fitted function, and residuals over successive boosting iterations]
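A hedged sketch of choosing mstop by early stopping, using the simulated function above; it uses a single hold-out split rather than full cross-validation, and scikit-learn's staged predictions (library and parameter values are assumptions, not from the slide):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# simulate y = (0.5 − 0.9·exp(−50·x²))·x + 0.02·ε
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = (0.5 - 0.9 * np.exp(-50 * x**2)) * x + 0.02 * rng.standard_normal(500)
X = x.reshape(-1, 1)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

gbm = GradientBoostingRegressor(n_estimators=500, learning_rate=0.1, max_depth=2)
gbm.fit(X_tr, y_tr)

# validation error after each boosting iteration; its argmin is mstop
val_mse = [np.mean((y_val - pred) ** 2) for pred in gbm.staged_predict(X_val)]
m_stop = int(np.argmin(val_mse)) + 1
print("chosen mstop:", m_stop)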

15
Important parameters of GBM
• N-trees: the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion.

• Distribution: available options are "gaussian" (squared error), "laplace" (absolute loss), "bernoulli" (logistic regression for 0-1 outcomes), "adaboost" (the AdaBoost exponential loss for 0-1 outcomes), "poisson" (count outcomes), and "coxph" (right-censored observations).

• Shrinkage: retards the learning rate of the series, so that the series is longer and accuracy is better.

• Interaction depth: the depth to which interactions must be considered is specified using this parameter.

• Cross validation: segments of the data are used for model building and validation based on this parameter. It also helps in finding the best number of iterations (max iter).

• Bag fraction: at each iteration, only a random fraction (bag) of the residuals is selected and the tree is built on this subset.
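The parameter names above correspond to the R gbm package; as a hedged illustration, roughly equivalent knobs in scikit-learn's GradientBoostingRegressor look like this (the exact mapping is an approximation, and names differ between libraries):

from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    loss="squared_error",   # ~ distribution = "gaussian"
    n_estimators=500,       # ~ n.trees: number of boosting iterations
    learning_rate=0.05,     # ~ shrinkage: smaller values need more trees
    max_depth=3,            # ~ interaction.depth: depth of each tree
    subsample=0.5,          # ~ bag.fraction: random fraction of data per tree
)
# n_estimators is then typically tuned with cross-validation,
# e.g. sklearn.model_selection.GridSearchCV over a grid of values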

16
Thank You!

17
