Datagiri: Presented 17 November By: Himanshu Shrivastava
Objective of Machine Learning?
• Find a function from data that can make accurate predictions on future, unseen data
https://www.wired.com/2012/04/netflix-prize-costs/
https://www.risk.net/asset-management/6119616/blackrock-shelves-unexplainable-ai-liquidity-models
Good Visualizations!!!
Tree-based models: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
Bias-variance tradeoff: http://www.r2d3.us/visual-intro-to-machine-learning-part-2/
Bias-Variance Trade-off (Model Complexity vs. Error)
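A minimal sketch (not part of the original deck) that makes the trade-off concrete with decision trees of growing depth: shallow trees underfit (high bias), very deep trees overfit (high variance), and validation error is lowest somewhere in between.

# Illustrate the bias-variance trade-off: as tree depth (model complexity)
# grows, training error keeps falling while validation error eventually rises.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)   # noisy target

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for depth in (1, 3, 6, 12):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    err_tr = mean_squared_error(y_tr, tree.predict(X_tr))
    err_val = mean_squared_error(y_val, tree.predict(X_val))
    print(f"depth={depth:2d}  train MSE={err_tr:.3f}  val MSE={err_val:.3f}")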
Additive Modeling
Foundation of bagging and boosting algorithms
Illustration:
Mathematically,
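The equation itself is not reproduced in this text; the standard form of an additive model, which the slide presumably showed, is:

% Additive model: the final predictor is a weighted sum of M simple base learners.
F(x) \;=\; \sum_{m=1}^{M} \beta_m \, f_m(x)

where each f_m is a simple (weak) model, such as a shallow tree, and \beta_m is its weight; bagging and boosting differ in how the f_m and \beta_m are obtained.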
Random Forest Classifier – Uses Bagging
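A minimal sketch (not from the slides), assuming scikit-learn: a random forest bags many decision trees, each grown on a bootstrap sample with a random subset of features considered at each split, and combines their votes.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,     # number of bagged trees
    max_features="sqrt",  # random feature subset considered at each split
    bootstrap=True,       # each tree is trained on a bootstrap sample
    random_state=0,
).fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))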
AdaBoost
In each stage, introduce a weak learner to compensate for the shortcomings of the existing weak learners (a brief usage sketch follows).
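A minimal sketch (not from the slides) of AdaBoost in scikit-learn: by default the weak learner is a depth-1 decision tree (a "stump"), and after each stage misclassified points receive higher weight so the next stump focuses on them.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Each of the 100 stages adds one weak learner (a decision stump by default)
# trained on re-weighted data that emphasises previously misclassified points.
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=1)
ada.fit(X_tr, y_tr)
print("test accuracy:", ada.score(X_te, y_te))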
Building Intuition – Gradient Boosting Machines
• You are given (x1, y1), (x2, y2), ..., (xn, yn), and the task is to fit a model F(x) that minimizes the squared loss.
• Suppose a friend wants to help and gives you a model F. You check the model and find it is good but not perfect.
• There are some mistakes: F(x1) = 0.8 while y1 = 0.9, and F(x2) = 1.4 while y2 = 1.3, ...
• You are not allowed to remove anything from F or change any parameter in F.
• You can, however, add an additional model (a regression tree) h to F, so the new prediction becomes F(x) + h(x) (see the sketch after this list).
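A minimal sketch (not from the slides) of exactly one such correction step: fit a first model F, compute the residuals y - F(x), fit a second tree h to those residuals, and use F(x) + h(x) as the improved prediction.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)

F = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)    # the friend's model
residuals = y - F.predict(X)                                        # what F gets wrong
h = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)

improved = F.predict(X) + h.predict(X)                              # F(x) + h(x)
print("MSE of F alone :", mean_squared_error(y, F.predict(X)))
print("MSE of F + h   :", mean_squared_error(y, improved))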
yi − F(xi) are called residuals.
These are the parts that the existing model F cannot do well. The role of h is to compensate for the shortcomings of the existing model F.
Is this procedure useful for test data as well? Yes! Because we are building a model, and the model can be applied to test data as well.
Gradient Boosting
• Gradient Boosting = Gradient Descent + Boosting
• In each stage, introduce a weak learner to compensate for the shortcomings of the existing weak learners.
• In AdaBoost the "shortcomings" are identified by high-weight data points; in gradient boosting they are identified by gradients. Both high-weight data points and gradients tell us how to improve our model (see the worked loss below).
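For the squared loss this connection can be made explicit (a standard derivation, added here for completeness): the negative gradient of the loss with respect to the current prediction is exactly the residual, so fitting h to the residuals is one step of gradient descent in function space.

% Squared loss and its negative gradient with respect to the prediction F(x_i)
L(y_i, F(x_i)) = \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2
\quad\Longrightarrow\quad
-\,\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} = y_i - F(x_i) = r_i

% Each boosting stage adds a weak learner h_m fitted to the residuals r_i,
% scaled by a learning rate \nu
F_m(x) = F_{m-1}(x) + \nu\, h_m(x)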
GBM / Residual Modeling?
GBM (Gradient Boosting Machine)
A sequence of simple trees built in succession, each fitted to the prediction residuals of the preceding tree, so as to improve on the prediction.
Algorithm (a code sketch follows the feature list below):
• The first tree is fitted to the data; the mean of y can be used as the initial prediction
• Residuals are computed: r_{mi} = y_i − F_{m−1}(x_i)
• A regression tree is fitted to the residuals from the preceding tree
• The process is repeated through a chain of successive trees
• The final predicted value is the sum of the weighted contributions of each tree
Features:
• They often achieve a degree of accuracy that cannot be obtained using a large, single-tree model.
• They can handle hundreds or thousands of potential predictor variables.
• Irrelevant predictor variables are identified automatically and do not affect the predictive model.
• They are invariant under all (strictly) monotone transformations of the predictor variables, so transformations such as a*x + b (a ≠ 0), log(x) or exp(x) do not affect the model.
• They can handle both continuous and categorical predictor and target variables.
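A minimal sketch (not from the slides) of the loop described above, assuming scikit-learn's DecisionTreeRegressor as the base learner: start from mean(y), repeatedly fit a small tree to the current residuals, and add its shrunken prediction to the running model.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=400)

n_trees, shrinkage = 100, 0.1          # number of boosting stages and learning rate
F = np.full_like(y, y.mean())          # initial prediction: mean(y)
trees = []

for m in range(n_trees):
    residuals = y - F                                  # r_mi = y_i - F_{m-1}(x_i)
    tree = DecisionTreeRegressor(max_depth=2, random_state=m).fit(X, residuals)
    F += shrinkage * tree.predict(X)                   # add weighted contribution
    trees.append(tree)

def predict(X_new):
    """Final prediction = mean(y) plus the weighted contribution of each tree."""
    pred = np.full(X_new.shape[0], y.mean())
    for tree in trees:
        pred += shrinkage * tree.predict(X_new)
    return pred

print("training MSE:", np.mean((y - predict(X)) ** 2))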
Illustration of iterative boosting
In the case of Gaussian regression (squared-error loss), gradient boosting is equivalent to iteratively re-fitting the residuals of the model.
m_stop, the number of boosting iterations, is chosen using cross-validation such that it maximizes out-of-sample accuracy (see the sketch below).
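A minimal sketch (not from the slides) of choosing m_stop by cross-validation, assuming scikit-learn's GradientBoostingRegressor: grid-search over n_estimators and keep the value with the best cross-validated score.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=500)

# 5-fold cross-validation over the number of boosting iterations (m_stop).
search = GridSearchCV(
    GradientBoostingRegressor(learning_rate=0.1, max_depth=2, random_state=0),
    param_grid={"n_estimators": [25, 50, 100, 200, 400]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print("m_stop chosen by CV:", search.best_params_["n_estimators"])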
Important parameters of GBM
• N-trees: the total number of trees to fit. This is equivalent to the number of iterations and to the number of basis functions in the additive expansion.
• Distribution: the loss to optimize. Available options are "gaussian" (squared error), "laplace" (absolute loss), "bernoulli" (logistic regression for 0-1 outcomes), "adaboost" (the AdaBoost exponential loss for 0-1 outcomes), "poisson" (count outcomes), and "coxph" (right-censored observations).
• Shrinkage: retards the learning rate of the series, so that the series is longer and accuracy is better.
• Interaction depth: the depth to which interactions among predictors are considered.
• Bag fraction: the fraction of the training data randomly sampled to grow each tree.
• Cross-validation: used to tune the parameters above, in particular the number of trees.
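The option names above appear to follow the R gbm package; below is a minimal sketch (not from the slides) of roughly equivalent knobs in scikit-learn's GradientBoostingRegressor (a recent scikit-learn version is assumed).

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)

gbm = GradientBoostingRegressor(
    n_estimators=500,        # N-trees: number of boosting iterations / basis functions
    loss="squared_error",    # Distribution: analogue of "gaussian" (squared error)
    learning_rate=0.05,      # Shrinkage: slows learning so the series is longer
    max_depth=3,             # Interaction depth: how deep interactions can go
    subsample=0.8,           # Bag fraction: random subsample used for each tree
    random_state=0,
).fit(X, y)
print("R^2 on training data:", gbm.score(X, y))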
Thank You