Boosting and Additive Trees (2)

Yi Zhang, Kevyn Collins-Thompson


Advanced Statistical Seminar 11-745
Oct 29, 2002
Recap: Boosting (1)
• Background: Ensemble Learning
• Boosting Definitions, Example
• AdaBoost
• Boosting as an Additive Model
• Boosting Practical Issues
• Exponential Loss
• Other Loss Functions
• Boosting Trees
• Boosting as Entropy Projection
• Data Mining Methods
Outline for This Class
• Find the solution based on numerical optimization
• Control the model complexity and avoid overfitting
– Right sized trees for boosting
– Number of iterations
– Regularization
• Understand the final model (Interpretation)
– Single variable
– Correlation of variables
Numerical Optimization
• Goal: Find the f that minimizes the loss function over the training data (a sketch follows this slide):

  $$\hat{f} = \arg\min_{f} L(f) = \arg\min_{f} \sum_{i=1}^{N} L(y_i, f(x_i))$$

• Gradient descent search in the unconstrained function space to minimize the loss on the training data:

  $$g_{im} = \left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x_i) = f_{m-1}(x_i)}, \qquad \mathbf{g}_m = (g_{1m}, g_{2m}, \ldots, g_{Nm})^{T}$$

  $$\rho_m = \arg\min_{\rho} L(\mathbf{f}_{m-1} - \rho\, \mathbf{g}_m)$$

  $$\mathbf{f}_m = \mathbf{f}_{m-1} - \rho_m\, \mathbf{g}_m$$

• The loss on the training data converges to zero: $\mathbf{f}_m = \{f_m(x_1), f_m(x_2), \ldots, f_m(x_N)\} \rightarrow \{y_1, y_2, \ldots, y_N\}$
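The steepest-descent recursion above can be made concrete with a short numeric sketch. Python and NumPy are assumptions (the slides name no implementation language), and squared-error loss is assumed so that the gradient components are simply $f(x_i) - y_i$.

```python
import numpy as np

# Minimal sketch of gradient descent in function space, assuming
# squared-error loss L(y, f) = (y - f)^2 / 2, so g_im = f(x_i) - y_i.
# The "function" is represented only by its N fitted values on the
# training points, as on the slide.
def functional_gradient_descent(y, n_steps=10):
    f = np.zeros_like(y, dtype=float)       # f_0
    for m in range(n_steps):
        g = f - y                           # gradient components g_im
        gg = np.dot(g, g)
        if gg == 0:                         # training loss already zero
            break
        rho = np.dot(g, f - y) / gg         # exact line search: argmin_rho L(f - rho*g)
        f = f - rho * g                     # steepest-descent update f_m
    return f

y = np.array([1.0, 2.0, 3.0])
print(functional_gradient_descent(y))       # reproduces y: training loss -> 0
```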
Gradient Search on Constrained Function Space: Gradient Tree Boosting
• Introduce a tree at the m-th iteration whose predictions $t_m$ are as close as possible to the negative gradient (see the sketch below):

  $$\tilde{\Theta}_m = \arg\min_{\Theta} \sum_{i=1}^{N} \left( -g_{im} - T(x_i; \Theta) \right)^2$$

• Advantage compared with unconstrained gradient search: robust, less likely to overfit
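A hedged sketch of one constrained step, assuming Python with scikit-learn (neither is named in the slides) and squared-error loss, so the negative gradient is just the ordinary residual; the data and tree size are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative data standing in for the training set and the current fit f_{m-1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
f_prev = np.zeros_like(y)

neg_gradient = y - f_prev                       # -g_m for squared-error loss
tree = DecisionTreeRegressor(max_leaf_nodes=8)  # small tree: the weak learner T(x; Theta)
tree.fit(X, neg_gradient)                       # least-squares fit to the negative gradient
```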
Algorithm 3: MART
1. Initialize $f_0(x)$ to a single terminal node tree
2. For $m = 1$ to $M$:

   a) Compute pseudo-residuals $r_{im}$ based on the loss function

   b) Fit a regression tree to the $r_{im}$, giving terminal regions $R_{jm},\ j = 1, 2, \ldots, J_m$

   c) Find the optimal value of the coefficient within each region $R_{jm}$:
      $$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\big(y_i,\, f_{m-1}(x_i) + \gamma\big)$$

   d) $f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J_m} \gamma_{jm}\, I(x \in R_{jm})$

   End For
3. Output: $\hat{f} = f_M$
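A minimal sketch of Algorithm 3 for the special case of squared-error loss, again assuming Python with scikit-learn: the pseudo-residuals are $y - f_{m-1}(x)$ and the optimal per-region coefficient is the mean residual in that region. Other losses change only steps (a) and (c). The helper names mart_fit and mart_predict are hypothetical, not part of any library.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def mart_fit(X, y, M=100, J=8):
    f0 = y.mean()                                # step 1: best single-terminal-node tree
    f = np.full(len(y), f0)
    trees, gammas = [], []
    for m in range(M):
        r = y - f                                                  # (a) pseudo-residuals r_im
        tree = DecisionTreeRegressor(max_leaf_nodes=J).fit(X, r)   # (b) terminal regions R_jm
        leaf = tree.apply(X)                                       # region index of each x_i
        gamma = {j: r[leaf == j].mean() for j in np.unique(leaf)}  # (c) gamma_jm per region
        f = f + np.array([gamma[j] for j in leaf])                 # (d) f_m = f_{m-1} + sum_j gamma_jm I(x in R_jm)
        trees.append(tree)
        gammas.append(gamma)
    return f0, trees, gammas

def mart_predict(X, f0, trees, gammas):
    f = np.full(X.shape[0], f0)
    for tree, gamma in zip(trees, gammas):
        f += np.array([gamma[j] for j in tree.apply(X)])
    return f
```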
View Boosting as a Linear Model
• Basis expansion:
– Use basis functions $T_m$ (m = 1..M, each $T_m$ a weak learner) to transform the input vector X into T-space, then use a linear model in this new space
• Special to boosting: the choice of basis function $T_m$ depends on $T_1, \ldots, T_{m-1}$
Improve Boosting as a Linear Model
• Bias-variance trade-off
Recap: Linear Models in Chapter 3
1. Subset selection (feature selection, discrete)
2. Coefficient shrinkage (smoothing: ridge, lasso)
3. Using derived input directions (PCA, PLS)
• Multiple outcome shrinkage and selection
– Exploit correlations in different outcomes
This Chapter: Improve Boosting
1. Size of the constituent trees J
2. Number of boosting iterations M (subset selection)
3. Regularization (shrinkage)
Right-Sized Trees for Boosting (?)
• The best tree for one step is not the best in the long run
– Using a very large tree (such as C4.5) as the weak learner to fit the residuals assumes each tree is the last one in the expansion; this usually degrades performance and increases computation
• Simple approach: restrict all trees to the same size J
• J limits the interaction level among the input features in the tree-based approximation (see the sketch below)
• In practice low-order interaction effects tend to dominate, and empirically $4 \le J \le 8$ works well (?)
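One way to realize the "same size J for every tree" idea, assuming scikit-learn's gradient boosting as the implementation: max_leaf_nodes plays the role of J, and the value 6 is just an illustrative choice inside the 4 ≤ J ≤ 8 range, not something prescribed by the slides.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative data; every constituent tree is capped at J = 6 terminal nodes.
X, y = make_friedman1(n_samples=500, random_state=0)
model = GradientBoostingRegressor(
    n_estimators=200,
    max_leaf_nodes=6,   # J: same size for all trees, bounding the interaction level
    random_state=0,
).fit(X, y)
```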
Number of Boosting Iterations (Subset Selection)
• Boosting will overfit as $M \rightarrow \infty$
• Use a validation set to choose M (see the sketch below)
• Other methods … (later)
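A sketch of the validation-set approach, assuming scikit-learn; the dataset, split, and settings are illustrative. The learning_rate parameter here is the shrinkage factor ν discussed on the next slide; a smaller ν typically pushes the best M higher.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Fit with a generous M, score the staged predictions f_1, ..., f_M on a
# held-out set, and keep the iteration with the lowest validation error.
X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.1,
                                  random_state=0).fit(X_tr, y_tr)
val_err = [mean_squared_error(y_val, p) for p in model.staged_predict(X_val)]
best_M = int(np.argmin(val_err)) + 1
print("best number of boosting iterations M:", best_M)
```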
Shrinkage
• Scale the contribution of each tree by a factor $0 < \nu < 1$ to control the learning rate (the learning_rate parameter in the validation sketch above plays the role of $\nu$):

  $$f_m(x) = f_{m-1}(x) + \nu \sum_{j=1}^{J} \gamma_{jm}\, I(x \in R_{jm})$$

• Both $\nu$ and M control prediction risk on the training data, and they operate dependently: $\nu \downarrow\ \Rightarrow\ M \uparrow$
Penalized Regression
• Ridge regression or lasso regression (see the sketch below):

  $$\hat{\alpha}(\lambda) = \arg\min_{\alpha} \left\{ \sum_{i=1}^{N} \Big( y_i - \sum_{k} \alpha_k T_k(x_i) \Big)^2 + \lambda \cdot J(\alpha) \right\}$$

  $$J(\alpha) = \sum_{k=1}^{K} \alpha_k^2 \quad \text{(ridge regression, L2 norm)}$$

  $$J(\alpha) = \sum_{k=1}^{K} |\alpha_k| \quad \text{(lasso, L1 norm)}$$
Algorithm 4: Forward Stagewise Linear
1. Initialize $\check{\alpha}_k = 0,\ k = 1, \ldots, K$; set $\varepsilon > 0$ to some small constant and M large
2. For $m = 1$ to $M$:

   a) $(\beta^*, k^*) = \arg\min_{\beta,\, k} \sum_{i=1}^{N} \Big( y_i - \sum_{l=1}^{K} \check{\alpha}_l T_l(x_i) - \beta\, T_k(x_i) \Big)^2$

   b) $\check{\alpha}_{k^*} \leftarrow \check{\alpha}_{k^*} + \varepsilon \cdot \mathrm{sign}(\beta^*)$

3. Output: $f_M(x) = \sum_{k=1}^{K} \check{\alpha}_k T_k(x)$
If $\hat{\alpha}(\lambda)$ is monotone in $\lambda$, then $\sum_k |\alpha_k| = \varepsilon \cdot M$ and the solution of Algorithm 4 is identical to the lasso regression result described on page 64, with $(\varepsilon, M)$ playing the role of $\lambda$ in the lasso regression.
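A minimal NumPy sketch of Algorithm 4. The helper name forward_stagewise is hypothetical, T is assumed to be the N x K matrix of basis-function values $T_k(x_i)$ (for example the tree design matrix from the previous sketch), and the values of eps and M are illustrative.

```python
import numpy as np

def forward_stagewise(T, y, eps=0.01, M=5000):
    N, K = T.shape
    alpha = np.zeros(K)
    for m in range(M):
        resid = y - T @ alpha
        # (a) best single basis function and its least-squares coefficient beta
        betas = T.T @ resid / np.sum(T ** 2, axis=0)
        sse = np.sum(resid ** 2) - betas ** 2 * np.sum(T ** 2, axis=0)
        k_star = int(np.argmin(sse))
        # (b) take a small step of size eps in the direction of sign(beta*)
        alpha[k_star] += eps * np.sign(betas[k_star])
    return alpha
```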
More About Algorithm 4
• Algorithm 4 ≈ Algorithm 3 + shrinkage
• L1 norm vs. L2 norm: more details later
– Chapter 12, after learning SVMs
Interpretation: Understanding the Final Model
• Single decision trees are easy to interpret
• A linear combination of trees is difficult to understand
– Which features are important?
– What are the interactions between features?
Relative Importance of Individual Variables
– For a single tree T, define the importance of $x_\ell$ as

  $$\mathcal{I}_\ell^2(T) = \sum_{\text{nodes splitting on } x_\ell} \big(\text{improvement in squared-error risk over a constant fit over the region}\big)$$

– For additive trees, define the importance of $x_\ell$ as

  $$\mathcal{I}_\ell^2 = \frac{1}{M} \sum_{m=1}^{M} \mathcal{I}_\ell^2(T_m)$$

– For K-class classification, just treat it as K two-class classification tasks (see the sketch below)
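For a boosted model fit with scikit-learn, the feature_importances_ attribute reports an impurity-based (squared-error reduction) importance averaged over the constituent trees and normalized, roughly in the spirit of the definition above; the data and settings here are illustrative.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, random_state=0)
model = GradientBoostingRegressor(n_estimators=200, random_state=0).fit(X, y)
for j, imp in enumerate(model.feature_importances_):
    print(f"x{j}: relative importance {imp:.3f}")
```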
Partial Dependence Plots
• Visualize the dependence of the approximation f(x) on the joint values of important features
• Usually the size of the subset is small (1-3)
• Define the average or partial dependence:

  $$f_S(X_S) = E_{X_C} f(X_S, X_C) = \int f(X_S, X_C)\, P(X_C)\, dX_C$$

• It can be estimated empirically using the training data (see the sketch below):

  $$\bar{f}_S(X_S) = \frac{1}{N} \sum_{i=1}^{N} f(X_S, x_{iC})$$
10.50 vs. 10.52

$$\text{10.50:} \quad f_S(X_S) = E_{X_C} f(X_S, X_C) = \int_{X_C} f(X_S, X_C)\, P(X_C)\, dX_C$$

$$\text{10.52:} \quad \tilde{f}_S(X_S) = E\big(f(X_S, X_C) \mid X_S\big) = \int_{X_C} f(X_S, X_C)\, P(X_C \mid X_S)\, dX_C$$

• The two are the same if the predictor variables are independent
• Why use 10.50 instead of 10.52 to measure partial dependence?
– Example 1: $f(X) = h_1(X_S) + h_2(X_C)$

  $$\text{10.50:} \quad f_S(X_S) = \int_{X_C} \big(h_1(X_S) + h_2(X_C)\big) P(X_C)\, dX_C = h_1(X_S) + \int h_2(X_C)\, P(X_C)\, dX_C = h_1(X_S) + \text{constant}$$

– Example 2: $f(X) = h_1(X_S) \cdot h_2(X_C)$, for which 10.50 similarly gives $h_1(X_S) \cdot \int h_2(X_C)\, P(X_C)\, dX_C = h_1(X_S) \times \text{constant}$
Conclusion
• Find the solution based on numerical optimization
• Control the model complexity and avoid overfitting
– Right sized trees for boosting
– Number of iterations
– Regularization
• Understand the final model (interpretation)
– Single variable
– Correlation of variables
