
GRADIENT TREE BOOSTING

Gradient tree boosting is also known as Gradient Boosting Machine (GBM) or Gradient Boosted
Regression Tree (GBRT). Gradient boosting is a combination of two procedures: Gradient Descent
and Boosting. AdaBoost was the first boosting algorithm; it works by boosting the weights of the
data instances misclassified by a weak learner and feeding them to subsequent weak learners in a
sequential fashion in order to increase the accuracy of predictions, and the final prediction is a
combination of the individual weak learners' predictions, each weighted by its accuracy.
Gradient boosting, or gradient tree boosting, is also a boosting algorithm which employs weak learners
in a sequential fashion, but it optimizes the loss function, i.e., minimizes the error, at each stage by
moving in the direction opposite to the gradient in order to reach the minimum. The algorithm boosts
on the residual errors made by a weak learner rather than on the weights of misclassified data
instances as the AdaBoost algorithm does.
The general idea of gradient boosting is to learn from mistakes and to reduce the loss function, here
the average of the squared errors.
The general formula for the mean squared error or loss function is:

MSE (Mean Squared Error) = loss = (1/N) ∑_{i=1}^{N} (y_actual - y_predicted)²

MSE is a measure of the average of the squared errors between the actual values and the predicted
values.
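As a quick illustration (not part of the original text), the loss can be computed in a few lines of Python; numpy is assumed to be available:

import numpy as np

def mse(y_actual, y_predicted):
    # Average of the squared differences between actual and predicted values.
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return np.mean((y_actual - y_predicted) ** 2)

For instance, mse([9.5, 8.2], [9.08, 7.98]) returns approximately 0.1124, the average of 0.42² and 0.22².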
The weak learners used in gradient tree boosting are single-split decision trees called decision
stumps. The prediction made by each weak learner is always combined with the predictions made by
the previous weak learners. Gradient boosting then estimates the residual errors of the combined
predictions and trains another weak learner on the estimated residuals. This procedure is followed
iteratively until the loss function is optimized.
There exist several implementations of the gradient tree boosting framework, such as Gradient
Boosting Machines (GBM), eXtreme Gradient Boosting (XGBoost), LightGBM and CatBoost.

Algorithm:
Input: Training data set T with N data instances; number of weak learners M.
Step 1: Train a weak learner (decision stump) on the training data set T. Estimate the
Decision Tree prediction T1 = DT(X1).
Step 2: Compute the residual error of this weak learner, R1 = Y - T1,
where R1 is the difference between the actual target value Y and the target
value T1 predicted by the weak learner.
Step 3: Train another weak learner with the residual error Ri (1 ≤ i < M) as the target variable.
Estimate the Decision Tree prediction Ti+1 = DT(Xi+1).
Step 4: Compute the combined prediction of the weak learners, Ci = Ci-1 + Ti+1,
where C0 = T1 (so C1 = T1 + T2, C2 = C1 + T3, and so on).
Step 5: Compute the residual error of the combined prediction, Ri+1 = Y - Ci.
Step 6: Estimate the Mean Squared Error (MSE) or loss function of Ri+1.
Step 7: Repeat Step 3 to Step 6 until the MSE becomes constant or the number of trees M
we set to train is reached.
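The steps above can be sketched in Python. This is an illustrative sketch rather than the textbook's code: it uses scikit-learn's DecisionTreeRegressor with max_depth=1 as the decision stump, assumes the feature matrix X is already numeric (categorical attributes would need to be encoded first), and uses the squared-error loss:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_tree_boosting(X, y, M=3):
    # Steps 1 and 3: fit a decision stump to the current residuals.
    # Step 4: add its prediction to the combined prediction.
    # Steps 5 and 6: recompute the residuals and the MSE.
    y = np.asarray(y, dtype=float)
    combined = np.zeros(len(y))    # combined prediction C
    residual = y.copy()            # before any tree, the residual is Y itself
    stumps = []
    for m in range(M):             # Step 7: repeat for M weak learners
        stump = DecisionTreeRegressor(max_depth=1)   # single-split weak learner
        stump.fit(X, residual)
        combined += stump.predict(X)
        residual = y - combined
        stumps.append(stump)
        print(f"Stage {m + 1}: MSE = {np.mean(residual ** 2):.4f}")
    return stumps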

Example 1: Consider a training dataset of 10 data instances that describes the skills of individual
students with the attributes Interactiveness, Practical Knowledge and Aptitude, as shown in Table
1. The target variable is CGPA, which is a continuous-valued variable. Based on the skill set of a
student, predict the CGPA that can be scored. This is a regression problem.
Table 1: Training Dataset

S.No.  Interactiveness  Practical Knowledge  Aptitude  CGPA
1      Yes              Good                 Good      9.5
2      No               Average              Good      8.2
3      No               Good                 Good      9.1
4      No               Average              Poor      6.8
5      Yes              Good                 Poor      8.5
6      Yes              Good                 Good      9.5
7      Yes              Average              Poor      7.9
8      No               Good                 Good      9.1
9      Yes              Good                 Good      8.8
10     Yes              Average              Poor      9

Solution:

Stage 1:
Train a Decision Stump based on the attribute Practical Knowledge on the training dataset T as shown
in Figure 1. Estimate the Decision Tree prediction T1 = DT(X1).
Figure 1: Decision stump on the attribute Practical Knowledge

Mean of the target variable CGPA for (X1 = Practical Knowledge ∈ {Good}) = (9.5 + 9.1 + 8.5 + 9.5 + 9.1 + 8.8) / 6 = 9.08.
Mean of the target variable CGPA for (X1 = Practical Knowledge ∈ {Average}) = (8.2 + 6.8 + 7.9 + 9) / 4 = 7.98.
Use the decision stump DT(X1) built on Practical Knowledge and estimate the prediction T1.
Calculate the residual error of this decision stump, R1 = Y - T1, as shown in Table 2.

Table 2: Residual Error Estimation

S.No.  Target      X1 = Practical   Tree 1 prediction   Residual Error
       Y = CGPA    Knowledge        T1 = DT(X1)         R1 = Y - T1
1      9.5         Good             9.08                 0.42
2      8.2         Average          7.98                 0.22
3      9.1         Good             9.08                 0.02
4      6.8         Average          7.98                -1.18
5      8.5         Good             9.08                -0.58
6      9.5         Good             9.08                 0.42
7      7.9         Average          7.98                -0.08
8      9.1         Good             9.08                 0.02
9      8.8         Good             9.08                -0.28
10     9           Average          7.98                 1.02

Mean Squared Error or loss of R1 = (1/N) ∑_{i=1}^{N} (y_actual - y_predicted)² is calculated as 0.3256.
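Stage 1 can be reproduced with pandas (an illustrative sketch, not part of the original example): grouping the CGPA values by Practical Knowledge gives the stump's predictions, and the residuals follow directly. The code keeps full precision instead of rounding the group means to 9.08 and 7.98, and its MSE still agrees with the value above to four decimal places:

import pandas as pd

# Training data from Table 1 (CGPA is the target variable).
df = pd.DataFrame({
    "Interactiveness":    ["Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"],
    "PracticalKnowledge": ["Good", "Average", "Good", "Average", "Good",
                           "Good", "Average", "Good", "Good", "Average"],
    "Aptitude":           ["Good", "Good", "Good", "Poor", "Poor",
                           "Good", "Poor", "Good", "Good", "Poor"],
    "CGPA":               [9.5, 8.2, 9.1, 6.8, 8.5, 9.5, 7.9, 9.1, 8.8, 9.0],
})

# Stage 1: the stump on Practical Knowledge predicts the mean CGPA of each branch.
T1 = df.groupby("PracticalKnowledge")["CGPA"].transform("mean")  # ~9.08 (Good), ~7.98 (Average)
R1 = df["CGPA"] - T1                                             # residual errors of Table 2
print(round((R1 ** 2).mean(), 4))                                # 0.3256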
Stage 2:
Train a Decision Stump based on the attribute Interactiveness on the residual error calculated in the
previous stage as shown in Figure 2.

Figure 2: Decision stump on the attribute Interactiveness

Mean of the residual error R1 for (X2 = Interactiveness ∈ {Yes}) = (0.42 - 0.58 + 0.42 - 0.08 - 0.28 + 1.02) / 6 = 0.153.
Mean of the residual error R1 for (X2 = Interactiveness ∈ {No}) = (0.22 + 0.02 - 1.18 + 0.02) / 4 = -0.23.
Use the decision stump DT(X2) built on Interactiveness, as shown in Figure 2, and estimate the prediction T2.
Calculate the combined prediction C1 = T1 + T2.
Calculate the residual error of the combined prediction by the two weak learners, R2 = Y - C1.

Table 3 shows the combined prediction by the two decision stumps and the estimation of residual
error.

Table 3: Residual Error

S.No.  Target     Tree 1 prediction  Residual     X2 =             Tree 2 prediction  Combined prediction  Residual Error
       Y = CGPA   T1 = DT(X1)        R1 = Y - T1  Interactiveness  T2 = DT(X2)        C1 = T1 + T2         R2 = Y - C1
1      9.5        9.08                0.42        Yes               0.153             9.233                 0.267
2      8.2        7.98                0.22        No               -0.23              7.75                  0.45
3      9.1        9.08                0.02        No               -0.23              8.85                  0.25
4      6.8        7.98               -1.18        No               -0.23              7.75                 -0.95
5      8.5        9.08               -0.58        Yes               0.153             9.233                -0.733
6      9.5        9.08                0.42        Yes               0.153             9.233                 0.267
7      7.9        7.98               -0.08        Yes               0.153             8.133                -0.233
8      9.1        9.08                0.02        No               -0.23              8.85                  0.25
9      8.8        9.08               -0.28        Yes               0.153             9.233                -0.433
10     9          7.98                1.02        Yes               0.153             8.133                 0.867

Mean Squared Error or loss R2 is calculated as 0.2903334.
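Continuing the pandas sketch from Stage 1 (df, T1 and R1 as defined there), Stage 2 groups the residuals by Interactiveness. Because it keeps full precision rather than the rounded 0.153 and -0.23, the resulting MSE comes out at about 0.2906, close to the 0.2903 obtained above with rounded intermediate values:

# Stage 2: the stump on Interactiveness predicts the mean residual of each branch.
T2 = R1.groupby(df["Interactiveness"]).transform("mean")  # ~0.153 (Yes), ~-0.23 (No)
C1 = T1 + T2                                              # combined prediction of the two stumps
R2 = df["CGPA"] - C1                                      # residual errors of Table 3
print(round((R2 ** 2).mean(), 4))                         # ~0.2906 (0.2903 in Table 3)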

Stage 3:
Train a Decision Stump based on the attribute Aptitude on the residual error calculated in Stage 2,
as shown in Figure 3.

Figure 3: Decision stump on the attribute Aptitude

Mean of the residual error R2 for (X3 = Aptitude ∈ {Good}) = (0.267 + 0.25 + 0.267 + 0.25 - 0.433 + 0.45) / 6 = 0.175.
Mean of the residual error R2 for (X3 = Aptitude ∈ {Poor}) = (-0.95 - 0.733 - 0.233 + 0.867) / 4 = -0.262.
Use the decision stump DT(X3) built on Aptitude and estimate the prediction T3.
Calculate the combined prediction C2 = C1 + T3.
Calculate the residual error of the combined prediction by the three weak learners, R3 = Y - C2.

Table 4 shows the combined prediction and the final residual after training with three decision
stumps.

Table 4: Residual Error

S.No.  Target     Combined prediction  Residual     X3 =      Tree 3 prediction  Combined prediction  Final Residual
       Y = CGPA   C1 = T1 + T2         R2 = Y - C1  Aptitude  T3 = DT(X3)        C2 = C1 + T3         R3 = Y - C2
1      9.5        9.233                 0.267       Good       0.175             9.408                 0.092
2      8.2        7.75                  0.45        Good       0.175             7.925                 0.275
3      9.1        8.85                  0.25        Good       0.175             9.025                 0.075
4      6.8        7.75                 -0.95        Poor      -0.262             7.488                -0.688
5      8.5        9.233                -0.733       Poor      -0.262             8.971                -0.471
6      9.5        9.233                 0.267       Good       0.175             9.408                 0.092
7      7.9        8.133                -0.233       Poor      -0.262             7.871                 0.029
8      9.1        8.85                  0.25        Good       0.175             9.025                 0.075
9      8.8        9.233                -0.433       Good       0.175             9.408                -0.608
10     9          8.133                 0.867       Poor      -0.262             7.871                 1.129

Mean Squared Error or loss R3 is calculated as approximately 0.2444.

We can observe that the MSE is reduced as we add the predictions of the weak learners
(0.3256 → 0.2903 → 0.2444). This procedure is followed iteratively, adding a weak learner and
optimizing the loss function at each stage, until the loss becomes constant or is reduced to 0.
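For comparison, the same data can be fed to scikit-learn's GradientBoostingRegressor (an illustrative sketch, reusing the DataFrame df built in the Stage 1 sketch above). The categorical attributes are one-hot encoded with pd.get_dummies; the library also starts from the mean CGPA and picks its own split attributes, so the individual numbers differ slightly from the hand-worked stages, but the training MSE shows the same downward trend as trees are added:

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X = pd.get_dummies(df[["Interactiveness", "PracticalKnowledge", "Aptitude"]])  # one-hot encode
y = df["CGPA"]

for m in (1, 2, 3, 10):
    gbr = GradientBoostingRegressor(n_estimators=m, max_depth=1, learning_rate=1.0)
    gbr.fit(X, y)
    print(m, round(mean_squared_error(y, gbr.predict(X)), 4))  # training MSE shrinks with more trees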

Advantages
It reduces bias and variance.
Disadvantages
It is a greedy algorithm that can overfit the training data.
XGBoost

XGBoost stands for eXtreme Gradient Boosting. It is a tree ensemble model that belongs to the family
of Gradient Boosting Decision Tree (GBDT) models. It is a scalable tree boosting system that
combines hardware optimizations and algorithmic optimizations to provide computational speed
and high performance with optimized usage of resources. The model improves on gradient boosting
by adding regularization to avoid overfitting and bias, and it is provided as an open-source,
end-to-end, scalable tree boosting system. The algorithm has become very popular in Kaggle
competitions because of its high performance and scalability. It solves both regression and
classification problems on real-world data sets with many instances and features.
Some of the important features added to XGBoost that make the model perform better than other
boosting algorithms are:
• Regularization
Regularization has been added to the boosting model to smooth the final learnt weights and
avoid over-fitting. It penalizes more complex models through both LASSO (L1) and Ridge
(L2) regularization.

• In-built cross validation
Built-in cross validation at each boosting iteration removes the need for a separate, explicit
cross-validation step and further improves performance.

• Post pruning
The model employs post pruning of trees using depth-first search (DFS). It allows a tree to grow
to the maximum depth and then prunes it backward by recursively removing leaf nodes with
negative gain.

It adds two algorithmic optimizations to gradient boosting:

• Weighted Approximate Quantile Sketch, a procedure that handles weighted instances for
approximate tree learning by finding optimal candidate split points during tree construction.

• Sparsity-aware split finding, an algorithm that handles missing data values or sparse features
more efficiently.

It also supports hardware optimizations such as:

• Out-of-core computing
This supports handling huge datasets that do not fit into main memory by optimizing the use of
the available disk space when processing large amounts of data.

• Parallelization
It parallelizes the tree construction process across multiple CPU cores during training.

• Cache awareness
Cache optimization is provided by allocating internal buffers in each thread to store gradient
statistics.

• Distributed computing
It uses a cluster of machines for training very large models.
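As an illustration of how the system is typically used (not from the original text), the sketch below calls XGBoost's scikit-learn-style interface. The xgboost package is assumed to be installed, X_train, y_train and X_test are placeholders for your own numeric data, and the parameter values are arbitrary examples:

import xgboost as xgb

# reg_alpha and reg_lambda are the L1 (LASSO) and L2 (Ridge) penalties discussed above.
model = xgb.XGBRegressor(
    n_estimators=100,    # number of boosted trees
    max_depth=3,         # maximum depth of each tree before pruning
    learning_rate=0.1,   # shrinkage applied to each tree's contribution
    reg_alpha=0.0,       # L1 regularization on leaf weights
    reg_lambda=1.0,      # L2 regularization on leaf weights
)
model.fit(X_train, y_train)          # X_train, y_train: placeholder training data
predictions = model.predict(X_test)  # X_test: placeholder test features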

SUMMARY

1. Gradient tree boosting, also known as Gradient Boosting Machine (GBM) or Gradient
Boosted Regression Tree (GBRT), is a boosting algorithm which employs weak learners in a
sequential fashion but optimizes the loss function, or minimizes the error, at each stage by
moving in the direction opposite to the gradient to reach the minimum.
2. XGBoost, which stands for eXtreme Gradient Boosting, is a tree ensemble model that belongs
to the family of Gradient Boosting Decision Tree (GBDT) models.
3. XGBoost is a scalable tree boosting system that combines hardware optimizations and
algorithmic optimizations to provide computational speed and high performance with
optimized usage of resources.
