Gradient Boosting in ML

Gradient Boosting is a popular boosting algorithm in machine learning used for classification and
regression tasks. Boosting is an ensemble learning method that trains models sequentially, with each
new model trying to correct the errors of the previous one, thereby combining several weak learners
into a strong learner. The two most popular boosting algorithms are:

1. AdaBoost
2. Gradient Boosting
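
Both algorithms are available as off-the-shelf estimators in scikit-learn. The short sketch below is illustrative only (the dataset is synthetic and the parameters are not tuned):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit each boosting ensemble and report its test accuracy.
for model in (AdaBoostClassifier(n_estimators=100),
              GradientBoostingClassifier(n_estimators=100)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))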

Gradient Boosting
Gradient Boosting is a powerful boosting algorithm that combines several weak learners into a strong
learner. Each new model is trained to minimize a loss function, such as mean squared error or
cross-entropy, of the current ensemble using gradient descent. In each iteration, the algorithm
computes the gradient of the loss function with respect to the predictions of the current ensemble
and then trains a new weak model to fit the negative of this gradient. The predictions of the new
model are then added to the ensemble, and the process is repeated until a stopping criterion is met.
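
A minimal from-scratch sketch of this loop, assuming scikit-learn regression trees as the weak learners and two hard-coded losses, might look as follows (the names fit_gradient_boosting and predict are illustrative, not part of any library):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Negative gradients of two common losses with respect to the current predictions.
NEG_GRADIENT = {
    "squared_error": lambda y, pred: y - pred,            # the residuals
    "absolute_error": lambda y, pred: np.sign(y - pred),  # sign of the residuals
}

def fit_gradient_boosting(X, y, loss="squared_error", n_stages=100, eta=0.1):
    """Sequentially fit weak learners to the negative gradient of the loss."""
    neg_grad = NEG_GRADIENT[loss]
    init = float(np.mean(y))              # constant initial prediction
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_stages):
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, neg_grad(y, pred))    # new weak model targets the negative gradient
        pred += eta * tree.predict(X)     # add its scaled predictions to the ensemble
        trees.append(tree)
    return init, trees

def predict(init, trees, X, eta=0.1):
    out = np.full(X.shape[0], init)
    for tree in trees:
        out += eta * tree.predict(X)
    return out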
In contrast to AdaBoost, the weights of the training instances are not tweaked; instead, each
predictor is trained using the residual errors of its predecessor as labels. A widely used variant is
Gradient Boosted Trees, whose base learner is CART (Classification and Regression Trees). The
diagram below illustrates how gradient-boosted trees are trained for regression problems.

Gradient Boosted Trees for Regression

The ensemble consists of M trees. Tree1 is trained using the feature matrix X and the labels y. Its
predictions, labeled y1(hat), are used to determine the training set residual errors r1. Tree2 is then
trained using the feature matrix X and the residual errors r1 of Tree1 as labels. Its predictions
r1(hat) are then used to determine the residuals r2. The process is repeated until all M trees
forming the ensemble are trained. An important parameter in this technique is Shrinkage. Shrinkage
means that the prediction of each tree in the ensemble is shrunk by multiplying it by the learning
rate (eta), which ranges between 0 and 1. There is a trade-off between eta and the number of
estimators: a lower learning rate must be compensated for with more estimators to reach a given
level of model performance. Once all the trees are trained, predictions can be made. Each tree
predicts a correction to the label, and the final prediction is given by the formula,

y(pred) = y1(hat) + (eta * r1(hat)) + (eta * r2(hat)) + ... + (eta * r(M-1)(hat))
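
The following hand-unrolled sketch mirrors this procedure for the first three trees, assuming scikit-learn regression trees and toy data (the names tree1, r1, y1_hat, etc. are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 3))                      # feature matrix X (toy data)
y = 3 * X[:, 0] + rng.normal(0, 0.1, 200)     # labels y

eta = 0.1                                     # learning rate (shrinkage)

# Tree1 is trained on the labels y.
tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
y1_hat = tree1.predict(X)
r1 = y - y1_hat                               # residual errors of Tree1

# Tree2 is trained on the residuals r1.
tree2 = DecisionTreeRegressor(max_depth=2).fit(X, r1)
r1_hat = tree2.predict(X)
r2 = r1 - r1_hat                              # residuals used to train Tree3

# Tree3 is trained on the residuals r2.
tree3 = DecisionTreeRegressor(max_depth=2).fit(X, r2)
r2_hat = tree3.predict(X)

# Final prediction, following the formula above (here with M = 3 trees).
y_pred = y1_hat + eta * r1_hat + eta * r2_hat

In practice, the same behaviour is available off the shelf through scikit-learn's GradientBoostingRegressor, whose learning_rate and n_estimators parameters correspond to eta and M and exhibit the trade-off described above.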

Difference between AdaBoost and Gradient Boosting

The differences between AdaBoost and Gradient Boosting are as follows:

AdaBoost: During each iteration, the weights of incorrectly classified samples are increased, so that the next weak learner focuses more on these samples.
Gradient Boosting: Updates the model by computing the negative gradient of the loss function with respect to the predicted output.

AdaBoost: Uses simple decision trees with one split, known as decision stumps, as weak learners.
Gradient Boosting: Can use a wide range of base learners, such as decision trees and linear models.

AdaBoost: More susceptible to noise and outliers in the data, as it assigns high weights to misclassified samples.
Gradient Boosting: Generally more robust, as it updates the model based on gradients, which are less sensitive to outliers.
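
As an illustrative, untuned usage sketch rather than a benchmark, both ensembles can be fit on the same noisy regression data; note that Gradient Boosting can additionally switch to an outlier-robust loss such as Huber, in line with the robustness point above:

from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

ada = AdaBoostRegressor(n_estimators=100).fit(X, y)
# loss="huber" is one of the outlier-robust losses Gradient Boosting supports.
gbr = GradientBoostingRegressor(loss="huber", n_estimators=100).fit(X, y)
print(ada.score(X, y), gbr.score(X, y))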
Gradient Boosting Algorithm

Step 1:
Let us assume X and y are the input features and the target, with N samples. Our goal is to learn a
function f(x) that maps the input features X to the target y. The model is a boosted ensemble of
trees, i.e. the sum of the trees' predictions:

f(x) = f_1(x) + f_2(x) + ... + f_M(x)

The loss function measures the difference between the actual and the predicted values, summed over
the samples:

L(f) = L(y_1, f(x_1)) + L(y_2, f(x_2)) + ... + L(y_N, f(x_N))

Step 2: We want to minimize the loss function L(f) with respect to f:

f_hat = argmin over f of L(f)

If the gradient boosting algorithm runs for M stages, then at stage m the algorithm can improve the
current model f_m by adding a new estimator h_m:

f_(m+1)(x) = f_m(x) + h_m(x)

Step 3: Steepest Descent

For the m-th stage of gradient boosting, steepest descent chooses the new estimator as
h_m = -rho_m * g_m, where rho_m is a constant known as the step length and g_m is the gradient of
the loss function L(f) evaluated at the current model:

g_m = dL(y_i, f(x_i)) / df(x_i), evaluated at f = f_(m-1)

Step 4: Solution
The step length rho_m is chosen by a line search that minimizes the loss along the negative gradient
direction:

rho_m = argmin over rho of L(f_(m-1) - rho * g_m)

Applying this update for each of the M trees in turn, the current solution at stage m is

f_m = f_(m-1) - rho_m * g_m
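
Putting Steps 1 to 4 together, a minimal sketch assuming squared-error loss could look as follows; the grid-based line search for the step length rho_m is a simplification (libraries typically use an exact line search or a fixed learning rate), and the names fit_gbm and loss are illustrative:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def loss(y, f):
    return np.sum((y - f) ** 2)               # L(f): sum of squared errors

def fit_gbm(X, y, M=100):
    f = np.full(len(y), float(np.mean(y)))    # Step 1: start from a constant model
    stages = []
    for _ in range(M):
        g = f - y                              # Step 3: gradient of L(f) at f_(m-1), up to a factor of 2
        h = DecisionTreeRegressor(max_depth=3).fit(X, -g)   # weak estimator fit to -g
        direction = h.predict(X)
        # Step 4: pick the step length rho_m that minimizes the loss along this direction.
        rho = min(np.linspace(0.01, 1.0, 100),
                  key=lambda r: loss(y, f + r * direction))
        f = f + rho * direction                # current solution f_m = f_(m-1) - rho_m * g_m
        stages.append((rho, h))
    return f, stages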
