Boosting
1. What is Boosting?
Boosting is an ensemble technique that helps reduce variance and bias in a machine learning model. It converts weak learners into a strong learner by combining N learners.
Boosting can also improve the predictions of many learning algorithms. Each weak learner sequentially corrects the mistakes of its predecessor, and, in the process, the combined model becomes a strong learner.
Forms of Boosting
A. AdaBoost
AdaBoost aims at combining several weak learners to form a single strong learner. AdaBoost concentrates on weak learners, which are often decision trees with only one split, commonly referred to as decision stumps. The first decision stump in AdaBoost weights all observations equally.
Previous errors are then corrected: any observations that were classified incorrectly are assigned more weight than observations that were classified without error. AdaBoost is widely used in both regression and classification procedures. Errors noticed in previous models are corrected by adjusting the weights until an accurate predictor is obtained.
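A decision stump is simply a tree restricted to a single split. The snippet below is a minimal sketch of such a weak learner in scikit-learn, assuming `X_train` and `y_train` already exist:

from sklearn.tree import DecisionTreeClassifier

# A decision stump: a decision tree limited to depth 1 (one split).
stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X_train, y_train)  # X_train, y_train are assumed to be defined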
B. Gradient Boosting
Gradient boosting, just like other ensemble machine learning procedures, sequentially adds predictors to the ensemble, with each one correcting its predecessors, to arrive at an accurate predictor at the end of the procedure. AdaBoost corrects its previous errors by tuning the weights of every incorrectly classified observation in every iteration; gradient boosting, in contrast, fits each new predictor to the residual errors made by the preceding predictor.
Gradient boosting uses gradient descent to pinpoint the shortcomings in the previous learners' predictions. The previous error is highlighted, and by adding one weak learner after another, the error is reduced significantly over time.
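To make the residual-fitting idea concrete, here is a minimal sketch of gradient boosting with squared-error loss built from plain regression trees. The number of rounds, tree depth, and learning rate are illustrative assumptions, and `X_train`/`y_train` are assumed to be NumPy arrays:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Start from a constant prediction (the mean of the targets).
prediction = np.full(len(y_train), y_train.mean())
trees, learning_rate = [], 0.1

for _ in range(100):                            # 100 boosting rounds (illustrative)
    residuals = y_train - prediction            # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=3)   # shallow tree as the weak learner
    tree.fit(X_train, residuals)                # fit the new tree to the residuals
    prediction += learning_rate * tree.predict(X_train)
    trees.append(tree)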
C. XGBoost
XGBoost implements gradient-boosted decision trees with enhanced performance and speed. Plain implementations of gradient boosted machines are relatively slow because model training must follow a sequence; they therefore lack scalability.
XGBoost provides parallelization in tree building through the use of CPU cores during training. It can also distribute the computation across machine clusters when training very large models. Out-of-core computing is utilized for data sets that do not fit in conventional memory, and cache optimization is applied to algorithms and data structures to make the best use of the available hardware.
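The sketch below shows how some of these capabilities are typically exposed through the Python API; the specific parameter values are illustrative assumptions rather than recommendations:

import xgboost as xgb

# n_jobs controls CPU parallelism; tree_method='hist' selects the fast
# histogram-based tree construction algorithm.
model = xgb.XGBClassifier(n_estimators=200, tree_method='hist', n_jobs=-1)
model.fit(X_train, y_train)  # X_train, y_train are assumed to be defined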
Pros and Cons of Boosting
One disadvantage of boosting is that it is sensitive to outliers, since every classifier is obliged to fix the errors of its predecessors; the method can therefore become overly dependent on outliers.
Another disadvantage is that the method is hard to scale up. Because every estimator bases its correctness on the previous predictors, the procedure is difficult to streamline.
Option Trees
Option trees are substitutes for decision trees. They represent ensemble classifiers while deriving a single structure. The difference between option trees and decision trees is that the former include both option nodes and decision nodes, while the latter include decision nodes only.
Classifying an instance requires filtering it down through the tree. A decision node chooses one of its branches, whereas an option node takes the entire group of branches. This means that, with an option node, one ends up with multiple leaves that must be combined into a single classification to produce a prediction. Voting is therefore required, where the class receiving the majority vote is selected as the node's prediction.
The above process makes it clear that option nodes should not come with only two options, since a two-way vote cannot produce a definite winner when the branches disagree. The other possibility is taking the average of the probability estimates from the various paths, following approaches such as the Bayesian approach or a non-weighted average.
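A minimal sketch of combining the class-probability estimates reached through several branches of an option node; the branch probabilities below are made-up numbers for illustration:

import numpy as np

# Hypothetical probability estimates over two classes from three branches.
branch_probs = np.array([[0.7, 0.3],
                         [0.6, 0.4],
                         [0.2, 0.8]])

# Non-weighted averaging of the branch estimates, then pick the most probable class.
avg_probs = branch_probs.mean(axis=0)
predicted_class = avg_probs.argmax()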
Option trees can also be developed by modifying existing decision tree learners or by creating an option node where several splits are correlated. Every decision tree within an allowable tolerance level can be converted into an option tree.
5. What is gradient boosting vs boosting?
We already know that gradient boosting is a boosting technique. Let us see how the term
'gradient' is related here. Gradient boosting re-defines boosting as a numerical optimisation
problem where the objective is to minimise the loss function of the model by adding weak
learners using gradient descent.
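In the standard formulation of gradient boosting (stated here as a sketch rather than quoted from this text), each stage m updates the model as

F_m(x) = F_{m-1}(x) + \nu h_m(x)

where h_m is a weak learner fitted to the negative gradient of the loss L(y, F_{m-1}(x)) with respect to the current predictions, and \nu is the learning rate that scales each learner's contribution.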
Suppose we have defined a handful of simple rules for identifying whether an image shows a dog or a cat. Together these rules help us identify the animal; however, if we were to classify an image based on an individual (single) rule, the prediction would be flawed. Each of these rules, individually, is called a weak learner because it is not strong enough on its own to classify an image as a cat or a dog.
Therefore, to make sure that our prediction is more accurate, we can combine the
prediction from each of these weak learners by using the majority rule or weighted average.
This makes a strong learner model.
In the above example, we have defined 5 weak learners and the majority of these rules (i.e.
3 out of 5 learners predict the image as a cat) gives us the prediction that the image is a cat.
Therefore, our final output is a cat.
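A tiny sketch of that majority rule, using made-up predictions from the five hypothetical rules:

from collections import Counter

# Hypothetical outputs of the five weak rules for one image.
rule_predictions = ['cat', 'cat', 'dog', 'cat', 'dog']

# Majority vote: the most common prediction wins (here, 'cat').
final_prediction = Counter(rule_predictions).most_common(1)[0][0]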
7. What Is Boosting?
Boosting is an ensemble learning technique that uses a set of machine learning algorithms to convert weak learners into strong learners in order to increase the accuracy of the model.
As I mentioned, boosting is an ensemble learning method, but what exactly is ensemble learning?
Below I have also discussed the difference between Boosting and Bagging.
Boosting vs Bagging
Ensemble learning can be performed in two ways:
1. Sequential ensemble, popularly known as boosting: here the weak learners are produced sequentially during the training phase. The performance of the model is improved by assigning a higher weightage to previously misclassified samples. An example of boosting is the AdaBoost algorithm.
2. Parallel ensemble, popularly known as bagging: here the weak learners are produced in parallel during the training phase. The performance of the model can be increased by training a number of weak learners in parallel on bootstrapped data sets. An example of bagging is the Random Forest algorithm. (Both approaches are sketched in the code below.)
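A minimal comparison of the two ensemble styles in scikit-learn; the estimator counts are arbitrary and `X_train`/`y_train` are assumed to exist:

from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Sequential ensemble (boosting): each tree is fit after the previous one.
boosted = AdaBoostClassifier(n_estimators=50).fit(X_train, y_train)

# Parallel ensemble (bagging): trees are fit independently on bootstrapped samples.
bagged = RandomForestClassifier(n_estimators=50, n_jobs=-1).fit(X_train, y_train)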
Step 1: The base algorithm reads the data and assigns equal weight to each sample
observation.
Step 2: False predictions made by the base learner are identified. In the next iteration, these misclassified samples are passed to the next base learner with a higher weightage placed on them.
Step 3: Repeat step 2 until the algorithm can correctly classify the output.
• Build another weak learner aimed at fixing the errors of the previous learner.
• The final model is obtained as a weighted mean of all the weak learners (a minimal sketch of this loop follows below).
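Here is a minimal sketch of that loop in the spirit of AdaBoost. The weight-update rule is a simplified illustration, and the labels in `y_train` are assumed to be encoded as -1 and +1:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

n_samples = len(y_train)                     # y_train assumed to hold labels in {-1, +1}
weights = np.full(n_samples, 1 / n_samples)  # Step 1: equal weight for every sample
learners, alphas = [], []

for _ in range(50):                          # 50 boosting rounds (illustrative)
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X_train, y_train, sample_weight=weights)
    pred = stump.predict(X_train)
    err = weights[pred != y_train].sum() / weights.sum()
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))   # this learner's contribution
    weights *= np.exp(-alpha * y_train * pred)        # Step 2: upweight the mistakes
    weights /= weights.sum()                          # renormalize the weights
    learners.append(stump)
    alphas.append(alpha)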
Boosting Algorithms
Let’s look at some algorithms that are based on the boosting framework that we have just
discussed.
AdaBoost
AdaBoost works by fitting one weak learner after the other. In subsequent fits, it gives more
weight to incorrect predictions and less weight to correct predictions. In this way, the
models learn to make predictions for the difficult classes. The final predictions are obtained by a weighted majority vote or a weighted sum. The learning rate controls the contribution of each weak learner. By default, the algorithm uses decision trees as the base estimator; in this case, each successive fit is done with incorrectly predicted instances given more weight.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    learning_rate=1.0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
To improve the performance of the model, the number of estimators, the parameters of the base estimator, and the learning rate should be tuned. For example, you can tune the maximum depth of the decision tree base estimator.
Once training is complete, the impurity-based feature importances are obtained via the `feature_importances_` attribute.
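For instance (a minimal sketch, assuming the fitted `model` from above and a list of column names called `feature_names`, which is not defined in this text):

import numpy as np

# Rank features by impurity-based importance, highest first.
order = np.argsort(model.feature_importances_)[::-1]
for i in order:
    print(feature_names[i], model.feature_importances_[i])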
Gradient Tree Boosting
Gradient tree boosting is an additive ensemble learning approach that uses decision trees as weak learners. Additive means that the trees are added one after the other, and previous trees remain unchanged. When adding subsequent trees, gradient descent is used to minimize the loss.
from sklearn.ensemble import GradientBoostingClassifier  # estimator assumed; its definition was missing here
model = GradientBoostingClassifier()
model.fit(X_train, y_train)
model.score(X_test, y_test)
XGBoost
XGBoost is a popular gradient boosting algorithm. It uses weak regression trees as weak learners. The algorithm also does cross-validation, computes the feature importance, and accepts sparse input data.
XGBoost offers the DMatrix data structure, which improves its performance and efficiency.
import xgboost as xgb
import matplotlib.pyplot as plt

xg_reg = xgb.XGBRegressor()  # regressor definition assumed; it was missing from this snippet
xg_reg.fit(X_train, y_train)
preds = xg_reg.predict(X_test)
plt.rcParams['figure.figsize'] = [5, 5]  # set the figure size before plotting
xgb.plot_importance(xg_reg)
plt.show()
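Alternatively, the DMatrix structure mentioned above can be used with the lower-level training API; a minimal sketch in which the parameter values are illustrative assumptions:

import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {'objective': 'reg:squarederror', 'max_depth': 4, 'eta': 0.1}
booster = xgb.train(params, dtrain, num_boost_round=100)
dmatrix_preds = booster.predict(dtest)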
LightGBM
LightGBM is a tree-based gradient boosting algorithm that uses leaf-wise tree growth rather than level-wise growth. The algorithm can be used for classification and regression problems. LightGBM supports categorical features via the `categorical_feature` argument, so one-hot encoding is not needed after specifying the categorical columns.
The LightGBM algorithm can also deal with null values; this feature can be disabled by setting `use_missing=false`. It uses NA to represent null values. To treat zeros as missing instead, set `zero_as_missing=true`.
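A small sketch of these options, assuming `X_train` is a pandas DataFrame whose hypothetical column 'city' has the 'category' dtype:

import lightgbm as lgb

# 'city' is a hypothetical categorical column; zero_as_missing illustrates
# the missing-value option described above.
train_set = lgb.Dataset(X_train, y_train,
                        categorical_feature=['city'],
                        params={'zero_as_missing': True})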
The objective parameter is used to dictate the type of problem. For example, `binary` for
binary classification, `regression` for regression and `multiclass` for multiclass problems.
When using LightGBM, you'll usually first convert the data into the LightGBM Dataset format.
import lightgbm as lgb

# Create the Dataset objects (reconstructed here; they are required by lgb.train below).
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

LightGBM also allows you to specify the boosting type via the `boosting_type` parameter. Available options include `gbdt` (the default gradient boosted decision trees), `rf` (random forest), `dart`, and `goss`.

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'num_leaves': 40,
    'learning_rate': 0.1,
    'feature_fraction': 0.9
}

gbm = lgb.train(params,
                lgb_train,
                num_boost_round=200,
                valid_sets=[lgb_train, lgb_eval],
                valid_names=['train', 'valid'])
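After training, predictions can be obtained directly from the booster (a short follow-up to the sketch above):

# For the binary objective, predict returns probabilities; threshold at 0.5 for labels.
probs = gbm.predict(X_test)
labels = (probs > 0.5).astype(int)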
CatBoost
In CatBoost, a balanced tree is grown using oblivious trees. In these types of trees, the same feature is used when making left and right splits at each level of the tree.
Like LightGBM, CatBoost supports categorical features, training on GPUs, and handling of
null values. CatBoost can be used for regression and classification problems.
from catboost import CatBoostRegressor

cat = CatBoostRegressor()
cat.fit(X_train, y_train, verbose=False, plot=True)
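To use the categorical-feature support mentioned above, the column names (or indices) can be passed to `fit`; here `X_train` is assumed to be a pandas DataFrame and 'city' is a hypothetical column name used only for illustration:

from catboost import CatBoostRegressor

# cat_features tells CatBoost which columns to treat as categorical,
# so no manual one-hot encoding is required.
cat = CatBoostRegressor(verbose=False)
cat.fit(X_train, y_train, cat_features=['city'])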