
Boosting

1. What is Boosting?

Boosting is an ensemble algorithm that helps reduce bias and variance in machine learning. It converts weak learners into strong learners by combining N learners sequentially.

[Figure omitted. Source: Sirakorn, CC BY-SA]

Boosting can also improve the predictions of a learning algorithm: each weak learner sequentially corrects the errors of its predecessor and, in the process, the combination is converted into a strong learner.

2. What are the different types of boosting?

Boosting can take several forms, including:

A. Adaptive Boosting (Adaboost)
B. Gradient Boosting
C. XGBoost (Extreme Gradient Boosting)
A. Adaptive Boosting (Adaboost)

AdaBoost aims at combining several weak learners to form a single strong learner. AdaBoost concentrates on weak learners, which are often decision trees with only one split, commonly referred to as decision stumps. The first decision stump in AdaBoost weights all observations equally.

Previous errors are then corrected: any observations that were classified incorrectly are assigned more weight than observations that were classified correctly, so the next stump concentrates on the hard cases. AdaBoost is popularly used in both regression and classification procedures, and the errors noticed in previous models are adjusted through this re-weighting until an accurate predictor is obtained.
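To make the reweighting concrete, here is a minimal sketch of a single AdaBoost round (a discrete AdaBoost-style update); the toy data, the stump settings, and the variable names are illustrative assumptions, not any library's internals.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: X features, y labels in {-1, +1} (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=100) > 0, 1.0, -1.0)

w = np.full(len(y), 1 / len(y))            # start with equal observation weights

# One boosting round: fit a decision stump on the weighted data
stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y, sample_weight=w)
pred = stump.predict(X)

err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)   # weighted error rate
alpha = 0.5 * np.log((1 - err) / err)                    # the stump's "say"

# Misclassified observations get larger weights, correct ones get smaller weights
w *= np.exp(-alpha * y * pred)
w /= w.sum()                               # renormalise so the weights sum to 1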

B. Gradient Boosting

Gradient boosting, like other sequential ensemble procedures, adds predictors to the ensemble one at a time, each one correcting its predecessor, to arrive at an accurate predictor at the end of the procedure. Whereas AdaBoost corrects its previous errors by tuning the weights of the incorrectly predicted observations in every iteration, gradient boosting fits each new predictor to the residual errors committed by the preceding predictor.

Gradient boosting uses gradient descent to pinpoint the shortcomings in the previous learners' predictions. The previous error is highlighted, and by adding one weak learner after another, the error is reduced significantly over time.
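A minimal sketch of this residual-fitting idea for squared-error regression, where the residual is exactly the negative gradient of the loss; the data, tree depth, and learning rate are illustrative assumptions rather than any library's defaults.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())     # start from a constant prediction
trees = []

for _ in range(100):
    residuals = y - prediction             # residual = negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                 # fit the next weak learner to the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

# The final model is the initial constant plus the scaled sum of all residual trees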

C. XGBoost (Extreme Gradient Boosting)

XGBoost implements gradient-boosted decision trees with enhanced performance and speed. Standard implementations of gradient boosted machines are relatively slow because model training must follow a sequence, so they scale poorly.

XGBoost focuses on both model performance and computational speed. It provides various benefits, such as parallelization, distributed computing, cache optimization, and out-of-core computing.

XGBoost parallelizes tree building across the CPU cores during training. It also supports distributed computing for training very large models on clusters of machines. Out-of-core computing is used for data sets that do not fit in conventional memory, and cache optimization of algorithms and data structures makes the best use of the available hardware.
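As a hedged illustration of the parallelization point, the sketch below requests multi-core tree construction through the scikit-learn-style XGBoost API; the data set is synthetic and the parameter values are arbitrary.

import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic data for illustration only
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_jobs controls how many CPU cores are used when building trees;
# tree_method="hist" selects the fast histogram-based split finding
model = xgb.XGBClassifier(n_estimators=200, tree_method="hist", n_jobs=4)
model.fit(X, y)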
Pros and Cons of Boosting

As an ensemble model, boosting comes with an easy-to-read and interpretable algorithm, so its predictions are easy to explain. Its prediction capability is efficient because, like related ensemble methods such as bagging and random forests, it builds on simple base learners such as decision trees. Boosting is also a resilient method that curbs over-fitting fairly easily.

One disadvantage of boosting is that it is sensitive to outliers, since every classifier is obliged to fix the errors of its predecessors; the method therefore depends too heavily on outliers. Another disadvantage is that the method is hard to scale up, because every estimator bases its correctness on the previous predictors, which makes the procedure difficult to parallelize.

3. What are Option Trees?

Option trees are an alternative to decision trees. They represent an ensemble of classifiers within a single tree structure. The difference between option trees and decision trees is that the former include both option nodes and decision nodes, while the latter include decision nodes only.

To classify an instance, it is filtered down through the tree. At a decision node, one of the branches is chosen, whereas an option node takes the entire group of branches. This means that, with an option node, one ends up with multiple leaves whose outputs must be combined into a single classification to produce a prediction. Voting is therefore required, and the class that receives the majority of votes is selected as the node's prediction.

The above process makes it clear that an option node should not have only two options, since a two-way vote can end in a tie with no definite winner. The other possibility is taking the average of the probability estimates from the various paths, using approaches such as a Bayesian combination or a non-weighted average (see the sketch below).

Option trees can also be developed by modifying existing decision tree learners or by creating an option node where several splits are roughly equally good. Every decision tree within an allowable tolerance level can be converted into an option tree.
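As a minimal illustration of averaging the estimates from the different paths under an option node, here is a hedged sketch; the branch probabilities and class names are hypothetical.

import numpy as np

# Hypothetical per-branch probability estimates [P(cat), P(dog)] coming from
# the three branches hanging off one option node
branch_probabilities = np.array([[0.9, 0.1],
                                 [0.4, 0.6],
                                 [0.7, 0.3]])

# Non-weighted average of the probability estimates from the various paths
averaged = branch_probabilities.mean(axis=0)
predicted_class = ["cat", "dog"][int(np.argmax(averaged))]   # -> "cat"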

4. Which boosting algorithm is best?


eXtreme Gradient Boosting - XGBoost

XGBoost is a popular gradient boosting algorithm. It uses regression trees as its weak learners, performs built-in cross-validation, computes feature importances, and accepts sparse input data.

5. What is gradient boosting vs boosting?
We already know that gradient boosting is a boosting technique. Let us see how the term
'gradient' is related here. Gradient boosting re-defines boosting as a numerical optimisation
problem where the objective is to minimise the loss function of the model by adding weak
learners using gradient descent.
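To make the role of the gradient concrete, here is a hedged sketch that repeats the residual idea for binary log-loss: each new tree is fitted to the negative gradient of the loss, which amounts to a gradient-descent step in function space. The data, tree depth, and step size are illustrative assumptions, and the leaf values are not refined the way full implementations refine them.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy binary classification data (illustrative only), labels in {0, 1}
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

F = np.zeros(len(y))                      # current model output (log-odds)
for _ in range(50):
    p = 1 / (1 + np.exp(-F))              # predicted probability
    negative_gradient = y - p             # -dL/dF for the log-loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, negative_gradient)
    F += 0.1 * tree.predict(X)            # gradient-descent step in function space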

6. Why Is Boosting Used?


To solve convoluted problems we require more advanced techniques. Suppose that, given a data set containing images of cats and dogs, you were asked to build a model that classifies these images into two separate classes. Like most people, you would start by identifying the images using some rules, like those given below:

1. The image has pointy ears: Cat


2. The image has cat shaped eyes: Cat
3. The image has bigger limbs: Dog
4. The image has sharpened claws: Cat
5. The image has a wider mouth structure: Dog

All these rules help us identify whether an image is a dog or a cat; however, if we were to classify an image based on a single rule, the prediction would be flawed. Each of these rules, individually, is called a weak learner because, on its own, it is not strong enough to classify an image as a cat or a dog.

Therefore, to make sure that our prediction is more accurate, we can combine the
prediction from each of these weak learners by using the majority rule or weighted average.
This makes a strong learner model.

In the above example, we have defined 5 weak learners and the majority of these rules (i.e.
3 out of 5 learners predict the image as a cat) gives us the prediction that the image is a cat.
Therefore, our final output is a cat.
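A minimal sketch of that majority-rule combination, with the five rule outputs hard-coded as hypothetical values:

from collections import Counter

# Hypothetical outputs of the five weak rules for one image
rule_votes = ["Cat", "Cat", "Dog", "Cat", "Dog"]

# Majority rule: the class predicted by most weak learners wins
final_prediction = Counter(rule_votes).most_common(1)[0][0]
print(final_prediction)   # -> "Cat" (3 votes out of 5)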

So this brings us to the question,

7. What Is Boosting?
Boosting is an ensemble learning technique that uses a set of Machine Learning algorithms to convert weak learners into strong learners in order to increase the accuracy of the model.

As mentioned, boosting is an ensemble learning method, but what exactly is ensemble learning?

8. What Is Ensemble In Machine Learning?


Ensemble learning is a method used to enhance the performance of a Machine Learning model by combining several learners. Compared to a single model, this type of learning builds models with improved efficiency and accuracy. This is exactly why ensemble methods are used to win market-leading competitions such as the Netflix recommendation competition, Kaggle competitions, and so on.


Below I have also discussed the difference between Boosting and Bagging.
Boosting vs Bagging
Ensemble learning can be performed in two ways:

1. Sequential ensembling, popularly known as boosting: the weak learners are produced sequentially during the training phase. The performance of the model is improved by assigning higher weights to previously misclassified samples. An example of boosting is the AdaBoost algorithm.
2. Parallel ensembling, popularly known as bagging: the weak learners are produced in parallel during the training phase. The performance of the model can be increased by training a number of weak learners in parallel on bootstrapped data sets. An example of bagging is the Random Forest algorithm.

9. How Does the Boosting Algorithm Work?


The basic principle behind the working of the boosting algorithm is to generate multiple
weak learners and combine their predictions to form one strong rule. These weak rules are
generated by applying base Machine Learning algorithms on different distributions of the
data set. These algorithms generate weak rules for each iteration. After multiple iterations,
the weak learners are combined to form a strong learner that will predict a more accurate
outcome.


Here’s how the algorithm works:

Step 1: The base algorithm reads the data and assigns equal weight to each sample
observation.

Step 2: False predictions made by the base learner are identified. In the next iteration, these misclassified samples are passed to the next base learner, with higher weights assigned to the incorrect predictions.
Step 3: Repeat Step 2 until the algorithm can correctly classify the output.

Therefore, the main aim of boosting is to focus more on misclassified predictions.

10. How does boosting work?

Generally, boosting works as follows:

• Create the initial weak learner.

• Use the weak learner to make predictions on the entire dataset.

• Compute the prediction errors.

• Incorrect predictions are assigned more weight.

• Build another weak learner aimed at fixing the errors of the previous learner.

• Make predictions on the whole dataset using the new learner.

• Repeat this process until the optimal results are obtained.

• The final model is obtained as a weighted mean of all the weak learners.
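The list above translates almost line for line into code. Below is a simplified, hedged sketch of that loop on a synthetic data set; the doubling of the weights of incorrect predictions and the accuracy-based learner weights are arbitrary illustrative choices, not the update rule of any particular algorithm.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)   # synthetic data
sample_weight = np.ones(len(y))
learners, learner_weights = [], []

for _ in range(10):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=sample_weight)    # build a weak learner
    pred = stump.predict(X)                         # predict on the whole dataset
    sample_weight[pred != y] *= 2.0                 # give incorrect predictions more weight
    learners.append(stump)
    learner_weights.append((pred == y).mean())      # weight each learner by its accuracy

# Final model: weighted vote over all the weak learners
votes = np.array([w * (l.predict(X) * 2 - 1) for l, w in zip(learners, learner_weights)])
final = (votes.sum(axis=0) > 0).astype(int)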

Boosting Algorithms

Let's look at some algorithms that are based on the boosting framework we have just discussed.

AdaBoost

AdaBoost works by fitting one weak learner after the other. In subsequent fits, it gives more weight to incorrect predictions and less weight to correct predictions, so the models learn to make predictions for the difficult cases. The final prediction is obtained by a weighted majority vote (for classification) or a weighted sum (for regression). The learning rate controls the contribution of each weak learner to the final prediction. AdaBoost can be used for both classification and regression problems.


Scikit-learn provides an AdaBoost implementation that you can start using immediately. By default, the algorithm uses decision trees as the base estimator. In this case, a `DecisionTreeClassifier` will be fitted on the entire dataset first. In subsequent iterations, the fit will be done with incorrectly predicted instances given more weight.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# A decision tree classifier is used as the base estimator
model = AdaBoostClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    learning_rate=1.0)

model.fit(X_train, y_train)

predictions = model.predict(X_test)

To improve the performance of the model, the number of estimators, the parameters of the base estimator, and the learning rate should be tuned. For example, you can tune the maximum depth of the decision tree classifier.

Once training is complete, the impurity-based feature importances are obtained via the `feature_importances_` attribute.
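For example, a hedged sketch of that tuning and of reading the importances, reusing the names from the snippet above; the hyperparameter values are arbitrary.

# Illustrative values only; continues from the imports and data above
model = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),   # tune the base estimator's depth
    n_estimators=200,
    learning_rate=0.5)
model.fit(X_train, y_train)

# One impurity-based importance value per input feature
print(model.feature_importances_)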

Gradient tree boosting

Gradient tree boosting is an additive ensemble learning approach that uses decision trees as weak learners. Additive means that the trees are added one after the other, and previous trees remain unchanged. When adding subsequent trees, gradient descent is used to minimize the loss.

The quickest way to build a gradient boosting model is to use Scikit-learn.


from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                   max_depth=1, random_state=0)

model.fit(X_train, y_train)

model.score(X_test, y_test)

The algorithm can be used for regression and classification problems.

eXtreme Gradient Boosting - XGBoost

XGBoost is a popular gradient boosting algorithm. It uses regression trees as its weak learners. The algorithm also does cross-validation and computes the feature importance. Furthermore, it accepts sparse input data.

XGBoost offers the DMatrix data structure, which improves its performance and efficiency. XGBoost can be used in R, Java, C++, and Julia.
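Here is a brief hedged sketch of the DMatrix-based workflow with the native `xgb.train` API; the toy data and parameter values are illustrative assumptions. The snippet that follows uses the scikit-learn-style `XGBRegressor` API instead.

import numpy as np
import xgboost as xgb

# Toy data for illustration; in practice this would be your training set
X_toy = np.random.rand(100, 5)
y_toy = np.random.rand(100)

dtrain = xgb.DMatrix(X_toy, label=y_toy)   # XGBoost's optimized data container
params = {"objective": "reg:squarederror", "max_depth": 4, "eta": 0.1}
booster = xgb.train(params, dtrain, num_boost_round=50)
preds_toy = booster.predict(dtrain)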


import xgboost as xgb

# 'reg:squarederror' replaces the deprecated 'reg:linear' objective
xg_reg = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3,
                          learning_rate=0.1, max_depth=5, alpha=10,
                          n_estimators=10)

xg_reg.fit(X_train, y_train)

preds = xg_reg.predict(X_test)

XGBoost offers the feature importance via the `plot_importance()` function.


import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [5, 5]   # set the figure size before plotting

xgb.plot_importance(xg_reg)

plt.show()

LightGBM

LightGBM is a tree-based gradient boosting algorithm that uses leaf-wise tree growth and not depth-wise growth.

[Figure: leaf-wise tree growth.]

The algorithm can be used for classification and regression problems. LightGBM supports
categorical features via the `categorical_feature` argument. One-hot encoding is not needed
after specifying the categorical columns.
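A minimal hedged sketch of declaring a categorical column through `categorical_feature`; the DataFrame, column names, and values are hypothetical.

import lightgbm as lgb
import pandas as pd

# Hypothetical data with one categorical column
df = pd.DataFrame({
    "city": pd.Categorical(["a", "b", "a", "c"]),
    "size": [1.0, 2.5, 3.1, 0.7],
    "label": [0, 1, 1, 0],
})

# Declare the categorical column instead of one-hot encoding it
train_set = lgb.Dataset(df[["city", "size"]], label=df["label"],
                        categorical_feature=["city"])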

The LightGBM algorithm also has the capacity to deal with null values. This feature can be disabled by setting `use_missing=false`. It uses NA to represent null values; to use zeros, set `zero_as_missing=true`.

The `objective` parameter is used to dictate the type of problem: for example, `binary` for binary classification, `regression` for regression, and `multiclass` for multiclass problems.

When using LightGBM, you'll usually first convert the data into the LightGBM Dataset format.
import lightgbm as lgb

lgb_train = lgb.Dataset(X_train, y_train)

lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

LightGBM also allows you to specify the boosting type. Available options include random forests and traditional gradient boosting decision trees.


params = {'boosting_type': 'gbdt',
          'objective': 'binary',
          'num_leaves': 40,
          'learning_rate': 0.1,
          'feature_fraction': 0.9}

gbm = lgb.train(params,
                lgb_train,
                num_boost_round=200,
                valid_sets=[lgb_train, lgb_eval],
                valid_names=['train', 'valid'])

CatBoost

CatBoost is a depth-wise gradient boosting library developed by Yandex. In CatBoost, a balanced tree is grown using oblivious trees. In these trees, the same feature is used when making the right and left splits at each level of the tree.

[Figure: the same feature is used to make the left and right splits.]

Like LightGBM, CatBoost supports categorical features, training on GPUs, and handling of null values. CatBoost can be used for regression and classification problems.

Setting `plot=True` while training visualizes the training process.


from catboost import CatBoostRegressor

cat = CatBoostRegressor()

cat.fit(X_train,y_train,verbose=False, plot=True)
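As a hedged follow-up, here is a sketch of declaring categorical columns through the `cat_features` argument; the DataFrame, column names, and values are hypothetical.

import pandas as pd
from catboost import CatBoostRegressor

# Hypothetical data frame with one categorical column
df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "red"],
    "weight": [1.2, 3.4, 2.2, 0.9, 2.8, 1.5],
    "price": [10.0, 25.0, 18.0, 7.5, 22.0, 12.0],
})

reg = CatBoostRegressor(iterations=50, verbose=False)
# cat_features declares which columns are categorical; no manual encoding needed
reg.fit(df[["colour", "weight"]], df["price"], cat_features=["colour"])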
