Boosting
Boosting is a method used in machine learning to reduce errors in predictive data analysis. Data scientists train machine learning software, called machine learning models, on labeled data to make predictions about unlabeled data. A single machine learning model might make prediction
errors depending on the accuracy of the training dataset. For example, if a cat-identifying model
has been trained only on images of white cats, it may occasionally misidentify a black cat.
Boosting tries to overcome this issue by training multiple models sequentially to improve the
accuracy of the overall system.
Boosting improves machine learning models' predictive accuracy and performance by converting multiple weak learners into a single strong learning model. Machine learning models can be weak learners or strong learners:
Weak learners
Weak learners have low prediction accuracy, only slightly better than random guessing. They are prone to overfitting, that is, they can't classify data that varies too much from their original dataset. For
example, if you train the model to identify cats as animals with pointed ears, it might fail to
recognize a cat whose ears are curled.
Strong learners
Strong learners have higher prediction accuracy. Boosting converts a system of weak learners
into a single strong learning system. For example, to identify a cat image, it might combine a weak learner that checks for pointy ears with another that checks for cat-shaped eyes. After analyzing the image for pointy ears, the system analyzes it once again for cat-shaped eyes. This improves the system's overall accuracy.
Boosting creates an ensemble model by combining several weak decision trees sequentially. It assigns a weight to the output of each individual tree. Then it gives the samples that the first decision tree misclassified a higher weight and passes them as input to the next tree. After numerous cycles, the boosting method combines these weak rules into a single powerful prediction rule.
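To make this idea concrete, the sketch below trains a boosted ensemble of weak decision trees with scikit-learn and compares it with a single weak tree. The library, the synthetic dataset, and the parameter values are illustrative assumptions, not part of the method itself.

# Illustrative sketch (assumes scikit-learn): a boosted ensemble of weak
# decision trees versus a single weak tree on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single depth-1 tree (a "stump") is a weak learner on its own.
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# AdaBoostClassifier trains 200 such stumps sequentially; its default base
# learner is a depth-1 decision tree, and each round re-weights the samples
# that earlier stumps got wrong.
boosted = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("single weak learner accuracy:", stump.score(X_test, y_test))
print("boosted ensemble accuracy:   ", boosted.score(X_test, y_test))

On most runs the boosted ensemble clearly outperforms the lone stump, which is the point of combining weak rules into one strong rule.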
Boosting compared to bagging
Boosting and bagging are two common ensemble methods that improve prediction accuracy.
The main difference between these learning methods is the method of training. In bagging, data
scientists improve the accuracy of weak learners by training several of them at once on multiple
datasets. In contrast, boosting trains weak learners one after another.
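The sketch below contrasts the two training styles in scikit-learn: the bagging ensemble fits its trees independently and can do so in parallel, while the boosting ensemble fits them one after another. The library, dataset, and settings are illustrative assumptions.

# Illustrative sketch (assumes scikit-learn): bagging trains its trees
# independently, so the work can be parallelized; boosting trains them
# sequentially, because each round depends on the previous one.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Bagging: 100 decision trees, each fit on a bootstrap sample, trained
# independently of one another (n_jobs=-1 uses every available CPU core).
bagging = BaggingClassifier(n_estimators=100, n_jobs=-1, random_state=0)

# Boosting: 100 weak trees trained one after another; the rounds themselves
# cannot run concurrently, so there is no n_jobs option for them.
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean cross-validated accuracy {scores.mean():.3f}")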
How does boosting work?
Step 1
The boosting algorithm assigns equal weight to each data sample. It feeds the data to the first machine learning model, called the base algorithm. The base algorithm makes predictions for each data sample.
Step 2
The boosting algorithm assesses the model's predictions and increases the weight of samples with larger errors. It also assigns each model a weight based on its performance, so a model that outputs excellent predictions has a large influence over the final decision.
Step 3
The algorithm passes the weighted data to the next decision tree.
Step 4
The algorithm repeats steps 2 and 3 until the training errors fall below a certain threshold.
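The sketch below maps these four steps onto code. It follows the classic AdaBoost-style weight update; the dataset, the number of rounds, and the 1% error threshold are illustrative assumptions rather than fixed parts of the algorithm.

# Illustrative sketch: an AdaBoost-style loop that follows steps 1 through 4.
# Assumes scikit-learn; dataset, round count, and threshold are arbitrary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)              # labels in {-1, +1} simplify the update rule

models, influences = [], []
weights = np.full(len(X), 1.0 / len(X))  # Step 1: every sample starts with equal weight

for _ in range(20):
    tree = DecisionTreeClassifier(max_depth=1)
    tree.fit(X, y, sample_weight=weights)        # the base learner sees the weighted data
    pred = tree.predict(X)

    # Step 2: measure the weighted error and give the model an influence score;
    # accurate models earn a larger say in the final decision.
    err = np.clip(np.sum(weights[pred != y]), 1e-10, 1 - 1e-10)
    influence = 0.5 * np.log((1 - err) / err)

    # Step 2, continued: misclassified samples receive a higher weight...
    weights *= np.exp(-influence * y * pred)
    weights /= weights.sum()
    # ...and Step 3: this re-weighted data is what the next tree trains on.

    models.append(tree)
    influences.append(influence)

    # Step 4: stop once the ensemble's training error drops below the threshold.
    ensemble = np.sign(sum(a * m.predict(X) for m, a in zip(models, influences)))
    if np.mean(ensemble != y) < 0.01:
        break

print("rounds used:", len(models), "- training error:", np.mean(ensemble != y))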
What are the types of boosting?
The following are the three main types of boosting:
Adaptive boosting
Adaptive Boosting (AdaBoost) was one of the earliest boosting models developed. It adapts and
tries to self-correct in every iteration of the boosting process.
AdaBoost initially gives the same weight to every data sample. Then, it automatically adjusts the
weights of the data points after every decision tree. It gives more weight to incorrectly classified
items to correct them for the next round. It repeats the process until the residual error, or the
difference between actual and predicted values, falls below an acceptable threshold.
You can use AdaBoost with many kinds of predictors, and it is typically not as sensitive to hyperparameter tuning as other boosting algorithms. However, this approach does not work well when there is correlation among features or high data dimensionality. Overall, AdaBoost is a suitable type of boosting for classification problems.
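One way to see this round-by-round self-correction is to look at the ensemble's predictions after each boosting iteration. The sketch below uses scikit-learn's AdaBoostClassifier and its staged_predict method; the dataset and the number of rounds are illustrative.

# Illustrative sketch (assumes scikit-learn): watch AdaBoost's test error
# shrink as more weak learners are added to the ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = AdaBoostClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting round.
for rounds, pred in enumerate(model.staged_predict(X_test), start=1):
    if rounds in (1, 10, 50, 100):
        error = (pred != y_test).mean()
        print(f"after {rounds:3d} weak learners: test error {error:.3f}")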
Gradient boosting
Gradient Boosting (GB) is similar to AdaBoost in that it, too, is a sequential training technique.
The difference between AdaBoost and GB is that GB does not give incorrectly classified items more weight. Instead, GB optimizes a loss function: it generates base learners sequentially, fitting each new learner to the remaining errors (the gradient of the loss) of the current ensemble, so the combined model becomes more effective with every round. Because it minimizes the loss directly rather than re-weighting samples, GB can lead to more accurate results. Gradient Boosting can help with both classification and regression-based problems.
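The sketch below applies gradient boosting to a regression problem with scikit-learn's GradientBoostingRegressor. With the default squared-error loss, each new tree is fit to the residuals of the current model; the dataset and hyperparameter values shown are illustrative assumptions.

# Illustrative sketch (assumes scikit-learn): gradient boosting on a
# regression task, where each new tree corrects the current residual errors.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=300,      # number of sequential trees
    learning_rate=0.05,    # how strongly each new tree corrects the current model
    max_depth=3,           # keep the individual learners weak
    random_state=0,
).fit(X_train, y_train)

print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))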
Extreme gradient boosting
Extreme Gradient Boosting (XGBoost) improves gradient boosting for computational speed and scale in several ways. XGBoost uses multiple CPU cores so that the construction of each tree can be parallelized during training. It is a boosting algorithm that can handle extensive datasets, making it
attractive for big data applications. The key features of XGBoost are parallelization, distributed
computing, cache optimization, and out-of-core processing.
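The sketch below shows a typical XGBoost setup that leans on those features. It assumes the xgboost Python package is installed, and the dataset and parameter values are illustrative.

# Illustrative sketch (assumes the xgboost package): parallel tree
# construction across CPU cores and histogram-based splits for scale.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=6,
    tree_method="hist",   # histogram-based split finding for large datasets
    n_jobs=-1,            # build each tree using all CPU cores
    random_state=0,
).fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))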
What are the benefits of boosting?
Boosting offers the following major benefits:
Ease of implementation
Boosting has easy-to-understand and easy-to-interpret algorithms that learn from their mistakes.
These algorithms typically require little data preprocessing, and many implementations have built-in routines to handle missing data. In addition, most languages have libraries for implementing boosting algorithms, with many parameters that can be used to fine-tune performance.
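As one concrete example, scikit-learn's histogram-based gradient boosting estimator accepts missing values directly, so the sketch below trains on data with gaps and no imputation step. The dataset, the fraction of missing values, and the parameter choices are illustrative assumptions.

# Illustrative sketch (assumes scikit-learn): HistGradientBoostingClassifier
# handles missing values natively, so no imputation step is needed here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Randomly blank out 10% of the feature values to simulate missing data.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = HistGradientBoostingClassifier(
    max_iter=200,          # number of boosting iterations
    learning_rate=0.1,     # one of several parameters available for tuning
    random_state=0,
).fit(X_train, y_train)

print("test accuracy with missing values:", model.score(X_test, y_test))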
Reduction of bias
Boosting combines multiple weak learners sequentially, with each new learner correcting its predecessor's mistakes. This iterative approach helps reduce the high bias that simple models often have.
Computational efficiency
Boosting algorithms prioritize features that increase predictive accuracy during training. They
can help to reduce data attributes and handle large datasets efficiently.
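For example, a fitted gradient boosting model exposes per-feature importance scores, which you can use to drop attributes that contribute little. The sketch below assumes scikit-learn; the dataset and the number of features kept are illustrative.

# Illustrative sketch (assumes scikit-learn): inspect which features the
# boosted trees relied on, and keep only the most useful ones.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, random_state=0
)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

importances = model.feature_importances_
keep = np.argsort(importances)[::-1][:5]        # indices of the 5 most useful features
print("most important feature indices:", keep)
X_reduced = X[:, keep]                          # smaller dataset for later training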
What are the challenges of boosting?
The following are common limitations of boosting:
Vulnerability to outlier data
Boosting models are vulnerable to outliers, or data values that are different from the rest of the dataset. Because each model attempts to correct the faults of its predecessor, outliers can skew results significantly.
Real-time implementation
You might also find it challenging to use boosting for real-time implementation because the
algorithm is more complex than other processes. Boosting methods have high adaptability, so
you can use a wide variety of model parameters that immediately affect the model's performance.
Bagging compared with boosting, point by point:
Data sampling: Bagging draws training data subsets randomly, with replacement, from the whole training dataset. Boosting builds each new subset around the samples that previous models misclassified.
Goal: Bagging attempts to tackle the overfitting issue, while boosting tries to reduce bias.
When to apply: Use bagging when the classifier is unstable (high variance); use boosting when the classifier is steady and straightforward (high bias).
Model weighting: In bagging, every model receives an equal weight; in boosting, models are weighted by their performance.
Objective: Bagging aims to decrease variance, not bias; boosting aims to decrease bias, not variance.
Combining predictions: Bagging is the simplest way of combining predictions that belong to the same type; boosting combines predictions that can belong to different types.
Model construction: In bagging, every model is constructed independently; in boosting, new models are affected by the performance of previously developed models.