Classification Through Ensembling Techniques
Suppose you have created a short movie and want feedback on it before releasing it. One
option is to ask one of your friends to rate the movie for you.
Now it’s entirely possible that the person you have chosen loves you very much and
doesn’t want to break your heart by providing a 1-star rating to the horrible work you
have created. Asking a larger, more diverse group of people and combining their answers
gives a far more reliable verdict, and that is exactly the idea behind ensemble learning.
Max Voting:
The max voting method is generally used for classification problems.
In this technique, multiple models are used to make predictions for each data
point, and the prediction made by each model counts as a ‘vote’.
The class predicted by the majority of the models is used as the final
prediction.
For example, suppose you ask 5 of your colleagues to rate your movie (out of 5), and
three of them rate it a 4 while the other two give it a 5. Since the majority gave a rating
of 4, the final rating is taken as 4. You can consider this as taking the mode of all the
predictions.
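As a minimal sketch in Python, assuming the five hypothetical colleague ratings from the example above, the max vote is simply the mode of the predictions:

from collections import Counter

# Hypothetical votes: the five colleague ratings from the example above
votes = [4, 4, 4, 5, 5]

# Max voting takes the mode of all predictions as the final prediction
final_rating = Counter(votes).most_common(1)[0][0]
print(final_rating)  # -> 4

For model-level voting, scikit-learn's VotingClassifier with voting='hard' applies the same idea to the predictions of several fitted classifiers.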
Averaging:
Similar to the max voting technique, multiple predictions are made for each data
point in averaging.
In this method, we take an average of predictions from all the models and use it to
make the final prediction.
Averaging can be used for making predictions in regression problems or while
calculating probabilities for classification problems.
For example, with the same five colleague ratings as before, the averaging method would
take the average of all the values, i.e. (5 + 4 + 5 + 4 + 4) / 5 = 4.4.
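A correspondingly small sketch, again assuming the hypothetical ratings above:

# Hypothetical ratings from the example above
ratings = [5, 4, 5, 4, 4]

# Averaging takes the mean of all predictions as the final prediction
final_rating = sum(ratings) / len(ratings)
print(final_rating)  # -> 4.4

For models rather than people, scikit-learn's VotingRegressor (or a soft-voting classifier that averages predicted probabilities) works the same way.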
Stacking:
Stacking is an ensemble learning technique that uses predictions from multiple models
(for example decision tree, knn or svm) to build a new model. This model is used for
making predictions on the test set. Below is a step-wise explanation for a simple stacked
ensemble:
1. The train set is split into 10 parts.
2. A base model (suppose a decision tree) is fitted on 9 parts and predictions are
made for the 10th part. This is done for each part of the train set.
3. The base model (in this case, decision tree) is then fitted on the whole train
dataset.
4. Using this model, predictions are made on the test set.
5. Steps 2 to 4 are repeated for another base model (say knn) resulting in another
set of predictions for the train set and test set.
6. The predictions from the train set are used as features to build a new model.
7. This model is used to make the final predictions on the test set, using the base
models’ test-set predictions as features.
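A minimal sketch of the stacked ensemble described above, assuming an illustrative synthetic dataset and scikit-learn's StackingClassifier, which builds out-of-fold predictions from the base models and fits a meta-model on them, much like steps 1 to 7:

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data (an assumption, not from the text)
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stack = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier()), ('knn', KNeighborsClassifier())],
    final_estimator=LogisticRegression(),  # the new model built on base predictions
    cv=10,                                 # 10 parts, as in steps 1 and 2
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))         # final predictions on the test set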
Blending:
Blending follows the same approach as stacking but uses only a holdout (validation) set
from the train set to make predictions. In other words, unlike stacking, the predictions
are made on the holdout set only. The holdout set and the predictions are used to build a
model which is run on the test set. Here is a detailed explanation of the blending process:
1. The train set is split into a training set and a validation (holdout) set.
2. Base models are fitted on the training set.
3. Predictions are made on the validation set and on the test set.
4. The validation set and its predictions are used as features to build a new model.
5. This model is used to make the final predictions on the test set, using the test-set
predictions as meta-features.
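A minimal sketch of blending, assuming the same illustrative synthetic dataset and base models as in the stacking sketch:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Step 1: split the train set into training and validation (holdout) sets
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=42)

base_models = [DecisionTreeClassifier(), KNeighborsClassifier()]
val_preds, test_preds = [], []
for model in base_models:
    model.fit(X_tr, y_tr)                     # Step 2: fit base models on the training part
    val_preds.append(model.predict(X_val))    # Step 3: predict on the validation set...
    test_preds.append(model.predict(X_test))  # ...and on the test set

# Step 4: the validation predictions become meta-features for a new model
meta_model = LogisticRegression()
meta_model.fit(np.column_stack(val_preds), y_val)

# Step 5: final predictions on the test set's meta-features
final_preds = meta_model.predict(np.column_stack(test_preds))
print(accuracy_score(y_test, final_preds))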
Bagging:
The idea behind bagging is combining the results of multiple models (for instance, all
decision trees) to get a generalized result.
Here’s a question: if you create all the models on the same set of data and combine them,
will it be useful? There is a high chance that these models will give the same result, since
they are getting the same input. So how can we solve this problem? One of the techniques
is bootstrapping.
Bootstrapping is a sampling technique in which we create subsets of observations from
the original dataset, with replacement. The size of the subsets is the same as the size of
the original set.
The Bagging (or Bootstrap Aggregating) technique uses these subsets (bags) to get a fair
idea of the distribution of the complete set. The subsets created for bagging may also be
smaller than the original set. The steps are as follows:
1. Multiple subsets are created from the original dataset, selecting observations with
replacement.
2. A base model (weak model) is created on each of these subsets.
3. The models run in parallel and are independent of each other.
4. The final predictions are determined by combining the predictions from all the
models.
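A minimal sketch of bagging, assuming an illustrative synthetic dataset and scikit-learn's BaggingClassifier (note that scikit-learn 1.2+ uses the estimator keyword, while older versions call it base_estimator):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base (weak) model fitted on each subset
    n_estimators=10,                     # number of bootstrap subsets (bags)
    max_samples=0.8,                     # each bag may be smaller than the original set
    bootstrap=True,                      # observations are sampled with replacement
    random_state=42,
)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))         # predictions combined across all the models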
Boosting:
Here is a question: if a data point is incorrectly predicted by the first model, and then by
the next (and probably by all of the models), will combining the predictions provide better
results? Such situations are taken care of by boosting.
Boosting is a sequential process, where each subsequent model attempts to correct the
errors of the previous model, so the succeeding models depend on the previous ones.
Let’s walk through the way boosting works in the steps below.
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset.
5. Errors are calculated using the actual values and the predicted values.
6. The observations which are incorrectly predicted are given higher weights.
7. Another model is created and predictions are made on the dataset. This model tries
to correct the errors of the previous model.
8. Similarly, multiple models are created, each correcting the errors of the previous
model.
9. The final model (strong learner) is the weighted mean of all the models (weak
learners).
The boosting algorithm thus combines a number of weak learners to form a strong
learner. The individual models would not perform well on the entire dataset, but each
works well for some part of it, so each model boosts the performance of the ensemble.
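A minimal sketch of boosting, assuming an illustrative synthetic dataset and scikit-learn's AdaBoostClassifier, which re-weights misclassified observations and combines the weak learners with a weighted vote:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

boost = AdaBoostClassifier(n_estimators=50, random_state=42)
boost.fit(X_train, y_train)         # each new learner focuses on the previous learner's errors
print(boost.score(X_test, y_test))  # final prediction is a weighted combination of weak learners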
This combination of multiple models is called an ensemble. Ensembling commonly uses
two methods:
Bagging: creating different training subsets from the sample training data with replacement
is called bagging. The final output is based on majority voting.
Boosting: combining weak learners into strong learners by creating sequential models,
such that the final model has the highest accuracy, is called boosting. Examples: AdaBoost,
XGBoost.
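A brief sketch of XGBoost, one of the boosting libraries named above, through its scikit-learn-style wrapper; the dataset and parameter values are assumptions chosen for illustration:

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)  # trees are added sequentially, each correcting remaining errors
print(accuracy_score(y_test, model.predict(X_test)))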
Bagging: From the principle mentioned above, we can understand that Random Forest uses
the bagging technique. Now, let us understand this concept in detail. Bagging, also known
as Bootstrap Aggregation, is the ensemble method used by Random Forest. The process
begins with the original random dataset, which is resampled with replacement into subsets
known as bootstrap samples; this process is known as bootstrapping. The models are then
trained on these samples individually, and their individual results are combined in a step
known as aggregation. In the last step, all the results are combined and the final output is
generated based on majority voting. This overall procedure is known as bagging and is
carried out using an ensemble classifier.
Example: Suppose there is a dataset that contains multiple fruit images, and this dataset
is given to the Random Forest classifier. The dataset is divided into subsets and given to
each decision tree. During the training phase, each decision tree produces a prediction
result, and when a new data point arrives, the Random Forest classifier predicts the final
class based on the majority of those results.
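A minimal sketch of this workflow with scikit-learn's RandomForestClassifier, using an illustrative synthetic dataset in place of the fruit-image example:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,  # number of decision trees, each trained on a bootstrap sample
    random_state=42,
)
forest.fit(X_train, y_train)
# For a new data point, each tree votes and the majority class is returned
print(forest.predict(X_test[:1]))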