33 - Assignment 7 - Implementation of Ensemble Techniques
Aim:
Select an appropriate dataset for a classification/regression problem, implement
the following ensemble techniques, and compare their performances:
1. Stacking
2. Blending
3. Random Forest
Theory:
Ensemble Techniques
There are several main techniques used in ensemble learning:
1. Bagging (Bootstrap Aggregating)
● Creates diversity by generating random samples from the training
data and fitting the same model to each sample.
● Produces a "homogeneous parallel ensemble" of models of the same
type.
● Examples include Random Forests which extend bagging with
decision trees.
2. Boosting
● Follows an iterative process, sequentially training each model on the
errors of the previous model.
● Produces an additive model to progressively reduce the final errors.
● Examples include AdaBoost and Gradient Boosting (a brief AdaBoost sketch is given after this list).
3. Stacking/Blending
● Combines different base models, each trained independently to be
diverse.
● Produces a "heterogeneous parallel ensemble" of different model
types.
● Combines the base models using a meta-model trained on their
outputs.
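Since boosting is only summarized above, a minimal illustrative sketch using scikit-learn's AdaBoostClassifier is given below; the synthetic dataset produced by make_classification is only a placeholder for the dataset chosen for this assignment.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

# Placeholder dataset; replace with the dataset selected for this assignment
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# AdaBoost trains weak learners sequentially, giving more weight to the
# samples that previous learners misclassified
booster = AdaBoostClassifier(n_estimators=100, random_state=42)
booster.fit(X_train, y_train)
print("AdaBoost accuracy:", accuracy_score(y_test, booster.predict(X_test)))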
Other ensemble techniques include:
● Majority voting for classification
● Averaging predictions for regression
● Weighted averaging based on model performance
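The snippet below is a minimal sketch of majority and weighted voting using scikit-learn's VotingClassifier; the particular base models, the weights, and the synthetic dataset are illustrative assumptions rather than prescribed choices.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Placeholder dataset; replace with the dataset selected for this assignment
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hard (majority) voting over three diverse classifiers; the optional weights
# give more influence to models assumed to perform better
voter = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('dt', DecisionTreeClassifier(random_state=42)),
        ('knn', KNeighborsClassifier()),
    ],
    voting='hard',
    weights=[2, 1, 1],
)
voter.fit(X_train, y_train)
print("Voting accuracy:", accuracy_score(y_test, voter.predict(X_test)))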
Applications
Ensemble learning has been successfully applied to a wide range of
machine learning tasks including:
● Classification
● Regression
● Clustering
● Anomaly detection
● Structured prediction
It is particularly effective for improving model performance on noisy,
complex, or imbalanced datasets. Widely used ensemble methods include
random forests, gradient boosting, and stacked models.
Stacking
Stacking, or stacked generalization, is an ensemble learning technique that
combines the predictions of multiple base models (also known as level 0
models) to improve predictive performance. Here’s how it works:
1. Base Models: Different machine learning algorithms are trained on
the same dataset. These models can be of different types (e.g.,
decision trees, support vector machines, etc.) to ensure diversity.
2. Meta-Model: A second-level model, called the meta-model (or level 1
model), is trained on the outputs (predictions) of the base models.
The meta-model learns how to best combine these predictions to
produce a final output.
3. Training Process: Typically, k-fold cross-validation is used to generate
predictions from the base models. Each base model is trained on k-1
folds and validated on the remaining fold, ensuring that the
meta-model is trained on predictions that are not biased by the
training data.
4. Final Prediction: Once the meta-model is trained, it can be used to
make predictions on new data by combining the predictions from the
base models.
Stacking is advantageous because it allows the meta-model to learn the
best way to combine the strengths of various models, potentially leading to
improved accuracy over any single model used alone.
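As a minimal sketch of this procedure, the snippet below uses scikit-learn's StackingClassifier, which internally performs the cross-validated generation of base-model predictions described above; the synthetic dataset and the specific base/meta models are assumptions standing in for the dataset and models chosen for this assignment.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Placeholder dataset; replace with the dataset selected for this assignment
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Diverse level-0 (base) models
base_models = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('svm', SVC(probability=True, random_state=42)),
]

# Level-1 meta-model trained on out-of-fold predictions of the base models
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # k-fold cross-validation keeps base predictions unbiased
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", accuracy_score(y_test, stack.predict(X_test)))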
Blending
Blending is a variation of stacking that simplifies the process by using a
holdout validation set instead of k-fold cross-validation. Here’s how
blending differs from stacking:
1. Training and Validation Split: In blending, the training dataset is split
into two parts: a training set and a validation set. The base models
are trained on the training set.
2. Predictions on Validation Set: Each base model makes predictions on
the validation set. These predictions are then used as features to
train the meta-model.
3. Final Prediction: The meta-model is trained on these predictions and
is then used to make predictions on the test dataset.
Blending is generally faster than stacking because it does not require the
computational overhead of k-fold cross-validation. However, it may be less
robust due to the potential for overfitting on the validation set, especially if
the dataset is small.
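Scikit-learn has no built-in blending estimator, so the following is a hand-rolled sketch of the holdout-based procedure described above; the dataset, split sizes, and model choices are all illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Placeholder dataset; replace with the dataset selected for this assignment
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1. Split the training data into a base-training set and a holdout validation set
X_base, X_val, y_base, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

# 2. Train the base models on the base-training set only
base_models = [DecisionTreeClassifier(random_state=42), SVC(probability=True, random_state=42)]
for model in base_models:
    model.fit(X_base, y_base)

# 3. Use the base models' predictions on the validation set as meta-features
meta_features_val = np.column_stack([m.predict_proba(X_val)[:, 1] for m in base_models])
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_features_val, y_val)

# 4. At prediction time, stack the base models' test-set predictions the same way
meta_features_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])
print("Blending accuracy:", accuracy_score(y_test, meta_model.predict(meta_features_test)))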
Random Forest
Random Forest is an ensemble learning method specifically designed for
classification and regression tasks. It is a type of bagging technique that
uses decision trees as its base learners. Here’s how it works:
1. Bootstrap Sampling: Random Forest creates multiple subsets of the
training data through bootstrapping (random sampling with
replacement). Each subset is used to train a separate decision tree.
2. Feature Randomness: When splitting nodes in each decision tree,
Random Forest randomly selects a subset of features rather than
considering all features. This introduces additional diversity among
the trees.
3. Aggregation: For classification tasks, the final prediction is made
through majority voting among the trees. For regression tasks, the
average of the predictions from all trees is taken.
Random Forest is robust against overfitting due to its ensemble nature and
the randomness introduced in both data sampling and feature selection. It
generally performs well on many datasets and is less sensitive to
hyperparameter tuning compared to other algorithms.
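A minimal sketch with scikit-learn's RandomForestClassifier is shown below; n_estimators and max_features correspond to the bootstrap-sampling and feature-randomness ideas above, and the dataset is again a synthetic placeholder for the one chosen for this assignment.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Placeholder dataset; replace with the dataset selected for this assignment
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 100 trees, each fit on a bootstrap sample; only sqrt(n_features) features
# are considered at each split, adding diversity among the trees
forest = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=42)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))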
Comparison of Stacking, Blending, and Random Forest:

Final Model
● Stacking: trained on predictions of the base models
● Blending: trained on predictions of the base models
● Random Forest: aggregates predictions from multiple trees

Complexity
● Stacking: more computationally intensive due to cross-validation
● Blending: less computationally intensive
● Random Forest: more straightforward; less tuning required
Conclusion:
The differences in model performance can be explained by how each
ensemble method leverages the strengths of its base learners and how
sensitive each method is to hyperparameter settings. Blending, with its
optimal tuning, was able to best capitalize on the strengths of its base
models, while Random Forest’s inherent robustness allowed it to perform
well out of the box. Stacking, however, struggled due to potential
mismatches between its base and meta-models.