Ensemble - Part 1
Bootstrap Aggregation (bagging)
• Bootstrap Aggregation (bagging) is an ensemble method that reduces overfitting in
classification and regression problems.
• Random Forest leverages a multitude of decision trees to create a ‘forest’ that is more accurate and
robust than its individual components.
• Each decision tree in the Random Forest makes decisions based on the given data. When these trees
work together, they form a ‘forest’ that can handle complex problems much better than any single tree
could.
• The ‘forest’ created by the Random Forest algorithm is trained through Bagging. Bagging aims to reduce
the variance of models that overfit the training data.
• Bagging combines bootstrapping and aggregation; blending the two words gives us ‘Bagging.’
Bootstrapping is a sampling method in which each sample is drawn from the set with replacement,
making the selection procedure random. Aggregation then combines the trees’ predictions, by
majority vote for classification or by averaging for regression.
[Figure: Random Forest for Classification and Random Forest for Regression]
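A minimal sketch of bagging in practice with scikit-learn's RandomForestClassifier; the dataset, split, and hyperparameters below are illustrative assumptions, not taken from the slides.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative dataset and split (assumptions made for this sketch)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample (drawn with replacement) of the training set;
# the forest aggregates the trees by majority vote for classification.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))

For regression, RandomForestRegressor follows the same recipe but aggregates by averaging the trees' predictions.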
Boosting
• Weak Learners: Simple models that perform only slightly better than random guessing.
• Sequential Training: Models are trained one after the other, with each model learning
from the mistakes of the previous ones.
• Weight Adjustment: Instances that are misclassified by a model are assigned higher weights,
so that subsequent models focus more on these difficult cases.
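As a small illustration of a weak learner, the sketch below scores a depth-1 decision tree (a "stump"); the synthetic dataset and cross-validation settings are assumptions made for this example.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data; on a balanced two-class problem a stump usually scores
# only modestly above the ~50% baseline of random guessing.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
stump = DecisionTreeClassifier(max_depth=1)
print("Stump accuracy:", cross_val_score(stump, X, y, cv=5).mean())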
AdaBoost (Adaptive Boosting)
Algorithm
1. Initialize Weights: Assign equal weights to all training samples.
2. Train Weak Learner: Train a weak learner on the weighted training set.
3. Calculate Error: Calculate the weighted error rate of the weak learner.
4. Update Weights: Increase the weights of misclassified samples and decrease the weights of correctly classified
samples.
5. Repeat: Repeat steps 2-4 for a specified number of iterations or until a stopping criterion is met.
6. Combine Weak Learners: Combine the weak learners into a strong learner by taking a weighted majority vote.
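A minimal from-scratch sketch of steps 1-6, assuming binary labels in {-1, +1} and decision stumps as the weak learners; the function names, round count, and synthetic data are illustrative, not a reference implementation.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                            # Step 1: equal weights
    stumps, alphas = [], []
    for _ in range(n_rounds):                          # Step 5: repeat for n_rounds
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)               # Step 2: train on the weighted set
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)  # Step 3: weighted error
        alpha = 0.5 * np.log((1 - err) / err)          # this learner's vote weight
        w *= np.exp(-alpha * y * pred)                 # Step 4: up-weight mistakes, down-weight hits
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Step 6: weighted majority vote over all weak learners
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)

# Illustrative usage on synthetic data
X, y = make_classification(n_samples=500, random_state=0)
y = np.where(y == 1, 1, -1)                            # labels must be in {-1, +1} here
stumps, alphas = adaboost_fit(X, y)
print("Training accuracy:", np.mean(adaboost_predict(X, stumps, alphas) == y))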
Working
• AdaBoost starts with a simple model and iteratively improves it by focusing on the samples that the previous
model misclassified.
• It assigns higher weights to these misclassified samples, ensuring that subsequent models pay more attention to
them.
• The final prediction is a weighted combination of the predictions from all weak learners.
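The same workflow is available through scikit-learn's AdaBoostClassifier, which uses decision stumps as its default weak learner; the dataset and estimator count below are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)  # 100 boosting rounds
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))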
Gradient Boosting
Algorithm
1. Initialize Model: Initialize the model with a constant value.
2. Calculate Residuals: Calculate the residuals (errors) between the current model's
predictions and the true labels.
3. Train Weak Learner: Train a weak learner to predict the residuals.
4. Update Model: Update the model by adding a scaled version of the weak learner's
predictions.
5. Repeat: Repeat steps 2-4 for a specified number of iterations or until a stopping criterion
is met.
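A minimal from-scratch sketch of this loop for squared-error regression, where the residuals are the negative gradient of the loss; the learning rate, tree depth, and synthetic data are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

def gb_fit(X, y, n_rounds=100, lr=0.1):
    f0 = y.mean()                                  # Step 1: initialize with a constant
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):                      # Step 5: repeat for n_rounds
        residuals = y - pred                       # Step 2: errors of the current model
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)                     # Step 3: weak learner fits the residuals
        pred += lr * tree.predict(X)               # Step 4: add a scaled correction
        trees.append(tree)
    return f0, trees

def gb_predict(X, f0, trees, lr=0.1):
    # Final prediction: initial constant plus the sum of all scaled weak-learner outputs
    return f0 + lr * sum(t.predict(X) for t in trees)

# Illustrative usage on synthetic data
X, y = make_regression(n_samples=500, noise=10.0, random_state=0)
f0, trees = gb_fit(X, y)
print("Training MSE:", np.mean((gb_predict(X, f0, trees) - y) ** 2))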
Working
• Gradient Boosting treats model training as an optimization problem, aiming to minimize a
loss function.
• Each weak learner is trained to predict the residuals of the previous model.
• The final prediction is the initial constant plus the scaled sum of the predictions from all weak learners.
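The equivalent workflow through scikit-learn's GradientBoostingRegressor; the dataset and hyperparameters below are illustrative assumptions.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gbr.fit(X_train, y_train)
print("Test R^2:", gbr.score(X_test, y_test))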