
Ensemble

Generalization Error
• Components
• Variance: how much models estimated from different training sets
differ from each other
• Bias: how much the average model over all training sets differs
from the true model
– error due to inaccurate assumptions/simplifications/restrictions
made by the model
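
The variance component can be estimated empirically: train the same model class on many resampled training sets and measure how much the resulting predictions disagree. A minimal sketch of this idea, assuming scikit-learn; the dataset, model choice, and number of resamples are illustrative:

```python
# Minimal sketch: empirically estimating the variance component of a model class.
# The dataset, model, and number of resamples are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_eval = X[:100]                                   # fixed points to evaluate predictions on

rng = np.random.default_rng(0)
preds = []
for _ in range(30):                                # 30 different (bootstrapped) training sets
    idx = rng.integers(0, len(X), size=len(X))
    model = DecisionTreeRegressor().fit(X[idx], y[idx])
    preds.append(model.predict(X_eval))

preds = np.array(preds)                            # shape (30, 100)
print("estimated variance component:", preds.var(axis=0).mean())
```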
Why Errors
• Errors in learning are caused by:
– Limited representation (representation bias)
– Limited search (search bias)
– Limited data (variance)
– Limited features (noise)
              Bias   Variance   Train error   Test error
Underfitting   H        L           H             H
Overfitting    L        H           L             H
Underfitting and Overfitting
• Underfitting: model is too “simple” to represent the
relevant class characteristics
– High bias and low variance
– High training error and high test error
• Overfitting: model is too “complex” and fits
irrelevant characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
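
A quick way to see both regimes is to compare a very shallow decision tree with an unrestricted one on the same data. A minimal sketch, assuming scikit-learn; the dataset, depths, and split are illustrative:

```python
# Minimal sketch: underfitting vs. overfitting with decision trees of different depth.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (1, None):                            # 1 = too simple, None = grows until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print("max_depth =", depth,
          "train acc:", round(tree.score(X_tr, y_tr), 3),
          "test acc:", round(tree.score(X_te, y_te), 3))
```

Typically the shallow tree shows high training and test error (underfitting), while the unrestricted tree drives training error to zero but leaves test error noticeably higher (overfitting).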
Solution for low bias error and low variance error: Ensemble
• Ensembles use multiple trained (high variance / low bias) models to average out the variance,
leaving just the bias
• The multiple learning algorithms (classifiers) may use different
– Algorithms
– Hyperparameters
– Representations (modalities)
– Training sets
• Generate a group of base-learners and combine their decisions
• The combined decision is more accurate than the individual classifiers
Ensemble Creation Approaches
• Goal: less correlated errors between models
– Injecting randomness
• initial weights (e.g., NN), different learning parameters, different splits (e.g., DT), etc.
– Different training sets
• Bagging, Boosting, different features, etc.
– Forcing differences
• different objective functions
– Different machine learning models
Ensemble Combination Approaches
• Unweighted voting (e.g. Bagging)
• Weighted voting – based on accuracy (e.g. Boosting), expertise, etc.
• Stacking – learn the combination function
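
These three combination strategies map onto standard library components. A minimal sketch, assuming scikit-learn's VotingClassifier and StackingClassifier; the base models, weights, and dataset are illustrative:

```python
# Minimal sketch of the three combination strategies above, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
base = [("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000))]

unweighted = VotingClassifier(base, voting="hard")                   # unweighted majority vote
weighted = VotingClassifier(base, voting="soft",
                            weights=[2, 1, 1])                       # weighted vote (weights illustrative)
stacked = StackingClassifier(base,
                             final_estimator=LogisticRegression())   # learn the combination function

for clf in (unweighted, weighted, stacked):
    clf.fit(X, y)
    print(type(clf).__name__, round(clf.score(X, y), 3))
```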
Bagging
• Bagging = “bootstrap aggregation”; commonly used with decision trees, neural networks, and support vector machines.
– Bagging is an ensemble learning technique that involves training multiple models on subsets of the original
training dataset, then combining their predictions to make a final prediction.
Bagging is particularly useful for reducing the variance of a model and improving its accuracy.
A set of models is trained on random subsets of the training data:
draw N items from X with replacement.
Each subset is created by randomly selecting observations with replacement from the original training data. This
process is known as bootstrapping.
Because each subset is created from the original data with replacement, some observations may appear multiple
times in a given subset, while others may not appear at all.
Once the models are trained on their respective subsets of data,
◦ they can be combined by taking the average of their predictions for regression tasks,
◦ or by taking the majority vote for classification tasks.
◦ This results in a final prediction that is more robust and accurate than that of any single model in the ensemble.
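
The combination step itself is only a line or two of code. A minimal sketch, assuming the base models' predictions are stacked in a (n_models, n_samples) numpy array; the values are illustrative:

```python
# Minimal sketch of combining predictions, assuming a (n_models, n_samples) array.
import numpy as np

reg_preds = np.array([[2.1, 3.0], [1.9, 3.4], [2.0, 2.9]])      # 3 regressors, 2 samples
final_regression = reg_preds.mean(axis=0)                       # average for regression

clf_preds = np.array([[1, 0], [1, 1], [0, 1]])                  # 3 classifiers, 2 samples
final_classification = np.apply_along_axis(                     # majority vote per sample
    lambda votes: np.bincount(votes).argmax(), 0, clf_preds)

print(final_regression, final_classification)                   # [2.  3.1] [1 1]
```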
Example:
For a dataset with 1000 observations, suppose we want to build a decision tree to predict whether a customer will
purchase a product based on their personal information and past purchase history.
Instead of training a single decision tree on the entire dataset, we can create a bagging ensemble of
decision trees as follows:
1. Randomly select 500 observations (with replacement) from the original dataset.
2. Train a decision tree on this subset of data.
3. Repeat steps 1 and 2 to create a total of 10 decision trees.
4. To make a prediction for a new customer, feed their personal information and past purchase history
into each of the 10 decision trees.
5. Take the average of the 10 predictions to get the final prediction.
The benefit of this approach is that it reduces the variance of the individual decision trees, which can
help to prevent overfitting and improve the accuracy of the predictions. Additionally, since each tree
in the ensemble is trained on a slightly different subset of data, the ensemble is able to capture a
wider range of patterns and relationships in the data than any single decision tree.
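
The example above can be written out directly. A minimal sketch, assuming scikit-learn and a synthetic stand-in for the 1000-customer dataset; with 0/1 labels, averaging the 10 predictions and thresholding at 0.5 is the same as taking a majority vote:

```python
# Minimal sketch of the 10-tree bagging example, with a synthetic stand-in dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(10):                                   # steps 1-3: 10 bootstrapped trees
    idx = rng.integers(0, len(X), size=500)           # 500 observations, drawn with replacement
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

new_customer = X[:1]                                  # stand-in for a new customer's features
votes = np.array([t.predict(new_customer)[0] for t in trees])   # step 4: query every tree
final = int(votes.mean() >= 0.5)                      # step 5: average the 0/1 votes
print("votes:", votes, "final prediction:", final)
```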
Boosting: an iterative procedure that adaptively
changes the distribution of the training data
The goal is to improve the performance of a weak learner by combining multiple instances of it, each time adjusting the weights of the training data to emphasize the
mistakes made by the previous models.
◦ The key idea behind boosting is to focus on the observations that are hardest to predict, and to assign them higher weights so that subsequent models pay closer attention
to them.
◦ By iteratively re-weighting the training data and training new models, the algorithm is able to build a strong classifier that can accurately predict even the most challenging cases.

1. Train a weak decision tree on the entire dataset.
2. Identify the observations that the weak learner misclassified and assign them higher weights for the next iteration of the model.
3. Train a new weak decision tree on the updated weights, with the goal of correctly classifying the misclassified observations from the previous model.
4. Repeat steps 2 and 3 for a set number of iterations, or until the model reaches a desired level of accuracy.
5. Combine the predictions of all the weak models to get the final prediction.
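
A library implementation such as scikit-learn's AdaBoostClassifier automates this loop. A minimal usage sketch with depth-1 trees ("stumps") as the weak learners; the dataset and settings are illustrative:

```python
# Minimal sketch: the boosting loop above via scikit-learn's AdaBoostClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),    # the weak learner ("base_estimator" in older scikit-learn)
    n_estimators=50,                                  # number of boosting rounds
    random_state=0).fit(X_tr, y_tr)
print("test accuracy:", round(boost.score(X_te, y_te), 3))
```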

Initially, all N records are assigned equal weights.
◦ Weights change at the end of each boosting round.
◦ On each iteration t:
◦ weight each training example by how incorrectly it was classified,
◦ learn a hypothesis h_t,
◦ and a strength α_t for this hypothesis.
◦ AdaBoost (Adaptive Boosting) is the classic example of this scheme.
AdaBoost:
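In standard AdaBoost, with labels in {-1, +1}, round t computes the weighted error ε_t of h_t, sets the strength α_t = ½ · ln((1 − ε_t) / ε_t), multiplies each example's weight by exp(−α_t · y_i · h_t(x_i)), and renormalizes; the final classifier is sign(Σ_t α_t h_t(x)). A minimal numpy sketch of one such round; the adaboost_round helper, labels, and predictions are written here purely for illustration:

```python
# Minimal sketch of one AdaBoost weight update (standard formulas; labels in {-1, +1}).
import numpy as np

def adaboost_round(w, y_true, y_pred):
    """Return the hypothesis strength alpha_t and the re-normalized weights."""
    eps = np.sum(w * (y_pred != y_true)) / np.sum(w)   # weighted error of h_t
    alpha = 0.5 * np.log((1 - eps) / eps)              # strength alpha_t
    w = w * np.exp(-alpha * y_true * y_pred)           # up-weight mistakes, down-weight correct cases
    return alpha, w / w.sum()

y_true = np.array([1, 1, -1, -1, 1])
w = np.full(len(y_true), 1 / len(y_true))              # initially, equal weights
alpha, w = adaboost_round(w, y_true, np.array([1, -1, -1, -1, 1]))  # h_t misclassified the 2nd example
print(alpha, w)                                        # the misclassified example now carries more weight
```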
Bagging applications:
In the segmentation of medical images, bagging can be used to create multiple segmentation models, each trained on a different
subset of the data.
The predictions of these models are then combined to generate a final segmentation that is more
accurate and robust than the segmentation generated by a single model.
Bagging is also used in the development of prediction models for disease diagnosis and prognosis. By building multiple
models on different subsets of the data, bagging can help in reducing the variance of the model and
improving the accuracy of the predictions.
Overall, bagging is a powerful technique that can be used in healthcare for various applications,
including medical image analysis, disease diagnosis, and prognosis.
Boosting applications:
In the diagnosis of cancer, boosting can be used to develop models that accurately predict the probability of malignancy
based on various features such as age, family history, and biopsy results.
Boosting can also be used in medical image analysis. For instance, in the segmentation of brain
tumors from MRI scans, boosting can be used to develop models that accurately segment
the tumors from the surrounding healthy tissues.
Moreover, boosting can be used in the development of models for drug discovery and drug
design.
Conclusion:
Bagging involves building multiple models on different subsets of the training data and
combining their predictions to reduce the variance and improve the generalization of the model.
On the other hand, boosting iteratively builds weak learners that focus on the
misclassified samples of the previous iterations and combines their predictions to reduce the
bias and improve the accuracy of the model.
