
Bagging and Boosting in Data Mining

Carolina Ruiz
[email protected] http://www.cs.wpi.edu/~ruiz

Motivation and Background

Problem Definition:

Given: a dataset of instances and a target concept.
Find: a model (e.g., a set of association rules, a decision tree, a neural network) that helps predict the classification of unseen instances.

Difficulties:

The model should be stable (i.e., it shouldn't depend too much on the input data used to construct it).
The model should be a good predictor (difficult to achieve when the input dataset is small).

Two Approaches

Bagging (Bootstrap Aggregating)

Leo Breiman, UC Berkeley

Boosting

Rob Schapire, AT&T Research
Jerry Friedman, Stanford U.

Bagging

Model Creation:

Create bootstrap replicates of the dataset and fit a model to each one.

Prediction:

Average/vote the predictions of the individual models.

Advantages:

Stabilizes unstable methods.
Easy to implement and parallelizable.

Bagging Algorithm

1. Create k bootstrap replicates of the dataset.
2. Fit a model to each of the replicates.
3. Average/vote the predictions of the k models.
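
The three steps above translate directly into a short program. Below is a minimal sketch in Python; it assumes numpy arrays X and y, uses scikit-learn's DecisionTreeClassifier as the base learner (the slides don't prescribe one; decision trees are the classic "unstable" choice), and merges by majority vote for classification:

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, seed=None):
    # Steps 1 and 2: fit one model per bootstrap replicate.
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)  # sample n instances with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Step 3: majority vote over the k models' predictions.
    votes = np.array([m.predict(X) for m in models])  # shape (k, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

Each bootstrap replicate leaves out roughly 37% of the original instances and duplicates others, which is what decorrelates the k models and makes their vote more stable than any single fitted model.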

Boosting

Creating the model:

Construct a sequence of datasets and models such that each dataset weights an instance heavily when the previous model misclassified it.

Prediction:

Merge the models in the sequence.

Advantages:

Improves classification accuracy.

Generic Boosting Algorithm


1. Equally weight all instances in the dataset.
2. For i = 1 to T:
2.1. Fit a model to the current (weighted) dataset.
2.2. Upweight poorly predicted instances.
2.3. Downweight well-predicted instances.
3. Merge the models in the sequence to obtain the final model.
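
The generic algorithm leaves the exact reweighting and merging rules open. The sketch below fills them in with AdaBoost-style choices (exponential weight updates and a vote weighted by each model's accuracy); these are one common instantiation, not something the slide prescribes. It assumes numpy arrays X and y with labels in {-1, +1} and uses decision stumps as the base model:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosting_fit(X, y, T=50):
    n = len(X)
    w = np.full(n, 1.0 / n)                    # step 1: equal weights
    models, alphas = [], []
    for _ in range(T):                         # step 2
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # 2.1: fit to weighted dataset
        pred = stump.predict(X)
        err = w[pred != y].sum()               # weighted training error
        if err == 0 or err >= 0.5:             # perfect or useless model: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)  # model's say in the final vote
        w *= np.exp(-alpha * y * pred)         # 2.2/2.3: up- and downweight
        w /= w.sum()                           # renormalize to a distribution
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def boosting_predict(models, alphas, X):
    # Step 3: merge the sequence by a weighted vote.
    scores = sum(a * m.predict(X) for a, m in zip(alphas, models))
    return np.sign(scores)

Note that w[pred != y].sum() is already the normalized error because the weights sum to 1, and that misclassified instances (where y * pred = -1) are multiplied by a factor greater than 1: exactly the "upweight poorly predicted instances" step.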

Conclusions and References

Boosted naïve Bayes tied for first place in the KDD Cup 1997.

Reference:

John F. Elder and Greg Ridgeway. Combining Estimators to Improve Performance. KDD-99 tutorial notes.
