
Nageswarao Datatechs

BOOTSTRAPPING, BAGGING AND BOOSTING

Bootstrapping

Bootstrapping is the creation of subsets from the main set of data points. Each subset is drawn by
sampling with replacement.

Through bootstrapping, it is possible to create many different subsets of the data. Each subset is fed to
one decision tree in the random forest, so every tree receives different data.

We use bootstrapping to improve the accuracy of the final prediction obtained from all the trees.
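
As a minimal sketch, bootstrap sampling can be done with NumPy as below; the toy data and the number of
subsets are assumptions made only for illustration, not part of these notes.

import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)            # a toy "main set" of 10 data points

n_subsets = 3                   # for example, one subset per decision tree
for i in range(n_subsets):
    # sampling WITH replacement: the same point may appear more than once
    idx = rng.choice(len(data), size=len(data), replace=True)
    subset = data[idx]
    print("Subset for tree", i + 1, ":", subset)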

Bagging (Bootstrap Aggregating)

In bagging, we create a number of random subsets of data with replacement. The subsets are created from
the training data.

Each subset of data is fed to a model (or a decision tree) as training data. The model is trained on this
data and its output is obtained. The mean (average) of all the outputs from these models is the
final result (Y).

The main advantage of bagging is that it reduces the variance (overfitting) of the model.
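
A hedged sketch of bagging with scikit-learn's BaggingRegressor is given below; the synthetic dataset is
an assumption for illustration. Its default base model is a decision tree, and its prediction is the
average of the individual trees' outputs.

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split

# synthetic training data (assumed only for the example)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each trained on a random subset drawn with replacement
bagging = BaggingRegressor(n_estimators=50, bootstrap=True, random_state=0)
bagging.fit(X_train, y_train)

# the final result Y for new data is the mean of the 50 trees' outputs
print("R^2 on test data:", bagging.score(X_test, y_test))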

Boosting (AdaBoost, or Adaptive Boosting)

Boosting is an extension of bagging.

We first create a subset of data by randomly selecting data points from the training set, without
replacement. This subset is fed to a model. The model is then tested on the training set. In this
phase, certain data points in the training set may be misclassified.

Now, create a second subset from the training set, again without replacement, and add 50% of the
previously misclassified data points to it. Feed this data to a second model and test that model.

Repeat these steps with several subsets of data and observe how the misclassified data points are
handled by the majority of models. The final prediction is a majority vote: for any data point, the
classification given by the majority of models is taken as the result.

Boosting is useful for achieving better accuracy, but it is more prone to overfitting.
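
A minimal sketch with scikit-learn's AdaBoostClassifier on a synthetic dataset (an assumption for
illustration) is given below. Note that the library emphasizes misclassified points by re-weighting them
rather than by literally copying 50% of them into the next subset, but the idea of making later models
focus on earlier mistakes is the same.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# synthetic training data (assumed only for the example)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 models trained one after another, each focusing on the previous mistakes
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
boosting.fit(X_train, y_train)

# the final classification is a (weighted) vote of the 50 models
print("Accuracy on test data:", boosting.score(X_test, y_test))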



Note: The models in the above discussion (in bagging and boosting) are called ensemble learners. They
may be trees in a random forest, or they may be various types of models such as linear regression, SVM,
logistic regression, etc. Ensemble means 'group'. In ensemble learning, individual models are combined
to produce a model that is more accurate than any of them alone.
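
For example, a group of different model types can be combined by majority vote. The sketch below assumes
scikit-learn's VotingClassifier and a synthetic dataset; both are illustrative choices, not from the notes.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# three different kinds of models acting together as one ensemble
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",   # majority vote across the individual models
)
ensemble.fit(X, y)
print("Training accuracy of the ensemble:", ensemble.score(X, y))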
