Bagging and Boosting
Bootstrapping
Bootstrapping is the creation of subsets from the main set of data points. The subsets are created by sampling with replacement, so the same data point may appear more than once in a subset.
Through bootstrapping, it is possible to create many different subsets of the data. Each subset is fed to one decision tree in the random forest, so every tree receives different data.
We use bootstrapping to improve the accuracy of the final prediction combined from all the trees.
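The sampling step above can be sketched as follows. This is a minimal illustration using a hypothetical toy data set of ten points; note that `random.choices` draws with replacement, which is exactly what bootstrapping requires.

```python
import random

random.seed(0)

# Hypothetical training set of 10 data points (for illustration only)
data = list(range(10))

def bootstrap_samples(data, n_subsets):
    """Draw n_subsets bootstrap samples. Each sample has the same size as
    the original data and is drawn WITH replacement, so a point may
    appear more than once in a subset while another point is left out."""
    return [random.choices(data, k=len(data)) for _ in range(n_subsets)]

subsets = bootstrap_samples(data, n_subsets=3)
for s in subsets:
    print(s)
```

Each printed subset would then be handed to one tree of the forest as its training data.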
In bagging, we create a number of subsets of the training data randomly, with replacement.
Each subset is fed to a model (for example, a decision tree) as training data. The model is trained on this data and an output is obtained. The mean (average) of the outputs from all these models is the final result (Y).
The main advantage of bagging is that it reduces the variance (overfitting) of the model.
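The whole bagging loop can be sketched in a few lines. This is a toy example, not a real learner: `train_model` is a hypothetical stand-in that "learns" only the mean of its subset, but the structure (bootstrap subsets, one model per subset, average of the outputs as the final Y) is the bagging procedure described above.

```python
import random
import statistics

random.seed(1)

# Hypothetical training data (for illustration only)
training_data = [2.0, 4.0, 6.0, 8.0]

def train_model(subset):
    """Toy stand-in for a learner such as a decision tree: it simply
    memorises the mean of its bootstrap subset."""
    mean = statistics.mean(subset)
    return lambda x: mean + x  # a crude learned function

# Train one model per bootstrap subset (sampling WITH replacement)
models = []
for _ in range(5):
    subset = random.choices(training_data, k=len(training_data))
    models.append(train_model(subset))

# The final result Y is the average of all model outputs
x = 1.0
predictions = [m(x) for m in models]
final_y = statistics.mean(predictions)
print(final_y)
```

Averaging the five slightly different models is what smooths out the variance of any single one of them.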
Boosting
In boosting, we first create a subset of data by randomly selecting points from the training set, without replacement. This subset is fed to a model, and the model is then tested on the training set. In this phase, certain data points in the training set may be misclassified.
Next, create a second subset from the training set, again without replacement, and add to it 50% of the previously misclassified data points. Feed this data to a second model and test that model.
Repeat these steps with several subsets of data and observe how the misclassified data points are classified by the majority of models. Then take a majority vote: for the overall data, the classification made by the majority of models is taken as the final, more accurate result.
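The rounds above can be sketched as follows. This is only an illustrative interpretation of the procedure, with hypothetical toy data: each weak model is a simple threshold "stump", subsets are drawn without replacement, roughly half of the previously misclassified points are added to the next subset, and the final prediction is a majority vote.

```python
import random
from collections import Counter

random.seed(2)

# Hypothetical 1-D data: (value, label) pairs; the last point is noisy
data = [(-3, 0), (-2, 0), (-1, 0), (1, 1), (2, 1), (3, 1), (-0.5, 1)]

def train_stump(subset):
    """Weak learner: pick the threshold that best separates the subset,
    predicting class 1 for values above the threshold."""
    best_t, best_err = 0.0, float("inf")
    for t in [x for x, _ in subset]:
        err = sum((x > t) != bool(y) for x, y in subset)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: int(x > t)

models = []
misclassified = []
for _ in range(5):
    subset = random.sample(data, k=4)                  # without replacement
    subset += misclassified[: len(misclassified) // 2] # ~50% of prior errors
    stump = train_stump(subset)
    models.append(stump)
    # Test on the full training set to find this round's mistakes
    misclassified = [(x, y) for x, y in data if stump(x) != y]

def predict(x):
    """Majority vote over all trained models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

print([predict(x) for x, _ in data])
```

Because later subsets are biased toward previously misclassified points, later models concentrate on the hard cases, and the vote combines all of them.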
Note: The models in the above discussion (in bagging and boosting) are called ensemble learners. These models may be trees in a random forest, or they may be different types of models such as linear regression, SVM, or logistic regression. Ensemble means 'group'. In ensemble learning, individual models are combined to produce a model that is more accurate than any of them alone.