Bagging
1. Bootstrap Sampling:
o Bootstrap Sampling involves creating multiple new datasets (called "bootstraps") from the original dataset by sampling with replacement. Because the sampling is with replacement, some data points may appear multiple times in the same bootstrap dataset, while others may not appear at all.
2. Training Models:
o For each bootstrap dataset, a separate model (e.g., a decision tree) is trained. If you create B bootstraps, you'll train B different models, each on its own bootstrap dataset (a minimal sketch of the full procedure follows this list).
3. Making Predictions:
o For Classification: When you need to classify a new observation, you run this observation through all B trained models. Each model makes a prediction, and then a majority voting scheme is used to determine the final class label. If there is a tie in the votes, the tie can be resolved arbitrarily.
o For Regression: For regression tasks, each model makes a prediction, and the
final prediction is the average of all these predictions.
4. Evaluating and Tuning:
o Bagging not only improves accuracy but also allows for the calculation of
standard errors and confidence intervals for the predictions, providing a
measure of uncertainty.
o The number of bootstraps B can be predetermined (e.g., set to 30) or optimized using a separate validation dataset.
5. Why Bagging Works:
o Bagging is particularly effective with models that are unstable. Unstable
models are those where small changes in the data can lead to large changes in
the model (e.g., decision trees). By training multiple models on different
subsets of the data and aggregating their results, bagging reduces variance and
improves overall model performance.
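The following is a minimal sketch of steps 1-3 above, assuming scikit-learn decision trees as the base models and an arbitrary choice of B = 30; the dataset and all variable names are only for illustration.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

B = 30                                   # number of bootstraps / models
n = X_train.shape[0]
models = []

# Steps 1-2: draw B bootstrap samples and train one model on each
for _ in range(B):
    idx = rng.integers(0, n, size=n)     # sample n rows with replacement
    models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Step 3 (classification): majority vote across the B models; with 0/1 labels,
# rounding the mean vote implements the vote (a 15-15 tie resolves to class 0 here)
votes = np.stack([m.predict(X_test) for m in models])
bagged_pred = np.round(votes.mean(axis=0)).astype(int)
print("bagged accuracy:", (bagged_pred == y_test).mean())

# A regression version of this loop would average the B predictions instead, and
# the spread of the per-model predictions would give a rough uncertainty measure
# in the spirit of step 4.

In practice, scikit-learn's BaggingClassifier packages this same loop behind a single estimator.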
Boosting
1. Weighted Data:
o In boosting, each observation in the training data is assigned a weight. Initially, these weights are usually uniform, meaning each observation has an equal weight.
2. Iterative Process:
o Boosting works through multiple iterations or boosting rounds. In each
iteration, a new model is trained using the weighted data, and the weights are
updated based on the performance of the model.
3. Focus on Misclassified Cases:
o Observations that are misclassified by the current model are given higher
weights in the next iteration. This means the model will focus more on the
examples it previously got wrong.
4. Final Model:
o The final model is a weighted combination of all the individual models trained during the boosting process (see the sketch after this list).
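As one concrete instance of these four steps, here is a minimal AdaBoost-style sketch (AdaBoost is a common boosting algorithm; treating it as the algorithm meant above is an assumption). It uses scikit-learn decision stumps as the weak learners; T = 50, the dataset, and all variable names are illustrative choices.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 1, 1, -1)                        # recode labels to {-1, +1}
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

T = 50                                             # number of boosting rounds
n = len(y_tr)
w = np.full(n, 1.0 / n)                            # step 1: uniform initial weights
stumps, alphas = [], []

for _ in range(T):                                 # step 2: iterate
    stump = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr, sample_weight=w)
    pred = stump.predict(X_tr)
    err = np.clip(np.sum(w * (pred != y_tr)), 1e-10, 1 - 1e-10)  # weighted error
    alpha = 0.5 * np.log((1 - err) / err)          # this model's weight in the vote
    w *= np.exp(-alpha * y_tr * pred)              # step 3: upweight misclassified cases
    w /= w.sum()                                   # renormalize the weights
    stumps.append(stump)
    alphas.append(alpha)

# Step 4: the final model is the sign of the weighted sum of the T predictions
scores = sum(a * s.predict(X_te) for a, s in zip(alphas, stumps))
print("boosted accuracy:", (np.sign(scores) == y_te).mean())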
Random Forests
1. Initialization:
o Given a dataset with n observations and N input features, the goal is to build a collection of decision trees to make predictions.
2. Bootstrap Sampling:
o Bootstrap Sample: For each tree, draw a bootstrap sample from the original dataset.
o This means sampling n observations from the original dataset with replacement. Some observations may be repeated, and others may be left out.
3. Building Each Tree:
o Random Feature Selection: For each node in the decision tree, randomly select m features out of the total N features. The number m is typically chosen beforehand and can be a constant like 1, 2, or the recommended floor(log2(N) + 1).
o Best Split: Among the randomly chosen features, select the best one for
splitting the data at that node based on some criterion (e.g., Gini impurity or
information gain).
o Fully Grow Trees: Continue to grow each decision tree to its full extent
without pruning. This means each tree is allowed to develop its structure fully
without trimming any branches.
4. Making Predictions:
o Classification: For a new observation, each tree in the forest votes on the class
label. The final prediction is determined by majority voting among the trees.
o Regression: For regression tasks, the prediction is the average of the predictions from all the trees in the forest (see the sketch after this list).
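A minimal sketch of the four steps above, assuming scikit-learn decision trees: DecisionTreeClassifier's max_features argument performs the per-node random feature selection, and leaving max_depth unset lets each tree grow fully without pruning. B = 100, the dataset, and the variable names are illustrative.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n, N = X_tr.shape
B = 100                                   # number of trees in the forest
m = int(np.floor(np.log2(N) + 1))         # features considered at each split

trees = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)      # step 2: bootstrap sample of the rows
    tree = DecisionTreeClassifier(max_features=m, criterion="gini")  # step 3
    trees.append(tree.fit(X_tr[idx], y_tr[idx]))

# Step 4 (classification): majority vote; with 0/1 labels, rounding the mean vote
# implements the vote (for regression, average the predictions instead)
votes = np.stack([t.predict(X_te) for t in trees])
forest_pred = np.round(votes.mean(axis=0)).astype(int)
print("forest accuracy:", (forest_pred == y_te).mean())

In practice, sklearn.ensemble.RandomForestClassifier bundles these same ingredients (bootstrap samples, per-node feature subsampling, and voting or averaging) into a single estimator.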
Diversity Among Trees: The diversity of the trees is crucial for the effectiveness of
Random Forests. This diversity is achieved through:
o Bootstrap Sampling: Different trees are trained on different subsets of the
data.
o Random Feature Selection: Each tree makes splitting decisions based on a
random subset of features, reducing the correlation between trees.
Robustness and Accuracy: By averaging or voting across many trees, Random Forests typically provide better predictive performance and are less prone to overfitting than individual decision trees (a brief comparison sketch follows).
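One quick way to probe this robustness claim is to compare a single fully grown tree against a forest under cross-validation; the dataset and parameter choices below are illustrative, and the exact scores will vary from dataset to dataset.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("single tree CV accuracy:", cross_val_score(single_tree, X, y, cv=5).mean())
print("random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())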