14 - Ensemble Methods
ADF
05/12/2025
Outline
Decision Trees
Random Forest
Intuitive to understand
Bias-Variance Tradeoff
– Bias: the representational power of decision trees
– Variance: reliable estimation requires a sample size exponential in the tree depth
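To make the variance point concrete: a full binary tree of depth d has up to 2^d leaves, so the number of samples needed to populate every leaf grows exponentially with depth. A minimal back-of-the-envelope sketch (the `samples_per_leaf` constant is an illustrative assumption, not from the slides):

```python
def min_samples_for_depth(depth, samples_per_leaf=5):
    # A full binary tree of depth d has 2**d leaves; to estimate each
    # leaf's label reliably we need a few samples per leaf, so the
    # required sample size grows exponentially with depth.
    return (2 ** depth) * samples_per_leaf

for d in (4, 8, 16):
    print(d, min_samples_for_depth(d))  # → 80, 1280, 327680
```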
[Figure: bootstrapping — rows of the original dataset (e.g., (1, 2), (2, 1), (3, 4), (4, 4)) are sampled with replacement to form several bootstrapped datasets]
Repeat for the next step: randomly select a feature from the remaining set.
Build the tree as usual, but only consider a random subset of features at each step.
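The two sources of randomness described above can be sketched in a few lines; the function names and the toy rows (taken from the slide's table) are illustrative:

```python
import random

def bootstrap_sample(dataset, rng):
    # Draw len(dataset) rows with replacement: some rows appear more
    # than once, others are left out entirely.
    return [rng.choice(dataset) for _ in dataset]

def random_feature_subset(n_features, k, rng):
    # At each split, only k of the n features are considered.
    return rng.sample(range(n_features), k)

rng = random.Random(0)
data = [(1, 2), (2, 1), (3, 4), (4, 4)]  # toy rows, as in the slide
print(bootstrap_sample(data, rng))
print(random_feature_subset(n_features=4, k=2, rng=rng))
```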
Improve accuracy
– Incorporating more diversity among trees reduces variance
Improve efficiency
– Searching among subsets of features is much faster than
searching among the complete set
Learning as Encoding
– Makes no assumptions about the true model, neither a parametric
form nor a free form
– Does not prefer one base model over another; it simply averages them
Prediction
– Simple averaging over multiple trees
For a continuous feature (e.g., B3), the split threshold is chosen at random (e.g., 0.6).
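One simple way to pick such a random threshold is to draw uniformly between the observed minimum and maximum of the feature; this sketch is an assumption about the mechanism, and the sample values for B3 are made up for illustration:

```python
import random

def random_threshold(values, rng):
    # For a continuous feature, pick the split point uniformly at
    # random between the observed minimum and maximum.
    lo, hi = min(values), max(values)
    return lo + rng.random() * (hi - lo)

rng = random.Random(42)
b3 = [0.1, 0.9, 0.4, 0.7]  # hypothetical observed values of feature B3
t = random_threshold(b3, rng)
print(t)  # always lies between 0.1 and 0.9
```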
Random Decision Tree
Potential Advantages
– Training can be very efficient, particularly for very large
datasets.
– No cross-validation-based parameter estimation, as required by
some parametric methods.
– Natural multi-class probability.
– Imposes very few assumptions on the structure of the model.
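The "natural multi-class probability" point can be illustrated: if each tree reports the class frequencies of the leaf an input reaches, averaging those distributions across trees yields a probability vector with no extra calibration. A minimal sketch with made-up leaf distributions:

```python
def average_class_proba(leaf_dists):
    # Each tree reports the class frequencies of the leaf the input
    # reaches; averaging across trees gives a multi-class probability
    # with no extra calibration step.
    n_trees, n_classes = len(leaf_dists), len(leaf_dists[0])
    return [sum(d[c] for d in leaf_dists) / n_trees for c in range(n_classes)]

# Two hypothetical trees' leaf distributions over three classes:
print(average_class_proba([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]))  # → [0.75, 0.25, 0.0]
```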