14 Model Ensembles
• Weak Learners
• Produce a classifier that is more accurate than random guessing
• Not hard to build weak learners
Ensemble Learning
• Assume each voter is independently correct with probability p
• If p > 1/2, then adding more voters increases the probability that the
majority vote is correct. In the limit, this probability approaches 1
• If p < 1/2, then adding more voters hurts. It is better to use a single voter
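A minimal worked sketch of this argument (the function name majority_correct_prob is made up for illustration): with n independent voters, each correct with probability p, the chance that a strict majority is correct is a binomial tail sum.

```python
from math import comb

def majority_correct_prob(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters,
    each correct with probability p, votes for the right answer."""
    # Sum the binomial tail: at least floor(n/2) + 1 correct votes.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

# With p > 1/2 the probability grows towards 1 as n increases;
# with p < 1/2 it shrinks, so a single voter would be better.
for n in (1, 5, 25, 101):
    print(n, round(majority_correct_prob(n, 0.6), 4), round(majority_correct_prob(n, 0.4), 4))
```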
Majority Vote Classifier
Assumptions
• n classifiers
• Each classifier has an accuracy > 0.5
• Errors are uncorrelated
• Soft voting
• Also takes the predicted class probabilities into account, not just the predicted labels
• Classifiers have to be well calibrated
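A small sketch of hard vs. soft voting, assuming scikit-learn is available; the toy dataset and the choice of base classifiers are arbitrary.

```python
# Hard voting counts predicted labels; soft voting averages predicted class
# probabilities, so it only helps if those probabilities are well calibrated.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

estimators = [("lr", LogisticRegression(max_iter=1000)),
              ("nb", GaussianNB()),
              ("dt", DecisionTreeClassifier(max_depth=3))]

hard = VotingClassifier(estimators, voting="hard").fit(X_tr, y_tr)
soft = VotingClassifier(estimators, voting="soft").fit(X_tr, y_tr)  # uses predict_proba
print("hard voting:", hard.score(X_te, y_te))
print("soft voting:", soft.score(X_te, y_te))
```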
• If one feature f is much stronger than the others, all bagged trees will select f at
the top of the tree. The only difference between trees will be in how the rest
of the sub-tree changes, which might not be that much, so the trees stay
highly correlated
• Solution? The Random Subspace Method, sketched below
Random Subspace Method
[Figure: bagging example, showing the training data, a test sample, and an ensemble prediction with 66% confidence]
Out-of-bag (OOB) error: https://en.wikipedia.org/wiki/Out-of-bag_error
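A minimal from-scratch sketch of the random subspace idea (the class name RandomSubspaceEnsemble and its parameters are made up for illustration): each tree only sees a random subset of the features, so one dominant feature cannot sit at the top of every tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomSubspaceEnsemble:
    """Train each tree on a random subset of the features (random subspace method)."""
    def __init__(self, n_trees=50, n_features=5, seed=0):
        self.n_trees, self.n_features = n_trees, n_features
        self.rng = np.random.default_rng(seed)
        self.members = []  # list of (feature_indices, fitted_tree)

    def fit(self, X, y):
        n_feats = min(self.n_features, X.shape[1])
        for _ in range(self.n_trees):
            # Each tree gets its own random feature subset.
            feats = self.rng.choice(X.shape[1], size=n_feats, replace=False)
            tree = DecisionTreeClassifier().fit(X[:, feats], y)
            self.members.append((feats, tree))
        return self

    def predict(self, X):
        # Majority vote over the trees' predictions (binary 0/1 labels assumed).
        votes = np.stack([tree.predict(X[:, feats]) for feats, tree in self.members])
        return np.round(votes.mean(axis=0)).astype(int)
```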
Random Forest: Bias and Variance
• Increasing the number of models (trees) decreases variance (less
overfitting), as sketched below
• Bias is mostly unaffected, but will increase if the forest becomes too
large (oversmoothing)
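A rough empirical sketch of the variance claim, assuming scikit-learn; the dataset, forest sizes, and number of repetitions are arbitrary. It trains forests of different sizes on resampled training sets and measures how much their predictions on a fixed test set vary across runs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, y_tr, X_te = X[:500], y[:500], X[500:]
rng = np.random.default_rng(0)

for n_trees in (1, 10, 100):
    preds = []
    for rep in range(10):
        # Resample the training set to mimic drawing new training data.
        idx = rng.integers(0, len(X_tr), len(X_tr))
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=rep)
        preds.append(rf.fit(X_tr[idx], y_tr[idx]).predict_proba(X_te)[:, 1])
    # Variance of the predicted probabilities across training runs,
    # averaged over test points: it should shrink as n_trees grows.
    print(n_trees, np.stack(preds).var(axis=0).mean().round(4))
```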
• MoE (Mixture of Experts)
• The weight of each expert depends on the input x
• The gating network forces experts to “specialize” rather than cooperate
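A minimal numpy sketch of the MoE combination rule (the shapes, names, and linear experts are illustrative, not a trained model): a gating network turns the input into softmax weights, and the output is the gate-weighted sum of the expert outputs.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, expert_weights, gate_weights):
    """x: (d,) input; expert_weights: list of (d, k) matrices, one linear expert each;
    gate_weights: (d, n_experts) matrix for the gating network."""
    expert_outputs = np.stack([x @ W for W in expert_weights])  # (n_experts, k)
    gate = softmax(x @ gate_weights)                            # (n_experts,)
    # The gate depends on x, so different experts dominate in different input regions.
    return gate @ expert_outputs                                # (k,)

# Tiny usage example with random parameters (illustrative only).
rng = np.random.default_rng(0)
d, k, n_experts = 4, 3, 2
experts = [rng.normal(size=(d, k)) for _ in range(n_experts)]
gates = rng.normal(size=(d, n_experts))
print(moe_forward(rng.normal(size=d), experts, gates))
```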
Ensemble Learning Limitations
• If classifiers are accurate and diverse, we can push the accuracy of the
ensemble arbitrarily high by combining classifiers
• A more realistic claim: for the data points where classifiers predict with > 50%
accuracy, we can push accuracy arbitrarily high (some data points are just too
hard)
• Shallow trees
• High bias but very low variance (underfitting)
• Keep the low variance, reduce the bias with Boosting (see the sketch after this list)
• Deep trees
• High variance but low bias (overfitting)
• Keep low bias, reduce variance with Bagging
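A sketch of the boosting half of this picture, assuming scikit-learn (the estimator keyword follows recent versions; older ones use base_estimator): AdaBoost combines many shallow, high-bias stumps and drives the bias down.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# A single depth-1 tree (a "stump") underfits: high bias, very low variance.
stump = DecisionTreeClassifier(max_depth=1)
# Boosting keeps the stumps' low variance while reducing the bias.
boosted = AdaBoostClassifier(estimator=stump, n_estimators=200, random_state=0)

print("stump  :", cross_val_score(stump, X, y).mean().round(3))
print("boosted:", cross_val_score(boosted, X, y).mean().round(3))
```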
• If the model overfits (low bias, high variance): combine it with other low-bias
models
• The models need to be different: their individual mistakes must be different
• This is variance reduction, and it can be done with Bagging (sketched below)
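A matching sketch for the bagging half, again assuming scikit-learn: bagging trains many deep, low-bias trees on bootstrap samples and averages their votes to reduce variance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# A single fully grown tree overfits: low bias, high variance.
deep_tree = DecisionTreeClassifier(max_depth=None)
# Bagging trains each deep tree on a bootstrap sample; averaging their votes
# reduces variance while keeping the low bias.
bagged = BaggingClassifier(estimator=deep_tree, n_estimators=200, random_state=0)

print("single deep tree :", cross_val_score(deep_tree, X, y).mean().round(3))
print("bagged deep trees:", cross_val_score(bagged, X, y).mean().round(3))
```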