Ensemble
Generalization Error
• Components
• Variance: how much models estimated from different training sets
differ from each other
[Figure: small test data (50, 50)]
• Bias: how much the average model over all training sets differs
from the true model
– Error due to inaccurate assumptions/simplifications/restrictions
made by the model
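One standard way to make these components precise is the bias-variance decomposition of the expected squared error at a point x. The notation is an assumption and does not come from the slides: f is the true model, \hat f_D is the model estimated from training set D, and y = f(x) + \varepsilon, where the noise \varepsilon has zero mean and variance \sigma^2 and is independent of D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat f_D(x)\big)^2\right]
= \underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\big(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\big)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
```

The first term measures how far the average model is from the true model, the second how much individual models scatter around that average, and the third is error that no model can remove.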
Why Errors Occur
• Errors in learning are caused by:
– Limited representation (representation bias)
– Limited search (search bias)
– Limited data (variance)
– Limited features (noise)
              Bias  Variance  Train error  Test error
Underfitting  H     L         H            H
Overfitting   L     H         L            H
(H = high, L = low)
Underfitting and Overfitting
• Underfitting: model is too “simple” to represent the
relevant class characteristics
– High bias and low variance
– High training error and high test error
• Overfitting: model is too “complex” and fits
irrelevant characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
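The two regimes above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the slides: the data come from a hypothetical target y = sin(x) plus Gaussian noise, the "too simple" model always predicts the training mean, and the "too complex" model is a 1-nearest-neighbour regressor that memorises the training set. All function names are illustrative.

```python
import math
import random

def make_data(n, rng, noise=0.3):
    """Sample n points from y = sin(x) + Gaussian noise on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def fit_constant(xs, ys):
    """Too-simple model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_one_nn(xs, ys):
    """Too-flexible model: copy the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x, train_y = make_data(50, rng)
test_x, test_y = make_data(500, rng)

results = {}
for name, fit in [("constant", fit_constant), ("1-NN", fit_one_nn)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:8s} train MSE = {results[name][0]:.3f}  "
          f"test MSE = {results[name][1]:.3f}")
```

The constant model should show a similarly large error on both sets (underfitting), while 1-NN reproduces its training targets exactly and so has zero training error but a clearly nonzero test error, because it also memorised the noise (overfitting).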
Solution for Low Bias Error and Low Variance Error: Ensemble
• Ensembles use multiple trained (high variance/low bias) models to average out the variance,
leaving just the bias
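The averaging effect can be checked numerically. The sketch below is an idealised illustration, not the slides' method: it assumes each ensemble member is a 1-nearest-neighbour model trained on its own independently drawn training set from a hypothetical target y = sin(x) plus noise, and it compares the spread of a single model's prediction at one query point with the spread of the ensemble average.

```python
import math
import random
import statistics

def one_nn_prediction(rng, x0, n=30, noise=0.5):
    """Draw a fresh training set and return the 1-NN prediction at x0."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise) for x in xs]
    nearest = min(range(n), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

rng = random.Random(1)
x0 = 2.0          # query point (arbitrary choice)
n_models = 25     # ensemble size
n_trials = 300    # repetitions used to estimate mean and variance

# Predictions of a single high-variance model across repeated trials.
single = [one_nn_prediction(rng, x0) for _ in range(n_trials)]

# Predictions of an ensemble that averages n_models such models.
ensemble = [
    statistics.fmean([one_nn_prediction(rng, x0) for _ in range(n_models)])
    for _ in range(n_trials)
]

var_single = statistics.pvariance(single)
var_ensemble = statistics.pvariance(ensemble)
mean_single = statistics.fmean(single)
mean_ensemble = statistics.fmean(ensemble)

print(f"single model: mean = {mean_single:.3f}, variance = {var_single:.4f}")
print(f"ensemble:     mean = {mean_ensemble:.3f}, variance = {var_ensemble:.4f}")
```

The two means should be close (the bias is unchanged), while the ensemble variance should be far smaller. With fully independent members the variance shrinks roughly in proportion to the ensemble size; members trained on overlapping or resampled data are correlated, so the reduction in practice is smaller.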
• The multiple learning algorithms (classifiers) can use different:
– Algorithms
– Hyperparameters
– Representations (Modalities)
– Training sets
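A minimal sketch of such an ensemble, combining members by majority vote. Everything here is an illustrative assumption: the synthetic two-class data, the choice of members (a nearest-centroid classifier, 1-NN, and 5-NN trained on a bootstrap resample), and the function names. It covers three of the four sources of diversity listed above; different representations (modalities) are not shown.

```python
import random
from collections import Counter

def make_blobs(n, rng):
    """Two 2-D Gaussian classes centred at (0, 0) and (2, 2)."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        centre = 2.0 * label
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def fit_knn(train, k):
    """k-nearest-neighbour classifier; k is the hyperparameter."""
    def predict(p):
        nearest = sorted(train, key=lambda item: sq_dist(item[0], p))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def fit_centroid(train):
    """Nearest-centroid classifier: a different algorithm from k-NN."""
    centroids = {}
    for label in {lab for _, lab in train}:
        pts = [p for p, lab in train if lab == label]
        centroids[label] = (sum(p[0] for p in pts) / len(pts),
                            sum(p[1] for p in pts) / len(pts))
    return lambda p: min(centroids, key=lambda lab: sq_dist(centroids[lab], p))

def majority_vote(models, p):
    """Return the label predicted by most of the models."""
    return Counter(m(p) for m in models).most_common(1)[0][0]

def accuracy(predict, data):
    """Fraction of points whose predicted label matches the true label."""
    return sum(predict(p) == label for p, label in data) / len(data)

rng = random.Random(2)
train = make_blobs(200, rng)
test = make_blobs(300, rng)
bootstrap = [rng.choice(train) for _ in train]  # resampled training set

models = [
    fit_centroid(train),      # different algorithm
    fit_knn(train, k=1),      # different hyperparameter
    fit_knn(bootstrap, k=5),  # different training set
]
ensemble = lambda p: majority_vote(models, p)

for name, m in zip(["centroid", "1-NN", "5-NN (bootstrap)"], models):
    print(f"{name:18s} accuracy = {accuracy(m, test):.3f}")
print(f"{'majority vote':18s} accuracy = {accuracy(ensemble, test):.3f}")
```

An odd number of members avoids ties in a two-class vote. Majority voting helps most when the members make different mistakes, which is why the members are deliberately built in different ways.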