Ensemble Learning
General ensemble workflow:
– Step 1: From the original training data D, create multiple data sets D_1, D_2, ..., D_{t-1}, D_t.
– Step 2: Build multiple classifiers C_1, C_2, ..., C_{t-1}, C_t.
– Step 3: Combine the classifiers into a single predictor C*.
Why does it work?
– Two main approaches: Bagging and Boosting.

Bagging
Main Assumption:
– Combining many unstable predictors produces a stable ensemble predictor.
– Unstable predictor: small changes in the training data produce large changes in the model (e.g., neural nets, decision trees).
– Stable predictors: SVM (sometimes), nearest neighbor.
Hypothesis Space
– Variable size (nonparametric): can model any function if you use an appropriate predictor (e.g., trees).
The Bagging Algorithm
For m = 1 to M:
– Obtain a bootstrap sample D_m from the training data D.
– Build a model G_m(x) from the bootstrap data D_m.
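A minimal sketch of this loop in Python (assuming a scikit-learn decision tree as the unstable base learner; the function name bag_models is illustrative, not from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # stands in for any unstable base learner

def bag_models(X, y, M=30, seed=0):
    """Fit M models, each on its own bootstrap sample D_m of the training data D."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for m in range(M):
        idx = rng.integers(0, n, size=n)                    # draw n indices with replacement
        G_m = DecisionTreeRegressor().fit(X[idx], y[idx])   # build G_m(x) from D_m
        models.append(G_m)
    return models
```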
The Bagging Model
– Regression: average the model outputs,
  ŷ = (1/M) Σ_{m=1}^{M} G_m(x)
– Classification: vote over the classifier outputs G_1(x), ..., G_M(x).
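A sketch of the two combination rules (assuming a `models` list like the one produced by the training sketch above, and integer-coded class labels for the voting case):

```python
import numpy as np

def bagged_regression(models, X):
    """Regression: y_hat = (1/M) * sum over m of G_m(x)."""
    return np.mean([G_m.predict(X) for G_m in models], axis=0)

def bagged_vote(models, X):
    """Classification: majority vote over G_1(x), ..., G_M(x)."""
    preds = np.stack([G_m.predict(X) for G_m in models])  # shape (M, n_samples)
    # For each sample (column), return the most frequent predicted label.
    return np.array([np.bincount(col.astype(int)).argmax() for col in preds.T])
```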
Bagging Details
Bagging Details 2
Usually set M ≈ 30
– Or use validation data to pick M.
The models G_m(x) need to be unstable
– Usually full-length (or slightly pruned) decision trees.
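For reference, scikit-learn's off-the-shelf bagger can be configured along these lines (a sketch; in older scikit-learn versions the keyword is `base_estimator` rather than `estimator`):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bagger = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=None),  # full-length (unpruned) trees
    n_estimators=30,                                    # M around 30, or pick M on validation data
    bootstrap=True,                                     # sample the training data with replacement
)
# bagger.fit(X_train, y_train); bagger.predict(X_test)
```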
Bagging
Original Data        1   2   3   4   5   6   7   8   9  10
Bagging (Round 1)    7   8  10   8   2   5  10  10   5   9
Bagging (Round 2)    1   4   9   1   2   3   2   7   3   2
Bagging (Round 3)    1   8   5  10   5   5   9   6   3   7
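Each round above is simply a draw of 10 example indices with replacement from the original 10; a short sketch of how such rounds could be generated:

```python
import numpy as np

rng = np.random.default_rng(0)
original = np.arange(1, 11)                                            # examples 1..10
rounds = [rng.choice(original, size=10, replace=True) for _ in range(3)]
# Each round typically repeats some examples and omits others, as in the table above.
```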
Boosting
– Main Assumption:
  Combining many weak predictors produces an ensemble predictor.
  The weak predictors or classifiers need to be stable.
– Hypothesis Space:
  Variable size (nonparametric): can model any function if you use an appropriate predictor (e.g., trees).
Commonly Used Weak Predictor (or classifier)
Boosting
Original Data         1   2   3   4   5   6   7   8   9  10
Boosting (Round 1)    7   3   2   8   7   9   4  10   6   3
Boosting (Round 2)    5   4   9   4   2   5   1   7   4   2
Boosting (Round 3)    4   4   8  10   4   5   4   6   3   4
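Unlike bagging, the boosting rounds are not independent: examples that earlier classifiers get wrong receive higher weight, so they show up more often in later rounds (e.g., example 4 above). A minimal AdaBoost-style sketch, assuming decision stumps as the weak learners and labels coded as -1/+1 (names and details are illustrative, not taken from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, M=3):
    """AdaBoost-style reweighting; y is assumed to be coded as -1 / +1."""
    n = len(X)
    w = np.full(n, 1.0 / n)                              # start with uniform example weights
    stumps, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)           # weighted training error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))  # weight of this weak classifier
        w *= np.exp(-alpha * y * pred)                   # up-weight misclassified examples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas
```

The ensemble prediction is then sign(Σ_m α_m G_m(x)), a weighted vote over the weak classifiers.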