Ensemble Method
Ensemble methods
- Use a combination of models to increase accuracy
- Combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*
Popular ensemble methods
- Bagging: averaging the prediction over a collection of classifiers
- Boosting: weighted vote with a collection of classifiers
- Ensemble: combining a set of heterogeneous classifiers
BAGGING: BOOTSTRAP AGGREGATION
Analogy: Diagnosis based on multiple doctors’ majority vote
Training
- Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., bootstrap)
- A classifier model Mi is learned for each training set Di
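The bootstrap-and-vote loop above can be sketched in a few lines of stdlib Python. The 1-nearest-neighbour base learner `nn_classify` is purely illustrative (not from the slides); any classifier could stand in for it:

```python
import random
from collections import Counter

def nn_classify(train, x):
    """Hypothetical base learner: 1-nearest-neighbour on 1-D points (value, label)."""
    nearest = min(train, key=lambda t: abs(t[0] - x))
    return nearest[1]

def bagging_train(D, k):
    """At each iteration i, sample d tuples with replacement from D (bootstrap)."""
    d = len(D)
    return [random.choices(D, k=d) for _ in range(k)]  # k training sets D1..Dk

def bagging_predict(models, x):
    """Each model Mi votes; the most popular class is returned."""
    votes = [nn_classify(Di, x) for Di in models]
    return Counter(votes).most_common(1)[0][0]

random.seed(0)
D = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.1, "b")]
models = bagging_train(D, k=5)
prediction = bagging_predict(models, 1.0)
```

Each `Di` has the same size d as D but, being drawn with replacement, typically omits some tuples and repeats others; the vote over the k models smooths out the variance of the individual learners.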
BOOSTING (ADABOOST)
- Error rate: err(Xj) is the misclassification error of tuple Xj. Classifier Mi's error rate is the weighted sum over the d tuples it misclassifies:
  error(Mi) = Σj wj × err(Xj)
- The weight of classifier Mi's vote is log((1 − error(Mi)) / error(Mi))
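The two formulas can be checked with a few lines of arithmetic; the tuple weights below are made up for illustration:

```python
import math

# Hypothetical normalized tuple weights w_j and 0/1 misclassification
# indicators err(X_j) for one classifier M_i over d = 4 tuples
w   = [0.25, 0.25, 0.25, 0.25]
err = [0,    1,    0,    0]     # M_i misclassifies only tuple X_2

# error(M_i) = sum_j w_j * err(X_j)  -> weighted sum of misclassified tuples
error_Mi = sum(wj * ej for wj, ej in zip(w, err))

# Weight of M_i's vote: log((1 - error(M_i)) / error(M_i))
vote_weight = math.log((1 - error_Mi) / error_Mi)
```

With one misclassified tuple out of four equally weighted ones, error(Mi) = 0.25 and the vote weight is log(3): a classifier with error below 0.5 gets a positive vote, and the weight grows as its error shrinks.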
RANDOM FOREST (BREIMAN 2001)
Random Forest:
- Each classifier in the ensemble is a decision tree classifier and is generated using a random selection of attributes at each node to determine the split
- During classification, each tree votes and the most popular class is returned
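A compact sketch of the two ideas, random attribute selection at the split and majority voting. The "trees" here are single-split stumps and the helper names (`best_stump`, `grow_tree`) are illustrative, not from the slides:

```python
import random
from collections import Counter

def best_stump(rows, attrs):
    """From the random attribute subset, pick the attribute whose
    mean-value split best matches the class labels; return (attr, threshold)."""
    def split_score(a):
        thr = sum(r[a] for r in rows) / len(rows)
        return sum(1 for r in rows if (r[a] > thr) == (r[-1] == 1))
    a = max(attrs, key=split_score)
    thr = sum(r[a] for r in rows) / len(rows)
    return a, thr

def grow_tree(rows, n_attrs, f):
    """One tree (reduced to a stump): at the node, consider only a
    random selection of f out of n_attrs attributes for the split."""
    attrs = random.sample(range(n_attrs), f)
    return best_stump(rows, attrs)

def forest_predict(forest, x):
    """During classification, each tree votes; return the most popular class."""
    votes = [1 if x[a] > thr else 0 for a, thr in forest]
    return Counter(votes).most_common(1)[0][0]

random.seed(1)
# rows: (attr0, attr1, class); attr0 is informative, attr1 is noise
rows = [(0.1, 0.9, 0), (0.2, 0.1, 0), (0.8, 0.8, 1), (0.9, 0.2, 1)]
forest = [grow_tree(rows, n_attrs=2, f=1) for _ in range(7)]
prediction = forest_predict(forest, (0.85, 0.5))
```

Restricting each split to a random attribute subset decorrelates the trees, so the majority vote cancels much of their individual error; in a real forest each tree is also grown on its own bootstrap sample, as in bagging.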