Ensemble Method

Ensemble methods combine multiple models to enhance accuracy, with popular techniques including bagging, boosting, and random forests. Bagging involves averaging predictions from various classifiers, while boosting focuses on weighted votes based on classifier accuracy. Additionally, strategies for handling class-imbalanced datasets include oversampling, under-sampling, and threshold-moving, which are essential for improving classification in scenarios with rare positive examples.


ENSEMBLE METHODS: INCREASING THE ACCURACY

Ensemble methods
 • Use a combination of models to increase accuracy
 • Combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*
Popular ensemble methods
 • Bagging: averaging the predictions over a collection of classifiers
 • Boosting: weighted vote with a collection of classifiers
 • Ensemble: combining a set of heterogeneous classifiers (see the sketch below)
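As a minimal sketch of the last point, a set of heterogeneous classifiers can be combined by majority vote. scikit-learn is assumed to be available; the synthetic dataset and the choice of base models are illustrative, not prescribed by the slides.

```python
# Minimal sketch: combine heterogeneous classifiers M1, M2, M3 into an improved M*
# by majority vote. scikit-learn assumed; dataset and base models are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(max_depth=5)),
    ],
    voting="hard",  # each classifier casts one vote; the most popular class wins
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```

Setting voting="soft" would instead average the predicted class probabilities, which connects this combination scheme to the averaging view used by bagging.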
BAGGING: BOOTSTRAP AGGREGATION
Analogy: Diagnosis based on multiple doctors' majority vote
Training
 • Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample)
 • A classifier model Mi is learned for each training set Di
Classification: classify an unknown sample X
 • Each classifier Mi returns its class prediction
 • The bagged classifier M* counts the votes and assigns the class with the most votes to X (see the sketch below)
Prediction: bagging can also be applied to the prediction of continuous values by taking the average of the individual predictions for a given test tuple
Accuracy
 • Often significantly better than a single classifier derived from D
 • For noisy data: not considerably worse, and more robust
 • Proven improvement in prediction accuracy
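A from-scratch sketch of this procedure, assuming NumPy and scikit-learn; decision trees as the base learner and integer-coded class labels are illustrative assumptions.

```python
# Bagging sketch: sample d tuples with replacement, learn one classifier per
# bootstrap sample, then classify by majority vote. NumPy/scikit-learn assumed.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, random_state=0):
    rng = np.random.default_rng(random_state)
    d = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)  # bootstrap: d tuples drawn with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.array([Mi.predict(X) for Mi in models])  # shape (k, n_samples)
    # majority vote per test tuple; assumes integer-coded class labels
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

For numeric prediction, bagging_predict would instead return votes.mean(axis=0), i.e., the average of the individual predictions for each test tuple.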
BOOSTING
Analogy: Consult several doctors and combine their weighted diagnoses, where each weight is assigned based on that doctor's previous diagnostic accuracy
How boosting works:
 • Weights are assigned to each training tuple
 • A series of k classifiers is iteratively learned
 • After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training tuples that were misclassified by Mi
 • The final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy
The boosting algorithm can be extended for numeric prediction (see the usage sketch below)
Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data
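As a quick usage sketch (scikit-learn assumed; the synthetic datasets and parameters are illustrative), boosted ensembles are available both for classification and, per the note above, for numeric prediction:

```python
# Hedged usage sketch: boosting for classification and for numeric prediction.
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor

Xc, yc = make_classification(n_samples=300, random_state=0)
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(Xc, yc)  # weighted-vote ensemble

Xr, yr = make_regression(n_samples=300, noise=10.0, random_state=0)
reg = AdaBoostRegressor(n_estimators=50, random_state=0).fit(Xr, yr)   # boosting for numeric prediction
```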
ADABOOST (FREUND AND SCHAPIRE, 1997)

Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
Initially, all tuple weights are set to the same value, 1/d
Generate k classifiers in k rounds. At round i:
 • Tuples from D are sampled (with replacement) to form a training set Di of the same size
 • Each tuple's chance of being selected is based on its weight
 • A classification model Mi is derived from Di
 • Its error rate is calculated using Di as a test set
 • If a tuple is misclassified, its weight is increased; otherwise it is decreased

Error rate: err(Xj) is the misclassification error of tuple Xj (1 if Xj is misclassified, 0 otherwise). Classifier Mi's error rate is the sum of the weights of the misclassified tuples:

    error(Mi) = Σj wj · err(Xj)   (summed over the d tuples)

The weight of classifier Mi's vote is

    log((1 - error(Mi)) / error(Mi))

(A from-scratch sketch of this weight bookkeeping follows.)
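A minimal from-scratch sketch of the weight bookkeeping described above, assuming NumPy and scikit-learn; decision stumps as base learners and binary 0/1 class labels are illustrative assumptions, not requirements of the algorithm.

```python
# AdaBoost sketch: tuple weights start at 1/d, rounds sample according to the
# weights, and each classifier's vote is weighted by log((1-error)/error).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, k=10, random_state=0):
    rng = np.random.default_rng(random_state)
    d = len(X)
    w = np.full(d, 1.0 / d)                       # initially all tuple weights are 1/d
    models, alphas = [], []
    for _ in range(k):
        idx = rng.choice(d, size=d, replace=True, p=w)     # sample Di according to weights
        Mi = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        miss = Mi.predict(X) != y
        error = np.sum(w[miss])                   # sum of weights of misclassified tuples
        if error == 0 or error >= 0.5:            # skip degenerate rounds in this sketch
            continue
        alpha = np.log((1 - error) / error)       # weight of classifier Mi's vote
        w[~miss] *= error / (1 - error)           # decrease weights of correctly classified tuples
        w /= w.sum()                              # renormalize; misclassified weights rise in relative terms
        models.append(Mi)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X, classes=(0, 1)):
    # each class accumulates the vote weights of the classifiers that predict it
    # (binary 0/1 labels assumed here for simplicity)
    scores = np.zeros((len(X), len(classes)))
    for Mi, alpha in zip(models, alphas):
        pred = Mi.predict(X)
        for c_idx, c in enumerate(classes):
            scores[pred == c, c_idx] += alpha
    return np.array(classes)[scores.argmax(axis=1)]
```

The update multiplies the weights of correctly classified tuples by error(Mi)/(1 - error(Mi)) and then renormalizes, which is one common way to realize "increase misclassified weights, decrease the rest."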
RANDOM FOREST (BREIMAN, 2001)
Random Forest:
 • Each classifier in the ensemble is a decision tree classifier, generated using a random selection of attributes at each node to determine the split
 • During classification, each tree votes and the most popular class is returned
Two methods to construct a Random Forest:
 • Forest-RI (random input selection): at each node, randomly select F attributes as candidates for the split at that node; the CART methodology is used to grow the trees to maximum size
 • Forest-RC (random linear combinations): creates new attributes (or features) that are linear combinations of the existing attributes, which reduces the correlation between individual classifiers
Comparable in accuracy to AdaBoost, but more robust to errors and outliers
Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting (see the usage sketch below)
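A brief usage sketch in the Forest-RI spirit, assuming scikit-learn; here max_features plays the role of F, the number of attributes considered as split candidates at each node, and the dataset and parameters are illustrative.

```python
# Random forest sketch: many randomized trees vote, the most popular class wins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    max_features="sqrt",  # F: random subset of attributes tried at each split
    random_state=0,
).fit(X, y)
print(forest.predict(X[:5]))  # each tree votes; the most popular class is returned
```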
CLASSIFICATION OF CLASS-IMBALANCED DATA SETS
Class-imbalance problem: rare positive examples but numerous negative ones, e.g., medical diagnosis, fraud detection, oil spills, equipment faults
Traditional methods assume a balanced class distribution and equal error costs: not suitable for class-imbalanced data
Typical methods for imbalanced data in two-class classification:
 • Oversampling: re-sample data from the positive class
 • Under-sampling: randomly eliminate tuples from the negative class
 • Threshold-moving: move the decision threshold, t, so that the rare-class tuples are easier to classify, leaving less chance of costly false-negative errors
 • Ensemble techniques: combine multiple classifiers, as introduced above (a sketch of oversampling and threshold-moving follows)
The class-imbalance problem remains difficult on multiclass tasks
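A minimal sketch of oversampling plus threshold-moving, assuming NumPy and scikit-learn; the 95%/5% synthetic data, the logistic-regression model, and the threshold t = 0.2 are all illustrative choices rather than recommendations.

```python
# Class-imbalance sketch: randomly oversample the rare positive class, then
# move the decision threshold below 0.5 to reduce false negatives.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced toy data: roughly 5% positive (rare) class.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

def random_oversample(X, y, positive=1, random_state=0):
    rng = np.random.default_rng(random_state)
    pos = np.where(y == positive)[0]
    neg = np.where(y != positive)[0]
    # duplicate positive tuples until both classes have equal size
    # (assumes the positive class is the rare one)
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = rng.permutation(np.concatenate([neg, pos, extra]))
    return X[idx], y[idx]

Xb, yb = random_oversample(X, y)
model = LogisticRegression(max_iter=1000).fit(Xb, yb)

# Threshold-moving: lower the decision threshold t (default 0.5) so rare-class
# tuples are easier to classify as positive; t = 0.2 is illustrative.
y_hat = (model.predict_proba(X)[:, 1] >= 0.2).astype(int)
print(y_hat.sum(), "tuples predicted positive")
```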
