Ensemble Learning

Ensemble Learning: General Idea

Original training data D

Step 1: Create multiple data sets D1, D2, ..., Dt-1, Dt

Step 2: Build multiple classifiers C1, C2, ..., Ct-1, Ct

Step 3: Combine the classifiers into a single ensemble classifier C*
Why does it work?

• Suppose there are 25 base classifiers
– Each classifier has error rate ε = 0.35
– Assume the classifiers are independent
– The ensemble (majority vote) errs only when 13 or more of the 25 base
classifiers err, so the probability that the ensemble classifier makes a
wrong prediction is:

P(\text{ensemble error}) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06

Examples of Ensemble Methods

• How to generate an ensemble of classifiers?
– Bagging
– Boosting
Bagging

• Main Assumption:
– Combining many unstable predictors produces a stable ensemble predictor
– Unstable predictor: small changes in the training data produce large
changes in the model (e.g. neural nets, decision trees)
– Stable predictors: SVMs (sometimes), nearest neighbor

• Hypothesis Space
– Variable size (nonparametric): can model any function if you use an
appropriate base predictor (e.g. trees)

The Bagging Algorithm

Given data: D = {(x_1, y_1), ..., (x_N, y_N)}

For m = 1 to M:
• Obtain a bootstrap sample D_m from the training data D
• Build a model G_m(x) from the bootstrap data D_m

The Bagging Model

• Regression: average the model outputs

\hat{y} = \frac{1}{M} \sum_{m=1}^{M} G_m(x)

• Classification: vote over the classifier outputs G_1(x), ..., G_M(x)

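Putting the bagging algorithm and the combination rule together, here is a minimal sketch for classification. It assumes scikit-learn decision trees as the unstable base predictor and majority voting; the function names (bagging_fit, bagging_predict) and parameters are illustrative, not taken from the slides:

```python
# Minimal bagging sketch: M bootstrap samples, one decision tree per sample,
# majority vote at prediction time. Assumes integer class labels in y.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=30, seed=0):
    """Build M trees, each trained on a bootstrap sample D_m of (X, y)."""
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for _ in range(M):
        idx = rng.integers(0, N, size=N)      # draw N indices with replacement
        tree = DecisionTreeClassifier()        # unstable base predictor
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models

def bagging_predict(models, X):
    """Majority vote over the classifier outputs G_1(x), ..., G_M(x)."""
    votes = np.stack([m.predict(X) for m in models])      # shape (M, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Usage (hypothetical data):
#   models = bagging_fit(X_train, y_train, M=30)
#   y_hat = bagging_predict(models, X_test)
```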
Bagging Details

• A bootstrap sample of N instances is obtained by drawing N examples at
random, with replacement
• On average, each bootstrap sample contains about 63% of the distinct
training instances
– This encourages the predictors to have uncorrelated errors, which is why
bagging works

Bagging Details 2

• Usually set M ≈ 30
– Or use validation data to pick M
• The models G_m(x) need to be unstable
– Usually fully grown (or slightly pruned) decision trees

Bagging

• Sampling with replacement

Original Data      1  2  3  4  5  6  7  8  9 10
Bagging (Round 1)  7  8 10  8  2  5 10 10  5  9
Bagging (Round 2)  1  4  9  1  2  3  2  7  3  2
Bagging (Round 3)  1  8  5 10  5  5  9  6  3  7

• Build a classifier on each bootstrap sample

• Each instance has probability 1 − (1 − 1/n)^n of never being drawn, so it
appears in a given bootstrap sample with probability 1 − (1 − 1/n)^n ≈ 0.632
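To make the selection probability concrete, the snippet below (an illustrative check, assuming n = 10 as in the table above) compares the closed form 1 − (1 − 1/n)^n with the empirical fraction of distinct instances in simulated bootstrap samples:

```python
# Bootstrap inclusion probability: closed form vs. simulation for n = 10.
import numpy as np

n = 10
closed_form = 1 - (1 - 1 / n) ** n      # ≈ 0.651 for n = 10; tends to 1 - 1/e ≈ 0.632
rng = np.random.default_rng(0)
samples = rng.integers(0, n, size=(100_000, n))               # 100,000 bootstrap samples
empirical = np.mean([len(set(row)) / n for row in samples])   # avg. fraction of distinct items
print(closed_form, empirical)
```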
Boosting

• An iterative procedure that adaptively changes the distribution of the
training data by focusing more on previously misclassified records
– Initially, all N records are assigned equal weights
– Unlike bagging, the weights may change at the end of each boosting round
Boosting

• Main Assumption:
– Combining many weak predictors produces an ensemble predictor
– The weak predictors (or classifiers) need to be stable

• Hypothesis Space
– Variable size (nonparametric): can model any function if you use an
appropriate base predictor (e.g. trees)

Commonly Used Weak Predictor (or Classifier)

A Decision Tree Stump (1-R)

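One common way to build a decision stump is a depth-one decision tree; the sketch below assumes scikit-learn and synthetic data purely for illustration:

```python
# A decision stump (1-R): a decision tree restricted to a single split.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
stump = DecisionTreeClassifier(max_depth=1)   # one split only: weak but stable
stump.fit(X, y)
print(stump.score(X, y))                      # better than chance, far from perfect
```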
Boosting

• Each classifier G_m(x) is trained on a weighted sample of the training data

Boosting

• Records that are wrongly classified will have their weights increased
• Records that are classified correctly will have their weights decreased

Original Data       1  2  3  4  5  6  7  8  9 10
Boosting (Round 1)  7  3  2  8  7  9  4 10  6  3
Boosting (Round 2)  5  4  9  4  2  5  1  7  4  2
Boosting (Round 3)  4  4  8 10  4  5  4  6  3  4

• Example 4 is hard to classify
• Its weight is increased, so it is more likely to be chosen again in
subsequent rounds
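The slides do not name a specific boosting algorithm, but the reweighting idea matches AdaBoost. The sketch below is one possible realization under that assumption, using decision stumps as the weak learner and labels in {-1, +1}; all names are illustrative:

```python
# AdaBoost-style sketch: misclassified records get larger weights each round.
# Assumes y takes values in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, M=10):
    N = len(X)
    w = np.full(N, 1.0 / N)                    # initially, equal weights
    models, alphas = [], []
    for _ in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # weak learner on weighted data
        pred = stump.predict(X)
        err = np.clip(np.sum(w[pred != y]) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # classifier importance
        w *= np.exp(-alpha * y * pred)         # raise weights of mistakes, lower the rest
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def boost_predict(models, alphas, X):
    return np.sign(sum(a * m.predict(X) for m, a in zip(models, alphas)))
```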
Methods for Performance Evaluation

• How to obtain a reliable estimate of performance?

• The performance of a model may depend on other factors besides the learning
algorithm:
– Class distribution
– Cost of misclassification
– Size of training and test sets
Learning Curve

• A learning curve shows how accuracy changes with varying training-sample size
• Requires a sampling schedule for creating the learning curve:
– Arithmetic sampling (Langley et al.)
– Geometric sampling (Provost et al.)

• Effect of small sample size:
– Bias in the estimate
– Variance of the estimate
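As an illustration of a geometric sampling schedule (an assumed setup, not taken from the slides), the sketch below trains a classifier on geometrically growing training subsets and records test accuracy at each size:

```python
# Learning curve with a geometric sampling schedule (training size doubles each step).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for n in [50, 100, 200, 400, 800, 1600, len(X_tr)]:
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[:n], y_tr[:n])
    print(f"n = {n:5d}  test accuracy = {clf.score(X_te, y_te):.3f}")
```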
