Bagging + Boosting + Gradient Boosting
Pulak Ghosh
IIMB
Introduction & Motivation
Suppose that you are a patient with a set of symptoms
Instead of taking the opinion of just one doctor (classifier), you decide to take
the opinions of several doctors!
Is this a good idea? Indeed it is.
Consult many doctors and, based on their combined diagnoses, you can get a fairly
accurate idea of the true diagnosis.
Majority voting - bagging
More weight to the opinions of the good (accurate) doctors - boosting
In bagging, you give equal weight to all classifiers, whereas in boosting
you weight each classifier according to its accuracy.
Ensemble Methods
[Diagram: multiple data sets S1, S2, ..., Sn are used to train multiple classifiers C1, C2, ..., Cn, which are merged into a combined classifier H.]
Build Ensemble Classifiers
Basic idea:
Build different experts, and let them vote
Advantages:
Improve predictive performance
Other types of classifiers can be directly included
Easy to implement
Not much parameter tuning needed
Disadvantages:
The combined classifier is not so transparent (black box)
Not a compact representation
Why does it work?
Suppose the ensemble consists of 25 base classifiers, each with error rate \varepsilon, and suppose their errors are independent. The majority vote is wrong only when at least 13 of the 25 classifiers are wrong:

P(\text{ensemble error}) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i}

For example, with \varepsilon = 0.35 this probability is about 0.06, far below the error rate of any single classifier.
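A quick numerical check of this sum (a minimal sketch; the base-classifier error rate of 0.35 is the assumed value that reproduces the 0.06 above):

# Worked check of the ensemble-error calculation above.
from math import comb

def ensemble_error(eps, n_classifiers=25):
    # P(majority vote is wrong) for n independent classifiers with error eps
    k = n_classifiers // 2 + 1            # need at least 13 of 25 to be wrong
    return sum(comb(n_classifiers, i) * eps**i * (1 - eps)**(n_classifiers - i)
               for i in range(k, n_classifiers + 1))

print(round(ensemble_error(0.35), 3))     # ~0.06: much better than 0.35
print(round(ensemble_error(0.50), 3))     # 0.5: no gain when classifiers are no better than chance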
Examples of Ensemble Methods
Combine B predictors by
Voting (for classification problem)
Averaging (for estimation problem)
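As a small illustration (a sketch in Python with made-up predictions), the two combination rules:

# Combining B predictors: majority vote for classification, average for estimation.
import numpy as np

# Classification: B = 3 classifiers, 4 test points, labels in {0, 1}
class_preds = np.array([[1, 0, 1, 1],
                        [1, 1, 0, 1],
                        [0, 0, 1, 1]])
# Majority vote across the B rows (one column per test point)
votes = np.apply_along_axis(lambda col: np.bincount(col, minlength=2).argmax(),
                            axis=0, arr=class_preds)
print(votes)                     # [1 0 1 1]

# Estimation (regression): simple average across the B predictors
reg_preds = np.array([[2.1, 0.4], [1.9, 0.6], [2.3, 0.5]])
print(reg_preds.mean(axis=0))    # [2.1 0.5]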
Bagging -- the Idea
[Diagram: bootstrap samples X*1, X*2, ..., X*B are drawn from the training data; each is used to fit a bootstrap estimator 1, 2, ..., B, and the B estimators are then combined.]
Some candidates:
Decision tree, decision stump, regression tree, linear
regression, SVMs
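A minimal bagging sketch, assuming scikit-learn regression trees (one of the candidates above) as the base learner; the function names and settings are illustrative only:

# Bagging sketch: draw B bootstrap samples X*1..X*B, fit one base learner per
# sample, and average their predictions (regression case).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_trees(X, y, B=50, random_state=0):
    rng = np.random.default_rng(random_state)
    estimators = []
    n = len(X)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # sample n rows with replacement
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X[idx], y[idx])
        estimators.append(tree)
    return estimators

def predict_bagged(estimators, X):
    # Average the B bootstrap estimators (voting would replace this for classification)
    return np.mean([est.predict(X) for est in estimators], axis=0)

# Toy usage on a noisy sine curve
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.default_rng(1).normal(0, 0.3, 200)
bag = fit_bagged_trees(X, y)
print(predict_bagged(bag, X[:5]))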
Bias-variance Decomposition
Used to analyze how much the selection of any specific training set affects
performance. The aggregated (bagged) predictor is the average of the predictor \varphi over training sets S_k drawn from the population P:

\varphi_A(x, P) = E_S[\varphi(x, S_k)]
Why Bagging works?
\varphi_A(x, P) = E_S[\varphi(x, S)]

Direct (unaggregated) error:
e = E_S E_{Y,X}[Y - \varphi(X, S)]^2

Bagging (aggregated) error:
e_A = E_{Y,X}[Y - \varphi_A(X, P)]^2

Jensen's inequality, (E[Z])^2 \le E[Z^2], applied with Z = \varphi(X, S), gives

e = E[Y^2] - 2 E[Y \varphi_A] + E_{Y,X} E_S[\varphi^2(X, S)]
  \ge E[Y^2] - 2 E[Y \varphi_A] + E_{Y,X}[\varphi_A^2(X)]
  = E(Y - \varphi_A)^2 = e_A
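A small simulation makes the inequality e >= e_A concrete; the toy target (Y = X^2 + noise) and the straight-line base predictor below are assumptions chosen only for illustration:

# Simulation of the argument above: the aggregated predictor phi_A has squared
# error no larger than the average error of the individual predictors.
import numpy as np

rng = np.random.default_rng(0)

def draw_training_set(n=30):
    x = rng.uniform(-1, 1, n)
    y = x**2 + rng.normal(0, 0.2, n)
    return x, y

def fit_predictor(x, y):
    # phi(x, S): a straight-line fit, deliberately unstable across training sets
    coef = np.polyfit(x, y, deg=1)
    return lambda x_new: np.polyval(coef, x_new)

x_test = rng.uniform(-1, 1, 2000)
y_test = x_test**2 + rng.normal(0, 0.2, 2000)

# 200 predictors, each trained on its own sample S_k
preds = np.array([fit_predictor(*draw_training_set())(x_test) for _ in range(200)])

e   = np.mean((y_test - preds) ** 2)                 # E_S E_{Y,X}[Y - phi(X,S)]^2
e_A = np.mean((y_test - preds.mean(axis=0)) ** 2)    # error of the aggregated phi_A
print(f"e = {e:.4f}, e_A = {e_A:.4f}")               # e >= e_A, as Jensen guarantees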
Boosting
Algorithms:
AdaBoost: adaptive boosting
Gentle AdaBoost
BrownBoost
Bagging
[Diagram: random samples drawn with replacement from the training data are each fed to the learner (ML), producing models f1, f2, ..., fT, which are combined into f.]
Boosting
[Diagram: the learner (ML) is first trained on the original training sample to give f1; subsequent rounds train on re-weighted samples to give f2, ..., fT, which are combined into f.]
What is Boosting?
Analogy: consult several doctors and base the decision on a combination of weighted diagnoses; the weight
assigned to each doctor depends on the accuracy of their previous diagnoses.
The boosting algorithm can be extended to the prediction of continuous values.
Compared with bagging: boosting tends to achieve greater accuracy, but it also risks
overfitting the model to misclassified data.
Basic Idea?
Suppose examples 2, 3, and 5 are correctly predicted by the current classifier, while examples 1 and 4 are wrongly predicted.
The 2nd round of boosting again samples 5 examples, but now examples 1 and 4 are more likely to be sampled.
Given training data X = (x_1, \ldots, x_n), initialize the distribution D_1(i) = 1/n.
At each round t: calculate the weighted error \varepsilon_t, choose the classifier weight

\alpha_t = \tfrac{1}{2} \ln\!\big((1 - \varepsilon_t)/\varepsilon_t\big)

and update the distribution (D_1 \to D_2 \to \cdots \to D_B) so that misclassified examples receive more weight.
The misclassification rate of a classifier G on the training data is

err = \frac{1}{N} \sum_{i=1}^{N} I\big(y_i \ne G(x_i)\big)
AdaBoost.M1 (Contd)
Sequentially apply the weak classifier to repeatedly
modified versions of the data,
producing a sequence of weak classifiers G_m(x), m = 1, 2, ..., M.
The predictions from all classifiers are combined via a weighted majority
vote to produce the final prediction.
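A compact sketch of this loop, assuming depth-1 scikit-learn trees (decision stumps) as the weak learners G_m and labels coded as +1/-1; it follows the weight formulas on the previous slide rather than any particular library implementation:

# AdaBoost.M1 sketch: weights start uniform, each round fits a weak classifier
# on the weighted data, computes err_m and alpha_m, and up-weights the
# misclassified examples. Final prediction: sign of the alpha-weighted vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, M=50):
    # X: feature array, y: numpy array of labels in {+1, -1}
    n = len(y)
    w = np.full(n, 1.0 / n)                          # D_1(i) = 1/n
    stumps, alphas = [], []
    for _ in range(M):
        G = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = G.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)   # weighted error
        alpha = 0.5 * np.log((1 - err) / err)        # classifier weight alpha_t
        w = w * np.exp(-alpha * y * pred)            # up-weight mistakes
        w = w / w.sum()                              # renormalize the distribution
        stumps.append(G)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted majority vote of the M weak classifiers
    agg = sum(a * G.predict(X) for G, a in zip(stumps, alphas))
    return np.sign(agg)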
Adaboost Concept
AdaBoost starts with a uniform
distribution of weights over the training
examples. The weights tell the learning
algorithm the importance of each
example.
(Repeat)
At the end, carefully make a linear
combination of the weak classifiers
obtained at all iterations.
Final Classifier: combine the weak classifiers into a single strong
classifier.
Bagging vs Boosting
Bagging: the construction of complementary base-learners is left
to chance and to the instability of the learning method.
Boosting: actively seeks to generate complementary base-
learners by training the next base-learner on the mistakes of
the previous learners.
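Both strategies are available off the shelf; the comparison below is a sketch using scikit-learn's BaggingClassifier and AdaBoostClassifier on a synthetic dataset (illustrative settings only):

# Off-the-shelf bagging and boosting compared on a toy classification problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)   # stumps by default

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))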