Adaboost Algorithm

The AdaBoost algorithm iteratively trains weak learners to focus on examples that previous learners misclassified. At each iteration, it assigns a stronger vote to learners that minimize weighted errors. This process exponentially decreases training error over iterations and increases the margin of correctly classified examples, improving generalization to new data.

Uploaded by

Prasanth Th


The AdaBoost algorithm

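0) Set $\tilde{W}_i^{(0)} = 1/n$ for $i = 1, \ldots, n$.
1) At the $m$th iteration we find (any) classifier $h(x; \hat{\theta}_m)$ for which the weighted classification error $\epsilon_m$,
$$\epsilon_m = 0.5 - \frac{1}{2} \left( \sum_{i=1}^{n} \tilde{W}_i^{(m-1)} y_i h(x_i; \hat{\theta}_m) \right),$$
is better than chance.
2) The new component is assigned votes based on its error:
$$\hat{\alpha}_m = 0.5 \log\left( (1 - \epsilon_m)/\epsilon_m \right)$$
3) The weights are updated according to ($Z_m$ is chosen so that the new weights $\tilde{W}_i^{(m)}$ sum to one):
$$\tilde{W}_i^{(m)} = \frac{1}{Z_m} \cdot \tilde{W}_i^{(m-1)} \cdot \exp\{ -y_i \hat{\alpha}_m h(x_i; \hat{\theta}_m) \}$$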
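
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: it assumes labels in {−1, +1} and uses exhaustively searched decision stumps as the component classifiers. The names `fit_stump`, `stump_predict`, `adaboost`, and `predict` are hypothetical, and the clamp on the error is an added guard for the case of a perfect stump.

```python
import numpy as np

def stump_predict(X, j, thr, sign):
    """Decision stump: sign * (+1 if feature j <= thr else -1)."""
    return sign * np.where(X[:, j] <= thr, 1.0, -1.0)

def fit_stump(X, y, w):
    """Search all (feature, threshold, sign) stumps for the lowest weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1.0, -1.0):
                pred = stump_predict(X, j, thr, sign)
                err = 0.5 - 0.5 * np.sum(w * y * pred)  # step 1: weighted error
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, n_rounds):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # step 0: uniform weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        if err >= 0.5:                         # no stump is better than chance
            break
        err = max(err, 1e-12)                  # assumption: avoid log(1/0) for a perfect stump
        alpha = 0.5 * np.log((1.0 - err) / err)  # step 2: votes
        pred = stump_predict(X, j, thr, sign)
        w = w * np.exp(-y * alpha * pred)      # step 3: reweight
        w = w / w.sum()                        # dividing by Z_m
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of all components."""
    score = sum(a * stump_predict(X, j, thr, s) for a, j, thr, s in ensemble)
    return np.sign(score)

# Toy 1-D problem that no single stump can classify correctly.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0])
ensemble = adaboost(X, y, n_rounds=3)
print(predict(ensemble, X))
```

The exhaustive stump search is chosen only for brevity; any learner that accepts example weights could take its place.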
Tommi Jaakkola, MIT CSAIL 18
Adaboost properties: exponential loss
• After each boosting iteration, assuming we can find a
component classifier whose weighted error is better than
chance, the combined classifier
ĥm(x) = α̂1h(x; θ̂1) + . . . + α̂mh(x; θ̂m)
is guaranteed to have a lower exponential loss over the
training examples
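
As a small numerical check of this claim, the sketch below adds two fixed component classifiers to a combined classifier and records the exponential loss $\sum_i \exp(-y_i \hat{h}_m(x_i))$ after each step. The four-example dataset and the prediction vectors `h1` and `h2` are made up for illustration, and `boost_fixed_components` is a hypothetical helper.

```python
import numpy as np

def exponential_loss(y, score):
    """Sum over training examples of exp(-y_i * score_i)."""
    return float(np.sum(np.exp(-y * score)))

def boost_fixed_components(y, components):
    """Add the given +/-1 prediction vectors one by one with AdaBoost votes."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    score = np.zeros(n)
    losses = [exponential_loss(y, score)]
    for h in components:
        eps = 0.5 - 0.5 * np.sum(w * y * h)       # weighted error, must be < 0.5
        alpha = 0.5 * np.log((1.0 - eps) / eps)   # votes for this component
        score = score + alpha * h
        losses.append(exponential_loss(y, score))
        # The normalized weights are proportional to exp(-y_i * score_i),
        # which is what the recursive update produces.
        w = np.exp(-y * score)
        w = w / w.sum()
    return losses

y = np.array([1.0, 1.0, -1.0, -1.0])
h1 = np.array([1.0, 1.0, -1.0, 1.0])    # one mistake (last example)
h2 = np.array([1.0, -1.0, -1.0, -1.0])  # one mistake (second example)
losses = boost_fixed_components(y, [h1, h2])
print(losses)
```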
[Figure: exponential loss vs. number of iterations]


Adaboost properties: training error
• The boosting iterations also decrease the classification error
of the combined classifier

ĥm(x) = α̂1h(x; θ̂1) + . . . + α̂mh(x; θ̂m)

over the training examples.

[Figure: training error vs. number of iterations]


Adaboost properties: training error cont’d
• The training classification error has to go down exponentially fast if the weighted errors of the component classifiers, $\epsilon_k$, are strictly better than chance, $\epsilon_k < 0.5$:
$$\mathrm{err}(\hat{h}_m) \le \prod_{k=1}^{m} 2 \sqrt{\epsilon_k (1 - \epsilon_k)}$$
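
To get a feel for how fast this bound shrinks, the following sketch evaluates the product for a given list of weighted errors. The function name `training_error_bound` and the error values used below are illustrative assumptions, not numbers from the lecture.

```python
import numpy as np

def training_error_bound(eps):
    """Product over components of 2 * sqrt(eps_k * (1 - eps_k))."""
    eps = np.asarray(eps, dtype=float)
    return float(np.prod(2.0 * np.sqrt(eps * (1.0 - eps))))

# Each factor is below 1 whenever eps_k < 0.5, so the bound decays geometrically.
for m in (1, 5, 10, 50):
    print(m, training_error_bound([0.4] * m))
```

A component with error exactly 0.5 contributes a factor of 1 and does not tighten the bound.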

[Figure: training error vs. number of iterations]


Adaboost properties: weighted error
• Weighted error of each new component classifier
$$\epsilon_k = 0.5 - \frac{1}{2} \left( \sum_{i=1}^{n} \tilde{W}_i^{(k-1)} y_i h(x_i; \hat{\theta}_k) \right)$$
tends to increase as a function of boosting iterations.

[Figure: weighted training error vs. number of iterations]


How Will Test Error Behave? (A First Guess)

[Figure: train and test error vs. # of rounds (T)]

expect:
• training error to continue to drop (or reach zero)
• test error to increase when Hfinal becomes “too complex”
• “Occam’s razor”
• overfitting
• hard to know when to stop training
Technically...

• with high probability:
$$\text{generalization error} \le \text{training error} + \tilde{O}\left( \sqrt{\frac{dT}{m}} \right)$$

• bound depends on
• m = # training examples
• d = “complexity” of weak classifiers
• T = # rounds
• generalization error = E [test error]
• predicts overfitting
“Typical” performance
• Training and test errors of the combined classifier

ĥm(x) = α̂1h(x; θ̂1) + . . . + α̂mh(x; θ̂m)

[Figure: training/test errors vs. number of iterations]

• Why should the test error go down after we already have zero training error?


AdaBoost and margin
• We can write the combined classifier in a more useful form by dividing the predictions by the “total number of votes”:
$$\hat{h}_m(x) = \frac{\hat{\alpha}_1 h(x; \hat{\theta}_1) + \ldots + \hat{\alpha}_m h(x; \hat{\theta}_m)}{\hat{\alpha}_1 + \ldots + \hat{\alpha}_m}$$

• This allows us to define a clear notion of “voting margin” that the combined classifier achieves for each training example:
$$\mathrm{margin}(x_i) = y_i \cdot \hat{h}_m(x_i)$$
The margin lies in $[-1, 1]$ and is negative for all misclassified examples.
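
A short sketch of the margin computation follows. It assumes the component predictions are already available as a matrix `H` with one row per component and one column per training example, and that all votes are positive. The function name `voting_margins` and the numbers are hypothetical.

```python
import numpy as np

def voting_margins(y, H, alphas):
    """Margins y_i * (sum_k alpha_k h_k(x_i)) / (sum_k alpha_k).

    y:      (n,) labels in {-1, +1}
    H:      (m, n) component predictions in {-1, +1}
    alphas: (m,) positive votes
    """
    alphas = np.asarray(alphas, dtype=float)
    combined = alphas @ H / alphas.sum()   # normalized vote, lies in [-1, 1]
    return y * combined

y = np.array([1.0, -1.0, 1.0])
H = np.array([[1.0, -1.0, -1.0],
              [1.0, -1.0,  1.0],
              [1.0,  1.0,  1.0]])
margins = voting_margins(y, H, alphas=[1.0, 2.0, 1.0])
print(margins)                 # one margin per training example
print(np.mean(margins <= 0))   # fraction of examples with non-positive margin
```

Sorting the margins and plotting their empirical cumulative distribution gives the kind of curve used to compare different numbers of boosting iterations.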


AdaBoost and margin
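• Successive boosting iterations still improve the majority vote or margin for the training examples
$$\mathrm{margin}(x_i) = y_i \left[ \frac{\hat{\alpha}_1 h(x_i; \hat{\theta}_1) + \ldots + \hat{\alpha}_m h(x_i; \hat{\theta}_m)}{\hat{\alpha}_1 + \ldots + \hat{\alpha}_m} \right]$$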

• Cumulative distributions of margin values:

[Figure: cumulative distributions of the margin values after 4, 10, 20, and 50 iterations]

Can we improve the combination?
• As a result of running the boosting algorithm for m iterations, we essentially generate a new feature representation for the data

φi(x) = h(x; θ̂i), i = 1, . . . , m

• Perhaps we can do better by separately estimating a new set of “votes” for each component. In other words, we could estimate a linear classifier of the form

f (x; α) = α1φ1(x) + . . . + αmφm(x)

where each parameter αi can now be any real number (even negative). The parameters would be estimated jointly rather than one after the other as in boosting.


Can we improve the combination?
• We could use SVMs in a postprocessing step to reoptimize
f (x; α) = α1φ1(x) + . . . + αmφm(x)
with respect to α1, . . . , αm. This is not necessarily a good idea.
[Figure: training/test errors vs. number of iterations for boosting, and vs. number of components for SVM postprocessing]

Practical Advantages of AdaBoost

• fast
• simple and easy to program
• no parameters to tune (except T )
• flexible — can combine with any learning algorithm
• no prior knowledge needed about weak learner
• provably effective, provided can consistently find rough rules
of thumb
→ shift in mind set — goal now is merely to find classifiers
barely better than random guessing
• versatile
• can use with data that is textual, numeric, discrete, etc.
• has been extended to learning problems well beyond
binary classification
Caveats

• performance of AdaBoost depends on data and weak learner

• consistent with theory, AdaBoost can fail if
• weak classifiers too complex
→ overfitting
• weak classifiers too weak (γt → 0 too quickly)
→ underfitting
→ low margins → overfitting
• empirically, AdaBoost seems especially susceptible to uniform
noise
Multiclass Problems
[with Freund]
• say y ∈ Y where |Y | = k
• direct approach (AdaBoost.M1):

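$$h_t : X \to Y$$
$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \cdot \begin{cases} e^{-\alpha_t} & \text{if } y_i = h_t(x_i) \\ e^{\alpha_t} & \text{if } y_i \ne h_t(x_i) \end{cases}$$
$$H_{\text{final}}(x) = \arg\max_{y \in Y} \sum_{t : h_t(x) = y} \alpha_t$$

• can prove same bound on error if $\forall t : \epsilon_t \le 1/2$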

• in practice, not usually a problem for “strong” weak
learners (e.g., C4.5)
• significant problem for “weak” weak learners (e.g.,
decision stumps)
• instead, reduce to binary
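
The AdaBoost.M1 weight update and final vote described above can be sketched as follows. The helper names `m1_reweight` and `m1_final` are hypothetical, the vote `alpha` is taken as given by the caller, and the label values are placeholders.

```python
import numpy as np

def m1_reweight(D, y, pred, alpha):
    """Scale D(i) by exp(-alpha) where pred matches y, by exp(alpha) elsewhere."""
    factor = np.where(pred == y, np.exp(-alpha), np.exp(alpha))
    D_new = D * factor
    return D_new / D_new.sum()   # dividing by Z_t

def m1_final(preds, alphas, labels):
    """Return the label with the largest total vote among components predicting it."""
    totals = {label: 0.0 for label in labels}
    for p, a in zip(preds, alphas):
        totals[p] += a
    return max(labels, key=lambda label: totals[label])

# One round on four examples with three classes; the third example is misclassified.
D = np.full(4, 0.25)
y = np.array([0, 1, 2, 1])
pred = np.array([0, 1, 1, 1])
D_next = m1_reweight(D, y, pred, alpha=0.5 * np.log(3.0))
print(D_next)

# Three components vote on a single new example.
print(m1_final(preds=["a", "b", "b"], alphas=[1.0, 0.4, 0.4], labels=["a", "b", "c"]))
```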
The One-Against-All Approach
• break k-class problem into k binary problems and
solve each separately
• say possible labels are Y = { , , , }

x1 x1 − x1 + x1 − x1 −
x2 x2 − x2 − x2 + x2 −
x3 ⇒ x3 − x3 − x3 − x3 +
x4 x4 − x4 + x4 − x4 −
x5 x5 + x5 − x5 − x5 −

• to classify new example, choose label predicted to be “most” positive
• ⇒ “AdaBoost.MH” [with Singer]
• problem: not robust to errors in predictions
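
The reduction itself is simple to sketch. The code below builds the k binary target vectors from multiclass labels and picks the label whose binary classifier gives the most positive score. The label names `"a"` to `"d"` stand in for the four label symbols, which did not survive in the text above, and the function names and scores are hypothetical.

```python
import numpy as np

def one_vs_all_targets(y, labels):
    """For each label, a +/-1 vector marking which examples carry that label."""
    return {label: np.where(y == label, 1.0, -1.0) for label in labels}

def one_vs_all_predict(scores):
    """Choose the label whose binary classifier output is most positive."""
    return max(scores, key=scores.get)

labels = ["a", "b", "c", "d"]
# Five examples labeled to mirror the +/- pattern in the table above.
y = np.array(["b", "c", "d", "b", "a"])
targets = one_vs_all_targets(y, labels)
for label in labels:
    print(label, targets[label])

# Real-valued outputs of the four binary classifiers on one new example (made up).
print(one_vs_all_predict({"a": -0.3, "b": 0.1, "c": 0.7, "d": -0.9}))
```

Each target vector would be passed to a separate binary booster; a single badly wrong binary score can flip the final choice, which is the robustness problem noted in the last bullet.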
