Adaboost Algorithm
0) Set W̃i^(0) = 1/n for i = 1, . . . , n
1) At the mth iteration we find (any) classifier h(x; θ̂m) for which the weighted classification error ε̂m,

   ε̂m = 0.5 − (1/2) Σ_{i=1}^n W̃i^(m−1) yi h(xi; θ̂m),

   is better than chance.
2) The new component is assigned votes based on its error:
α̂m = 0.5 log( (1 − ε̂m)/ε̂m )
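For concreteness, here is a minimal NumPy sketch of this loop, covering steps 0)–2) above together with the standard reweighting step (which appears later in these notes in the Dt(i) notation). Decision stumps are assumed as the weak learners, and all names (fit_stump, adaboost, etc.) are illustrative rather than anything fixed by the notes.

import numpy as np

def stump_predict(X, feature, threshold, sign):
    # Decision stump: +1 on one side of the threshold, -1 on the other.
    return sign * np.where(X[:, feature] > threshold, 1.0, -1.0)

def fit_stump(X, y, w):
    # Step 1: exhaustively pick the stump with the smallest weighted error on weights w.
    best, best_err = None, np.inf
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for sign in (+1.0, -1.0):
                pred = stump_predict(X, feature, threshold, sign)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (feature, threshold, sign), err
    return best, best_err

def adaboost(X, y, T=50):
    # X: (n, d) features; y: (n,) labels in {-1, +1}.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.full(n, 1.0 / n)                      # step 0: uniform weights
    stumps, alphas = [], []
    for m in range(T):
        stump, err = fit_stump(X, y, w)          # step 1: weak learner on the weighted data
        if err >= 0.5:                           # no longer better than chance -> stop
            break
        err = np.clip(err, 1e-12, None)          # guard against log(1/0) when err == 0
        alpha = 0.5 * np.log((1 - err) / err)    # step 2: votes assigned from the error
        pred = stump_predict(X, *stump)
        w = w * np.exp(-alpha * y * pred)        # reweight: e^{-alpha} if correct, e^{+alpha} if not
        w = w / w.sum()                          # normalize (the Z_t factor)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(X, stumps, alphas):
    # Combined classifier: sign of the weighted vote of the weak classifiers.
    X = np.asarray(X, dtype=float)
    votes = sum(a * stump_predict(X, *s) for s, a in zip(stumps, alphas))
    return np.sign(votes)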
[Figures: exponential loss, training error (two panels), and weighted training error, each vs. number of iterations]
[Figure: training and test error vs. number of rounds T]
expect:
• training error to continue to drop (or reach zero)
• test error to increase when Hfinal becomes “too complex”
• “Occam’s razor”
• overfitting
• hard to know when to stop training
Technically...
• bound depends on
  • m = # training examples
  • d = "complexity" of weak classifiers
  • T = # rounds
• generalization error = E [test error]
• predicts overfitting
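For reference, the bound being alluded to is usually written (following Freund and Schapire, with constants and log factors suppressed) roughly as: with high probability,

   \text{generalization error} \;\le\; \text{training error} \;+\; \tilde{O}\!\left(\sqrt{\frac{T d}{m}}\right),

which grows with T and is why this analysis predicts overfitting as more rounds are run.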
“Typical” performance
• Training and test errors of the combined classifier
[Figure: training/test errors vs. number of iterations]
margin(xi) = yi · ĥm(xi)
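As a sketch, these margins can be computed directly from the weak-classifier outputs; the votes are assumed here to be normalized so that margins lie in [−1, 1], and the array names are illustrative.

import numpy as np

def margins(weak_preds, alphas, y):
    # weak_preds: (T, n) array of +/-1 weak-classifier predictions, one row per component
    # alphas:     (T,)  votes of the components
    # y:          (n,)  labels in {-1, +1}
    # Normalized voting margin of each training example, in [-1, 1]:
    # positive exactly when the combined classifier gets the example right.
    alphas = np.asarray(alphas, dtype=float)
    votes = alphas @ np.asarray(weak_preds, dtype=float)
    return np.asarray(y, dtype=float) * votes / alphas.sum()

Sorting the returned margins and plotting them against the cumulative fraction of examples gives curves like those in the figure below.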
[Figure: cumulative margin distributions of the training examples after 4, 10, 20, and 50 iterations (margin on the horizontal axis, from −1 to 1)]
[Figure: typical behavior of training/test errors vs. number of iterations (number of components)]
Practical advantages of AdaBoost
• fast
• simple and easy to program
• no parameters to tune (except T )
• flexible — can combine with any learning algorithm
• no prior knowledge needed about weak learner
• provably effective, provided can consistently find rough rules
of thumb
→ shift in mind set — goal now is merely to find classifiers
barely better than random guessing
• versatile
• can use with data that is textual, numeric, discrete, etc.
• has been extended to learning problems well beyond
binary classification
Caveats
The direct multiclass extension uses weak classifiers ht : X → Y and reweights with

   Dt+1(i) = (Dt(i)/Zt) · e^(−αt)   if yi = ht(xi)
   Dt+1(i) = (Dt(i)/Zt) · e^(+αt)   if yi ≠ ht(xi)

and the combined classifier votes across classes:

   Hfinal(x) = arg max_{y ∈ Y} Σ_{t: ht(x)=y} αt
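A small sketch of this multiclass bookkeeping, assuming the per-round weak predictions and votes αt are already available; the function names are illustrative.

import numpy as np

def update_distribution(D, alpha, correct):
    # D: (n,) current weights; correct: boolean mask, True where y_i == h_t(x_i).
    # Multiply by e^{-alpha} on correct examples, e^{+alpha} on mistakes, renormalize (Z_t).
    D = D * np.where(correct, np.exp(-alpha), np.exp(alpha))
    return D / D.sum()

def h_final(weak_preds, alphas, classes):
    # weak_preds: (T, n) predicted class labels, one row per round; alphas: (T,) votes.
    # H_final(x) = argmax over y of the total vote of the rounds predicting y.
    weak_preds = np.asarray(weak_preds)
    alphas = np.asarray(alphas, dtype=float)
    scores = np.stack([(alphas[:, None] * (weak_preds == c)).sum(axis=0) for c in classes])
    return np.asarray(classes)[np.argmax(scores, axis=0)]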
Reducing multiclass to binary: each example is replicated once per class and labeled + only in the copy corresponding to its correct class.

   x1        x1, −    x1, +    x1, −    x1, −
   x2        x2, −    x2, −    x2, +    x2, −
   x3   ⇒    x3, −    x3, −    x3, −    x3, +
   x4        x4, −    x4, +    x4, −    x4, −
   x5        x5, +    x5, −    x5, −    x5, −
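A brief sketch of that encoding; reading the table's columns as classes 1–4 is an assumption made only for illustration.

import numpy as np

def one_vs_all_labels(y, classes):
    # +1 where the class matches the example's label, -1 elsewhere:
    # one k-class problem becomes k binary labelings, one column per class.
    return np.where(np.asarray(y)[:, None] == np.asarray(classes)[None, :], 1, -1)

# e.g. one_vs_all_labels([2, 3, 4, 2, 1], classes=[1, 2, 3, 4]) reproduces the +/- pattern above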