05 Ensemble Learning
Ali Sharifi-Zarchi
CE Department
Sharif University of Technology
1 Introduction
2 Bagging
3 Boosting
4 AdaBoost
5 Comparison
6 References
1 Introduction
Condorcet’s jury theorem
Ensemble learning
Ensemble Methods
2 Bagging
3 Boosting
4 AdaBoost
5 Comparison
6 References
• Strong learner: we seek to produce one classifier for which the classification error can be made arbitrarily small.
• So far, we have been looking for such methods.
• Weak learner: a classifier that is just better than random guessing (for now, this will be our only expectation).
Basic idea
Ensemble Methods
1 Introduction
2 Bagging
Basic idea & algorithm
Decision tree (quick review)
Random Forest
3 Boosting
4 AdaBoost
5 Comparison
6 References
Basic idea
• Bagging uses bootstrap resampling to generate different training datasets from the original training dataset.
• Bootstrap resampling draws training data uniformly at random with replacement (see the snippet below).
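A minimal NumPy illustration of one bootstrap draw on a toy dataset (the data and variable names are placeholders):

import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10).reshape(-1, 1)          # toy training inputs
y = np.array([1] * 5 + [-1] * 5)          # toy labels
N = len(X)
idx = rng.integers(0, N, size=N)          # N uniform draws with replacement
X_boot, y_boot = X[idx], y[idx]           # one bootstrap replicate: some points repeat, some are left out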
Algorithm
Algorithm 1 Bagging
1: Input: M (required ensemble size), D = {(x^(1), y^(1)), ..., (x^(N), y^(N))} (training set)
2: for t = 1 to M do
3:   Build a dataset D_t by sampling N items randomly with replacement from D
     ▷ Bootstrap resampling: like rolling an N-faced die N times
4:   Train a model h_t using D_t and add it to the ensemble
5: end for
6: H(x) = sign(∑_{t=1}^{M} h_t(x))
   ▷ Aggregate models by voting for classification or by averaging for regression
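A minimal Python sketch of Algorithm 1, assuming binary labels in {-1, +1} and using a decision tree only as an example base learner (any model can be substituted):

import numpy as np
from sklearn.tree import DecisionTreeClassifier   # example base learner; any model works

def bagging_fit(X, y, M, make_model=DecisionTreeClassifier):
    # Train M models, each on its own bootstrap replicate of (X, y).
    rng = np.random.default_rng(0)
    N = len(X)
    ensemble = []
    for _ in range(M):
        idx = rng.integers(0, N, size=N)           # bootstrap resampling
        ensemble.append(make_model().fit(X[idx], y[idx]))
    return ensemble

def bagging_predict(X, ensemble):
    # Majority vote for labels in {-1, +1}; use the mean of the outputs for regression.
    return np.sign(sum(h.predict(X) for h in ensemble))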
Structure
Learning
Algorithm
Algorithm 2 Constructing DT
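The algorithm box on this slide is a figure that did not survive extraction; the following is a minimal sketch of the standard greedy construction (recursive splitting on the feature and threshold with the largest information gain), not necessarily the exact variant shown in class:

import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def build_tree(X, y, depth=0, max_depth=3):
    # Leaf: node is pure or the depth limit is reached -> predict the majority label.
    labels, counts = np.unique(y, return_counts=True)
    if len(labels) == 1 or depth == max_depth:
        return {"leaf": labels[np.argmax(counts)]}
    best_gain, best_split = 0.0, None
    for k in range(X.shape[1]):                      # try every feature ...
        for t in np.unique(X[:, k]):                 # ... and every observed threshold
            left = X[:, k] <= t
            if left.all() or not left.any():         # skip degenerate splits
                continue
            gain = entropy(y) - (left.mean() * entropy(y[left])
                                 + (1 - left.mean()) * entropy(y[~left]))
            if gain > best_gain:
                best_gain, best_split = gain, (k, t, left)
    if best_split is None:                           # no informative split: make a leaf
        return {"leaf": labels[np.argmax(counts)]}
    k, t, left = best_split
    return {"feature": k, "threshold": t,
            "left": build_tree(X[left], y[left], depth + 1, max_depth),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth)}

def tree_predict(tree, x):
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]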
Example
Perfect candidates
• Remember Bagging?
• Train many trees on bootstrapped data, then aggregate (average or majority-vote) their outputs; see the sketch below.
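A minimal scikit-learn sketch of this recipe (a Random Forest additionally considers only a random subset of features at each split; the dataset and hyperparameter values below are placeholders):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)   # toy data
forest = RandomForestClassifier(
    n_estimators=100,        # number of bagged trees
    max_features="sqrt",     # random feature subset considered at each split
    bootstrap=True,          # each tree sees a bootstrap replicate of the data
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:5]))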
Algorithm
Example
Example, Cont.
1 Introduction
2 Bagging
3 Boosting
Motivation & basic idea
Algorithm
4 AdaBoost
5 Comparison
6 References
• Did we have full control over the usefulness of the weak learners?
• The diversity, or complementarity, of the weak learners is not controlled in any way; it is left to chance and to the instability (variance) of the models.
Basic idea
• The basic idea of boosting is to generate a series of weak learners that complement each other.
• To achieve this, we force each learner to focus on the mistakes of the previous learner.
Algorithm
• Try to combine many simple weak learners, in sequence, into a single strong learner (for simplicity, assume a classification problem from now on).
• Decision stumps:
  h(x; θ) = sign(w_1 x_k − w_0),   θ = {k, w_1, w_0}
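A brute-force stump fitter, written as a sketch so it can later serve as a weak learner; the parameterization (k, t, s) below is equivalent to the slide's θ = {k, w_1, w_0} with w_1 = s and w_0 = s·t:

import numpy as np

def fit_stump(X, y, w=None):
    # Search over every feature k, threshold t, and sign s for the stump
    # h(x) = sign(s * (x_k - t)) with the smallest (optionally weighted) error.
    n, d = X.shape
    w = np.full(n, 1.0 / n) if w is None else w
    best = (np.inf, None)
    for k in range(d):                               # candidate feature
        for t in np.unique(X[:, k]):                 # candidate threshold
            for s in (+1, -1):                       # candidate sign
                pred = np.where(s * (X[:, k] - t) >= 0, 1, -1)
                err = w @ (pred != y)
                if err < best[0]:
                    best = (err, (k, t, s))
    return best[1]

def stump_predict(X, params):
    k, t, s = params
    return np.where(s * (X[:, k] - t) >= 0, 1, -1)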
1 Introduction
2 Bagging
3 Boosting
4 AdaBoost
Basic idea & algorithm
Loss function & proof
Properties (extra-reading)
5 Comparison
6 References
Basic idea
Example
Example, Cont.
Algorithm
• H_M(x) = (1/2)[α_1 h_1(x) + · · · + α_M h_M(x)] → the complete model, y^(i) ∈ {−1, 1}
• h_m(x): the m-th weak learner
• α_m = ? → the vote (weight) of the m-th weak learner
• w_m^(i): weight of sample i in iteration m
• w_{m+1}^(i) = ?
• J_m = ∑_{i=1}^{N} w_m^(i) I(y^(i) ≠ h_m(x^(i))) → loss of the m-th weak learner
• ε_m = ∑_{i=1}^{N} w_m^(i) I(y^(i) ≠ h_m(x^(i))) / ∑_{i=1}^{N} w_m^(i) → weighted error of the m-th weak learner
Algorithm, Cont.
• H_M(x) = (1/2)[α_1 h_1(x) + · · · + α_M h_M(x)] → the complete model, y^(i) ∈ {−1, 1}
• h_m(x): the m-th weak learner
• α_m = ln((1 − ε_m) / ε_m) → the vote (weight) of the m-th weak learner
• w_m^(i): weight of sample i in iteration m
• w_{m+1}^(i) = w_m^(i) e^{α_m I(y^(i) ≠ h_m(x^(i)))}
• J_m = ∑_{i=1}^{N} w_m^(i) I(y^(i) ≠ h_m(x^(i))) → loss of the m-th weak learner
• ε_m = ∑_{i=1}^{N} w_m^(i) I(y^(i) ≠ h_m(x^(i))) / ∑_{i=1}^{N} w_m^(i) → weighted error of the m-th weak learner
Algorithm, Cont.
Algorithm 4 AdaBoost
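The algorithm box itself is a figure that did not survive extraction; below is a minimal Python sketch that follows the update rules from the previous slides, using depth-1 trees (decision stumps) as an example weak learner:

import numpy as np
from sklearn.tree import DecisionTreeClassifier    # depth-1 trees act as decision stumps

def adaboost_fit(X, y, M):
    # Labels are assumed to be in {-1, +1}, matching the slides.
    n = len(y)
    w = np.full(n, 1.0 / n)                         # w_1^(i): uniform initial weights
    ensemble = []
    for m in range(M):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (h.predict(X) != y).astype(float)    # I(y^(i) != h_m(x^(i)))
        eps = max((w @ miss) / w.sum(), 1e-12)      # weighted error eps_m (clipped for the log)
        if eps >= 0.5:                              # no better than chance: stop
            break
        alpha = np.log((1 - eps) / eps)             # alpha_m = ln((1 - eps_m) / eps_m)
        ensemble.append((alpha, h))
        w = w * np.exp(alpha * miss)                # w_{m+1}^(i) = w_m^(i) e^{alpha_m I(...)}
    return ensemble

def adaboost_predict(X, ensemble):
    # H_M(x) = (1/2) * sum_m alpha_m h_m(x); the 1/2 factor does not change the sign.
    return np.sign(sum(alpha * h.predict(X) for alpha, h in ensemble))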
Loss function
ŷ = sign(HM (x))
H_m(x) = (1/2)[α_1 h_1(x) + · · · + α_m h_m(x)]
(the 1/2 factor is only there to give a cleaner form later)

L_m = ∑_{i=1}^{N} e^{−y^(i) H_m(x^(i))}
    = ∑_{i=1}^{N} e^{−y^(i) [H_{m−1}(x^(i)) + (1/2) α_m h_m(x^(i))]}
    = ∑_{i=1}^{N} e^{−y^(i) H_{m−1}(x^(i))} e^{−(1/2) α_m y^(i) h_m(x^(i))}
    = ∑_{i=1}^{N} w_m^(i) e^{−(1/2) α_m y^(i) h_m(x^(i))},   where w_m^(i) = e^{−y^(i) H_{m−1}(x^(i))}

The factor w_m^(i) is fixed at stage m; the remaining factor should be optimized at stage m by seeking h_m(x) and α_m.
L_m = ∑_{i=1}^{N} w_m^(i) e^{−(1/2) α_m y^(i) h_m(x^(i))}
    = e^{−α_m/2} ∑_{y^(i) = h_m(x^(i))} w_m^(i) + e^{α_m/2} ∑_{y^(i) ≠ h_m(x^(i))} w_m^(i)
    = (e^{α_m/2} − e^{−α_m/2}) ∑_{y^(i) ≠ h_m(x^(i))} w_m^(i) + e^{−α_m/2} ∑_{i=1}^{N} w_m^(i)

The first sum is exactly J_m = ∑_{i=1}^{N} w_m^(i) I(y^(i) ≠ h_m(x^(i))), so we find the h_m(x) that minimizes J_m.
Setting ∂L_m / ∂α_m = 0:
• Idea: separate the derivative into misclassified and correctly classified samples.

⟹ (1/2)(e^{α_m/2} + e^{−α_m/2}) ∑_{y^(i) ≠ h_m(x^(i))} w_m^(i) = (1/2) e^{−α_m/2} ∑_{i=1}^{N} w_m^(i)

⟹ e^{−α_m/2} / (e^{α_m/2} + e^{−α_m/2}) = ∑_{y^(i) ≠ h_m(x^(i))} w_m^(i) / ∑_{i=1}^{N} w_m^(i) = ε_m,
   which solves to α_m = ln((1 − ε_m) / ε_m).

For the sample weights, separating h_m(x^(i)) gives
   w_{m+1}^(i) = w_m^(i) e^{−(1/2) α_m y^(i) h_m(x^(i))};
using y^(i) h_m(x^(i)) = 1 − 2 I(y^(i) ≠ h_m(x^(i))),
   w_{m+1}^(i) = w_m^(i) e^{−α_m/2} e^{α_m I(y^(i) ≠ h_m(x^(i)))},
where the factor e^{−α_m/2} is independent of i and can be ignored.
• In each boosting iteration, we assume we can find a weak learner h(x; θ_m) whose weighted error is better than chance.
• Boosting iterations typically decrease the training error of H_M(x) over the training examples.
• Training error has to go down exponentially fast if the weighted error of each h_m is strictly better than chance (i.e., ε_m < 0.5):

  E_train(H_M) ≤ ∏_{m=1}^{M} 2 √(ε_m (1 − ε_m))
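As a quick numerical sanity check (assuming, hypothetically, that every weak learner achieves ε_m = 0.4): each factor is 2 √(0.4 · 0.6) ≈ 0.98, so the bound is roughly 0.82 after 10 rounds, 0.36 after 50, and below 0.05 after about 150.

import numpy as np

eps = 0.4                                  # assumed constant weak-learner error (hypothetical)
factor = 2 * np.sqrt(eps * (1 - eps))      # per-round factor, about 0.98
for M in (10, 50, 150):
    print(M, factor ** M)                  # upper bound on the training error after M rounds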
• Test error can still decrease after training error is flat (even zero).
Typical behavior
                      Bagging                       Boosting
Data Sampling         Bootstrapping                 Weighted
                      (random subsets)              (by instance importance)
Learners Dependency   Independent                   Dependent
                                                    (on the previous models)
Learner Weighting     Equal weights                 Varying weights
                                                    (based on importance)
Contributions
• Mahan Bayhaghi