
Machine Learning (CE 40717)


Fall 2024

Ali Sharifi-Zarchi

CE Department
Sharif University of Technology

October 13, 2024


1 Introduction

2 Bagging

3 Boosting

4 AdaBoost

5 Comparison

6 References


1 Introduction
Condorcet’s jury theorem
Ensemble learning
Ensemble Methods

2 Bagging

3 Boosting

4 AdaBoost

5 Comparison

6 References


Condorcet’s jury theorem

• N voters wish to reach a decision by majority vote.

• Each voter has an independent probability p of voting for the correct decision.

• Let M be the probability of the majority voting for the correct decision.

• If p > 0.5 and N → ∞, then M → 1.

• How?

Adopted from Wikipedia
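A quick numeric illustration of the theorem (a sketch added here, not from the slides): for odd N, M is simply the binomial probability that more than half of the voters are correct.

```python
from math import comb

def majority_correct_prob(p: float, n: int) -> float:
    """Probability that a majority of n independent voters (odd n),
    each correct with probability p, reaches the correct decision."""
    assert n % 2 == 1, "use an odd number of voters to avoid ties"
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

# With p = 0.6, the majority becomes almost surely correct as N grows:
for n in [1, 11, 101, 1001]:
    print(n, round(majority_correct_prob(0.6, n), 4))
# expected trend: roughly 0.6 -> 0.75 -> 0.98 -> 1.0
```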


Strong vs. weak learners

• Strong learner: we seek to produce one classifier for which the classification error can be made arbitrarily small.
  • So far, we have been looking for such methods.

• Weak learner: a classifier that is just better than random guessing (for now, this will be our only expectation).


Basic idea

• Certain weak learners do well in modeling one aspect of the data, while others do well in modeling another.

• Learn several simple models and combine their outputs to produce the final decision.

• A composite prediction where the final accuracy is better than the accuracy of the individual models.

Adopted from [4]


Ensemble Methods

• Parallel ensemble methods: weak learners are generated in parallel. The basic motivation is to use the independence between the learners.

• Sequential ensemble methods: weak learners are generated consecutively. The basic motivation is to use the dependence between the base learners.


What we talk about

• Weak or simple learners
  • Low variance: they don’t usually overfit
  • High bias: they can’t learn complex functions

• Bagging (parallel): to decrease the variance
  • Random Forest

• Boosting (sequential): to decrease the bias (enhance their capabilities)
  • AdaBoost


1 Introduction

2 Bagging
Basic idea & algorithm
Decision tree (quick review)
Random Forest

3 Boosting

4 AdaBoost

5 Comparison

6 References


Basic idea

• Bagging = Bootstrap aggregating

• It uses bootstrap resampling to generate different training datasets from the original training dataset.
  • It samples the training data uniformly at random, with replacement.

• On the training datasets, it trains different weak learners.

• During testing, it aggregates the weak learners by uniform averaging or majority voting.

• Works best with unstable models (high-variance models). Why?


Basic idea, Cont.

Adopted from GeeksForGeeks


Algorithm

Algorithm 1 Bagging

1: Input: M (required ensemble size), D = {(x^(1), y^(1)), . . . , (x^(N), y^(N))} (training set)
2: for t = 1 to M do
3:   Build a dataset D_t by sampling N items randomly with replacement from D
       ▷ Bootstrap resampling: like rolling an N-faced die N times
4:   Train a model h_t using D_t and add it to the ensemble
5: end for
6: H(x) = sign( Σ_{t=1}^{M} h_t(x) )
       ▷ Aggregate the models by voting for classification or by averaging for regression
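A minimal runnable sketch of this procedure (an illustration, not the slides' code), assuming scikit-learn decision trees as the weak learners and ±1 labels so that majority voting can be written as a sign:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, random_state=0):
    """Train M trees, each on a bootstrap resample of (X, y)."""
    rng = np.random.default_rng(random_state)
    N = len(X)
    ensemble = []
    for _ in range(M):
        idx = rng.integers(0, N, size=N)              # sample N items with replacement
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def bagging_predict(ensemble, X):
    """Aggregate by majority vote (labels assumed to be -1/+1)."""
    votes = np.sum([h.predict(X) for h in ensemble], axis=0)
    return np.sign(votes)
```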


Structure

• Terminal nodes (leaves) represent the target variable.

• Each internal node denotes a test on an attribute.

Adopted from Medium


Learning

• Learning an optimal decision tree is NP-complete.
  • Instead, we use a greedy search based on a heuristic.
  • We can’t guarantee to return the globally optimal decision tree.

• The most common strategy for DT learning is a greedy top-down approach.

• The tree is constructed by splitting samples into subsets based on an attribute value test, in a recursive manner.

Adopted from G. E. Naumov, "NP-completeness of problems of construction of optimal decision trees", 1991


Algorithm

Algorithm 2 Constructing DT

1: procedure FINDTREE(S, A)                ▷ Input: S (samples), A (attributes)
2:   if A is empty or all labels in S are the same then
3:     status ← leaf
4:     class ← most common class in S
5:   else
6:     status ← internal
7:     a ← bestAttribute(S, A)             ▷ The attribute value test
8:     LeftNode ← FindTree(S(a = 1), A − {a})
9:     RightNode ← FindTree(S(a = 0), A − {a})
10:  end if
11: end procedure


Which attribute is the best?

• Entropy measures the uncertainty in a specific distribution:

  H(X) = − Σ_{x_i ∈ X} P(x_i) log P(x_i)

• Information Gain (IG):

  Gain(S, A) = H_S(Y) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · H_{S_v}(Y)

  A: the variable used to split the samples
  Y: the target variable
  S: the samples; S_v: the subset of S where A = v
  H_S(Y): the entropy of Y over S

Adopted from Wikipedia


Example

Adopted from [5]

Gain(S, Humidity) = 0.940 − (7/14)0.985 − (7/14)0.592 = 0.151

Gain(S, Wind) = 0.940 − (8/14)0.811 − (6/14)1.0 = 0.048
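A quick check of these numbers (an illustrative sketch, assuming the class counts behind the figures from [5]: S contains 9 positive and 5 negative samples, Humidity splits them into 3+/4− and 6+/1−, and Wind into 6+/2− and 3+/3−):

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * log2(p)
    return h

H_S = entropy(9, 5)                                                   # ≈ 0.940
gain_humidity = H_S - 7/14 * entropy(3, 4) - 7/14 * entropy(6, 1)     # ≈ 0.151
gain_wind     = H_S - 8/14 * entropy(6, 2) - 6/14 * entropy(3, 3)     # ≈ 0.048
print(round(gain_humidity, 3), round(gain_wind, 3))
```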



Bagging on decision trees?

Why decision trees?


• Interpretable
• Robust to outliers
• Low bias
• High variance

Adopted from [4]


Perfect candidates

• Why are DTs good candidates for ensembles?
  • Consider averaging many (nearly) unbiased tree estimators.
  • Bias remains similar, but variance is reduced.

• Remember Bagging?
  • Train many trees on bootstrapped data, then aggregate (average/majority) the outputs.


Algorithm

Algorithm 3 Random Forest

1: Input: T (number of trees), m (number of variables used to split each node)
2: for t = 1 to T do
3:   Draw a bootstrap dataset
4:   Select m features randomly out of the d features as candidates for splitting
5:   Learn a tree on this dataset
6: end for
7: Output:                                  ▷ Usually: m ≤ √d
8:   Regression: average of the outputs
9:   Classification: majority voting
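In practice this is rarely coded by hand; a minimal sketch (an illustration, assuming scikit-learn, whose max_features="sqrt" corresponds to choosing m ≈ √d features per split):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A synthetic dataset just for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# T = 100 trees; each split considers only sqrt(d) randomly chosen features
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)
print("test accuracy:", rf.score(X_te, y_te))
```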

Example

Figures adopted from [4]

1 Introduction

2 Bagging

3 Boosting
Motivation & basic idea
Algorithm

4 AdaBoost

5 Comparison

6 References


Problems with bagging

• Bagging created a diversity of weak learners by creating random datasets.
  • Examples: decision stumps (shallow decision trees), logistic regression, . . .

• Did we have full control over the usefulness of the weak learners?
  • The diversity or complementarity of the weak learners is not controlled in any way; it is left to chance and to the instability (variance) of the models.


Basic idea

• We would expect a better performance if the weak learners also complemented each other.
  • They would have "expertise" on different subsets of the dataset.
  • So they would work better on different subsets.

• The basic idea of boosting is to generate a series of weak learners which complement each other.
  • For this, we will force each learner to focus on the mistakes of the previous learner.


Basic idea, Cont.

Adopted from GeeksForGeeks


Algorithm

• Try to combine many simple weak learners (in sequence) to find a single strong learner (for simplicity, suppose that we have a classification problem from now on).

• Each component is a simple binary ±1 classifier.

• Voted combination of the component classifiers:

  H_M(x) = α_1 h(x; θ_1) + · · · + α_M h(x; θ_M)

• To simplify notation: h(x; θ_i) = h_i(x)

  H_M(x) = α_1 h_1(x) + · · · + α_M h_M(x)

• Prediction: ŷ = sign(H_M(x))


Candidate for hi (x)

• Decision stumps
  • Each classifier is based on only a single feature of x (e.g., x_k):

    h(x; θ) = sign(w_1 x_k − w_0)
    θ = {k, w_1, w_0}

Adopted from [4]
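A minimal decision stump sketch (illustrative only; the feature index k, threshold, and sign are found by brute force, and labels are assumed to be −1/+1):

```python
import numpy as np

def fit_stump(X, y, sample_weight=None):
    """Pick the feature k, threshold t and sign s minimizing the weighted error
    of the rule h(x) = s * sign(x_k - t). Labels are assumed to be -1/+1."""
    n, d = X.shape
    w = np.ones(n) / n if sample_weight is None else sample_weight
    best = (np.inf, 0, 0.0, 1)                    # (error, k, threshold, sign)
    for k in range(d):
        for t in np.unique(X[:, k]):
            for s in (+1, -1):
                pred = s * np.where(X[:, k] > t, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best[0]:
                    best = (err, k, t, s)
    return best

def stump_predict(stump, X):
    _, k, t, s = stump
    return s * np.where(X[:, k] > t, 1, -1)
```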


1 Introduction

2 Bagging

3 Boosting

4 AdaBoost
Basic idea & algorithm
Loss function & proof
Properties (extra-reading)

5 Comparison

6 References


Basic idea

• Sequential production of classifiers
  • Iteratively add the classifier whose addition will be most helpful.

• Represent the importance of each sample by assigning weights to the samples.
  • Correct classification =⇒ smaller weights
  • Misclassified samples =⇒ larger weights

• Each classifier is dependent on the previous ones.
  • It focuses on the previous ones’ errors.

Example

Figures adopted from [4]

Algorithm

• H_M(x) = (1/2)[α_1 h_1(x) + · · · + α_M h_M(x)]  →  the complete model, y^(i) ∈ {−1, 1}

• h_m(x): the m-th weak learner

• α_m = ?  →  votes of the m-th weak learner

• w_m^(i): weight of sample i in iteration m

• w_{m+1}^(i) = ?

• J_m = Σ_{i=1}^{N} w_m^(i) × I(y^(i) ≠ h_m(x^(i)))  →  loss of the m-th weak learner

• ϵ_m = Σ_{i=1}^{N} w_m^(i) × I(y^(i) ≠ h_m(x^(i))) / Σ_{i=1}^{N} w_m^(i)  →  weighted error of the m-th weak learner


Algorithm, Cont.

• H_M(x) = (1/2)[α_1 h_1(x) + · · · + α_M h_M(x)]  →  the complete model, y^(i) ∈ {−1, 1}

• h_m(x): the m-th weak learner

• α_m = ln((1 − ϵ_m) / ϵ_m)  →  votes of the m-th weak learner

• w_m^(i): weight of sample i in iteration m

• w_{m+1}^(i) = w_m^(i) e^{α_m I(y^(i) ≠ h_m(x^(i)))}

• J_m = Σ_{i=1}^{N} w_m^(i) × I(y^(i) ≠ h_m(x^(i)))  →  loss of the m-th weak learner

• ϵ_m = Σ_{i=1}^{N} w_m^(i) × I(y^(i) ≠ h_m(x^(i))) / Σ_{i=1}^{N} w_m^(i)  →  weighted error of the m-th weak learner

Algorithm, Cont.

Algorithm 4 AdaBoost

1: Initialize the data weights w_1^(i) = 1/N for all N samples
2: for m = 1 to M do
3:   Find h_m(x) by minimizing the loss: J_m = Σ_{i=1}^{N} w_m^(i) × I(y^(i) ≠ h_m(x^(i)))
4:   Find the weighted error of h_m(x): ϵ_m = Σ_{i=1}^{N} w_m^(i) × I(y^(i) ≠ h_m(x^(i))) / Σ_{i=1}^{N} w_m^(i)
5:   Assign the votes: α_m = ln((1 − ϵ_m) / ϵ_m)
6:   Update the weights: w_{m+1}^(i) = w_m^(i) e^{α_m I(y^(i) ≠ h_m(x^(i)))}
7: end for
8: Combined classifier: ŷ = sign(H_M(x)), where H_M(x) = (1/2) Σ_{m=1}^{M} α_m h_m(x)
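A minimal runnable sketch of Algorithm 4 (an illustration, not the slides' code), assuming ±1 labels and scikit-learn depth-1 trees as the weak learners; it also assumes 0 < ϵ_m < 0.5 so that α_m is well defined and positive:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """AdaBoost with decision stumps; y is assumed to take values in {-1, +1}."""
    N = len(X)
    w = np.ones(N) / N                       # step 1: uniform initial weights
    stumps, alphas = [], []
    for _ in range(M):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (h.predict(X) != y)           # indicator I(y != h(x))
        eps = np.sum(w * miss) / np.sum(w)   # weighted error (assumed in (0, 0.5))
        alpha = np.log((1 - eps) / eps)      # votes of this weak learner
        w = w * np.exp(alpha * miss)         # up-weight misclassified samples
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    H = 0.5 * sum(a * h.predict(X) for h, a in zip(stumps, alphas))
    return np.sign(H)
```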


Loss function

• There are many options for the loss function.

• AdaBoost is equivalent to using the following exponential loss:

  L(y, H_M(x)) = e^{−y · H_M(x)}

  ŷ = sign(H_M(x))


Why the exponential loss?

• A differentiable approximation (upper bound) of the 0/1 loss
  • Easy to optimize
  • Minimizing it minimizes an upper bound on the classification error.

Adopted from [2]


Step 1: Calculating the exponential loss

• We need to calculate the exponential loss for

  H_m(x) = (1/2)[α_1 h_1(x) + · · · + α_m h_m(x)]

  (the factor 1/2 gives a cleaner form later).

• Idea: consider adding the m-th component:

  L_m = Σ_{i=1}^{N} e^{−y^(i) H_m(x^(i))}
      = Σ_{i=1}^{N} e^{−y^(i) [H_{m−1}(x^(i)) + (1/2) α_m h_m(x^(i))]}
      = Σ_{i=1}^{N} e^{−y^(i) H_{m−1}(x^(i))} × e^{−(1/2) α_m y^(i) h_m(x^(i))}
      = Σ_{i=1}^{N} w_m^(i) e^{−(1/2) α_m y^(i) h_m(x^(i))}

  where w_m^(i) = e^{−y^(i) H_{m−1}(x^(i))} is fixed at stage m, while h_m(x) and α_m should be optimized at stage m.


Step 2: Deriving the weighted error function

• We need to derive the weighted error function, J_m:

  L_m = Σ_{i=1}^{N} w_m^(i) e^{−(1/2) α_m y^(i) h_m(x^(i))}

      = e^{−α_m/2} ( Σ_{y^(i) = h_m(x^(i))} w_m^(i) ) + e^{α_m/2} ( Σ_{y^(i) ≠ h_m(x^(i))} w_m^(i) )

      = (e^{α_m/2} − e^{−α_m/2}) ( Σ_{y^(i) ≠ h_m(x^(i))} w_m^(i) ) + e^{−α_m/2} Σ_{i=1}^{N} w_m^(i)

  The first sum is J_m = Σ_{i=1}^{N} w_m^(i) × I(y^(i) ≠ h_m(x^(i))); we find the h_m(x) that minimizes J_m.


Step 3: Deriving ϵm and αm

• We need to derive ϵ_m and α_m by setting the derivative equal to zero:

  ∂L_m / ∂α_m = 0

• Idea: separate the derivative into misclassified and correctly classified samples.

  =⇒ (1/2)(e^{α_m/2} + e^{−α_m/2}) ( Σ_{y^(i) ≠ h_m(x^(i))} w_m^(i) ) = (1/2) e^{−α_m/2} Σ_{i=1}^{N} w_m^(i)

  =⇒ e^{−α_m/2} / (e^{α_m/2} + e^{−α_m/2}) = Σ_{y^(i) ≠ h_m(x^(i))} w_m^(i) / Σ_{i=1}^{N} w_m^(i)

• Set ϵ_m = Σ_{i=1}^{N} w_m^(i) I(y^(i) ≠ h_m(x^(i))) / Σ_{i=1}^{N} w_m^(i)  =⇒  α_m = ln((1 − ϵ_m) / ϵ_m)

Step 4: Justifying the weight update mechanism

• We need to justify the weight update mechanism.

• Idea: from the first step we have w_{m+1}^(i) = e^{−y^(i) H_m(x^(i))}

  Separating h_m(x^(i)):
  =⇒ w_{m+1}^(i) = w_m^(i) e^{−(1/2) α_m y^(i) h_m(x^(i))}

  Using y^(i) h_m(x^(i)) = 1 − 2 I(y^(i) ≠ h_m(x^(i))):
  =⇒ w_{m+1}^(i) = w_m^(i) e^{−α_m/2} e^{α_m I(y^(i) ≠ h_m(x^(i)))}

  The factor e^{−α_m/2} is independent of i and can be ignored:
  =⇒ w_{m+1}^(i) = w_m^(i) e^{α_m I(y^(i) ≠ h_m(x^(i)))}


Exponential loss properties

• In each boosting iteration, assume we can find h(x; θ_m) whose weighted error is better than chance, where

  H_m(x) = (1/2)[α_1 h(x; θ_1) + · · · + α_m h(x; θ_m)]

• Thus, a lower exponential loss over the training data is guaranteed.

Adopted from [6]


Training error properties

• Boosting iterations typically decrease the training error of H_M(x) over the training examples.

Adopted from [6]


Training error properties, Cont.

• The training error has to go down exponentially fast if the weighted error of each h_m is strictly better than chance (i.e., ϵ_m < 0.5):

  E_train(H_M) ≤ Π_{m=1}^{M} 2 √(ϵ_m (1 − ϵ_m))

Adopted from [6]
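For instance (a hypothetical constant error rate, just to illustrate the decay): if every weak learner achieved ϵ_m = 0.4, each factor would be 2√(0.4 × 0.6) ≈ 0.98, so the bound shrinks roughly like 0.98^M and drops below 0.02 after about M = 200 rounds.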


Weighted error properties

• The weighted error of each new component classifier tends to increase as a function of the boosting iterations.

  ϵ_m = Σ_{i=1}^{N} w_m^(i) I(y^(i) ≠ h_m(x^(i))) / Σ_{i=1}^{N} w_m^(i)

Adopted from [6]


Test error properties

• Test error can still decrease after the training error is flat (even zero).

• But is it robust to overfitting?
  • It may easily overfit in the presence of labeling noise or overlapping classes.

Adopted from [6] and [3]


Typical behavior

• The exponential loss goes strictly down.

• The training error of H goes down.

• The weighted error ϵ_m goes up =⇒ the share of votes α_m goes down.

• The test error can keep decreasing even after the training error is flat.


Bagging vs. Boosting

• Training strategy: Bagging uses parallel training; Boosting uses sequential training.

• Data sampling: Bagging uses bootstrapping (random subsets); Boosting uses weighted sampling (by instance importance).

• Learner dependency: Bagging’s learners are independent; Boosting’s learners are dependent (on the previous models).

• Learner weighting: Bagging uses equal weights; Boosting uses varying weights (based on importance).

• Tolerance to noise: Bagging is more robust (due to aggregation); Boosting is more sensitive (may overfit to noise).

• Properties: Bagging reduces variance; Boosting reduces bias and variance (with a focus on bias).


Contributions

• These slides have been prepared thanks to:
  • Nikan Vasei
  • Mahan Bayhaghi


[1] C. M. Bishop, Pattern Recognition and Machine Learning. Information Science and Statistics, New York, NY: Springer, 1st ed., Aug. 2006.

[2] M. Soleymani Baghshah, “Machine learning.” Lecture slides.

[3] R. E. Schapire, “The boosting approach to machine learning: An overview,” Nonlinear Estimation and Classification, pp. 149–171, 2003.

[4] L. Serrano, Grokking Machine Learning. New York, NY: Manning Publications, Jan. 2022.

[5] T. Mitchell, Machine Learning. McGraw-Hill Series in Computer Science, New York, NY: McGraw-Hill Professional, Mar. 1997.

[6] T. Jaakkola, “Machine learning course slides.” Lecture slides.
