Lecture-10-boosting

The document discusses the concept of boosting, particularly the Adaboost algorithm, which combines weak learners to create a strong classifier by iteratively adjusting weights based on classification errors. It explains the process of initializing weights, fitting predictors, computing error rates, and updating weights to improve classification accuracy. Additionally, it addresses empirical risk control and generalization error bounds, highlighting the trade-off between approximation and estimation errors based on the number of iterations.

Boosting

Son P. Nguyen

University of Economics and Law


Vietnam National University-HCMC

July 30, 2024


Introduction

▶ First algorithm of “boosting”: Tukey in 1972!


▶ Build a set of rules (predictors) that are then aggregated.
▶ The process is recursive: the rule built at step m depends on the
one built at step m − 1.
Introduction
Weak Learner

▶ The term Boosting refers to general methods for producing
precise decisions from weak learner rules.
▶ A rule g that is slightly better than chance is called a weak
learner:

∃γ > 0 s.t. P(g(X) ≠ Y) = 1/2 − γ.
▶ Some examples of weak learners include:
▶ 1-nearest neighbor (1-nn)
▶ Decision trees with 2 terminal nodes (stumps)
The Adaboost Algorithm
Input: a weak learner g and a number of iterations M.
1. Initialize: Set the weights of each data point:

∀i ∈ {1, . . . , n} : wi,1 = 1/n.
2. For m = 1 to M:
2.1 Fit the predictor gm to the sample Dn weighted by
w1,m , . . . , wn,m .
2.2 Compute the error rate:

em = ( Σ_{i=1}^{n} wi,m 1{gm(xi) ≠ yi} ) / ( Σ_{i=1}^{n} wi,m ).

2.3 Compute:

αm = ln( (1 − em) / em ).

2.4 Update weights:

∀i ∈ {1, . . . , n} : wi,m+1 = wi,m exp( αm 1{gm(xi) ≠ yi} ).


The Adaboost Algorithm

Final hypothesis:

G(x) = sign( Σ_{m=1}^{M} αm gm(x) ).
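The algorithm above can be sketched in Python using decision stumps as the weak learner. This is a minimal illustrative implementation, not the lecture's own code; the helper names (fit_stump, stump_predict, adaboost, predict) are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w):
    """Search all (feature, threshold, polarity) stumps for the one
    minimizing the weighted error (hypothetical helper)."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w @ (pred != y)
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(X, j, thr, pol):
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, M=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                   # step 1: wi,1 = 1/n
    rules = []
    for m in range(M):
        err, j, thr, pol = fit_stump(X, y, w)  # step 2.1: fit weighted
        em = err / w.sum()                     # step 2.2: error rate
        em = np.clip(em, 1e-10, 1 - 1e-10)     # avoid log(0) / division by 0
        alpha = np.log((1 - em) / em)          # step 2.3
        pred = stump_predict(X, j, thr, pol)
        w = w * np.exp(alpha * (pred != y))    # step 2.4: boost mistakes
        rules.append((alpha, j, thr, pol))
    return rules

def predict(rules, X):
    """Final hypothesis: sign of the alpha-weighted vote."""
    score = sum(a * stump_predict(X, j, thr, pol) for a, j, thr, pol in rules)
    return np.sign(score)
```

On a toy linearly separable sample, a few rounds suffice to classify the training data correctly.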
Comments on the Adaboost Algorithm

▶ Handling Weights in Weak Learners:


▶ If the weak learner cannot incorporate weights directly, the
predictor can be trained on a subsample of Dn where
observations are randomly drawn according to weights
w1,m , . . . , wn,m .
▶ Updating Weights:
▶ The weights w1,m , . . . , wn,m are updated after each iteration:
▶ If the i-th individual is correctly classified, its weight remains
unchanged.
▶ If the i-th individual is misclassified, its weight is increased.
▶ Weight of the Rule αm :
▶ The weight αm of the rule gm increases with its performance
on Dn :
▶ αm increases as the error rate em decreases.
▶ The rule must not be “too weak”: if em > 0.5, then αm < 0.
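The subsampling workaround for weight-agnostic weak learners can be sketched as follows; the weight values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights w1,m , . . . , wn,m after some round
# (illustrative values, not from the lecture).
w = np.array([0.1, 0.1, 0.5, 0.2, 0.1])
n = len(w)

# Draw a subsample of indices with probability proportional to the
# weights; a weak learner that cannot handle weights directly can
# then be fit on X[idx], y[idx].
idx = rng.choice(n, size=n, replace=True, p=w / w.sum())
print(idx)
```

Heavily weighted (i.e. previously misclassified) observations tend to appear several times in the subsample, mimicking the weighted fit.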
Classification

Goal: create a good classifier by combining several weak classifiers


▶ a weak classifier is a classifier which is able to produce results
only slightly better than a random guess
▶ Idea: apply a weak classifier repeatedly (iteratively) to
modifications of the data
▶ at each iteration, give more weight to the misclassified
observations.
An example

Initially all examples are equally important.

h1 = The best classifier on this data.
Clearly there are mistakes. Error ϵ1 = 0.3.

For the next round, increase the importance of the examples with
mistakes and down-weight the examples that h1 got correctly.
An example

Dt = Set of weights at round t, one for each example. Think


“How much should the weak learner care about this example in its
choice of the classifier?”
h2 = A classifier learned on this data. Has an error ϵ2 = 0.21.
Why not 0.3? Because while computing the error, we weight each
example xi by its Dt(i).
An example

ϵt = 1/2 − (1/2) Σ_{i=1}^{m} Dt(i) yi h(xi)

Why is this a reasonable definition?

Consider two cases:
1. When yi ≠ h(xi), we have yi h(xi) = −1.
2. When yi = h(xi), we have yi h(xi) = 1.
Therefore, ϵt is in fact

ϵt = Σ_{i : yi ≠ h(xi)} Dt(i)

This represents the total error, but each example only contributes
to the extent that it is important.
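A quick numeric check confirms that the two expressions for ϵt agree; the toy weights, labels, and predictions below are assumed for illustration.

```python
import numpy as np

# Assumed toy round: 5 examples with uniform weights Dt summing to 1,
# labels y and predictions h(x) in {-1, +1}.
D = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
y = np.array([ 1, -1,  1,  1, -1])
h = np.array([ 1,  1,  1, -1, -1])   # mistakes on examples 2 and 4

# Form 1: 1/2 - (1/2) * sum_i Dt(i) yi h(xi)
eps_margin = 0.5 - 0.5 * np.sum(D * y * h)

# Form 2: sum of Dt(i) over misclassified examples
eps_direct = np.sum(D[y != h])

print(eps_margin, eps_direct)  # both equal 0.4
```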
An example

h2 = A classifier learned on this data. Has an error ϵ2 = 0.21.
For the next round, increase the importance of the mistakes and
down-weight the examples that h2 got correctly
An example

ϵt = 1/2 − (1/2) Σ_{i=1}^{m} Dt(i) yi h(xi)

h3 = A classifier learned on this data. Has an error ϵ3 = 0.14.


An example

The final hypothesis is a combination of all the hi’s we have seen
so far.

Think of the α values as the vote for each weak classifier; the
boosting algorithm has to somehow specify them.
An outline of boosting

Given a training set (x1 , y1 ), . . . , (xm , ym ).


▶ Instances xi ∈ X labeled with yi ∈ {−1, 1}
For t = 1, 2, . . . , T :
▶ Construct a distribution Dt on {1, 2, . . . , m}.
▶ Find a weak hypothesis (rule of thumb) ht such that it has a
small weighted error ϵt.
Construct a final output Hfinal.
Empirical Risk Control (Empirical Error)
▶ Error Rate em:
▶ em refers to the error rate of the weak learner ĝm on the
weighted dataset Dn:

em = ( Σ_{i=1}^{n} wi 1{gm(xi) ≠ yi} ) / ( Σ_{i=1}^{n} wi )

▶ Gain over Pure Chance (γm):
▶ γm measures the improvement of gm over random guessing:

em = 1/2 − γm
▶ Empirical Risk Bound:
▶ The empirical risk Rn(ĝ) decreases with more iterations:

Rn(ĝ) ≤ exp( −2 Σ_{m=1}^{M} γm² )

The empirical risk tends to 0 as the number of iterations
increases.
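A small sketch of how fast this bound shrinks, assuming a constant edge γm = 0.1 at every round (an illustrative value, not from the lecture):

```python
import math

# With gamma_m = 0.1 fixed, the bound exp(-2 * sum_m gamma_m^2)
# becomes exp(-2 * M * 0.01) and decays exponentially in M.
gamma = 0.1
for M in (10, 50, 100, 200):
    bound = math.exp(-2 * M * gamma**2)
    print(f"M = {M:4d}  bound = {bound:.4f}")
```

Even a small but constant edge over chance drives the empirical risk bound toward 0 as M grows.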
Risk Control (Generalization Error)

▶ Generalization Error Bound:


▶ The generalization error R(ĝ) is bounded as follows:

R(ĝ) ≤ Rn(ĝ) + O( √(MV/n) )

▶ Here, V denotes the Vapnik-Chervonenkis (VC) dimension,


and n is the number of samples.
▶ The bias/variance (approximation/estimation error) trade-off
is regulated by the number of iterations M:
▶ Small M: The first term (approximation error) dominates.
▶ Large M: The second term (estimation error) dominates.
▶ When M is very large, Adaboost overfits.
