
Boosting

(Following Mohri, Rostamizadeh and Talwalkar.)

Let Z_i = (X_i, Y_i) where Y_i ∈ {−1, +1}. Boosting is a way to combine weak classifiers into
a better classifier. We make the weak learning assumption: for some γ > 0 and every δ > 0, we have an
algorithm that, for every distribution P, returns h ∈ H such that
    P(R(h) ≤ 1/2 − γ) ≥ 1 − δ,
where γ > 0 is called the edge.

Let us recall the AdaBoost algorithm:

1. Set D_1(i) = 1/n for i = 1, . . . , n.

2. Repeat for t = 1, . . . , T:

   (a) Let h_t = argmin_{h ∈ H} P_{D_t}(Y_i ≠ h(X_i)).
   (b) Set ε_t = P_{D_t}(Y_i ≠ h_t(X_i)).
   (c) Set α_t = (1/2) log((1 − ε_t)/ε_t).
   (d) Set
           D_{t+1}(i) = D_t(i) e^{−α_t Y_i h_t(X_i)} / Z_t,
       where Z_t is a normalizing constant.

3. Set g(x) = Σ_t α_t h_t(x).

4. Return h(x) = sign(g(x)).
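As an illustration of the loop above, here is a minimal sketch in Python. It assumes the weak learners are brute-force decision stumps; the function names best_stump and adaboost, and the stump class itself, are choices made for this sketch rather than part of the notes.

# Minimal AdaBoost sketch with decision stumps as the weak learners H.
# The data conventions (X is an n-by-p array, Y in {-1, +1}) and the stump
# search are illustrative assumptions, not part of the notes.
import numpy as np

def best_stump(X, Y, D):
    """Return (error, feature j, threshold s, sign b) minimizing the
    D-weighted error P_D(Y != h(X)) over all stumps."""
    n, p = X.shape
    best = None
    for j in range(p):
        for s in np.unique(X[:, j]):
            for b in (+1, -1):
                pred = b * np.where(X[:, j] <= s, 1, -1)
                err = np.sum(D[pred != Y])
                if best is None or err < best[0]:
                    best = (err, j, s, b)
    return best

def adaboost(X, Y, T=50):
    n = X.shape[0]
    D = np.full(n, 1.0 / n)                    # step 1: D_1(i) = 1/n
    stumps, alphas = [], []
    for t in range(T):
        eps, j, s, b = best_stump(X, Y, D)     # steps (a), (b)
        eps = np.clip(eps, 1e-10, 1 - 1e-10)   # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)  # step (c)
        pred = b * np.where(X[:, j] <= s, 1, -1)
        D = D * np.exp(-alpha * Y * pred)      # step (d): reweight the examples ...
        D /= D.sum()                           # ... and normalize by Z_t
        stumps.append((j, s, b))
        alphas.append(alpha)

    def h(Xnew):
        g = sum(a * b * np.where(Xnew[:, j] <= s, 1, -1)
                for a, (j, s, b) in zip(alphas, stumps))
        return np.sign(g)                      # h(x) = sign(g(x))
    return h

Each round reweights the examples so that the next stump concentrates on the points the current combination still misclassifies, exactly as in step (d).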

Training Error. Now we show that the training error decreases exponentially fast.

Lemma 1 We have
    Z_t = 2 √(ε_t (1 − ε_t)).

Proof. Since Σ_i D_t(i) = 1 we have

    Z_t = Σ_i D_t(i) e^{−α_t Y_i h_t(X_i)}
        = Σ_{i: Y_i h_t(X_i) = 1} D_t(i) e^{−α_t} + Σ_{i: Y_i h_t(X_i) = −1} D_t(i) e^{α_t}
        = (1 − ε_t) e^{−α_t} + ε_t e^{α_t} = 2 √(ε_t (1 − ε_t)),

since α_t = (1/2) log((1 − ε_t)/ε_t). □
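For example, a weak learner with weighted error ε_t = 0.1 gives Z_t = 2 √(0.1 · 0.9) = 0.6, while ε_t = 0.4 gives Z_t ≈ 0.98; as the next result shows, the training error is bounded by the product of these factors, so rounds with small ε_t shrink it much faster.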

Theorem 2 Suppose that γ ≤ (1/2) − ε_t for all t. Then
    R̂(h) ≤ e^{−2γ² T}.

Hence, the training error goes to 0 quickly.

Proof. Recall that D_1(i) = 1/n. So

    D_{T+1}(i) = D_T(i) e^{−α_T Y_i h_T(X_i)} / Z_T
               = D_{T−1}(i) e^{−α_{T−1} Y_i h_{T−1}(X_i)} e^{−α_T Y_i h_T(X_i)} / (Z_{T−1} Z_T)
               = · · · = e^{−Y_i Σ_t α_t h_t(X_i)} / (n Π_t Z_t) = e^{−Y_i g(X_i)} / (n Π_t Z_t),

which implies that
    e^{−Y_i g(X_i)} = n D_{T+1}(i) Π_t Z_t.                                (1)

Since I(u ≤ 0) ≤ e^{−u} we have

    R̂(h) = (1/n) Σ_i I(Y_i g(X_i) ≤ 0) ≤ (1/n) Σ_i e^{−Y_i g(X_i)}
          = (1/n) Σ_i n (Π_t Z_t) D_{T+1}(i) = Π_{t=1}^T Z_t
          = Π_t 2 √(ε_t (1 − ε_t)) = Π_t √(1 − 4(1/2 − ε_t)²)
          ≤ Π_t e^{−2(1/2 − ε_t)²}          (since 1 − x ≤ e^{−x})
          = e^{−2 Σ_t (1/2 − ε_t)²} ≤ e^{−2γ² T}.  □
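To see the bound concretely, here is a small numerical check (an illustrative script with made-up errors ε_t, not part of the notes): the product of the Z_t from Lemma 1 bounds the training error and is itself at most e^{−2γ²T}.

import numpy as np

rng = np.random.default_rng(0)
gamma, T = 0.1, 200
eps = rng.uniform(0.25, 0.5 - gamma, size=T)   # hypothetical errors, each <= 1/2 - gamma

Z = 2 * np.sqrt(eps * (1 - eps))               # Lemma 1: Z_t = 2 sqrt(eps_t (1 - eps_t))
print(np.prod(Z))                              # upper bound on the training error
print(np.exp(-2 * gamma**2 * T))               # Theorem 2 bound e^{-2 gamma^2 T}
# The first number is always at most the second, since each eps_t <= 1/2 - gamma.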


Generalization Error. The training error gets small very quickly. But how well do we do
in terms of prediction error?

Let
    F = { sign(Σ_t α_t h_t) : α_t ∈ R, h_t ∈ H }.

For fixed h = (h_1, . . . , h_T) this is just a set of linear classifiers, which has VC dimension T.
So the shattering number is
    (en/T)^T.
If H is finite then the shattering number is
    (en/T)^T · |H|^T.
If H is infinite but has VC dimension d then the shattering number is bounded by
    (en/T)^T (en/d)^{dT} ≈ n^{Td}.
By the VC theorem, with probability at least 1 − δ,
    R(ĥ) ≤ R̂(ĥ) + √(T d log n / n).
Unfortunately this depends on T. We can fix this using margin theory.
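For instance (with purely illustrative numbers), with n = 1000 samples, a weak-learner class of VC dimension d = 1, and T = 100 rounds, the second term is √(100 · 1 · log 1000 / 1000) ≈ 0.83, so the bound is vacuous even though Theorem 2 may already guarantee a tiny training error.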

Margins. Consider the classifier h(x) = sign(g(x)) where g(x) = Σ_t α_t h_t(x). The classifier
is unchanged if we multiply g by a scalar. In particular, we can replace g with g̃ = g/||α||_1.
This form of the classifier is a convex combination of the h_t's.

We define the margin at x of g = Σ_t α_t h_t by
    ρ(x) = y g(x)/||α||_1 = y g̃(x).
Think of |ρ(x)| as our confidence in classifying x. The margin of g is defined to be
    ρ = min_i ρ(X_i) = min_i Y_i g(X_i)/||α||_1.
Note that ρ ∈ [−1, 1].
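As a small illustration (the arrays below are made up; in practice α and the weak-learner predictions would come from a run of AdaBoost), the normalized margins Y_i g̃(X_i) are easy to compute:

# Normalized margins Y_i g(X_i)/||alpha||_1 for a boosted classifier.
# H_pred[t, i] = h_t(X_i) in {-1, +1}; alpha[t] = alpha_t; Y[i] in {-1, +1}.
# (Array names and values are illustrative, not from the notes.)
import numpy as np

alpha = np.array([0.8, 0.5, 0.3])
H_pred = np.array([[+1, -1, +1, +1],
                   [+1, +1, -1, +1],
                   [-1, +1, +1, +1]])
Y = np.array([+1, -1, +1, +1])

g = alpha @ H_pred                     # g(X_i) = sum_t alpha_t h_t(X_i)
margins = Y * g / np.abs(alpha).sum()  # rho(X_i) = Y_i g(X_i)/||alpha||_1
rho = margins.min()                    # the margin of g
print(margins, rho)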

To proceed we need to review Rademacher complexity. Given a class of functions F with
−1 ≤ f(x) ≤ 1 we define
    R_n(F) = E_σ [ sup_{f ∈ F} (1/n) Σ_i σ_i f(Z_i) ],
where P(σ_i = 1) = P(σ_i = −1) = 1/2. If H is finite then
    R_n(H) ≤ √(2 log |H| / n).
If H has VC dimension d then
    R_n(H) ≤ √(2d log(en/d) / n).
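To make the definition concrete, here is a small Monte Carlo estimate of R_n(H) for a finite class (an illustrative script; the class of threshold functions and all names are assumptions of this sketch):

# Monte Carlo estimate of R_n(H) for a finite class H, directly from the
# definition: average over random signs sigma of sup_h (1/n) sum_i sigma_i h(Z_i).
import numpy as np

rng = np.random.default_rng(0)
n = 200
Z = rng.uniform(0, 1, size=n)

# H = a small finite class: h_s(z) = sign(z - s) for a grid of thresholds s
thresholds = np.linspace(0.1, 0.9, 9)
H = np.sign(Z[None, :] - thresholds[:, None])   # shape (|H|, n), entries in {-1, +1}

draws = 2000
total = 0.0
for _ in range(draws):
    sigma = rng.choice([-1.0, 1.0], size=n)
    total += np.max(H @ sigma) / n              # sup over h of (1/n) sum_i sigma_i h(Z_i)
R_hat = total / draws
print(R_hat, np.sqrt(2 * np.log(len(thresholds)) / n))  # estimate vs. the finite-class bound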
We will need the following two facts. First,
    R_n(conv(H)) = R_n(H),
where conv(H) is the convex hull of H. Second, if
    |φ(x) − φ(y)| ≤ L ||x − y||
for all x, y, then
    R_n(φ ◦ F) ≤ L R_n(F).
The set of margin functions is
    M = {y f(x) : f ∈ conv(H)}.
We then have
    R_n(M) = R_n(conv(H)) = R_n(H).
A key result is that, with probability at least 1 − δ, for all f ∈ F,
    E[f(Z)] ≤ (1/n) Σ_i f(Z_i) + 2 R_n(F) + √(2 log(1/δ) / n).        (2)

Now fix a number ρ > 0 and define the margin-sensitive loss function

    φ(u) = 1           if u ≤ 0
           1 − u/ρ     if 0 ≤ u ≤ ρ
           0           if u ≥ ρ.

Note that
    I(u ≤ 0) ≤ φ(u) ≤ I(u ≤ ρ),
and that φ is Lipschitz with constant L = 1/ρ. Assume that H has VC dimension d. Then
    R_n(φ ◦ M) ≤ L R_n(M) ≤ L R_n(H) ≤ (1/ρ) √(2d log(en/d) / n).
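In code, φ is just a clipped linear ramp (a tiny illustrative helper; the name margin_loss is mine):

# The margin-sensitive (ramp) loss above: 1 for u <= 0, linear on [0, rho],
# and 0 for u >= rho. It is (1/rho)-Lipschitz and sandwiched between the
# indicators I(u <= 0) and I(u <= rho).
import numpy as np

def margin_loss(u, rho):
    return np.clip(1.0 - u / rho, 0.0, 1.0)

u = np.linspace(-0.5, 0.5, 5)
print(margin_loss(u, rho=0.25))   # [1, 1, 1, 0, 0]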

Now define the empirical margin-sensitive loss of a classifier f by
    R̂_ρ(f) = (1/n) Σ_i I(Y_i f(X_i) ≤ ρ).

Theorem 3 With probability at least 1 − δ,
    R(g) ≤ R̂_ρ(g/||α||_1) + (1/ρ) √(2d log(en/d) / n) + √(2 log(2/δ) / n).

Proof. Recall that I(u ≤ 0) ≤ φ(u) ≤ I(u ≤ ρ). Also recall that g and g̃ = g/||α||_1 are
equivalent classifiers. Then using (2) we have

    R(g) = R(g̃) = P(Y g̃(X) ≤ 0) ≤ (1/n) Σ_i φ(Y_i g̃(X_i)) + 2 R_n(φ ◦ M) + √(2 log(2/δ) / n)
         ≤ (1/n) Σ_i φ(Y_i g̃(X_i)) + (1/ρ) √(2d log(en/d) / n) + √(2 log(2/δ) / n)
         ≤ R̂_ρ(g/||α||_1) + (1/ρ) √(2d log(en/d) / n) + √(2 log(2/δ) / n).  □


Next we bound R̂_ρ(g/||α||_1).

Theorem 4 We have
    R̂_ρ(g/||α||_1) ≤ Π_{t=1}^T √(4 ε_t^{1−ρ} (1 − ε_t)^{1+ρ}).

Proof. Since φ(u) ≤ I(u ≤ ρ) we have

    R̂_ρ(g/||α||_1) ≤ (1/n) Σ_i I(Y_i g(X_i) − ρ ||α||_1 ≤ 0)
                    ≤ (1/n) Σ_i e^{ρ ||α||_1} e^{−Y_i g(X_i)}
                    = e^{ρ ||α||_1} (1/n) Σ_i n D_{T+1}(i) Π_t Z_t = e^{ρ ||α||_1} Π_t Z_t
                    = Π_{t=1}^T √(4 ε_t^{1−ρ} (1 − ε_t)^{1+ρ}),

since Z_t = 2 √(ε_t (1 − ε_t)) and α_t = (1/2) log((1 − ε_t)/ε_t). □
Assuming γ ≤ (1/2 − ε_t) for all t and ρ < γ, it can be shown that √(4 ε_t^{1−ρ} (1 − ε_t)^{1+ρ}) ≤ b for some b < 1.
So R̂_ρ(g/||α||_1) ≤ b^T. Combining with the previous result we have, with probability at least
1 − δ,
    R(g) ≤ b^T + (1/ρ) √(2d log(en/d) / n) + √(2 log(2/δ) / n).
This shows that we get small error even with T large (unlike the earlier bound based only
on VC theory).
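For a sense of scale (with made-up values of γ, ρ and ε_t), the per-round factor b and the resulting bound b^T can be computed directly:

# Numerical illustration: the per-round factor b = sqrt(4 eps^{1-rho} (1-eps)^{1+rho})
# is strictly below 1 when eps <= 1/2 - gamma and rho < gamma, so the bound b^T
# on the empirical margin loss vanishes as T grows. Values are illustrative only.
import numpy as np

gamma, rho = 0.1, 0.05
eps = 0.5 - gamma                      # worst case allowed by the edge condition
b = np.sqrt(4 * eps**(1 - rho) * (1 - eps)**(1 + rho))
print(b)                               # roughly 0.99, which is < 1
print(b ** np.array([100, 1000]))      # b^T for T = 100 and T = 1000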
