
Learning from Uniform Convergence

Machine Learning 2021


UML Book Chapter 4
Slides P. Zanuttigh (some material from F. Vandin slides)
Empirical and True Risk
Learning algorithm:
❑ Receive a training set S
❑ Evaluate the error of each possible ℎ ∈ ℋ on S and
select the one with the lowest empirical error, ℎ∗ (see the sketch below)
❑ Is the ℎ∗ ∈ ℋ that minimizes the empirical error on S also
minimizing the true error on D?

It suffices to ensure that the empirical error of all ℎ ∈ ℋ is a good approximation of their true error
(i.e., L_S(h) similar to L_D(h), ∀ℎ ∈ ℋ)
Notice: this is a sufficient, not a necessary, condition
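
A minimal sketch (not part of the original slides) of the ERM rule on a toy problem: a finite class of threshold classifiers h_t(x) = 1[x ≥ t] with the 0-1 loss, and data drawn from a noisy threshold at 0.5. The names, the class and the data distribution are illustrative assumptions; the true risk is only estimated on a large independent sample.

# ERM sketch on a toy finite class (illustrative assumptions throughout)
import numpy as np

rng = np.random.default_rng(0)

def sample(m):
    """Draw m examples: x uniform in [0,1], label 1[x >= 0.5] with 10% label noise."""
    x = rng.uniform(0, 1, m)
    y = (x >= 0.5).astype(int)
    flip = rng.uniform(0, 1, m) < 0.1
    return x, np.where(flip, 1 - y, y)

thresholds = np.linspace(0, 1, 21)        # the finite hypothesis class H

def risk(t, x, y):
    """0-1 loss of the threshold classifier h_t on the sample (x, y)."""
    return np.mean((x >= t).astype(int) != y)

# ERM: evaluate every h in H on S and keep the one with the lowest empirical error
x_S, y_S = sample(100)
emp_risks = [risk(t, x_S, y_S) for t in thresholds]
h_star = thresholds[int(np.argmin(emp_risks))]

# Estimate the true risk L_D(h*) on a large independent sample
x_big, y_big = sample(200_000)
print("ERM threshold h*:", h_star)
print("empirical risk L_S(h*):", min(emp_risks))
print("estimated true risk L_D(h*):", risk(h_star, x_big, y_big))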
ε-Representative Set
Idea: focus on when the empirical risks (errors) of all members of ℋ are good
approximations of their true risk

Definition (ε-representative)
A training set S is called ε-representative (w.r.t. domain Z, hypothesis class ℋ,
loss function ℓ, and distribution D) if
∀ℎ ∈ ℋ: |L_S(h) − L_D(h)| ≤ ε

Theorem:
Assume that the training set S is (ε/2)-representative (w.r.t. domain Z, hypothesis class ℋ, loss function ℓ, and distribution D). Then, any output of ERM_ℋ(S) (i.e., any h_S ∈ argmin_{h∈ℋ} L_S(h)) satisfies:
L_D(h_S) ≤ min_{h∈ℋ} L_D(h) + ε

Consequence: if, with probability at least 1−δ, a random training set S is ε-representative, then the ERM rule is an agnostic PAC learner
Demonstration
Proof of the theorem:
1. ε/2-representative (applied to h_S): |L_S(h_S) − L_D(h_S)| ≤ ε/2  →  L_D(h_S) ≤ L_S(h_S) + ε/2
2. h_S is an ERM predictor: ∀ℎ ∈ ℋ: L_S(h_S) ≤ L_S(h)  →  L_D(h_S) ≤ L_S(h) + ε/2
3. ε/2-representative: ∀ℎ ∈ ℋ: |L_S(h) − L_D(h)| ≤ ε/2  →  L_S(h) ≤ L_D(h) + ε/2

Combine together:

L_D(h_S) ≤ L_S(h_S) + ε/2 ≤ L_S(h) + ε/2 ≤ L_D(h) + ε/2 + ε/2

Since this holds for every ℎ ∈ ℋ, it holds in particular for the minimizer of L_D:

L_D(h_S) ≤ min_{h∈ℋ} L_D(h) + ε

(a toy numeric check of this bound follows below)
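
A toy numeric check (assumed setup, not from the slides) of the theorem: with the same kind of threshold-classifier class, whenever the drawn S happens to be ε/2-representative, the ERM output h_S indeed satisfies L_D(h_S) ≤ min_h L_D(h) + ε; here L_D is only approximated on a very large sample.

# Toy check of: S eps/2-representative  =>  L_D(h_S) <= min_h L_D(h) + eps
import numpy as np

rng = np.random.default_rng(1)
thresholds = np.linspace(0, 1, 21)                 # finite class H of h_t(x) = 1[x >= t]

def sample(m):
    x = rng.uniform(0, 1, m)
    y = (x >= 0.5).astype(int)
    return x, np.where(rng.uniform(0, 1, m) < 0.1, 1 - y, y)   # 10% label noise

def risks(x, y):
    """Vector of 0-1 risks, one entry per hypothesis in H."""
    return np.array([np.mean((x >= t).astype(int) != y) for t in thresholds])

eps = 0.1
L_D = risks(*sample(500_000))                      # proxy for the true risks L_D(h)

x_S, y_S = sample(2_000)
L_S = risks(x_S, y_S)                              # empirical risks on S

if np.max(np.abs(L_S - L_D)) <= eps / 2:           # is S eps/2-representative?
    h_S = int(np.argmin(L_S))                      # ERM output
    assert L_D[h_S] <= L_D.min() + eps             # conclusion of the theorem
    print("bound verified:", L_D[h_S], "<=", L_D.min() + eps)
else:
    print("this S is not eps/2-representative; redraw it")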
Uniform Convergence

Note: the same sample size m works for all ℎ ∈ ℋ and for all distributions D

Definition (uniform convergence):

A hypothesis class ℋ has the uniform convergence property (w.r.t. a domain Z and a loss function ℓ) if there exists a function m_ℋ^UC : (0,1)² → ℕ such that, for every ε, δ ∈ (0,1) and for every probability distribution D over Z, if S is a set of m ≥ m_ℋ^UC(ε, δ) i.i.d. examples drawn from D, then, with probability ≥ 1 − δ, S is ε-representative
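
A Monte Carlo sketch (illustrative assumptions throughout, not from the slides) of what the definition asks for: for a small finite class of threshold classifiers, estimate the probability that a random sample S of size m is ε-representative, and watch it approach 1 as m grows.

# Estimate P(S is eps-representative) as a function of the sample size m
import numpy as np

rng = np.random.default_rng(2)
thresholds = np.linspace(0, 1, 11)                 # small finite class H

def sample(m):
    x = rng.uniform(0, 1, m)
    y = (x >= 0.5).astype(int)
    return x, np.where(rng.uniform(0, 1, m) < 0.1, 1 - y, y)

def risks(x, y):
    return np.array([np.mean((x >= t).astype(int) != y) for t in thresholds])

L_D = risks(*sample(500_000))                      # proxy for the true risks

eps, trials = 0.1, 200
for m in (50, 200, 800):
    hits = sum(np.max(np.abs(risks(*sample(m)) - L_D)) <= eps for _ in range(trials))
    print(f"m = {m}: estimated P(S is eps-representative) = {hits / trials:.2f}")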
Uniform Convergence and PAC Learnability
If a class ℋ has the uniform convergence property with a function m_ℋ^UC, then:

1. The class is agnostically PAC learnable with sample complexity
   m_ℋ(ε, δ) ≤ m_ℋ^UC(ε/2, δ)
2. The ERM_ℋ paradigm is a successful agnostic PAC learner for ℋ

• The demonstration follows from the previous theorem and the definition of uniform convergence
• Recall that the theorem requires an ε/2-representative set to achieve an accuracy of ε
Finite Classes are Agnostic PAC Learnable
Proposition:
Let ℋ be a finite hypothesis class, let Z be a domain and let ℓ: ℋ × Z → [0,1] be a loss function. Then:
• ℋ enjoys the uniform convergence property with sample complexity
  m_ℋ^UC(ε, δ) ≤ log(2|ℋ|/δ) / (2ε²)
• ℋ is agnostic PAC learnable using the ERM algorithm with sample complexity
  m_ℋ(ε, δ) ≤ m_ℋ^UC(ε/2, δ) ≤ 2 log(2|ℋ|/δ) / ε²
  (the ε/2 appears because S needs to be ε/2-representative; the bounds are evaluated numerically in the sketch below)

Proof not part of the course; basic idea: first prove that uniform convergence holds for a finite hypothesis class, then use the previous result on uniform convergence and PAC learnability
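
A small sketch (not course code) that simply plugs numbers into the two bounds of the proposition, to get a feel for the sample sizes they prescribe; the values of |ℋ|, ε and δ are arbitrary examples.

# m_UC(eps, delta) <= log(2|H|/delta) / (2 eps^2)
# m(eps, delta)    <= m_UC(eps/2, delta) <= 2 log(2|H|/delta) / eps^2
import math

def m_uc(H_size, eps, delta):
    return math.ceil(math.log(2 * H_size / delta) / (2 * eps**2))

def m_pac(H_size, eps, delta):
    return math.ceil(2 * math.log(2 * H_size / delta) / eps**2)

H_size, eps, delta = 1_000, 0.05, 0.01
print("uniform convergence sample size:", m_uc(H_size, eps, delta))
print("agnostic PAC sample size:", m_pac(H_size, eps, delta))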
Discretization Trick

Note: in many real-world applications we consider hypothesis classes determined by a set of parameters in ℝ

❑ Assume a hypothesis class determined by d real-valued parameters
❑ In principle the hypothesis class is of infinite size, but...
❑ ... in practice we use a computer: e.g., real numbers are represented with 64-bit double-precision variables
❑ For d parameters |ℋ| = 2^(64d) → ℋ is large but finite
❑ Sample complexity bounded by m_ℋ(ε, δ) ≤ m_ℋ^UC(ε/2, δ) ≤ 2 log(2·2^(64d)/δ) / ε² (evaluated in the sketch below)
  o Check the book, recalling that log = log_e (natural logarithm), not log_2!
❑ Issue: the bound depends on the chosen number representation
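
A sketch (illustrative values only) evaluating the discretization-trick bound m(ε, δ) ≤ 2 log(2·2^(64d)/δ)/ε² for a few values of d, using the natural logarithm as the slide points out.

# Discretization trick: |H| = 2^(64 d), so log(2 |H| / delta) = log(2/delta) + 64 d log 2
import math

def m_discretized(d, eps, delta):
    log_H = 64 * d * math.log(2)                   # natural log of |H|
    return math.ceil(2 * (math.log(2 / delta) + log_H) / eps**2)

for d in (1, 10, 100):
    print(f"d = {d}: m <= {m_discretized(d, eps=0.1, delta=0.01)}")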
Demonstration (1)
1. Uniform convergence (UC): with probability ≥ 1 − δ, S is ε-representative:
   D^m({S : ∀ℎ ∈ ℋ, |L_S(h) − L_D(h)| ≤ ε}) ≥ 1 − δ

2. Rewrite, focusing on the probability of not having UC:
   P_bad = D^m({S : ∃ℎ ∈ ℋ, |L_S(h) − L_D(h)| > ε}) ≤ δ     (error in the book!)

3. Rewrite the set {S : ∃ℎ ∈ ℋ, |L_S(h) − L_D(h)| > ε} as the union over h:
   {S : ∃ℎ ∈ ℋ, |L_S(h) − L_D(h)| > ε} = ∪_{h∈ℋ} {S : |L_S(h) − L_D(h)| > ε}

4. Apply the union bound:
   D^m({S : ∃ℎ ∈ ℋ, |L_S(h) − L_D(h)| > ε}) = D^m(∪_{h∈ℋ} {S : |L_S(h) − L_D(h)| > ε}) ≤ Σ_{h∈ℋ} D^m({S : |L_S(h) − L_D(h)| > ε})

Demonstration not part of the course


Demonstration (2)
Consider:

D^m({S : ∃ℎ ∈ ℋ, |L_S(h) − L_D(h)| > ε}) = D^m(∪_{h∈ℋ} {S : |L_S(h) − L_D(h)| > ε}) ≤ Σ_{h∈ℋ} D^m({S : |L_S(h) − L_D(h)| > ε})

❑ Next step: show that, for any fixed hypothesis h, the difference |L_S(h) − L_D(h)| is likely to be small
❑ Notice that L_D(h) is the expectation of the loss and L_S(h) its empirical average: the empirical average should not deviate too much from its expectation
❑ INTUITIVE IDEA from the law of large numbers: if m is large, the average converges to the expectation

Demonstration not part of the course


Demonstration (3)

❑ Apply Hoeffding's inequality to our case (the loss takes values in the [0,1] interval), with θ_i = ℓ(h, z_i) and μ = L_D(h):

   D^m({S : |L_S(h) − L_D(h)| > ε}) = P(|(1/m) Σ_{i=1}^m θ_i − μ| > ε) ≤ 2e^(−2mε²)

❑ Apply it to the sum over h:

   Σ_{h∈ℋ} D^m({S : |L_S(h) − L_D(h)| > ε}) ≤ |ℋ| · 2e^(−2mε²)

❑ Finally: we already showed that this sum upper-bounds the probability that S is not ε-representative, so we need to find the m for which the right-hand side is ≤ δ:
   o forcing |ℋ| · 2e^(−2mε²) ≤ δ  ⇒  m ≥ log(2|ℋ|/δ) / (2ε²)
   (the Hoeffding step is checked empirically in the sketch after this slide)
Demonstration not part of the course
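
An empirical sanity check (assumed Bernoulli losses, not from the slides) of the Hoeffding step for a single fixed h: the simulated probability that the empirical average of the losses deviates from μ = L_D(h) by more than ε stays below 2e^(−2mε²).

# Hoeffding check for one fixed hypothesis with losses theta_i in {0, 1}
import numpy as np

rng = np.random.default_rng(3)
mu, eps, m, trials = 0.3, 0.05, 500, 20_000        # true risk mu, sample size m

samples = rng.binomial(1, mu, size=(trials, m))    # trials independent samples S
deviations = np.abs(samples.mean(axis=1) - mu)     # |L_S(h) - L_D(h)| per trial
print("empirical  P(|L_S - L_D| > eps):", np.mean(deviations > eps))
print("Hoeffding bound 2 exp(-2 m eps^2):", 2 * np.exp(-2 * m * eps**2))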
Demonstration (4)
For m ≥ log(2|ℋ|/δ) / (2ε²) it holds that Σ_{h∈ℋ} D^m({S : |L_S(h) − L_D(h)| > ε}) ≤ δ
(demonstrated in the previous slide)

Consequence: any finite hypothesis class has the uniform convergence property with sample complexity
m_ℋ^UC(ε, δ) ≤ log(2|ℋ|/δ) / (2ε²)

From the theorem* (uniform convergence implies agnostic PAC learnability): ℋ is agnostically PAC learnable with sample complexity
m_ℋ(ε, δ) ≤ m_ℋ^UC(ε/2, δ) ≤ 2 log(2|ℋ|/δ) / ε²

(*) recall: if ℋ has the uniform convergence property with m_ℋ^UC, then it is agnostically PAC learnable with m_ℋ(ε, δ) ≤ m_ℋ^UC(ε/2, δ), and ERM_ℋ is a successful agnostic PAC learner

Demonstration not part of the course
