Lect 0329
The PAC model assumes access to a noise-free oracle for examples of the target concept. In reality,
we need learning algorithms with at least some tolerance for mislabeled examples. In this lecture
we study a classic model for learning in the presence of noise, the Random Classification Noise
model. This is a simple noise model in which one can get positive algorithmic results.
In this model, a learning algorithm will have access to a modified and noisy oracle for examples,
denoted by EX^η_CN(c∗, D). Here c∗ and D are the target concept and distribution, and 0 ≤ η < 1/2
is a new parameter called the classification noise rate.
    EX^η_CN(c∗, D) =  { ⟨x, c∗(x)⟩ from EX(c∗, D),     with probability 1 − η
                      { ⟨x, ¬c∗(x)⟩ from EX(c∗, D),    with probability η
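To make the oracle concrete, here is a minimal Python sketch of EX^η_CN, assuming a hypothetical
helper ex() that returns a correctly labeled pair ⟨x, c∗(x)⟩ drawn from EX(c∗, D); the names are
illustrative, not part of the original notes.

    import random

    def ex_cn(ex, eta):
        """One call to the noisy oracle EX^eta_CN(c*, D) (illustrative sketch).

        `ex` is a hypothetical noise-free oracle returning a pair (x, label)
        with the correct label c*(x); `eta` is the noise rate, 0 <= eta < 1/2.
        """
        x, label = ex()
        if random.random() < eta:
            return x, not label   # with probability eta, flip the label
        return x, label           # otherwise, return the correct label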
Note:
• Despite the classification noise in the examples received, the goal of the learner remains that
of finding a good approximation to the target concept with respect to the distribution D.
The error rate is measured with respect to the target concept and distribution.
    error(h) = Σ_{x : c∗(x) ≠ h(x)} Pr_D(x)
The typical noise-free PAC algorithm draws a large number of examples and outputs a consistent
hypothesis. With classification noise, however, there may not be a consistent hypothesis. Angluin
and Laird show that ERM (empirical risk minimization) can be used to learn in the random
classification noise model, though not necessarily efficiently. To remind you, the ERM algorithm is
specified as follows: draw a sample of m labeled examples from the example oracle and output a
hypothesis in C that minimizes the number of disagreements with the sample.
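As a sketch (the function and variable names here are mine, not from the notes), ERM over a finite
class can be written as follows in Python:

    def erm(concepts, sample):
        """Return a hypothesis in `concepts` minimizing disagreements with `sample`.

        `concepts` is a finite iterable of hypotheses h: x -> bool, and `sample`
        is a list of (x, label) pairs drawn from the (possibly noisy) oracle.
        """
        def disagreements(h):
            return sum(1 for x, label in sample if h(x) != label)
        return min(list(concepts), key=disagreements)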
To analyze this, suppose concept c_i has true error rate d_i. What is the probability p_i that c_i
disagrees with a labeled example drawn from EX^η_CN(c∗, D)? We have two cases.

1. EX^η_CN(c∗, D) reports correctly, but c_i is incorrect: d_i(1 − η)

2. EX^η_CN(c∗, D) reports incorrectly, but c_i is correct: (1 − d_i)η

    p_i = d_i(1 − η) + (1 − d_i)η
        = η + d_i(1 − 2η)
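A quick Monte-Carlo check of this identity (purely illustrative; the sampling procedure below is
my own sketch, not part of the notes):

    import random

    def simulated_disagreement_rate(d_i, eta, trials=100_000):
        """Estimate p_i for a hypothesis with true error d_i under noise rate eta.

        The hypothesis is wrong on the drawn point with probability d_i, and the
        oracle flips the label with probability eta; the hypothesis disagrees
        with the reported label exactly when one, but not both, of these occur.
        """
        disagreements = sum(
            (random.random() < d_i) != (random.random() < eta)
            for _ in range(trials)
        )
        return disagreements / trials

    # e.g. d_i = 0.1, eta = 0.2 gives roughly 0.2 + 0.1 * (1 - 2 * 0.2) = 0.26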
We need m large enough that an ε-bad hypothesis will not minimize disagreements with the sample.
Consider the threshold η + ε(1 − 2η)/2, which lies halfway between η, the true disagreement rate of
the target, and η + ε(1 − 2η), the smallest possible true disagreement rate of an ε-bad hypothesis.
If an ε-bad hypothesis minimizes disagreements, then the target function must have at least as large
an empirical disagreement rate, so at least one of the following events must hold:

1. Some ε-bad hypothesis c_i has empirical disagreement rate ≤ η + ε(1 − 2η)/2.

2. The target concept c∗ has empirical disagreement rate ≥ η + ε(1 − 2η)/2.
By a Hoeffding bound and a union bound over the hypotheses in C, if m is sufficiently large
(m = O((1/(ε²(1 − 2η)²)) ln(|C|/δ)) suffices), then the probability that either of the events occurs
is at most δ. That is, the probability that an ε-bad hypothesis minimizes disagreements is at most δ.
Thus, if we draw a sample of this size and then find a hypothesis which minimizes disagreements with
the sample, we have an algorithm which PAC learns in the presence of classification noise. This gives
the following theorem.
Theorem 1 For every finite concept class C, we can PAC learn C in the presence of random
classification noise.
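For completeness, here is a sketch of the standard Hoeffding-plus-union-bound calculation behind
the sample size; the exact constants are an illustration rather than a quote from the original
analysis.

    % For a fixed hypothesis c_i with true disagreement rate p_i, Hoeffding's
    % inequality bounds the deviation of the empirical rate \hat{p}_i over m draws:
    %   \Pr[\, |\hat{p}_i - p_i| \ge t \,] \le 2 e^{-2 m t^2}.
    % With t = \varepsilon(1-2\eta)/2 and a union bound over the |C| hypotheses
    % (the target among them), both bad events above have total probability at
    % most \delta whenever
    \[
      m \;\ge\; \frac{2}{\varepsilon^{2}(1-2\eta)^{2}}\,\ln\!\frac{2\,|C|}{\delta}.
    \]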
Minimizing disagreements (ERM) can be NP-hard
Theorem 2 Finding monotone conjunctions which minimize disagreements with a given sample is
NP-hard.
Proof: The proof is a polynomial-time reduction from the decision version of the vertex cover
problem to the decision version of our problem. The decision version of the vertex cover problem
is specified by an undirected graph G = (V, E) on n vertices and a positive integer c < n, and
the question is whether there exists a set C of at most c vertices of G such that every edge of G
is incident to at least one vertex in C. (Such a set C is called a vertex cover.) This problem is
NP-complete.
Our reduction is as follows. If there are n vertices in the graph, then our instance space is
{0, 1}^n. For each vertex v_i in the graph, we introduce a positive example (a_i, +), where a_i =
(11 · · · 101 · · · 11), with a 0 in the i-th position, and for each edge e = (v_i, v_j) in the graph we
introduce n + 1 copies of the negative example (b_e, −), where b_e = (11 · · · 101 · · · 101 · · · 11), with
0's in the i-th and j-th positions. Thus we get a set S of labeled examples of size n + |E|(n + 1).
The computation of S from (G, c) can clearly be carried out in polynomial time.
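As a concrete illustration, here is a short Python sketch of this construction (the function and
variable names are mine):

    def reduction_sample(n, edges):
        """Build the labeled sample S from a vertex-cover instance (sketch).

        Vertices are 0, ..., n-1 and `edges` is a list of pairs (i, j).
        a_i is all 1's with a 0 in position i (one positive example per vertex);
        b_e has 0's in positions i and j and is repeated n + 1 times (negatives).
        """
        S = []
        for i in range(n):
            a = [1] * n
            a[i] = 0
            S.append((tuple(a), True))                # (a_i, +)
        for (i, j) in edges:
            b = [1] * n
            b[i] = 0
            b[j] = 0
            S.extend([(tuple(b), False)] * (n + 1))   # n + 1 copies of (b_e, -)
        return S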
It is easy to show the following.
Claim 1 G has a vertex cover of size at most c if and only if there is a monotone conjunction with
at most c disagreements with S.
Suppose G has a vertex cover C of at most c vertices. Let f denote the conjunction of those x_i such
that v_i is in C. How many examples from S disagree with f? By construction, for each vertex v_i,
f(a_i) = − iff v_i ∈ C. Thus, f disagrees with at most c positive examples from S. For each edge
e = (v_i, v_j), the set C contains at least one of v_i or v_j, so f contains at least one of x_i or x_j. So
f(b_e) = −. Thus, f agrees with all the negative examples in S. Hence the number of disagreements
is at most c, as claimed.
Now suppose that there exists some monotone conjunction f such that f disagrees with at most c
examples in S. Since c < n, this means that f must agree with all the negative examples in S, since
each one is repeated n + 1 times. Hence f can only disagree with positive examples in S, and with at
most c of them. Since f disagrees with the positive example (a_i, +) exactly when x_i appears in f,
it follows that f contains at most c literals x_i. Define the set C to be all those vertices v_i such
that x_i appears in the conjunction f. Then C contains at most c vertices; it remains to see that it
is a vertex cover. If e = (v_i, v_j) is any edge in G, then f(b_e) = −, since f agrees with all the
negative examples. But f(b_e) = − if and only if f contains at least one of x_i or x_j. Thus C
contains at least one of v_i or v_j, so C is a vertex cover of G.
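To see the forward direction of Claim 1 in action, here is a small check using the reduction_sample
sketch above on a triangle graph (again, the helper names are mine):

    def cover_conjunction(cover):
        """The monotone conjunction of x_i over the vertices i in `cover`."""
        return lambda x: all(x[i] == 1 for i in cover)

    def num_disagreements(f, S):
        return sum(1 for x, label in S if f(x) != label)

    # A triangle has minimum vertex cover size 2, e.g. {0, 1}.
    S = reduction_sample(3, [(0, 1), (1, 2), (0, 2)])
    f = cover_conjunction({0, 1})
    assert num_disagreements(f, S) <= 2   # at most c = 2 disagreements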
In the following lecture, we will show how to learn conjunctions efficiently in the random
classification noise model by using statistical queries.