Lec 6
Nakul Verma
Announcements
• HW3 posted
Last time…
• Linear Regression
• Kernel Regression
$\mathrm{err}(f) := \Pr_{(x,y)\sim D}[f(x) \neq y]$   (generalization error of $f$)
We call:
• The model class $\mathcal{F}$ PAC-learnable if there is an algorithm that, for any $\epsilon > 0$, any $\delta > 0$, and any distribution $D$, given $m = m(\epsilon, \delta)$ i.i.d. examples, returns $f \in \mathcal{F}$ satisfying $\mathrm{err}(f) \le \min_{f^* \in \mathcal{F}} \mathrm{err}(f^*) + \epsilon$ with probability at least $1 - \delta$.
• If the runtime of the algorithm is polynomial in $1/\epsilon$ and $1/\delta$, then $\mathcal{F}$ is efficiently PAC-learnable.
A popular algorithm: the empirical risk minimizer (ERM),
$f_{\mathrm{ERM}} := \arg\min_{f \in \mathcal{F}} \mathrm{err}_S(f)$, where $\mathrm{err}_S(f) := \frac{1}{m}\sum_{i=1}^m \mathbf{1}[f(x_i) \neq y_i]$ is the empirical (sample) error.
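To make the definition concrete, here is a minimal sketch of ERM for a finite class under 0/1 loss (the helper names and the toy threshold class are illustrative, not from the lecture):

```python
# Minimal ERM sketch: pick the model with the lowest empirical 0/1 loss.
# Assumes the model class is a finite, explicit list of callables.

def empirical_error(f, samples):
    """Fraction of samples (x, y) that f misclassifies."""
    return sum(1 for x, y in samples if f(x) != y) / len(samples)

def erm(models, samples):
    """Return the model in `models` with minimum empirical error."""
    return min(models, key=lambda f: empirical_error(f, samples))

# Example: threshold classifiers on the real line.
models = [lambda x, t=t: 1 if x >= t else 0 for t in [0.0, 0.5, 1.0]]
samples = [(0.2, 0), (0.4, 0), (0.7, 1), (0.9, 1)]
f_erm = erm(models, samples)
print(empirical_error(f_erm, samples))  # 0.0: the t=0.5 threshold fits perfectly
```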
PAC learning simple model classes
Theorem (finite size $\mathcal{F}$):
Pick any tolerance level $\epsilon > 0$ and any confidence level $\delta > 0$;
let $(x_1, y_1), \dots, (x_m, y_m)$ be examples drawn i.i.d. from an unknown distribution $D$. If $m \ge \frac{2}{\epsilon^2} \ln \frac{2|\mathcal{F}|}{\delta}$, then with probability at least $1 - \delta$:
$\mathrm{err}(f_{\mathrm{ERM}}) \le \min_{f \in \mathcal{F}} \mathrm{err}(f) + \epsilon$
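As a quick sanity check on the theorem, here is a sketch that evaluates the sample-size bound $m \ge \frac{2}{\epsilon^2}\ln\frac{2|\mathcal{F}|}{\delta}$ (the function name and example numbers are illustrative, not from the lecture):

```python
import math

def sample_size_finite_class(num_models, eps, delta):
    """Samples sufficient for ERM over a finite class
    (Hoeffding + union bound): m >= (2/eps^2) * ln(2|F|/delta)."""
    return math.ceil((2 / eps**2) * math.log(2 * num_models / delta))

# e.g. |F| = 10^6 candidate models, tolerance 0.05, confidence 0.99:
print(sample_size_finite_class(10**6, eps=0.05, delta=0.01))  # ~15300 samples
```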
We need to analyze:
$\mathrm{err}(f_{\mathrm{ERM}}) - \mathrm{err}(f^*) = \big[\mathrm{err}(f_{\mathrm{ERM}}) - \mathrm{err}_S(f_{\mathrm{ERM}})\big] + \underbrace{\big[\mathrm{err}_S(f_{\mathrm{ERM}}) - \mathrm{err}_S(f^*)\big]}_{\le 0 \text{ (def. of ERM)}} + \big[\mathrm{err}_S(f^*) - \mathrm{err}(f^*)\big]$
where $f^* := \arg\min_{f \in \mathcal{F}} \mathrm{err}(f)$.
The first and third terms are uniform deviations of the sample average of a random variable from its expectation, holding simultaneously for all $f \in \mathcal{F}$.
Proof sketch
Fix any $f \in \mathcal{F}$ and a sample $(x_i, y_i)$; define the random variable $Z_i := \mathbf{1}[f(x_i) \neq y_i]$. The $Z_i$ are i.i.d. with $\mathbb{E}[Z_i] = \mathrm{err}(f)$ and $\frac{1}{m}\sum_{i=1}^m Z_i = \mathrm{err}_S(f)$.
Markov’s Inequality
For any nonnegative random variable $X$ and any $c > 0$:
$\Pr[X \ge c] \le \frac{\mathbb{E}[X]}{c}$
Why?
Observation: $c \cdot \mathbf{1}[X \ge c] \le X$ pointwise.
Take expectation on both sides: $c \cdot \Pr[X \ge c] \le \mathbb{E}[X]$.
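A small empirical check of Markov’s inequality (a sketch, assuming an Exponential(1) variable so that $\mathbb{E}[X] = 1$; all names illustrative):

```python
import random

# Empirically compare Pr[X >= c] with Markov's bound E[X]/c
# for X ~ Exponential(1), so E[X] = 1.
random.seed(0)
n, c = 100_000, 3.0
xs = [random.expovariate(1.0) for _ in range(n)]
tail = sum(x >= c for x in xs) / n   # true tail: e^{-3} ≈ 0.0498
markov = (sum(xs) / n) / c           # Markov bound: ≈ 1/3, valid but loose
print(f"empirical Pr[X >= {c}] = {tail:.4f}, Markov bound = {markov:.4f}")
```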
Concentration of Measure
Using Markov to bound deviation from the mean…
Observation:
$\Pr[|X - \mathbb{E}X| \ge c] = \Pr[(X - \mathbb{E}X)^2 \ge c^2] \le \frac{\mathbb{E}(X - \mathbb{E}X)^2}{c^2}$ by Markov’s Inequality.
Chebyshev’s Inequality: $\Pr[|X - \mathbb{E}X| \ge c] \le \frac{\mathrm{Var}(X)}{c^2}$
True for all distributions!
Concentration of Measure
Sharper estimates using an exponential!
Observation: for any $t > 0$,
$\Pr[X \ge c] = \Pr[e^{tX} \ge e^{tc}] \le \frac{\mathbb{E}[e^{tX}]}{e^{tc}}$ by Markov’s Inequality.
Define $Y_i := X_i - \mathbb{E}X_i$, with the $Y_i$ i.i.d. Optimizing the bound over $t > 0$ (Chernoff’s bounding technique), for variables taking values in an interval of length 1, yields Hoeffding’s Inequality:
$\Pr\Big[\frac{1}{m}\sum_{i=1}^m Y_i \ge \epsilon\Big] \le e^{-2m\epsilon^2}$
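To see how much sharper the exponential bound is, here is a sketch comparing Markov, Chebyshev, and the Hoeffding bound on the upper tail of a Bernoulli(1/2) sample mean (parameters chosen purely for illustration):

```python
import math

# Tail Pr[(1/m) * sum X_i >= 1/2 + eps] for X_i ~ Bernoulli(1/2):
# compare Markov, Chebyshev, and the Chernoff/Hoeffding bound.
m, eps = 1000, 0.05
mean, var = 0.5, 0.25 / m        # mean and variance of the sample average

markov = mean / (mean + eps)              # Pr[S >= c] <= E[S]/c
chebyshev = var / eps**2                  # Pr[|S - E S| >= eps] <= Var/eps^2
hoeffding = math.exp(-2 * m * eps**2)     # Pr[S - E S >= eps] <= e^{-2 m eps^2}

print(f"Markov:    {markov:.4f}")    # ~0.909  (nearly vacuous)
print(f"Chebyshev: {chebyshev:.4f}") # 0.1000  (polynomial in 1/m)
print(f"Hoeffding: {hoeffding:.4f}") # ~0.0067 (exponentially small in m)
```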
Another example: $\mathcal{F}$ = axis-aligned rectangles in $\mathbb{R}^2$. Four points in a diamond pattern can be shattered, but no five points can, so the VC dimension of $\mathcal{F}$ is 4 (a brute-force check of the four-point claim follows the list below).
VC dimension:
• A combinatorial concept to capture the true richness of $\mathcal{F}$
• Often (but not always!) proportional to the degrees-of-freedom or the number of independent parameters in $\mathcal{F}$
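Here is the brute-force check (a sketch; `shattered` is a hypothetical helper, not from the lecture) that axis-aligned rectangles shatter four points in a diamond configuration; it suffices to test the bounding box of the positively labeled points against every labeling:

```python
from itertools import product

# Check that axis-aligned rectangles shatter four points in a diamond:
# for every labeling, the bounding box of the positives must exclude all negatives.
points = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def shattered(points):
    for labels in product([0, 1], repeat=len(points)):
        pos = [p for p, l in zip(points, labels) if l == 1]
        if not pos:
            continue  # an empty rectangle realizes the all-negative labeling
        lo_x, hi_x = min(x for x, _ in pos), max(x for x, _ in pos)
        lo_y, hi_y = min(y for _, y in pos), max(y for _, y in pos)
        for (x, y), l in zip(points, labels):
            if l == 0 and lo_x <= x <= hi_x and lo_y <= y <= hi_y:
                return False  # this labeling cannot be realized by any rectangle
    return True

print(shattered(points))  # True: VC dimension of rectangles is at least 4
```

The bounding-box test suffices because any rectangle containing all the positive points also contains their bounding box. The same idea shows no five points can be shattered: label the leftmost, rightmost, topmost, and bottommost points positive; the remaining point lies inside their bounding box, so it cannot be labeled negative.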
VC Theorem
Theorem (Vapnik-Chervonenkis ’71):
Pick any tolerance level $\epsilon > 0$ and any confidence level $\delta > 0$;
let $(x_1, y_1), \dots, (x_m, y_m)$ be examples drawn i.i.d. from an unknown distribution $D$. If $m \ge O\!\big(\frac{\mathrm{VC}(\mathcal{F}) + \ln(1/\delta)}{\epsilon^2}\big)$, then with probability at least $1 - \delta$:
$\mathrm{err}(f_{\mathrm{ERM}}) \le \min_{f \in \mathcal{F}} \mathrm{err}(f) + \epsilon$
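For comparison with the finite-class calculation above, a sketch of the VC-based sample size; `C = 8` stands in for the absolute constant hidden in the $O(\cdot)$ and is an arbitrary illustrative choice:

```python
import math

def vc_sample_size(vc_dim, eps, delta, C=8):
    """Sketch of the VC bound m = O((VC(F) + ln(1/delta)) / eps^2).
    C is a placeholder for the unspecified absolute constant."""
    return math.ceil(C * (vc_dim + math.log(1 / delta)) / eps**2)

# Axis-aligned rectangles in R^2 have VC dimension 4:
print(vc_sample_size(4, eps=0.05, delta=0.01))
```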
From our discussion it may seem that the ERM algorithm is universally consistent. This is not the case: by the No-Free-Lunch theorem, for any learner and any fixed sample size there is a distribution on which it performs poorly, so distribution-free guarantees are impossible without restricting the model class $\mathcal{F}$.

What we covered today:
• Formalizing learning
• PAC learnability
• VC theorem
• No Free-lunch theorem
Questions?
Next time…
Midterm!
Unsupervised learning.