Lecture 16: VC Dimension
Eric Xing
Parameterization
I.e., if for any set of labels {y(1), …, y(d)}, there exists some
h ∈ H so that h(x(i)) = y(i) for all i = 1, …, d.
Instance space X
Open intervals:
H1: if x>a, then y=1 else y=0
Closed intervals:
H2: if a ≤ x ≤ b, then y=1 else y=0
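To make the shattering test concrete, here is a small brute-force check (an illustrative sketch, not part of the lecture; the point sets and candidate thresholds are made up for the example). It enumerates every 0/1 labeling of a point set and asks whether H1 or the interval class can realize it, which is exactly the shattering question; two points defeat H1 but not the intervals, and three points defeat the intervals.

```python
from itertools import product

def threshold_labels(points, a):
    # H1: y = 1 if x > a, else 0
    return tuple(1 if x > a else 0 for x in points)

def interval_labels(points, a, b):
    # Interval class: y = 1 if a <= x <= b, else 0
    return tuple(1 if a <= x <= b else 0 for x in points)

def shattered(points, hypotheses):
    # A point set is shattered iff every 0/1 labeling is realized by some hypothesis
    realizable = {h(points) for h in hypotheses}
    return all(lab in realizable for lab in product([0, 1], repeat=len(points)))

def candidate_cuts(points):
    # Enough parameter values to cover all distinct behaviors:
    # between consecutive points and outside the data range
    xs = sorted(points)
    mids = [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1)]
    return [xs[0] - 1] + mids + [xs[-1] + 1, xs[-1] + 2]

def check(points):
    cuts = candidate_cuts(points)
    H1 = [lambda p, a=a: threshold_labels(p, a) for a in cuts]
    H_int = [lambda p, a=a, b=b: interval_labels(p, a, b)
             for a in cuts for b in cuts if a < b]
    return shattered(points, H1), shattered(points, H_int)

print(check((1.0, 2.0)))        # (False, True): thresholds cannot realize (1, 0), intervals can
print(check((1.0, 2.0, 3.0)))   # (False, False): no interval realizes the labeling (1, 0, 1)
```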
i.e., to guarantee that any hypothesis that perfectly fits the training data is
probably (with probability 1-δ) approximately (to within ε) correct on test data
from the same distribution
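For a finite hypothesis space this guarantee has the familiar quantitative form (stated here for reference; |H| is the size of the hypothesis space, m the number of i.i.d. training examples):

```latex
% Probability that some consistent hypothesis has true error greater than epsilon:
\Pr\!\big[\exists\, h \in H :\ \hat{R}_{\text{train}}(h) = 0 \ \wedge\ R(h) > \epsilon\big]
  \;\le\; |H|\, e^{-\epsilon m} \;\le\; \delta
\quad\Longleftarrow\quad
m \;\ge\; \frac{1}{\epsilon}\Big(\ln|H| + \ln\tfrac{1}{\delta}\Big).
```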
[Figure: learning curves. % error plotted against the number of training examples (sample size m), showing test error and training error.]
By doing this we can obtain an upper bound on the actual risk. This does not prevent a
particular machine with the same empirical risk, but whose function set has
higher VC dimension, from having better performance.
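The upper bound referred to is the usual VC confidence bound; a standard statement (Vapnik), with d the VC dimension of the function class, m the sample size, and confidence 1-δ:

```latex
% With probability at least 1 - \delta, simultaneously for all functions h in the class:
R(h) \;\le\; \hat{R}_{\text{emp}}(h)
  \;+\; \sqrt{\frac{d\big(\ln\frac{2m}{d} + 1\big) + \ln\frac{4}{\delta}}{m}}.
```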
What is the VC dimension of a kNN classifier?
Structural Risk Minimization
Which hypothesis space should we choose?
Choose a nested structure of hypothesis subsets for which the risk bound is valid.
That is, for each subset, we must be able either to compute its VC dimension d, or to get a bound
on d itself.
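A minimal sketch of the SRM recipe (illustrative only, not the lecture's code): the nested structure is taken to be polynomial-feature classifiers of increasing degree, the VC dimension of each class is approximated by its parameter count, and the class minimizing empirical risk plus the VC confidence term is selected. The dataset, the model family, and the VC-dimension proxy are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

def vc_confidence(d, m, delta=0.05):
    # Vapnik-style confidence interval for VC dimension d and sample size m
    return np.sqrt((d * (np.log(2 * m / d) + 1) + np.log(4 / delta)) / m)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = (np.sin(X[:, 0]) > 0).astype(int)           # synthetic labels for the sketch
m = len(y)

best = None
for degree in range(1, 8):                      # nested structure: degree 1 ⊂ 2 ⊂ ...
    Z = PolynomialFeatures(degree).fit_transform(X)
    clf = LogisticRegression(max_iter=1000).fit(Z, y)
    emp_risk = np.mean(clf.predict(Z) != y)     # empirical (training) 0/1 risk
    d = Z.shape[1]                              # crude VC-dimension proxy: parameter count
    bound = emp_risk + vc_confidence(d, m)      # guaranteed risk = empirical + confidence
    if best is None or bound < best[0]:
        best = (bound, degree)

print("SRM picks degree", best[1], "with risk bound %.3f" % best[0])
```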
[Figure: the SRM trade-off. The bound on the risk is the sum of the empirical risk and the confidence interval (a function of h/L); as model complexity grows, the empirical risk falls while the confidence interval rises, and the bound is minimized at an intermediate complexity h*.]
Putting SRM into action:
linear models case (1)
There are many SRM-based strategies to build models:
Minimize ||w||², subject to y_i(<w, x_i> + b) ≥ 1 for all i = 1, …, L, with labels y_i ∈ {+1, −1}.
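As a sketch of this strategy (illustrative, not the lecture's code), the margin-maximization problem above is exactly what a linear SVM solves; with a very large C, scikit-learn's SVC approximates the hard-margin case on separable data. The synthetic data below are made up for the example.

```python
# Minimal sketch: minimizing ||w||^2 subject to y_i(<w, x_i> + b) >= 1 is the
# hard-margin linear SVM; a very large C approximates it on separable data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, scale=1.0, size=(20, 2))
X_neg = rng.normal(loc=-2.0, scale=1.0, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 20 + [-1] * 20)              # labels y_i in {+1, -1}

clf = SVC(kernel="linear", C=1e6).fit(X, y)      # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]

margins = y * (X @ w + b)
print("min_i y_i(<w, x_i> + b) =", margins.min())  # roughly >= 1 when the data are separable
print("margin width 2 / ||w||  =", 2 / np.linalg.norm(w))
```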
Recall that the kernel trick used by the SVM alleviates the need to
find an explicit expression for the feature map φ(·): the transformation is computed only implicitly, through inner products.
In the dual, the constraints Σ_{i=1..L} α_i y_i = 0 and α_i ≥ 0 hold; the data then enter only through inner products, which the kernel supplies.
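A small sketch of this point (illustrative, not from the lecture): for the degree-2 polynomial kernel in two dimensions, evaluating K(x, z) = (<x, z> + 1)² gives exactly the inner product <φ(x), φ(z)> of an explicit six-dimensional feature map, so φ(·) never has to be formed when training a kernel machine.

```python
# The kernel trick: K(x, z) = (<x, z> + 1)^2 equals <phi(x), phi(z)> for an
# explicit degree-2 feature map phi, so phi never needs to be computed explicitly.
import numpy as np

def phi(x):
    # Explicit feature map for the inhomogeneous degree-2 polynomial kernel in 2D:
    # phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2)
    x1, x2 = x
    return np.array([1.0, np.sqrt(2)*x1, np.sqrt(2)*x2, x1**2, x2**2, np.sqrt(2)*x1*x2])

def K(x, z):
    # Kernel evaluation: no feature map needed
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([0.3, -1.2])
z = np.array([2.0, 0.5])

print(K(x, z))            # kernel value computed directly
print(phi(x) @ phi(z))    # identical value via the explicit feature map
```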
Within the PAC learning setting, we can bound the probability that the
learner will output a hypothesis with a given error:
For ANY consistent learner (the case where c ∈ H)
For ANY “best fit” hypothesis (agnostic learning, where perhaps c ∉ H)
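For a finite hypothesis class, the agnostic case is governed by the usual Hoeffding-based bound, which complements the consistent-learner bound given earlier (stated for reference; symbols as before):

```latex
% Agnostic case: with probability at least 1 - \delta, for every h in H,
R(h) \;\le\; \hat{R}_{\text{train}}(h) + \sqrt{\frac{\ln|H| + \ln\frac{1}{\delta}}{2m}},
\qquad\text{so}\qquad
m \;\ge\; \frac{1}{2\epsilon^2}\Big(\ln|H| + \ln\tfrac{1}{\delta}\Big)
\ \text{suffices for } \epsilon\text{-accuracy}.
```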