ECS171: Machine Learning: Lecture 8: VC Dimension (LFD 2.2)
Cho-Jui Hsieh
UC Davis
Feb 5, 2018
VC Dimension
Definition
The VC dimension of a hypothesis set H, denoted dVC(H), is the largest value of N for which mH(N) = 2^N, i.e. the size of the largest dataset that H can shatter.
Examples
H is positive rays: dVC = 1
H is 2D perceptrons: dVC = 3
H is convex sets: dVC = ∞
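The first example can be checked numerically: enumerate the dichotomies that positive rays h(x) = sign(x − a) generate on N points. A minimal sketch (the function name `ray_dichotomies` is made up here for illustration):

```python
# Sketch: count the dichotomies positive rays h(x) = sign(x - a) produce
# on N points, to check m_H(N) = N + 1 and hence d_VC = 1.

def ray_dichotomies(points):
    """Return the set of dichotomies positive rays produce on the points."""
    pts = sorted(points)
    dichos = set()
    # Placing the threshold a before each point, between points, and after
    # the last point covers every distinct dichotomy.
    thresholds = ([pts[0] - 1]
                  + [(pts[i] + pts[i + 1]) / 2 for i in range(len(pts) - 1)]
                  + [pts[-1] + 1])
    for a in thresholds:
        dichos.add(tuple(+1 if x > a else -1 for x in pts))
    return dichos

print(len(ray_dichotomies([0.5])))       # 2 = 2^1: one point is shattered
print(len(ray_dichotomies([0.2, 0.8])))  # 3 < 2^2: two points are not
```

Since one point is shattered but two are not, dVC = 1 for positive rays.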
VC dimension and Learning
If dVC(H) is finite, a final hypothesis g ∈ H will generalize (Eout ≈ Ein for large N), regardless of the learning algorithm, the input distribution, and the target function.
VC dimension of perceptrons
For d = 2, dVC = 3
What if d > 2?
In general, dVC = d + 1
VC dimension of perceptrons
To prove dVC ≥ d + 1:
Exhibit a set of N = d + 1 points in R^d shattered by the perceptron. Including the bias coordinate x_0 = 1, take the points x_1 = (1, 0, ..., 0), x_2 = (1, 1, 0, ..., 0), ..., x_{d+1} = (1, 0, ..., 0, 1), and stack them as the rows of a (d+1) × (d+1) matrix X.
X is invertible!
Can we shatter the dataset?
For any y = (y_1, y_2, ..., y_{d+1})^T with each y_i ∈ {±1}, can we find w satisfying
sign(X w) = y?
Yes: since X is invertible, take w = X^{-1} y; then X w = y, so sign(X w) = y.
These d + 1 points are shattered, hence dVC ≥ d + 1.
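The argument above can be verified directly: build the matrix X from the construction on the previous slide and check that w = X^{-1} y realizes every one of the 2^{d+1} dichotomies. A sketch for d = 3:

```python
import numpy as np
from itertools import product

# Sketch of the shattering argument: X has rows x_1 = (1,0,...,0) and
# x_{i+1} = (1, e_i); it is invertible, so w = X^{-1} y gives X w = y
# exactly, and therefore sign(X w) = y for every dichotomy y.

d = 3
X = np.hstack([np.ones((d + 1, 1)),
               np.vstack([np.zeros((1, d)), np.eye(d)])])

for y in product([-1.0, 1.0], repeat=d + 1):
    y = np.array(y)
    w = np.linalg.solve(X, y)               # solves X w = y
    assert np.array_equal(np.sign(X @ w), y)

print("all", 2 ** (d + 1), "dichotomies realized")
```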
VC dimension of perceptrons
To prove dVC ≤ d + 1:
Take any d + 2 points x_1, x_2, ..., x_{d+1}, x_{d+2} (each in R^{d+1}, including the bias coordinate). More points than dimensions, so they are linearly dependent:
x_j = Σ_{i≠j} a_i x_i, where not all the a_i are zero.
Consider the dichotomy y_i = sign(a_i) for each i with a_i ≠ 0, and y_j = −1. No perceptron can realize it: if sign(w^T x_i) = y_i for all i ≠ j, then a_i w^T x_i > 0 whenever a_i ≠ 0, so w^T x_j = Σ_{i≠j} a_i w^T x_i > 0, forcing y_j = +1. Hence no set of d + 2 points can be shattered.
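The linear-dependence step can be checked numerically: d + 2 vectors in R^{d+1} always have rank at most d + 1, so one of them is a combination of the others. A sketch with random points:

```python
import numpy as np

# Sketch: any d+2 points in R^{d+1} (bias coordinate included) are
# linearly dependent -- the starting point of the d_VC <= d+1 proof.

rng = np.random.default_rng(0)
d = 4
pts = rng.standard_normal((d + 2, d + 1))    # d+2 rows, each in R^{d+1}
assert np.linalg.matrix_rank(pts) <= d + 1   # rank cannot exceed dimension

# Express x_{d+2} as a combination of the first d+1 points (which are
# generically full-rank for random Gaussian data).
a, *_ = np.linalg.lstsq(pts[:d + 1].T, pts[d + 1], rcond=None)
assert np.allclose(pts[:d + 1].T @ a, pts[d + 1])
print("x_{d+2} = sum_i a_i x_i with a =", np.round(a, 3))
```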
Putting it together
Number of parameters w_0, ..., w_d: d + 1 parameters!
Parameters create degrees of freedom.
Number of data points needed
P[ |Ein(g) − Eout(g)| > ε ] ≤ 4 mH(2N) e^{−(1/8) ε² N}
(call the right-hand side δ)
If we want certain ε and δ, how does N depend on dVC?
Need N^{dVC} e^{−N} = small value
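One way to see how N depends on dVC is to solve the implicit condition N ≥ (8/ε²) log(4 mH(2N)/δ) by fixed-point iteration, bounding mH(2N) by (2N)^{dVC}. A sketch (the function name and starting point n0 are choices made here, not from the slides):

```python
import math

# Sketch: iterate N <- (8/eps^2) * log(4 * (2N)^{d_VC} / delta), using the
# polynomial bound m_H(2N) <= (2N)^{d_VC}; the iteration converges to the
# sample size the VC bound requires.

def sample_complexity(d_vc, eps=0.1, delta=0.05, n0=1000.0, iters=100):
    n = n0
    for _ in range(iters):
        n = (8.0 / eps ** 2) * math.log(4.0 * (2.0 * n) ** d_vc / delta)
    return math.ceil(n)

print(sample_complexity(d_vc=3))  # on the order of 30,000 examples
```

The required N grows roughly linearly in dVC, which is why dVC is read as the effective number of examples needed per degree of freedom.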
Rearranging things
Get ε in terms of δ:
δ = 4 mH(2N) e^{−(1/8) ε² N}  ⇒  ε = sqrt( (8/N) log( 4 mH(2N) / δ ) )
With probability ≥ 1 − δ,
Eout ≤ Ein + sqrt( (8/N) log( 4 mH(2N) / δ ) )
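The error bar above is easy to tabulate. A sketch, again using the bound mH(2N) ≤ (2N)^{dVC} (the function name `vc_error_bar` is made up here):

```python
import math

# Sketch: the VC generalization error bar
#   eps = sqrt((8/N) * log(4 * m_H(2N) / delta)),
# with m_H(2N) bounded by (2N)^{d_VC}.

def vc_error_bar(n, d_vc, delta=0.05):
    return math.sqrt(8.0 / n * math.log(4.0 * (2.0 * n) ** d_vc / delta))

# The bar shrinks as N grows and widens as d_VC grows:
for n in (1000, 10000, 100000):
    print(n, round(vc_error_bar(n, d_vc=3), 3))
```

This is the quantity whose decay in N the learning curve on the next slide depicts.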
Learning curve
Conclusions
Questions?