Lecture 27 VC Dimension
Spring 2020
Stanley Chan
Lecture 25 Generalization
Lecture 26 Growth Function
Lecture 27 VC Dimension
Today’s Lecture:
From Dichotomy to Shattering
Review of dichotomy
The Concept of Shattering
VC Dimension
Example of VC Dimension
Rectangle Classifier
Perceptron Algorithm
Two Cases
Probably Approximately Correct
If you can find an algorithm A such that for any ε and δ, there exists an N that makes the above inequality hold, then we say that the target function is PAC-learnable.
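For reference, the inequality being referenced is the union-bound version of Hoeffding from the previous lectures, with M the number of hypotheses in H:

\[
P\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\big] \;\le\; 2M e^{-2\epsilon^2 N} \;\le\; \delta .
\]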
H is positive ray:
m_H(N) = N + 1
H is positive interval:
m_H(N) = \binom{N+1}{2} + 1 = \frac{N^2}{2} + \frac{N}{2} + 1
H is convex set:
m_H(N) = 2^N
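The first two counts are easy to verify by brute force. The sketch below (the point set pts and the helper names are illustrative, not from the slides) enumerates the distinct dichotomies that positive rays and positive intervals generate on N points; only the gaps between sorted points matter, so one candidate threshold per gap suffices.

from itertools import combinations

def cut_points(points):
    """One candidate threshold per gap, plus one before and one after all points."""
    pts = sorted(points)
    return [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1]

def growth_positive_ray(points):
    """Number of dichotomies generated by h(x) = sign(x - a)."""
    return len({tuple(x > t for x in points) for t in cut_points(points)})

def growth_positive_interval(points):
    """Number of dichotomies generated by h(x) = +1 iff l < x < r."""
    cuts = cut_points(points)
    dichotomies = {tuple(l < x < r for x in points) for l, r in combinations(cuts, 2)}
    dichotomies.add((False,) * len(points))  # the empty interval
    return len(dichotomies)

pts = [0.5, 1.3, 2.2, 4.0, 5.1]        # N = 5 distinct points
print(growth_positive_ray(pts))        # 6  = N + 1
print(growth_positive_interval(pts))   # 16 = 5^2/2 + 5/2 + 1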
So if we can replace M by m_H(N),
and if m_H(N) is a polynomial in N,
then we are good.
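Here is a sketch of why that suffices (the rigorous VC bound actually uses m_H(2N) and different constants). Substituting m_H(N) for M in the union bound gives

\[
P\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\big] \;\le\; 2\,m_{\mathcal{H}}(N)\,e^{-2\epsilon^2 N},
\]

and a polynomial times a decaying exponential goes to 0, so the right-hand side can be driven below any δ by taking N large enough.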
c Stanley Chan 2020. All Rights Reserved.
8 / 23
Shatter
Definition
If a hypothesis set H is able to generate all 2^N dichotomies of x_1, . . . , x_N, then we say that H shatters x_1, . . . , x_N.
For example, a positive ray shatters any single point, but no pair x_1 < x_2, since the dichotomy (+1, −1) can never be generated.
Example: Rectangle
What is the VC Dimension of a 2D classifier with a rectangle shape?
Place 4 data points in a suitable configuration, e.g., the four vertices of a diamond.
There are 2^4 = 16 possible dichotomies.
You can show that the rectangle classifier can generate all 16 dichotomies, i.e., it shatters these 4 points.
With 5 data points it is no longer possible. (Put one negative point in the interior and four positive points around it: any rectangle covering the four positives must also cover the negative.)
So the VC dimension is 4.
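The shattering claim is easy to verify numerically. Below is a minimal sketch (the rect_shatters helper and the diamond point set are illustrative, not from the slides); it uses the fact that a dichotomy is realizable by an axis-aligned rectangle iff the bounding box of its positive points contains no negative point.

from itertools import product

def rect_shatters(points):
    """Check whether axis-aligned rectangles shatter the given 2D points."""
    for labels in product([False, True], repeat=len(points)):
        pos = [p for p, lab in zip(points, labels) if lab]
        neg = [p for p, lab in zip(points, labels) if not lab]
        if not pos:
            continue  # the empty rectangle realizes the all-negative dichotomy
        xmin, xmax = min(p[0] for p in pos), max(p[0] for p in pos)
        ymin, ymax = min(p[1] for p in pos), max(p[1] for p in pos)
        if any(xmin <= x <= xmax and ymin <= y <= ymax for x, y in neg):
            return False  # a negative point is trapped inside the bounding box
    return True

diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
print(rect_shatters(diamond))             # True: these 4 points are shattered
print(rect_shatters(diamond + [(0, 0)]))  # False: the interior point breaks it

Note that the False on 5 points only shows this particular configuration fails; the interior-point argument above rules out every 5-point configuration.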
Example: Perceptron
For the perceptron in d dimensions, d_VC = d + 1.
The “+1” comes from the bias term (w_0, if you recall).
So a linear classifier is “no more complicated” than d + 1.
The most points it can shatter in a d-dimensional space is d + 1.
E.g., if d = 2, then d_VC = 3.
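Here is a quick confirmation of the lower bound for d = 2 (a sketch; the three non-collinear points are an arbitrary illustrative choice). Solving X w = y exactly gives w^T x_i = y_i = ±1, hence sign(w^T x_i) = y_i, so every dichotomy of the 3 points is realized.

import numpy as np
from itertools import product

# Three non-collinear points in R^2, augmented with a bias coordinate.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

# For each of the 2^3 = 8 dichotomies y, solve X w = y exactly.
for y in product([-1.0, 1.0], repeat=3):
    w = np.linalg.solve(X, np.array(y))
    assert np.array_equal(np.sign(X @ w), np.array(y))
print("all 8 dichotomies realized, so d_VC >= 3 when d = 2")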
Why can the perceptron not shatter d + 2 points? Any d + 2 (augmented) vectors in R^{d+1} are linearly dependent, so we can write

\[
x_{d+2} = \sum_{i=1}^{d+1} a_i x_i ,
\]

with not all a_i equal to zero. Then

\[
w^T x_{d+2} = \sum_{i=1}^{d+1} a_i\, w^T x_i .
\]

Perceptron: y_i = sign(w^T x_i).
By our design, choose the labels y_i = sign(a_i).
So a_i w^T x_i > 0 for every i with a_i ≠ 0.
This forces

\[
\sum_{i=1}^{d+1} a_i\, w^T x_i > 0 ,
\]

hence w^T x_{d+2} > 0: the label y_{d+2} = −1 can never be generated, so d + 2 points cannot be shattered.
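A concrete instance of this dependence argument for d = 2 (a sketch; the four augmented points and the particular w are illustrative):

import numpy as np

# d = 2: four augmented points x_i = (1, x, y) in R^3 must be linearly dependent.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])  # rows x_1, x_2, x_3 and x_4 (playing x_{d+2})

# Solve x_4 = a_1 x_1 + a_2 x_2 + a_3 x_3 for the coefficients a_i.
a = np.linalg.solve(X[:3].T, X[3])
print(a)  # [-1.  1.  1.], so the design labels are y = sign(a) = (-1, +1, +1)

# Any w realizing y_i = sign(a_i) on x_1..x_3 forces w^T x_4 > 0:
w = np.array([-0.5, 1.0, 1.0])             # one such w
assert np.array_equal(np.sign(X[:3] @ w), np.sign(a))
print(np.sign(X[3] @ w))                   # 1.0: y_4 = -1 is impossible for this w

The code checks a single w; the argument above shows the same conclusion for every w that realizes y_1, y_2, y_3.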
The perceptron example we showed in this lecture can also be proved using Radon’s Theorem.
Theorem (Radon’s Theorem)
Any set X of d + 2 data points in Rd can be partitioned into two subsets X1 and X2 such that
the convex hulls of X1 and X2 intersect.
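Given such a partition, label X1 positive and X2 negative: a point common to both convex hulls would have to be classified as both +1 and −1, so no linear classifier realizes this dichotomy, and the d + 2 points are not shattered. A numerical sketch (the radon_partition helper and the sample points are illustrative):

import numpy as np

def radon_partition(points):
    """Split d+2 points in R^d into two groups whose convex hulls intersect."""
    P = np.asarray(points, dtype=float)  # shape (d+2, d)
    # Find a nonzero c with sum_i c_i x_i = 0 and sum_i c_i = 0:
    # c spans the null space of the (d+1) x (d+2) matrix [P^T; 1^T].
    A = np.vstack([P.T, np.ones(len(P))])
    c = np.linalg.svd(A)[2][-1]
    pos, neg = c > 0, c < 0
    # The same point is a convex combination of each group:
    common = P[pos].T @ c[pos] / c[pos].sum()
    return P[pos], P[neg], common

X1, X2, p = radon_partition([[0, 0], [2, 0], [0, 2], [1, 1]])
print(p)  # ~[1. 1.]: the point (1, 1) lies in both convex hulls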