
ECE595 / STAT598: Machine Learning I

Lecture 27 VC Dimension

Spring 2020

Stanley Chan

School of Electrical and Computer Engineering


Purdue University

© Stanley Chan 2020. All Rights Reserved.


Outline

Lecture 25 Generalization
Lecture 26 Growth Function
Lecture 27 VC Dimension

Today’s Lecture:
From Dichotomy to Shattering
Review of dichotomy
The Concept of Shattering
VC Dimension
Example of VC Dimension
Rectangle Classifier
Perceptron Algorithm
Two Cases
Probably Approximately Correct

Probably: Quantify error using probability:

    P[ |Ein(h) − Eout(h)| ≤ ε ] ≥ 1 − δ

Approximately Correct: In-sample error is an approximation of the out-sample error:

    P[ |Ein(h) − Eout(h)| ≤ ε ] ≥ 1 − δ

If you can find an algorithm A such that for any ε and δ, there exists an N which makes the above
inequality hold, then we say that the target function is PAC-learnable.
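A minimal numerical sketch of this statement for a single fixed hypothesis h (not part of the original slides; mu, N, eps, and the number of trials are illustrative assumptions). Since each sample is misclassified with probability Eout(h) = mu, the in-sample error Ein is a binomial frequency, and the empirical probability can be compared against the Hoeffding-style bound 1 − 2 exp(−2 ε² N):

# Sketch only: estimate P[|Ein - Eout| <= eps] for one fixed hypothesis h
# whose true (out-of-sample) error is mu.  With N i.i.d. samples, each one is
# misclassified with probability mu, so Ein is an empirical frequency.
import numpy as np

rng = np.random.default_rng(0)
mu, N, eps, trials = 0.3, 200, 0.05, 10_000        # assumed illustrative values

E_in = rng.binomial(N, mu, size=trials) / N        # in-sample error in each trial
prob = np.mean(np.abs(E_in - mu) <= eps)
print(f"empirical P[|Ein - Eout| <= {eps}] = {prob:.3f}")
print(f"Hoeffding bound 1 - 2exp(-2 eps^2 N) = {1 - 2 * np.exp(-2 * eps**2 * N):.3f}")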

Overcoming the M Factor
The bad events Bm are

    Bm = { |Ein(hm) − Eout(hm)| > ε }

The factor M is here because of the union bound:

    P[B1 or . . . or BM] ≤ P[B1] + . . . + P[BM].

Dichotomy
Definition
Let x1, . . . , xN ∈ X. The dichotomies generated by H on these points are

    H(x1, . . . , xN) = { (h(x1), . . . , h(xN)) | h ∈ H }.

Candidate to Replace M
So here is our candidate replacement for M.
Define Growth Function
    mH(N) = max_{x1, . . . , xN ∈ X} |H(x1, . . . , xN)|

You give me a hypothesis set H


You tell me there are N training samples
My job: Do whatever I can, by allocating x 1 , . . . , x N , so that the number of dichotomies
is maximized
Maximum number of dichotomies = the best I can do with your H
mH (N): How expressive your hypothesis set H is
Large mH (N) = more expressive H = more complicated H
mH (N) only depends on H and N
Doesn’t depend on the learning algorithm A
Doesn’t depend on the distribution p(x) (because I’m giving you the max.)
Summary of the Examples

H is positive ray:
mH (N) = N + 1
H is positive interval:

    mH(N) = (N+1 choose 2) + 1 = N²/2 + N/2 + 1

H is convex set:
mH(N) = 2^N
So if we can replace M by mH (N)
And if mH (N) is a polynomial
Then we are good.
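The counts above can be checked by brute force. Below is a minimal sketch (an illustration, not from the slides; it assumes the N one-dimensional points are distinct) that enumerates the dichotomies generated by positive rays and positive intervals and compares them with the formulas:

# Sketch only: count the dichotomies of positive rays / positive intervals on
# N distinct 1-D points and compare with mH(N) = N + 1 and N(N+1)/2 + 1.
import itertools
import numpy as np

def cuts(x):
    # one threshold below all points, one above, and one between each pair
    return np.concatenate(([x[0] - 1.0], (x[:-1] + x[1:]) / 2.0, [x[-1] + 1.0]))

def positive_ray_dichotomies(x):
    return {tuple(np.where(x > a, 1, -1)) for a in cuts(x)}

def positive_interval_dichotomies(x):
    d = {tuple(np.where((x > a) & (x < b), 1, -1))
         for a, b in itertools.combinations(cuts(x), 2)}
    return d | {(-1,) * len(x)}                    # empty interval: all points negative

for N in range(1, 7):
    x = np.sort(np.random.default_rng(N).uniform(size=N))
    print(N, len(positive_ray_dichotomies(x)), N + 1,
          len(positive_interval_dichotomies(x)), N * (N + 1) // 2 + 1)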
Shatter

Definition
If a hypothesis set H is able to generate all 2^N dichotomies, then we say that H shatters
x1, . . . , xN.

H = hyperplane returned by a perceptron algorithm in 2D.


If N = 3 (points not all on one line), then H can shatter them
Because we can achieve 2^3 = 8 dichotomies
If N = 4, then H cannot shatter them
Because we can achieve at most 14 dichotomies
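These two counts can be reproduced by brute force. Below is a minimal sketch (not part of the slides; it assumes scipy is available and uses LP feasibility of yn(w·xn) ≥ 1 as the test for "some perceptron realizes this dichotomy"):

# Sketch only: count how many of the 2^N dichotomies of N points in 2-D are
# realizable by a perceptron sign(w0 + w1*x1 + w2*x2).
import itertools
import numpy as np
from scipy.optimize import linprog

def realizable(X, y):
    A_ub = -(y[:, None] * X)            # y_n (w . x_n) >= 1  <=>  -y_n x_n . w <= -1
    res = linprog(np.zeros(X.shape[1]), A_ub=A_ub, b_ub=-np.ones(len(y)),
                  bounds=[(None, None)] * X.shape[1], method="highs")
    return res.status == 0              # 0 = feasible, 2 = infeasible

def count_dichotomies(points):
    X = np.hstack([np.ones((len(points), 1)), np.asarray(points, dtype=float)])
    return sum(realizable(X, np.array(y))
               for y in itertools.product([-1.0, 1.0], repeat=len(points)))

print(count_dichotomies([(0, 0), (1, 0), (0, 1)]))          # 8:  three points shattered
print(count_dichotomies([(0, 0), (1, 0), (0, 1), (1, 1)]))  # 14: four points not shattered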

VC Dimension
Definition (VC Dimension)
The Vapnik-Chervonenkis dimension of a hypothesis set H, denoted by dVC, is the largest
value of N for which H can shatter some set of N training samples.

You give me a hypothesis set H, e.g., linear model


You tell me the number of training samples N
Start with a small N
I will be able to shatter for a while, until I hit a bump
E.g., linear in 2D: N = 3 is okay, but N = 4 is not okay
So I find the largest N such that H can shatter N training samples
E.g., linear in 2D: dVC = 3
If H is complex, then expect large dVC
Does not depend on p(x), A and f
Outline

Lecture 25 Generalization
Lecture 26 Growth Function
Lecture 27 VC Dimension

Today’s Lecture:
From Dichotomy to Shattering
Review of dichotomy
The Concept of Shattering
VC Dimension
Example of VC Dimension
Rectangle Classifier
Perceptron Algorithm
Two Cases
Example: Rectangle
What is the VC dimension of a 2D classifier whose decision region is an axis-aligned rectangle?
You can try placing 4 data points in whatever way you like.
There will be 2^4 = 16 possible dichotomies.
You can show that the rectangle classifier can realize all 16 of them (e.g., place the 4 points in a
diamond, one extreme point in each direction).
With 5 data points this is no longer possible: label the point that falls inside the bounding box of
the other four negative, and those four positive.
So the VC dimension is 4.
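Below is a brute-force sketch of this argument (not from the slides; it assumes the usual convention that the rectangle is axis-aligned and labels its inside +1):

# Sketch only: a labeling is realizable by an axis-aligned rectangle
# (inside -> +1, outside -> -1) iff the bounding box of the positive points
# contains no negative point; the all-negative labeling uses a far-away box.
import itertools
import numpy as np

def rectangle_shatters(points):
    P = np.asarray(points, dtype=float)
    for labels in itertools.product([-1, 1], repeat=len(P)):
        labels = np.array(labels)
        pos, neg = P[labels == 1], P[labels == -1]
        if len(pos) == 0:
            continue                    # all-negative is always realizable
        lo, hi = pos.min(axis=0), pos.max(axis=0)
        if np.any(np.all((neg >= lo) & (neg <= hi), axis=1)):
            return False                # a negative point is trapped inside the box
    return True

print(rectangle_shatters([(0, 1), (0, -1), (1, 0), (-1, 0)]))          # True: the 4-point diamond
print(rectangle_shatters([(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]))  # False: 5 points, one inside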

VC Dimension of a Perceptron

Theorem (VC Dimension of a Perceptron)


Consider the input space X = {1} × Rd, i.e., x = [1, x1, . . . , xd]^T. The VC dimension of a
perceptron is
dVC = d + 1.

The “+1” comes from the bias term (w0 if you recall)
So a linear classifier is “no more complicated” than d + 1
The most it can shatter is d + 1 points in a d-dimensional space
E.g., If d = 2, then dVC = 3

Why?

We claim dVC ≥ d + 1 and dVC ≤ d + 1


dVC ≥ d + 1:
H can shatter at least d + 1 points
It may shatter more, or it may not shatter more. We don’t know by just looking at this
statement
dVC ≤ d + 1:
H cannot shatter more than d + 1 points
So with dVC ≥ d + 1 and dVC ≤ d + 1 together, we conclude that dVC = d + 1

dVC ≥ d + 1
Goal: Show that there is at least one configuration of d + 1 points that can be shattered
by H
Think about the 2D case: Put the three points anywhere not on the same line
Choose x1 = [1, 0, . . . , 0]^T and, for n = 2, . . . , d + 1, xn = [1, 0, . . . , 1, . . . , 0]^T with the
single 1 in the (n − 1)-th coordinate.
Linear classifier: sign(w^T xn) = yn.
For all d + 1 data points, we have

    sign(X w) = y,

where X is the (d + 1) × (d + 1) matrix whose rows are x1^T, . . . , x_{d+1}^T,

        [ 1 0 0 ... 0 ]
        [ 1 1 0 ... 0 ]
    X = [ 1 0 1 ... 0 ] ,
        [ ...         ]
        [ 1 0 0 ... 1 ]

w = [w0, w1, . . . , wd]^T, and y = [y1, y2, . . . , y_{d+1}]^T with each yn = ±1.
dVC ≥ d + 1
We can remove the sign: if we can solve Xw = y exactly, then sign(Xw) = y holds automatically,
because every entry of y is ±1.
 
    X w = y,

with the same X, w, and y as on the previous slide.
We are only interested in whether the problem is solvable
So we just need to see if we can ever find a w that shatters
If there exists at least one w that makes all ±1 correct, then H can shatter (if you use
that particular w )
So is this (d + 1) × (d + 1) system invertible?
Yes, it is: its determinant is 1. So H can shatter at least d + 1 points.
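A short numerical sketch of this step (not from the slides): build the (d + 1) × (d + 1) matrix X described above and solve Xw = y for every labeling y ∈ {−1, +1}^{d+1}:

# Sketch only: X has rows x_1 = [1, 0, ..., 0] and x_n = [1, e_{n-1}]; it is
# invertible, so w = X^{-1} y reproduces every dichotomy y exactly.
import itertools
import numpy as np

d = 4
X = np.hstack([np.ones((d + 1, 1)), np.vstack([np.zeros((1, d)), np.eye(d)])])
print(np.linalg.matrix_rank(X))         # d + 1, i.e. X is invertible

for y in itertools.product([-1.0, 1.0], repeat=d + 1):
    y = np.array(y)
    w = np.linalg.solve(X, y)           # one weight vector per dichotomy
    assert np.array_equal(np.sign(X @ w), y)
print("all", 2 ** (d + 1), "dichotomies of these", d + 1, "points are realized")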
dVC ≤ d + 1

Can we shatter more than d + 1 points?


No.
You only have d + 1 variables
If you have d + 2 equations, then some equation will be either redundant or contradictory
If redundant, you can ignore it because it is not the worst case
If contradictory, then you cannot solve the system of linear equations
So we cannot shatter more than d + 1 points
For any x1, . . . , xd+2, we can always construct a dichotomy that causes a contradiction (next two slides)

dVC ≤ d + 1

You give me x 1 , . . . , x d+1 , x d+2


I can always write xd+2 as

    xd+2 = a1 x1 + a2 x2 + · · · + ad+1 xd+1 = Σ_{i=1}^{d+1} ai xi

Not all ai's are zero: the first coordinate of xd+2 is 1, so xd+2 is not the zero vector.


My job: Construct a dichotomy which cannot be generated by any h.
Here is a dichotomy:
x1, . . . , xd+1 get yi = sign(ai) (if ai = 0, the label of xi can be anything).
xd+2 gets yd+2 = −1.

dVC ≤ d + 1

Then

    w^T xd+2 = Σ_{i=1}^{d+1} ai w^T xi .

Perceptron: yi = sign(w^T xi).
By our design, yi = sign(ai).
So sign(w^T xi) = sign(ai), which means ai w^T xi > 0 for every i with ai ≠ 0.
This forces

    Σ_{i=1}^{d+1} ai w^T xi > 0.

So yd+2 = sign(w^T xd+2) = +1, contradicting our choice yd+2 = −1.

So we found a dichotomy which cannot be generated by any h, and the d + 2 points are not shattered.
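A numerical sketch of this argument (not from the slides; it assumes scipy is available). For random points it computes the coefficients ai, builds the dichotomy described above, and certifies by LP infeasibility that no weight vector w produces it:

# Sketch only: for d+2 augmented points, write x_{d+2} = sum_i a_i x_i, label
# x_i with sign(a_i) and x_{d+2} with -1, then check that no w satisfies
# y_n (w . x_n) >= 1 for all n.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
d = 3
X = np.hstack([np.ones((d + 2, 1)), rng.normal(size=(d + 2, d))])   # rows x_1..x_{d+2}

# coefficients of x_{d+2} in terms of x_1, ..., x_{d+1} (generically independent)
a = np.linalg.solve(X[:d + 1].T, X[d + 1])

y = np.append(np.sign(a), -1.0)         # the "hard" dichotomy from this slide
A_ub = -(y[:, None] * X)
res = linprog(np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(d + 2),
              bounds=[(None, None)] * (d + 1), method="highs")
print("some w realizes this dichotomy?", res.status == 0)           # expected: False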
Summary of the Examples
H is positive ray: mH(N) = N + 1.
If N = 1, then mH(1) = 2 = 2^1
If N = 2, then mH(2) = 3 < 2^2
So dVC = 1
H is positive interval: mH(N) = N²/2 + N/2 + 1.
If N = 2, then mH(2) = 4 = 2^2
If N = 3, then mH(3) = 7 < 2^3
So dVC = 2
H is perceptron in d-dimensional space
Just showed
dVC = d + 1
H is convex set: mH(N) = 2^N
No matter which N we choose, we always have mH(N) = 2^N
So dVC = ∞
The model is as complex as it can be
Reading List

Yaser Abu-Mostafa, Learning from Data, Chapter 2.1


Mehryar Mohri, Foundations of Machine Learning, Chapter 3.2
Stanford CS229 notes: http://cs229.stanford.edu/notes/cs229-notes4.pdf

Appendix

Radon Theorem

The perceptron result dVC = d + 1 shown in this lecture can also be proved using Radon's theorem.
Theorem (Radon’s Theorem)
Any set X of d + 2 data points in Rd can be partitioned into two subsets X1 and X2 such that
the convex hulls of X1 and X2 intersect.

Proof: See Mehryar Mohri, Foundations of Machine Learning, Theorem 3.13.


If two sets are separated by a hyperplane, then their convex hulls are also separated.
So for any d + 2 points, Radon says they can be split into X1 and X2 whose convex hulls intersect,
and therefore no hyperplane can label X1 positive and X2 negative.
So you cannot shatter the d + 2 points.
d + 1 is okay, as we have proved. So the VC dimension is d + 1.
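A small numerical sketch of Radon's theorem (not from the slides): for d + 2 points in R^d, it computes a Radon partition and the point shared by the two convex hulls:

# Sketch only: solve  sum_i c_i p_i = 0  and  sum_i c_i = 0  with c != 0,
# split the points by the sign of c_i, and recover the common point.
import numpy as np

rng = np.random.default_rng(2)
d = 2
P = rng.normal(size=(d + 2, d))                  # d + 2 points in R^d

A = np.vstack([P.T, np.ones(d + 2)])             # (d+1) x (d+2): a nonzero null vector exists
c = np.linalg.svd(A)[2][-1]                      # right-singular vector spanning the null space

pos = c > 0
radon_point = P[pos].T @ c[pos] / c[pos].sum()          # lies in conv(P[pos]) ...
other = P[~pos].T @ (-c[~pos]) / (-c[~pos]).sum()       # ... and in conv(P[~pos])
assert np.allclose(radon_point, other)
print("partition:", np.where(pos)[0].tolist(), np.where(~pos)[0].tolist())
print("common point of the two convex hulls:", radon_point)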

