ECS171: Machine Learning: Lecture 8: VC Dimension (LFD 2.2)

This document summarizes a lecture on the VC dimension. It defines the VC dimension as the largest number of points a hypothesis set can shatter and gives examples for several hypothesis sets: positive rays have VC dimension 1 and 2D perceptrons have VC dimension 3. The lecture proves that the VC dimension of the d-dimensional perceptron is d + 1, discusses how the VC dimension relates to generalization and to the number of data points needed for good generalization, and derives a generalization bound involving the VC dimension, the sample size, and the confidence parameter.


ECS171: Machine Learning

Lecture 8: VC Dimension (LFD 2.2)

Cho-Jui Hsieh
UC Davis

Feb 5, 2018
VC Dimension
Definition

The VC dimension of a hypothesis set H, denoted by dVC(H), is the largest value of N for which mH(N) = 2^N

"the most points H can shatter"

N ≤ dVC(H) ⇒ H can shatter some set of N points
k > dVC(H) ⇒ no set of k points can be shattered by H (every such k is a break point)
The smallest break point is dVC(H) + 1
The growth function

In terms of a break point k:

$m_H(N) \le \sum_{i=0}^{k-1} \binom{N}{i}$

In terms of the VC dimension dVC:

$m_H(N) \le \sum_{i=0}^{d_{VC}} \binom{N}{i}$
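
This bound grows only polynomially in N (degree dVC), in contrast to 2^N. Below is a minimal Python sketch (the function name sauer_bound is my own) that evaluates the sum and compares it with 2^N for a hypothesis set with dVC = 3, such as the 2D perceptron.

from math import comb

def sauer_bound(N: int, d_vc: int) -> int:
    """Evaluate the bound on the growth function: sum_{i=0}^{d_vc} C(N, i)."""
    return sum(comb(N, i) for i in range(min(d_vc, N) + 1))

if __name__ == "__main__":
    d_vc = 3  # e.g. the 2D perceptron
    for N in (5, 10, 20, 50):
        # Polynomial bound vs. exponential 2^N
        print(N, sauer_bound(N, d_vc), 2 ** N)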
Examples

H is positive rays: dVC = 1
H is 2D perceptrons: dVC = 3
H is convex sets: dVC = ∞
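
As a sanity check on the positive-ray example, here is a small brute-force sketch (the helper names are mine, and it assumes the convention h_a(x) = +1 iff x > a). It enumerates the dichotomies that positive rays can realize on a point set and checks whether all 2^N of them appear: one point can be shattered, two points cannot, so dVC = 1.

def ray_dichotomies(points):
    """All labelings that positive rays h_a(x) = sign(x - a) can produce on the points."""
    xs = sorted(points)
    # Candidate thresholds: below all points, between neighbours, above all points.
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(+1 if x > a else -1 for x in points) for a in thresholds}

def shattered(points):
    return len(ray_dichotomies(points)) == 2 ** len(points)

print(shattered([0.3]))       # True  -> one point can be shattered
print(shattered([0.3, 0.7]))  # False -> two points cannot, hence dVC = 1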
VC dimension and Learning

dVC(H) is finite ⇒ g ∈ H will generalize

When N is large enough, Eout ≈ Ein
Independent of the learning algorithm
Independent of the input distribution
Independent of the target function
VC dimension of perceptrons

For d = 2, dVC = 3
What if d > 2?

In general,
dVC = d + 1

We will prove dVC ≥ d + 1 and dVC ≤ d + 1


VC dimension of perceptrons

To prove dVC ≥ d + 1:
Exhibit a set of N = d + 1 points in R^d that the perceptron can shatter.

Take the (d+1)×(d+1) matrix X whose i-th row is the point xi with a leading 1 for the bias coordinate (for example, x1 = 0 and xi+1 = ei, the i-th standard basis vector).

X is invertible!
Can we shatter the dataset?

For any y = (y1, y2, ..., yd+1)^T with every yi = ±1, can we find w satisfying

sign(Xw) = y ?

Easy! Just set w = X^{-1} y. Then Xw = y exactly, so sign(Xw) = y.

So, dVC ≥ d + 1
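
A quick numerical check of this argument, offered only as a sketch: it builds the bias-augmented matrix X from the example points x1 = 0 and xi+1 = ei (the choice d = 4 and all variable names are mine), solves w = X^{-1} y for every dichotomy y, and verifies that sign(Xw) = y.

import numpy as np
from itertools import product

d = 4  # illustrative input dimension
# Rows of X: bias-augmented points, x1 = 0 and x_{i+1} = e_i as in the construction above.
X = np.hstack([np.ones((d + 1, 1)), np.vstack([np.zeros(d), np.eye(d)])])
assert np.linalg.matrix_rank(X) == d + 1  # X is invertible

for y in product([-1.0, 1.0], repeat=d + 1):  # every dichotomy of the d + 1 points
    w = np.linalg.solve(X, np.array(y))       # w = X^{-1} y
    assert np.array_equal(np.sign(X @ w), np.array(y))

print("all", 2 ** (d + 1), "dichotomies realized, so dVC >= d + 1")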
VC dimension of perceptrons

To show dVC ≤ d + 1, we need to show:

We cannot shatter any set of d + 2 points.

Take any d + 2 points

x1, x2, ..., xd+1, xd+2

(each augmented with a leading 1 for the bias coordinate, so each lives in R^{d+1}).
More points than dimensions ⇒ linearly dependent:

$x_j = \sum_{i \ne j} a_i x_i$

where not all of the ai's are zero.
VC dimension of perceptrons

$x_j = \sum_{i \ne j} a_i x_i$

Now we construct a dichotomy that cannot be generated:

$y_i = \begin{cases} \operatorname{sign}(a_i) & \text{if } i \ne j \\ -1 & \text{if } i = j \end{cases}$

Suppose some w produces the correct labels for all i ≠ j (with ai ≠ 0): sign(w^T xi) = sign(ai)

⇒ ai w^T xi > 0

For the j-th point,

$w^T x_j = \sum_{i \ne j} a_i w^T x_i > 0$

Therefore yj = sign(w^T xj) = +1 (it cannot be −1), so this dichotomy cannot be generated.

Putting it together

We proved that for d-dimensional perceptrons

dVC ≤ d + 1 and dVC ≥ d + 1 ⇒ dVC = d + 1

Number of parameters: w0, w1, ..., wd, i.e., d + 1 parameters!
Parameters create degrees of freedom
Examples

Positive rays: 1 parameter, dVC = 1

Positive intervals: 2 parameters, dVC = 2

Not always true, though...


dVC measures the effective number of parameters
Number of data points needed

$P[|E_{in}(g) - E_{out}(g)| > \epsilon] \le \underbrace{4\, m_H(2N)\, e^{-\frac{1}{8}\epsilon^2 N}}_{\delta}$

If we want a certain ε and δ, how does N depend on dVC?
Need $N^{d_{VC}} e^{-N}$ to be a small value

N is almost linear in dVC
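
One way to see this roughly linear dependence numerically, as a sketch only: replace mH(2N) by the polynomial bound (2N)^{dVC} + 1 and solve N = (8/ε²) ln(4((2N)^{dVC} + 1)/δ) by fixed-point iteration (the function name and the choices ε = 0.1, δ = 0.05 are mine).

import math

def sample_size(d_vc: int, eps: float = 0.1, delta: float = 0.05) -> int:
    """Iterate N = (8/eps^2) * ln(4 * ((2N)^d_vc + 1) / delta) to a fixed point."""
    N = 1000.0  # any positive starting guess works
    for _ in range(100):
        N = (8.0 / eps**2) * math.log(4.0 * ((2.0 * N) ** d_vc + 1.0) / delta)
    return math.ceil(N)

for d_vc in (3, 4, 5, 10):
    print(d_vc, sample_size(d_vc))  # required N grows roughly linearly with d_vc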


Generalization Bounds
Rearranging things

Start from the VC inequality:

$P[|E_{in}(g) - E_{out}(g)| > \epsilon] \le 4\, m_H(2N)\, e^{-\frac{1}{8}\epsilon^2 N}$

Get ε in terms of δ:

$\delta = 4\, m_H(2N)\, e^{-\frac{1}{8}\epsilon^2 N} \;\Rightarrow\; \epsilon = \sqrt{\frac{8}{N}\log\frac{4\, m_H(2N)}{\delta}}$

With probability at least 1 − δ,

$E_{out} \le E_{in} + \sqrt{\frac{8}{N}\log\frac{4\, m_H(2N)}{\delta}}$
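
This bound is easy to evaluate once mH(2N) is again replaced by the polynomial bound (2N)^{dVC} + 1; the short sketch below (the function name and parameter choices are mine) prints the generalization penalty for a few sample sizes, showing how slowly it shrinks as N grows.

import math

def vc_penalty(N: int, d_vc: int, delta: float = 0.05) -> float:
    """sqrt((8/N) * ln(4 * m_H(2N) / delta)), with m_H(2N) <= (2N)**d_vc + 1."""
    m_H = (2 * N) ** d_vc + 1
    return math.sqrt((8.0 / N) * math.log(4.0 * m_H / delta))

for N in (100, 1_000, 10_000, 100_000):
    print(N, round(vc_penalty(N, d_vc=3), 3))  # the penalty shrinks slowly with N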
Learning curve
Conclusions

Next class: LFD 3.4

Questions?
