Mehryar Mohri

Foundations of Machine Learning 2015


Courant Institute of Mathematical Sciences
Homework assignment 2
October 23, 2015
Due: November 09, 2015
A. VC-dimension of convex combinations

1. Let H be a family of functions mapping from an input space X to
{−1, +1} and let T be a positive integer. Give an upper bound on the
VC-dimension of the family of functions F_T defined by

F_T = \left\{ \mathrm{sgn}\Big( \sum_{t=1}^{T} \alpha_t h_t \Big) : h_t \in H, \ \alpha_t \geq 0, \ \sum_{t=1}^{T} \alpha_t \leq 1 \right\}.

(Hint: you can use Problem C of Foundations of Machine Learning,
HW2, 2014, http://www.cs.nyu.edu/~mohri/ml14/hw2.pdf, and its
solution.)

B. Growth function

1. A linearly separable labeling of a set X of vectors in R^d is a classification
of X into two sets X^+ and X^− with X^+ = {x ∈ X : w · x > 0} and
X^− = {x ∈ X : w · x < 0} for some w ∈ R^d.
Let X = {x_1, . . . , x_m} be a subset of R^d.

(a) Let {X^+, X^−} be a dichotomy of X and let x_{m+1} ∈ R^d. Show
that {X^+ ∪ {x_{m+1}}, X^−} and {X^+, X^− ∪ {x_{m+1}}} are linearly
separable by a hyperplane going through the origin if and only
if {X^+, X^−} is linearly separable by a hyperplane going through
the origin and x_{m+1}.
(b) Let X = {x_1, . . . , x_m} be a subset of R^d such that any k-element
subset of X with k ≤ d is linearly independent. Then, show that the
number of linearly separable labelings of X is

C(m, d) = 2 \sum_{k=0}^{d-1} \binom{m-1}{k}.

(Hint: prove by induction that C(m + 1, d) = C(m, d) + C(m, d − 1).)
(c) Let f_1, . . . , f_p be p functions mapping R^d to R. Define F as
the family of classifiers based on linear combinations of these
functions:

F = \left\{ x \mapsto \mathrm{sgn}\Big( \sum_{k=1}^{p} a_k f_k(x) \Big) : a_1, \ldots, a_p \in \mathbb{R} \right\}.

Define Ψ by Ψ(x) = (f_1(x), . . . , f_p(x)). Assume that there exist
x_1, . . . , x_m ∈ R^d such that every p-subset of {Ψ(x_1), . . . , Ψ(x_m)}
is linearly independent. Then, show that

\Pi_F(m) = 2 \sum_{i=0}^{p-1} \binom{m-1}{i}.
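
A quick numerical illustration (editor's sketch, not part of the assignment): the snippet below compares the closed-form count 2 \sum_{k=0}^{d-1} \binom{m-1}{k} from parts (b) and (c) with a randomized brute-force enumeration of the homogeneous linearly separable labelings of a small point set in general position. All names are placeholders, and the randomized search only recovers every labeling with high probability.

```python
# Editor's sketch: numerically check C(m, d) = 2 * sum_{k=0}^{d-1} binom(m-1, k)
# against a randomized enumeration of homogeneous linearly separable labelings.
from math import comb

import numpy as np


def closed_form(m: int, d: int) -> int:
    return 2 * sum(comb(m - 1, k) for k in range(d))


def sampled_count(X: np.ndarray, trials: int = 100_000) -> int:
    # Sample random hyperplanes through the origin and collect the distinct
    # sign patterns they induce on the rows of X; with enough samples this
    # finds every linearly separable labeling with high probability.
    rng = np.random.default_rng(0)
    patterns = set()
    for _ in range(trials):
        s = np.sign(X @ rng.standard_normal(X.shape[1]))
        if np.all(s != 0):  # skip hyperplanes passing exactly through a point
            patterns.add(tuple(s.astype(int)))
    return len(patterns)


m, d = 5, 3
X = np.random.default_rng(1).standard_normal((m, d))  # general position (a.s.)
print("closed form:", closed_form(m, d))  # 2 * (1 + 4 + 6) = 22
print("sampled    :", sampled_count(X))   # should also be 22 w.h.p.
```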

C. Support Vector Machines

1. Download and install the libsvm software library from:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/,

and briefly consult the documentation to become more familiar with
the tools.

2. Consider the splice data set

http://www.cs.toronto.edu/~delve/data/splice/desc.html.

Download the already formatted training and test files of a noisy ver-
sion of that dataset from
http://www.cs.nyu.edu/~mohri/ml15/splice_noise_train.txt
http://www.cs.nyu.edu/~mohri/ml15/splice_noise_test.txt.

Use the libsvm scaling tool to scale the features of all the data. The
scaling parameters should be computed only on the training data and
then applied to the test data.
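
A minimal sketch of the scaling step (editor's addition): the assignment's intended tool is libsvm's svm-scale (e.g. svm-scale -l -1 -u 1 -s range splice_noise_train.txt > train.scaled, then svm-scale -r range splice_noise_test.txt > test.scaled); the Python below, using scikit-learn's libsvm-format reader, illustrates the same principle of fitting the scaling parameters on the training set only.

```python
# Editor's sketch: scale features to [-1, 1] using parameters computed on the
# training data only, then apply the same parameters to the test data.
from sklearn.datasets import load_svmlight_file
from sklearn.preprocessing import MinMaxScaler

X_train, y_train = load_svmlight_file("splice_noise_train.txt")
X_test, y_test = load_svmlight_file("splice_noise_test.txt",
                                    n_features=X_train.shape[1])

scaler = MinMaxScaler(feature_range=(-1, 1))        # svm-scale's default range
X_train = scaler.fit_transform(X_train.toarray())   # fit on training data only
X_test = scaler.transform(X_test.toarray())         # reuse the same parameters
```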

3. Consider the corresponding binary classification problem, which consists
of distinguishing two types of splice junctions in DNA sequences using
about 60 features. Use SVMs combined with polynomial kernels to tackle
this problem.
To do that, randomly split the training data into ten equal-sized disjoint
sets. For each value of the polynomial degree, d = 1, 3, 5, plot the average
cross-validation error plus or minus one standard deviation as a function
of C (let other parameters of polynomial kernels in libsvm be equal to
their default values), varying C in powers of 5, starting from a small
value C = 5^{-k} up to C = 5^{k}, for some value of k. The value of k
should be chosen so that you see a significant variation in training error,
starting from a very high training error to a low training error. Expect
longer training times with libsvm as the value of C increases.
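
A possible implementation sketch for this question (editor's addition), using scikit-learn's SVC, which wraps libsvm; the command-line equivalent is svm-train -t 1 -d <degree> -c <C> -v 10 on the scaled training file. The bound k below is a placeholder to enlarge until the training error varies as described, and X_train, y_train are the scaled data from the previous step.

```python
# Editor's sketch: 10-fold cross-validation error (mean +/- one standard
# deviation) as a function of C, for polynomial degrees 1, 3 and 5. Other
# kernel parameters match libsvm's defaults (gamma = 1/num_features, coef0 = 0).
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

k = 5                                     # placeholder; enlarge if needed
Cs = [5.0 ** e for e in range(-k, k + 1)]
cv = KFold(n_splits=10, shuffle=True, random_state=0)

for degree in (1, 3, 5):
    means, stds = [], []
    for C in Cs:
        clf = SVC(kernel="poly", degree=degree, C=C,
                  gamma=1.0 / X_train.shape[1], coef0=0.0)
        errors = 1.0 - cross_val_score(clf, X_train, y_train, cv=cv)
        means.append(errors.mean())
        stds.append(errors.std())
    plt.errorbar(range(-k, k + 1), means, yerr=stds, label=f"degree {degree}")

plt.xlabel("log_5(C)")
plt.ylabel("10-fold cross-validation error")
plt.legend()
plt.show()
```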

4. Let (C∗, d∗) be the best pair found previously. Fix C to be C∗. Plot the
ten-fold cross-validation error and the test errors for the hypotheses
obtained as a function of d. Plot the average number of support vectors
obtained as a function of d. How many of the support vectors lie on
the marginal hyperplanes? Plot the soft margin of the solution as a
function of d.
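
An editor's sketch of how the quantities in this question could be computed with the libsvm wrapper in scikit-learn: the support vectors and the marginal support vectors are read off the dual coefficients (0 < α_i < C puts x_i on a marginal hyperplane), and the soft margin 1/‖w‖ is computed in the kernel-induced feature space. C_star and the list of degrees are placeholders for the values selected in the previous question.

```python
# Editor's sketch: for each degree d, report the number of support vectors,
# how many lie on a marginal hyperplane (0 < alpha_i < C), the soft margin
# 1/||w||, and the test error. C_star is a placeholder.
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.svm import SVC

C_star = 5.0 ** 2                         # placeholder for the selected C*
gamma = 1.0 / X_train.shape[1]            # libsvm's default gamma

for degree in (1, 3, 5):                  # or a finer range of degrees
    clf = SVC(kernel="poly", degree=degree, gamma=gamma, coef0=0.0, C=C_star)
    clf.fit(X_train, y_train)

    alpha_y = clf.dual_coef_.ravel()      # y_i * alpha_i, one entry per SV
    n_sv = alpha_y.size
    n_marginal = int(np.sum(np.abs(alpha_y) < C_star - 1e-8))

    # ||w||^2 = sum_{i,j} (y_i alpha_i)(y_j alpha_j) K(x_i, x_j) over the SVs.
    K_sv = polynomial_kernel(clf.support_vectors_, degree=degree,
                             gamma=gamma, coef0=0.0)
    margin = 1.0 / np.sqrt(alpha_y @ K_sv @ alpha_y)

    test_error = 1.0 - clf.score(X_test, y_test)
    print(degree, n_sv, n_marginal, margin, test_error)
```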

5. Now, combine SVMs with Gaussian kernels to tackle the same task.
Use cross-validation as before to determine the best value of C and σ,
varying C in powers of 5, and σ in powers of 2 for a reasonable range
so that you see a significant variation in training error, as before. Fix
C and σ to the best values found via cross-validation. How does the
test error of the solution compare to the best result obtained using
polynomial kernels? What is the value of the soft margin?
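
A sketch of the corresponding grid search (editor's addition): libsvm's RBF kernel is exp(−γ‖x−y‖²), so a width σ corresponds to γ = 1/(2σ²); the ranges below are placeholders to widen until the training error varies significantly.

```python
# Editor's sketch: 10-fold cross-validation over C (powers of 5) and sigma
# (powers of 2) for the Gaussian kernel, with gamma = 1 / (2 sigma^2).
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

cv = KFold(n_splits=10, shuffle=True, random_state=0)
best = (None, None, np.inf)
for C in (5.0 ** e for e in range(-3, 4)):            # placeholder range
    for sigma in (2.0 ** e for e in range(-2, 5)):    # placeholder range
        clf = SVC(kernel="rbf", C=C, gamma=1.0 / (2.0 * sigma ** 2))
        err = 1.0 - cross_val_score(clf, X_train, y_train, cv=cv).mean()
        if err < best[2]:
            best = (C, sigma, err)

C_best, sigma_best, _ = best
clf = SVC(kernel="rbf", C=C_best, gamma=1.0 / (2.0 * sigma_best ** 2))
clf.fit(X_train, y_train)
print("test error:", 1.0 - clf.score(X_test, y_test))
```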

6. Here, use as a kernel the sum of the best polynomial kernel (degree
d∗ ) and the Gaussian kernel with the best parameter σ you found in
the previous question. Use cross-validation as before to determine the
best value of C. How does the test error of the solution compare to
the best result obtained in the previous questions?
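
One way to realize the sum kernel (editor's sketch): since a sum of PDS kernels is PDS, the combined Gram matrices can be passed to libsvm as a precomputed kernel (libsvm's -t 4 option; kernel="precomputed" in scikit-learn). d_star and sigma_best below are placeholders for the previously selected values, and C should again be chosen by cross-validation.

```python
# Editor's sketch: use the sum of the best polynomial kernel and the best
# Gaussian kernel as a precomputed kernel. d_star and sigma_best are placeholders.
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

d_star, sigma_best = 3, 4.0               # placeholders
gamma_poly = 1.0 / X_train.shape[1]       # libsvm's default gamma
gamma_rbf = 1.0 / (2.0 * sigma_best ** 2)

def sum_kernel(A, B):
    return (polynomial_kernel(A, B, degree=d_star, gamma=gamma_poly, coef0=0.0)
            + rbf_kernel(A, B, gamma=gamma_rbf))

K_train = sum_kernel(X_train, X_train)
K_test = sum_kernel(X_test, X_train)      # rows: test points, columns: training points

clf = SVC(kernel="precomputed", C=1.0)    # cross-validate C as in the earlier questions
clf.fit(K_train, y_train)
print("test error:", 1.0 - clf.score(K_test, y_test))
```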

D. Kernels

Show that the following kernels are PDS.


1. Let n be a positive integer. K is defined by K(x, y) = \sum_{i=1}^{N} \cos^n(x_i^2 - y_i^2)
for all (x, y) ∈ R^N × R^N.
2. Let σ be a positive real number. K is defined by K(x, y) = e^{-\|x - y\| / \sigma} for
all (x, y) ∈ R^N × R^N. (Hint: you could show that K is the normalized
kernel of a kernel K′ and show that K′ is PDS using the following
equality, valid for all x, y:

\|x - y\| = \frac{1}{2\Gamma(\frac{1}{2})} \int_0^{+\infty} \frac{1 - e^{-t\|x - y\|^2}}{t^{3/2}} \, dt.)
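
A numerical sanity check (editor's sketch, not a proof): both kernels should produce positive semidefinite Gram matrices on any finite point set, so their eigenvalues on random points should be nonnegative up to floating-point error. The parameter values below are arbitrary.

```python
# Editor's sketch: build the two Gram matrices of problem D on random points
# and verify that their smallest eigenvalues are >= 0 up to numerical error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))          # 40 random points in R^5
n, sigma = 3, 1.5                         # arbitrary parameter choices

# K1(x, y) = sum_i cos^n(x_i^2 - y_i^2)
D2 = X[:, None, :] ** 2 - X[None, :, :] ** 2
K1 = np.sum(np.cos(D2) ** n, axis=2)

# K2(x, y) = exp(-||x - y|| / sigma)
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
K2 = np.exp(-dist / sigma)

for name, K in (("sum_i cos^n(x_i^2 - y_i^2)", K1), ("exp(-||x-y||/sigma)", K2)):
    print(name, "min eigenvalue:", np.linalg.eigvalsh(K).min())
```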
