TD1 SVM
Exercise II   In contrast to ordinary least squares, which has the cost function

    J(θ) = (1/2) ∑_{i=1}^{m} (θᵀ x^(i) − y^(i))²,

ridge regression adds a regularization term (with λ > 0 a fixed constant) and uses the cost function

    J(θ) = (1/2) ∑_{i=1}^{m} (θᵀ x^(i) − y^(i))² + (λ/2) ‖θ‖².
1. Find a closed-form expression for the value of θ which minimizes the ridge regression cost function.
2. Suppose that we want to use kernels to implicitly represent our feature vectors in a high-dimensional
space. Using a feature mapping φ, the ridge regression cost function becomes

    J(θ) = (1/2) ∑_{i=1}^{m} (θᵀ φ(x^(i)) − y^(i))² + (λ/2) ‖θ‖².

Making a prediction on a new input x_new would now be done by computing θᵀ φ(x_new). Show how
we can use the kernel trick to obtain a closed form for the prediction on the new input without ever
explicitly computing φ(x_new). You may assume that the parameter vector θ can be expressed as a linear
combination of the input feature vectors, i.e. θ = ∑_{i=1}^{m} α_i φ(x^(i)), for some set of parameters α_i.
(A small numerical sketch illustrating both parts is given after this exercise.)
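Both parts can be sanity-checked numerically. The NumPy sketch below assumes the standard closed forms
that parts 1 and 2 ask you to derive: the regularized normal equations θ = (ΦᵀΦ + λI)⁻¹ Φᵀ y, where Φ is
the matrix whose rows are φ(x^(i)) (for part 1, Φ is simply the matrix of raw inputs), and the kernel-only
prediction k(x_new)ᵀ (K + λI)⁻¹ y with K_ij = φ(x^(i))ᵀ φ(x^(j)) and k(x_new)_i = φ(x^(i))ᵀ φ(x_new). The
toy data, the value of λ, and the degree-2 feature map are arbitrary choices made only for illustration;
if the derivations are correct, the two printed predictions agree.

    import numpy as np

    rng = np.random.default_rng(0)
    m, d = 30, 3
    X = rng.normal(size=(m, d))      # training inputs x^(i)
    y = rng.normal(size=m)           # training targets y^(i)
    lam = 0.7                        # regularization constant lambda (illustrative value)

    # Toy explicit feature map phi(x): the raw coordinates plus all degree-2 monomials.
    def phi(x):
        return np.concatenate([x, np.outer(x, x).ravel()])

    Phi = np.vstack([phi(x) for x in X])     # m x p matrix whose rows are phi(x^(i))
    x_new = rng.normal(size=d)               # a new input to predict on

    # (1) Explicit-feature solution: theta = (Phi^T Phi + lam I)^(-1) Phi^T y.
    p = Phi.shape[1]
    theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)
    pred_explicit = theta @ phi(x_new)

    # (2) Kernel-only prediction: theta is never formed, only inner products of feature
    #     vectors are used (computed from Phi here for checking; in practice K and k
    #     would come directly from a kernel function).
    K = Phi @ Phi.T                          # K_ij = phi(x^(i))^T phi(x^(j))
    k = Phi @ phi(x_new)                     # k_i  = phi(x^(i))^T phi(x_new)
    pred_kernel = k @ np.linalg.solve(K + lam * np.eye(m), y)

    print(pred_explicit, pred_kernel)        # the two predictions coincide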
Exercise III   In class, we saw that if our data is not linearly separable, then we need to modify our support
vector machine algorithm by introducing an error margin that must be minimized. Specifically, the formulation
we have looked at is known as the ℓ1 norm soft margin SVM. In this problem we will consider an alternative
method, known as the ℓ2 norm soft margin SVM. This new algorithm is given by the following optimization
problem (notice that the slack penalties are now squared):

    min_{w, b∈R, ξ∈R^n}   (1/2) ‖w‖² + (C/2) ∑_{i=1}^{n} ξ_i²
    s.t.                  y_i (wᵀ x_i + b) ≥ 1 − ξ_i,   ∀ i = 1, …, n
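For comparison, recall the ℓ1 norm soft margin SVM referred to above, written here in its standard form and
in the same notation:

    min_{w, b∈R, ξ∈R^n}   (1/2) ‖w‖² + C ∑_{i=1}^{n} ξ_i
    s.t.                  y_i (wᵀ x_i + b) ≥ 1 − ξ_i,   ξ_i ≥ 0,   ∀ i = 1, …, n

whose standard dual is

    max_α   ∑_{i=1}^{n} α_i − (1/2) ∑_{i=1}^{n} ∑_{j=1}^{n} α_i α_j y_i y_j x_iᵀ x_j
    s.t.    0 ≤ α_i ≤ C   ∀ i = 1, …, n,    ∑_{i=1}^{n} α_i y_i = 0.

The ℓ2 problem above differs only in the squared slack penalty and the absence of the ξ_i ≥ 0 constraints;
questions 1-4 below examine how these changes propagate to the Lagrangian and the dual.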
1. Notice that we have dropped the ξ_i ≥ 0 constraints in the ℓ2 problem. Show that these non-negativity
constraints can be removed. That is, show that the optimal value of the objective will be the same
whether or not these constraints are present. (A small numerical check of this claim is sketched after
this list.)
2. What is the Lagrangian of the ℓ2 soft margin SVM optimization problem?
3. Minimize the Lagrangian with respect to w, b, and ξ.
4. What is the dual of the ℓ2 soft margin SVM optimization problem?
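The following NumPy snippet is a quick numerical sanity check of the claim in question 1. For a fixed
(w, b) the objective separates over the slacks, so each ξ_i can be optimized on its own; the snippet
brute-forces each scalar problem with and without the ξ_i ≥ 0 constraint and confirms that the minimizers
coincide. The toy data, the particular (w, b), and the helper name best_sq_slack are illustrative choices
only, not part of the exercise.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))                                       # toy inputs x_i
    y = np.where(X[:, 0] + 0.3 * rng.normal(size=20) > 0, 1.0, -1.0)   # toy labels in {-1, +1}
    w, b = np.array([1.0, -0.5]), 0.1                                  # an arbitrary, fixed (w, b)

    # The constraint y_i (w^T x_i + b) >= 1 - xi_i is equivalent to xi_i >= t_i with:
    t = 1.0 - y * (X @ w + b)

    # Brute-force grid of candidate slack values (contains 0 exactly and covers all t_i here).
    grid = np.arange(-80000, 80001) * 1e-4

    def best_sq_slack(t_i, nonneg):
        """Minimize xi^2 over the grid subject to xi >= t_i (and xi >= 0 if nonneg is True)."""
        feasible = grid >= t_i
        if nonneg:
            feasible &= grid >= 0.0
        candidates = grid[feasible]
        return candidates[np.argmin(candidates ** 2)]

    xi_free   = np.array([best_sq_slack(ti, nonneg=False) for ti in t])
    xi_nonneg = np.array([best_sq_slack(ti, nonneg=True) for ti in t])

    # Both versions pick xi_i = max(0, 1 - y_i (w^T x_i + b)), so the objective values agree.
    print(np.allclose(xi_free, xi_nonneg))
    print(np.allclose(xi_free, np.maximum(t, 0.0), atol=1e-4))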