Homework 2
Due Date: November 25, 2020 (11:59pm)
Instructions:
• Only electronic submissions will be accepted. Your main PDF writeup must be typeset in LaTeX (please
also refer to the “Additional Instructions” below).
• The PDF writeup containing your solution has to be submitted via Gradescope (https://www.gradescope.com/), and the code for the programming part has to be submitted via this Dropbox link: https://tinyurl.com/cs771-a20-hw2
• We have created your Gradescope account (you should have received the notification). Please use your
IITK CC ID (not any other email ID) to login. Use the “Forgot Password” option to set your password.
Additional Instructions
• We have provided a LaTeX template file hw2sol.tex to help typeset your PDF writeup. There is also
a style file ml.sty that contains shortcuts to many useful LaTeX commands for doing things such
as boldfaced/calligraphic fonts for letters, various mathematical/Greek symbols, and so on. Use of
these shortcuts is recommended (but not necessary).
• Your answer to every question should begin on a new page. The provided template is designed to do this
automatically. However, if it fails to do so, use the \clearpage command in LaTeX before starting the
answer to a new question, to enforce this.
• While submitting your assignment on the Gradescope website, you will have to specify on which page(s)
question 1 is answered, on which page(s) question 2 is answered, etc. To do this properly, first ensure that
the answer to each question starts on a different page.
• Be careful to flush all your floats (figures, tables) corresponding to question n before starting the answer
to question n + 1; otherwise, while grading, we might miss important parts of your answers.
• Your solutions must appear in proper order in the PDF file, i.e., the solution to question n must be complete in
the PDF file (including all plots, tables, proofs, etc.) before you present the solution to question n + 1.
• For the programming part, all the code and README should be zipped together and submitted as a single
file named yourrollnumber.zip. Please DO NOT submit the data provided.
Problem 1 (20 marks)
(Second-Order Optimization for Logistic Regression) Show that, for the logistic regression model (assuming each label $y_n \in \{0,1\}$, and no regularization) with loss function $L(\mathbf{w}) = -\sum_{n=1}^{N}\left(y_n \mathbf{w}^\top \mathbf{x}_n - \log(1 + \exp(\mathbf{w}^\top \mathbf{x}_n))\right)$, iteration $t$ of a second-order optimization based update $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - (\mathbf{H}^{(t)})^{-1}\mathbf{g}^{(t)}$, where $\mathbf{H}^{(t)}$ denotes the Hessian and $\mathbf{g}^{(t)}$ denotes the gradient, reduces to solving an importance-weighted regression problem of the form $\mathbf{w}^{(t+1)} = \arg\min_{\mathbf{w}} \sum_{n=1}^{N} \gamma_n^{(t)}\left(\hat{y}_n^{(t)} - \mathbf{w}^\top \mathbf{x}_n\right)^2$, where $\gamma_n^{(t)}$ denotes the importance of the $n^{th}$ training example and $\hat{y}_n^{(t)}$ denotes a modified real-valued label. Also, clearly write down the expression for both, and provide a brief justification as to why the expression of $\gamma_n^{(t)}$ makes intuitive sense here.
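For reference, the following is a minimal NumPy sketch of one iteration of the second-order update referred to above, using the standard gradient and Hessian expressions for this loss. It is a sketch only; it deliberately stops short of the reduction to importance-weighted regression, which is what the problem asks you to derive.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, X, y):
    # L(w) = -sum_n ( y_n w^T x_n - log(1 + exp(w^T x_n)) ), with y_n in {0, 1}
    s = X @ w
    return -np.sum(y * s - np.logaddexp(0.0, s))

def newton_step(w, X, y):
    # One second-order update w <- w - H^{-1} g for the loss above
    mu = sigmoid(X @ w)               # predicted probabilities
    g = X.T @ (mu - y)                # gradient of L(w)
    s = mu * (1.0 - mu)               # per-example curvature weights
    H = X.T @ (X * s[:, None])        # Hessian of L(w)
    return w - np.linalg.solve(H, g)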
Problem 2 (20 marks)
(Perceptron with Kernels) We have seen that, due to the form of the Perceptron updates $\mathbf{w} = \mathbf{w} + y_n \mathbf{x}_n$ (ignore the bias $b$), the weight vector learned by the Perceptron can be written as $\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n$, where $\alpha_n$ is the number of times the Perceptron makes a mistake on example $n$. Suppose our goal is to make the Perceptron learn nonlinear boundaries, using a kernel $k$ with feature map $\phi$. Modify the standard Perceptron algorithm to do this. In particular, for this kernelized variant of the Perceptron algorithm (1) give the initialization, (2) give the mistake condition, and (3) give the update equation.
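As a point of reference for the modification being asked for, below is a minimal NumPy sketch of the standard (non-kernelized) Perceptron described above, written so that the mistake counts alpha_n are tracked explicitly. The function name and the epochs parameter are illustrative choices, not part of the problem; the kernelized initialization, mistake condition, and update are what you are asked to work out.

import numpy as np

def perceptron_train(X, y, epochs=10):
    # Standard Perceptron (bias ignored), labels y_n in {-1, +1}.
    # alpha[n] counts mistakes on example n, so w = sum_n alpha[n] * y[n] * X[n].
    N, D = X.shape
    w = np.zeros(D)
    alpha = np.zeros(N)
    for _ in range(epochs):
        for n in range(N):
            if y[n] * (w @ X[n]) <= 0:   # mistake on example n
                w = w + y[n] * X[n]      # standard Perceptron update
                alpha[n] += 1
    return w, alpha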
Problem 3 (20 marks)
(SVM with Unequal Class Importance) Sometimes it costs us a lot more to classify positive points as negative
than negative points as positive (for instance, if we are predicting whether someone has cancer, we would rather
err on the side of caution, predicting “yes” when the answer is “no”, than vice versa). One way of expressing
this in the support vector machine model is to assign different costs to the two kinds of misclassification. The
primal formulation of this is:
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \;\; \frac{\|\mathbf{w}\|^2}{2} + \sum_{n=1}^{N} C_{y_n}\, \xi_n$$
• Note that the SGD update requires a step size. For your derived SGD update, suggest a good choice of
the step size (and mention why you think it is a good choice).
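To make the unequal-cost objective above concrete, here is a minimal NumPy sketch that evaluates it for given parameters. The names C_pos and C_neg are hypothetical stand-ins for the two class-dependent costs, and the slack xi_n is computed as the hinge loss max(0, 1 - y_n(w^T x_n + b)), which is its optimal value under the usual SVM margin constraints (not reproduced above); deriving the SGD update and choosing its step size are left to the problem.

import numpy as np

def weighted_svm_objective(w, b, X, y, C_pos, C_neg):
    # ||w||^2 / 2 + sum_n C_{y_n} * xi_n, with labels y_n in {-1, +1}
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)      # slack / hinge loss per example
    C = np.where(y == 1, C_pos, C_neg)       # per-example cost C_{y_n}
    return 0.5 * np.dot(w, w) + np.sum(C * xi)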