
Intro to Machine Learning (CS771A, Autumn 2020)

Homework 2
Due Date: November 25, 2020 (11:59pm)

Instructions:

• Only electronic submissions will be accepted. Your main PDF writeup must be typeset in LaTeX (please
also refer to the “Additional Instructions” below).

• The PDF writeup containing your solutions has to be submitted via Gradescope (https://www.gradescope.com/), and the code for the programming part has to be submitted via this Dropbox link: https://tinyurl.com/cs771-a20-hw2.

• We have created your Gradescope account (you should have received the notification). Please use your IITK CC ID (not any other email ID) to log in. Use the “Forgot Password” option to set your password.

Additional Instructions

• We have provided a LaTeX template file hw2sol.tex to help typeset your PDF writeup. There is also a style file ml.sty that contains shortcuts for many useful LaTeX commands, such as boldfaced/calligraphic fonts for letters, various mathematical/Greek symbols, etc. Use of these shortcuts is recommended (but not necessary).

• Your answer to every question should begin on a new page. The provided template is designed to do this
automatically. However, if it fails to do so, use the \clearpage option in LaTeX before starting the
answer to a new question, to enforce this.

• While submitting your assignment on the Gradescope website, you will have to specify on which page(s)
is question 1 answered, on which page(s) is question 2 answered etc. To do this properly, first ensure that
the answer to each question starts on a different page.

• Be careful to flush all your floats (figures, tables) corresponding to question n before starting the answer to question n + 1; otherwise, while grading, we might miss important parts of your answers.

• Your solutions must appear in proper order in the PDF file, i.e., the solution to question n must be complete in the PDF file (including all plots, tables, proofs, etc.) before you present the solution to question n + 1.

• For the programming part, all the code and README should be zipped together and submitted as a single
file named yourrollnumber.zip. Please DO NOT submit the data provided.

Problem 1 (20 marks)
(Second-Order Optimization for Logistic Regression) Show that, for the logistic regression model (assuming each label $y_n \in \{0, 1\}$, and no regularization) with loss function $L(\mathbf{w}) = -\sum_{n=1}^{N} \big( y_n \mathbf{w}^\top \mathbf{x}_n - \log(1 + \exp(\mathbf{w}^\top \mathbf{x}_n)) \big)$, iteration $t$ of a second-order optimization based update $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - (H^{(t)})^{-1} \mathbf{g}^{(t)}$, where $H^{(t)}$ denotes the Hessian and $\mathbf{g}^{(t)}$ denotes the gradient, reduces to solving an importance-weighted regression problem of the form $\mathbf{w}^{(t+1)} = \arg\min_{\mathbf{w}} \sum_{n=1}^{N} \gamma_n^{(t)} (\hat{y}_n^{(t)} - \mathbf{w}^\top \mathbf{x}_n)^2$, where $\gamma_n$ denotes the importance of the $n$-th training example and $\hat{y}_n$ denotes a modified real-valued label. Also, clearly write down the expressions for both, and provide a brief justification as to why the expression for $\gamma_n$ makes intuitive sense here.
Problem 2 (20 marks)
(Perceptron with Kernels) We have seen that, due to the form of the Perceptron updates $\mathbf{w} = \mathbf{w} + y_n \mathbf{x}_n$ (ignore the bias $b$), the weight vector learned by the Perceptron can be written as $\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n$, where $\alpha_n$ is the number of times the Perceptron makes a mistake on example $n$. Suppose our goal is to make the Perceptron learn nonlinear boundaries, using a kernel $k$ with feature map $\phi$. Modify the standard Perceptron algorithm to do this. In particular, for this kernelized variant of the Perceptron algorithm, (1) give the initialization, (2) give the mistake condition, and (3) give the update equation.
Problem 3 (20 marks)
(SVM with Unequal Class Importance) Sometimes it costs us a lot more to classify positive points as negative than negative points as positive (for instance, if we are predicting whether someone has cancer, we would rather err on the side of caution, predicting “yes” when the answer is “no”, than vice versa). One way of expressing this in the support vector machine model is to assign different costs to the two kinds of misclassification. The primal formulation of this is:
$$\min_{\mathbf{w}, b, \xi} \;\; \frac{\|\mathbf{w}\|^2}{2} + \sum_{n=1}^{N} C_{y_n} \xi_n$$

subject to $y_n (\mathbf{w}^\top \mathbf{x}_n + b) \geq 1 - \xi_n$ and $\xi_n \geq 0, \;\; \forall n$.


The only difference is that instead of one cost parameter $C$, there are two, $C_{+1}$ and $C_{-1}$, representing the costs of misclassifying positive examples and misclassifying negative examples, respectively.
Write down the Lagrangian of this modified SVM. Take derivatives w.r.t. the primal variables and construct the dual, namely the maximization problem that depends only on the dual variables $\alpha$, rather than the primal variables. In your final PDF write-up, you need not give each and every step in these derivations (e.g., standard steps of substituting and eliminating some variables), but do write down the key steps. Explain (intuitively) how this differs from the standard SVM dual problem; in particular, how the $C$ variables differ between the two duals.
Problem 4 (20 marks)
(SGD for K-means Objective) Recall the K-means objective function: $L = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \|\mathbf{x}_n - \mu_k\|^2$.
As we have seen, the K-means algorithm minimizes this objective by taking a greedy iterative approach of assigning each point to its closest center (finding the $z_{nk}$'s) and updating the cluster means $\{\mu_k\}_{k=1}^{K}$. The standard K-means algorithm is a batch algorithm and uses all the data in every iteration. It can, however, be made online by taking a random example $\mathbf{x}_n$ at a time, and then (1) assigning $\mathbf{x}_n$ “greedily” to the “best” cluster, and (2) updating the cluster means using SGD on the objective $L$. Assuming you have initialized $\{\mu_k\}_{k=1}^{K}$ randomly and are reading one data point $\mathbf{x}_n$ at a time,

• How would you solve step 1?


• What will be the SGD-based cluster mean update equations for step 2? Intuitively, why does the update
equation make sense?

• Note that the SGD update requires a step size. For your derived SGD update, suggest a good choice of
the step size (and mention why you think it is a good choice).

Problem 5 (20 marks)


(Kernel K-means) Assuming a kernel $k$ with an infinite-dimensional feature map $\phi$ (e.g., an RBF kernel), we can neither store the kernel-induced feature map representation of the data points nor store the cluster means in the kernel-induced feature space. How can we still implement the kernel K-means algorithm in practice? Justify your answer by sketching the algorithm, showing all the steps (initialization, cluster assignment, mean computation), and clearly giving the mathematical operations in each. In particular, what is the difference between how the cluster means would need to be stored in kernel K-means versus how they are stored in standard K-means? Finally, assuming each input to be $D$-dimensional in the original feature space, and $N$ to be the number of inputs, how does kernel K-means compare with standard K-means in terms of the cost of the input-to-cluster-mean distance calculation (please answer this using big-O notation)?
Problem 6 (Programming Problem, 50 marks)
Part 1: You are provided a dataset in the file binclass.txt. In this file, the first two numbers on each line
denote the two features of the input xn , and the third number is the binary label yn ∈ {−1, +1}.
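As a point of reference, here is a minimal loading sketch in Python (an illustration only, not part of the provided materials; it assumes the three numbers on each line are whitespace-separated, so adjust the delimiter if the file uses commas):

import numpy as np

# Each line of binclass.txt: feature_1  feature_2  label, with label in {-1, +1}.
data = np.loadtxt("binclass.txt")        # use delimiter="," if comma-separated
X = data[:, :2]   # the two input features
y = data[:, 2]    # binary labels in {-1, +1}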
Implement a generative classification model for this data assuming the Gaussian class-conditional distributions of the positive and negative class examples to be $\mathcal{N}(\mathbf{x}|\mu_+, \sigma_+^2 I_2)$ and $\mathcal{N}(\mathbf{x}|\mu_-, \sigma_-^2 I_2)$, respectively. Note that here $I_2$ denotes a $2 \times 2$ identity matrix. Assume the class-marginal to be $p(y_n = 1) = 0.5$, and use MLE estimates for the unknown parameters. Your implementation need not be specific to two-dimensional inputs; it should be almost equally easy to implement it such that it works for any number of features (but it is okay if your implementation is specific to two-dimensional inputs only).
On a two-dimensional plane, plot the examples from both the classes (use red color for positives and blue color
for negatives) and the learned decision boundary for this model. Note that we are not providing any separate
test data. Your task is only to learn the decision boundary using the provided training data and visualize it.
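One common way to draw such a figure (a sketch only; the helper name plot_data_and_boundary and the predict argument are illustrative, not something provided with the assignment) is to score a dense grid of points and draw the zero-level contour:

import numpy as np
import matplotlib.pyplot as plt

def plot_data_and_boundary(X, y, predict, title):
    """Scatter the two classes and overlay a decision boundary.

    `predict` is assumed to map an (M, 2) array of points to real-valued
    scores (positive for the +1 class); the boundary is drawn at level 0.
    """
    plt.figure()
    plt.scatter(X[y == +1, 0], X[y == +1, 1], c="red", label="positive")
    plt.scatter(X[y == -1, 0], X[y == -1, 1], c="blue", label="negative")

    # Evaluate the classifier on a grid covering the data.
    x1 = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300)
    x2 = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300)
    G1, G2 = np.meshgrid(x1, x2)
    scores = predict(np.c_[G1.ravel(), G2.ravel()]).reshape(G1.shape)

    plt.contour(G1, G2, scores, levels=[0], colors="black")
    plt.title(title)
    plt.legend()
    plt.show()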
Next, repeat the same exercise but assuming the Gaussian class-conditional distributions of the positive and negative class examples to be $\mathcal{N}(\mathbf{x}|\mu_+, \sigma^2 I_2)$ and $\mathcal{N}(\mathbf{x}|\mu_-, \sigma^2 I_2)$, respectively (i.e., both classes now share the same variance parameter $\sigma^2$).
Finally, try out an SVM classifier (with linear kernel) on this data (we have also provided the data in the format libSVM requires) and show the learned decision boundary. For this part, you do not need to implement SVM.
There are many nice implementations of SVM available, such as the one in scikit-learn and the very popular
libSVM toolkit. Assume the “C” (or λ) hyperparameter of SVM in these implementations to be 1.
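For instance, with scikit-learn (one of the implementations mentioned above), a sketch of fitting the linear-kernel SVM with C = 1 and reusing the grid-plotting helper sketched earlier could look like this:

from sklearn.svm import SVC

# Linear-kernel SVM with the C hyperparameter fixed to 1, as specified above.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)

# decision_function returns a signed score whose zero level is the boundary,
# so it can be handed directly to a grid-based plotting routine.
plot_data_and_boundary(X, y, svm.decision_function, "Linear SVM (C = 1)")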
Part 2: Repeat the same experiments as you did for Part 1, but now using a different dataset, binclassv2.txt.
Looking at the results of both parts, which of the two models (generative classification with Gaussian class-conditionals, and SVM) do you think works better for each of these datasets, and in general?
Deliverables: Include your plots (use a separate, appropriately labeled plot for each case) and experimental findings in the main writeup PDF. Submit your code in a separate zip file on the provided Dropbox link. Please comment your code so that it is easy to read, and also provide a README that briefly explains how to run the code. For the SVM part, you do not have to submit any code, but do include the plots in the PDF (and mention the software used: scikit-learn or libSVM).
