10-601 Machine Learning: Homework 7: Instructions
Instructions
• Late homework policy: Homework is worth full credit if submitted before the due date, half credit
during the next 48 hours, and zero credit after that. You must turn in at least n−1 of the n homeworks
to pass the class, even if for zero credit.
• Collaboration policy: Homeworks must be done individually, except where otherwise noted in the
assignments. “Individually” means each student must hand in their own answers, and each student
must write and use their own code in the programming parts of the assignment. It is acceptable for
students to collaborate in figuring out answers and to help each other solve the problems, though you
must in the end write up your own solutions individually, and you must list the names of students you
discussed this with. We will be assuming that, as participants in a graduate course, you will be taking
the responsibility to make sure you personally understand the solution to any work arising from such
collaboration.
• Online submission: You must submit your solutions online on autolab. We recommend that you
use LaTeX to type your solutions to the written questions, but we will accept scanned solutions as well.
On the Homework 7 autolab page, you can download the template, which is a tar archive containing
a blank placeholder pdf for the written questions. Replace each pdf file with one that contains your
solutions to the written questions. When you are ready to submit, create a new tar archive of the
top-level directory and submit your archived solutions online by clicking the “Submit File” button.
You should submit a single tar archive identical to the template, except with the blank pdfs replaced
by your solutions for the written questions. You are free to submit as many times as you like. DO
NOT change the name of any of the files or folders in the submission template. In other words,
your submitted files should have exactly the same names as those in the submission template. Do not
modify the directory structure.
where $n$ is the number of data points. To do so, we iterate between assigning each $x_i$ to the nearest cluster center and updating each cluster center $c_j$ to the mean of all points assigned to the $j$th cluster.
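For concreteness, here is a minimal Python sketch of this alternating procedure (plain Lloyd's algorithm with Euclidean distances); the function name, initialization scheme, and stopping rule are illustrative choices, not part of the assignment.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: alternate between assigning points to the nearest
    center and recomputing each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialize centers with k distinct data points.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for each point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: each center becomes the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```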
k-means can be kernelized too!
k-means with the Euclidean distance metric assumes that each pair of clusters is linearly separable. This may not be the case. A classical example is two clusters corresponding to data points on two concentric circles in the $\mathbb{R}^2$ plane. We have seen that we can use kernels to obtain a non-linear version of an algorithm that is linear by nature, and k-means is no exception. Recall that there are two main aspects of kernelized algorithms: (i) the solution is expressed as a linear combination of training examples, and (ii) the algorithm relies only on inner products between data points rather than their explicit representation. We will show that these two aspects can be satisfied in k-means.
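To make this concrete, the sketch below builds the concentric-circles data and a polynomial kernel matrix; the data generator, the kernel choice, and all names are illustrative assumptions. The point is only that the kernel matrix is a function of the inner products X @ X.T, never of an explicit feature representation.

```python
import numpy as np

def concentric_circles(n_per_circle=50, radii=(1.0, 3.0), seed=0):
    """Two clusters lying on concentric circles in R^2 -- not linearly separable."""
    rng = np.random.default_rng(seed)
    points = []
    for r in radii:
        angles = rng.uniform(0.0, 2.0 * np.pi, size=n_per_circle)
        points.append(np.column_stack([r * np.cos(angles), r * np.sin(angles)]))
    return np.vstack(points)

def polynomial_kernel_matrix(X, degree=2, c=1.0):
    """K[i, j] = (<x_i, x_j> + c)^degree: the whole matrix is a function of
    the inner products X @ X.T only, never of the feature map itself."""
    return (X @ X.T + c) ** degree

X = concentric_circles()            # shape (100, 2)
K = polynomial_kernel_matrix(X)     # shape (100, 100), built from inner products
```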
2. [5 pt] Let $z_{ij}$ be an indicator that is equal to 1 if $x_i$ is currently assigned to the $j$th cluster and 0 otherwise ($1 \le i \le n$ and $1 \le j \le k$). Show that the $j$th cluster center $c_j$ can be updated as $c_j = \sum_{i=1}^{n} \alpha_{ij} x_i$. Specifically, show how $\alpha_{ij}$ can be computed given all the $z$'s.
3. [5 pt] Given two data points $x_1$ and $x_2$, show that the squared distance $\|x_1 - x_2\|^2$ can be computed using only (linear combinations of) inner products.
4. [5 pt] Given the results of parts 2 and 3, show how to compute the squared distance $\|x_i - c_j\|^2$ using only (linear combinations of) inner products between the data points $x_1, \ldots, x_n$.
Note: This means that given a kernel $K$, we can run Lloyd's algorithm. We begin with some initial data points as centers and use the answer to part 3 to find the closest center for each data point, giving us the initial $z_{ij}$'s. We then repeatedly use the answers to parts 2 and 4 to reassign the points to centers and update the $z_{ij}$'s. A code sketch of this kernelized procedure appears after part 8.
5. [2 pt] Consider the case where $k = 3$ and we have 4 data points $x_1 = 1$, $x_2 = 2$, $x_3 = 5$, $x_4 = 7$. What is the optimal clustering for this data? What is the corresponding value of the objective (1)?
6. [3 pt] One might be tempted to think that Lloyd’s algorithm is guaranteed to converge to the global
minimum when d = 1. Show that there exists a suboptimal cluster assignment for the data in part 5
that Lloyd’s algorithm will not be able to improve (to get full credit, you need to show the assignment,
show why it is suboptimal and explain why it will not be improved).
7. [10 pt] Assume we sort our data points such that $x_1 \le x_2 \le \cdots \le x_n$. Prove that an optimal cluster assignment has the property that each cluster corresponds to some interval of points; that is, for each cluster $j$ there exist $i_1, i_2$ such that the cluster consists of $\{x_{i_1}, x_{i_1+1}, \ldots, x_{i_2}\}$.
8. (Extra Credit [10 pt]) Develop an O(kn2 ) dynamic programming algorithm for single dimensional
k-means. [Hint: From part 7, what we need to optimize are k − 1 cluster boundaries where the ith
boundary marks the largest data point in the ith cluster.]
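As referenced in the note to part 4, here is a minimal sketch of Lloyd's algorithm driven by a kernel matrix alone. It initializes with a random assignment rather than with data points as centers, and the distance expansion inside the loop is precisely the kind of identity that parts 2-4 ask you to justify; treat it as an illustration, not a model solution.

```python
import numpy as np

def kernel_kmeans(K, k, n_iters=100, seed=0):
    """Lloyd's algorithm given only the n x n kernel matrix K of the data.

    Cluster centers are never formed explicitly; each point's distance to a
    cluster is expressed through entries of K and the current assignments.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)          # random initial assignment
    for _ in range(n_iters):
        dist2 = np.zeros((n, k))
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                dist2[:, j] = np.inf             # empty cluster stays empty
                continue
            # ||phi(x_i) - c_j||^2 written purely in terms of kernel values.
            first = np.diag(K)
            second = K[:, members].sum(axis=1) / members.size
            third = K[np.ix_(members, members)].sum() / members.size ** 2
            dist2[:, j] = first - 2.0 * second + third
        new_labels = np.argmin(dist2, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```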
Problem 2: Dimensionality Reduction and Representation Learning [30 pt]
In this question, we explore the relation between PCA, kernel PCA, and autoencoder neural networks (trained to output the same vector they receive as input). We will use $n$ and $d$ to denote the number and dimensionality of the given data points, respectively.
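For reference, here is a minimal sketch of plain PCA projection onto the top $k$ principal directions, computed via an SVD of the centered data; the function name and interface are illustrative.

```python
import numpy as np

def pca_project(X, k):
    """Project n x d data onto its top-k principal directions."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions (right singular vectors).
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                 # shape (k, d)
    return X_centered @ components.T    # shape (n, k), the low-dimensional codes
```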
1. [10 pt] Consider an autoencoder with a single hidden layer of $k$ nodes. Let $w_{ij}$ denote the weight of the edge from the $i$th input node to the $j$th hidden node. Similarly, let $v_{ij}$ denote the weight of the edge from the $i$th hidden node to the $j$th output node. Show how you can set the activation functions of the hidden and output nodes, as well as the weights $w_{ij}$ and $v_{ij}$, such that the resulting autoencoder resembles PCA.
2. [10 pt] Kernel PCA is a non-linear dimensionality reduction method in which a principal vector $u_j$ is computed as a linear combination of training examples in the feature space:
$$u_j = \sum_{i=1}^{n} \alpha_{ij} \phi(x_i).$$
Computing the principal component of a new point $x$ can then be done using kernel evaluations:
$$z_j(x) = \langle u_j, \phi(x) \rangle = \sum_{i=1}^{n} \alpha_{ij} \langle \phi(x_i), \phi(x) \rangle = \sum_{i=1}^{n} \alpha_{ij} k(x_i, x).$$
You will show that kernel PCA can be represented by a neural network. First we define a kernel node: a kernel node with a vector $w_i$ of incoming weights and an input vector $x$ computes the output $y = k(x, w_i)$. Show that, given a data set $x_1, \ldots, x_n$, there exists a network with a single hidden layer whose output is the kernel principal components $z_1(x), \ldots, z_k(x)$ for a given input $x$. Specify the number of nodes in the input, output, and hidden layers, the type and activation function of the hidden and output nodes, and the weights of the edges in terms of $\alpha$ and $x_1, \ldots, x_n$. (A short code rendering of the projection formula above appears after part 4.)
3. [5 pt] What is the number of parameters (weights) required to store the network in part 2?
4. [5 pt] Another way to do non-linear dimensionality reduction is to train an autoencoder with non-linear activation functions (e.g., sigmoid) in the hidden layers. State one advantage and one disadvantage of that approach compared to kernel PCA.
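As mentioned in part 2, the projection formula can be read directly as code. In this sketch the coefficient matrix alpha (shape $n \times k$) is assumed to have already been obtained from kernel PCA, and kernel can be any kernel function; all names are illustrative.

```python
import numpy as np

def kernel_pca_project(x_new, X_train, alpha, kernel):
    """z_j(x) = sum_i alpha[i, j] * k(x_i, x): one kernel evaluation per
    training point, followed by a linear combination per principal component."""
    k_vec = np.array([kernel(x_i, x_new) for x_i in X_train])   # shape (n,)
    return alpha.T @ k_vec                                       # shape (k,)

# Example usage with a simple polynomial kernel (illustrative only).
poly = lambda a, b: (np.dot(a, b) + 1.0) ** 2
```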
Problem 3: Co-training Doesn’t Like Groupthink [30 pt]
Consider the data set in Figure 1. Each data point has two features. Circled data points are unlabeled points whose true label (shown inside the circle) is invisible to the learning algorithm. We will use this dataset to co-train two threshold-based classifiers $C_1$ and $C_2$, where $C_i$ is trained using feature $i$ and produces a decision threshold on feature $i$ that maximizes the margin between the training examples and the threshold. The "confidence" of a classifier for a new data point is measured by how far the point is from the threshold: the farther the point is from the decision boundary, the more confident the classifier is. We will run iterative co-training such that, in each iteration, each classifier adds the unlabeled example it is most confident about to the training data. Assume that co-training halts when, for each classifier, the unlabeled point that is farthest from the threshold (i.e., the one it is most confident about) is between the largest known negative example and the smallest known positive example (at that point, the algorithm deems the unlabeled examples too uncertain to label for the other classifier).
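To pin down the procedure, here is a minimal sketch of a single co-training iteration for two 1-D max-margin threshold classifiers. The assumption that positive examples lie above the threshold, the shared labeled pool, and all names are illustrative simplifications and do not describe Figure 1.

```python
import numpy as np

def fit_threshold(values, labels):
    """Max-margin threshold on one feature, assuming positives lie above it."""
    neg_max = values[labels == 0].max()
    pos_min = values[labels == 1].min()
    return 0.5 * (neg_max + pos_min)           # midpoint maximizes the margin

def cotrain_step(X, y, labeled, unlabeled):
    """One co-training iteration: each view labels its most confident
    unlabeled point so it can be handed to the other classifier.

    X: (n, 2) feature matrix; y: labels for the labeled indices;
    labeled, unlabeled: 1-D integer index arrays.
    """
    newly_labeled = []
    for view in (0, 1):                        # C1 uses feature 0, C2 uses feature 1
        thr = fit_threshold(X[labeled, view], y[labeled])
        confidences = np.abs(X[unlabeled, view] - thr)
        pick = unlabeled[np.argmax(confidences)]
        pseudo_label = int(X[pick, view] > thr)
        newly_labeled.append((pick, pseudo_label))
    return newly_labeled
```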
1. [10 pt] Explain what happens in a single iteration of co-training. Specifically, illustrate:
• The initial thresholds produced by $C_1$ and $C_2$ given the labeled examples.
• The new labeled example (coordinates and label) that will be provided to $C_2$ by $C_1$ and vice versa.
• The new thresholds after incorporating the new examples.
• The number of data points misclassified by $C_1$ using the initial and updated thresholds.
2. [15 pt] Now assume that we train both $C_1$ and $C_2$ using feature 1, so they share the same view of the data. What happens if we run co-training to completion? What are the initial thresholds, which unlabeled example will be added in each iteration, and what are the final thresholds?
3. [5 pt] Based on your observations of parts 1 and 2, provide an intuitive explanation (in no more than
two lines) for why having features that satisfy independence given the label helps co-training to be
successful.
1. [4 pt] In the active learning setting, the learning algorithm can query the label of an unlabeled example. Assume that you can query any possible example. Show that, starting with a single positive example, you can exactly learn the true hypothesis $h^*$ using $d$ queries.
2. [3 pt] In the passive learning setting, the examples are drawn i.i.d. from an unknown distribution. According to PAC learning theory, how many examples (in big-O notation) are required to guarantee a generalization error less than $\epsilon$ with probability $1 - \delta$? (Hint: the VC dimension of the class of conjunctions of $d$ binary features is $d$.)
Note: The result of part 1 is much stronger than that of part 2; it guarantees that the classifier will exactly learn the true hypothesis with probability 1. PAC learning guarantees, on the other hand, would require an infinite number of examples as the error $\epsilon$ and the failure probability $\delta$ go to 0. In other words, part 1 is "surely exactly correct" compared to "probably approximately correct".
3. [3 pt] Show that if the training data is not representative of the underlying distribution, a consistent hypothesis can perform poorly. Specifically, assume that the true hypothesis $h^*$ is a conjunction of $k$ out of the $d$ features for some $k > 0$ and that all possible data points are equally likely. Show that there exists a training set of $2^{d-k}$ unique examples and a hypothesis $\hat{h}$ that is consistent with this training set but achieves a classification error $\ge 50\%$ when tested on all possible data points.
Note: The result of part 3 does not contradict that of part 2; the adversarial, unrepresentative sample given in part 3 could still occur with random i.i.d. sampling. The probability of drawing such unrepresentative training sets is included in the failure probability $\delta$.