HW02 - KNN DT
This homework consists of 3 theoretical/intuitive problems related to kNN, 1 related to the Curse of Dimensionality, 2 related to Decision Trees, and one programming task implementing the kNN algorithm.
Homework
kNN Classification
Problem 1: Compute the distance between

    x = (1, 3, 5)^T    and    y = (−1, 0, 4)^T

using

a) ⟨x, y⟩ := x^T y

b) ⟨x, y⟩ := x^T A y, where

        [ 2   1   0 ]
    A = [ 1   3  −1 ]
        [ 0  −1   2 ]
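If you want to sanity-check your hand computation, the short NumPy sketch below evaluates both distances, assuming the standard construction d(x, y) = sqrt(⟨x − y, x − y⟩) for the metric induced by an inner product; it is a verification aid, not part of the required solution.

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0])
y = np.array([-1.0, 0.0, 4.0])
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, -1.0],
              [0.0, -1.0, 2.0]])

d = x - y  # difference vector

# a) metric induced by the standard inner product <x, y> = x^T y
dist_a = np.sqrt(d @ d)

# b) metric induced by <x, y> = x^T A y (A is symmetric positive definite)
dist_b = np.sqrt(d @ A @ d)

print(dist_a, dist_b)
```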
Problem 2: We perform 1-NN classification with leave-one-out cross-validation on the data below.

Name   x1    x2    class
A      0.5   4.0   1
B      1.0   3.0   1
C      1.5   3.0   1
D      3.5   2.0   2
E      6.0   2.0   2
F      6.0   1.0   2

a) Compute the distance between each point and its nearest neighbor using the L1-norm as the distance measure.
b) Compute the distance between each point and its nearest neighbor using the L2-norm as the distance measure.
c) What can you say about the classification results when you compare the two distance measures?
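As a quick check of parts (a) and (b), a small NumPy sketch of leave-one-out 1-NN (each point is compared against every other point, never itself) could look like this; the coordinates are taken from the table above.

```python
import numpy as np

# Data from the table above: columns (x1, x2) and the class label
X = np.array([[0.5, 4.0], [1.0, 3.0], [1.5, 3.0],
              [3.5, 2.0], [6.0, 2.0], [6.0, 1.0]])
labels = np.array([1, 1, 1, 2, 2, 2])
names = ["A", "B", "C", "D", "E", "F"]

for p in (1, 2):  # p = 1: L1-norm, p = 2: L2-norm
    print(f"--- L{p}-norm ---")
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], ord=p, axis=1)  # distances to all points
        dists[i] = np.inf                                # leave the point itself out
        j = np.argmin(dists)
        print(f"{names[i]}: nearest neighbor {names[j]}, "
              f"distance {dists[j]:.2f}, predicted class {labels[j]}")
```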
Problem 3: Consider a dataset with 3 classes C = {A, B, C} and the following class distribution: N_A = 16, N_B = 8, N_C = 8. We use an unweighted k-NN classifier and set k equal to the total number of data points, i.e. k = N_A + N_B + N_C =: N.
a) What can we say about the prediction for a new point x_new?
b) What changes if we use the distance-weighted version of k-Nearest Neighbors?
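To build intuition, here is a small, purely illustrative experiment using scikit-learn (assuming it is installed); the toy clusters and the query point are made up and only mirror the stated class counts N_A = 16, N_B = 8, N_C = 8.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Hypothetical toy clusters with 16 / 8 / 8 points for classes A / B / C
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(n, 2))
               for c, n in zip([0.0, 3.0, 6.0], [16, 8, 8])])
y = np.array(["A"] * 16 + ["B"] * 8 + ["C"] * 8)

x_new = np.array([[5.5, 5.5]])  # a query point close to the C cluster

for weights in ("uniform", "distance"):
    clf = KNeighborsClassifier(n_neighbors=len(X), weights=weights).fit(X, y)
    print(f"k = N, weights='{weights}': prediction = {clf.predict(x_new)[0]}")
```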
Curse of Dimensionality
Problem 4: When the dimensionality of the feature space, denoted as k, becomes large, the performance of kNN and other local prediction methods tends to decline. These approaches rely on observations near the test observation for making predictions. This decline in performance is referred to as the curse of dimensionality. We will now investigate this curse.
b) Now consider a situation where we have a set of observations, each with measurements on two distinct features (k = 2), denoted X1 and X2. We assume that (X1, X2) is uniformly distributed over [0, 1] × [0, 1]. The goal is to predict the response of a test observation using only observations within a 10% range of both the X1 and X2 values closest to that test observation. To illustrate, when predicting the response for a test observation with X1 = 0.2 and X2 = 0.75, we use observations in the range [0.15, 0.25] for X1 and in the range [0.7, 0.8] for X2. On average, what proportion of the available observations will be used to make this prediction?
c) Now, consider a scenario where we have a set of observations involving k = 100 features. Once again,
these observations are uniformly distributed on each feature, and each feature spans values from 0 to
1. The goal is to predict the response of a test observation by utilizing observations within the 10%
range of each feature’s values that are closest to that specific test observation. What proportion of
the available observations will be employed, on average, to make this prediction?
d) By referencing your responses to parts (a)-(c), argue that a drawback of kNN when the number of features k is large is the scarcity of training observations that are considered "close" to a given test observation.
e) Now suppose that our goal is to make a prediction for a test observation by building a k-dimensional hypercube centered around it which, on average, contains 10% of the training observations. For k = 1, 2 and 100, what is the length of each side of the hypercube? Comment on your answer.
Note: A hypercube is a generalization of a cube to an arbitrary number of dimensions. When k = 1, a hypercube is simply a line segment, when k = 2 it is a square, and when k = 100 it is a 100-dimensional cube. (A small empirical sanity check of parts (b) and (c) is sketched below.)
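Purely as an empirical sanity check for parts (b) and (c), not a substitute for the analytical argument, the sketch below draws uniform observations and counts how many fall within ±5% of a test point in every feature; the sample size and the placement of the test point away from the boundary are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of uniformly distributed training observations

for k in (1, 2, 100):
    X = rng.uniform(size=(n, k))
    x_test = np.full(k, 0.5)  # test point away from the boundary, so no clipping
    # An observation is "used" only if it lies within +/-5% in EVERY feature
    used = np.all(np.abs(X - x_test) <= 0.05, axis=1)
    print(f"k = {k:3d}: fraction of observations used ≈ {used.mean():.6f}")
```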
Problem 6: You are developing a model to classify games at which machine learning will beat the world
champion within five years. The following table contains the data you have collected.
b) Build the optimal decision tree of depth 1 using entropy as the impurity measure.
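A depth-1 tree simply chooses the single split with the highest information gain. The helper below (with placeholder labels, not the data from the table) shows one way to compute the entropy of a label vector and the gain of a candidate binary split.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, mask):
    """Entropy reduction when splitting `labels` by the boolean `mask`."""
    n, n_left = len(labels), int(mask.sum())
    n_right = n - n_left
    if n_left == 0 or n_right == 0:
        return 0.0  # degenerate split, no information gained
    return (entropy(labels)
            - n_left / n * entropy(labels[mask])
            - n_right / n * entropy(labels[~mask]))

# Placeholder example: class labels and one candidate binary feature
y = np.array([1, 1, 1, 0, 0, 1, 0, 0])
feature = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
print(information_gain(y, feature))
```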
Programming Task
Problem 7: Load the notebook hw_02_notebook.ipynb from Moodle. Fill in the missing code and run the notebook. Save the evaluated notebook and add it to your submission.
Note: We suggest that you use Anaconda for installing Python and Jupyter, as well as for managing packages. We recommend that you use Python 3 or higher.
For more information on Jupyter notebooks, consult the Jupyter documentation. Instructions for converting the Jupyter notebooks to PDF are provided within the notebook.
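The exact functions you need to implement are specified inside hw_02_notebook.ipynb; purely for orientation, a from-scratch kNN classifier in NumPy usually boils down to something like the sketch below (a hypothetical interface, not the one prescribed in the notebook).

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict a label for each row of X_test by majority vote over the
    k nearest training points under the Euclidean distance."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)  # distances to all training points
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        values, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(values[np.argmax(counts)])      # majority class among the neighbors
    return np.array(preds)

# Tiny usage example with made-up data
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.2, 0.1], [1.0, 0.9]]), k=3))
```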
Upload a single PDF file with your homework solution to Moodle by 06.02.2024, 23:59. We recommend typesetting your solution (using LaTeX or Word), but handwritten solutions are also accepted (bring them to the next lecture or put them in my box). Collaboration is fine, but submitting the same or extremely similar solutions is not allowed. Homework rules are in the syllabus.