CS4780 Homework 5 SP24-2
Note: You can work in a group of up to 3. Please include your teammates’ NetIDs
and names on the front page and form a group on Gradescope.
1. (1) Independent of how K-NN treats classification, say you are asked to draw a single line
to demarcate the boundary between the high-chance-of-AFib region (red area) and the
low-chance-of-AFib region (blue area). Where would you draw the line? Make a copy of figure 1
in your submission and overlay your line. Keep your line smooth.
While we haven’t discussed the mathematical form of the bias, variance, and noise terms for classifi-
cation models, we can safely assume that the variance will depend on the number of times hD(x)
disagrees with h̄(x), while the bias will depend on how often h̄(x) differs from ȳ(x).
Here you can estimate ȳ(x) (the expected class for x) as the class predicted by the line you drew
in part 1, hD(x) as the class predicted by the K-NN classifier trained on dataset D, and h̄(x) as
the majority vote of the classes predicted by all five classifiers hD(x) for datasets D1 through D5.
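Should you want to tabulate these disagreement counts numerically rather than by inspection, here is a minimal Python sketch. The arrays h_D_preds and y_bar are hypothetical placeholders for predictions read off the plots, the 0/1 encoding of blue/red is an assumption, and for simplicity all five classifiers are evaluated on one common set of query points.

# Minimal sketch: estimate the "variance-like" and "bias-like" disagreement
# counts from per-dataset K-NN predictions. All inputs below are hypothetical.
import numpy as np

# Rows = query points, columns = datasets D1..D5.
# h_D_preds[i, j] = class (0 = blue, 1 = red) predicted for point i by the
# classifier trained on dataset D_{j+1}; y_bar[i] = class given by the line
# drawn in part 1.
h_D_preds = np.array([[1, 1, 0, 1, 1],
                      [0, 0, 0, 1, 0]])
y_bar = np.array([1, 0])

# h_bar(x): majority vote of the five classifiers at each query point.
h_bar = (h_D_preds.mean(axis=1) >= 0.5).astype(int)

# How often h_D(x) disagrees with h_bar(x), summed over datasets and points.
variance_count = int((h_D_preds != h_bar[:, None]).sum())

# How often h_bar(x) disagrees with y_bar(x), counted once per point (multiply
# by the number of datasets to mirror the double sum used later in part 3).
bias_count = int((h_bar != y_bar).sum())

print(variance_count, bias_count)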
For the sampled datasets D given in figure 2, we will now visualize what happens when hD(x)
is obtained from Ak(D) (the K-NN algorithm with the desired k). We will repeat this process of
training 5 classifiers for k = 1, 30, and 5. Each dataset has 30 points: 10 from the red class and
20 from the blue class.
2. (2) Let’s start with k = 1. For your reference, we have given the Voronoi cell boundaries for
all five datasets in figure 3. Refer to them in your submission to state and justify whether
each statement below is true or false.
(a) For most datasets, there are blue and red regions both above and below the line you
drew in part 1.
(b) The 1-NN will correctly predict the class for all train points.
Takeaway: This shows whether the classifier is overfitting or underfitting.
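Statement (b) can also be sanity-checked numerically. Below is a minimal scikit-learn sketch; the synthetic data, the 0/1 class encoding, and the use of scikit-learn are assumptions standing in for the datasets in figure 2.

# Quick numerical check of statement (b): a 1-NN classifier memorizes the
# training set, so its training accuracy is 1.0 whenever no two identical
# points carry different labels. Synthetic data stands in for figure 2.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))        # 30 points, 2 features (exercise, age)
y = np.array([1] * 10 + [0] * 20)   # 10 red (1), 20 blue (0)

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.score(X, y))              # 1.0: each train point is its own nearest neighbor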
Figure 1: Plot of heart condition (red) and healthy heart (blue) against Age (y-axis)
and Minutes of Exercise Weekly (x-axis)
Figure 3: 1-NN boundaries for given datasets
3. (3) Show for k = 1 and the datasets given, that variance is higher than bias. Specifically,
show that:
\[ \sum_{D}\ \sum_{(x,y)\in D} I\big[h_D(x) \neq \bar{h}(x)\big] \;>\; \sum_{D}\ \sum_{(x,y)\in D} I\big[\bar{h}(x) \neq \bar{y}(x)\big], \quad \text{where } h_D = A_1(D) \]
where I is the indicator function (in words: you need to show that hD(x) disagrees with h̄(x)
more often than h̄(x) disagrees with ȳ(x)). You will not need to sum over all points; instead,
pick a few points per dataset to show that the right-hand side is low while the left-hand side is high.
Hint: focus on the outliers in each dataset, because hD(x), h̄(x), and ȳ(x) will all agree on the
other points.
4. (2) Now let’s try k = 30. Notice that k = |D|. First, for each dataset in figure 2, draw
the 30-NN classification boundary by shading the areas of the plot red and blue according to
the classifier’s prediction. If your submission is not in color, shade the red area and leave the
blue area unshaded.
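If you prefer to generate such a shaded plot programmatically rather than by hand, here is a minimal sketch; matplotlib, scikit-learn, and the synthetic X and y are assumptions, standing in for the coordinates read off figure 2.

# Minimal sketch: shade the plane by the prediction of a k-NN classifier
# evaluated on a dense grid. Replace X, y with a dataset from figure 2.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(30, 2))   # (exercise minutes, age), arbitrary scale
y = np.array([1] * 10 + [0] * 20)      # 10 red (1), 20 blue (0)

knn = KNeighborsClassifier(n_neighbors=30).fit(X, y)

# Evaluate the classifier on a grid and shade each cell by its predicted class.
xx, yy = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 10, 200))
zz = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, levels=[-0.5, 0.5, 1.5], colors=["tab:blue", "tab:red"], alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=np.where(y == 1, "red", "blue"))
plt.xlabel("Minutes of Exercise Weekly")
plt.ylabel("Age")
plt.show()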
5. (2) By looking at your answers in part 4, state and justify whether the following statements
are true or false.
(a) For most datasets, there are blue and red regions both above and below the line you
drew in part 1.
(b) The 30-NN will correctly predict the class for all train points.
6. (3) Show for k = 30 and the datasets given, that bias is higher than variance. Specifically,
show that:
\[ \sum_{D}\ \sum_{(x,y)\in D} I\big[h_D(x) \neq \bar{h}(x)\big] \;<\; \sum_{D}\ \sum_{(x,y)\in D} I\big[\bar{h}(x) \neq \bar{y}(x)\big], \quad \text{where } h_D = A_{30}(D) \]
where I is the indicator function (in words: you need to show that hD(x) disagrees with h̄(x)
less often than h̄(x) disagrees with ȳ(x)). Again, you will not need to manually sum over all
points; pick a few points per dataset to show that the left-hand side is low while the right-hand side is high.
7. Finally, let’s look at k = 5. For each dataset in figure 2, draw your best approximation of the
classification boundaries that a 5-NN classifier would produce. You don’t have to be exact - a
hand-drawn shading will do.
8. (3) Simply by eyeballing, what conclusions can you draw about the 5-NN classifiers? Is the
variance lower than for 1-NN? Is the bias lower than for 30-NN? No need to justify; refer to your
answer in part 7.
Problem 2: Regressi-knn [15 points]
Enough eyeballing. We will now understand the relationship between k in the K-NN algorithm and
its error terms mathematically.
We will use K-NN for a regression task, since it is easiest to do the derivations for regression
(and the mean-squared error loss).
Suppose we have data generated by a model yi = f(xi) + εi, where the εi are i.i.d. random variables
with E[εi] = 0 and Var[εi] = σ². Denote D as the training set. The expected prediction error at a
single x is
\[ \mathrm{EPE}_k(x) = E_{D,(x,y)}\big[(y - h_k(x))^2\big], \]
where y = f(x) + ε. (Here, ε is also i.i.d. and drawn from the same distribution as the εi.) For simplicity, we
assume that the values of xi and x in the training sample are fixed in advance (nonrandom), while
the values of yi and y are random variables as defined. In the specific K-NN regression model,
\[ h_k(x) = \frac{1}{k}\sum_{l=1}^{k} y_{(l)} = \frac{1}{k}\sum_{l=1}^{k}\big(f(x_{(l)}) + \varepsilon_{(l)}\big), \]
where x(1), ..., x(k) denote the k training points nearest to x, and y(l), ε(l) are the label and noise attached to x(l).
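As a concrete reference for this estimator, here is a minimal NumPy sketch; the one-dimensional inputs, the choice f(x) = sin(x), and σ = 0.3 are arbitrary assumptions for illustration.

# Minimal sketch of the k-NN regression estimator h_k defined above: the
# prediction at x is the average of the y-values of the k nearest training x's.
import numpy as np

def knn_regress(x, X_train, y_train, k):
    """Return h_k(x) = (1/k) * sum of y_(l) over the k nearest neighbors of x."""
    dists = np.abs(X_train - x)          # one-dimensional inputs for simplicity
    nearest = np.argsort(dists)[:k]      # indices of the k closest x_i (ties broken arbitrarily)
    return y_train[nearest].mean()

# Hypothetical data generated as y_i = f(x_i) + eps_i with f(x) = sin(x), sigma = 0.3.
rng = np.random.default_rng(0)
X_train = np.sort(rng.uniform(0, 2 * np.pi, 50))
y_train = np.sin(X_train) + rng.normal(scale=0.3, size=50)

print(knn_regress(np.pi / 2, X_train, y_train, k=5))   # should be close to sin(pi/2) = 1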
2. (Bonus) Let h̄k (x) be the expected classifier. What can we say about E[h̄k (x) − y(x)]?
\[ \mathrm{EPE}_k(x) = E_{D,(x,y)}\big[(h_k(x) - \bar{h}_k(x))^2\big] + E_{D,(x,y)}\big[(\bar{h}_k(x) - \bar{y}(x))^2\big] + E_{D,(x,y)}\big[(\bar{y}(x) - y(x))^2\big] \]
4. (8) Can you simplify the terms further by representing them in terms of x(1), ..., x(k), x, σ, and f?
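As an example of what such a simplification looks like (a sketch using only the assumptions stated above, together with the fact that ȳ(x) = E[y] = f(x) because E[ε] = 0), the last term depends on neither D nor f:

\[ E_{D,(x,y)}\big[(\bar{y}(x) - y(x))^2\big] = E\big[(f(x) - (f(x) + \varepsilon))^2\big] = E[\varepsilon^2] = \sigma^2 . \]

The remaining two terms can be handled the same way, by substituting h_k(x) = \frac{1}{k}\sum_{l=1}^{k}\big(f(x_{(l)}) + \varepsilon_{(l)}\big) and taking the expectation over the ε(l).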
Problem 3: Overfitting/Underfitting [6 points]
Which of the following strategies can be used when overfitting / underfitting happens?
Strategy                      | overfitting | underfitting
increase the regularization   |             |
decrease the regularization   |             |
use fewer features            |             |
use more features             |             |
use a more complex model      |             |
use a less complex model      |             |
Takeaway: By adding regularization, we essentially bound the variance of the model, which reduces
overfitting.
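To see this takeaway concretely, here is a minimal sketch; ridge-regularized polynomial regression on synthetic data is an arbitrary choice for the demonstration. It estimates the variance of the prediction at one fixed test point across many resampled training sets, for several regularization strengths.

# Illustration of the takeaway: the spread (variance) of a model's predictions
# across resampled training sets shrinks as the regularization strength grows.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)       # hypothetical target function
x_test = np.array([[0.5]])                # fixed test point

for alpha in [1e-6, 1.0, 100.0]:          # regularization strengths
    preds = []
    for _ in range(200):                  # 200 resampled training sets
        X = rng.uniform(0, 1, size=(20, 1))
        y = f(X).ravel() + rng.normal(scale=0.3, size=20)
        model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=alpha))
        preds.append(model.fit(X, y).predict(x_test)[0])
    print(f"alpha={alpha:<7}: prediction variance at x=0.5 is {np.var(preds):.4f}")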