
Machine Learning — Statistical Methods for Machine Learning

The Nearest Neighbour algorithm


Instructor: Nicolò Cesa-Bianchi (version of March 6, 2023)

We now introduce a concrete learning algorithm for classification. This algorithm differs from ERM
because it is not minimizing the training error in a given class of predictors. For now, we restrict our
attention to binary classification tasks with numerical features, namely X = Rd and Y = {−1, 1}.
Given a training set, the classifier generated by this algorithm is based on the following simple rule:
predict every point in the training set with its own label, and predict any other point with the label
of the point in the training set which is closest to it.

More formally, given a training set S ≡ (x1, y1), . . . , (xm, ym), the nearest neighbour algorithm
(NN) generates a classifier hNN : Rd → {−1, 1} defined by:

hNN (x) = label yt of the point xt ∈ S closest to x.

If there is more than one point in S with smallest distance to x, then the algorithm predicts with
the majority of the labels of these closest points. If there is an equal number of closest points with
positive and negative labels, then the algorithm predicts a default value in {−1, 1} (for instance,
the most frequent label in the training set).

Note that hNN (xt ) = yt for every training example (xt , yt ). The distance between x = (x1 , . . . , xd )
and xt = (xt,1 , . . . , xt,d ), denoted by ∥x − xt ∥, is computed using the Euclidean distance,
\|x - x_t\| = \sqrt{\sum_{i=1}^{d} (x_i - x_{t,i})^2}.
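As an illustration, the NN rule with this Euclidean distance can be sketched as follows (a hypothetical Python implementation; the function and variable names are ours, not from the notes):

```python
import numpy as np

def nn_predict(X_train, y_train, x, default_label=1):
    """Nearest-neighbour prediction for a single test point x.

    X_train: (m, d) array of training points; y_train: (m,) array of labels in {-1, +1}.
    Computing all m distances in R^d costs Theta(d m) time per query.
    """
    dists = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances to all training points
    closest = np.flatnonzero(dists == dists.min())   # indices of all points at minimal distance
    vote = y_train[closest].sum()                    # majority of the labels of the closest points
    if vote > 0:
        return 1
    if vote < 0:
        return -1
    return default_label                             # exact tie: fall back to a default label


# Tiny usage example: the prediction on a training point is its own label.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([1, -1, -1])
assert nn_predict(X, y, X[0]) == y[0]
print(nn_predict(X, y, np.array([1.6, 1.6])))        # -> -1 (closest point is [2, 2])
```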

Figure 1: Voronoi diagram for a training set in R2 .

The classifier generated by NN induces a partition of Rd into Voronoi cells, where each training
instance xt (here called a “center”) is contained in a cell, and the border between two cells is the
set of points in Rd that have equal distance from the two cell centers (see Figure 1).

As NN typically stores the entire training set, the algorithm does not scale well with the number
|S| = m of training points. Moreover, given any test point x, computing hNN (x) is costly, as it
requires computing the distance between x and every point of the training set, which in Rd takes
time Θ(dm) (shorter running times are possible when distances are approximated rather than being
computed exactly). Finally, note that NN always generates a classifier hNN such that ℓS (hNN ) = 0.
This is not surprising because, as we already said, NN stores the entire training set.
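As a quick check of this last claim, one can fit a 1-NN classifier with scikit-learn (a library choice made here for illustration, not prescribed by the notes) and verify that the zero-one loss on the training set is zero:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Small random binary-classification training set in R^2, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

nn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
train_error = np.mean(nn.predict(X) != y)   # zero-one loss on the training set
print(train_error)                          # 0.0: every training point is its own nearest neighbour
```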
Complexity of the classifier

Figure 2: Plot of the hk−NN classifier for k = 1, 3, 5 on a 1-dimensional training set (the same
sequence of training labels, + + + − − + + + − − − −, is shown for each value of k). As k increases,
the classifier becomes simpler and the number of mistaken points in the training set increases.

Starting from NN, we can obtain a family of algorithms denoted by k-NN for k = 1, 3, 5, . . ., where
k cannot be taken larger than the size of the training set. These algorithms are defined as follows:
given a training set S = (x1, y1), . . . , (xm, ym), k-NN generates a classifier hk−NN such that
hk−NN (x) is the label yt ∈ {−1, 1} appearing in the majority of the k points xt ∈ S which are
closest to x.¹ Hence, in order to compute hk−NN (x), we perform the following two operations
(a code sketch follows the two steps):

1. Find the k training points xt1 , . . . , xtk closest to x.¹ Let yt1 , . . . , ytk be their labels.

2. If the majority of the labels yt1 , . . . , ytk is +1, then hk−NN (x) = +1; if the majority is −1,
then hk−NN (x) = −1.
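A minimal sketch of these two steps (hypothetical Python code; distance ties beyond the k-th point are ignored here and handled as described in the footnote):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, default_label=1):
    """k-NN prediction for a single test point x, with labels in {-1, +1} and k odd."""
    dists = np.linalg.norm(X_train - x, axis=1)   # step 1: distances to all training points
    nearest = np.argsort(dists)[:k]               # indices of the k closest training points
    vote = y_train[nearest].sum()                 # step 2: majority of their labels
    if vote == 0:                                 # cannot happen for odd k without ties; kept as a safeguard
        return default_label
    return 1 if vote > 0 else -1


X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1, 1, -1, -1, -1])
print(knn_predict(X, y, np.array([1.2]), k=3))    # -> 1 (neighbours at 1.0, 2.0, 0.0 vote +1, -1, +1)
```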

Note that, for each k ≥ 1 and for each xt in the training set, xt is always included in the k points
that are closest to xt .

It is important to note that, unlike 1-NN, in general we have that ℓS (hk−NN ) > 0. Moreover, in
Figure 2 we see that, as k grows, the classifiers generated by k-NN become simpler. In particular,
when k is equal to the size of the training set, hk−NN becomes a constant classifier that always
predicts the most common label in the training set.
¹ Just like in the case of 1-NN, there could be training points at the same distance from x such that more than k
points are closest to x. In this case we proceed by ranking the training points based on their distance from x and
then taking the k′ closest points, where k′ is the smallest integer bigger than or equal to k such that the (k′ + 1)-th
point in the ranking has distance from x strictly larger than the k′-th point. If no such k′ exists, then we take all
the points in the training set. If k′ is even (which can only happen when k′ is strictly bigger than k, since k is odd)
and there is an equal number of closest points with positive and negative labels, then the algorithm predicts a
default value in {−1, 1}.
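The choice of k′ described in this footnote can be sketched as follows (a hypothetical helper, operating on distances already sorted in nondecreasing order):

```python
import numpy as np

def widen_k_for_ties(dists_sorted, k):
    """Return the smallest k' >= k such that the (k'+1)-th distance is strictly
    larger than the k'-th one; if no such k' exists, use all m points."""
    m = len(dists_sorted)
    for k_prime in range(k, m):
        if dists_sorted[k_prime] > dists_sorted[k_prime - 1]:  # strict gap after the k'-th point
            return k_prime
    return m

dists = np.array([0.5, 1.0, 1.0, 1.0, 2.0])
print(widen_k_for_ties(dists, k=3))   # -> 4: the 4th point ties with the 3rd at distance 1.0
```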

The figure above shows the typical trend of training error (orange curve) and test error (blue curve)
of the k-NN classifier for increasing values of the parameter k on a real dataset (Breast Cancer
Wisconsin) for binary classification with zero-one loss. Note that the minimum of the test error
is attained at a value of k for which the corresponding hk−NN classifier has training error generally
greater than zero. The learning algorithm suffers from high test error for small values of k
(overfitting) and for large values of k (underfitting).
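A curve of this kind can be reproduced, for instance, with scikit-learn and its built-in copy of the Breast Cancer Wisconsin dataset (the library, the split, and the grid of values of k below are illustrative assumptions, not part of the notes):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for k in range(1, 52, 10):                  # a few odd values of k
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    train_err = 1 - knn.score(X_tr, y_tr)   # zero-one loss on the training set
    test_err = 1 - knn.score(X_te, y_te)    # zero-one loss on the test set
    print(f"k={k:2d}  train error={train_err:.3f}  test error={test_err:.3f}")
# Typically: zero training error at k = 1, test error minimized at some intermediate k.
```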

In addition to binary classification, k-NN can be used to solve multiclass classification problems
(where Y contains more than two symbols) and also regression problems (where Y = R). In the first
case, we proceed as in the binary case and predict with the most frequent label among the labels
of the k closest training points. In the second case, the prediction is the average of the labels of
the k closest training points.
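For instance, using scikit-learn (again an illustrative choice), the same nearest-neighbour idea covers both settings:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Multiclass: most frequent label among the k closest training points (3 classes in iris).
X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(clf.predict(X[:3]))        # predicted class labels

# Regression: average of the labels of the k closest training points.
rng = np.random.default_rng(0)
X_reg = rng.uniform(0, 10, size=(100, 1))
y_reg = np.sin(X_reg).ravel()
reg = KNeighborsRegressor(n_neighbors=5).fit(X_reg, y_reg)
print(reg.predict([[3.0]]))      # approximately sin(3.0)
```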

