
Week 3.

k-Nearest Neighbours
(kNN)
Dr. Shuo Wang
Overview
§ Intuitive understanding
§ The kNN algorithm
§ Pros/cons
kNN Basics
§ Full name: k-Nearest Neighbours (kNN, or k-NN).
§ It is nonparametric.
No assumption about the functional form of the model.
§ It is instance-based.
The prediction is based on a comparison of a new point with
data points in the training set, rather than a model.
§ It is a lazy algorithm.
No explicit training step. Defers all the computation until
prediction.
§ Can be used for both classification and regression problems.
Intuitive Understanding
Instead of approximating a model function 𝑓(𝑥) globally, kNN approximates
the label of a new point based on its nearest neighbours in training data.

[Figure: training points and a new, unlabelled point. What label should it get?]

Q1: How do we choose k? e.g. let k = 3 (an odd k avoids tied votes).


Q2: How do we measure the distance between examples?
Distance metrics (or similarity metrics)
Given two points $\boldsymbol{x}^{(1)} = (x_1^{(1)}, x_2^{(1)}, \ldots, x_d^{(1)})$ and $\boldsymbol{x}^{(2)} = (x_1^{(2)}, x_2^{(2)}, \ldots, x_d^{(2)})$ in a d-dimensional space:
§ Minkowski distance (or Lp norm)

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \left( \sum_{i=1}^{d} \left| x_i^{(1)} - x_i^{(2)} \right|^p \right)^{1/p}$

§ When p = 1, it becomes the Manhattan distance

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \sum_{i=1}^{d} \left| x_i^{(1)} - x_i^{(2)} \right|$

§ When p = 2, it becomes the Euclidean distance

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \sqrt{ \sum_{i=1}^{d} \left( x_i^{(1)} - x_i^{(2)} \right)^2 }$
Distance metrics in kNN (common choice)

§ Euclidean distance for real values (also called L2 distance).

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \sqrt{ \sum_{i=1}^{d} \left( x_i^{(1)} - x_i^{(2)} \right)^2 }$

§ Hamming distance for discrete/categorical values, e.g. x ∈ {rainy, sunny}, per attribute:

$D(x^{(1)}, x^{(2)}) = \begin{cases} 0, & \text{if } x^{(1)} = x^{(2)} \\ 1, & \text{otherwise} \end{cases}$
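Purely as an illustration (not part of the slides), the metrics above could be coded as the following NumPy sketch; the function names and example points are my own choices:

```python
import numpy as np

def minkowski_distance(x1, x2, p=2):
    """Minkowski (Lp) distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return np.sum(np.abs(x1 - x2) ** p) ** (1.0 / p)

def hamming_distance(x1, x2):
    """Count the categorical attributes on which two examples disagree
    (each attribute contributes 0 or 1, as in the definition above)."""
    return sum(a != b for a, b in zip(x1, x2))

a, b = (6, 6), (8, 10)
print(minkowski_distance(a, b, p=1))                         # Manhattan: 2 + 4 = 6.0
print(minkowski_distance(a, b, p=2))                         # Euclidean: sqrt(20) ≈ 4.47
print(hamming_distance(["rainy", "hot"], ["sunny", "hot"]))  # 1
```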
kNN algorithm

Input: neighbour size k > 0, training set {(x^(n), y^(n)) : n = 1, 2, …, N},
a new unlabelled data point x^(q)

for n = 1, 2, …, N                // each example in the training set
    Calculate D(x^(q), x^(n))     // distance between x^(q) and x^(n)
Select the k training examples closest to x^(q)
Return y^(q) = the plurality vote of the labels of the k examples (classification), or
       y^(q) = the average/median of the y values of the k examples (regression).
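The pseudocode above could be written, for instance, as the following Python/NumPy sketch; the function name, the Euclidean distance choice, and the use of Counter for the vote are my own assumptions:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, regression=False):
    """Plain kNN following the pseudocode above: compute all distances,
    pick the k closest training examples, then take a plurality vote
    (classification) or the average (regression)."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                          # indices of the k closest
    labels = [y_train[i] for i in nearest]
    if regression:
        return float(np.mean(labels))
    return Counter(labels).most_common(1)[0][0]              # plurality vote
```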
Check your understanding
Consider a binary problem (lemon or orange) with 2 dimensions (height and width)
and the following training examples:
§ x^(1) = (6, 6), y^(1) = orange
§ x^(2) = (8, 10), y^(2) = lemon
§ x^(3) = (7, 6), y^(3) = orange
New example
§ x^(4) = (8, 7), y^(4) = ? Use k = 1 nearest neighbour and Euclidean distance:

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \sqrt{\sum_{i=1}^{d} \left(x_i^{(1)} - x_i^{(2)}\right)^2}$

§ $D(\boldsymbol{x}^{(4)}, \boldsymbol{x}^{(1)}) = \sqrt{(x_1^{(4)} - x_1^{(1)})^2 + (x_2^{(4)} - x_2^{(1)})^2} = \sqrt{(8-6)^2 + (7-6)^2} = \sqrt{5} \approx 2.24$
§ Can you calculate D(x^(4), x^(2)) and D(x^(4), x^(3)), and see which point x^(4) is
closest to? They are 3 and √2 ≈ 1.41, so x^(4) is closest to x^(3) (orange).
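A quick numerical check of this exercise (a NumPy sketch of my own, not prescribed by the slides):

```python
import numpy as np

X_train = np.array([(6, 6), (8, 10), (7, 6)], dtype=float)
y_train = ["orange", "lemon", "orange"]
x_new = np.array((8, 7), dtype=float)

dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
print(dists.round(2))                # [2.24 3.   1.41] -> x(3) is the nearest
print(y_train[int(dists.argmin())])  # 'orange' with k = 1
```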
Check your understanding
Consider a regression problem (a lemon's weight) with 2 dimensions (height and width)
and the following training examples:
§ x^(1) = (6, 6), y^(1) = 10
§ x^(2) = (8, 10), y^(2) = 20
§ x^(3) = (7, 6), y^(3) = 15
New example
§ x^(4) = (8, 7), y^(4) = ?
§ D(x^(4), x^(1)) = √5 ≈ 2.24, D(x^(4), x^(2)) = 3, D(x^(4), x^(3)) = √2 ≈ 1.41
§ If k = 2, what is the label of x^(4)?
§ The two nearest neighbours are x^(3) and x^(1), so y^(4) = (15 + 10)/2 = 12.5
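The same check for the regression case with k = 2 (again a NumPy sketch of my own):

```python
import numpy as np

X_train = np.array([(6, 6), (8, 10), (7, 6)], dtype=float)
y_train = np.array([10.0, 20.0, 15.0])
x_new = np.array((8, 7), dtype=float)

dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # [2.24, 3.0, 1.41]
nearest_two = np.argsort(dists)[:2]                    # x(3) and x(1)
print(y_train[nearest_two].mean())                     # (15 + 10) / 2 = 12.5
```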


How to choose k?
§ Recall: Overfitting and Underfitting
§ k changes model complexity: smaller k -> higher complexity

[Figure: model fit vs. 1/k: small 1/k (large k) underfits; large 1/k (small k) overfits.]

Image from https://neptune.ai/blog/knn-algorithm-explanation-opportunities-limitations


How to choose k?

§ Small k -> small neighborhood -> high complexity -> may overfit
§ Large k -> large neighborhood -> low complexity -> may underfit
§ Practitioners often choose k between 3 and 15, or k < √N (N is the
number of training examples).
§ Refer to “model selection/evaluation” to be learnt next week.
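As a forward pointer to model selection, one common way to pick k is cross-validation. The sketch below uses scikit-learn and its Iris dataset purely as an illustration of my own; the slides do not prescribe any particular library or dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset for illustration

# Mean 5-fold cross-validated accuracy for each candidate k.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 16)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```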
The issue in numeric attribute ranges

§ Attributes x = (x_1, x_2, …, x_d) may have different ranges.


§ The attribute with a larger range is treated as more important by
the kNN algorithm (some learning bias is embedded!)
§ It can affect the performance if you don’t want to treat attributes
differently.
§ For example, if x_1 is in [0, 2] (e.g. height) and x_2 is in [0, 100]
(e.g. age), x_2 will affect the distance more.
§ Solutions?
Normalisation and Standardization

§ Method 1 Normalisation: linearly scale the range of each attribute to
be, e.g., in [0, 1].

$x_{i,\mathrm{new}}^{(n)} = \dfrac{x_i^{(n)} - \min x_i}{\max x_i - \min x_i}$

§ Method 2 Standardization: linearly scale each dimension to have mean 0
and variance 1 (by computing the mean μ and variance σ²).

$x_{i,\mathrm{new}}^{(n)} = \dfrac{x_i^{(n)} - \mu_i}{\sigma_i}$, where $\mu_i = \dfrac{1}{N}\sum_{n=1}^{N} x_i^{(n)}$ and $\sigma_i^2 = \dfrac{1}{N}\sum_{n=1}^{N} \left(x_i^{(n)} - \mu_i\right)^2$
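Both methods could be sketched in NumPy as follows; the function names are my own, and this is an illustration rather than a prescribed implementation:

```python
import numpy as np

def min_max_normalise(X):
    """Method 1: rescale each column (attribute) to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def standardise(X):
    """Method 2: rescale each column to mean 0 and variance 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)
```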
Example

$x_{i,\mathrm{new}}^{(n)} = \dfrac{x_i^{(n)} - \min x_i}{\max x_i - \min x_i}$

§ Consider a dataset with 2 dimensions (i.e. attributes), where x_1 represents the age of a
patient and x_2 represents the body weight. The output y ∈ {normal, abnormal}.

Patient | x_1 | x_2 | y
x^(1)   | 14  | 70  | n
x^(2)   | 12  | 90  | a
x^(3)   | 15  | 66  | n

§ Normalise each attribute of x^(n) to [0, 1].
§ $x_{1,\mathrm{new}}^{(1)} = \dfrac{x_1^{(1)} - 12}{15 - 12} = \dfrac{14 - 12}{15 - 12} = 0.667$
§ $x_{2,\mathrm{new}}^{(1)} = \dfrac{70 - 66}{90 - 66} = 0.167$
§ x^(1): (14, 70) -> (0.667, 0.167); what about x^(2) and x^(3)?
§ x^(2): (12, 90) -> (0, 1)
§ x^(3): (15, 66) -> (1, 0)
§ And a new point x = (16, 64)?
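The normalised values above can be reproduced with a few lines of NumPy (a sketch of my own, not from the slides):

```python
import numpy as np

X = np.array([[14, 70],   # x(1)
              [12, 90],   # x(2)
              [15, 66]],  # x(3)
             dtype=float)
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_norm.round(3))
# [[0.667 0.167]
#  [0.    1.   ]
#  [1.    0.   ]]
```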
kNN algorithm with normalization/standardization

Input: neighbour size k > 0, training set {(x^(n), y^(n)) : n = 1, 2, …, N},
a new unlabelled data point x^(q)

Normalise/standardize x^(q) → x_new^(q)
for n = 1, 2, …, N                        // each example in the training set
    Normalise/standardize x^(n) → x_new^(n)
    Calculate D(x_new^(q), x_new^(n))     // normalised/standardised distance
Select the k training examples closest to x^(q)
Return y^(q) = the plurality vote of the labels of the k examples (classification), or
       y^(q) = the average/median of the y values of the k examples (regression).
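Putting normalisation and kNN together, a minimal sketch of my own (min-max normalisation, Euclidean distance, plurality vote):

```python
import numpy as np
from collections import Counter

def knn_predict_normalised(X_train, y_train, x_query, k=3):
    """kNN with min-max normalisation: rescale each training attribute to
    [0, 1] and map the query point with the same per-attribute min/max
    before computing Euclidean distances and taking the plurality vote."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    X_norm = (X_train - lo) / (hi - lo)
    q_norm = (x_query - lo) / (hi - lo)
    dists = np.sqrt(((X_norm - q_norm) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```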
Pros/cons
§ kNN is a nonparametric, instance-based, lazy algorithm.
§ Need to specify the distance function and pre-define the value of k.
§ Easy to implement and interpret.
§ It can approximate complex decision boundaries, so it often achieves good accuracy.
§ It has to store all training data (large memory space), and calculate distance
of each training example to the new example.
There are smarter ways to store and use training data, e.g. k-d trees or removing redundant data.
§ It can be sensitive to noise, especially when k is small.
§ Its performance degrades greatly as the data dimension increases (the curse of
dimensionality).
As the volume of the space grows, the "neighbours" become farther apart and are no longer
truly close, so the prediction becomes less accurate.
Fun project using kNN: where on earth is this photo from?
§ Problem: where was this picture taken (country or GPS)?
§ http://graphics.cs.cmu.edu/projects/im2gps/

§ Get images from Flickr with GPS info.


§ Represent each image with meaningful features
§ Apply kNN.
Q/A
Teams Channel: www.birmingham.ac.uk/
Office Hour: [faculty or individual email]@bham.ac.uk
