
Week 3.

k-Nearest Neighbours
(kNN)
Dr. Shuo Wang
Overview
§ Intuitive understanding
§ The kNN algorithm
§ Pros/cons
kNN Basics
§ Full name: k-Nearest Neighbours (kNN, or k-NN).
§ It is nonparametric.
No assumption about the functional form of the model.
§ It is instance-based.
The prediction is based on a comparison of a new point with
data points in the training set, rather than a model.
§ It is a lazy algorithm.
No explicit training step. Defers all the computation until
prediction.
§ Can be used for both classification and regression problems.
Intuitive Understanding
Instead of approximating a model function 𝑓(𝑥) globally, kNN approximates
the label of a new point based on its nearest neighbours in training data.

[Figure: training points and a new, unlabelled point. What label should it get?]

Q1: How do we choose k? e.g. let k = 3 (an odd k avoids tied votes).


Q2: How do we measure the distance between examples?
Distance metrics (or similarity metrics)
Given two points $\boldsymbol{x}^{(1)} = (x_1^{(1)}, x_2^{(1)}, \ldots, x_d^{(1)})$ and $\boldsymbol{x}^{(2)} = (x_1^{(2)}, x_2^{(2)}, \ldots, x_d^{(2)})$ in a d-dimensional space:
§ Minkowski distance (or Lp norm)

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \left( \sum_{i=1}^{d} \left| x_i^{(1)} - x_i^{(2)} \right|^p \right)^{1/p}$

§ When p = 1, it becomes the Manhattan distance

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \sum_{i=1}^{d} \left| x_i^{(1)} - x_i^{(2)} \right|$

§ When p = 2, it becomes the Euclidean distance

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \sqrt{ \sum_{i=1}^{d} \left( x_i^{(1)} - x_i^{(2)} \right)^2 }$
Distance metrics in kNN (common choice)

§ Euclidean distance for real values (also called L2 distance).

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \sqrt{ \sum_{i=1}^{d} \left( x_i^{(1)} - x_i^{(2)} \right)^2 }$

§ Hamming distance for discrete/categorical values, e.g. x ∈ {rainy, sunny}, per attribute:

$D(x^{(1)}, x^{(2)}) = \begin{cases} 0, & \text{if } x^{(1)} = x^{(2)} \\ 1, & \text{otherwise} \end{cases}$
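Purely as an illustration (not part of the slides), the metrics above could be coded as the following NumPy sketch; the function names and example points are my own choices:

```python
import numpy as np

def minkowski_distance(x1, x2, p=2):
    """Minkowski (Lp) distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return np.sum(np.abs(x1 - x2) ** p) ** (1.0 / p)

def hamming_distance(x1, x2):
    """Count the categorical attributes on which two examples disagree
    (each attribute contributes 0 or 1, as in the definition above)."""
    return sum(a != b for a, b in zip(x1, x2))

a, b = (6, 6), (8, 10)
print(minkowski_distance(a, b, p=1))                         # Manhattan: 2 + 4 = 6.0
print(minkowski_distance(a, b, p=2))                         # Euclidean: sqrt(20) ≈ 4.47
print(hamming_distance(["rainy", "hot"], ["sunny", "hot"]))  # 1
```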
kNN algorithm

Input: neighbour size k > 0, training set {(x^(n), y^(n)) : n = 1, 2, …, N},
a new unlabelled data point x^(q)

for n = 1, 2, …, N                // each example in the training set
    Calculate D(x^(q), x^(n))     // distance between x^(q) and x^(n)
Select the k training examples closest to x^(q)
Return y^(q) = the plurality vote of the labels of the k examples (classification), or
       y^(q) = the average/median of the y values of the k examples (regression).
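The pseudocode above could be written, for instance, as the following Python/NumPy sketch; the function name, the Euclidean distance choice, and the use of Counter for the vote are my own assumptions:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, regression=False):
    """Plain kNN following the pseudocode above: compute all distances,
    pick the k closest training examples, then take a plurality vote
    (classification) or the average (regression)."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                          # indices of the k closest
    labels = [y_train[i] for i in nearest]
    if regression:
        return float(np.mean(labels))
    return Counter(labels).most_common(1)[0][0]              # plurality vote
```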
Check your understanding
Consider a binary problem (lemon or orange) with 2 dimensions (height and width)
and the following training examples:
§ x^(1) = (6, 6), y^(1) = orange
§ x^(2) = (8, 10), y^(2) = lemon
§ x^(3) = (7, 6), y^(3) = orange
New example
§ x^(4) = (8, 7), y^(4) = ? Use k = 1 nearest neighbour and Euclidean distance:

$D(\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}) = \sqrt{\sum_{i=1}^{d} \left(x_i^{(1)} - x_i^{(2)}\right)^2}$

§ $D(\boldsymbol{x}^{(4)}, \boldsymbol{x}^{(1)}) = \sqrt{(x_1^{(4)} - x_1^{(1)})^2 + (x_2^{(4)} - x_2^{(1)})^2} = \sqrt{(8-6)^2 + (7-6)^2} = \sqrt{5} \approx 2.24$
§ Can you calculate D(x^(4), x^(2)) and D(x^(4), x^(3)), and see which point x^(4) is
closest to? They are 3 and √2 ≈ 1.41, so x^(4) is closest to x^(3) (orange).
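A quick numerical check of this exercise (a NumPy sketch of my own, not prescribed by the slides):

```python
import numpy as np

X_train = np.array([(6, 6), (8, 10), (7, 6)], dtype=float)
y_train = ["orange", "lemon", "orange"]
x_new = np.array((8, 7), dtype=float)

dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
print(dists.round(2))                # [2.24 3.   1.41] -> x(3) is the nearest
print(y_train[int(dists.argmin())])  # 'orange' with k = 1
```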
Check your understanding
Consider a regression problem (a lemon's weight) with 2 dimensions (height and width)
and the following training examples:
§ x^(1) = (6, 6), y^(1) = 10
§ x^(2) = (8, 10), y^(2) = 20
§ x^(3) = (7, 6), y^(3) = 15
New example
§ x^(4) = (8, 7), y^(4) = ?
§ D(x^(4), x^(1)) = √5 ≈ 2.24, D(x^(4), x^(2)) = 3, D(x^(4), x^(3)) = √2 ≈ 1.41
§ If k = 2, what is the label of x^(4)?
§ The two nearest neighbours are x^(3) and x^(1), so y^(4) = (15 + 10)/2 = 12.5
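The same check for the regression case with k = 2 (again a NumPy sketch of my own):

```python
import numpy as np

X_train = np.array([(6, 6), (8, 10), (7, 6)], dtype=float)
y_train = np.array([10.0, 20.0, 15.0])
x_new = np.array((8, 7), dtype=float)

dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # [2.24, 3.0, 1.41]
nearest_two = np.argsort(dists)[:2]                    # x(3) and x(1)
print(y_train[nearest_two].mean())                     # (15 + 10) / 2 = 12.5
```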


How to choose k?
§ Recall: Overfitting and Underfitting
§ k changes model complexity: smaller k -> higher complexity

[Figure: model fit vs. 1/k: small 1/k (large k) underfits; large 1/k (small k) overfits.]

Image from https://neptune.ai/blog/knn-algorithm-explanation-opportunities-limitations


How to choose k?

§ Small k -> small neighborhood -> high complexity -> may overfit
§ Large k -> large neighborhood -> low complexity -> may underfit
§ Practitioners often choose k between 3 and 15, or k < √N (N is the
number of training examples).
§ Refer to “model selection/evaluation” to be learnt next week.
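As a forward pointer to model selection, one common way to pick k is cross-validation. The sketch below uses scikit-learn and its Iris dataset purely as an illustration of my own; the slides do not prescribe any particular library or dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset for illustration

# Mean 5-fold cross-validated accuracy for each candidate k.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 16)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```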
The issue in numeric attribute ranges

§ Attributes x = (x_1, x_2, …, x_d) may have different ranges.


§ The attribute with a larger range is treated as more important by
the kNN algorithm (some learning bias is embedded!)
§ It can affect the performance if you don’t want to treat attributes
differently.
§ For example, if x_1 is in [0, 2] (e.g. height) and x_2 is in [0, 100]
(e.g. age), x_2 will affect the distance more.
§ Solutions?
Normalisation and Standardization

§ Method 1 Normalisation: linearly scale the range of each attribute to
be, e.g., in [0, 1].

$x_{i,\mathrm{new}}^{(n)} = \dfrac{x_i^{(n)} - \min x_i}{\max x_i - \min x_i}$

§ Method 2 Standardization: linearly scale each dimension to have mean 0
and variance 1 (by computing the mean μ and variance σ²).

$x_{i,\mathrm{new}}^{(n)} = \dfrac{x_i^{(n)} - \mu_i}{\sigma_i}$, where $\mu_i = \dfrac{1}{N}\sum_{n=1}^{N} x_i^{(n)}$ and $\sigma_i^2 = \dfrac{1}{N}\sum_{n=1}^{N} \left(x_i^{(n)} - \mu_i\right)^2$
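Both methods could be sketched in NumPy as follows; the function names are my own, and this is an illustration rather than a prescribed implementation:

```python
import numpy as np

def min_max_normalise(X):
    """Method 1: rescale each column (attribute) to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def standardise(X):
    """Method 2: rescale each column to mean 0 and variance 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)
```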
Example

$x_{i,\mathrm{new}}^{(n)} = \dfrac{x_i^{(n)} - \min x_i}{\max x_i - \min x_i}$

§ Consider a dataset with 2 dimensions (i.e. attributes), where x_1 represents the age of a
patient and x_2 represents the body weight. The output y ∈ {normal, abnormal}.

Patient | x_1 | x_2 | y
x^(1)   | 14  | 70  | n
x^(2)   | 12  | 90  | a
x^(3)   | 15  | 66  | n

§ Normalise each attribute of x^(n) to [0, 1].
§ $x_{1,\mathrm{new}}^{(1)} = \dfrac{x_1^{(1)} - 12}{15 - 12} = \dfrac{14 - 12}{15 - 12} = 0.667$
§ $x_{2,\mathrm{new}}^{(1)} = \dfrac{70 - 66}{90 - 66} = 0.167$
§ x^(1): (14, 70) -> (0.667, 0.167); what about x^(2) and x^(3)?
§ x^(2): (12, 90) -> (0, 1)
§ x^(3): (15, 66) -> (1, 0)
§ And a new point x = (16, 64)?
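The normalised values above can be reproduced with a few lines of NumPy (a sketch of my own, not from the slides):

```python
import numpy as np

X = np.array([[14, 70],   # x(1)
              [12, 90],   # x(2)
              [15, 66]],  # x(3)
             dtype=float)
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_norm.round(3))
# [[0.667 0.167]
#  [0.    1.   ]
#  [1.    0.   ]]
```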
kNN algorithm with normalization/standardization

Input: neighbour size k > 0, training set {(x^(n), y^(n)) : n = 1, 2, …, N},
a new unlabelled data point x^(q)

Normalise/standardize x^(q) → x_new^(q)
for n = 1, 2, …, N                        // each example in the training set
    Normalise/standardize x^(n) → x_new^(n)
    Calculate D(x_new^(q), x_new^(n))     // normalised/standardised distance
Select the k training examples closest to x^(q)
Return y^(q) = the plurality vote of the labels of the k examples (classification), or
       y^(q) = the average/median of the y values of the k examples (regression).
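Putting normalisation and kNN together, a minimal sketch of my own (min-max normalisation, Euclidean distance, plurality vote):

```python
import numpy as np
from collections import Counter

def knn_predict_normalised(X_train, y_train, x_query, k=3):
    """kNN with min-max normalisation: rescale each training attribute to
    [0, 1] and map the query point with the same per-attribute min/max
    before computing Euclidean distances and taking the plurality vote."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    X_norm = (X_train - lo) / (hi - lo)
    q_norm = (x_query - lo) / (hi - lo)
    dists = np.sqrt(((X_norm - q_norm) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```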
Pros/cons
§ kNN is a nonparametric, instance-based, lazy algorithm.
§ Need to specify the distance function and pre-define the value of k.
§ Easy to implement and interpret.
§ It can approximate complex decision boundaries, so it often achieves good accuracy.
§ It has to store all training data (large memory space), and calculate distance
of each training example to the new example.
There are smarter ways to store and use training data, e.g. k-d trees or removing redundant data.
§ It can be sensitive to noise, especially when k is small.
§ Its performance degrades greatly as the data dimension increases (the curse of
dimensionality).
As the volume of the space grows, the "neighbours" become farther apart and are no longer
truly close, so the prediction becomes less accurate.
Fun project using kNN: where on earth is this photo from?
§ Problem: where was this picture taken (country or GPS)?
§ http://graphics.cs.cmu.edu/projects/im2gps/

§ Get images from Flickr with GPS info.


§ Represent each image with meaningful features
§ Apply kNN.
Q/A
Teams Channel: www.birmingham.ac.uk/
Office Hour: [faculty or individual email]@bham.ac.uk
