K- Nearest Neighbor
Carla P. Gomes
CS4700
1-Nearest Neighbor
One of the simplest of all machine learning classifiers
Simple idea: label a new point the same as the closest known point
(In the figure, the closest known point is red, so label the new point red.)
Distance Metrics
Different metrics can change the decision surface
Dist(a,b) = (a1 – b1)^2 + (a2 – b2)^2        vs.        Dist(a,b) = (a1 – b1)^2 + (3a2 – 3b2)^2
Standard Euclidean distance metric:
– Two-dimensional: Dist(a,b) = sqrt((a1 – b1)^2 + (a2 – b2)^2)
– Multivariate: Dist(a,b) = sqrt(∑ (ai – bi)^2)
Adapted from “Instance-Based Learning” lecture slides by Andrew Moore, CMU.
1-NN’s Aspects as an Instance-Based Learner:
A distance metric
– Euclidean
– When different units are used for each dimension, normalize each dimension by its standard deviation
– For discrete data, can use Hamming distance:
D(x1,x2) = number of features on which x1 and x2 differ
– Others (e.g., normal, cosine)
How many nearby neighbors to look at?
– One
How to fit with the local points?
– Just predict the same output as the nearest neighbor (see the sketch below).
Adapted from “Instance-Based Learning” lecture slides by Andrew Moore, CMU.
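As a concrete illustration of these aspects, here is a minimal 1-NN sketch in Python (not from the original slides; the function names and the tiny data set are illustrative assumptions). It uses Euclidean distance for numeric features, Hamming distance for discrete ones, and normalizes each dimension by its standard deviation before measuring distance.

```python
import numpy as np

def euclidean(a, b):
    # standard Euclidean distance: sqrt of the sum of squared per-dimension differences
    return np.sqrt(np.sum((a - b) ** 2))

def hamming(a, b):
    # for discrete data: number of features on which the two examples differ
    return np.sum(a != b)

def nn1_predict(X_train, y_train, x_new, dist=euclidean):
    # 1-NN: find the single closest stored example and copy its label
    distances = [dist(x, x_new) for x in X_train]
    return y_train[int(np.argmin(distances))]

# Illustrative data: dividing each dimension by its standard deviation keeps
# the large-valued second attribute from swamping the first one.
X_train = np.array([[1.0, 200.0], [2.0, 180.0], [8.0, 150.0]])
y_train = np.array(["red", "red", "blue"])
stds = X_train.std(axis=0)
print(nn1_predict(X_train / stds, y_train, np.array([1.5, 190.0]) / stds))  # -> red
```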
k – Nearest Neighbor
Generalizes 1-NN to smooth away noise in the labels
A new point is now assigned the most frequent label of its k nearest neighbors.
In the figure: label it red when k = 3, but blue when k = 7.
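A minimal sketch of this majority-vote rule (illustrative code, not from the slides; names are assumptions):

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # rank stored examples by Euclidean distance to the query point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # assign the most frequent label among the k nearest neighbors
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```

Running the same query with k = 3 and then k = 7 can return different labels, exactly as in the figure above.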
KNN Example
   Food (3)   Chat (2)   Fast (2)   Price (3)   Bar (2)   BigTip
1  great      yes        yes        normal      no        yes
2  great      no         yes        normal      no        yes
3  mediocre   yes        no         high        no        no
4  great      yes        yes        normal      yes       yes
Similarity metric: Number of matching attributes (k=2)
New examples:
– Example 1 (great, no, no, normal, no) → predict Yes
Most similar: number 2 (1 mismatch, 4 matches) – yes
Second most similar: number 1 (2 mismatches, 3 matches) – yes
– Example 2 (mediocre, yes, no, normal, no) → Yes/No (tie: with k = 2, the two nearest neighbors disagree)
Most similar: number 3 (1 mismatch, 4 matches) – no
Second most similar: number 1 (2 mismatches, 3 matches) – yes
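The worked example above can be reproduced with a short script (illustrative, not part of the slides); the matching-attributes similarity simply counts how many of the five attributes agree.

```python
from collections import Counter

# Training examples from the table: (Food, Chat, Fast, Price, Bar) -> BigTip
train = [
    (("great",    "yes", "yes", "normal", "no"),  "yes"),   # 1
    (("great",    "no",  "yes", "normal", "no"),  "yes"),   # 2
    (("mediocre", "yes", "no",  "high",   "no"),  "no"),    # 3
    (("great",    "yes", "yes", "normal", "yes"), "yes"),   # 4
]

def matches(a, b):
    # similarity = number of attributes on which the two examples agree
    return sum(x == y for x, y in zip(a, b))

def knn_vote(query, k=2):
    ranked = sorted(train, key=lambda ex: matches(query, ex[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common()

print(knn_vote(("great", "no", "no", "normal", "no")))      # both nearest neighbors say "yes"
print(knn_vote(("mediocre", "yes", "no", "normal", "no")))  # split vote: one "no", one "yes"
```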
Selecting the Number of Neighbors
Increase k:
– Makes KNN less sensitive to noise
Decrease k:
– Allows capturing finer structure of space
Pick a k that is neither too large nor too small (the right value depends on the data).
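The slides leave the choice of k to the data; one common way to make that choice concrete (an assumption, not something the slides prescribe) is to measure accuracy on a held-out validation set for several candidate values:

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x_new, k):
    # majority vote among the k nearest training points (Euclidean distance)
    dists = np.sqrt(((X - x_new) ** 2).sum(axis=1))
    return Counter(y[i] for i in np.argsort(dists)[:k]).most_common(1)[0][0]

def accuracy_for_k(X_train, y_train, X_val, y_val, k):
    # fraction of held-out points whose k-NN prediction matches the true label
    preds = [knn_predict(X_train, y_train, x, k) for x in X_val]
    return float(np.mean([p == t for p, t in zip(preds, y_val)]))

# Hypothetical usage: evaluate a few odd values of k and keep the best one.
# best_k = max([1, 3, 5, 7, 9], key=lambda k: accuracy_for_k(X_tr, y_tr, X_va, y_va, k))
```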
Curse of Dimensionality
Prediction accuracy can quickly degrade when the number of attributes grows.
– Irrelevant attributes easily “swamp” information from relevant attributes
– When there are many irrelevant attributes, the similarity/distance measure becomes less reliable
Remedy
– Try to remove irrelevant attributes in a pre-processing step
– Weight attributes differently (see the sketch after this list)
– Increase k (but not too much)
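A small sketch of the “weight attributes differently” remedy (illustrative assumption; the slides do not prescribe specific weights): standardize each attribute, then down-weight the ones believed to be irrelevant so they cannot swamp the distance.

```python
import numpy as np

def weighted_distance(a, b, weights):
    # per-attribute weights scale each dimension's contribution to the distance;
    # a weight of 0 removes the attribute entirely
    return np.sqrt(np.sum(weights * (a - b) ** 2))

# Illustrative data: three attributes, the third suspected to be irrelevant noise.
X = np.array([[1.0, 10.0, 500.0], [2.0, 12.0, 900.0], [3.0, 11.0, 100.0]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each attribute
weights = np.array([1.0, 1.0, 0.1])            # down-weight the noisy attribute
print(weighted_distance(X_std[0], X_std[1], weights))
```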
Advantages and Disadvantages of KNN
Disadvantages:
– Needs a distance/similarity measure and attributes that “match” the target function.
– Must make a pass through the entire training set for each classification, which can be prohibitive for large data sets.
– Prediction accuracy can quickly degrade when the number of attributes grows.
Advantages:
– Simple to implement;
– Requires little tuning;
– Often performs quite well!
(Try it first on a new learning problem.)