Part A 3. KNN Classification
k-Nearest Neighbors
Given a query item:
1. Find the k closest matches in a labeled dataset.
2. Return the most frequent label among them.
Example 1: with k = 3, all 3 nearest neighbors vote for "cat".
Example 2: 2 votes for cat and 1 each for buffalo, deer, and lion; cat still wins.
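The voting step in isolation, using the labels from the example above (a minimal sketch):

from collections import Counter

# Labels of the retrieved neighbors; the most frequent one wins.
neighbor_labels = ["cat", "cat", "buffalo", "deer", "lion"]
print(Counter(neighbor_labels).most_common(1)[0][0])   # -> cat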
K-Nearest Neighbor Learning
Basics
Learning methods based on decision trees, rule sets, posterior probabilities, hyperplanes, etc. are called eager learning methods because they learn a model from the training data.
k-nearest neighbor (kNN) is a lazy learning method in the sense that no model is
learned from the training data.
Algorithm kNN(D, d, k)
1. Compute the distance between d and every example in D.
2. Choose the k examples in D that are nearest to d; denote this set by P.
3. Assign d the most frequent class in P (the majority class).
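A minimal Python sketch of this algorithm; the dataset format (a list of (vector, label) pairs) and the Euclidean distance function are assumptions, since the slide leaves them unspecified:

from collections import Counter
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(D, d, k):
    # D: list of (feature_vector, label) pairs; d: query vector; k: number of neighbors.
    # 1. Compute the distance between d and every example in D.
    scored = [(euclidean(x, d), label) for x, label in D]
    # 2. Choose the k examples nearest to d (the set P).
    P = sorted(scored, key=lambda pair: pair[0])[:k]
    # 3. Assign d the most frequent class in P.
    return Counter(label for _, label in P).most_common(1)[0][0]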
K-Nearest Neighbor Learning
The key component of a kNN algorithm is the distance/similarity function, which is chosen based on the application and the nature of the data.
The k value that gives the best accuracy on the validation set is usually selected.
For relational data, the Euclidean distance is commonly used. For text documents,
cosine similarity is a popular choice.
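A sketch of picking k by validation accuracy, assuming scikit-learn and an already-split dataset (the variable names and candidate k values are illustrative):

from sklearn.neighbors import KNeighborsClassifier

def select_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9)):
    # Fit one kNN classifier per candidate k and keep the best validation accuracy.
    best_k, best_acc = None, -1.0
    for k in candidates:
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        acc = model.score(X_val, y_val)   # mean accuracy on the validation set
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k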
Distance
Minkowski Distance
If X = R^d, the Minkowski distance of order p > 0 between x = (x_1, ..., x_d) and y = (y_1, ..., y_d) is defined as:
d_p(x, y) = ( |x_1 - y_1|^p + ... + |x_d - y_d|^p )^(1/p)
p = 1 gives the Manhattan distance; p = 2 gives the Euclidean distance.
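A direct translation of the definition into Python (a sketch; the example points are illustrative):

def minkowski(x, y, p):
    # (sum_i |x_i - y_i|^p)^(1/p); p = 1 is Manhattan, p = 2 is Euclidean.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

print(minkowski((0, 0), (3, 4), 1))   # Manhattan: 3 + 4 = 7
print(minkowski((0, 0), (3, 4), 2))   # Euclidean: sqrt(9 + 16) = 5.0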
K-NN metrics
Euclidean Distance: simplest, fast to compute (the square root of the sum of squared differences).
Example values from the slide: (X1, X2) = (10, 100) and (X1, X2) = (2, 4).
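Assuming the two rows above describe a single pair of points, the worked computation is:
d((10, 100), (2, 4)) = sqrt((10 - 2)^2 + (100 - 4)^2) = sqrt(64 + 9216) = sqrt(9280) ≈ 96.33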
Nearest Neighbor Search
(Figure: a query point q and a data point p.)
K-NN
(Figure: neighborhood of a query point with K = 5.)
K-NN
Training data for the worked example:
Point  X1  X2  Class
P1      7   7  BAD
P2      7   4  BAD
P3      3   4  GOOD
P4      1   4  GOOD
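A short sketch of classifying a query point against this table; the query point (3, 7) and k = 3 are illustrative assumptions, not values taken from the slide:

import math

# Training points from the table above: (name, (X1, X2), class).
D = [("P1", (7, 7), "BAD"), ("P2", (7, 4), "BAD"),
     ("P3", (3, 4), "GOOD"), ("P4", (1, 4), "GOOD")]
query = (3, 7)   # hypothetical query point
k = 3

dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
# Rank training points by distance to the query and keep the k nearest.
neighbors = sorted(D, key=lambda row: dist(row[1], query))[:k]   # P3, P4, P1
labels = [c for _, _, c in neighbors]
print(max(set(labels), key=labels.count))   # 2 GOOD vs 1 BAD -> "GOOD"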
KNN Example
(Figure: a table with rows and columns P1-P4.)
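If that table holds pairwise Euclidean distances between P1-P4 (an assumption based on the row and column labels), it can be reproduced with a short sketch:

import math

points = {"P1": (7, 7), "P2": (7, 4), "P3": (3, 4), "P4": (1, 4)}

def euclid(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

# Pairwise Euclidean distances, one row per point.
for name, rv in points.items():
    print(name, [round(euclid(rv, cv), 2) for cv in points.values()])
# P1 [0.0, 3.0, 5.0, 6.71]
# P2 [3.0, 0.0, 4.0, 6.0]
# P3 [5.0, 4.0, 0.0, 2.0]
# P4 [6.71, 6.0, 2.0, 0.0]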