KNN Classifier
MTL 782
IIT DELHI
Instance-Based Classifiers
• Store the training records
• Use the training records to predict the class label of unseen cases
[Figure: a set of stored cases (attributes Atr1 … AtrN with class labels A, B, C) against which an unseen case with the same attributes is matched]
Instance-Based Classifiers
• Examples:
– Rote-learner
• Memorizes the entire training data and classifies a record only if its attributes
exactly match one of the training examples
– Nearest neighbor
• Uses k “closest” points (nearest neighbors) for performing classification
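A minimal rote-learner sketch in Python (illustrative only; the function name and data are hypothetical). It can classify a record only when its attributes exactly match a stored training example:

    def rote_learner(train_records, train_labels, query):
        # Memorize the entire training data as an exact-match lookup table.
        lookup = {tuple(r): y for r, y in zip(train_records, train_labels)}
        # Returns None when no exact match exists: a rote learner cannot generalize.
        return lookup.get(tuple(query))

    records = [(1.70, 65), (1.80, 90)]
    labels = ["B", "A"]
    print(rote_learner(records, labels, (1.70, 65)))  # B (exact match)
    print(rote_learner(records, labels, (1.75, 70)))  # None (no exact match)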
Nearest Neighbor Classifiers
• Basic idea:
– If it walks like a duck, quacks like a duck, then it’s probably a duck
[Figure: compute the distance from the test record to the stored training records and identify its nearest neighbors]
• Compute the distance between two points:
– Manhattan distance
  $d(p, q) = \sum_i |p_i - q_i|$
– q-norm distance
  $d(p, q) = \left( \sum_i |p_i - q_i|^q \right)^{1/q}$
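A short Python sketch of the two distance measures above (pure Python; function names are hypothetical, and r = 2 recovers the Euclidean distance):

    def manhattan(p, q):
        # d(p, q) = sum_i |p_i - q_i|
        return sum(abs(a - b) for a, b in zip(p, q))

    def q_norm(p, q, r=2):
        # d(p, q) = (sum_i |p_i - q_i|^r)^(1/r); r = 2 gives the Euclidean distance
        return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

    print(manhattan((0, 0), (3, 4)))  # 7
    print(q_norm((0, 0), (3, 4)))     # 5.0 (Euclidean)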
• Determine the class from the nearest-neighbor list
– take the majority vote of class labels among the k nearest neighbors:
  $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
  where $D_z$ is the set of the k nearest training examples and $I(\cdot)$ is the indicator function
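A self-contained k-NN sketch implementing the majority vote above (pure Python; the data and names are illustrative, and Euclidean distance is assumed):

    from collections import Counter

    def knn_predict(train_X, train_y, query, k=3):
        # Distance from the query to every training record.
        dists = [(sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, y)
                 for x, y in zip(train_X, train_y)]
        # D_z: the k closest training examples.
        k_nearest = sorted(dists)[:k]
        # y' = argmax_v sum_{(x_i, y_i) in D_z} I(v = y_i): most common label wins.
        votes = Counter(y for _, y in k_nearest)
        return votes.most_common(1)[0][0]

    X = [(25, 40000), (35, 60000), (45, 80000), (20, 20000)]
    y = ["Default", "Non-Default", "Non-Default", "Default"]
    print(knn_predict(X, y, (30, 50000), k=3))  # Default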
KNN Classification
[Figure: scatter plot of loan amount ($0–$250,000) vs. age (0–70); points labeled Default and Non-Default]
Nearest Neighbor Classification…
• Choosing the value of k:
– If k is too small, sensitive to noise points
– If k is too large, neighborhood may include points from other classes
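A sketch of choosing k empirically by cross-validation (assumes scikit-learn is installed; the dataset and the candidate k values are placeholders):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    for k in (1, 3, 5, 11, 21, 51):
        knn = KNeighborsClassifier(n_neighbors=k)
        # Mean accuracy over 5 folds; accuracy typically degrades at both
        # extremes: small k is noise-sensitive, large k mixes in other classes.
        print(k, cross_val_score(knn, X, y, cv=5).mean())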
Nearest Neighbor Classification…
• Scaling issues
– Attributes may have to be scaled to prevent distance measures from
being dominated by one of the attributes
– Example:
• height of a person may vary from 1.5 m to 1.8 m
• weight of a person may vary from 60 kg to 100 kg
• income of a person may vary from Rs 10K to Rs 2 lakh
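A min-max scaling sketch (NumPy; the data matrix is made up for illustration), rescaling each attribute to [0, 1] so income does not dominate the distance:

    import numpy as np

    # Columns: height (m), weight (kg), income (Rs); illustrative values.
    X = np.array([[1.5,  60.0,  10_000.0],
                  [1.8, 100.0, 200_000.0],
                  [1.7,  80.0,  50_000.0]])

    # Rescale each column to [0, 1]: (x - min) / (max - min).
    X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    print(X_scaled)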
Nearest Neighbor Classification…
• Problem with Euclidean measure:
– High dimensional data
• curse of dimensionality: all vectors are almost equidistant to the query vector
– Can produce undesirable results
111111111110 vs 011111111111 → d = 1.4142
100000000000 vs 000000000001 → d = 1.4142
Both pairs differ in exactly two bit positions, so Euclidean distance rates them as equally similar, even though the first pair shares many 1s and the second pair shares none.
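A quick NumPy check of the example above:

    import numpy as np

    a1 = np.array([int(c) for c in "111111111110"])
    b1 = np.array([int(c) for c in "011111111111"])
    a2 = np.array([int(c) for c in "100000000000"])
    b2 = np.array([int(c) for c in "000000000001"])

    # Each pair differs in exactly two bit positions, so both distances
    # come out to sqrt(2) ≈ 1.4142.
    print(np.linalg.norm(a1 - b1))  # 1.4142...
    print(np.linalg.norm(a2 - b2))  # 1.4142...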