Nearest-Neighbor Classifier: MTL 782 IIT Delhi

machine learning KNN

Uploaded by

webdev397
Nearest-Neighbor Classifier (KNN)
MTL 782
IIT DELHI
Instance-Based Classifiers
• Store the training records
• Use training records to predict the class label of unseen cases
[Figure: a set of stored cases with attributes Atr1 … AtrN and a class label (A, B, or C), alongside an unseen case with attributes Atr1 … AtrN]
Instance Based Classifiers
• Examples:
– Rote-learner
• Memorizes entire training data and performs classification only if attributes
of record match one of the training examples exactly

– Nearest neighbor
• Uses k “closest” points (nearest neighbors) for performing classification
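The difference between the two examples can be sketched in a few lines of Python. This is only an illustration: the toy records in `TRAIN` and the function names are hypothetical, and Euclidean distance is assumed for the nearest-neighbor case.

```python
import math

# Hypothetical toy training set: (attributes, class label) pairs.
TRAIN = [((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((5.0, 5.0), "B")]

def rote_classify(x, train=TRAIN):
    """Rote learner: return a label only on an exact attribute match, else None."""
    for attrs, label in train:
        if attrs == x:
            return label
    return None

def nearest_neighbor_classify(x, train=TRAIN):
    """Nearest neighbor with k = 1: return the label of the closest stored record."""
    attrs, label = min(train, key=lambda rec: math.dist(rec[0], x))
    return label
```

With these records, `rote_classify((1.1, 1.0))` returns `None` because no stored record matches exactly, while `nearest_neighbor_classify((1.1, 1.0))` still returns a label from the closest record.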
Nearest Neighbor Classifiers
• Basic idea:
– If it walks like a duck, quacks like a duck, then it’s probably a duck

[Figure: compute the distance from the test record to the training records, then choose the k “nearest” records]
Nearest-Neighbor Classifiers
• Requires three things
– The set of stored records
– Distance metric to compute distance between records
– The value of k, the number of nearest neighbors to retrieve
• To classify an unknown record:
– Compute distance to other training records
– Identify k nearest neighbors
– Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking majority vote)
[Figure: unknown record]
Definition of Nearest Neighbor

[Figure: three panels, each centered on a record X: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
1-Nearest-Neighbor: Voronoi Diagram
Nearest Neighbor Classification
• Compute distance between two points:
– Euclidean distance
$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
– Manhattan distance
$d(p, q) = \sum_i |p_i - q_i|$
– q-norm distance
$d(p, q) = \left( \sum_i |p_i - q_i|^q \right)^{1/q}$
• Determine the class from nearest neighbor list
– Take the majority vote of class labels among the k-nearest neighbors
$y' = \operatorname{argmax}_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
where $D_z$ is the set of k closest training examples to z.
– Weigh the vote according to distance
$y' = \operatorname{argmax}_v \sum_{(x_i, y_i) \in D_z} w_i \times I(v = y_i)$
• weight factor, $w = 1/d^2$
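As a rough sketch of the distance measures and the distance-weighted vote on this slide, the Python below implements the q-norm distance (which covers Manhattan and Euclidean as special cases) and a vote with weight 1/d². The function names, the `order` parameter, and the small `eps` guard against a zero distance are assumptions made for the example, not part of the slides.

```python
from collections import defaultdict

def minkowski(p, q, order=2):
    """q-norm distance between points p and q.
    order=1 gives Manhattan distance, order=2 gives Euclidean distance."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)

def weighted_vote(neighbors, eps=1e-12):
    """neighbors: list of (distance, label) pairs for the k nearest records.
    Each neighbor votes for its label with weight w = 1 / d^2."""
    scores = defaultdict(float)
    for d, label in neighbors:
        # eps is an assumption added here to avoid division by zero when d == 0.
        scores[label] += 1.0 / (d * d + eps)
    return max(scores, key=scores.get)
```

For instance, with neighbors `[(0.5, "A"), (2.0, "B"), (2.0, "B")]` a plain majority vote would pick B, but the weighted vote picks A because the single close neighbor outweighs the two distant ones.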
The KNN classification algorithm
Let k be the number of nearest neighbors and D be the set of
training examples.
1. for each test example z = (x’, y’) do
2. Compute d(x’, x), the distance between z and every example (x, y) ∈ D
3. Select $D_z \subseteq D$, the set of k closest training examples to z.
4. $y' = \operatorname{argmax}_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
5. end for
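The steps above can be sketched in Python as follows. This is a minimal brute-force version under stated assumptions: Euclidean distance via `math.dist`, a hypothetical function name `knn_classify`, and made-up toy records. Ties in the vote are resolved by whichever label `Counter` encountered first, which the slides do not specify.

```python
import math
from collections import Counter

def knn_classify(x_new, train, k):
    """train: list of (x, y) pairs. Returns the majority label among the k nearest."""
    # Step 2: order the training examples by distance to the test example.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], x_new))
    # Step 3: D_z, the k closest training examples.
    d_z = by_distance[:k]
    # Step 4: majority vote over the class labels in D_z.
    votes = Counter(label for _, label in d_z)
    return votes.most_common(1)[0][0]

# Hypothetical toy data for illustration only.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 6.5), "B")]
```

Calling `knn_classify((2.0, 2.0), train, 3)` then votes among the three closest records, all of which belong to class A here.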
KNN Classification
[Figure: scatter plot of Loan ($0 to $2,50,000) against Age (0 to 70), with points marked Non-Default and Default]
Nearest Neighbor Classification…
• Choosing the value of k:
– If k is too small, sensitive to noise points
– If k is too large, neighborhood may include points from other classes
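Both failure modes can be seen on a small made-up example. The sketch below uses a hypothetical one-dimensional training set in which a single class B record at 2.4 acts as a noise point inside class A; the data and the helper name `knn_label` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical 1-D training set; the "B" at 2.4 is a noise point inside class A.
train = [(1.0, "A"), (2.0, "A"), (3.0, "A"), (2.4, "B"),
         (8.0, "B"), (9.0, "B"), (10.0, "B")]

def knn_label(x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Classify the same query point with a small, a moderate, and a large k.
predictions = {k: knn_label(2.5, k) for k in (1, 3, 7)}
```

Here k = 1 follows the noise point and predicts B, k = 3 recovers A, and k = 7 takes in the whole training set, so the distant class B records outvote the local class A ones.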
Nearest Neighbor Classification…
• Scaling issues
– Attributes may have to be scaled to prevent distance measures from
being dominated by one of the attributes
– Example:
• height of a person may vary from 1.5 m to 1.8 m
• weight of a person may vary from 60 kg to 100 kg
• income of a person may vary from Rs 10K to Rs 2 lakh
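A short sketch of why scaling matters, using the attribute ranges from the example above. Min-max scaling to [0, 1] is one common choice and is an assumption here, since the slide only says attributes "may have to be scaled"; the three records `a`, `b`, `c` are hypothetical.

```python
import math

# Attribute ranges from the example: height (m), weight (kg), income (Rs).
RANGES = [(1.5, 1.8), (60.0, 100.0), (10_000.0, 200_000.0)]

def min_max_scale(record, ranges=RANGES):
    """Map each attribute to [0, 1] using its (min, max) range."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(record, ranges)]

# Hypothetical records: [height, weight, income].
a = [1.5, 60.0, 50_000.0]
b = [1.8, 100.0, 51_000.0]   # very different build, almost the same income as a
c = [1.5, 60.0, 60_000.0]    # same build as a, somewhat higher income

raw_ab, raw_ac = math.dist(a, b), math.dist(a, c)
scaled_ab = math.dist(min_max_scale(a), min_max_scale(b))
scaled_ac = math.dist(min_max_scale(a), min_max_scale(c))
```

On the raw values income dominates, so `b` looks closer to `a` than `c` does; after scaling, all three attributes contribute comparably and `c` becomes the closer record.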
Nearest Neighbor Classification…
• Problem with Euclidean measure:
– High dimensional data
• curse of dimensionality: all vectors are almost equidistant to the query vector
– Can produce undesirable results
111111111110 vs 011111111111: d = 1.4142
100000000000 vs 000000000001: d = 1.4142

• Solution: Normalize the vectors to unit length


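The effect of normalizing to unit length can be checked on the two pairs of vectors from this slide. The sketch below is illustrative only; the helper names `bits` and `to_unit_length` are made up, and a zero vector (which cannot be normalized) is not handled.

```python
import math

def to_unit_length(v):
    """Divide a vector by its Euclidean norm (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def bits(s):
    """Turn a string of 0/1 characters into a list of floats."""
    return [float(ch) for ch in s]

dense_a, dense_b = bits("111111111110"), bits("011111111111")
sparse_a, sparse_b = bits("100000000000"), bits("000000000001")

# Raw Euclidean distances: both pairs differ in exactly two positions.
raw_dense = math.dist(dense_a, dense_b)
raw_sparse = math.dist(sparse_a, sparse_b)

# Distances after scaling every vector to unit length.
unit_dense = math.dist(to_unit_length(dense_a), to_unit_length(dense_b))
unit_sparse = math.dist(to_unit_length(sparse_a), to_unit_length(sparse_b))
```

Before normalization both pairs are sqrt(2) ≈ 1.4142 apart. Afterwards the first pair, which shares ten of its eleven 1s, is at distance sqrt(2/11) ≈ 0.4264, while the second pair, which shares none, stays at 1.4142.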
Nearest Neighbor Classification…
• k-NN classifiers are lazy learners
– They do not build models explicitly
– Unlike eager learners such as decision tree induction and rule-based systems
– Classifying unknown records is relatively expensive
Thank You
