K Nearest Neighbor Classification
Classification
Nearest Neighbor Classifiers
Basic idea:
If it walks like a duck, quacks like a duck, then it’s
probably a duck
[Figure: compute the distance from the test record to each training record and choose the k nearest.]
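As a concrete illustration of this basic idea, here is a minimal k-nearest-neighbor classifier sketch (the function and variable names are illustrative, not from the slides): compute the distance from the test record to every training record, pick the k closest, and take a majority vote.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training records."""
    # Euclidean distance from the test record to every training record
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training records
    nearest = np.argsort(dists)[:k]
    # Majority vote over their class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy example: two "duck-like" features (walk score, quack score)
X_train = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])
y_train = np.array(["duck", "duck", "not duck", "not duck"])
print(knn_classify(X_train, y_train, np.array([0.85, 0.75]), k=3))  # -> "duck"
```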
Properties:
1) All points within a sample's Voronoi cell have that sample as their nearest neighbor (they are closer to it than to any other sample).
2) For any sample, its nearest neighboring sample is determined by the closest Voronoi cell edge (the neighbor whose cell shares that edge).
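Property 1 can be checked numerically: any point that lies within half the distance from a sample to its closest other sample is guaranteed to sit inside that sample's Voronoi cell, so the sample must be its nearest neighbor. A small illustrative sketch (all names and the random data are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.uniform(0, 10, size=(20, 2))      # training samples defining the cells

s = 7                                            # pick one sample
others = np.delete(samples, s, axis=0)
half_gap = np.linalg.norm(others - samples[s], axis=1).min() / 2

# Points within half_gap of sample s lie inside its Voronoi cell,
# so sample s must be their nearest neighbor (property 1).
for _ in range(1000):
    offset = rng.normal(size=2)
    p = samples[s] + offset / np.linalg.norm(offset) * rng.uniform(0, half_gap)
    assert np.linalg.norm(samples - p, axis=1).argmin() == s
print("all perturbed points have sample", s, "as their nearest neighbor")
```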
Other Distance Measures
City-block distance (Manhattan distance)
Sum of the absolute differences of the feature values
Cosine similarity
Measures the angle formed by the two sample vectors (with the origin)
Jaccard distance
Based on the percentage of exact matches between the samples, ignoring attributes that are unavailable in both; the distance is one minus this fraction
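These measures can be sketched in a few lines of NumPy. This is an illustrative sketch; in particular, the Jaccard case assumes 0/1 attributes, with 0 meaning "unavailable", which is one reading of the slide.

```python
import numpy as np

def manhattan(a, b):
    # City-block distance: sum of absolute differences
    return np.sum(np.abs(a - b))

def cosine_similarity(a, b):
    # Cosine of the angle formed by the two sample vectors (with the origin)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def jaccard_distance(a, b):
    # For binary attributes: ignore positions where both values are 0
    # (unavailable in both samples), then count mismatches among the rest.
    both_absent = (a == 0) & (b == 0)
    keep = ~both_absent
    if keep.sum() == 0:
        return 0.0
    return np.sum(a[keep] != b[keep]) / keep.sum()

a = np.array([1, 0, 1, 1, 0])
b = np.array([1, 1, 1, 0, 0])
print(manhattan(a, b), cosine_similarity(a, b), jaccard_distance(a, b))
```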
Predicting Continuous Values
Replace the majority vote by the average of the target values of the K nearest neighbors
Rule of thumb:
K = sqrt(N)
N: number of training points
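A minimal sketch of this regression variant, with the K = sqrt(N) rule of thumb as the default (function and variable names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=None):
    """Predict a continuous value as the mean of the k nearest neighbors' targets."""
    if k is None:
        # Rule of thumb from the slides: K = sqrt(N)
        k = max(1, int(round(np.sqrt(len(X_train)))))
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

# Toy data: y is roughly 2 * x
X_train = np.arange(10, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()
print(knn_predict(X_train, y_train, np.array([4.2])))  # ~8.0 with k = sqrt(10) ≈ 3
```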
Distance Metrics
Distance Measure: Scale Effects
Features measured on different scales contribute unequally to the distance, so standardize each feature to a z-score:
z_ij = (x_ij − μ_j) / σ_j
where x_ij is the value for the ith sample and jth feature, μ_j is the average of all x_ij for feature j, and σ_j is the standard deviation of all x_ij over all input samples.
Range and scale of the z-scores should then be similar across features (provided the distributions of the raw feature values are alike).
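A sketch of the standardization step (illustrative names; reusing the training-set statistics for any test record is a common practice assumed here, not stated on the slide):

```python
import numpy as np

def zscore_fit(X_train):
    """Compute per-feature mean and standard deviation on the training set."""
    mu = X_train.mean(axis=0)       # average of x_ij for each feature j
    sigma = X_train.std(axis=0)     # standard deviation for each feature j
    sigma[sigma == 0] = 1.0         # guard against constant features
    return mu, sigma

def zscore_transform(X, mu, sigma):
    """z_ij = (x_ij - mu_j) / sigma_j"""
    return (X - mu) / sigma

# Feature 0 is in the thousands, feature 1 in single digits: without scaling,
# feature 0 would dominate the Euclidean distance.
X_train = np.array([[1000.0, 2.0], [2000.0, 3.0], [3000.0, 4.0]])
mu, sigma = zscore_fit(X_train)
print(zscore_transform(X_train, mu, sigma))
```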
Nearest Neighbor: Dimensionality
Problem with the Euclidean measure:
High-dimensional data: the curse of dimensionality
Can produce counter-intuitive results
Shrinking density – sparsification effect
Pair 1: (1 1 1 1 1 1 1 1 1 1 1 0) vs (0 1 1 1 1 1 1 1 1 1 1 1)   d = 1.4142
Pair 2: (1 0 0 0 0 0 0 0 0 0 0 0) vs (0 0 0 0 0 0 0 0 0 0 0 1)   d = 1.4142
Both pairs are the same Euclidean distance apart, even though the first pair agrees on ten 1s while the second pair shares no 1s at all.
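Reproducing the two pairs above in NumPy confirms the effect (an illustrative check, not from the slides):

```python
import numpy as np

a1 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0])
b1 = np.array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])   # differs from a1 only at the two ends
a2 = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
b2 = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])   # shares no 1s with a2

print(np.linalg.norm(a1 - b1))  # 1.4142...
print(np.linalg.norm(a2 - b2))  # 1.4142... -- the same, despite very different overlap
```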
Distance for Nominal Attributes
Distance for Heterogeneous Data
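The slides do not spell out a formula for these two cases, so the following is only a hedged, Gower/HEOM-style sketch: nominal attributes contribute a 0/1 mismatch (simple matching), numeric attributes a range-scaled absolute difference. All names and the scaling choice are assumptions.

```python
def mixed_distance(a, b, is_nominal, num_ranges):
    """Distance for records mixing nominal and numeric attributes.

    is_nominal: boolean flag per attribute
    num_ranges: value range per numeric attribute (for scaling to [0, 1])
    """
    total = 0.0
    for j, nominal in enumerate(is_nominal):
        if nominal:
            # Simple matching for nominal attributes: 0 if equal, 1 if different
            total += 0.0 if a[j] == b[j] else 1.0
        else:
            # Range-scaled absolute difference for numeric attributes
            total += abs(float(a[j]) - float(b[j])) / num_ranges[j]
    return total

# Attributes: (color: nominal, weight_kg: numeric with an observed range of 100)
x = ("red", 72.0)
y = ("blue", 65.0)
print(mixed_distance(x, y, is_nominal=[True, False], num_ranges={1: 100.0}))
```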
A Voronoi diagram divides the space into such cells.
Every query point will be assigned the classification of the sample
within that cell. The decision boundary separates the class regions
based on the 1-NN decision rule.
Knowledge of this boundary is sufficient to classify new points.
The boundary itself is rarely computed; many algorithms seek to retain
only those points necessary to generate an identical boundary.
Condensing
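One classical way to retain only the points needed to reproduce the boundary is Hart's condensed nearest neighbor procedure. The sketch below is illustrative (the slide does not specify the exact algorithm): a point is added to the retained set only if the current retained set would misclassify it under the 1-NN rule.

```python
import numpy as np

def condense(X, y, passes=5):
    """Hart-style condensing: retain a subset that 1-NN-classifies all of X correctly."""
    keep = [0]                        # seed the retained set with one sample
    for _ in range(passes):
        changed = False
        for i in range(len(X)):
            # 1-NN prediction of sample i using only the retained points
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            pred = y[keep][np.argmin(d)]
            if pred != y[i]:          # misclassified -> this point is needed
                keep.append(i)
                changed = True
        if not changed:               # stable: the retained set reproduces all labels
            break
    return np.array(sorted(set(keep)))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
kept = condense(X, y)
print(len(kept), "of", len(X), "points retained")
```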
kd-tree: a data structure for range search
Index the data into a tree
Search on the tree
Tree construction: at each level we use a different dimension to split
[Figure: a small kd-tree over points A–E; the root splits on x = 5 (left: x < 5, right: x >= 5), with further splits on y = 3, y = 6, and x = 6.]
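A compact sketch of this construction rule (cycle through the dimensions level by level) together with a simple nearest-neighbor search that prunes branches using the splitting plane. Splitting at the median and the class/function names are assumptions of this sketch, not taken from the slide.

```python
import numpy as np

class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build_kdtree(points, depth=0):
    """At each level, split on a different dimension (cycled), at the median point."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return Node(points[mid], axis,
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Return the stored point closest to `query`, pruning far sub-trees."""
    if node is None:
        return best
    if best is None or np.linalg.norm(node.point - query) < np.linalg.norm(best - query):
        best = node.point
    # Descend first into the side of the splitting plane that contains the query
    near, far = ((node.left, node.right) if query[node.axis] < node.point[node.axis]
                 else (node.right, node.left))
    best = nearest(near, query, best)
    # Only search the far side if the splitting plane is closer than the best so far
    if abs(query[node.axis] - node.point[node.axis]) < np.linalg.norm(best - query):
        best = nearest(far, query, best)
    return best

pts = np.array([[2.0, 3.0], [5.0, 4.0], [9.0, 6.0], [4.0, 7.0], [8.0, 1.0], [7.0, 2.0]])
tree = build_kdtree(pts)
print(nearest(tree, np.array([9.0, 2.0])))   # expected: [8. 1.]
```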
kd-tree example
[Figure: a second worked kd-tree example, with splits at x = 3, 5, 7, 8 and y = 2, 5, 6.]
KNN: Alternate Terminologies
Also known as instance-based learning, memory-based learning, lazy learning, and case-based reasoning.