K-Nearest Neighbor Learning
Different Learning Methods
Eager Learning
Constructs an explicit description of the target function from the whole training set
Instance-based Learning
Learning = storing all training instances
Classification = assigning the target function value to a new instance
Referred to as “Lazy” learning
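A minimal Python sketch of the lazy-learning idea (the class name LazyKNN and its toy interface are illustrative, not from the slides): training only stores the data, and all distance computation is deferred until a query arrives.

```python
import numpy as np

class LazyKNN:
    """Lazy learner: 'training' only stores the instances; all distance
    computation is deferred until a query point has to be classified."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Learning = storing all training instances
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict_one(self, q):
        # Classification = the actual work, done only when a query arrives
        d = np.linalg.norm(self.X - np.asarray(q, dtype=float), axis=1)
        nearest = np.argsort(d)[:self.k]                 # k closest stored instances
        labels, counts = np.unique(self.y[nearest], return_counts=True)
        return labels[np.argmax(counts)]                 # majority vote
```

Usage would look like LazyKNN(k=3).fit(X_train, y_train).predict_one(x_query).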
Instance-based Learning
Properties:
1) All points within a sample's Voronoi cell have that sample as their nearest neighbor
2) For any sample, its nearest training sample is determined by the closest Voronoi cell edge
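A small sketch of property 1, assuming Euclidean distance: finding the Voronoi cell that contains a query point is the same computation as finding the query's nearest training sample.

```python
import numpy as np

# Illustrative 2-D training samples; by property 1, every query point inside a
# sample's Voronoi cell has that sample as its nearest neighbour, so finding
# the containing cell is just a nearest-sample lookup.
rng = np.random.default_rng(0)
samples = rng.random((10, 2))

def voronoi_cell(q, samples):
    """Index of the training sample whose Voronoi cell contains q
    (i.e. the nearest sample under Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(samples - q, axis=1)))

print(voronoi_cell(np.array([0.4, 0.7]), samples))
```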
Remarks
+ Highly effective inductive inference method for noisy training data and complex target functions
+ The target function for the whole space may be described as a combination of less complex local approximations
+ Learning is very simple
- Classification is time consuming
Nearest-Neighbor Classifiers: Issues
– The value of k, the number of nearest neighbors to retrieve
– Choice of distance metric to compute the distance between records
– Computational complexity
– Size of training set
– Dimension of data
Issues: Value of k
Choosing the value of k:
If k is too small, the classifier is sensitive to noise points
If k is too large, the neighborhood may include points from other classes
Rule of thumb:
k = sqrt(N), where N is the number of training points
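The rule of thumb written out as a small Python helper; rounding up to an odd value is a common extra convention (to avoid ties in a two-class vote) and is not part of the slide itself.

```python
import math

def rule_of_thumb_k(n_train):
    """Heuristic k = sqrt(N) from the slide, rounded to the nearest integer.
    Bumping to an odd value (a common convention, not from the slide)
    avoids ties in a two-class majority vote."""
    k = max(1, round(math.sqrt(n_train)))
    return k if k % 2 == 1 else k + 1

print(rule_of_thumb_k(100))   # sqrt(100) = 10, bumped to 11
```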
Small k:
– Captures fine structure of the problem space better
– May be necessary for a small training set
Large k:
– Classifier is less sensitive to noise in the output class
– Better probability estimates for discrete classes
– Suitable for larger training sets
Effect of k
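A rough sketch of how the effect of k can be examined empirically: score several values of k on a held-out validation set. The two-blob toy data below is purely illustrative.

```python
import numpy as np

def knn_predict(Xtr, ytr, q, k):
    """Plain k-NN majority vote for a single query point q."""
    d = np.linalg.norm(Xtr - q, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(ytr[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def accuracy_for_k(Xtr, ytr, Xval, yval, k):
    preds = np.array([knn_predict(Xtr, ytr, q, k) for q in Xval])
    return np.mean(preds == yval)

# Toy data: two overlapping Gaussian blobs, split into train and validation sets
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
idx = rng.permutation(200)
Xtr, ytr, Xval, yval = X[idx[:150]], y[idx[:150]], X[idx[150:]], y[idx[150:]]

for k in (1, 3, 9, 25, 75):
    print(k, accuracy_for_k(Xtr, ytr, Xval, yval, k))
```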
Distance-Weighted Nearest Neighbor Algorithm
Assign weights to the k neighbors based on their distance from the query point
Different weights may also be assigned to different attributes
The weight may be the inverse square of the distance: w = 1 / d(x_q, x_i)^2
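A sketch of the neighbor-weighting variant described above, using inverse-square weights w = 1 / d^2; the eps guard against a zero distance is an implementation detail added here, not from the slides.

```python
import numpy as np

def distance_weighted_knn(Xtr, ytr, q, k=5, eps=1e-12):
    """Distance-weighted k-NN vote with w_i = 1 / d(q, x_i)^2.
    eps guards against division by zero when q coincides with a training point."""
    Xtr, ytr = np.asarray(Xtr, dtype=float), np.asarray(ytr)
    d = np.linalg.norm(Xtr - np.asarray(q, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] ** 2 + eps)            # inverse-square distance weights
    votes = {}
    for label, weight in zip(ytr[nearest], w):   # accumulate weighted votes per class
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)             # class with the largest total weight
```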
Distance Measure: Scale Effects
Different features may have different measurement scales
E.g., patient weight in kg (range [50,200]) vs. blood protein values in ng/dL (range [-3,3])
Consequences:
– Patient weight will have a much greater influence on the distance between samples
– May bias the performance of the classifier
Standardization
Example: two pairs of 12-bit vectors, each pair differing in exactly two positions, so both pairs have the same Euclidean distance even though the first pair shares ten 1s and the second pair shares none:
111111111110 vs 011111111111   d = 1.4142
100000000000 vs 000000000001   d = 1.4142
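One common way to remove the scale effect is z-score standardization: compute each feature's mean and standard deviation on the training set and apply them to both training and test data. A minimal sketch:

```python
import numpy as np

def standardize(Xtr, Xte):
    """Z-score each feature using the training set's mean and standard deviation,
    so that no single large-scale feature (e.g. weight in kg) dominates the distance."""
    Xtr, Xte = np.asarray(Xtr, dtype=float), np.asarray(Xte, dtype=float)
    mu, sigma = Xtr.mean(axis=0), Xtr.std(axis=0)
    sigma[sigma == 0] = 1.0                      # constant features: avoid division by zero
    return (Xtr - mu) / sigma, (Xte - mu) / sigma
```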
Curse of Dimensionality
If instances are described by a large number of attributes or features, it becomes difficult to define an appropriate similarity metric.
Some features are important while others are irrelevant
Remove irrelevant features
Feature reduction is very important
Too many features (high dimensionality) lead to the curse of dimensionality
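A small numerical illustration of why similarity becomes hard to define in high dimensions: for uniformly random points, the relative gap between the farthest and the nearest distance to a query shrinks as the dimension grows.

```python
import numpy as np

# For uniformly random points, the relative gap between the farthest and the
# nearest distance to a query shrinks as the dimension grows, which is one
# face of the curse of dimensionality for distance-based methods.
rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    X = rng.random((500, dim))                  # 500 random training points
    q = rng.random(dim)                         # one random query point
    d = np.linalg.norm(X - q, axis=1)
    print(dim, round((d.max() - d.min()) / d.min(), 3))   # relative contrast
```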
Nearest Neighbour: Computational Complexity
Expensive
To determine the nearest neighbour of a query point q, we must compute the distance to all N training examples
+ Pre-sort training examples into fast data structures (kd-trees); see the sketch below
+ Compute only an approximate distance (LSH)
+ Remove redundant data
Storage Requirements
Must store all training data
+ Remove redundant data
- Pre-sorting often increases the storage requirements
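A sketch of the kd-tree option, assuming SciPy is available: the tree is built once from the training points and then answers k-nearest-neighbour queries without a brute-force scan over all N examples (effective mainly in low dimensions, as noted on the next slide).

```python
import numpy as np
from scipy.spatial import cKDTree   # assumes SciPy is installed

rng = np.random.default_rng(0)
Xtr = rng.random((10_000, 3))       # hypothetical low-dimensional training points
queries = rng.random((5, 3))

tree = cKDTree(Xtr)                    # pre-sort the training examples once
dist, idx = tree.query(queries, k=3)   # distances and indices of the 3 nearest points
print(idx)
```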
High Dimensional Data
“Curse of Dimensionality”
• Required amount of training data increases exponentially with dimension
• Computational cost also increases dramatically
• Partitioning techniques degrade to linear search in high dimensions
Feature Reduction
Features contain information about the target
However, more features do not imply better discriminative power
In the k-NN algorithm, irrelevant features introduce noise and fool the decision
Redundant features decrease the performance
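A toy illustration of the claim that irrelevant features fool the decision: appending random, uninformative columns to the data changes the distances and typically lowers k-NN accuracy. The data, dimensions, and noise levels below are arbitrary choices made only for this demonstration.

```python
import numpy as np

def knn_accuracy(Xtr, ytr, Xte, yte, k=1):
    """1-NN accuracy with Euclidean distance (brute force, for the demo only)."""
    correct = 0
    for q, t in zip(Xte, yte):
        d = np.linalg.norm(Xtr - q, axis=1)
        nearest = np.argsort(d)[:k]
        labels, counts = np.unique(ytr[nearest], return_counts=True)
        correct += int(labels[np.argmax(counts)] == t)
    return correct / len(yte)

rng = np.random.default_rng(0)
# Two informative features separating two classes...
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
# ...plus 50 irrelevant random features that only add noise to the distances
X_noisy = np.hstack([X, rng.normal(0.0, 3.0, (400, 50))])

idx = rng.permutation(400)
tr, te = idx[:300], idx[300:]
print("informative features only:", knn_accuracy(X[tr], y[tr], X[te], y[te]))
print("with irrelevant features: ", knn_accuracy(X_noisy[tr], y[tr], X_noisy[te], y[te]))
```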
Reduction in Computational Complexity
Reduce size of training set
Feature Selection
Feature Extraction