
K-Nearest Neighbor Learning

K-nearest neighbor (KNN) is an instance-based learning algorithm where classification of new data points is based on the majority class of its k nearest neighbors. It stores all training examples and classifies new examples based on similarity measure like Euclidean distance. The value of k affects noise sensitivity and computational complexity. Feature selection and reduction techniques can help address issues like curse of dimensionality for high-dimensional data.


K-Nearest Neighbor Learning
Different Learning Methods
 Eager Learning
   Explicit description of the target function is built from the whole training set
 Instance-based Learning
   Learning = storing all training instances
   Classification = assigning a target function value to a new instance
   Referred to as “Lazy” learning
Instance-based Learning

 It’s very similar to a desktop!
Instance-based Learning
 K-Nearest Neighbor Algorithm
 Weighted Regression
 Case-based reasoning
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x]

 The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
K-Nearest Neighbor
 Given a training data set {(xi, yi)}, i = 1, 2, 3, …, N, find an estimate y for test data x
 No model is created a priori (hence a “lazy” algorithm)
 Training data is just stored
 The decision regarding the class is made at prediction time
K-Nearest Neighbor
 Training phase: save the training examples (instances)
 Prediction time: given test data xt, find the training examples (xi, yi) that are closest to xt and predict yi as the output yt
 Classification: predict the most frequent class among the k yi’s
 Regression: predict the average of the k yi’s (see the sketch below)
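A minimal sketch of this procedure in Python, using Euclidean distance and majority voting; the function and variable names are illustrative choices, not taken from the slides.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training examples."""
    # Distance from the test point to every stored training instance
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(distances)[:k]
    # Classification: most frequent class among the k neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Example usage with a toy 2-D training set
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0

For regression, the last line of knn_predict would instead return the average of the k neighbours’ target values.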
Voronoi Diagram

The decision surface is formed by the training examples.

Properties:
1) All points within a sample’s Voronoi cell have that sample as their nearest neighbor
2) For any sample, the nearest sample is determined by the closest Voronoi cell edge
Remarks
+ Highly effective inductive inference method for noisy training data and complex target functions
+ The target function for the whole space may be described as a combination of less complex local approximations
+ Learning is very simple
- Classification is time consuming
Nearest-Neighbor Classifiers: Issues
– The value of k, the number of nearest neighbors to retrieve
– Choice of distance metric to compute the distance between records
– Computational complexity
– Size of the training set
– Dimension of the data
Issues: Value of k
 Choosing the value of k:
   If k is too small, the classifier is sensitive to noise points
   If k is too large, the neighborhood may include points from other classes

 Rule of thumb: k = sqrt(N), where N is the number of training points (illustrated below)
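A small illustration of the rule of thumb. Rounding to an odd value (to reduce ties in binary classification) is a common convention added here, not something the slides require.

import math

def rule_of_thumb_k(n_train):
    """k ~ sqrt(N); round and, by convention, make it odd to reduce voting ties."""
    k = max(1, round(math.sqrt(n_train)))
    return k if k % 2 == 1 else k + 1

print(rule_of_thumb_k(100))  # sqrt(100) = 10 -> 11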
• Noise in attributes
• Noise in class labels
• Classes may be partially overlapping
When to use Euclidean distance?
 All attributes are not equally important
   Give equal weight only if the scales of the attributes and of their differences are similar
   Scale attributes to equal range and variance
 Works best when classes are spherical
 What if more noise is present in some attributes?
 What if classes are not spherical?
   Use a larger k
   Use a weighted distance metric
Small value of k
 A small k captures the fine structure of the problem space better
 May be necessary for a small training set

Large value of k
 The classifier is less sensitive to noise in the output class
 Better probability estimates for discrete classes
 Suitable for larger training sets
Effect of k
Distance-Weighted Nearest Neighbor Algorithm
 Assign weights to the neighbors based on their distance from the query point
 Different weights may also be assigned to different attributes
 The weight may, for example, be the inverse square of the distance (see the sketch below)
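A sketch of distance-weighted voting with inverse-square weights, as suggested above. Per-attribute weighting is omitted for brevity, and the names are illustrative.

import numpy as np

def weighted_knn_predict(X_train, y_train, x_test, k=5, eps=1e-9):
    """Weight each of the k neighbours by the inverse square of its distance."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] ** 2 + eps)  # inverse-square weights
    # Sum the weights per class and return the class with the largest total
    votes = {}
    for idx, w in zip(nearest, weights):
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + w
    return max(votes, key=votes.get)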
Distance Measure: Scale Effects
 Different features may have different measurement scales
   E.g., patient weight in kg (range [50, 200]) vs. blood protein values in ng/dL (range [-3, 3])
 Consequences
   Patient weight will have a much greater influence on the distance between samples
   May bias the performance of the classifier
Standardization
 Transform raw feature values into z-scores (sketch below):

   z_ij = (x_ij − μ_j) / σ_j

 x_ij is the value for the ith sample and jth feature
 μ_j is the average of all x_ij for feature j
 σ_j is the standard deviation of all x_ij over all input samples
 The range and scale of z-scores should be similar (provided the distributions of raw feature values are alike)
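A sketch of the z-score transform above. Computing the mean and standard deviation on the training set only, and reusing them for test data, is standard practice assumed here rather than stated on the slide.

import numpy as np

def standardize(X_train, X_test):
    """Column-wise z-scores: z_ij = (x_ij - mu_j) / sigma_j."""
    mu = X_train.mean(axis=0)      # mean of each feature j
    sigma = X_train.std(axis=0)    # standard deviation of each feature j
    sigma[sigma == 0] = 1.0        # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma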
How to use a weighted distance function?
Locally Weighted Averaging (sketch below)
 Considers the entire training dataset
 Every point is weighted according to its distance from the test point
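A sketch of locally weighted averaging for regression, weighting every training point by a Gaussian kernel over its distance to the test point; the kernel choice and bandwidth are illustrative assumptions, not prescribed by the slides.

import numpy as np

def locally_weighted_average(X_train, y_train, x_test, bandwidth=1.0):
    """Weighted average of all training targets; closer points get larger weights."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    weights = np.exp(-(distances ** 2) / (2 * bandwidth ** 2))  # Gaussian kernel
    return np.dot(weights, y_train) / weights.sum()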
Distance Metrics
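For reference, a few distance metrics commonly paired with k-NN (Euclidean, Manhattan, and the general Minkowski form); these are standard definitions and not taken verbatim from the slide.

import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))             # L2 norm

def manhattan(a, b):
    return np.sum(np.abs(a - b))                     # L1 norm

def minkowski(a, b, p=3):
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)   # general Lp norm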
Nearest Neighbor: Dimensionality
 Problem with the Euclidean measure:
   High dimensional data
     • curse of dimensionality
   Can produce counter-intuitive results
   Shrinking density – sparsification effect

   111111111110 vs 011111111111: d = 1.4142
   100000000000 vs 000000000001: d = 1.4142
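Each pair above differs in exactly two bit positions, so both Euclidean distances come out to sqrt(2) ≈ 1.4142, even though the first pair shares ten 1-bits and the second pair shares none; a quick check:

import numpy as np

a1 = np.array([int(c) for c in "111111111110"])
a2 = np.array([int(c) for c in "011111111111"])
b1 = np.array([int(c) for c in "100000000000"])
b2 = np.array([int(c) for c in "000000000001"])

print(np.linalg.norm(a1 - a2))  # 1.4142...
print(np.linalg.norm(b1 - b2))  # 1.4142...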
Curse of Dimensionality
 If instances are defined in terms of a large number of attributes or features, it becomes difficult to define an appropriate similarity metric
 Some features are important while others are irrelevant
   Remove irrelevant features
 Feature reduction is very important
 Too many features (high dimension) lead to the curse of dimensionality
Nearest Neighbour: Computational Complexity
 Expensive
   To determine the nearest neighbour of a query point q, we must compute the distance to all N training examples
   + Pre-sort training examples into fast data structures (kd-trees, sketched below)
   + Compute only an approximate distance (LSH)
   + Remove redundant data
 Storage Requirements
   Must store all training data
   + Remove redundant data
   – Pre-sorting often increases the storage requirements
 High Dimensional Data
   “Curse of Dimensionality”
   • Required amount of training data increases exponentially with dimension
   • Computational cost also increases dramatically
   • Partitioning techniques degrade to linear search in high dimension
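A sketch of the kd-tree speed-up mentioned above, using SciPy's cKDTree; assuming SciPy is available, the tree is built once from the training data and then queried for the k nearest neighbours of each test point.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))   # N training points in 3-D

tree = cKDTree(X_train)             # pre-sort training data into a kd-tree
query = rng.random(3)
distances, indices = tree.query(query, k=5)   # k nearest neighbours of the query
print(indices, distances)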
Feature Reduction
 Features contain information about the target
 However, more features do not imply better discriminative power
 In the k-NN algorithm, irrelevant features introduce noise and fool the decision
 Redundant features decrease the performance
Reduction in Computational Complexity
 Reduce the size of the training set
   Feature Selection
   Feature Extraction
 Use geometric data structures for high dimensional search
Feature Selection
 Given a set of features F = {f1, f2, …, fN}, find a subset F′ ⊆ F that optimizes certain criteria
 What is to be optimized?
   Either improve or maintain classification accuracy
   Simplify classifier complexity
 For N-dimensional data, 2^N subsets are possible
   It is impractical to search all these subsets exhaustively
 Feature selection approaches
   Heuristic (forward and backward selection)
   Optimal (filter and wrapper)
   Randomized
Forward Selection
 Start with an empty feature set and add features one by one
 For each candidate feature, estimate the classification/regression error
 Select the feature that gives the maximum improvement
 Use a validation set, not the training set, for feature selection
 Stop when there is no significant improvement (see the sketch below)
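A sketch of greedy forward selection wrapped around a k-NN classifier; knn_error is a hypothetical helper that returns the validation error for a given feature subset, so the evaluation details are assumptions rather than anything specified on the slides.

def forward_selection(all_features, knn_error, tol=1e-3):
    """Greedily add the feature that most reduces validation error."""
    selected = []
    best_err = float("inf")
    while True:
        candidates = [f for f in all_features if f not in selected]
        if not candidates:
            break
        # Try adding each remaining feature and keep the best one
        errs = {f: knn_error(selected + [f]) for f in candidates}
        f_best = min(errs, key=errs.get)
        if best_err - errs[f_best] < tol:   # stop: no significant improvement
            break
        selected.append(f_best)
        best_err = errs[f_best]
    return selected

Backward selection works as the mirror image: start from the full set and repeatedly remove the feature whose removal increases the error least.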
Backward Feature Selection
 Start with the full feature set and then try removing features
 Identify the feature whose removal has the smallest impact on the error
 Drop the feature that does not contribute to the improvement
 Stop when there is no significant improvement
