
K-Nearest Neighbor Learning

K-nearest neighbor (KNN) is an instance-based learning algorithm where classification of new data points is based on the majority class of its k nearest neighbors. It stores all training examples and classifies new examples based on similarity measure like Euclidean distance. The value of k affects noise sensitivity and computational complexity. Feature selection and reduction techniques can help address issues like curse of dimensionality for high-dimensional data.


K-Nearest Neighbor Learning
Different Learning Methods
 Eager Learning
   Explicit description of the target function is built from the whole training set
 Instance-based Learning
   Learning = storing all training instances
   Classification = assigning a target function value to a new instance
   Referred to as “Lazy” learning
Instance-based Learning

 It’s very similar to a desktop!
Instance-based Learning
 K-Nearest Neighbor Algorithm
 Weighted Regression
 Case-based reasoning
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x]

 The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
K-Nearest Neighbor
 Given a training data set {(xi, yi)}, i = 1, 2, 3, …, N, find an estimate y for test data x
 No model is created a priori (hence a “lazy” algorithm)
 Training data is just stored
 The decision regarding the class is made at prediction time
K-Nearest Neighbor
 Training phase: save the training examples (instances)
 Prediction time: given test data xt, find the training examples (xi, yi) that are closest to xt and predict yi as the output yt
 Classification: predict the most frequent class among the k yi’s
 Regression: predict the average of the k yi’s (see the sketch below)
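A minimal sketch of this procedure in Python, using Euclidean distance and majority voting; the function and variable names are illustrative choices, not taken from the slides.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training examples."""
    # Distance from the test point to every stored training instance
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(distances)[:k]
    # Classification: most frequent class among the k neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Example usage with a toy 2-D training set
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0

For regression, the last line of knn_predict would instead return the average of the k neighbours’ target values.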
Voronoi Diagram

The decision surface is formed by the training examples.

Properties:
1) All points within a sample’s Voronoi cell have that sample as their nearest neighbor
2) For any sample, the nearest sample is determined by the closest Voronoi cell edge
Remarks
+ Highly effective inductive inference method for noisy training data and complex target functions
+ The target function for the whole space may be described as a combination of less complex local approximations
+ Learning is very simple
- Classification is time consuming
Nearest-Neighbor Classifiers: Issues
– The value of k, the number of nearest neighbors to retrieve
– Choice of distance metric to compute the distance between records
– Computational complexity
– Size of the training set
– Dimension of the data
Issues: Value of k
 Choosing the value of k:
   If k is too small, the classifier is sensitive to noise points
   If k is too large, the neighborhood may include points from other classes

 Rule of thumb: k = sqrt(N), where N is the number of training points (illustrated below)
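A small illustration of the rule of thumb. Rounding to an odd value (to reduce ties in binary classification) is a common convention added here, not something the slides require.

import math

def rule_of_thumb_k(n_train):
    """k ~ sqrt(N); round and, by convention, make it odd to reduce voting ties."""
    k = max(1, round(math.sqrt(n_train)))
    return k if k % 2 == 1 else k + 1

print(rule_of_thumb_k(100))  # sqrt(100) = 10 -> 11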
• Noise in attributes
• Noise in class labels
• Classes may be partially overlapping
When to use Euclidean distance?
 All attributes are not equally important
   Give equal weight only if the scales of the attributes and of their differences are similar
   Scale attributes to equal range and variance
 Works best when classes are spherical
 What if more noise is present in some attributes?
 What if classes are not spherical?
   Use a larger k
   Use a weighted distance metric
Small value of k
 A small k captures the fine structure of the problem space better
 May be necessary for a small training set

Large value of k
 The classifier is less sensitive to noise in the output class
 Better probability estimates for discrete classes
 Suitable for larger training sets
Effect of k
Distance-Weighted Nearest Neighbor Algorithm
 Assign weights to the neighbors based on their distance from the query point
 Different weights may also be assigned to different attributes
 The weight may, for example, be the inverse square of the distance (see the sketch below)
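A sketch of distance-weighted voting with inverse-square weights, as suggested above. Per-attribute weighting is omitted for brevity, and the names are illustrative.

import numpy as np

def weighted_knn_predict(X_train, y_train, x_test, k=5, eps=1e-9):
    """Weight each of the k neighbours by the inverse square of its distance."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] ** 2 + eps)  # inverse-square weights
    # Sum the weights per class and return the class with the largest total
    votes = {}
    for idx, w in zip(nearest, weights):
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + w
    return max(votes, key=votes.get)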
Distance Measure: Scale Effects
 Different features may have different measurement scales
   E.g., patient weight in kg (range [50, 200]) vs. blood protein values in ng/dL (range [-3, 3])
 Consequences
   Patient weight will have a much greater influence on the distance between samples
   May bias the performance of the classifier
Standardization
 Transform raw feature values into z-scores (sketch below):

   z_ij = (x_ij − μ_j) / σ_j

 x_ij is the value for the ith sample and jth feature
 μ_j is the average of all x_ij for feature j
 σ_j is the standard deviation of all x_ij over all input samples
 The range and scale of z-scores should be similar (provided the distributions of raw feature values are alike)
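A sketch of the z-score transform above. Computing the mean and standard deviation on the training set only, and reusing them for test data, is standard practice assumed here rather than stated on the slide.

import numpy as np

def standardize(X_train, X_test):
    """Column-wise z-scores: z_ij = (x_ij - mu_j) / sigma_j."""
    mu = X_train.mean(axis=0)      # mean of each feature j
    sigma = X_train.std(axis=0)    # standard deviation of each feature j
    sigma[sigma == 0] = 1.0        # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma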
How to use a weighted distance function?
Locally Weighted Averaging (sketch below)
 Considers the entire training dataset
 Every point is weighted according to its distance from the test point
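A sketch of locally weighted averaging for regression, weighting every training point by a Gaussian kernel over its distance to the test point; the kernel choice and bandwidth are illustrative assumptions, not prescribed by the slides.

import numpy as np

def locally_weighted_average(X_train, y_train, x_test, bandwidth=1.0):
    """Weighted average of all training targets; closer points get larger weights."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    weights = np.exp(-(distances ** 2) / (2 * bandwidth ** 2))  # Gaussian kernel
    return np.dot(weights, y_train) / weights.sum()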
Distance Metrics
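For reference, a few distance metrics commonly paired with k-NN (Euclidean, Manhattan, and the general Minkowski form); these are standard definitions and not taken verbatim from the slide.

import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))             # L2 norm

def manhattan(a, b):
    return np.sum(np.abs(a - b))                     # L1 norm

def minkowski(a, b, p=3):
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)   # general Lp norm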
Nearest Neighbor: Dimensionality
 Problem with the Euclidean measure:
   High dimensional data
     • curse of dimensionality
   Can produce counter-intuitive results
   Shrinking density – sparsification effect

   111111111110 vs 011111111111: d = 1.4142
   100000000000 vs 000000000001: d = 1.4142
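Each pair above differs in exactly two bit positions, so both Euclidean distances come out to sqrt(2) ≈ 1.4142, even though the first pair shares ten 1-bits and the second pair shares none; a quick check:

import numpy as np

a1 = np.array([int(c) for c in "111111111110"])
a2 = np.array([int(c) for c in "011111111111"])
b1 = np.array([int(c) for c in "100000000000"])
b2 = np.array([int(c) for c in "000000000001"])

print(np.linalg.norm(a1 - a2))  # 1.4142...
print(np.linalg.norm(b1 - b2))  # 1.4142...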
Curse of Dimensionality
 If instances are defined in terms of a large number of attributes or features, it becomes difficult to define an appropriate similarity metric
 Some features are important while others are irrelevant
   Remove irrelevant features
 Feature reduction is very important
 Too many features (high dimension) lead to the curse of dimensionality
Nearest Neighbour: Computational Complexity
 Expensive
   To determine the nearest neighbour of a query point q, we must compute the distance to all N training examples
   + Pre-sort training examples into fast data structures (kd-trees, sketched below)
   + Compute only an approximate distance (LSH)
   + Remove redundant data
 Storage Requirements
   Must store all training data
   + Remove redundant data
   – Pre-sorting often increases the storage requirements
 High Dimensional Data
   “Curse of Dimensionality”
   • Required amount of training data increases exponentially with dimension
   • Computational cost also increases dramatically
   • Partitioning techniques degrade to linear search in high dimension
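A sketch of the kd-tree speed-up mentioned above, using SciPy's cKDTree; assuming SciPy is available, the tree is built once from the training data and then queried for the k nearest neighbours of each test point.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))   # N training points in 3-D

tree = cKDTree(X_train)             # pre-sort training data into a kd-tree
query = rng.random(3)
distances, indices = tree.query(query, k=5)   # k nearest neighbours of the query
print(indices, distances)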
Feature Reduction
 Features contain information about the target
 However, more features do not imply better discriminative power
 In the k-NN algorithm, irrelevant features introduce noise and fool the decision
 Redundant features decrease the performance
Reduction in Computational Complexity
 Reduce the size of the training set
   Feature Selection
   Feature Extraction
 Use geometric data structures for high dimensional search
Feature Selection
 Given a set of features F = {f1, f2, …, fN}, find a subset F′ ⊆ F that optimizes certain criteria
 What is to be optimized?
   Either improve or maintain classification accuracy
   Simplify classifier complexity
 For N-dimensional data, 2^N subsets are possible
   It is impractical to search all these subsets exhaustively
 Feature selection approaches
   Heuristic (forward and backward selection)
   Optimal (filter and wrapper)
   Randomized
Forward Selection
 Start with an empty feature set and add features one by one
 For each candidate feature, estimate the classification/regression error
 Select the feature that gives the maximum improvement
 Use a validation set, not the training set, for feature selection
 Stop when there is no significant improvement (see the sketch below)
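A sketch of greedy forward selection wrapped around a k-NN classifier; knn_error is a hypothetical helper that returns the validation error for a given feature subset, so the evaluation details are assumptions rather than anything specified on the slides.

def forward_selection(all_features, knn_error, tol=1e-3):
    """Greedily add the feature that most reduces validation error."""
    selected = []
    best_err = float("inf")
    while True:
        candidates = [f for f in all_features if f not in selected]
        if not candidates:
            break
        # Try adding each remaining feature and keep the best one
        errs = {f: knn_error(selected + [f]) for f in candidates}
        f_best = min(errs, key=errs.get)
        if best_err - errs[f_best] < tol:   # stop: no significant improvement
            break
        selected.append(f_best)
        best_err = errs[f_best]
    return selected

Backward selection works as the mirror image: start from the full set and repeatedly remove the feature whose removal increases the error least.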
Backward Feature Selection
 Start with the full feature set and then try removing features
 Identify the feature whose removal has the smallest impact on the error
 Drop the feature that does not contribute to the improvement
 Stop when there is no significant improvement
