K Nearest Neighbor Classification
Nearest Neighbor Classifiers
 Basic idea:
 If it walks like a duck and quacks like a duck, then it's probably a duck

[Figure: given a test record, compute its distance to the training records, then choose the k "nearest" records]
Basic Idea

 The k-NN classification rule is to assign to a test sample the majority category label of its k nearest training samples
 In practice, k is usually chosen to be odd, so as to avoid ties
 The k = 1 rule is generally called the nearest-neighbor classification rule
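
A minimal sketch of this rule in Python (assuming Euclidean distance and plain lists of feature vectors; names such as knn_classify are illustrative, not from the slides):

from collections import Counter
import math

def euclidean(a, b):
    # Straight-line distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(test_point, training_points, labels, k=3):
    # Rank all training samples by distance to the test sample
    order = sorted(
        range(len(training_points)),
        key=lambda i: euclidean(test_point, training_points[i]),
    )
    # Majority vote among the k nearest training samples
    k_nearest_labels = [labels[i] for i in order[:k]]
    return Counter(k_nearest_labels).most_common(1)[0][0]

# Example: two classes in 2-D
X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y = ["duck", "duck", "duck", "goose", "goose", "goose"]
print(knn_classify((2, 2), X, y, k=3))  # -> "duck"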
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor neighborhoods around a test point X]

 The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
Voronoi Diagram

Properties:
1) All possible points within a sample's Voronoi cell have that sample as their nearest neighbor
2) For any sample, the nearest sample is determined by the closest Voronoi cell edge
Other Distance Measures
 City-block distance (Manhattan distance)
 Sum of the absolute values of the differences
 Cosine similarity
 Measures the angle formed by the two samples (with the origin)
 Jaccard distance
 Determines the percentage of exact matches between the samples (not including unavailable data)
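
Illustrative implementations of these measures (a sketch; function names are mine, and the Jaccard variant follows the slide's matching-percentage description for nominal vectors rather than the set-based formula):

import math

def manhattan(a, b):
    # City-block distance: sum of absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle the two samples form with the origin
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def jaccard_distance(a, b):
    # Fraction of positions that do not match exactly,
    # skipping positions where either value is unavailable (None)
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    matches = sum(1 for x, y in pairs if x == y)
    return 1 - matches / len(pairs)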
Predicting Continuous Values

 For regression, replace the majority vote with a weighted average of the k nearest neighbors' target values:

   y(x) = ( Σi wi yi ) / ( Σi wi ),  summed over the k nearest neighbors

 Note: the unweighted average corresponds to wi = 1 for all i
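
A small sketch of this prediction rule (assuming inverse-distance weights wi = 1/d(x, xi), a common choice that the slide does not fix):

import math

def knn_regress(x, X_train, y_train, k=3):
    # Distance-weighted average of the k nearest neighbors' target values,
    # using w_i = 1 / d(x, x_i) (an assumed, common weighting scheme)
    dist = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    neighbors = sorted((dist(x, p), v) for p, v in zip(X_train, y_train))[:k]
    if neighbors[0][0] == 0:  # exact match in the training set
        return neighbors[0][1]
    weights = [1.0 / d for d, _ in neighbors]
    return sum(w * v for w, (d, v) in zip(weights, neighbors)) / sum(weights)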


Nearest-Neighbor Classifiers: Issues
– The value of k, the number of nearest neighbors to retrieve
– Choice of distance metric to compute distance between records
– Computational complexity
– Size of training set
– Dimension of data
Value of K
 Choosing the value of k:
 If k is too small, the classifier is sensitive to noise points
 If k is too large, the neighborhood may include points from other classes

Rule of thumb: k = sqrt(N), where N is the number of training points
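
A quick illustration of the rule of thumb; the nudge to an odd value is my own combination of this rule with the earlier slide's advice to avoid ties:

import math

def rule_of_thumb_k(n_train):
    # k ≈ sqrt(N), nudged to an odd value so majority votes cannot tie
    k = max(1, round(math.sqrt(n_train)))
    return k if k % 2 == 1 else k + 1

print(rule_of_thumb_k(100))  # 11  (sqrt(100) = 10, made odd)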
Distance Metrics
Distance Measure: Scale Effects

 Different features may have different measurement scales
 E.g., patient weight in kg (range [50, 200]) vs. blood protein values in ng/dL (range [-3, 3])
 Consequences:
 Patient weight will have a much greater influence on the distance between samples
 May bias the performance of the classifier
Standardization

 Transform raw feature values into z-scores:

   zij = (xij − μj) / σj

 xij is the value for the ith sample and jth feature
 μj is the average of all xij for feature j
 σj is the standard deviation of all xij over all input samples
 Range and scale of z-scores should be similar (provided the distributions of raw feature values are alike)
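
A small sketch of this transformation (assuming the data is a list of numeric feature vectors; NumPy is used for the column-wise statistics):

import numpy as np

def standardize(X):
    # Convert each column (feature) to z-scores: z_ij = (x_ij - mu_j) / sigma_j
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)       # per-feature average
    sigma = X.std(axis=0)     # per-feature standard deviation
    return (X - mu) / sigma

# Example: weight in kg vs. a protein value on a much smaller scale
X = [[70.0, 0.5], [120.0, -1.2], [95.0, 2.1]]
print(standardize(X))         # both columns now have mean 0, std 1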
Nearest Neighbor: Dimensionality
 Problem with Euclidean measure:
 High-dimensional data
 Curse of dimensionality
 Can produce counter-intuitive results
 Shrinking density – sparsification effect

[Figure: two pairs of high-dimensional binary vectors, one pair mostly 1s and the other mostly 0s; each pair differs in exactly two positions, so both pairs have the same Euclidean distance, d = 1.4142]
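
A quick numeric check of this effect (a sketch; the vectors are my own stand-ins for the slide's figure):

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A mostly-1 pair and a mostly-0 pair, each differing in exactly two positions
ones_a  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
ones_b  = [1, 0, 1, 1, 1, 1, 1, 1, 0, 1]
zeros_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
zeros_b = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]

print(euclidean(ones_a, ones_b))    # 1.4142...
print(euclidean(zeros_a, zeros_b))  # 1.4142... identical, despite very different vectors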
Distance for Nominal Attributes
Distance for Heterogeneous Data

Wilson, D. R. and Martinez, T. R., "Improved Heterogeneous Distance Functions," Journal of Artificial Intelligence Research, vol. 6, no. 1, pp. 1–34, 1997
Nearest Neighbour: Computational Complexity
 Expensive
 To determine the nearest neighbour of a query point q, must compute the distance to all N training examples
+ Pre-sort training examples into fast data structures (kd-trees)
+ Compute only an approximate distance (LSH)
+ Remove redundant data (condensing)
 Storage Requirements
 Must store all training data
+ Remove redundant data (condensing)
- Pre-sorting often increases the storage requirements
 High Dimensional Data
 "Curse of Dimensionality"
 Required amount of training data increases exponentially with dimension
 Computational cost also increases dramatically
 Partitioning techniques degrade to linear search in high dimensions
Reduction in Computational Complexity
 Reduce size of training set
 Condensation, editing
 Use geometric data structures for high-dimensional search
Condensation: Decision Regions

Each cell contains one sample, and every location within the cell is closer to that sample than to any other sample. A Voronoi diagram divides the space into such cells.

Every query point will be assigned the classification of the sample within that cell. The decision boundary separates the class regions based on the 1-NN decision rule. Knowledge of this boundary is sufficient to classify new points. The boundary itself is rarely computed; many algorithms seek to retain only those points necessary to generate an identical boundary.
Condensing

 Aim is to reduce the number of training samples
 Retain only the samples that are needed to define the decision boundary
 Decision Boundary Consistent – a subset whose nearest neighbour decision boundary is identical to the boundary of the entire training set
 Minimum Consistent Set – the smallest subset of the training data that correctly classifies all of the original training data

[Figure: original data, condensed data, and the Minimum Consistent Set]
Condensing

 Condensed Nearest Neighbour (CNN)
1. Initialize the subset with a single (or k) training example(s)
2. Classify all remaining samples using the subset, and transfer any incorrectly classified samples to the subset
3. Return to 2 until no transfers occurred or the subset is full

Properties of CNN:
• Incremental
• Order dependent
• Neither minimal nor decision-boundary consistent
• O(n³) for the brute-force method

A sketch of this procedure follows.
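
A minimal Python sketch of CNN condensing (assuming a 1-NN classifier over Euclidean distance; all names are illustrative):

import math

def _dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def _nn_label(x, subset_X, subset_y):
    # 1-NN label of x within the current subset
    i = min(range(len(subset_X)), key=lambda j: _dist(x, subset_X[j]))
    return subset_y[i]

def cnn_condense(X, y):
    # Condensed Nearest Neighbour: start from one example, then keep
    # transferring misclassified samples until a full pass makes no change.
    # The result depends on the order of X (order dependent).
    keep = [0]                                  # indices in the subset
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            sub_X = [X[j] for j in keep]
            sub_y = [y[j] for j in keep]
            if _nn_label(X[i], sub_X, sub_y) != y[i]:
                keep.append(i)
                changed = True
    return keep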
High Dimensional Search

 Given a point set and a nearest-neighbour query point:
 Find the points enclosed in a rectangle (range) around the query
 Perform a linear search for the nearest neighbour only within the rectangle

[Figure: a query point with a rectangular search range drawn around it]
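
A sketch of this range-then-linear-search idea (the box half-width "radius" is my own parameter; the slide does not specify how the rectangle is chosen):

import math

def range_then_linear(query, points, radius):
    # Keep only points inside an axis-aligned box around the query,
    # then do a linear nearest-neighbour scan within that box
    box = [p for p in points
           if all(abs(c - q) <= radius for c, q in zip(p, query))]
    return min(box, key=lambda p: math.dist(query, p), default=None)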
kd-tree: Data Structure for Range Search
 Index data into a tree
 Search on the tree
 Tree construction: at each level, use a different dimension to split

[Figure: a 2-D point set (A, B, C, D, E) partitioned first at x = 5 (x < 5 vs. x >= 5), then at y = 3 and y = 6, then at x = 6, with the corresponding tree]
kd-tree Example

[Figure: a kd-tree over 2-D points with vertical splits at x = 3, x = 5, x = 7, x = 8 and horizontal splits at y = 2, y = 5, y = 6]
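
A compact kd-tree sketch (my own illustrative implementation of the construction rule above: cycle the splitting dimension at each level; the nearest-neighbour search prunes subtrees whose splitting plane lies farther away than the best distance found so far):

import math

def build_kdtree(points, depth=0):
    # Split on a different dimension at each level (cycled)
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    # Depth-first search, pruning subtrees that cannot
    # contain a closer point than the current best
    if node is None:
        return best
    d = math.dist(query, node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if abs(diff) < best[0]:          # the splitting plane may hide a closer point
        best = nearest(far, query, best)
    return best

tree = build_kdtree([(3, 6), (7, 5), (5, 2), (8, 2), (5, 6), (7, 2)])
print(nearest(tree, (6, 5)))         # (distance, nearest point)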
KNN: Alternate Terminologies

 Instance-Based Learning
 Lazy Learning
 Case-Based Reasoning
 Exemplar-Based Learning
