
CSE445 NSU Week - 5

Nearest Neighbor Classifiers utilize labeled records and a proximity metric to classify unknown records based on the majority class of their nearest neighbors. The algorithm requires careful selection of the number of neighbors (k) and may involve distance metrics like Euclidean or Manhattan distance. Feature scaling is essential to ensure that no single attribute dominates the distance calculations, and various query types such as exact match, range, and nearest-neighbor queries are discussed.


Nearest Neighbor Classifiers

● Basic idea:
– If it walks like a duck, quacks like a duck, then it’s probably a duck

(Diagram: given an unknown test record, compute its distance to all training records, then choose the k “nearest” records.)
Nearest-Neighbor Classifiers
● Requires the following:
– A set of labeled records (supervised learning)
– A proximity metric to compute the distance/similarity between a pair of records
– e.g., Euclidean distance, Manhattan distance
– The value of k, the number of nearest neighbors to retrieve (how many neighbors to consider)
– A method for using the class labels of the k nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
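A minimal from-scratch sketch of these ingredients (Euclidean distance as the proximity metric, the k nearest records, and an unweighted majority vote); the toy records and labels are invented for illustration and are not from the slides:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training records."""
    # Proximity metric: Euclidean distance to every training record.
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Indices of the k nearest records.
    nearest = np.argsort(dists)[:k]
    # Majority vote over their class labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy labeled records (invented for illustration).
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array(["duck", "duck", "goose", "goose"])

print(knn_predict(X_train, y_train, np.array([1.2, 1.5]), k=3))  # -> "duck"
```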
How to Determine the class label of a Test Sample?

Example: unweighted (majority) voting, where each of the k nearest neighbors gets an equal vote
Query Types
• Exact match query: Asks for the object(s) whose key matches the query key exactly.

• Range query: Asks for the objects whose key lies in a specified query range (interval).

• Nearest-neighbor query: Asks for the objects whose key is “close” to the query key.
Exact Match Query
• Suppose that we store employee records in a database:
• Asks for the object(s) whose key matches the query key exactly.

(Table of employee records with columns: ID, Name, Age, Salary, #Children)

• Example:
• key=ID: retrieve the record with ID=12345
Range Query
• Example:
• key=Age: retrieve all records satisfying 20 < Age < 50
• key=#Children: retrieve all records satisfying 1 < #Children < 4

(Table of employee records with columns: ID, Name, Age, Salary, #Children)
Nearest-Neighbor(s) (NN) Query
• Example:
• key=Salary: retrieve the employee whose salary is closest to $50,000 (i.e., 1-NN).
• key=Age: retrieve the 5 employees whose ages are closest to 40 (i.e., k-NN, k=5).

(Table of employee records with columns: ID, Name, Age, Salary, #Children)
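A minimal sketch of the three query types over a small in-memory employee table; the records below are invented for illustration:

```python
# Toy employee records (invented): (ID, Name, Age, Salary, #Children)
employees = [
    (12345, "Alice", 34, 48000, 2),
    (20311, "Bob",   45, 52000, 1),
    (30972, "Carol", 29, 75000, 0),
    (40518, "Dave",  51, 41000, 3),
]

# Exact match query: key = ID, retrieve the record with ID = 12345.
exact = [e for e in employees if e[0] == 12345]

# Range query: key = Age, retrieve all records with 20 < Age < 50.
in_range = [e for e in employees if 20 < e[2] < 50]

# Nearest-neighbor query: key = Salary, the employee closest to $50,000 (1-NN).
nearest = min(employees, key=lambda e: abs(e[3] - 50000))

print(exact, in_range, nearest, sep="\n")
```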
Nearest Neighbor(s) Query
• What is the closest restaurant to my hotel?

Nearest Neighbor(s) Query (cont’d)
• Find the 4 closest restaurants to my hotel

Nearest Neighbor Query in High Dimensions
• Very important and practical problem!
• Image retrieval: represent each image as a feature vector (f1, f2, ..., fk) and find the N closest matches (i.e., the N nearest neighbors)
Nearest Neighbor Query in High Dimensions
• Face recognition: find the closest match (i.e., the nearest neighbor)
Interpreting Queries Geometrically
• Multi-dimensional keys can be thought of as “points” in high-dimensional spaces.

Queries about records → Queries about points
Example 1 – Range Search in 2D

age = 10,000 × year + 100 × month + day

Example 2 – Range Search in 3D
Example 3 – Nearest Neighbors Search

(Figure: a query point among the data points; measure the point-to-point distance.)
Classification…
● Data preprocessing is often required
– Attributes must be scaled to prevent distance measures from being dominated by one of the attributes
◆ Example:
– the height of a person may vary from 1.5 m to 1.8 m
– the weight of a person may vary from 90 lb to 300 lb
– the income of a person may vary from $10K to $1M
– Time series are often standardized to have zero mean and a standard deviation of 1

(Example figure: with k = 3, the predicted class is Y by unweighted majority voting.)
K-Nearest Neighbor (kNN)
● kNN is a lazy learner algorithm: it does not learn from the training set immediately; instead, it stores the dataset and performs the computation at classification time
● During the training phase, the kNN algorithm simply stores the entire training dataset as a reference
● Manhattan distance: the sum of absolute differences between points across all the dimensions
● Minkowski distance: the generalized form of the Euclidean and Manhattan distances
● When the order p is 1, the Minkowski and Manhattan distances are the same
● When the order p is 2, the Minkowski and Euclidean distances are the same
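A minimal sketch of these three distances, d(x, y) = (Σ|xᵢ − yᵢ|ᵖ)^(1/p), on a pair of points; the point values are invented for illustration:

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance: (sum_i |x_i - y_i|^p)^(1/p)."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

manhattan = np.sum(np.abs(x - y))          # sum of absolute differences
euclidean = np.sqrt(np.sum((x - y) ** 2))  # straight-line distance

print(manhattan, minkowski(x, y, p=1))     # identical values: order 1
print(euclidean, minkowski(x, y, p=2))     # identical values: order 2
```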
K-Nearest Neighbor (kNN)
● When making predictions, it calculates the distance between the input data point and all the training examples, using a chosen distance metric such as Euclidean distance.
● Next, the algorithm identifies the k nearest neighbors to the input data point based on their distances.
● In the case of classification, the algorithm assigns the most common class label among the k neighbors as the predicted label for the input data point.
● For regression, it calculates the average or weighted average of the target values of the k neighbors to predict the value for the input data point.
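A minimal sketch of the regression case using scikit-learn's KNeighborsRegressor; the toy 1-D data is invented for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D regression data (invented for illustration).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

# 'uniform' averages the k neighbors' targets; 'distance' weights them
# by inverse distance instead.
reg = KNeighborsRegressor(n_neighbors=3, weights="uniform")
reg.fit(X_train, y_train)

print(reg.predict([[2.5]]))  # mean of the 3 nearest targets
```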
Need for Standardizing Attributes: Feature Scaling

• Feature Scaling is a technique to standardize the independent features present in the data to a fixed range
• If feature scaling is NOT done, the LOAN feature will dominate all other features when predicting the class of a given data point
• Min-Max scaling: re-scales a feature to the range 0 to 1
• Standardization: re-scales data so that they have a mean of 0 and a standard deviation of 1
• Good for normally distributed features
• sklearn provides MinMaxScaler and StandardScaler
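A minimal sketch of both sklearn scalers on a toy feature matrix; the columns and values are invented for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data (invented): columns on very different scales, e.g. Age and Loan.
X = np.array([[25,  40000],
              [32,  60000],
              [47, 120000],
              [51,  25000]], dtype=float)

# Min-Max scaling: each column re-scaled to [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: each column re-scaled to mean 0, standard deviation 1.
X_std = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_std)
```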

Need for Standardizing Attributes

(Example figure: after Min-Max scaling, with k = 3 the predicted class is N by unweighted majority voting.)
Classification…
● Choosing the value of k:
– Min k = 1; Max k = # of samples
– If k is too small, the classifier is sensitive/vulnerable to noise points/outliers
– If k is too large, the neighborhood may include points from other classes; in the extreme (k = # of samples) it behaves like a ZeroR classifier, always predicting the majority class
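A common way to choose k in practice (not shown on the slides) is cross-validation; a minimal sketch using sklearn's GridSearchCV on the iris dataset, chosen just for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd values of k (odd values avoid ties in binary voting).
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # the k with the best cross-validated accuracy
print(search.best_score_)    # mean cross-validated accuracy for that k
```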
Hyperparameters
● sklearn.neighbors.KNeighborsClassifier
● class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)
● weights='uniform': uniform weights. All points in each neighborhood are weighted equally.
● weights='distance': weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors that are further away.
● algorithm: {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
● p: power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
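A minimal usage sketch of these parameters; the iris dataset and the pipeline with StandardScaler are choices made here for illustration, not part of the slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize features first (see the feature-scaling slides), then apply kNN
# with k = 5, inverse-distance voting, and Euclidean distance (p = 2).
clf = make_pipeline(StandardScaler(),
                    KNeighborsClassifier(n_neighbors=5, weights="distance", p=2))
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))   # accuracy on the held-out test set
print(clf.predict(X_test[:3]))     # predicted labels for a few test points
```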
Classification…
● How to handle missing values in training and test sets?
– Proximity computations normally require the presence of all attributes
– Some approaches use only the subset of attributes present in both instances
◆ This may not produce good results, since it effectively uses a different proximity measure for each pair of instances
◆ Thus, the proximities are not comparable
K-NN Classifiers…
Handling Irrelevant and Redundant Attributes
– Irrelevant attributes add noise to the proximity measure
– Redundant attributes bias the proximity measure towards certain attributes
– Example: BMI and weight are redundant, since BMI is derived from weight (and height)

– kNN is slow at testing time, since prediction requires computing distances to the training records
Improving KNN Efficiency
● Avoid having to compute the distance to all objects in the training set
– Multi-dimensional access methods (k-d trees)
– Fast approximate similarity search
– Locality Sensitive Hashing (LSH)
● Condensing
– Determine a smaller set of objects that gives the same performance
● Editing
– Remove objects to improve efficiency
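A minimal sketch of avoiding brute-force distance computation with a k-d tree, assuming scikit-learn's KDTree; the data and query point are randomly generated for illustration:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.random((1000, 3))        # 1000 training points in 3 dimensions

tree = KDTree(X, leaf_size=30)   # build the k-d tree once

query = rng.random((1, 3))       # one query point
dist, ind = tree.query(query, k=5)   # distances and indices of the 5 nearest

print(ind[0])    # indices of the 5 nearest training points
print(dist[0])   # their distances to the query
```

The same effect can be had through KNeighborsClassifier(algorithm='kd_tree') or algorithm='ball_tree' instead of the brute-force search.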
