
Nearest Neighbors Methods

The KNN Algorithm

Agha Ali Raza

CS535/EE514 – Machine Learning


Sources
• Machine Learning for Intelligent Systems, Kilian Weinberger, Cornell University, Lecture 2,
https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote02_kNN.html
• Nearest Neighbor Methods, Victor Lavrenko, University of Edinburgh,
https://www.youtube.com/playlist?list=PLBv09BD7ez_48heon5Az-TsyoXVYOJtDZ
• Wikipedia, K-nearest neighbors algorithm: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
• Effects of Distance Measure Choice on KNN Classifier Performance - A Review, V. B. Surya Prasath et al.,
https://arxiv.org/pdf/1708.04321.pdf
• A Comparative Analysis of Similarity Measures to find Coherent Documents, Mausumi Goswami et al.,
http://www.ijamtes.org/gallery/101.%20nov%20ijmte%20-%20as.pdf
• A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data, Ali Seyed Shirkhorshidi et al.,
https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0144059&type=printable
The K Nearest Neighbors Algorithm
Basic idea: similar inputs have similar outputs
Classification rule:
For a test input 𝑥, assign the most common label
amongst its 𝑘 most similar (nearest) training inputs
Formal Definition
Let 𝑥 be our test point, and let us denote the set of the 𝑘 nearest neighbors
of 𝑥 as 𝑆𝑥.
Formally, 𝑆𝑥 is defined as
$$S_x \subseteq D \quad \text{s.t.} \quad |S_x| = k,$$
$$\text{and} \quad \forall\,(x', y') \in D \setminus S_x:\quad \mathrm{dist}(x, x') \,\geq \max_{(x'', y'') \in S_x} \mathrm{dist}(x, x'').$$
That is, every point that is in 𝐷 but not in 𝑆𝑥 is at least as far away from 𝑥 as the
furthest point in 𝑆𝑥 .
We define the classifier ℎ() as a function returning the most common label in
𝑆𝑥 :
$$h(x) = \mathrm{mode}(\{y'' : (x'', y'') \in S_x\}),$$
where mode(⋅) selects the label that occurs most often.
So, what do we do if there is a draw?
• Keep 𝑘 odd or return the result of 𝑘-NN with a smaller 𝑘
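Putting the definition and the tie-break together, the rule fits in a few lines of NumPy. This is a minimal sketch, not code from the lecture: the names knn_classify, X_train, and y_train are illustrative, and Euclidean distance is assumed for dist(⋅, ⋅).

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k):
    """h(x): the most common label among the k nearest neighbors of x."""
    # dist(x, x') for every training point (Euclidean distance assumed).
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest training points: the set S_x.
    nearest = np.argsort(dists)[:k]
    # mode({y'' : (x'', y'') in S_x}), counted from most to least frequent.
    counts = Counter(y_train[i] for i in nearest).most_common()
    # Tie-break as suggested above: fall back to a smaller k.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return knn_classify(x, X_train, y_train, k - 1)
    return counts[0][0]
```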
KNN Decision Boundary
[Figure: Voronoi tessellation of the training points and the resulting KNN decision boundary, shown for K = 1]
The KNN Algorithm is:
A supervised, non-parametric algorithm
• It neither makes assumptions about the underlying data distribution nor tries to estimate it
• There are no parameters to train, unlike in Logistic/Linear Regression or Naive Bayes
o Parameters allow models to make predictions
• There is a single hyperparameter, 𝑘, that needs to be tuned (see the sketch below)
o Hyperparameters guide the learning/prediction process
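Since 𝑘 is the only hyperparameter, it is typically tuned on held-out data. Below is a hypothetical sketch assuming scikit-learn is available; the Iris dataset and the candidate values 1, 3, ..., 19 are arbitrary choices for illustration.

```python
# Hypothetical sketch: choose k by 5-fold cross-validation (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = {}
for k in range(1, 21, 2):  # odd k values also reduce the chance of ties
    scores[k] = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, accuracy = {scores[best_k]:.3f}")
```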
Used for classification and regression
• Classification: Choose the most frequent class label amongst k-nearest neighbors
• Regression: Take an average over the output values of the 𝑘 nearest neighbors and
assign it to the test point; the average may be weighted, e.g. 𝑤 = 1/𝑑 (𝑑: distance from 𝑥)
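A corresponding sketch for the regression case with the inverse-distance weighting 𝑤 = 1/𝑑 mentioned above; knn_regress and the eps guard against division by zero are illustrative additions, not from the slides.

```python
import numpy as np

def knn_regress(x, X_train, y_train, k, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest output values."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # w = 1/d; eps avoids division by zero if x coincides with a training point.
    weights = 1.0 / (dists[nearest] + eps)
    return np.average(y_train[nearest], weights=weights)
```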
An Instance-based learning algorithm
• Instead of performing explicit generalization, form hypotheses by comparing new
problem instances with training instances
• (+) Can easily adapt to unseen data
• (-) Complexity of prediction is a function of 𝑛 (size of training data)
A lazy learning algorithm
• Delay computations on training data until a query is made, as opposed to eager
learning
• (+) Good for continuously updated training data like recommender systems
• (-) Slower to evaluate and needs to store the whole training set
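The lazy/eager distinction is visible in, for example, scikit-learn's KNeighborsClassifier: fit() essentially stores (or indexes) the training set, while the distance computations happen only when predict() is called. The toy data below is illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)                      # cheap: mostly stores/indexes the data
print(model.predict([[1.2, 1.1]]))   # the distance search happens here -> [0]
```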
For more details, please visit

http://aghaaliraza.com

Thank you!
