0% found this document useful (0 votes)

24 views9 pages

K NN Annotated Slides

The document discusses the k-nearest neighbors (k-NN) machine learning algorithm. It provides a formal definition of k-NN, explains how it works using an example, and discusses factors like the choice of k, distance metrics, and decision boundaries. It also briefly introduces the concept of a Bayes optimal predictor.

Uploaded by

Dejene Chala

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views9 pages

K NN Annotated Slides

Uploaded by

Dejene Chala

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

k-Nearest Neighbors 1. The k-NN Algorithm

2. Why/When does K-NN work

September 11th, 2023

3. Curse of dimensionality (i.e., when it can fail)

1/33 2/33

Announcement Plan

1. The k-NN Algorithm

• Please ensure that Anaconda and Jupyter Notebook are installed
on your laptop before our class meeting this Wednesday.
• Additionally, remember to bring your laptop with you to class on 2. Why/When does K-NN work
that day.
3. Curse of dimensionality (i.e., when it can fail)

3/33 4/33
Introduction The k-NN Algorithm - Formal Deﬁnition

• Assumption: Similar Inputs have similar outputs

• Classification rule: For a test input x, assign the most
common label amongst its k most similar training inputs.
• Amongst the simplest of all machine learning algorithms. No • Formal definition of k-NN:
• Test point: x
explicit training or model. • Sx = set of k nearest neighbors of x.
• Can be used both for classification and regression. • Sx ⊆ D s.t. |Sx | = k
• Use x’s k-Nearest Neighbors to vote on what k’s label should be. • ∀(x , y ) ∈ D \ Sx , d(x, x ) ≥ max (x ,y )
• Define
• Is considered a nonparametric model. • Classification:

fˆ(x) = mode ({y : (x , y ) ∈ Sx }) (1)

• Regression:
1
fˆ(x) = y (2)
k
(x ,y )∈Sx

5/33 6/33

The k-NN Algorithm The k-NN Algorithm - Example

Input: classiﬁcation training dataset D = {(x 1 , y1 ), . . . , (x n , yn )}, 3-NN for binary classiﬁcation using Euclidean distance
and parameter k ∈ N+ , and a distance metric d (x, x )(e.g.
x − x 2 , euclidean distance)

k-NN Algorithm
Store all training data
For any test point x:
1. Find its top k nearest neighbors (under metric d)
2. Return the most common label among these k neighbors (If for
regression, return the average value of the k neighbors)

7/33 8/33
The choice of metric The choice of metric

• We believe our metric d captures similarities between examples:

Examples that are close to each other share similar labels
• The k-nearest neighbor classifier fundamentally relies on a
distance metric. This Minkowski distance definition is pretty general and
→ The better that metric reflects label similarity, the better the contains many well-known distances as special cases. Can you
classified will be. identify the following candidates?
• There are many distance metrics or measures we can use to
select k-nearest neighbors. 1. p = 1
• There is no “best" distance measure, and the choice is highly
2. p = 2
context- or problem-dependent.
2. p → ∞
• The most common choice is the Minkowski distance
p 1/p
d (x, z) = |xr − zr | p

r =1

9/33 10/33

The choice of metric The choice of metric

This Minkowski distance deﬁnition is pretty general and

contains many well-known distances as special cases. Can you Remark: The NN classiﬁer is still widely used today, but often
identify the following candidates? with learned metrics. For more information on metric learning
check out the Large Margin NearestNeighbors (LMNN)
1. p = 1: Manhattan distance algorithm to learn a pseudo-metric (nowadays also known
2. p = 2: Euclidean distance asthe triplet loss) or FaceNet for face veriﬁcation.
2. p → ∞: Chebyshev distance

11/33 12/33
The choice of k The choice of k

1. What if we set k very large?

1. What if we set k very large? Top k-neighbors will include examples that are very far
away...

2. What if we set k very small (k = 1)? 2. What if we set k very small (k = 1)?

label has noise (easily overﬁt to the noise)

3. What about the training error when k = 1?
3. What about the training error when k = 1?

training error = 0)

13/33 14/33

1-Nearest Neighbors Decision Boundary 1-Nearest Neighbors Decision Boundary (Cont)

• Assuming a Euclidean distance metric, the decision boundary

between any two training examples a and b is a straight line.
• If a query point is located on the decision boundary, this means
its equidistant from both training example a and b.
• While the decision boundary between a pair of points is a
straight line, the decision boundary of the 1-NN model on a
global level, considering the whole training set, is a set of
connected, convex polyhedra.
• This partitioning of regions on a plane in 2D is also called
“Voronoi diagram".

15/33 16/33
1-Nearest Neighbors Decision Boundary (Cont) Plan

1. The k-NN Algorithm

2. Why/When does K-NN work

3. Curse of dimensionality (i.e., when it can fail)

17/33 18/33

Bayes Optimal Predictor Bayes Optimal Predictor

• Assume our data is collected in an i.i.d fashion, i.e., X , Y ∼ P

(say y ∈ {−1, 1})

• Assume our data is collected in an i.i.d fashion, i.e., X , Y ∼ P

• Assume we know P for now.
(say y ∈ {−1, 1})
• Question: what label you would predict?
• Assume we know P for now.
Answer: we will simply predict the most-likely label,
• Question: what label you would predict?

hopt (x) = arg max P(y |x)

Bayes optimal predictor

19/33 20/33
Bayes Optimal Predictor Bayes Optimal Predictor

• Assume our data is collected in an i.i.d fashion, i.e., X , Y ∼ P • Assume our data is collected in an i.i.d fashion, i.e., X , Y ∼ P
(say y ∈ {−1, 1}) (say y ∈ {−1, 1})
• Bayes optimal predictor: hopt (x) = arg maxy P(y |x) • Bayes optimal predictor: hopt (x) = arg maxy P(y |x)
• Example: • Example:
Question: What’s the Question: What’s the
P(+1|x) = 0.8 P(+1|x) = 0.8 probability of hopt making a
probability of hopt making a
P(−1|x) = 0.2 mistake on x? P(−1|x) = 0.2 mistake on x?
Answer:
yb = hopt (x) = 1 yb = hopt (x) = 1
BayesOpt = 1 − P(yb |x) =0.2

21/33 22/33

Guarantee of k-NN when k = 1 and n → ∞ Guarantee of k-NN when k = 1 and n → ∞

• Assume x ∈ [−1, 1]2 , P(x) has support everywhere,

P(x) > 0, ∀x ∈ [−1, 1]2
• What does it look when n → ∞?
• Assume x ∈ [−1, 1]2 , P(x) has support everywhere,
P(x) > 0, ∀x ∈ [−1, 1]2
• What does it look when n → ∞?

23/33 24/33
Guarantee of k-NN when k = 1 and n → ∞ Guarantee of k-NN when k = 1 and n → ∞

• Assume x ∈ [−1, 1]2 , P(x) has support everywhere,

P(x) > 0, ∀x ∈ [−1, 1]2
• What does it look when n → ∞?

• Cover and Hart “ as n → ∞, 1-NN prediction error is no more

than twice of the error of the Bayes optimal classiﬁer"

Given test x, as n → ∞ its nearest neighbor x NN is super

close, i.e., d (x, x NN ) → 0

25/33 26/33

Plan Curse of Dimensionality Explanation

• In high dimensional spaces, points that are drawn from a

probability distribution, tend to never be close together
• Example: Considering applying a k-NN classiﬁer to data where
1. The k-NN Algorithm the inputs are uniformly distributed in the p-dimensional unit
cube [0, 1]p . Consider a test point x ∈ [0, 1]p and the k = 10
nearest neighbors of such a test point.
2. Why/When does K-NN work

3. Curse of dimensionality (i.e., when it can fail)

Let l = the edge length of the smallest hyper-cube that contains

1
k k p
all k-nearest neighbor of a test point. → l p ≈ or l ≈
n n

27/33 28/33
Curse of Dimensionality Explanation The distance between two sampled points increases as p grows

• Example (Cont):
If n = 1000, how big is l ?
In [0, 1]p , we
uniformly sample
p l
two points x, x ,
2 0.100000
calculate
10 0.630957
d (x, x ) x − x 2
100 0.954993
1000 0.995405 Let’s plot the
• If p 0 almost the entire space is needed to ﬁnd the 10-NN → distribution of such
This breaks down the k−NN assumptions. distance
• Question: Could we increase the number of data points, n, until
the nearest neighbors are truly close to the test point. How
many data points would we need such that l becomes truly
small?
Distance increases as p → ∞
29/33 30/33

Data with low dimensional structure Strengths, Weakeness

• Strength
• k-NN: the simplest ML algorithm (very good baseline, should
always try!)
• It is very easy to understand and often gives reasonable
performance without a lot of adjustment.
→ It is a good baseline method to try before considering more
advanced techniques
• No training involved (“lazy"). New training examples can be
added easily
• Works well when data is low-dimensional (e.g., can compare
against the Bayes optimal)

31/33 32/33
Strengths, Weakeness

• Weaknesses
• Data needs to be pre-processed.
• Suﬀer when data is high-dimensional, due to the fact that in
highdimension space, data tends to spread far away from each
other
• It is expensive and slow: to determine the nearest neighbor of a
new point x, must compute the distance to all n training
examples

33/33

Notes 02
No ratings yet
Notes 02
79 pages
Instance Based Learning
No ratings yet
Instance Based Learning
20 pages
05 KNN
No ratings yet
05 KNN
49 pages
KNN CIML
No ratings yet
KNN CIML
12 pages
KNN Dan KMeans
No ratings yet
KNN Dan KMeans
37 pages
K-Nearest Neighbors Algorithm
No ratings yet
K-Nearest Neighbors Algorithm
11 pages
K-Nearest Neighbors
No ratings yet
K-Nearest Neighbors
35 pages
Lecture 14 and 15
No ratings yet
Lecture 14 and 15
42 pages
2EL1730-ML-Lecture04-Non Parametric Learning and Nearest Neighbor
No ratings yet
2EL1730-ML-Lecture04-Non Parametric Learning and Nearest Neighbor
47 pages
Lecture 3
No ratings yet
Lecture 3
17 pages
Machine Learning Lecture 02
No ratings yet
Machine Learning Lecture 02
25 pages
ML Assignment No. 3: 3.1 Title
No ratings yet
ML Assignment No. 3: 3.1 Title
6 pages
Lecture 07 Slides
No ratings yet
Lecture 07 Slides
45 pages
K-Nearest Neighbor
No ratings yet
K-Nearest Neighbor
22 pages
K-Nearest Neighbors Algorithm - Wikipedia
No ratings yet
K-Nearest Neighbors Algorithm - Wikipedia
10 pages
K - Nearest Neighbor
No ratings yet
K - Nearest Neighbor
22 pages
ML DSBA Lab4
No ratings yet
ML DSBA Lab4
5 pages
The Nearest Neighbour Algorithm
No ratings yet
The Nearest Neighbour Algorithm
3 pages
cs4302 Lecture2
No ratings yet
cs4302 Lecture2
40 pages
KNN Algorithm
No ratings yet
KNN Algorithm
3 pages
ML Assignment No. 3: 3.1 Title
No ratings yet
ML Assignment No. 3: 3.1 Title
6 pages
KNN
No ratings yet
KNN
53 pages
ML KN
No ratings yet
ML KN
12 pages
Wikipedia K Nearest Neighbor Algorithm
No ratings yet
Wikipedia K Nearest Neighbor Algorithm
4 pages
Week 3. K-Nearest Neighbours (KNN) : Dr. Shuo Wang
No ratings yet
Week 3. K-Nearest Neighbours (KNN) : Dr. Shuo Wang
18 pages
Lect-06
No ratings yet
Lect-06
26 pages
12 - 23ECE216 - Nearest Neighbors
No ratings yet
12 - 23ECE216 - Nearest Neighbors
29 pages
L3 KNN
No ratings yet
L3 KNN
17 pages
ML Lec07 KNN
100% (2)
ML Lec07 KNN
37 pages
What Is KNN
No ratings yet
What Is KNN
9 pages
KNN 2
No ratings yet
KNN 2
53 pages
Experiment No 7 ML
No ratings yet
Experiment No 7 ML
4 pages
KNN Presentation
No ratings yet
KNN Presentation
19 pages
KNN - Feb 19
No ratings yet
KNN - Feb 19
42 pages
S3 K Nearest Neighbor LKW 15jan2025
No ratings yet
S3 K Nearest Neighbor LKW 15jan2025
16 pages
K-Nearest Neighbour Classifiers
No ratings yet
K-Nearest Neighbour Classifiers
18 pages
Textbook ML - Removed
No ratings yet
Textbook ML - Removed
10 pages
4K-Nearest Neighbor
No ratings yet
4K-Nearest Neighbor
38 pages
08 Kmethods3 Curse Deminsionality
No ratings yet
08 Kmethods3 Curse Deminsionality
44 pages
Ch2 - Lec2 - K Nearest Neighbour (KNN)
No ratings yet
Ch2 - Lec2 - K Nearest Neighbour (KNN)
18 pages
L5-KNN
No ratings yet
L5-KNN
23 pages
Part A 3. KNN Classification
No ratings yet
Part A 3. KNN Classification
35 pages
KNN HMM
No ratings yet
KNN HMM
51 pages
Example 1: Riding Mowers
No ratings yet
Example 1: Riding Mowers
6 pages
Lecture-11-KNearest Clustering-Part-1
No ratings yet
Lecture-11-KNearest Clustering-Part-1
18 pages
Lecture8 KNN1
No ratings yet
Lecture8 KNN1
16 pages
14 K - Nearest Neighbours
No ratings yet
14 K - Nearest Neighbours
8 pages
Unit 5 ML
No ratings yet
Unit 5 ML
13 pages
K-Nearest Neighbor Learning
No ratings yet
K-Nearest Neighbor Learning
31 pages
20180723161729D4730 - Pert18 - K-Nearest Neighbor
No ratings yet
20180723161729D4730 - Pert18 - K-Nearest Neighbor
22 pages
Dr. S. Vairachilai Department of CSE CVR College of Engineering Mangalpalli Telangana
No ratings yet
Dr. S. Vairachilai Department of CSE CVR College of Engineering Mangalpalli Telangana
18 pages
KNN Presentation
No ratings yet
KNN Presentation
16 pages
ML Lecture 13 KNN
No ratings yet
ML Lecture 13 KNN
14 pages
5c. Nearest Neighbour Classifier
No ratings yet
5c. Nearest Neighbour Classifier
2 pages
Lecture 3 - KNN Algorithm
No ratings yet
Lecture 3 - KNN Algorithm
28 pages
12 ML KNN
No ratings yet
12 ML KNN
28 pages
KNN Algorithm
No ratings yet
KNN Algorithm
16 pages
K-Nearest Neighbor (KNN) : Non-Parametric Algorithm
No ratings yet
K-Nearest Neighbor (KNN) : Non-Parametric Algorithm
7 pages
Patern Recogniton Part 2
No ratings yet
Patern Recogniton Part 2
4 pages