What Is KNN?
1. KNN, also called K-Nearest Neighbour, is a supervised machine learning algorithm that can be used for both classification and regression problems.
2. The KNN algorithm uses ‘feature similarity’ to predict the value of any new data point: the new point is assigned a value based on how closely it resembles the points in the training set.
Example:
Suppose there are two categories, Category A and Category B, and we have a new data point (shown in orange). In which of these categories will the new point lie? To classify it, we can use the KNN algorithm, which observes the behaviour of the nearest points and assigns the new point to a category accordingly.
3. K-Nearest Neighbour is non-parametric, i.e. it does not make any assumptions about the underlying data distribution.
(Parametric: whenever you make an assumption about the nature of the function underlying your data, that algorithm is parametric. A parametric algorithm has a fixed number of parameters that does not depend on the number of rows in your data, i.e. no matter how much data you throw at a parametric model, it won’t change its mind about how many parameters it needs. Linear regression is a good example of parametric ML: while doing linear regression you assume the function is a line, and the number of coefficients is also fixed, namely the slope and the intercept.
On the contrary, if you don’t make any such assumption, the algorithm is non-parametric. Non-parametric algorithms also have parameters; it’s just that these change, or rather grow, with the number of rows in the data, e.g. decision trees and KNN.)
4. K-Nearest Neighbour is also termed a lazy algorithm, as it does not learn during the training phase; it merely stores the data points and does the actual work during the testing (prediction) phase.
5. It is a distance-based algorithm.
We can use Euclidean or Manhattan distance.
The Euclidean distance is the straight-line distance between two points, which we have already studied in geometry. Consider points A(x1, y1) and B(x2, y2), where (x1, y1) is the observed value and (x2, y2) is the actual value. It can be calculated as:

Euclidean distance = √((x2 − x1)² + (y2 − y1)²)

The Manhattan distance is instead the sum of the absolute differences of the coordinates: |x2 − x1| + |y2 − y1|.
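As a quick sketch, both distance measures can be written in plain Python; the coordinate tuples below are illustrative:

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two points of any dimension
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    # "City block" distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

print(euclidean_distance((1, 2), (4, 6)))  # 5.0
print(manhattan_distance((1, 2), (4, 6)))  # 7
```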
Steps to perform K-NN
• Choose the K value.
• Calculate the distance between the new data point and all the training points.
• Sort the computed distances in ascending order.
• Pick the first K distances from the sorted list.
• Take the mode (for classes) or mean (for values) of the K neighbours associated with those distances; a sketch of these steps follows below.
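A minimal Python sketch of these steps (knn_predict and the small dataset are hypothetical, not from any particular library):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, new_point, k=5):
    # Steps 1-2: compute the distance from the new point to every training point
    distances = [
        (math.dist(x, new_point), label)  # math.dist is the Euclidean distance
        for x, label in zip(X_train, y_train)
    ]
    # Step 3: sort the distances in ascending order
    distances.sort(key=lambda pair: pair[0])
    # Step 4: keep only the first K neighbours
    k_labels = [label for _, label in distances[:k]]
    # Step 5: mode of the neighbours' classes (use a mean instead for regression)
    return Counter(k_labels).most_common(1)[0][0]

# Illustrative (height, weight) -> size data
X_train = [(158, 58), (160, 60), (163, 61), (165, 64), (170, 68)]
y_train = ["M", "M", "M", "L", "L"]
print(knn_predict(X_train, y_train, (161, 61), k=3))  # "M"
```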
How does KNN work for ‘Classification’ and ‘Regression’ problem statements?
Classification: When the problem statement is of the ‘classification’ type, KNN uses the concept of “Majority Voting”: among the K nearest neighbours, the class with the most votes is chosen.
Suppose we have the height, weight, and T-shirt size of some customers, and we need to predict the T-shirt size of a new customer given only their height and weight.
The height, weight, and T-shirt size data are shown below –
A new customer named 'Monica' has a height of 161 cm and weighs 61 kg, so what will be her T-shirt size?
After standardization, the 5th closest value changed, because height was dominating the distance calculation before standardization. Hence, it is important to standardize the predictors before running the K-Nearest Neighbour algorithm.
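A minimal standardization sketch with scikit-learn's StandardScaler; the height/weight rows are illustrative stand-ins, not the actual table:

```python
from sklearn.preprocessing import StandardScaler

# Illustrative height (cm) and weight (kg) rows
X = [[158, 58], [160, 60], [163, 61], [165, 64], [170, 68]]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has mean 0 and unit variance
print(X_scaled)
```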
We have plotted the above information. In the graph below, 'Medium' T-shirt size is in blue and 'Large' T-shirt size is in orange, and the new customer is shown as a yellow circle. Four blue data points and one orange data point are close to the yellow circle, so by majority vote the prediction for the new customer is the Medium T-shirt size.
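Putting the pieces together, an end-to-end sketch with scikit-learn might look like this (the training rows are again illustrative, not the original table):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Illustrative (height_cm, weight_kg) -> T-shirt size data
X = [[158, 58], [160, 60], [163, 61], [165, 64], [168, 66], [170, 68]]
y = ["M", "M", "M", "L", "L", "L"]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = KNeighborsClassifier(n_neighbors=5)  # majority vote over 5 neighbours
model.fit(X_scaled, y)

# Predict Monica's size from her standardized height and weight
monica = scaler.transform([[161, 61]])
print(model.predict(monica))  # e.g. ['M']
```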
Regression:
For regression problem statements, KNN employs a mean/average method to predict the value of a new data point: based on the value of K, it averages the values of the K nearest neighbours.
As in the previous classification example, where we took the mode of the 5 nearest neighbours because the target variable was categorical, in regression we take the mean or median of the 5 nearest neighbours because the target variable is continuous.
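A small regression sketch with scikit-learn; the one-feature dataset is illustrative:

```python
from sklearn.neighbors import KNeighborsRegressor

# Illustrative single-feature training data with a continuous target
X = [[1], [2], [3], [4], [5]]
y = [1.0, 2.1, 2.9, 4.2, 5.1]

model = KNeighborsRegressor(n_neighbors=3)  # prediction = mean of the 3 nearest targets
model.fit(X, y)

# The 3 nearest neighbours of 3.2 are x = 3, 4, 2,
# so the prediction is (2.9 + 4.2 + 2.1) / 3
print(model.predict([[3.2]]))
```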
How to choose the value of K?
1. Larger K value: the case of underfitting occurs when the value of K is increased too far. In this case, the model is unable to learn the training data correctly.
2. Smaller K value: the condition of overfitting occurs when the value of K is too small. The model captures all of the training data, including noise, and then performs poorly on the test data.
3. We should not use even values of K in binary classification problems. Suppose we choose K = 4 and the 4 neighbouring points are evenly distributed among the classes, i.e. 2 data points belong to category 1 and 2 data points belong to category 2. In that case, the data point cannot be classified because there is a tie between the classes.
4. Plot the elbow curve of the error rate against different K values, and choose the K value at the elbow, i.e. the point after which the error rate stops dropping sharply, as sketched below.
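A sketch of such an elbow plot, assuming a feature matrix X and labels y have already been loaded:

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumes X (features) and y (labels) are defined elsewhere
k_values = range(1, 31)
error_rates = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    accuracy = cross_val_score(model, X, y, cv=5).mean()
    error_rates.append(1 - accuracy)  # error rate = 1 - mean CV accuracy

plt.plot(k_values, error_rates, marker="o")
plt.xlabel("K")
plt.ylabel("Error rate")
plt.title("Elbow curve for choosing K")
plt.show()
```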
Impact of Imbalanced dataset and Outliers on KNN
Imbalanced dataset~
When dealing with an imbalanced dataset, the model becomes biased. Consider the example shown in the diagram below, where the “Yes” class is more prominent.
As a consequence, the bulk of the nearest neighbours of a new point will come from the dominant class. Because of this, we must balance our dataset using either an “Upscaling” (oversampling) or “Downscaling” (undersampling) strategy; one sketch of upscaling follows below.
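One way to sketch the upscaling step, assuming a hypothetical pandas DataFrame df with a binary 'label' column where “Yes” dominates:

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical DataFrame `df` with a 'label' column dominated by "Yes"
majority = df[df["label"] == "Yes"]
minority = df[df["label"] == "No"]

# Upscaling: resample the minority class with replacement up to the majority size
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
df_balanced = pd.concat([majority, minority_up])
```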
Outliers~
Outliers are points that differ significantly from the rest of the data points.
Outliers impact the classification/prediction of the model. According to the following diagram, the appropriate class for the new data point should be “Category B”, shown in green. Due to the presence of outliers, however, the model is unable to make the appropriate classification. As a result, removing outliers before using KNN is recommended; a simple IQR-based filter is sketched below.
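A minimal sketch of such a filter, assuming the features are in a NumPy array (the sample rows are illustrative):

```python
import numpy as np

def remove_outliers_iqr(X):
    # Keep only rows whose every feature lies within 1.5 * IQR of that feature
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    mask = np.all((X >= lower) & (X <= upper), axis=1)
    return X[mask]

X = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.2], [10.0, 2.0]])
print(remove_outliers_iqr(X))  # the (10.0, 2.0) row is dropped
```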