K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is a simple, non-parametric machine learning algorithm used for classification and regression by predicting based on the K nearest data points. The algorithm involves selecting K, calculating distances, sorting neighbors, and making predictions based on majority class or average value. Choosing the optimal K can be guided by heuristics, odd/even considerations, and cross-validation for best accuracy.


K-Nearest Neighbors (KNN)

• K-Nearest Neighbors (KNN) is one of the simplest and most effective machine
learning algorithms used for classification and regression tasks.
• It is a non-parametric, lazy learning algorithm, meaning it makes no
assumptions about the underlying data distribution and does not learn a model
during training.
How Does KNN Work?
• KNN works by finding the K nearest data points in
the training set for a given query point and making
predictions based on their majority class (for
classification) or average value (for regression).
Steps of the KNN Algorithm:
• Choose the value of K (number of neighbors).
• Calculate the distance between the query point and all data points in the training set.
• Sort the distances and select the K nearest neighbors.
• For classification, assign the most common class among the K neighbors.
• For regression, take the average (or weighted average) of the K nearest neighbors (a minimal code sketch follows this list).
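
As a concrete illustration of these steps, here is a minimal from-scratch sketch in Python, using only NumPy; the function and variable names are ours, not from the slides:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training point.
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # Step 3: sort and keep the indices of the k nearest neighbors.
    nearest = np.argsort(distances)[:k]
    # Step 4: majority class among the k neighbors (classification).
    # For regression, you would instead return y_train[nearest].mean().
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny illustrative example: two clusters of 2-D points.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 7.5]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([2.0, 1.5]), k=3))  # -> "A"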
Distance Metrics in KNN
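
The metrics most commonly used with KNN are Euclidean distance, d(x, y) = √Σᵢ(xᵢ − yᵢ)², and Manhattan distance, d(x, y) = Σᵢ|xᵢ − yᵢ|. A minimal NumPy sketch of both (the function names are illustrative):

import numpy as np

def euclidean(x, y):
    """Straight-line (L2) distance between two feature vectors."""
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    """City-block (L1) distance between two feature vectors."""
    return np.sum(np.abs(x - y))

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7.0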
Problem Statement:
We have a dataset of students' heights and weights. We want to classify a
new student into one of two categories: "Underweight" or "Healthy."
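
The slides do not include the actual measurements, so the data below is purely hypothetical; the sketch uses scikit-learn's KNeighborsClassifier to make the workflow concrete:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical (height cm, weight kg) pairs -- not from the original slides.
X = np.array([[158, 45], [160, 48], [163, 50], [165, 62], [170, 68], [175, 72]])
y = np.array(["Underweight", "Underweight", "Underweight",
              "Healthy", "Healthy", "Healthy"])

# In practice you would scale the features first (e.g., StandardScaler),
# since KNN is distance-based and height/weight are on different scales.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

new_student = np.array([[161, 47]])
print(model.predict(new_student))  # -> ['Underweight'] for this toy data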
Choosing the optimal value of K in K-Nearest Neighbors (KNN)
• 1. Rule of Thumb for Choosing K

• A common heuristic is:

• K = √N, where N is the total number of training samples.

• If N = 100, then K = 10.

• If N = 500, then K ≈ 22.

• This gives a good starting point but is not always the best
choice.
Choosing the optimal value of K in K-Nearest Neighbors (KNN)
• 2. Odd vs. Even K
• Use odd values of K to avoid ties in classification problems.
• Even values can cause ties, requiring tie-breaking strategies (a small sketch combining this rule with the √N heuristic follows).
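
A small Python sketch combining both heuristics: start from K ≈ √N, then bump it to the nearest odd number. The helper name is ours, for illustration only:

import math

def heuristic_k(n_samples):
    """Rule-of-thumb starting K: round(sqrt(N)), forced odd to avoid ties."""
    k = max(1, round(math.sqrt(n_samples)))
    return k if k % 2 == 1 else k + 1

print(heuristic_k(100))  # 11  (sqrt = 10, even, bumped to 11)
print(heuristic_k(500))  # 23  (sqrt ~ 22.4, rounds to 22, bumped to 23)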

3. Choosing K Using Cross-Validation


The best way to find K is to try multiple values and select the one that gives the best
accuracy:
Steps for K Selection using Cross-Validation
1. Split data into training and validation sets (e.g., 80%-20%).
2. Train KNN for different values of K.
3. Measure accuracy for each K using validation data.
4. Select the K with the highest accuracy (see the sketch below).
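A minimal sketch of this selection loop with scikit-learn. The steps above describe a single hold-out split; this version uses 5-fold cross-validation instead, a common refinement, and a stand-in dataset (Iris) since the slides name none:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset for illustration

best_k, best_acc = None, 0.0
for k in range(1, 31, 2):  # odd values of K only, per the tie-avoidance rule
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc

print(f"best K = {best_k}, cross-validated accuracy = {best_acc:.3f}")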
