KNN_Algorithm

K-Nearest Neighbors (KNN) is a supervised learning algorithm used for classification and regression tasks that relies on the similarity of data points. It operates by storing training data and making predictions based on the K nearest neighbors, using distance metrics like Euclidean, Manhattan, and Minkowski distances. While KNN is simple and intuitive, it can be computationally expensive and sensitive to feature scaling.

K-Nearest Neighbors (KNN)

1. Introduction to K-Nearest Neighbors (KNN)


K-Nearest Neighbors (KNN) is a supervised learning algorithm used for both classification
and regression tasks. It is an instance-based learning algorithm, meaning it does not
explicitly learn a model but rather memorizes the training dataset and makes predictions
based on similarity. The key idea behind KNN is that similar data points tend to belong to
the same class or have similar output values.

2. How KNN Works


KNN does not involve an explicit training phase. Instead, it simply stores the feature vectors
and corresponding labels from the training dataset. For a given test point X, KNN follows
these steps (sketched in code after the list):
1. Compute the distance between X and all points in the training dataset.
2. Select the K nearest neighbors based on the computed distances.
3. Assign a class label (for classification) or compute the average (for regression) based on
the K neighbors.
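
As a concrete illustration of these three steps, here is a minimal from-scratch sketch for the classification case in Python. It assumes NumPy is available; the function name, the toy data, and K = 3 are illustrative choices, not part of the original text.

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_test, k=3):
        # Step 1: distance from the test point to every training point (Euclidean)
        distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
        # Step 2: indices of the K smallest distances
        nearest = np.argsort(distances)[:k]
        # Step 3: majority vote among the K neighbors (classification case)
        return Counter(y_train[nearest]).most_common(1)[0][0]

    # Toy usage: two well-separated clusters in 2-D
    X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # prints 0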

3. Distance Metrics in KNN


KNN relies on distance metrics to find the closest neighbors. Some commonly used distance
measures include:

3.1. Euclidean Distance


The most widely used distance metric in KNN is Euclidean distance, which measures the
straight-line distance between two points in an n-dimensional space. Given two points X =
(x₁, x₂, ..., xₙ) and Y = (y₁, y₂, ..., yₙ), the Euclidean distance is defined as:

d(X, Y) = sqrt(Σ (x_i - y_i)^2)
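
For example, for X = (1, 2) and Y = (4, 6), d(X, Y) = sqrt((1 - 4)^2 + (2 - 6)^2) = sqrt(9 + 16) = 5.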

3.2. Manhattan Distance


Manhattan distance computes the sum of absolute differences between corresponding
coordinates:

d(X, Y) = Σ |x_i - y_i|
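
For the same points X = (1, 2) and Y = (4, 6), the Manhattan distance is |1 - 4| + |2 - 6| = 3 + 4 = 7.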

3.3. Minkowski Distance


Minkowski distance generalizes Euclidean and Manhattan distances. It is given by:

d(X, Y) = (Σ |x_i - y_i|^p)^(1/p)
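
Setting p = 2 recovers Euclidean distance and p = 1 recovers Manhattan distance. A minimal NumPy sketch (the function name and sample points are illustrative) that checks this against the values worked out above:

    import numpy as np

    def minkowski(x, y, p):
        # Minkowski distance: (sum of |x_i - y_i|^p) raised to the power 1/p
        return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

    x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
    print(minkowski(x, y, p=1))  # 7.0 -> Manhattan distance
    print(minkowski(x, y, p=2))  # 5.0 -> Euclidean distance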

4. Choosing the Right K Value


The choice of K (the number of neighbors) strongly affects KNN's performance. A small K
makes predictions sensitive to noise in individual training points and may lead to overfitting,
while a large K smooths the decision boundary but may ignore genuine local patterns
(underfitting). For binary classification, an odd K is often chosen to avoid ties.
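
In practice, K is usually selected by scoring several candidate values with cross-validation and keeping the one that generalizes best. A sketch of that procedure using scikit-learn (assuming scikit-learn is installed; the Iris dataset and the range 1-15 are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # Score each candidate K with 5-fold cross-validation and keep the best one
    scores = {}
    for k in range(1, 16):
        scores[k] = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()

    best_k = max(scores, key=scores.get)
    print(best_k, round(scores[best_k], 3))
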
5. KNN for Classification
In classification, KNN assigns the class label of a test point based on the majority class
among its K nearest neighbors.

6. KNN for Regression


In regression, KNN predicts the output as the average of the target values of the K nearest
neighbors. Weighted KNN assigns higher weights to closer neighbors, typically using the
inverse of the distance, so that nearer points contribute more to the prediction.
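
A brief sketch of weighted KNN regression using scikit-learn's KNeighborsRegressor, where weights='distance' gives closer neighbors a larger influence (the toy data below is illustrative):

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    # Toy 1-D data: the target is roughly 2 * x
    X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
    y = np.array([0.1, 2.0, 3.9, 6.1, 8.0, 9.9])

    # weights='distance' makes closer neighbors count more in the average
    model = KNeighborsRegressor(n_neighbors=3, weights='distance')
    model.fit(X, y)
    print(model.predict([[2.5]]))  # a value close to 5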

7. Advantages and Disadvantages of KNN


Advantages:
- Simple and intuitive
- No explicit training phase; the model simply stores the training data
- Works for both classification and regression

Disadvantages:
- Computationally expensive at prediction time, since distances to all training points must be computed
- Sensitive to feature scaling (see the sketch below)
- Not robust to noisy data
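
Because of the sensitivity to feature scaling noted above, features are commonly standardized before applying KNN. A small scikit-learn sketch comparing a raw and a scaled pipeline (the Wine dataset and the train/test split are illustrative):

    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # KNN directly on raw features: large-range features dominate the distance
    raw = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

    # The same classifier after standardizing every feature to zero mean and unit variance
    scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    scaled.fit(X_train, y_train)

    print(raw.score(X_test, y_test), scaled.score(X_test, y_test))

On data like this, where feature ranges differ by orders of magnitude, the scaled pipeline typically reaches noticeably higher accuracy.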

8. Applications of KNN
KNN is widely used in:
- Image recognition
- Recommendation systems
- Medical diagnosis
- Anomaly detection
