
Absolutely!

Let's now dive deep into the K Nearest Neighbors (KNN) algorithm: an intuitive explanation, a step-by-step worked example, visuals, advantages and disadvantages, when to (and when not to) use it, Python code with output, and more!

👣 K Nearest Neighbors (KNN) — In-Depth, Beginner-Friendly Guide

📘 What is KNN?
K-Nearest Neighbors (KNN) is a supervised machine learning algorithm used for classification and
regression. It classifies a new data point based on how its neighbors (closest training points) are
classified.
It’s based on the principle:

“Birds of a feather flock together.”

🧠 How Does KNN Work?


1. Choose a value for K (number of neighbors).
2. Calculate distance (e.g., Euclidean) between the test point and all training data.
3. Sort the distances and find the K nearest neighbors.
4. Majority voting (for classification) or average value (for regression).
5. Assign the class or value accordingly (the sketch below shows these steps in code).
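To make these five steps concrete, here is a minimal from-scratch sketch in plain Python. The function name and structure are illustrative only; the scikit-learn example later in this guide does the same job with far less code.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training point
    distances = [(math.dist(query, point), label)
                 for point, label in zip(train_X, train_y)]
    # Step 3: sort by distance and keep the k closest neighbors
    neighbors = sorted(distances)[:k]
    # Steps 4-5: majority vote among those neighbors decides the class
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```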

✅ Why Use KNN?


Easy to understand and implement
Non-parametric (makes no assumptions about data)
Works well for low-dimensional data

🧮 Step-by-Step: KNN Example


Dataset
Let’s take a toy example of fruits:
| Weight | Size | Fruit  |
|--------|------|--------|
| 150    | 7.0  | Apple  |
| 170    | 7.5  | Apple  |
| 140    | 6.5  | Apple  |
| 130    | 6.0  | Orange |
| 120    | 5.5  | Orange |
| 110    | 5.0  | Orange |

Predict the fruit for a test input: Weight = 135, Size = 6.4

Step 1: Choose K = 3

Step 2: Compute the distance from each training point to the test input


Use Euclidean distance:

$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$

Compute distance to all 6 training points.
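Working these out for the test point (135, 6.4):

To (150, 7.0): √(15² + 0.6²) ≈ 15.01
To (170, 7.5): √(35² + 1.1²) ≈ 35.02
To (140, 6.5): √(5² + 0.1²) ≈ 5.00
To (130, 6.0): √(5² + 0.4²) ≈ 5.02
To (120, 5.5): √(15² + 0.9²) ≈ 15.03
To (110, 5.0): √(25² + 1.4²) ≈ 25.04

Notice that the weight difference dominates every distance because weight is on a much larger scale than size; this is exactly why feature scaling matters for KNN (see the tips below).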

Step 3: Select the 3 closest neighbors


Based on the distances above, the three nearest neighbors are:
(140, 6.5) → Apple
(130, 6.0) → Orange
(150, 7.0) → Apple

Step 4: Voting
2 Apple 🆚 1 Orange → Predict: Apple
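As a quick check (not part of the original walkthrough), the same toy dataset can be fed to scikit-learn's KNeighborsClassifier, which reproduces this result:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy fruit dataset from the table above: features are [weight, size]
X = [[150, 7.0], [170, 7.5], [140, 6.5], [130, 6.0], [120, 5.5], [110, 5.0]]
y = ["Apple", "Apple", "Apple", "Orange", "Orange", "Orange"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[135, 6.4]]))   # ['Apple']
```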

🔧 Python Example using sklearn

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# KNN classifier with K=3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict
y_pred = knn.predict(X_test)

# Evaluation
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
```
🖨️ Output:
```
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      0.89      0.94         9
           2       0.91      1.00      0.95        11

    accuracy                           0.97        36
   macro avg       0.97      0.96      0.96        36
weighted avg       0.97      0.97      0.97        36

Confusion Matrix:
 [[16  0  0]
  [ 0  8  1]
  [ 0  0 11]]
```

📊 Visualizing KNN
```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Create a dataframe for visualization
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Plot 2 features
sns.scatterplot(data=df, x='sepal length (cm)', y='sepal width (cm)',
                hue='target', palette='deep')
plt.title('Iris Dataset - Sepal Length vs Width')
plt.show()
```
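The scatter plot above shows only the raw data. As an optional extension (reusing iris, KNeighborsClassifier, and plt from the earlier snippets), a mesh grid of points can be classified one by one to shade the KNN decision regions on two features:

```python
import numpy as np

# Train KNN on just two features so the decision regions can be drawn in 2D
X2 = iris.data[:, :2]                      # sepal length, sepal width
knn2 = KNeighborsClassifier(n_neighbors=3).fit(X2, iris.target)

# Build a grid covering the feature space and classify every grid point
xx, yy = np.meshgrid(np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 200),
                     np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 200))
Z = knn2.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3, cmap='Pastel1')   # shaded decision regions
plt.scatter(X2[:, 0], X2[:, 1], c=iris.target, edgecolor='k')
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.title('KNN (K=3) decision regions on two iris features')
plt.show()
```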

✅ Advantages of KNN
| Pros | Description |
|------|-------------|
| 🧠 Simple | Very easy to implement |
| 🪞 No Training | No model built ahead of time |
| 🧭 Non-Parametric | No assumptions about data |
| 🔍 Adaptable | Works for classification and regression |


⚠️ Disadvantages
| Cons | Description |
|------|-------------|
| 🧮 Slow on Large Datasets | Every prediction computes distance to all training points |
| ❄️ Sensitive to Noise | Outliers can distort predictions |
| 📊 Requires Feature Scaling | Distance metrics require normalization (e.g., MinMax) |
| 💡 Curse of Dimensionality | Doesn't work well in high-dimensional spaces |

🧠 When to Use / Not Use


| Use KNN When... | Avoid KNN When... |
|-----------------|-------------------|
| You have a small to medium dataset | Your data has many irrelevant features |
| The decision boundary is nonlinear | You care about runtime efficiency |
| Data is clean and not high-dimensional | The dataset is large or sparse |

💡 How It Handles High-Dimensional Data


Poorly. As dimensions increase:
Distances between points become less meaningful
All points start looking equally distant
Model performance degrades

📌 Use dimensionality reduction (e.g., PCA, LDA) before KNN!
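As an illustration of that tip, here is a hedged sketch using scikit-learn's pipeline utilities, reusing X_train, X_test, y_train, y_test from the iris example above (reducing to 2 components is an arbitrary choice for demonstration):

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scale the features, project onto 2 principal components, then run KNN
pca_knn = make_pipeline(StandardScaler(),
                        PCA(n_components=2),
                        KNeighborsClassifier(n_neighbors=3))
pca_knn.fit(X_train, y_train)
print("Accuracy with PCA + KNN:", pca_knn.score(X_test, y_test))
```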

🧮 Complexity
| Aspect | Complexity |
|--------|------------|
| Training | O(1) (no training!) |
| Prediction | O(n · d), where n = training samples, d = features |
| Space | O(n · d) |

⚙️ Tips for Using KNN


Always scale your features using MinMaxScaler or StandardScaler
Use GridSearchCV to choose the best value of K (see the sketch after this list)
If class distribution is imbalanced, use distance-weighted voting
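A minimal sketch of the GridSearchCV tip, again reusing the iris split from above (the search range of 1 to 15 neighbors is just an example):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scale features, then search over K and the voting scheme with 5-fold CV
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": list(range(1, 16)),
              "knn__weights": ["uniform", "distance"]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))
```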

🧪 Try Distance Weighting in sklearn

```python
KNeighborsClassifier(n_neighbors=5, weights='distance')
```
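With weights='distance', closer neighbors count more heavily in the vote than farther ones, which is the distance-weighted voting suggested in the tips above for imbalanced classes.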

Would you like the visual explanation of KNN (with 2D plots) or how to tune K using cross-validation next?
Or do you want to continue to Naive Bayes or SVM in the same style?
