
Detailed Explanation of Module 3 Lab 2: Implementing KNN from Scratch and Visualizing Algorithm Performance


Section 1: Implementing KNN from Scratch

What is KNN?
K-Nearest Neighbors (KNN) is a simple, intuitive algorithm for classification and regression. It
predicts the label of a new data point by looking at the labels of its k closest points in the
training set (using a distance metric, usually Euclidean distance), and choosing the most
common label among them.

How is KNN Implemented from Scratch?


1. Distance Calculation: For each test point, compute the distance to every training point.
2. Find Neighbors: Sort all distances and select the k smallest (closest) points.
3. Predict Label: For classification, take the most frequent label among the k neighbors.
Example Code:

import numpy as np
from collections import Counter

def predict(X_train, y_train, X_test, k):
    distances = []
    targets = []
    # Euclidean distance from the test point to every training point, with its index
    for i in range(len(X_train)):
        distances.append([np.sqrt(np.sum(np.square(X_test - X_train[i, :]))), i])
    # Sort by distance and collect the labels of the k closest training points
    distances = sorted(distances)
    for i in range(k):
        index = distances[i][1]
        targets.append(y_train[index])
    # Return the most common label among the k neighbors
    return Counter(targets).most_common(1)[0][0]

For k=1, the label of the single nearest neighbor is returned.


For k>1, the most common label among the k neighbors is chosen.

Accuracy Metric
Accuracy is the ratio of correctly classified samples to total samples:

def Accuracy(gtlabel, predlabel):
    # Fraction of predictions that match the ground-truth labels
    correct = (gtlabel == predlabel).sum()
    return correct / len(gtlabel)

Section 1.1: KNN on the Iris Dataset


Dataset: Iris (150 samples, 4 features, 3 classes).
Process:
1. Split data into training and test sets.
2. Use your KNN function to predict test labels.
3. Calculate accuracy.
Result Example:
"The accuracy of our classifier is 94.0%"
Comparison:
The sklearn library’s KNN implementation gives the same accuracy, validating your scratch
code.
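A minimal sketch of this workflow, assuming the predict() and Accuracy() functions defined above (the split ratio, random_state, and k = 3 are illustrative choices):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=1/3, random_state=42)

# Scratch KNN: classify each test point with the predict() function above
pred = np.array([predict(X_train, y_train, x, 3) for x in X_test])
print("Scratch accuracy:", Accuracy(y_test, pred))

# sklearn KNN with the same k, for comparison
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print("sklearn accuracy:", clf.score(X_test, y_test))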

Section 1.2: Weighted KNN


Why Weighted?
If k is large, distant neighbors may outvote closer, more relevant ones. Weighted KNN gives
more importance to closer neighbors (e.g., by using the inverse of their distance as a
weight).
How to Implement:
In sklearn, use weights='distance' in KNeighborsClassifier.
In your own code, you’d multiply each neighbor’s vote by its weight (inverse distance).
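A minimal sketch of inverse-distance weighting in a scratch implementation (the function name predict_weighted, the eps term, and the 1/(d + eps) weighting scheme are illustrative choices, not part of the lab code):

import numpy as np
from collections import defaultdict

def predict_weighted(X_train, y_train, X_test, k, eps=1e-8):
    # Distance to every training point, together with its index
    distances = sorted(
        (np.sqrt(np.sum(np.square(X_test - X_train[i, :]))), i)
        for i in range(len(X_train))
    )
    votes = defaultdict(float)
    for dist, index in distances[:k]:
        # Closer neighbors contribute larger votes (inverse-distance weighting)
        votes[y_train[index]] += 1.0 / (dist + eps)
    # Return the class with the largest total weight
    return max(votes, key=votes.get)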

Section 1.3: Return Neighbors and Distances


Modification:
Instead of just the predicted label, your function can return the indices, distances, and
labels of the k nearest neighbors for each test point.
Why?
This helps you analyze which points are influencing each prediction.
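One way to sketch this modification (the return format is one reasonable choice, not the only one):

import numpy as np
from collections import Counter

def predict_with_neighbors(X_train, y_train, X_test, k):
    # Sort all training points by distance to the test point
    distances = sorted(
        (np.sqrt(np.sum(np.square(X_test - X_train[i, :]))), i)
        for i in range(len(X_train))
    )
    k_nearest = distances[:k]
    indices = [i for _, i in k_nearest]
    dists = [d for d, _ in k_nearest]
    labels = [y_train[i] for i in indices]
    # Return the prediction plus the supporting evidence for analysis
    prediction = Counter(labels).most_common(1)[0][0]
    return prediction, indices, dists, labels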

Section 2: Visualizing Data and KNN Behavior


Voronoi Diagrams
What are they?
Voronoi diagrams partition the plane into regions where each region contains all points
closest to one "seed" (data point).
Why useful?
They show how the choice of distance metric and data distribution affects the influence of
each training point.
Limitation:
Only practical for 2D data, so you use the first two features or apply PCA to reduce
dimensions.
Example Code:

import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

# points: 2D coordinates of the training samples, targets: their class labels
vor = Voronoi(points)
voronoi_plot_2d(vor)
plt.scatter(points[:, 0], points[:, 1], c=targets, cmap='viridis', edgecolor='k')
plt.show()

Section 2.2: Decision Boundaries in KNN


What are Decision Boundaries?
Imaginary lines (or surfaces) in the feature space where the predicted class changes. They
show which regions of the space are classified as which class by KNN.
How are they plotted?
1. Create a grid covering the feature space.
2. Use KNN to predict the class at each grid point.
3. Color each region according to the predicted class.
4. Overlay the training data points.
Why are they important?
They help you see how KNN generalizes and where it is likely to make mistakes.
For small k, boundaries are jagged and sensitive to noise; for large k, boundaries are
smoother.
Example Code:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap
from sklearn.neighbors import KNeighborsClassifier

def decision_boundary_plot(x_dec, y_dec, k):
    h = .02  # grid step size
    n = len(set(y_dec))
    cmap_light = ListedColormap(['pink', 'green', 'cyan', 'yellow'][:n])
    cmap_bold = ['pink', 'darkgreen', 'blue', 'yellow'][:n]
    for weights in ['uniform', 'distance']:
        # Fit KNN with both uniform and distance-weighted voting
        clf = KNeighborsClassifier(n_neighbors=k, weights=weights)
        clf.fit(x_dec, y_dec)
        # Build a grid that covers the feature space
        x_min, x_max = x_dec[:, 0].min() - 1, x_dec[:, 0].max() + 1
        y_min, y_max = x_dec[:, 1].min() - 1, x_dec[:, 1].max() + 1
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))
        # Predict the class at every grid point and color each region
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
        Z = Z.reshape(xx.shape)
        plt.figure(figsize=(8, 6))
        plt.contourf(xx, yy, Z, cmap=cmap_light)
        # Overlay the training points on the colored regions
        sns.scatterplot(x=x_dec[:, 0], y=x_dec[:, 1], hue=y_dec,
                        palette=cmap_bold, edgecolor="black", alpha=1.0)
        plt.show()

Section 2.3: PCA for Visualization


Why PCA?
The Iris dataset has 4 features; to plot Voronoi diagrams and decision boundaries, you need
2D data.
How?
Use PCA to reduce the data to two principal components, then plot as above.
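A minimal sketch using sklearn's PCA, assuming the decision_boundary_plot function from Section 2.2 (k = 5 is an illustrative choice):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
# Project the 4 Iris features onto the first two principal components
X_2d = PCA(n_components=2).fit_transform(iris.data)

# The 2D projection can now be used for the Voronoi and decision-boundary plots above
decision_boundary_plot(X_2d, iris.target, k=5)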

Section 2.4: Confusion Matrix and Classification Report


Confusion Matrix:
A table showing the number of correct and incorrect predictions for each class. Diagonal
values are correct; off-diagonal are mistakes.
Classification Report:
Gives precision, recall, F1-score, and support for each class.
Precision: Of all predicted as class X, how many were correct?
Recall: Of all actual class X, how many did we find?
F1-score: Harmonic mean of precision and recall.
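A minimal sketch with sklearn's metrics, assuming y_test holds the true test labels and pred the predicted labels (variable names are illustrative, e.g. from the Iris experiment above):

from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))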
Example Output:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       0.88      1.00      0.94        22
           2       1.00      0.80      0.89        15

    accuracy                           0.94        50
   macro avg       0.96      0.93      0.94        50
weighted avg       0.95      0.94      0.94        50
Section 3: Applying KNN on the Car Evaluation Dataset
Data Preparation:
Categorical features are label-encoded to numbers.
Data is split into train/test sets.
KNN Training and Evaluation:
KNN is trained and tested as above.
Accuracy is reported (e.g., 89.88%).
Visualization:
PCA reduces the data to 2D for plotting Voronoi diagrams and decision boundaries.
Confusion matrix and classification report are generated for model evaluation.
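A minimal sketch of the label-encoding step, assuming the Car Evaluation data has already been loaded into a pandas DataFrame named df (the name and column handling are illustrative):

from sklearn.preprocessing import LabelEncoder

# df: DataFrame holding the Car Evaluation data, where every column is categorical
df_encoded = df.copy()
for col in df_encoded.columns:
    df_encoded[col] = LabelEncoder().fit_transform(df_encoded[col])
# The encoded columns can then be split into features/target and fed to KNN as in Section 1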

Summary Table

KNN from scratch: You implement and understand the algorithm's logic yourself rather than just calling a library

Weighted KNN: Closer neighbors have more influence on the prediction

Voronoi diagrams: Visualize which points "own" which regions of space

Decision boundaries: Show where class predictions change in feature space

PCA: Reduces high-dimensional data to 2D for visualization

Confusion matrix: Shows details of correct/incorrect predictions per class

Classification report: Gives precision, recall, and F1-score for each class

In summary:
This lab teaches you to implement KNN from scratch, understand how it works, visualize its
behavior using Voronoi diagrams and decision boundaries, and evaluate its performance with
confusion matrices and classification reports. You also learn how to handle categorical data, use
PCA for visualization, and interpret the strengths and weaknesses of your classifier [1] [2] [3] [4] [5] [6] [7].

1. https://www.machinelearningmastery.com/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch/
2. https://www.kaggle.com/code/jebathuraiibarnabas/knn-from-scratch-with-visualization
3. https://realpython.com/knn-python/
4. https://www.kaggle.com/code/just4jcgeorge/k-nearest-neighbour-algorithm
5. https://dataaspirant.com/k-nearest-neighbor-algorithm-implementaion-python-scratch/
6. https://www.scribd.com/document/736817575/MACHINE-LEARNING-LAB-MANUAL
7. AIML_Module_3_Lab_2_Implementing_KNN_from_scratch_and_visualize_Algorithm_performance.ipynb - Colab.pdf
