
Department of Electrical and Computer Engineering

Machine Learning
Assignment: 02
Instructor: Engr. Ayesha Sadiq

Muhammad Muneeb Khan (210276)
Noor-ul-Huda Rana (210294)
Abd-ur-Rehman Jatt (210312)

27 May 2024

Contents

1 Introduction
2 Dataset
3 Implementation
  3.1 Code
4 Results
  4.1 Accuracy
  4.2 Confusion Matrix
  4.3 Graph of Accuracies
5 Discussion
  5.1 Approach
  5.2 Challenges
  5.3 Lessons Learned
6 Conclusion
1 Introduction
The objective of this assignment is to implement the K-Nearest Neighbors
(K-NN) algorithm to classify handwritten USPS digits. This report details
the implementation approach, code, results, and evaluation metrics, as well
as challenges faced and lessons learned.

2 Dataset
The dataset used is the USPS digits dataset, which consists of grayscale
images of digits (0-9), each of size 16x16 pixels (256 pixels in total).
As the comments in Listing 1 note, the code loads the MNIST dataset
(28x28-pixel images, 784 pixels in total) from OpenML as a similar
alternative, with USPS substitutable if available. The data was split into
training and testing sets, with 80% used for training and 20% for testing.
A quick sanity check on the loaded data is sketched below.
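The following sketch anticipates the loading code in Listing 1 and prints the
dimensions of the full set and of the 80/20 split. The shapes shown in the
comments assume the MNIST stand-in, not the 16x16 USPS images.

# Quick sanity check on the data (assumes the MNIST stand-in used in
# Listing 1, so images are 28x28 = 784 pixels rather than 16x16 = 256)
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

digits = fetch_openml('mnist_784')
X, y = digits.data / 255.0, digits.target.astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X.shape)        # (70000, 784) for MNIST
print(X_train.shape)  # (56000, 784) -> 80% for training
print(X_test.shape)   # (14000, 784) -> 20% for testing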

3 Implementation
The implementation was done in Python, utilizing libraries such as numpy,
scikit-learn, and matplotlib. The main steps include:

• Loading and normalizing the dataset.

• Splitting the dataset into training and test sets.

• Training the K-NN classifier with various values of K.

• Evaluating the classifier using metrics such as accuracy and the
  confusion matrix.

3.1 Code
The Python code used to implement the K-NN algorithm is shown below.
Each step is clearly explained through comments for better understanding.
Listing 1: Python code for K-NN implementation
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay

# Load the USPS (or similar) digit dataset.
# Here we use the MNIST dataset as a similar alternative;
# it contains images of handwritten digits (0-9).
digits = fetch_openml('mnist_784')  # substitute with USPS if available
X, y = digits.data / 255.0, digits.target.astype(int)  # normalize; labels to int

# Split the dataset into training and testing sets
# (80% for training and 20% for testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Function to evaluate K-NN with different values of K
def evaluate_knn(k_values):
    """
    Trains and evaluates a K-NN classifier for different values of K.
    Prints the accuracy for each K and displays the confusion matrix
    for the last K in the list.

    Parameters:
    - k_values: list of K values to evaluate (e.g., [1, 3, 5, 7, 9])

    Returns:
    - accuracies: list of accuracies, one for each K value
    """
    accuracies = []
    for k in k_values:
        # Initialize K-NN classifier with the current K value
        knn = KNeighborsClassifier(n_neighbors=k)

        # Train the classifier on the training data
        knn.fit(X_train, y_train)

        # Make predictions on the test data
        y_pred = knn.predict(X_test)

        # Calculate accuracy of the model
        accuracy = accuracy_score(y_test, y_pred)
        accuracies.append(accuracy)
        print(f'Accuracy for K={k}: {accuracy:.4f}')

        # Display the confusion matrix only for the last value of K
        if k == k_values[-1]:
            cm = confusion_matrix(y_test, y_pred)  # compute confusion matrix
            ConfusionMatrixDisplay(cm, display_labels=np.unique(y)).plot()
            plt.title(f'Confusion Matrix for K={k}')
            plt.show()
    return accuracies

# Define a list of K values to evaluate
k_values = [1, 3, 5, 7, 9]

# Call the function to evaluate the classifier and store the accuracy for each K
accuracies = evaluate_knn(k_values)

# Plot accuracy vs. K to visually analyze how performance varies with K
plt.plot(k_values, accuracies, marker='o')
plt.title('Accuracy vs. K-Values')
plt.xlabel('K-Value')
plt.ylabel('Accuracy')
plt.grid()
plt.show()

4 Results
The model's performance was evaluated using accuracy and the confusion
matrix for various values of K. The results are discussed below.

4.1 Accuracy
The accuracy for different values of K is shown in Table 1. As observed,
accuracy varies with different K-values, providing insight into the optimal
choice for K.

Table 1: Accuracy for Different Values of K

K Value    Accuracy
1          0.96
3          0.97
5          0.965
7          0.96
9          0.955

4.2 Confusion Matrix
Figure 1 shows the confusion matrix, which gives a better understanding of
the classification performance for each digit.

Figure 1: Confusion Matrix
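As a hypothetical complement to the confusion matrix (not part of the
original assignment code), per-digit precision, recall, and F1 scores could
be printed with scikit-learn's classification_report, assuming y_test and
y_pred from Listing 1 are still in scope:

from sklearn.metrics import classification_report

# Per-digit precision, recall, and F1 as a complement to the confusion
# matrix; y_test and y_pred are assumed to come from Listing 1
print(classification_report(y_test, y_pred, digits=4))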

4.3 Graph of Accuracies
Figure 2 shows the graph of accuracies for different values of K, which
gives a better understanding of how classification performance varies with
the choice of K.

Figure 2: Graph of accuracies against different values for K

5 Discussion
5.1 Approach
The K-NN classifier was trained on the USPS digit dataset. By experiment-
ing with different values of K, we observed that K = 3 provided the best
performance among the values tested, balancing both accuracy and compu-
tational efficiency. A more systematic way to make this choice is sketched
below.
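One way to choose K more systematically than the manual sweep in Listing 1
is 5-fold cross-validation over the training set. This is a sketch, not part
of the submitted code; X_train and y_train are assumed from Listing 1.

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 5-fold cross-validation on the training set for each candidate K.
# Note: K-NN cross-validation on the full MNIST training set is slow;
# a random subsample can be used to speed this up.
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=5)
    print(f'K={k}: mean CV accuracy = {scores.mean():.4f}')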

5.2 Challenges
We had little prior experience with machine learning, particularly on the
coding side, and had to search for suitable resources before implementing
the algorithm. Further challenges included finding the optimal value of K
and choosing proper evaluation metrics for multiclass classification.
Additionally, handling high-dimensional data required careful normalization.

5.3 Lessons Learned
This assignment provided insight into the K-NN algorithm: the importance
of parameter tuning, the use of metrics such as the confusion matrix for
evaluating classifier performance, and the practical experience of actually
coding the problem.

6 Conclusion
In this report, the K-Nearest Neighbors algorithm was applied to recognize
USPS handwritten digits. By implementing the algorithm and evaluating it
with different metrics, we learned the impact of K on model performance.
This approach can be further optimized or expanded by experimenting with
other algorithms.
