Machine Learning
Assignment: 02
Instructor: Engr. Ayesha Sadiq
Muhammad Muneeb Khan (210276)
Noor-ul-Huda Rana (210294)
Abd-ur-Rehman Jatt
27 May 2024
1 Introduction
The objective of this assignment is to implement the K-Nearest Neighbors
(K-NN) algorithm to classify handwritten USPS digits. This report details
the implementation approach, code, results, and evaluation metrics, as well
as challenges faced and lessons learned.
2 Dataset
The dataset used is the USPS digits dataset, which consists of grayscale
images of digits (0-9), each of size 16x16 pixels (256 pixels in total). The
data was split into training and testing sets, with 80% used for training and
20% for testing.
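The 80/20 split described above can be sketched as follows. This is a minimal illustration on synthetic data, not the actual USPS images: the array shapes, the `stratify` option, and the `random_state` value are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the digit data (assumed, not the real dataset):
# 1000 flattened 16x16 images with pixel values in [0, 1] and labels 0-9.
rng = np.random.default_rng(0)
X = rng.random((1000, 256))
y = np.arange(1000) % 10  # 100 samples of each digit

# 80/20 split; stratify keeps the digit proportions equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)  # (800, 256) (200, 256)
```

Stratifying is optional, but it avoids a test set in which some digit is under-represented by chance.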
3 Implementation
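The implementation was done in Python, using the numpy, scikit-learn, and
matplotlib libraries. The main steps are loading and normalizing the data,
splitting it into training and testing sets, fitting a K-NN classifier for
each candidate value of K, and evaluating each model on the test set.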
3.1 Code
The Python code used to implement the K-NN algorithm is shown below.
Each step is clearly explained through comments for better understanding.
Listing 1: Python code for K-NN implementation
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay

# Load the USPS (or similar) digit dataset.
# Here we use the MNIST dataset as a similar alternative.
# It contains images of handwritten digits (0-9).
# as_frame=False returns plain numpy arrays instead of a pandas DataFrame.
digits = fetch_openml('mnist_784', as_frame=False)  # substitute with USPS if available
X, y = digits.data / 255.0, digits.target.astype(int)  # Normalize and convert labels to integers

# Split into 80% training and 20% testing data
# (this step is missing from the extracted listing; reconstructed from Section 2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# The function header is missing from the extracted listing;
# the name evaluate_knn is a placeholder.
def evaluate_knn(k_values):
    """
    Parameters:
    - k_values: List of K values to evaluate (e.g., [1, 3, 5, 7, 9])
    Returns:
    - accuracies: List of accuracies for each K value
    """
    accuracies = []
    for k in k_values:
        # Initialize K-NN classifier with the current K
        knn = KNeighborsClassifier(n_neighbors=k)
        # Fit, predict, and score (reconstructed; missing from the extracted listing)
        knn.fit(X_train, y_train)
        accuracy = accuracy_score(y_test, knn.predict(X_test))
        accuracies.append(accuracy)
        print(f'Accuracy for K={k}: {accuracy:.4f}')
    return accuracies
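To make the listing less of a black box, here is a minimal from-scratch sketch of what a K-NN classifier does at prediction time: compute distances to every training point, take the K closest, and vote. It assumes Euclidean distance and unweighted majority voting; the function name `knn_predict` and the toy data are hypothetical, and this is not the scikit-learn implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a label for each query point by majority vote among
    its k nearest training points (Euclidean distance)."""
    # Pairwise squared distances, shape (n_query, n_train)
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for each query
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote; ties go to the smallest label
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])

# Tiny 2-D example with two well-separated classes
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.05, 0.05], [0.95, 1.0]])

print(knn_predict(X_train, y_train, X_query, k=3))  # [0 1]
```

The brute-force distance matrix is fine for a toy example but grows with the product of query and training set sizes, which is one reason a library implementation is preferable for full digit datasets.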
4 Results
The model's performance was evaluated using accuracy and a confusion matrix
for various values of K. Results are discussed below.
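As a small illustration of how the two metrics relate, the sketch below computes both on hypothetical labels for a 3-class problem (the label arrays are made up for the example and are not results from this assignment).

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for a 3-class problem
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print(cm)
print(f"accuracy = {acc:.4f}")  # accuracy = 0.7143

# Accuracy is the diagonal (correct predictions) divided by the total count
print(np.trace(cm) / cm.sum())
```

The off-diagonal entries show which classes are confused with each other, which a single accuracy number cannot reveal.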
4.1 Accuracy
The accuracy for different values of K is shown in Table 1. Accuracy peaks
at 0.97 for K = 3 and declines gradually for larger K, reaching 0.955 at
K = 9.
Table 1: Accuracy for Different Values of K
K Value    Accuracy
1          0.96
3          0.97
5          0.965
7          0.96
9          0.955
Figure 2: Graph of accuracies against different values for K
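A plot like the one in Figure 2 could be produced along these lines. This is a sketch using the values from Table 1; the axis labels, title, and the non-interactive `Agg` backend are assumptions, not details taken from the original figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Values copied from Table 1
k_values = [1, 3, 5, 7, 9]
accuracies = [0.96, 0.97, 0.965, 0.96, 0.955]

fig, ax = plt.subplots()
ax.plot(k_values, accuracies, marker="o")
ax.set_xlabel("K")
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy vs. K")
ax.set_xticks(k_values)

# The K with the highest accuracy in the table
best_k = k_values[accuracies.index(max(accuracies))]
print(best_k)  # 3
```

Calling `fig.savefig(...)` with a file name of your choice writes the figure to disk.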
5 Discussion
5.1 Approach
The K-NN classifier was trained on the USPS digit dataset. By experimenting
with different values of K, we observed that K = 3 gave the best
performance (accuracy 0.97), balancing accuracy and computational efficiency.
5.2 Challenges
I had no prior experience with machine learning, especially on the coding
side, so I had to find suitable resources before implementing this. Other
challenges included finding the optimal value for K and choosing appropriate
evaluation metrics for multiclass classification. Additionally, handling
high-dimensional data required careful normalization.
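For the multiclass evaluation issue mentioned above, one common option is to report precision and recall per class and then average them. The sketch below assumes scikit-learn's `precision_recall_fscore_support` and uses hypothetical labels; it is not part of the submitted code.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical true and predicted labels for a 3-class problem
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

# average=None returns one value per class, in label order (0, 1, 2)
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None
)

print("precision per class:", np.round(precision, 3))
print("recall per class:   ", np.round(recall, 3))

# Macro average: unweighted mean over classes, so every digit counts equally
print("macro recall:", round(float(recall.mean()), 4))
```

Per-class values highlight digits that the classifier handles poorly even when overall accuracy looks high.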
6 Conclusion
In this report, the K-Nearest Neighbors algorithm was applied to recognize
USPS handwritten digits. By implementing the algorithm and evaluating it
with different metrics, we learned the impact of K on model performance.
This approach can be further optimized or expanded by experimenting with
other algorithms.