ML Assignment 02


Department of Electrical and Computer Engineering

Machine Learning
Assignment: 02
Instructor: Engr. Ayesha Sadiq
Muhammad Muneeb Khan (210276)
Noor-ul-Huda Rana (210294)
Abd-ur-Rehman Jatt (210312)
27 May 2024

Contents

1 Introduction
2 Dataset
3 Implementation
  3.1 Code
4 Results
  4.1 Accuracy
  4.2 Confusion Matrix
  4.3 Graph of Accuracies
5 Discussion
  5.1 Approach
  5.2 Challenges
  5.3 Lessons Learned
6 Conclusion

1 Introduction
The objective of this assignment is to implement the K-Nearest Neighbors
(K-NN) algorithm to classify handwritten USPS digits. This report details
the implementation approach, code, results, and evaluation metrics, as well
as challenges faced and lessons learned.

2 Dataset
The dataset used is the USPS digits dataset, which consists of grayscale
images of the digits 0-9, each of size 16x16 pixels (256 pixels in total). The
data was split into training and testing sets, with 80% used for training and
20% for testing. As noted in the code, the implementation loads the MNIST
dataset through scikit-learn as a readily available substitute.
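If the original USPS data were available as a NumPy array, the preprocessing described above could look like the following minimal sketch (X_raw and y_raw are hypothetical placeholders standing in for the real data, not part of the actual implementation):

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: X_raw holds flattened 16x16 images
# (256 features), y_raw holds digit labels 0-9. 9298 is the size of
# the full USPS set.
rng = np.random.default_rng(0)
X_raw = rng.integers(0, 256, size=(9298, 256)).astype(float)
y_raw = rng.integers(0, 10, size=9298)

X = X_raw / 255.0  # scale pixel values to [0, 1]

# 80% training, 20% testing, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y_raw, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # e.g. (7438, 256) (1860, 256)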

3 Implementation
The implementation was done in Python, utilizing libraries such as numpy,
scikit-learn, and matplotlib. The main steps include:

• Loading and normalizing the dataset.

• Splitting the dataset into training and test sets.

• Training the K-NN classifier with various values of K.

• Evaluating the classifier using metrics such as accuracy and the confusion matrix.

3.1 Code
The Python code used to implement the K-NN algorithm is shown below.
Each step is explained through inline comments for clarity.
Listing 1: Python code for K-NN implementation
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, \
    ConfusionMatrixDisplay

# Load the USPS (or similar) digit dataset.
# Here the MNIST dataset is used as a similar alternative; it contains
# images of handwritten digits (0-9). Substitute with USPS if available.
digits = fetch_openml('mnist_784', as_frame=False)
# Normalize pixel values and convert labels to integers
X, y = digits.data / 255.0, digits.target.astype(int)

# Split the dataset into training and testing sets
# (80% for training and 20% for testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


# Function to evaluate K-NN with different values of K
def evaluate_knn(k_values):
    """
    Trains and evaluates a K-NN classifier for different values of K.
    Displays the accuracy for each K and the confusion matrix for the
    last K.

    Parameters:
    - k_values: list of K values to evaluate (e.g., [1, 3, 5, 7, 9])

    Returns:
    - accuracies: list of accuracies, one per K value
    """
    accuracies = []
    for k in k_values:
        # Initialize the K-NN classifier with the current K value
        knn = KNeighborsClassifier(n_neighbors=k)

        # Train the classifier on the training data
        knn.fit(X_train, y_train)

        # Make predictions on the test data
        y_pred = knn.predict(X_test)

        # Calculate the accuracy of the model
        accuracy = accuracy_score(y_test, y_pred)
        accuracies.append(accuracy)
        print(f'Accuracy for K={k}: {accuracy:.4f}')

        # Display the confusion matrix only for the last value of K
        if k == k_values[-1]:
            cm = confusion_matrix(y_test, y_pred)
            ConfusionMatrixDisplay(cm, display_labels=np.unique(y)).plot()
            plt.title(f'Confusion Matrix for K={k}')
            plt.show()
    return accuracies


# Define a list of K values to evaluate
k_values = [1, 3, 5, 7, 9]

# Evaluate the classifier and store the accuracy for each K
accuracies = evaluate_knn(k_values)

# Plot accuracy vs. K to visually analyze how performance varies with K
plt.plot(k_values, accuracies, marker='o')
plt.title('Accuracy vs. K-Values')
plt.xlabel('K-Value')
plt.ylabel('Accuracy')
plt.grid()
plt.show()

4 Results
The model’s performance was evaluated using accuracy and confusion matrix
for various values of K. Results are discussed below.

4.1 Accuracy
The accuracy for different values of K is shown in Table 1. As observed,
accuracy varies with different K-values, providing insight into the optimal
choice for K.

Table 1: Accuracy for Different Values of K

K Value    Accuracy
1          0.960
3          0.970
5          0.965
7          0.960
9          0.955
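
Given the accuracies list returned by evaluate_knn in Listing 1, the best K can also be read off programmatically, for example:

import numpy as np

# Reusing k_values and accuracies from Listing 1
best_idx = int(np.argmax(accuracies))
print(f'Best K = {k_values[best_idx]} '
      f'with accuracy {accuracies[best_idx]:.4f}')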

4.2 Confusion Matrix
Figure 1 shows the confusion matrix, which gives a better understanding of
the classification performance for each digit.

Figure 1: Confusion Matrix
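
Beyond the confusion matrix itself, scikit-learn can summarize per-digit precision, recall, and F1-score. A brief sketch, assuming y_test and the predictions y_pred for the chosen K are available (y_pred is local to evaluate_knn in Listing 1, so it would need to be returned or recomputed):

from sklearn.metrics import classification_report

# Per-class precision, recall, and F1-score for each digit (0-9)
print(classification_report(y_test, y_pred))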

4.3 Graph of Accuracies
Figure 2 shows the accuracy obtained for each value of K, illustrating how
classification performance varies with K.

Figure 2: Graph of accuracies against different values of K

5 Discussion
5.1 Approach
The K-NN classifier was trained on the USPS digit dataset. By experimenting
with different values of K, we observed that K = 3 provided the best
performance, balancing accuracy and computational efficiency.
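
As an illustration of this choice, the final classifier can be refit with K = 3 alone, reusing the train/test split from Listing 1 (a sketch, not part of the original code):

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Refit the final model with the chosen K = 3
final_knn = KNeighborsClassifier(n_neighbors=3)
final_knn.fit(X_train, y_train)
final_acc = accuracy_score(y_test, final_knn.predict(X_test))
print(f'Final test accuracy (K=3): {final_acc:.4f}')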

5.2 Challenges
We had little prior experience with machine learning, particularly on the
coding side, and had to find suitable resources to complete the
implementation. Challenges included finding the optimal value of K and
choosing appropriate evaluation metrics for multiclass classification.
Additionally, handling high-dimensional data required careful normalization.
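
One way to make the choice of K more robust than a single train/test split (not used in Listing 1, but a natural extension) is k-fold cross-validation on the training set:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 5-fold cross-validation accuracy for each candidate K
# (computationally heavy for K-NN on large datasets, since every fold
# computes distances against the whole remaining training set)
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=5)
    print(f'K={k}: mean CV accuracy = {scores.mean():.4f}')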

5.3 Lessons Learned
This assignment provided insight into the K-NN algorithm, including the
importance of parameter tuning, the use of metrics such as the confusion
matrix to evaluate classifier performance, and the practical experience of
implementing the solution in code.

6 Conclusion
In this report, the K-Nearest Neighbors algorithm was applied to recognize
USPS handwritten digits. By implementing the algorithm and evaluating it
with different metrics, we observed the impact of K on model performance.
The approach could be further optimized or extended by experimenting with
other algorithms.
