
K-Nearest Neighbors Classification

Julia Wieczorek
September 27, 2024

1 Introduction
In this document, we will perform a K-Nearest Neighbors (KNN) classification using the
Iris dataset. This will help us understand how the KNN algorithm works and how to
apply it to a real-world dataset. Follow the instructions step by step to complete the
example.
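Before using the scikit-learn implementation below, it can help to see the core idea in plain NumPy: to classify a point, compute its distance to every training point, take the k nearest, and let their labels vote. The following from-scratch sketch (the helper `knn_predict` and the toy data are illustrative, not part of the lab) shows exactly that:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny toy example: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1])))  # prints 0 (near the first cluster)
```

Note there is no "training" beyond storing the data; all the work happens at prediction time, which is why KNN is often called a lazy learner.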

2 Setup
Ensure you have the following Python libraries installed:

• numpy

• pandas

• matplotlib

• scikit-learn

You can install these libraries using pip if they are not already installed:
pip install numpy pandas matplotlib scikit-learn

3 Step-by-Step Implementation
3.1 1. Import Necessary Libraries
First, we need to import the required libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

3.2 2. Load and Explore the Dataset
Load the Iris dataset and explore its structure.
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Convert to DataFrame for easier exploration
df = pd.DataFrame(X, columns=iris.feature_names)
df['species'] = y

# Display the first few rows
print(df.head())

3.3 3. Split the Data


Split the dataset into training and testing sets.
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

3.4 4. Train the KNN Model


Create and train the KNN model.
# Initialize the KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Fit the model to the training data
knn.fit(X_train, y_train)

3.5 5. Make Predictions and Evaluate the Model


Use the model to make predictions and evaluate its performance.
# Make predictions on the test data
y_pred = knn.predict(X_test)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

4 Student Tasks
After completing the example above, try the following tasks:

4.1 Task 1: Experiment with Different Values of k
Change the value of k in the KNN classifier and observe how it affects the model’s
performance. Plot the accuracy of the model as a function of k.
# Example code snippet to help you get started
k_values = range(1, 21)
accuracies = []

for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))

plt.plot(k_values, accuracies)
plt.xlabel('Number of Neighbors (k)')
plt.ylabel('Accuracy')
plt.title('Accuracy vs. k')
plt.show()

4.2 Task 2: Compare with Other Classification Algorithms


Compare the performance of KNN with other classification algorithms such as Logistic
Regression and Support Vector Machine (SVM). Use the same training and test sets and
report the accuracy, classification report, and confusion matrix for each algorithm.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Initialize and train Logistic Regression
lr = LogisticRegression(max_iter=200)
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)

# Initialize and train Support Vector Machine
svc = SVC()
svc.fit(X_train, y_train)
svc_pred = svc.predict(X_test)

# Evaluate each model
print("Logistic Regression Accuracy:", accuracy_score(y_test, lr_pred))
print("SVM Accuracy:", accuracy_score(y_test, svc_pred))

5 Additional Questions
5.1 Helpful Questions
1. What is the role of the parameter k in the KNN algorithm?

2. How does the choice of k affect the bias-variance trade-off in the model?
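When thinking about question 2, keep in mind that a single train/test split (as in Task 1) gives a noisy estimate of how k affects performance. One common way to probe the trade-off more reliably is cross-validation; a possible sketch using scikit-learn's `cross_val_score` (the particular k values 1, 5, and 15 are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated accuracy for a few k values:
# small k -> low bias, high variance; large k -> smoother, higher bias
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5)
    print(k, scores.mean().round(3))
```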

5.2 Questions for Reflection


1. How would you handle an imbalanced dataset when using KNN?

2. What are the limitations of the KNN algorithm? How might these limitations affect
its performance on different types of data?

3. In what scenarios might KNN not be the best choice for classification, and why?
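One limitation worth noting for questions 2 and 3: because KNN is distance-based, features measured on larger scales can dominate the distance and drown out the rest. A common mitigation is to standardize features before fitting; a minimal sketch using a scikit-learn pipeline (the pipeline structure and k=3 are illustrative choices, not prescribed by the lab):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale features to zero mean and unit variance before KNN,
# so no single feature dominates the Euclidean distance
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))
```

Putting the scaler inside the pipeline ensures it is fit only on the training data, avoiding leakage from the test set.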
