KNN Lab
Julia Wieczorek
September 27, 2024
1 Introduction
In this document, we will perform a K-Nearest Neighbors (KNN) classification using the
Iris dataset. This will help us understand how the KNN algorithm works and how to
apply it to a real-world dataset. Follow the instructions step by step to complete the
example.
2 Setup
Ensure you have the following Python libraries installed:
• numpy
• pandas
• matplotlib
• scikit-learn
You can install these libraries using pip if they are not already installed:
pip install numpy pandas matplotlib scikit-learn
3 Step-by-Step Implementation
3.1 1. Import Necessary Libraries
First, we need to import the required libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
3.2 2. Load and Explore the Dataset
Load the Iris dataset and explore its structure.
# Load the Iris dataset
iris = load_iris()
X = iris.data    # feature matrix: one row per flower
y = iris.target  # integer class label for each row
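Loading the data does not by itself show its structure, so a short inspection step may help. The sketch below is one possible way to explore the dataset with pandas; the names df and species are illustrative choices and not part of the original lab.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Put the features in a DataFrame so they are easier to inspect.
# "df" and the "species" column are illustrative names (assumptions).
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]

print(df.shape)                      # (rows, columns)
print(df.head())                     # first five samples
print(df["species"].value_counts())  # samples per class
print(df.describe())                 # summary statistics per feature
```

The Iris dataset has 150 samples, 4 numeric features, and 3 classes with 50 samples each, so the class counts printed above should be balanced.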
4 Student Tasks
After completing the example above, try the following tasks:
4.1 Task 1: Experiment with Different Values of k
Change the value of k in the KNN classifier and observe how it affects the model’s
performance. Plot the accuracy of the model as a function of k.
# Example code snippet to help you get started.
# Assumes X_train, X_test, y_train, and y_test were created with train_test_split.
k_values = range(1, 21)
accuracies = []

for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))
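For the plotting part of the task, a minimal self-contained sketch might look like the following. The split settings (test_size=0.3, random_state=42, stratify=y) and the output file name are assumptions made for illustration; your own split from the example may differ, and so will the resulting curve.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Assumed split settings; replace with the split used in your own example.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

k_values = list(range(1, 21))
accuracies = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, knn.predict(X_test)))

# Plot test accuracy against k and save the figure to disk.
fig, ax = plt.subplots()
ax.plot(k_values, accuracies, marker="o")
ax.set_xlabel("k (number of neighbors)")
ax.set_ylabel("Test accuracy")
ax.set_title("KNN accuracy on Iris vs. k")
ax.set_xticks(k_values)
fig.savefig("knn_accuracy_vs_k.png")  # hypothetical file name
```

In an interactive session, plt.show() can be used instead of saving the figure. Because the test set is small, the curve can change noticeably with a different random_state.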
5 Additional Questions
5.1 Helpful Questions
1. What is the role of the parameter k in the KNN algorithm?
2. How does the choice of k affect the bias-variance trade-off in the model?
3. What are the limitations of the KNN algorithm? How might these limitations affect
its performance on different types of data?
4. In what scenarios might KNN not be the best choice for classification, and why?
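As a starting point for thinking about these questions, the sketch below implements KNN prediction from scratch with NumPy on a small made-up dataset. The function knn_predict, the toy points, and the deliberately mislabeled point are all hypothetical and chosen only for illustration; this is not how scikit-learn implements KNeighborsClassifier internally.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k):
    """Predict a label for `query` by majority vote among its k nearest points."""
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up): three class-0 points near the origin, three class-1
# points far away, and one class-1 point placed inside the class-0 cluster
# to act as label noise.
X_toy = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],   # class 0
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],   # class 1
    [1.1, 1.0],                           # class 1 (noisy point)
])
y_toy = np.array([0, 0, 0, 1, 1, 1, 1])
query = np.array([1.0, 1.0])

for k in (1, 3, 7):
    print(k, knn_predict(X_toy, y_toy, query, k))
# k=1 -> 1 (follows the single noisy neighbor)
# k=3 -> 0 (the two nearby class-0 points outvote the noisy one)
# k=7 -> 1 (every point votes, so the larger class wins)
```

With k=1 the prediction follows a single noisy neighbor, with k=3 the local majority wins, and with k equal to the full dataset size the prediction ignores locality altogether. Note also that every prediction computes a distance to every training point, which is worth keeping in mind when considering the limitations of KNN.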