Vertopal.com Lab4 KNN

The document outlines a step-by-step guide for implementing a K-Nearest Neighbors (KNN) algorithm using the IRIS dataset. It covers data loading, exploratory data analysis, distance calculation, neighbor finding, voting on labels, and model evaluation. The document includes code snippets and explanations for each part of the process.

Uploaded by

ammarkusow2

Imports

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

Load IRIS dataset

iris = datasets.load_iris()

print(iris)

As you can see, the dataset is in the form of a dictionary. What are the keys of the
dictionary?

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

What is the value of the key data? Assign the value to a variable X

What is the shape of X?

What is the value of the key target? Assign the value to a variable y

What is the shape of y?

What is the value of the key target_names? Assign the value to a variable
target_names

What is the value of the key feature_names? Assign the value to a variable
feature_names

#Solution
X = iris['data']
y = iris['target']
feature_names = iris['feature_names']
target_names = iris['target_names']

# Note: you can also access the elements with the dot (.) access operator, e.g.,
# X = iris.data

print(type(X))
print(type(y))
print(X.shape)
print(y.shape)
print(feature_names)
print(target_names)

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
(150, 4)
(150,)
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
['setosa' 'versicolor' 'virginica']

Figure below illustrates the features and target labels for iris
dataset.

Print the 5th datapoint in your dataset X

Print the features and target label of flower 1 to 5.

Iterate over all datapoints in X and calculate the area of Sepal and Petal for each
flower in the dataset.
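A minimal sketch of the indexing and iteration exercises above. To keep it runnable on its own, it uses the first five rows of the Iris data typed in by hand (the name X_sample is hypothetical), and it assumes that "area" means length times width, which the lab does not state explicitly.

```python
import numpy as np

# First five rows of the Iris data, typed in by hand so the sketch runs on its own.
# Columns: sepal length, sepal width, petal length, petal width (all in cm).
X_sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
])

# The 5th datapoint sits at index 4 because Python counts from 0.
fifth = X_sample[4]

# Assumption: "area" is approximated as length * width (a rectangle).
areas = []
for row in X_sample:
    sepal_area = row[0] * row[1]
    petal_area = row[2] * row[3]
    areas.append((sepal_area, petal_area))

print(fifth)
print(areas[0])
```

The same loop applies unchanged to the full X array from the solution cell above.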

Exploratory Data Analysis

Box plot of all features

plt.figure()
plt.boxplot(X)
plt.ylabel("[cm]")
plt.xlabel(feature_names)
plt.show()


Scatter plot for each pair of features

Plot the scatter plot for the pair of first and second features

(X[:,0], X[:,1])

Don't forget to label your axes.

hint: use c=y inside the scatter plot to color the points based on the
target labels.

#your code here

Write a function called plot_pair that takes a pair of features and their
labels and plots the scatter plot.

def plot_pair(X1, X2, x1_label, x2_label, y):
    ...

Use the plot_pair function to plot the scatter plot for all pairs of features.

X[:,0], X[:,1], 'Sepal Length', 'Sepal Width'

X[:,0], X[:,2], 'Sepal Length', 'Petal Length'
X[:,0], X[:,3], 'Sepal Length', 'Petal Width'
X[:,1], X[:,2], 'Sepal Width', 'Petal Length'
X[:,1], X[:,3], 'Sepal Width', 'Petal Width'
X[:,2], X[:,3], 'Petal Length', 'Petal Width'
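The six argument lists above do not have to be typed by hand. As a sketch, itertools.combinations can generate every unordered pair of column indices; the labels list below is an assumption that simply mirrors the short names used above.

```python
from itertools import combinations

# Short labels mirroring the ones in the list above (an assumption for illustration).
labels = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width']

# Every unordered pair of column indices: (0, 1), (0, 2), ..., (2, 3).
pairs = list(combinations(range(len(labels)), 2))

for i, j in pairs:
    # In the lab, this is where a call such as
    # plot_pair(X[:, i], X[:, j], labels[i], labels[j], y) would go.
    print(f"X[:,{i}], X[:,{j}], {labels[i]!r}, {labels[j]!r}")
```

With four features this yields six pairs, in the same order as the list above.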

#your code here

(Optional) The plots shown above do not have a legend. To add a legend to
the plot, you can use the following code snippet.

def plot_pair_with_legned(x1, x2, x1_label, x2_label, y):
    plt.figure()
    for i, target_name in enumerate(iris.target_names):
        plt.scatter(x1[y == i], x2[y == i], label=target_name)

    plt.xlabel(x1_label)
    plt.ylabel(x2_label)
    plt.legend()
    plt.show()

plot_pair_with_legned(X[:,0], X[:,1], feature_names[0], feature_names[1], y)


Histogram of each feature

Plot the histogram of each feature.

#your code here

K Nearest Neighbors (KNN)

Euclidean Distance (2D)

In geometry, the Euclidean distance is the straight-line distance between two points.

Given two points $ P(x_1, y_1) $ and $ Q(x_2, y_2)$ in a 2D plane, the
Euclidean distance between them is calculated as follows:

$ d(P, Q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $

Example (2D)

Let's say we have two points:

- $ P(2, 2) $
- $ Q(5, 5) $

$ d(P, Q) = \sqrt{(5 - 2)^2 + (5 - 2)^2} = \sqrt{18} \approx 4.24 $

We can calculate the distance between these two points with NumPy.

P = np.array([2, 2])
Q = np.array([5, 5])
distance = np.sqrt(np.sum((P - Q)**2))
distance

np.float64(4.242640687119285)

Example (3 Dimensions)

Consider two points in 3D space:

- $ P_1(1, 2, 3) $
- $ P_2(4, 0, 8) $

We can calculate the Euclidean distance as follows:

$ d(P_1, P_2) = \sqrt{(4 - 1)^2 + (0 - 2)^2 + (8 - 3)^2} $

$ d(P_1, P_2) = \sqrt{3^2 + (-2)^2 + 5^2} = \sqrt{9 + 4 + 25} = \sqrt{38} \approx 6.16 $

# Define two points in 3D space

P1 = np.array([1, 2, 3])
P2 = np.array([4, 0, 8])

# Calculate the Euclidean distance

distance = np.sqrt(np.sum((P2 - P1)**2))

print(f'The Euclidean distance between P1 and P2 is: {distance:.2f}')

The Euclidean distance between P1 and P2 is: 6.16

Write a function that takes two NumPy arrays P and Q and returns the Euclidean distance
between them.

def straight_line_distance(P, Q):
    ...
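One possible way to fill in this function is sketched below. It reuses the np.sqrt(np.sum(...)) expression from the two worked examples; the conversion with np.asarray is an added convenience so that plain lists are accepted as well.

```python
import numpy as np

def straight_line_distance(P, Q):
    """Euclidean distance between two points given as equal-length arrays."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Square the coordinate differences, sum them, and take the square root.
    return np.sqrt(np.sum((Q - P) ** 2))

d2 = straight_line_distance([2, 2], [5, 5])        # 2D example from above
d3 = straight_line_distance([1, 2, 3], [4, 0, 8])  # 3D example from above
print(round(d2, 2), round(d3, 2))
```

The function works for any number of dimensions, so it can be applied directly to the four-feature Iris rows.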

KNN Algorithm

KNN from scratch

0 - Look at the data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=42)

Explain each term in the cell above. X_train, X_test, y_train, y_test?

????
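To make the four outputs concrete, here is a hand-rolled split on toy data. It is only a sketch of the idea (shuffle the row indices, then cut them into two groups); scikit-learn's train_test_split does its own shuffling, so the rows selected here will not match what it returns. All names in the block are hypothetical.

```python
import numpy as np

# Toy data standing in for X and y: 10 samples, 2 features.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10) % 2

rng = np.random.default_rng(42)        # fixed seed, in the spirit of random_state=42
order = rng.permutation(len(X_toy))    # shuffled row indices
n_test = int(len(X_toy) * 0.5)         # test_size=0.5 -> half of the rows

test_idx, train_idx = order[:n_test], order[n_test:]
X_tr, X_te = X_toy[train_idx], X_toy[test_idx]   # features for training / testing
y_tr, y_te = y_toy[train_idx], y_toy[test_idx]   # matching labels, same row order

print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)
```

The key property is that features and labels are indexed with the same index arrays, so each row of X_tr still lines up with its label in y_tr.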

1 - Calculate distances

Take one sample from test set and find the distance between this sample and all
samples in the training set. In addition to the distance, you need to store the
index of the sample in the training set.

So, for example, if the distance between the test sample and the 5th sample in the
training set is 3.5, you need to store (5, 3.5).

test_instance = X_test[0]

distances = [] # append the (index, distance) tuples to this list

# your code here

Write a function called calculate_distances that takes the test sample and the
training set and returns the distances together with the indices of the training samples.

def calculate_distances(test_instance, X_train):
    # return distances
    ...
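A minimal sketch of one way to write this function, assuming the (index, distance) tuple format described above. The toy arrays are hypothetical and chosen so that the distances are easy to check by hand.

```python
import numpy as np

def calculate_distances(test_instance, X_train):
    """Return a list of (index, distance) tuples, one per training sample."""
    distances = []
    for index, train_instance in enumerate(X_train):
        # Euclidean distance between the test sample and this training sample.
        distance = np.sqrt(np.sum((train_instance - test_instance) ** 2))
        distances.append((index, distance))
    return distances

# Toy training set with three 2D points, and a test point at the origin.
X_train_toy = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
test_toy = np.array([0.0, 0.0])
result = calculate_distances(test_toy, X_train_toy)
print(result)
```

The list has one entry per training sample, and the indices follow the row order of the training array, which is what makes the later lookup in y_train possible.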

What do you pass as input to the function calculate_distances? What do you get as output
when you call this function?

????

What is the shape of the input arrays to the function calculate_distances? What is the
shape of the output?

???

2 - Find neighbors

Step 1: Sort the (index, distance) tuples based on the distance value in ascending order.

distances = calculate_distances(test_instance, X_train)

distances.sort(key=lambda x: x[1])
distances

[(34, np.float64(0.22360679774997896)),
(45, np.float64(0.30000000000000027)),
(28, np.float64(0.5099019513592785)),
(35, np.float64(0.5099019513592788)),
(66, np.float64(0.5196152422706639)),
(47, np.float64(0.5291502622129183)),
(17, np.float64(0.5830951894845297)),
(36, np.float64(0.6164414002968978)),
(65, np.float64(0.6244997998398398)),
(41, np.float64(0.6480740698407859)),
(48, np.float64(0.6999999999999995)),
(70, np.float64(0.7071067811865478)),
(63, np.float64(0.728010988928052)),
(23, np.float64(0.741619848709566)),
(14, np.float64(0.754983443527075)),
(68, np.float64(0.774596669241483)),
(73, np.float64(0.7874007874011811)),
(0, np.float64(0.8124038404635955)),
(50, np.float64(0.8124038404635965)),
(9, np.float64(0.8602325267042631)),
(60, np.float64(0.9273618495495711)),
(18, np.float64(0.9433981132056598)),
(67, np.float64(0.9643650760992956)),
(20, np.float64(0.9746794344808962)),
(5, np.float64(0.9746794344808963)),
(37, np.float64(1.0049875621120894)),
(42, np.float64(1.0440306508910553)),
(2, np.float64(1.0535653752852738)),
(64, np.float64(1.0954451150103324)),
(62, np.float64(1.1045361017187258)),
(8, np.float64(1.1575836902790226)),
(44, np.float64(1.224744871391589)),
(43, np.float64(1.296148139681572)),
(11, np.float64(1.2999999999999998)),
(71, np.float64(1.3490737563232036)),
(38, np.float64(1.3490737563232043)),
(31, np.float64(1.407124727947029)),
(40, np.float64(1.4247806848775015)),
(1, np.float64(1.438749456993816)),
(52, np.float64(1.5556349186104048)),
(56, np.float64(1.6186414056238647)),
(29, np.float64(1.6278820596099706)),
(58, np.float64(1.6431676725154982)),
(16, np.float64(1.7349351572897476)),
(74, np.float64(1.8138357147217057)),
(55, np.float64(1.8165902124584952)),
(24, np.float64(1.8493242008906932)),
(4, np.float64(1.8601075237738276)),
(54, np.float64(1.8973665961010275)),
(32, np.float64(1.9157244060668017)),
(15, np.float64(1.997498435543818)),
(61, np.float64(2.0346989949375804)),
(51, np.float64(2.090454496036687)),
(19, np.float64(2.4020824298928627)),
(69, np.float64(3.2939338184001206)),
(3, np.float64(3.3674916480965473)),
(13, np.float64(3.4161381705077445)),
(39, np.float64(3.551056180912941)),
(49, np.float64(3.5623026261113755)),
(53, np.float64(3.5623026261113755)),
(10, np.float64(3.5735136770411273)),
(12, np.float64(3.5791060336346563)),
(26, np.float64(3.6318039594669758)),
(6, np.float64(3.6537651812890224)),
(59, np.float64(3.6565010597564442)),
(25, np.float64(3.685105154537656)),
(57, np.float64(3.765634076752546)),
(30, np.float64(3.782856063875548)),
(7, np.float64(3.823610858861032)),
(33, np.float64(3.8314488121336034)),
(72, np.float64(3.844476557348217)),
(21, np.float64(3.845776904605882)),
(46, np.float64(3.8961519477556315)),
(27, np.float64(3.9357337308308855)),
(22, np.float64(4.177319714841085))]

Step 2: Select the first k elements of the sorted list. And, store the
index of these k elements in a list.

k = 5
distances[:k]

[(34, np.float64(0.22360679774997896)),
(45, np.float64(0.30000000000000027)),
(28, np.float64(0.5099019513592785)),
(35, np.float64(0.5099019513592788)),
(66, np.float64(0.5196152422706639))]

Extract the index of the k nearest neighbors from (index, distance) tuples.

neighbor_index = []
# your code here

Step 3: Find the labels of these top k samples from y_train array.

neighbor_label = []
#your code here

Now write a function find_neighbors to do all the steps above from 1 to 3.

def find_neighbors(test_instance, X_train, y_train, k):
    """
    Inputs
    test_instance: one data point from the test set
    X_train: training dataset
    y_train: training labels
    k: number of neighbours

    Output
    neighbor_label: list of the k neighbours' labels
    """
    # your code here
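The three steps can be combined roughly as follows. This is a sketch rather than the reference solution: it computes all distances in one vectorized expression instead of calling calculate_distances, and the toy data is hypothetical.

```python
import numpy as np

def find_neighbors(test_instance, X_train, y_train, k):
    """Return the labels of the k training samples closest to test_instance."""
    # Step 1: distance to every training sample, paired with its index, then sorted.
    dists = np.sqrt(np.sum((X_train - test_instance) ** 2, axis=1))
    distances = list(enumerate(dists))
    distances.sort(key=lambda pair: pair[1])
    # Step 2: indices of the k closest training samples.
    neighbor_index = [index for index, _ in distances[:k]]
    # Step 3: look up the labels of those samples.
    neighbor_label = [y_train[i] for i in neighbor_index]
    return neighbor_label

# Toy data: three points near the origin (label 0), two far away (label 1).
X_train_toy = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0], [0.0, 1.0]])
y_train_toy = np.array([0, 0, 1, 1, 0])

near_origin = find_neighbors(np.array([0.2, 0.1]), X_train_toy, y_train_toy, k=3)
near_far = find_neighbors(np.array([5.4, 5.0]), X_train_toy, y_train_toy, k=2)
print(near_origin, near_far)
```

Python's list sort is stable, so training samples at exactly the same distance keep their original index order.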

What do you pass as input to the function find_neighbors? What do you get as output when
you call this function?

???

What is the shape of the input arrays to the function find_neighbors? What is the shape of
the output?

???

Explain which operations are performed inside the function find_neighbors to obtain
the labels of the k nearest neighbors.

???

3 - Vote on labels

You have this function to vote on labels of the k nearest neighbors.

def vote_on_labels(neighbor_label):
    prediction_dict = {}
    for label in neighbor_label:
        if label in prediction_dict:
            prediction_dict[label] += 1
        else:
            prediction_dict[label] = 1
    prediction = max(prediction_dict, key=prediction_dict.get)
    return prediction

y_pred = vote_on_labels(neighbor_label)
y_pred

np.int64(1)
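The counting loop in vote_on_labels can also be expressed with collections.Counter from the standard library. The sketch below is an alternative for comparison, not a replacement for the function given in the lab; the name vote_with_counter is hypothetical.

```python
from collections import Counter

def vote_with_counter(neighbor_label):
    """Return the most frequent label among the neighbours."""
    counts = Counter(neighbor_label)         # label -> number of occurrences
    label, _ = counts.most_common(1)[0]      # (label, count) of the top entry
    return label

print(vote_with_counter([1, 1, 2, 0, 1]))
```

Neither version defines an explicit rule for ties, which can occur with three classes, so it is worth thinking about what should happen when two labels receive the same number of votes.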

What do you pass as input to the function vote_on_labels? What do you get as output when
you call this function?

????

What is the shape of the input array to the function vote_on_labels? What is the shape of
the output?

???

4 - Put it all together

Now iterate over all datapoints of X_test and calculate their label.

y_pred = []
#your code here

Turn your code into a function KNN that takes the training set, the target labels of the
training set, the test set, and the value of k, and returns the predicted labels of
the test set.

def KNN(X_train, y_train, X_test, k):
    ...
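One compact way to assemble the pieces is sketched below. It is written to be self-contained, so it inlines the distance, neighbor, and voting steps instead of calling the functions from the earlier parts, and it uses np.argsort and collections.Counter as substitutes for the manual sort and the voting dictionary. The toy data is hypothetical.

```python
import numpy as np
from collections import Counter

def KNN(X_train, y_train, X_test, k):
    """Predict a label for every row of X_test by majority vote of k neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    y_pred = []
    for test_instance in np.asarray(X_test, dtype=float):
        # 1 - distances from this test sample to all training samples
        dists = np.sqrt(np.sum((X_train - test_instance) ** 2, axis=1))
        # 2 - indices of the k smallest distances (stable sort keeps index order on ties)
        nearest = np.argsort(dists, kind="stable")[:k]
        # 3 - majority vote over the neighbours' labels
        votes = Counter(y_train[nearest].tolist())
        y_pred.append(votes.most_common(1)[0][0])
    return np.array(y_pred)

# Two well-separated toy clusters.
X_train_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train_toy = np.array([0, 0, 0, 1, 1, 1])
X_test_toy = np.array([[0.5, 0.5], [5.5, 5.5]])

y_pred_toy = KNN(X_train_toy, y_train_toy, X_test_toy, k=3)
print(y_pred_toy)
```

Returning a NumPy array rather than a list keeps the later comparison y_test == y_pred element-wise.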

5 - Evaluate the model

Finally, calculate the accuracy of the KNN algorithm.

y_test == y_pred

array([ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, False, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, False, True, True,
True, True, True, True, True, True, True, True, True,
False, True, True])

accuracy = sum(y_test == y_pred) / len(y_test) #takes True as 1 and False as 0

print(f"accuracy: {accuracy * 100} %")

accuracy: 94.66666666666667 %

Turn your code into a function evaluate that takes the true labels and the
predicted labels and returns the accuracy of the model.

def evaluate(y_test, y_pred):
    # your code here
    ...
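A minimal sketch of such a function. It takes the mean of the element-wise comparison, which is equivalent to the sum(...) / len(...) expression above; the np.asarray calls are an addition so that plain Python lists are compared element by element too.

```python
import numpy as np

def evaluate(y_test, y_pred):
    """Fraction of predictions that match the true labels."""
    y_test = np.asarray(y_test)
    y_pred = np.asarray(y_pred)
    # True counts as 1 and False as 0, so the mean is the accuracy.
    return np.mean(y_test == y_pred)

acc = evaluate([0, 1, 2, 1], [0, 1, 1, 1])
print(f"accuracy: {acc * 100:.2f} %")
```

In the run shown above, 4 of the 75 test predictions are False, which gives 71 / 75, the 94.67 % reported.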

KNN in Scikit-Learn

knn_model = KNeighborsClassifier(n_neighbors=4)  # You can change the value of 'k' as needed.
knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")

Accuracy: 93.33%

(Optional) 6 - Hyperparameter tuning

So far we have used fixed values of k (k = 5 in the from-scratch steps and
n_neighbors=4 with scikit-learn). Now, we are going to find the best value of k
for the KNN algorithm.

K = [1, 2, 3, 4, 5, 6, 7, 8]
my_accs = []
# your code here
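The loop over k could look roughly like the sketch below. To stay self-contained it uses synthetic two-class data and a compact stand-in predictor named knn_predict (both are assumptions for illustration); in the lab, the KNN function from step 4 and the Iris split would take their place.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k):
    """Hypothetical compact KNN, standing in for the KNN function from step 4."""
    preds = []
    for x in X_test:
        dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
        nearest = np.argsort(dists, kind="stable")[:k]
        preds.append(Counter(y_train[nearest].tolist()).most_common(1)[0][0])
    return np.array(preds)

# Synthetic two-class data (an assumption; the lab uses the Iris split instead).
rng = np.random.default_rng(0)
X_a = rng.normal(loc=0.0, scale=1.0, size=(40, 2))
X_b = rng.normal(loc=3.0, scale=1.0, size=(40, 2))
X_all = np.vstack([X_a, X_b])
y_all = np.array([0] * 40 + [1] * 40)

# Shuffle once and use half of the rows for training, half for testing.
order = rng.permutation(len(X_all))
train, test = order[:40], order[40:]

K = [1, 2, 3, 4, 5, 6, 7, 8]
my_accs = []
for k in K:
    y_hat = knn_predict(X_all[train], y_all[train], X_all[test], k)
    my_accs.append(float(np.mean(y_hat == y_all[test])))

# np.argmax returns the first maximum, so ties go to the smallest k.
best_k = K[int(np.argmax(my_accs))]
print(my_accs, best_k)
```

Because accuracy here is measured on a single test split, the "best" k can change with a different random_state, so small differences between neighbouring values of k should be read with caution.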

Plot the accuracy of the model for different values of k with scikit-learn and
compare the results with the results from the from-scratch implementation.

K = [1, 2, 3, 4, 5, 6, 7, 8]
sklearn_accs = []
#your code here

Can you justify the difference between the results of the two
implementations?
