K-Nearest Neighbor Algorithm: Dataset Preparation
Dataset preparation:
Randomly split the dataset into training (70%), validation (15%), and test (15%) sets.
Train_set=[], Val_set=[], Test_set=[]
// The following pseudocode randomly splits your dataset list (it does not shuffle it):
1. for each sample S in the dataset:
2.     generate a random number R in the range [0, 1]
3.     if R >= 0 and R <= 0.7:
4.         append S to Train_set
5.     elif R > 0.7 and R <= 0.85:
6.         append S to Val_set
7.     else:
8.         append S to Test_set
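The split above can be sketched in Python as follows. The function and variable names are illustrative, not prescribed by the assignment, and the fixed seed is an assumption added for reproducibility:

```python
import random

def split_dataset(dataset, seed=42):
    # Seed is an assumption: fix it so the split is reproducible across runs.
    rng = random.Random(seed)
    train_set, val_set, test_set = [], [], []
    for sample in dataset:
        r = rng.random()  # uniform random number in [0, 1)
        if r <= 0.7:          # ~70% of samples
            train_set.append(sample)
        elif r <= 0.85:       # ~15% of samples
            val_set.append(sample)
        else:                 # remaining ~15%
            test_set.append(sample)
    return train_set, val_set, test_set
```

Note that the split is only approximately 70/15/15: each sample lands in a set independently, so the exact counts vary from run to run.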
KNN Classification:
Use the Iris dataset (iris)
K=5
1. for each sample V in the VALIDATION set:
2.     for each sample T in the TRAINING set:
3.         find the Euclidean distance between Vx and Tx (the first N-1 columns, i.e. the features)
4.         store T and the distance in list L (L starts empty for each V)
5.     sort L in ascending order of distance
6.     take the first K samples
7.     take the majority class among the K samples (this is the predicted class for sample V)
8.     check whether this predicted class matches V's true class
9. Calculate validation_accuracy = (correct VALIDATION samples) / (total VALIDATION samples) * 100
Calculate the validation accuracy in the same way for K = 1, 3, 5, 10, 15.
Make a table with 2 columns: K and Validation Accuracy (see the report template).
Now take the K with the highest validation accuracy.
Use this best K to determine the test accuracy (simply replace the VALIDATION set with the TEST set).
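The classification loop and the K-tuning step can be sketched as below. Samples are assumed to be `(features, label)` pairs; all names are illustrative:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train_set, query_features, k):
    # Distance from the query to every training sample; L starts empty each call.
    dists = [(euclidean(query_features, tx), ty) for tx, ty in train_set]
    dists.sort(key=lambda d: d[0])                 # ascending by distance
    top_k = [label for _, label in dists[:k]]      # labels of the K nearest
    return Counter(top_k).most_common(1)[0][0]     # majority class

def accuracy(train_set, eval_set, k):
    correct = sum(1 for vx, vy in eval_set
                  if knn_classify(train_set, vx, k) == vy)
    return correct / len(eval_set) * 100
```

K tuning is then a loop such as `for k in (1, 3, 5, 10, 15): print(k, accuracy(train_set, val_set, k))`, after which the best K is reused with the test set in place of the validation set.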
KNN Regression:
Use the Diabetes dataset (diabetes)
K = 5, Error = 0
1. for each sample V in the VALIDATION set:
2.     for each sample T in the TRAINING set:
3.         find the Euclidean distance between Vx and Tx
4.         store T's output (Ty) and the distance in list L (L starts empty for each V)
5.     sort L in ascending order of distance
6.     take the first K samples
7.     take the average output of the K samples (this is the predicted output for sample V)
8.     Error = Error + (V's true output - V's predicted output)^2
9. Calculate Mean_Squared_Error = Error / (total number of samples in the VALIDATION set)
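The regression variant replaces the majority vote with an average of the K nearest targets. A minimal sketch, again assuming `(features, target)` pairs and illustrative names:

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_regress(train_set, query_features, k):
    # Sort all training samples by distance to the query (L is rebuilt each call).
    nearest = sorted(train_set, key=lambda t: euclidean(query_features, t[0]))
    # Average the targets of the K nearest neighbors.
    return sum(ty for _, ty in nearest[:k]) / k

def mean_squared_error(train_set, eval_set, k):
    error = sum((vy - knn_regress(train_set, vx, k)) ** 2
                for vx, vy in eval_set)
    return error / len(eval_set)
```

As with classification, K is tuned on the validation set and the best K is then used to report the test mean squared error.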
Marks Distribution
(1) Dataset loading: 1.5
(2) Train, Validation, Test split: 2.5
(3) KNN classification algorithm + K tuning (table) + test accuracy : 5 + 1.5 + 1.5
(4) KNN regression algorithm + K tuning (table) + test mean squared error : 5 + 1.5 + 1.5
Dataset description:
Diabetes
[source: Diabetes dataset, sklearn.datasets.load_diabetes — scikit-learn 1.1.1 documentation]
Number of Instances: 442
Number of Attributes: First 10 columns are numeric predictive values
Target: Column 11 is a quantitative measure of disease progression one year after baseline
Attribute Information:
● age in years
● sex
● bmi body mass index
● bp average blood pressure
● s1 tc, total serum cholesterol
● s2 ldl, low-density lipoproteins
● s3 hdl, high-density lipoproteins
● s4 tch, total cholesterol / HDL
● s5 ltg, possibly log of serum triglycerides level
● s6 glu, blood sugar level
Iris:
Source: [7.1. Toy datasets — scikit-learn 1.1.1 documentation]
Number of Instances 150 (50 in each of three classes)
Number of Attributes 4 numeric, predictive attributes and the class
Attribute Information
● sepal length in cm
● sepal width in cm
● petal length in cm
● petal width in cm
● class:
○ Iris-Setosa
○ Iris-Versicolour
○ Iris-Virginica
Resources
7.1. Toy datasets — scikit-learn 1.0.2 documentation
● Classification: majority vote
● Regression: mean squared error
https://www.quora.com/What-are-industry-applications-of-the-K-nearest-neighbor-algorithm
https://stackoverflow.com/questions/53704811/is-k-nearest-neighbors-algorithm-used-a-lot-in-real-life