Diabetes Prediction System With KNN Algorithm

The document describes building a diabetes prediction system using a KNN classification algorithm. It details exploring a diabetes dataset, cleaning and preprocessing the data, selecting features, splitting the data into training and testing sets, implementing a KNN function to make predictions for different K values, and plotting the results to analyze model performance.

Uploaded by

Mahtab Ansari

Diabetes Prediction System with KNN Algorithm

Submitted By: Mahtab Alam (31) and Mohd Arish (13)
Submitted To: Pallvi Mam
Project Overview:

• A classification project using the K-Nearest Neighbors (KNN) predictive model to determine whether a person has diabetes. We'll work with a dataset from the National Institute of Diabetes and Digestive and Kidney Diseases, available on Kaggle. This dataset contains valuable healthcare information. We'll start by exploring the dataset and then build our predictive model using KNN with cross-validation.
Description of the dataset:

• The dataset, obtained from Kaggle, originates from the National Institute of Diabetes and Digestive and Kidney Diseases and consists of predictor variables and an outcome indicating whether a person is diabetic or not. It contains data from 768 patients and serves as the basis for our classification task.
• Now, before diving into the K-Nearest Neighbors (KNN) algorithm, let's briefly
discuss what it entails. KNN is a type of supervised machine learning
algorithm used for classification and regression. In classification, like our
case, it predicts the class of a given data point by finding the most common
class among the k closest data points in the feature space. The choice of k,
the number of neighbors, is a crucial hyperparameter that can significantly
impact the model's performance.
KNN algorithm:
• K-Nearest Neighbors (KNN) is a supervised
machine learning algorithm that focuses on
similarity. It classifies a target variable by
predicting its class based on a specified number
of nearest neighbors. To make a prediction, KNN
calculates the distance from the instance being
classified to every instance in the training
dataset. It then assigns a class to the instance
based on the majority class of its k nearest
neighbors.
Distance between data points in the KNN algorithm:
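The distance formula pictured on this slide did not survive the export. As a minimal sketch, the Euclidean distance between two feature vectors can be computed like this (the function name is my own, not from the slides):

```python
import numpy as np

def euclidean_distance(a, b):
    # Square-root of the sum of squared per-feature differences
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

# Example: two patients described by three features each
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # → 5.0
```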
Reading and exploring the dataset:
• We begin by loading the dataset using pandas' `read_csv()`
function, which reads the dataset and converts it into a
structured tabular format that we can easily analyze.
Input Code:
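The code screenshot is not preserved here. In the project this step would be `pd.read_csv("diabetes.csv")` on the downloaded Kaggle file (the filename is an assumption); the inline three-row sample below stands in for the file so the sketch is self-contained:

```python
import pandas as pd
from io import StringIO

# Inline stand-in for the Kaggle CSV; in the project this would be
# pd.read_csv("diabetes.csv") on the full 768-row file.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""
df = pd.read_csv(StringIO(csv_text))

print(df.shape)   # (3, 9) for this sample; the real dataset is (768, 9)
print(df.head())  # first rows in tabular form for a quick look
```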
Output:
Manipulating and Cleaning our dataset

• In this phase of data preprocessing, we focus on cleaning the dataset by addressing zero and missing values. These values can significantly impact the accuracy of our predictive model. We concentrate on specific columns, including 'Glucose', 'Blood Pressure', 'Skin Thickness', 'Insulin', 'BMI', and 'Pedigree', as they are key indicators for determining diabetes. By replacing these problematic values with the mean of their respective columns, we ensure that our dataset is more reliable and suitable for training our KNN model.
Input Code:
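The cleaning screenshot is likewise missing. A sketch of the zero-imputation step described above, on a tiny invented frame (the values are illustrative only):

```python
import numpy as np
import pandas as pd

# Tiny invented frame standing in for the real dataset
df = pd.DataFrame({
    "Glucose": [148, 0, 183],
    "BloodPressure": [72, 66, 0],
    "BMI": [33.6, 26.6, 0.0],
})

# Zero is not a physiologically valid reading for these columns,
# so treat zeros as missing and impute with the column mean.
cols = ["Glucose", "BloodPressure", "BMI"]
df[cols] = df[cols].replace(0, np.nan)
df[cols] = df[cols].fillna(df[cols].mean())

print(df)  # the zeros are now replaced by each column's mean
```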
Output:
Let's take a quick statistical view of the data provided

Input Code:
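A sketch of the statistical-summary step, again on invented values; in the project this would simply be `df.describe()` on the full cleaned dataset:

```python
import pandas as pd

# Invented stand-in values for two of the dataset's columns
df = pd.DataFrame({"Glucose": [148, 85, 183], "BMI": [33.6, 26.6, 23.3]})

# describe() reports count, mean, std, min, quartiles and max per column
stats = df.describe()
print(stats)
```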
Output:
Plotting the dataset

• The updated diabetes dataset is now prepared for basic visualization. Plotting the data will provide a visual understanding of its distribution and help in selecting the most suitable columns for the K-Nearest Neighbors (KNN) experiment. To visualize the data, I utilized the `pairplot()` function from the Seaborn library, which generates a series of plots showing relationships between different variables in the dataset.
Input Code:
Output:
Reducing The Attributes

• Given the complexity of our multidimensional dataset, with numerous data points across various variables, we aim to simplify our analysis by selecting only a subset of variables to test our model. This approach will streamline our experimentation process and make it more manageable. For simplicity, and to focus on the most relevant data, we will select three features of the dataset.
Input Code:
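The slides do not record which three features were kept, so Glucose, BMI and Age in the sketch below are assumptions chosen for illustration (and the data values are invented):

```python
import pandas as pd

# Invented stand-in frame with several candidate predictors
df = pd.DataFrame({
    "Glucose": [148, 85, 183],
    "BMI": [33.6, 26.6, 23.3],
    "Age": [50, 31, 32],
    "BloodPressure": [72, 66, 64],
    "Outcome": [1, 0, 1],
})

# Keep three predictors plus the target; Glucose, BMI and Age are
# placeholder choices, not necessarily the ones used in the project.
features = ["Glucose", "BMI", "Age"]
X = df[features]
y = df["Outcome"]

print(X.columns.tolist())  # → ['Glucose', 'BMI', 'Age']
```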
Output:
Splitting the dataset into training and testing sets
• An essential step in machine learning modeling is dividing our
dataset into training and testing sets. This division allows us to
evaluate the performance of our model accurately.
• During testing, the machine learning algorithm predicts outcomes
for the testing set based on what it learned from the training set.
This process helps us assess how well the model generalizes to
new, unseen data. By comparing the predicted outcomes to the
actual outcomes in the testing set, we can compute the accuracy
rate of the machine learning algorithm. A higher accuracy rate
indicates that the model is better at predicting outcomes for the
presented sample data, which is a desirable outcome.
Input Code:
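A sketch of the split, using synthetic stand-in data of the same size as the real dataset (768 rows); the 80/20 split ratio and `random_state` value are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the selected feature matrix and outcome vector
rng = np.random.default_rng(0)
X = rng.normal(size=(768, 3))
y = rng.integers(0, 2, size=768)

# Hold out 20% of patients for testing; stratify keeps the class balance,
# random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)  # → (614, 3) (154, 3)
```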
We run a quick check to confirm that the data were split correctly
Output:
KNN function
• The K-Nearest Neighbors (KNN) algorithm assesses similarity between sample test data and training data. Similarity is evaluated for a set of K values, each giving the number of nearest data points considered around the sample. A distance measure is needed to determine which training points lie closest to the test data; the distance measure selected for this exercise is the Euclidean distance.
Continued…

• To execute these operations efficiently, I utilized the scikit-learn library, which provides built-in functions for running the KNN algorithm and calculating distances.
• I have created a function that implements the K-Nearest Neighbors (KNN) algorithm and presents the results as a line plot. This function runs the KNN algorithm once for each K value and displays the results.
Input Code:
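The function itself is not preserved in this export. A sketch of one plausible shape for it, using scikit-learn's `KNeighborsClassifier` with the Euclidean metric on synthetic stand-in data (the function name and data are my own; plotting of the scores is omitted here):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; the real project uses the Kaggle dataset.
# The labels follow a simple rule so KNN has something to learn.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

def knn_accuracies(k_values):
    """Fit a KNN classifier (Euclidean distance) for each K and
    return the test-set accuracy of each model."""
    scores = []
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
        model.fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))
    return scores

accs = knn_accuracies(range(1, 11))
print(accs)  # one accuracy value per K from 1 to 10
```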
For this exercise, I will test and plot the model with K values from 1 up to 500 to see which K values perform best overall.

Input Code:
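A sketch of the K sweep on synthetic stand-in data. A coarser grid than every value from 1 to 500 keeps the example fast, and the line plot is saved to a file whose name is invented here:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 768-row stand-in; note K can never exceed the training-set size
rng = np.random.default_rng(1)
X = rng.normal(size=(768, 3))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

k_values = range(1, 501, 25)  # coarser grid than 1..500 for speed
scores = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)  # Euclidean by default
    scores.append(model.fit(X_train, y_train).score(X_test, y_test))

# Accuracy as a function of K, as a line plot
plt.plot(list(k_values), scores)
plt.xlabel("K (number of neighbors)")
plt.ylabel("Test accuracy")
plt.savefig("knn_k_sweep.png")
print(max(scores))
```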
Output:
Thank You
