Diabetes Prediction System With KNN Algorithm
• The dataset, obtained from Kaggle, originates from the National Institute of
Diabetes and Digestive and Kidney Diseases. It consists of several predictor
variables and an outcome indicating whether a person is diabetic. It contains
data from 768 patients and serves as the basis for our classification task.
• Before diving into the K-Nearest Neighbors (KNN) algorithm, let's briefly
discuss what it entails. KNN is a supervised machine learning algorithm used
for classification and regression. In classification, as in our case, it
predicts the class of a given data point by finding the most common class
among the k closest data points in the feature space. The choice of k, the
number of neighbors, is a crucial hyperparameter that can significantly
impact the model's performance.
KNN algorithm:
• K-Nearest Neighbors (KNN) is a supervised machine learning algorithm based
on similarity. It predicts the class of an instance from a specified number
of its nearest neighbors. To make a prediction, KNN calculates the distance
from the instance being classified to every instance in the training
dataset. It then assigns the instance the majority class of its k nearest
neighbors.
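The steps described here (compute all distances, keep the k closest, take a majority vote) can be sketched in plain Python. This is a minimal illustration, not the code used in the project: the function name `knn_predict`, the two features, and the toy values are all assumptions made up for the example, and Euclidean distance is assumed as the metric.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training instance (Euclidean, assumed).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest instances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority class among those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration: (glucose, BMI) -> 0 = non-diabetic, 1 = diabetic
train_points = [(85, 22.0), (90, 24.5), (100, 26.0),
                (150, 33.0), (165, 35.5), (170, 31.0)]
train_labels = [0, 0, 0, 1, 1, 1]

prediction = knn_predict(train_points, train_labels, query=(155, 32.0), k=3)
print(prediction)
```

With two classes, an odd k avoids tied votes. In practice the features would usually be scaled first, since a feature with a large numeric range otherwise dominates the distance.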
Distance between data points in KNN algorithm:
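The formula for this slide is not reproduced in the text. A common choice, assumed here, is the Euclidean distance between two data points p and q that each have n features:

```latex
d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

Here p_i and q_i are the values of the i-th feature for the two points. Other metrics, such as Manhattan distance, can be used in the same role.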
Reading and exploring the dataset:
• We begin by loading the dataset using pandas' `read_csv()`
function, which reads the file into a DataFrame, a
structured tabular format that we can easily analyze.
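A minimal sketch of this loading step is shown below. It is not the original code from the slides: the file name `diabetes.csv`, the column names, and the sample rows are assumptions made for illustration, and an in-memory CSV stands in for the real file so the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-in for the Kaggle CSV: invented rows with assumed column names.
sample_csv = io.StringIO(
    "Glucose,BMI,Age,Outcome\n"
    "140,31.2,45,1\n"
    "92,24.8,29,0\n"
    "171,36.0,52,1\n"
)

# With the real file this would be something like pd.read_csv("diabetes.csv")
# (hypothetical file name).
df = pd.read_csv(sample_csv)

print(df.shape)                      # (rows, columns)
print(df.head())                     # first few rows
print(df["Outcome"].value_counts())  # class balance of the target
```

Checking `shape`, `head()`, and the class counts of the outcome column is a quick way to confirm that the file loaded as expected before any cleaning.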
Input Code:
Output:
Manipulating and Cleaning our dataset
Input Code:
Output:
Plotting the dataset
Input Code:
Output:
Thank You