Diabetes Prediction Report

Uploaded by shewta.ray.hr
Copyright © All Rights Reserved
Technical Report: Diabetes Prediction Using Machine Learning

### Detailed Report: Machine Learning Model Development for Diabetes Prediction

#### Objective
The primary goal of this study is to develop and evaluate machine
learning models capable of predicting diabetes in individuals based
on diagnostic features. The dataset used for this purpose originates
from the National Institute of Diabetes and Digestive and Kidney
Diseases and focuses on Pima Indian women aged 21 and older.
Because diabetes is a major metabolic disorder, early detection and
management are needed to mitigate its severe complications, which
makes predictive models valuable in healthcare.

#### Dataset Overview

The dataset consists of 768 records and includes eight predictive
variables and one binary outcome variable. The features represent
medical and demographic information:

| Feature | Description |
|--------------------------|---------------------------------------------------------------------------------|
| Pregnancies | Number of times the patient has been pregnant |
| Glucose | Plasma glucose concentration (mg/dL) 2 hours into an oral glucose tolerance test |
| BloodPressure | Diastolic blood pressure (mm Hg) |
| SkinThickness | Triceps skinfold thickness (mm) |
| Insulin | 2-hour serum insulin (μU/mL) |
| BMI | Body mass index (weight in kg / (height in m)²) |
| DiabetesPedigreeFunction | Score summarizing the history of diabetes in the patient's family |
| Age | Patient's age (years) |
| Outcome | Class variable (0 = no diabetes, 1 = diabetes) |
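
A minimal loading sketch, assuming the data is stored as a CSV file whose header uses the column names from the table above. The function name `load_pima` and the file name in the usage comment are hypothetical; the report does not say how the data was read.

```python
import pandas as pd

# Column names as listed in the dataset table; "Outcome" is the target.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def load_pima(source) -> tuple[pd.DataFrame, pd.Series]:
    """Read a CSV (path or file-like object) and split predictors from the outcome."""
    df = pd.read_csv(source)
    missing = set(FEATURES + [TARGET]) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df[FEATURES], df[TARGET]


# Usage (hypothetical file name):
# X, y = load_pima("diabetes.csv")
```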

#### Steps in the Model Development Process

**1. Exploratory Data Analysis (EDA)**

EDA was conducted to understand the dataset's structure, the
distributions of individual variables, and the relationships between them.

**Key Observations:**
- Several critical features (`Glucose`, `BloodPressure`, `SkinThickness`,
`BMI`, and `Insulin`) contain zero values that are physiologically
implausible. These zeros were treated as missing data.
- The target variable (`Outcome`) is moderately imbalanced: 65% of
cases are non-diabetic (Outcome = 0) and 35% are diabetic
(Outcome = 1).
- Correlation analysis showed clear positive correlations between
`Outcome` and both `Glucose` and `BMI`, highlighting their predictive
importance.
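
The observations above can be reproduced with a short pandas sketch. It assumes the data is already in a DataFrame with the column names from the dataset table; the helper name `eda_summary` is hypothetical, and the list of zero-as-missing columns mirrors the first observation.

```python
import pandas as pd

# Columns in which a recorded value of 0 is physiologically implausible.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def eda_summary(df: pd.DataFrame) -> dict:
    """Mark implausible zeros as NaN, then report missingness, class balance and correlations."""
    df = df.copy()  # leave the caller's DataFrame untouched
    cols = df[ZERO_AS_MISSING]
    df[ZERO_AS_MISSING] = cols.mask(cols == 0)  # 0 -> NaN in these columns only
    return {
        # Share of missing values per column after the zero-to-NaN conversion.
        "missing_share": df[ZERO_AS_MISSING].isna().mean(),
        # Proportion of each class in the target (the report observes about 65% / 35%).
        "class_share": df["Outcome"].value_counts(normalize=True).sort_index(),
        # Pearson correlation of every predictor with the target, largest first.
        "corr_with_outcome": df.corr()["Outcome"].drop("Outcome").sort_values(ascending=False),
    }
```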

**2. Data Preprocessing**

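The following preprocessing steps were applied to prepare the data
for modeling:
- Handling Missing Values: Missing values were replaced with the
median of each feature.
- Outlier Treatment: Outliers were detected and addressed using the
Local Outlier Factor (LOF) algorithm.
- Feature Scaling: Numerical variables were scaled with RobustScaler,
which centers each feature on its median and divides by its
interquartile range, making it less sensitive to outliers than
standardization.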
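
A minimal scikit-learn sketch of the three steps. It assumes implausible zeros have already been converted to NaN and that LOF-flagged rows are dropped; the report says only that LOF was used to "address" outliers. The function name `preprocess` and the `n_neighbors=20` setting are assumptions (20 is scikit-learn's default for `LocalOutlierFactor`).

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import RobustScaler


def preprocess(X: pd.DataFrame, y: pd.Series, n_neighbors: int = 20):
    """Median-impute NaNs, drop LOF-flagged outliers, then robust-scale the features."""
    # 1. Replace each missing value with its column median.
    imputer = SimpleImputer(strategy="median")
    X_imp = pd.DataFrame(imputer.fit_transform(X), columns=X.columns, index=X.index)

    # 2. LOF labels inliers +1 and outliers -1; keep only the inlier rows.
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    inlier = lof.fit_predict(X_imp) == 1
    X_in, y_in = X_imp[inlier], y[inlier]

    # 3. Subtract each column's median and divide by its interquartile range.
    scaler = RobustScaler()
    X_scaled = pd.DataFrame(scaler.fit_transform(X_in), columns=X.columns, index=X_in.index)
    return X_scaled, y_in, imputer, scaler
```

In a full workflow, the imputer and scaler would be fitted on the training split only and then applied to the test split with `transform`, so that test-set statistics do not leak into training.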

**3. Model Development**

**Models Evaluated:**
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Decision Tree Classifier (CART)
- Random Forest Classifier
- XGBoost
- LightGBM
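
One way to instantiate the seven candidates is with scikit-learn plus the separate `xgboost` and `lightgbm` packages. This sketch assumes default hyperparameters and a hypothetical `random_state=42`; the report does not list the settings it used, and the function name `build_models` is also hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state: int = 42) -> dict:
    """Return the candidate classifiers, keyed by the names used in the report."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        # probability=True lets SVC return class probabilities for AUC-ROC.
        "SVM": SVC(probability=True, random_state=random_state),
        "CART": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
    }
    # XGBoost and LightGBM live in separate packages; add them only if installed.
    try:
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(random_state=random_state)
    except ImportError:
        pass
    try:
        from lightgbm import LGBMClassifier
        models["LightGBM"] = LGBMClassifier(random_state=random_state)
    except ImportError:
        pass
    return models
```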

**Results of Baseline Models:**

| Model | Accuracy | Precision | Recall | F1 Score | AUC-ROC |
|------------------|----------|-----------|--------|----------|---------|
| Logistic Regression | 0.7674 | 0.74 | 0.68 | 0.71 | 0.84 |
| Random Forest | 0.8472 | 0.83 | 0.78 | 0.80 | 0.90 |
| XGBoost | 0.8703 | 0.85 | 0.80 | 0.82 | 0.92 |
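
The five metrics in the table can be computed with `sklearn.metrics`. This is a sketch assuming a fitted binary classifier that exposes `predict_proba` and a held-out test set (the report does not state its split); the helper name `evaluate` is hypothetical. Precision is TP / (TP + FP), recall is TP / (TP + FN), F1 is their harmonic mean, and AUC-ROC is computed from predicted probabilities rather than hard labels.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def evaluate(model, X_test, y_test) -> dict:
    """Compute the five reported metrics for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    # AUC-ROC needs a score for the positive class (Outcome = 1), not hard labels.
    y_score = model.predict_proba(X_test)[:, 1]
    return {
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred),
        "Recall": recall_score(y_test, y_pred),
        "F1 Score": f1_score(y_test, y_pred),
        "AUC-ROC": roc_auc_score(y_test, y_score),
    }
```

Because the classes are split roughly 65/35, a stratified split such as `train_test_split(X, y, stratify=y)` keeps that ratio in both the training and test sets.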

**4. Hyperparameter Optimization**

**Best Model:** XGBoost, with an accuracy of **90%** after fine-tuning.
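
The report does not describe how the tuning was done. One common approach is an exhaustive grid search with stratified cross-validation; the sketch below assumes that approach, uses a hypothetical parameter grid and function name (`tune`), and falls back to scikit-learn's `GradientBoostingClassifier` (a different gradient-boosting implementation that accepts the same three parameter names) when `xgboost` is not installed.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

try:
    from xgboost import XGBClassifier as Booster  # the model named in the report
except ImportError:
    # Stand-in used only if xgboost is absent; it shares the parameter names below.
    Booster = GradientBoostingClassifier

# Hypothetical search space; the report does not list the grid it used.
PARAM_GRID = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
}


def tune(X_train, y_train, random_state: int = 42) -> GridSearchCV:
    """Grid-search the booster with stratified 5-fold CV, scoring by accuracy."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    search = GridSearchCV(
        Booster(random_state=random_state),
        PARAM_GRID,
        scoring="accuracy",
        cv=cv,
    )
    return search.fit(X_train, y_train)  # fit() returns the fitted search object
```

After fitting, `best_params_` holds the selected settings and `best_estimator_` the refitted model, which would then be scored once on the held-out test set.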

#### Conclusions

The study developed a predictive model for diabetes diagnosis that
reached 90% accuracy after hyperparameter tuning. XGBoost was
identified as the optimal choice: it led every reported baseline
metric while balancing accuracy and computational efficiency.
