Uploaded by

doraeshin04
Deep Learning Approach for Diabetes Prediction using PIMA Indian Dataset

The PIMA Indian Diabetes dataset consists of medical records of female patients of Pima Indian heritage aged 21
or older. Each record has eight independent variables (features) and a binary target variable (Outcome), so
predicting the presence of diabetes is a binary classification problem.
Steps for Designing the Deep Learning Model
1. Dataset Overview:
o The dataset contains 768 samples, each with eight features and one target column:
• Pregnancies: Number of times the patient has been pregnant.
• Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test.
• BloodPressure: Diastolic blood pressure (mm Hg).
• SkinThickness: Triceps skinfold thickness (mm).
• Insulin: 2-hour serum insulin (μU/mL).
• BMI: Body mass index (weight in kg / (height in m)²).
• DiabetesPedigreeFunction: A score summarizing family history of diabetes.
• Age: Age of the patient (years).
• Outcome: Target variable (0 for non-diabetic, 1 for diabetic).
2. Preprocessing:
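o Handle missing values. The file has no explicit NaNs, but zeros in Glucose, BloodPressure, SkinThickness,
Insulin, and BMI are physiologically implausible and are commonly treated as missing.
o Split the dataset into training and test sets.
o Standardize the features (zero mean, unit variance) using statistics from the training set only. The features
have very different ranges, and putting them on a common scale usually helps gradient-based training.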
3. Deep Learning Model:
o Use a deep neural network for classification.
o Three hidden layers with ReLU activation function.
o Dropout layers to avoid overfitting.
o Sigmoid activation in the output layer for binary classification.
4. Evaluation:
o Use appropriate evaluation metrics like accuracy, precision, recall, and F1-score.
o Cross-validation or a validation set should be used to evaluate the generalization of the model.
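The preprocessing step can be sketched on a toy array before touching the real data. The example below is a minimal illustration, not the method used in the implementation that follows: the three columns, the values, the helper name impute_and_standardize, and the choice of median imputation are all assumptions made for this sketch. It treats zeros in selected columns as missing, fills them with the training-set median, and standardizes with training-set statistics.

```python
import numpy as np

# Hypothetical toy rows with three columns: Glucose, BMI, Age.
X_train = np.array([
    [148.0, 33.6, 50.0],
    [0.0, 26.6, 31.0],
    [183.0, 0.0, 32.0],
    [89.0, 28.1, 21.0],
])
X_test = np.array([
    [0.0, 43.1, 33.0],
    [116.0, 25.6, 30.0],
])
# Assumption: a zero in Glucose (0) or BMI (1) means "not recorded".
ZERO_MEANS_MISSING = [0, 1]


def impute_and_standardize(train, test, missing_cols):
    """Median-impute implausible zeros, then standardize both splits."""
    train, test = train.copy(), test.copy()
    # Mark implausible zeros as NaN so they are ignored by the median.
    for part in (train, test):
        block = part[:, missing_cols]
        block[block == 0] = np.nan
        part[:, missing_cols] = block
    # Medians, means and standard deviations come from the training split only.
    medians = np.nanmedian(train, axis=0)
    train = np.where(np.isnan(train), medians, train)
    test = np.where(np.isnan(test), medians, test)
    mean, std = train.mean(axis=0), train.std(axis=0)
    return (train - mean) / std, (test - mean) / std


X_train_s, X_test_s = impute_and_standardize(X_train, X_test, ZERO_MEANS_MISSING)
print(X_train_s.round(2))
print(X_test_s.round(2))
```

Computing the medians, means, and standard deviations on the training split alone keeps test-set information out of the fitted transformation.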
Deep Learning Model Implementation
Below is an implementation using TensorFlow and Keras for diabetes prediction:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.metrics import accuracy_score, classification_report

# Load the PIMA Indian Diabetes dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI',
           'DiabetesPedigreeFunction', 'Age', 'Outcome']
data = pd.read_csv(url, names=columns)

# Split the dataset into features (X) and target (y)
X = data.iloc[:, :-1].values  # All features except Outcome
y = data.iloc[:, -1].values   # Target variable Outcome

# Train-test split (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features to zero mean and unit variance.
# The scaler is fit on the training set only, so no test-set statistics leak into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build the deep neural network model
def create_model():
    model = Sequential()

    # Input layer + First hidden layer with 256 neurons, ReLU activation
    model.add(Dense(256, input_dim=8, activation='relu'))
    model.add(Dropout(0.2))

    # Second hidden layer with 256 neurons, ReLU activation
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.2))

    # Third hidden layer with 256 neurons, ReLU activation
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.2))

    # Output layer (binary classification) with Sigmoid activation
    model.add(Dense(1, activation='sigmoid'))

    # Compile the model with Adam optimizer and binary crossentropy loss
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    return model

# Create the model
model = create_model()

# Train the model, holding out 20% of the training data for validation
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)

# Predict probabilities on the test data and flatten the (n, 1) output to shape (n,)
y_prob = model.predict(X_test).ravel()
y_pred = np.round(y_prob).astype(int)  # Convert probabilities to binary predictions (0 or 1)

# Calculate accuracy and classification report
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy * 100:.2f}%")
print(classification_report(y_test, y_pred))

# Model summary
model.summary()
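The parameter count that model.summary() reports can also be derived by hand, which makes the size of the network relative to the dataset easier to see. The short sketch below assumes only the layer widths stated in the code above; the helper name dense_param_count is made up for this illustration. Dropout layers are left out because they have no trainable parameters.

```python
# Layer widths of the network: 8 inputs, three hidden layers of 256 units,
# and 1 output unit.
layer_sizes = [8, 256, 256, 256, 1]


def dense_param_count(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


total = dense_param_count(layer_sizes)
print(total)  # 134145
```

The network therefore has far more trainable parameters than the roughly 600 rows in the training split, which is one reason dropout and a validation set matter here.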
Explanation:
• Preprocessing:
o Data is split into training and testing sets.
o Standardization (zero mean, unit variance per feature) is applied because features like age, insulin, and
glucose are on different scales. The scaler is fit on the training set and then applied to the test set.
• Model Architecture:
o Input Layer: Accepts 8 features (Pregnancies, Glucose, BloodPressure, etc.).
o Hidden Layers: Three hidden layers with 256 neurons each and ReLU activation.
o Dropout: Applied after each hidden layer to reduce overfitting by randomly zeroing 20% of that layer's
outputs at each training step. Dropout is inactive at prediction time.
o Output Layer: Uses a single neuron with Sigmoid activation, which outputs the predicted probability of
class 1 (diabetic). Thresholding that probability at 0.5 gives the 0/1 label.
• Optimizer and Loss Function:
o The Adam optimizer is used, which adapts the step size of each parameter using running estimates of the
first and second moments of its gradient.
o Binary Crossentropy is used as the loss function, which is appropriate for binary classification
with a sigmoid output.
• Training:
o The model is trained for 50 epochs with a batch size of 32, using 20% of the training data as
validation data.
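To make the output layer and loss function concrete, the following sketch evaluates the sigmoid and the binary cross-entropy for a single sample in plain Python. It is an illustration only: the input values are hypothetical, and the clipping constant eps is an assumption for this sketch rather than a documented Keras default.

```python
import math


def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one sample; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# An uninformative prediction (p = 0.5) costs ln 2 whatever the label is.
print(sigmoid(0.0), binary_crossentropy(1, sigmoid(0.0)))

# For a diabetic patient (label 1), a confident correct prediction is cheap
# and a confident wrong one is expensive.
print(binary_crossentropy(1, 0.9), binary_crossentropy(1, 0.1))
```

The loss grows without bound as the predicted probability of the true class approaches zero, which is what pushes the network away from confident mistakes.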
Mechanisms to Improve the Model:
1. Early Stopping: To prevent overfitting, stop training once the validation loss has not improved for a set
number of epochs (the patience), and keep the weights from the best epoch.
2. Cross-Validation: Use k-fold cross-validation to estimate how well the model generalizes. With only 768
samples, a single train-test split gives a noisy estimate.
3. Hyperparameter Tuning: Experiment with different batch sizes, learning rates, number of neurons, or
even layer architectures to find the best-performing configuration.
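The early-stopping rule from the list above can be written as a few lines of plain Python. This is a sketch of the general patience rule under assumed inputs, not the training loop used in this document: the function name early_stopping_epoch and the validation-loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once `patience` consecutive epochs fail to improve the
    best validation loss by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # The patience was never exhausted, so training ran to the last epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation-loss curve: improves, then starts to overfit.
losses = [0.62, 0.55, 0.51, 0.50, 0.52, 0.53, 0.55, 0.58]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

In Keras the same behaviour is available through the EarlyStopping callback (arguments such as monitor, patience, and restore_best_weights), passed to model.fit via the callbacks argument.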
Evaluation Metrics:
• Accuracy: Measures the proportion of correct predictions.
• Precision, Recall, and F1-Score: Useful for understanding the performance in terms of false positives
and false negatives, especially for an imbalanced dataset like PIMA, where 268 of the 768 samples (about 35%)
are diabetic.
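The metrics above follow directly from the four confusion-matrix counts. The sketch below computes them in plain Python for a small, made-up set of labels and predictions (both lists are hypothetical) to show how accuracy can look acceptable while recall on the diabetic class is poor.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 3 diabetic, 7 non-diabetic patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here 7 of 10 predictions are correct, yet only one of the three diabetic patients is detected, so recall is one third even though accuracy is 0.7.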
This deep learning approach provides a reasonable baseline for diabetes prediction using the PIMA Indian dataset,
with room for further optimization through the tuning and evaluation techniques listed above.