Python Project Report

The project report outlines the development of machine learning models for predicting heart disease, Parkinson's disease, and diabetes using Python. It details the methodology, including data collection, model development, and deployment of a user-friendly web application for disease prediction. The models demonstrated promising accuracy and reliability, providing users with personalized health insights based on their medical data.

Uploaded by

-Magic- Music-

PROJECT REPORT FILE

4th Semester (Academic Year 2023-24)

TOPIC – Disease Prediction Using Machine Learning

SUBJECT NAME: PYTHON LAB


SUBJECT CODE: CSW208B

Submitted By:
Student Name: SWAYAM
Roll No.: 2K22CSUN01037
Branch: B.Tech. CSE
Section: CSE 4A

Submitted To:
Faculty Name: Dr. SHALU
Designation: Assistant Professor
Department: CST

TABLE OF CONTENTS

Sr. No    CONTENT
1.        INTRODUCTION
2.        METHODOLOGY
3.        IMPLEMENTATION
4.        RESULT
5.        SOURCE CODE
6.        WORKING OF PROJECT
7.        CONCLUSION
8.        FUTURE WORK
9.        REFERENCES

INTRODUCTION –
The aim of this project was to develop machine learning models for predicting three
diseases: heart disease, Parkinson's disease, and diabetes. Each disease prediction model
was implemented separately in Python using NumPy, Pandas, Scikit-learn, and Streamlit,
together with the standard-library pickle and os modules. The project provides a
user-friendly interface where individuals can enter relevant medical data and receive a
prediction of whether they are likely to have the selected disease.

METHODOLOGY –
1 Data Collection:
• Datasets for heart disease, Parkinson's disease, and diabetes were obtained from
publicly available sources (Kaggle, as listed in the references).
• The datasets were pre-processed to handle missing values, encode categorical
variables if necessary, and normalize/standardize numerical features.
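
The pre-processing step described above can be sketched roughly as follows. This is only an illustration: the small feature matrix is made up, and the choice of scikit-learn's SimpleImputer (median fill) and StandardScaler is an assumption, since the report does not state which imputation or scaling method was actually applied.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix with one missing value (np.nan).
X_raw = np.array([
    [148.0, 72.0, 33.6],
    [85.0, np.nan, 26.6],
    [183.0, 64.0, 23.3],
    [89.0, 66.0, 28.1],
])

# Fill missing values with the column median, then scale every column
# to zero mean and unit variance.
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_clean = preprocess.fit_transform(X_raw)

print(X_clean.shape)
print(np.isnan(X_clean).any())
```

Fitting the pipeline on the training split only, and reusing it to transform the test split and any user input, keeps test information out of the scaling statistics.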

2 Model Development:
• For each disease, a separate machine learning model was developed using Scikit-
learn.
• Various algorithms such as Logistic Regression, Random Forest, Support Vector
Machine, etc., were experimented with to determine the most suitable model for
each disease prediction task.
• Model hyperparameters were fine-tuned using techniques such as Grid Search
or Random Search to improve model performance.
• Model evaluation metrics such as accuracy, precision, recall, and F1-score were
utilized to assess the performance of each model.
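
A grid search over hyperparameters, followed by a report of precision, recall, and F1-score, might look like the sketch below. The synthetic data from make_classification, the parameter grid, and the choice of an SVM are all assumptions made for illustration; they stand in for the real datasets and are not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a medical dataset (assumption, not project data).
X, y = make_classification(n_samples=200, n_features=8, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Hypothetical grid: try a few regularisation strengths and two kernels.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Best combination found by 5-fold cross-validation on the training split.
print(search.best_params_)

# Precision, recall, and F1-score on the held-out test split.
test_predictions = search.predict(X_test)
print(classification_report(y_test, test_predictions))
```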

3 Deployment:
• The Streamlit library was employed to create an interactive web application for
disease prediction.
• The models were serialized using the Pickle library and saved to disk.
• Upon receiving input from the user through the web interface, the relevant model
was loaded, and predictions were generated based on the provided medical data.
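
The save-and-load round trip can be sketched as below. The model, the synthetic data, and the file name example_model.sav are placeholders chosen for this example, and the temporary directory is only there so the sketch does not write into the project folder.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a small stand-in model on synthetic data (assumption, not project data).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Hypothetical file location inside a temporary directory.
model_path = os.path.join(tempfile.mkdtemp(), "example_model.sav")

# Using "with" closes the file handle as soon as the block ends.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

# The reloaded model should reproduce the original predictions.
same = bool((loaded_model.predict(X) == model.predict(X)).all())
print(same)
```

Pickle files can execute arbitrary code when loaded, so only files produced by a trusted source should be unpickled.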
IMPLEMENTATION –
1 Feature Engineering:
• Features relevant to each disease were identified and included in the respective
datasets.
• Feature engineering techniques such as feature scaling, encoding categorical
variables, and feature selection were applied to prepare the data for modelling.

2 Model Training:
• The pre-processed datasets were split into training and testing sets.
• Each disease prediction model was trained using the training data.
• Cross-validation techniques were employed to ensure the robustness of the
models.
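
Cross-validation of this kind could be run as in the following sketch. It assumes 5-fold stratified splitting and a logistic regression model on synthetic data; the report does not say how many folds or which splitter were used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for one of the disease datasets (assumption).
X, y = make_classification(n_samples=150, n_features=6, random_state=1)

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)

# One accuracy value per fold; the spread indicates how stable the model is.
print(scores)
print(scores.mean(), scores.std())
```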

3 Model Evaluation:
• The trained models were evaluated using the testing data.
• Evaluation metrics such as accuracy, precision, recall, and F1-score were
computed to measure the performance of each model.
• Confusion matrices and ROC curves were generated to visualize each model's
performance.
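
A confusion matrix and the points of a ROC curve can be computed as in this sketch. It uses synthetic data and a logistic regression model as assumptions, and it stops at the numbers; plotting them (for example with matplotlib) is left out.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption, not project data).
X, y = make_classification(n_samples=200, n_features=6, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=3
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)

# ROC analysis needs a score per sample, here the probability of class 1.
positive_scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, positive_scores)
auc = roc_auc_score(y_test, positive_scores)
print(auc)
```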

4 Web Application Development:


• A user-friendly web interface was created using Streamlit.
• The interface allowed users to input their medical data, select the disease they
wanted to predict, and receive instant predictions.
• Proper error handling and validation were implemented to ensure the robustness
of the application.
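
Input validation for the text fields could take a form like the sketch below. The helper name parse_inputs is hypothetical and is not part of the project code. It relies on the fact that Streamlit text inputs return strings, so an empty or non-numeric field has to be caught before the values reach the model.

```python
import math


def parse_inputs(raw_values, field_names):
    """Convert text-box strings to floats.

    Returns (values, errors): the successfully parsed numbers and a list of
    human-readable messages for the fields that could not be parsed.
    """
    values, errors = [], []
    for name, raw in zip(field_names, raw_values):
        try:
            number = float(raw)
            # Reject "nan" and "inf", which float() would otherwise accept.
            if not math.isfinite(number):
                raise ValueError("not a finite number")
            values.append(number)
        except (TypeError, ValueError):
            errors.append(f"{name}: expected a number, got {raw!r}")
    return values, errors


# Example: the third field was left empty by the user.
values, errors = parse_inputs(
    ["5", "166", ""], ["Pregnancies", "Glucose", "BloodPressure"]
)
print(values)
print(errors)
```

In the web application, the prediction would run only when the error list is empty; otherwise the messages could be shown to the user, for instance with st.error.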

RESULT –
• The developed machine learning models demonstrated promising performance in
predicting heart disease, Parkinson's disease, and diabetes.
• The accuracy and reliability of the models were assessed through rigorous evaluation
techniques.
• The web application provided an intuitive platform for users to access the prediction
models and obtain personalized health insights.
SOURCE CODE –
import os
import pickle

import streamlit as st
from streamlit_option_menu import option_menu

# Configure the page before any other Streamlit call
st.set_page_config(page_title="Health Assistant",
                   layout="wide",
                   page_icon="🧑‍⚕️")

# Directory containing this script; model paths are resolved relative to it
working_dir = os.path.dirname(os.path.abspath(__file__))

# Load the three saved models
diabetes_model = pickle.load(open(f'{working_dir}/train_models/diabetes_model.sav', 'rb'))

heart_disease_model = pickle.load(open(f'{working_dir}/train_models/heart_disease_model.sav', 'rb'))

parkinsons_model = pickle.load(open(f'{working_dir}/train_models/parkinsons_model.sav', 'rb'))

# Sidebar menu for choosing the prediction page
with st.sidebar:
    selected = option_menu('Multiple Disease Prediction System',
                           ['Diabetes Prediction',
                            'Heart Disease Prediction',
                            'Parkinsons Prediction'],
                           menu_icon='hospital-fill',
                           icons=['activity', 'heart', 'person'],
                           default_index=0)

# Diabetes Prediction page
if selected == 'Diabetes Prediction':

    st.title('Diabetes Prediction using ML')

    # Lay the input fields out in three columns
    col1, col2, col3 = st.columns(3)

    with col1:
        Pregnancies = st.text_input('Number of Pregnancies')

    with col2:
        Glucose = st.text_input('Glucose Level')

    with col3:
        BloodPressure = st.text_input('Blood Pressure value')

    with col1:
        SkinThickness = st.text_input('Skin Thickness value')

    with col2:
        Insulin = st.text_input('Insulin Level')

    with col3:
        BMI = st.text_input('BMI value')

    with col1:
        DiabetesPedigreeFunction = st.text_input('Diabetes Pedigree Function value')

    with col2:
        Age = st.text_input('Age of the Person')

    # Holds the result message shown below the button
    diab_diagnosis = ''

    if st.button('Diabetes Test Result'):

        user_input = [Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin,
                      BMI, DiabetesPedigreeFunction, Age]

        # Text inputs arrive as strings; convert them to floats for the model
        user_input = [float(x) for x in user_input]

        diab_prediction = diabetes_model.predict([user_input])

        if diab_prediction[0] == 1:
            diab_diagnosis = 'The person is diabetic'
        else:
            diab_diagnosis = 'The person is not diabetic'

    st.success(diab_diagnosis)

# Heart Disease Prediction page
if selected == 'Heart Disease Prediction':

    st.title('Heart Disease Prediction using ML')

    col1, col2, col3 = st.columns(3)

    with col1:
        age = st.text_input('Age')

    with col2:
        sex = st.text_input('Sex')

    with col3:
        cp = st.text_input('Chest Pain types')

    with col1:
        trestbps = st.text_input('Resting Blood Pressure')

    with col2:
        chol = st.text_input('Serum Cholesterol in mg/dl')

    with col3:
        fbs = st.text_input('Fasting Blood Sugar > 120 mg/dl')

    with col1:
        restecg = st.text_input('Resting Electrocardiographic results')

    with col2:
        thalach = st.text_input('Maximum Heart Rate achieved')

    with col3:
        exang = st.text_input('Exercise Induced Angina')

    with col1:
        oldpeak = st.text_input('ST depression induced by exercise')

    with col2:
        slope = st.text_input('Slope of the peak exercise ST segment')

    with col3:
        ca = st.text_input('Major vessels colored by fluoroscopy')

    with col1:
        thal = st.text_input('thal: 0 = normal; 1 = fixed defect; 2 = reversible defect')

    # Holds the result message shown below the button
    heart_diagnosis = ''

    if st.button('Heart Disease Test Result'):

        user_input = [age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal]

        # Text inputs arrive as strings; convert them to floats for the model
        user_input = [float(x) for x in user_input]

        heart_prediction = heart_disease_model.predict([user_input])

        if heart_prediction[0] == 1:
            heart_diagnosis = 'The person is having heart disease'
        else:
            heart_diagnosis = 'The person does not have any heart disease'

    st.success(heart_diagnosis)

# Parkinson's Prediction page
if selected == "Parkinsons Prediction":

    st.title("Parkinson's Disease Prediction using ML")

    # The 22 voice-measurement inputs are spread over five columns
    col1, col2, col3, col4, col5 = st.columns(5)

    with col1:
        fo = st.text_input('MDVP:Fo(Hz)')

    with col2:
        fhi = st.text_input('MDVP:Fhi(Hz)')

    with col3:
        flo = st.text_input('MDVP:Flo(Hz)')

    with col4:
        Jitter_percent = st.text_input('MDVP:Jitter(%)')

    with col5:
        Jitter_Abs = st.text_input('MDVP:Jitter(Abs)')

    with col1:
        RAP = st.text_input('MDVP:RAP')

    with col2:
        PPQ = st.text_input('MDVP:PPQ')

    with col3:
        DDP = st.text_input('Jitter:DDP')

    with col4:
        Shimmer = st.text_input('MDVP:Shimmer')

    with col5:
        Shimmer_dB = st.text_input('MDVP:Shimmer(dB)')

    with col1:
        APQ3 = st.text_input('Shimmer:APQ3')

    with col2:
        APQ5 = st.text_input('Shimmer:APQ5')

    with col3:
        APQ = st.text_input('MDVP:APQ')

    with col4:
        DDA = st.text_input('Shimmer:DDA')

    with col5:
        NHR = st.text_input('NHR')

    with col1:
        HNR = st.text_input('HNR')

    with col2:
        RPDE = st.text_input('RPDE')

    with col3:
        DFA = st.text_input('DFA')

    with col4:
        spread1 = st.text_input('spread1')

    with col5:
        spread2 = st.text_input('spread2')

    with col1:
        D2 = st.text_input('D2')

    with col2:
        PPE = st.text_input('PPE')

    # Holds the result message shown below the button
    parkinsons_diagnosis = ''

    if st.button("Parkinson's Test Result"):

        user_input = [fo, fhi, flo, Jitter_percent, Jitter_Abs,
                      RAP, PPQ, DDP, Shimmer, Shimmer_dB, APQ3, APQ5,
                      APQ, DDA, NHR, HNR, RPDE, DFA, spread1, spread2, D2, PPE]

        # Text inputs arrive as strings; convert them to floats for the model
        user_input = [float(x) for x in user_input]

        parkinsons_prediction = parkinsons_model.predict([user_input])

        if parkinsons_prediction[0] == 1:
            parkinsons_diagnosis = "The person has Parkinson's disease"
        else:
            parkinsons_diagnosis = "The person does not have Parkinson's disease"

    st.success(parkinsons_diagnosis)

# Made by Swayam and Harsh

Importing the Dependencies


import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
Data Collection and Analysis

PIMA Diabetes Dataset


# loading the diabetes dataset to a pandas DataFrame
diabetes_dataset = pd.read_csv('./dataset/diabetes.csv')
# printing the first 5 rows of the dataset
diabetes_dataset.head()
# number of rows and Columns in this dataset
diabetes_dataset.shape
# getting the statistical measures of the data
diabetes_dataset.describe()
diabetes_dataset['Outcome'].value_counts()
0 --> Non-Diabetic

1 --> Diabetic
diabetes_dataset.groupby('Outcome').mean()
# separating the data and labels
X = diabetes_dataset.drop(columns='Outcome')
Y = diabetes_dataset['Outcome']
print(X)
print(Y)
Train Test Split
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2, stratify=Y, random_state=2)
print(X.shape, X_train.shape, X_test.shape)
Training the Model
classifier = svm.SVC(kernel='linear')
#training the support vector Machine Classifier
classifier.fit(X_train, Y_train)
Model Evaluation
Accuracy Score
# accuracy score on the training data (true labels first, predictions second)
X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score of the training data : ', training_data_accuracy)
# accuracy score on the test data
X_test_prediction = classifier.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score of the test data : ', test_data_accuracy)
Making a Predictive System
input_data = (5,166,72,19,175,25.8,0.587,51)

# changing the input_data to numpy array


input_data_as_numpy_array = np.asarray(input_data)

# reshape the array as we are predicting for one instance


input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = classifier.predict(input_data_reshaped)
print(prediction)

if prediction[0] == 0:
    print('The person is not diabetic')
else:
    print('The person is diabetic')
Saving the trained model
import pickle
filename = 'diabetes_model.sav'
pickle.dump(classifier, open(filename, 'wb'))
# loading the saved model
loaded_model = pickle.load(open('diabetes_model.sav', 'rb'))
input_data = (5,166,72,19,175,25.8,0.587,51)

# changing the input_data to numpy array


input_data_as_numpy_array = np.asarray(input_data)

# reshape the array as we are predicting for one instance


input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = loaded_model.predict(input_data_reshaped)
print(prediction)

if prediction[0] == 0:
    print('The person is not diabetic')
else:
    print('The person is diabetic')
# listing the feature columns in the order the model expects them
for column in X.columns:
    print(column)

Importing the Dependencies


import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
Data Collection and Processing
# loading the csv data to a Pandas DataFrame
heart_data = pd.read_csv('./dataset/heart.csv')
# print first 5 rows of the dataset
heart_data.head()
# print last 5 rows of the dataset
heart_data.tail()
# number of rows and columns in the dataset
heart_data.shape
# getting some info about the data
heart_data.info()
# checking for missing values
heart_data.isnull().sum()
# statistical measures about the data
heart_data.describe()
# checking the distribution of Target Variable
heart_data['target'].value_counts()
1 --> Defective Heart

0 --> Healthy Heart


Splitting the Features and Target
X = heart_data.drop(columns='target')
Y = heart_data['target']
print(X)
print(Y)
Splitting the Data into Training data & Test Data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)
print(X.shape, X_train.shape, X_test.shape)
Model Training
Logistic Regression
model = LogisticRegression()
# training the LogisticRegression model with Training data
model.fit(X_train, Y_train)
Model Evaluation
Accuracy Score
# accuracy on training data (true labels first, predictions second)
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy on Training data : ', training_data_accuracy)
# accuracy on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy on Test data : ', test_data_accuracy)
Building a Predictive System
input_data = (62,0,0,140,268,0,0,160,0,3.6,0,2,2)

# change the input data to a numpy array


input_data_as_numpy_array= np.asarray(input_data)

# reshape the numpy array as we are predicting for only one instance


input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = model.predict(input_data_reshaped)
print(prediction)

if prediction[0] == 0:
    print('The Person does not have a Heart Disease')
else:
    print('The Person has Heart Disease')
Saving the trained model
import pickle
filename = 'heart_disease_model.sav'
pickle.dump(model, open(filename, 'wb'))
# loading the saved model
loaded_model = pickle.load(open('heart_disease_model.sav', 'rb'))
for column in X.columns:
    print(column)
Importing the Dependencies
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
Data Collection & Analysis
# loading the data from csv file to a Pandas DataFrame
parkinsons_data = pd.read_csv('./dataset/parkinsons.csv')
# printing the first 5 rows of the dataframe
parkinsons_data.head()
# number of rows and columns in the dataframe
parkinsons_data.shape
# getting more information about the dataset
parkinsons_data.info()
# checking for missing values in each column
parkinsons_data.isnull().sum()
# getting some statistical measures about the data
parkinsons_data.describe()
# distribution of target Variable
parkinsons_data['status'].value_counts()
1 --> Parkinson's Positive

0 --> Healthy

# Check data types


print(parkinsons_data.dtypes)

# Inspect first few rows


print(parkinsons_data.head())

# Handle missing values


# For example, you can drop rows with missing values
parkinsons_data.dropna(inplace=True)

# Select only numeric columns


numeric_columns = parkinsons_data.select_dtypes(include=['float64', 'int64'])

# Group the numeric columns by the 'status' column and calculate the mean for each group
grouped_mean = numeric_columns.groupby('status').mean()

# Display the mean values for each group


print(grouped_mean)

Data Pre-Processing
Separating the features & Target
X = parkinsons_data.drop(columns=['name', 'status'])
Y = parkinsons_data['status']
print(X)
print(Y)
Splitting the data to training data & Test data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
print(X.shape, X_train.shape, X_test.shape)
Model Training
Support Vector Machine Model
model = svm.SVC(kernel='linear')
# training the SVM model with training data
model.fit(X_train, Y_train)
Model Evaluation
Accuracy Score
# accuracy score on training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score of training data : ', training_data_accuracy)
# accuracy score on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score of test data : ', test_data_accuracy)
Building a Predictive System
input_data = (197.07600, 206.89600, 192.05500, 0.00289, 0.00001, 0.00166,
              0.00168, 0.00498, 0.01098, 0.09700, 0.00563, 0.00680,
              0.00802, 0.01689, 0.00339, 26.77500, 0.422229, 0.741367,
              -7.348300, 0.177551, 1.743867, 0.085569)

# changing input data to a numpy array


input_data_as_numpy_array = np.asarray(input_data)

# reshape the numpy array


input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = model.predict(input_data_reshaped)
print(prediction)

if prediction[0] == 0:
    print("The Person does not have Parkinsons Disease")
else:
    print("The Person has Parkinsons")

Saving the trained model


import pickle
filename = 'parkinsons_model.sav'
pickle.dump(model, open(filename, 'wb'))
# loading the saved model
loaded_model = pickle.load(open('parkinsons_model.sav', 'rb'))
for column in X.columns:
    print(column)
WORKING OF PROJECT –
CONCLUSION –
In conclusion, the project developed machine learning models for predicting heart disease,
Parkinson's disease, and diabetes. Python libraries such as NumPy, Pandas, Scikit-learn,
and Streamlit, together with the standard-library pickle and os modules, supported the
entire development process. The interactive web application gives individuals a tool for
assessing their risk of these diseases based on their medical data. Both the models and
the application can be improved further in usability and accuracy.

FUTURE WORK –
• Incorporate additional features and data sources to further improve the accuracy of the
prediction models.
• Explore advanced machine learning algorithms and techniques to enhance model
performance.
• Conduct real-world validation studies to assess the practical utility of the prediction
models.
• Continuously update and maintain the web application to ensure compatibility with the
latest technologies and medical guidelines.

REFERENCES –
1. Dataset – https://www.kaggle.com
2. Research Papers –

• https://www.sciencedirect.com/science/article/pii/S1877050920300557
• https://link.springer.com/article/10.1007/s42979-020-00365-y
• https://ieeexplore.ieee.org/abstract/document/6921958/
• https://www.sciencedirect.com/science/article/pii/S2214785321052202

3. Libraries –

• Scikit-learn – https://pypi.org/project/scikit-learn/
• Streamlit – https://streamlit.io
• NumPy – https://pypi.org/project/numpy/
