Python Project Report
Sr. No    CONTENT
1.        INTRODUCTION
2.        METHODOLOGY
3.        IMPLEMENTATION
4.        RESULT
5.        SOURCE CODE
6.        WORKING OF PROJECT
7.        CONCLUSION
8.        FUTURE WORK
9.        REFERENCES
INTRODUCTION –
The aim of this project was to develop machine learning models for predicting three different
diseases: heart disease, Parkinson's disease, and diabetes. Each disease prediction model
was implemented separately using Python libraries such as NumPy, Pandas, Scikit-learn,
Streamlit, Pickle, and OS. The project aimed to provide a user-friendly interface for
individuals to input relevant medical data and receive predictions regarding their likelihood
of having any of these diseases.
METHODOLOGY –
1 Data Collection:
• Datasets for heart disease, Parkinson's disease, and diabetes were obtained from
reliable sources such as medical research repositories or publicly available
datasets.
• The datasets were pre-processed to handle missing values, encode categorical
variables if necessary, and normalize/standardize numerical features.
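The pre-processing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the toy DataFrame, its column names, and the preprocess helper are all hypothetical, and median imputation is only one possible way to handle missing values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for a medical dataset (not from the project)
raw = pd.DataFrame({
    "age": [63, 45, np.nan, 52],
    "chol": [233.0, 250.0, 204.0, np.nan],
    "sex": ["male", "female", "female", "male"],
    "target": [1, 0, 0, 1],
})


def preprocess(df, target="target"):
    """Impute, encode, and standardize; returns (features, labels)."""
    features = df.drop(columns=target)
    labels = df[target]
    numeric_cols = features.select_dtypes(include="number").columns

    # Handle missing values: fill each numeric column with its median
    features[numeric_cols] = features[numeric_cols].fillna(
        features[numeric_cols].median()
    )

    # Encode categorical variables as 0/1 indicator columns
    features = pd.get_dummies(features, drop_first=True, dtype=float)

    # Standardize numeric columns to zero mean and unit variance
    features[numeric_cols] = StandardScaler().fit_transform(features[numeric_cols])
    return features, labels


X, y = preprocess(raw)
print(X)
```

In a real workflow the scaler would normally be fitted on the training split only and then reused on the test split, so that no information from the test data leaks into training.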
2 Model Development:
• For each disease, a separate machine learning model was developed using Scikit-
learn.
• Various algorithms such as Logistic Regression, Random Forest, Support Vector
Machine, etc., were experimented with to determine the most suitable model for
each disease prediction task.
• Model hyperparameters were fine-tuned using techniques such as Grid Search
or Random Search to improve model performance.
• Model evaluation metrics such as accuracy, precision, recall, and F1-score were
utilized to assess the performance of each model.
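The model comparison and tuning described above could look roughly like the sketch below. It uses synthetic data from make_classification in place of the real datasets, and the candidate models and parameter grids are illustrative assumptions rather than the grids used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data (stand-in for a disease dataset)
X, y = make_classification(n_samples=300, n_features=8, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Candidate algorithms with small, illustrative hyperparameter grids
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestClassifier(random_state=2), {"n_estimators": [50, 100]}),
    "svm": (SVC(), {"kernel": ["linear", "rbf"], "C": [0.1, 1.0]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Grid search with 5-fold cross-validation on the training data only
    search = GridSearchCV(estimator, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)

    # Evaluate the best configuration on the held-out test data
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, metrics)
```

Replacing GridSearchCV with RandomizedSearchCV gives the random search variant; it samples a fixed number of configurations instead of trying every combination.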
3 Deployment:
• The Streamlit library was employed to create an interactive web application for
disease prediction.
• The models were serialized using the Pickle library and saved to disk.
• Upon receiving input from the user through the web interface, the relevant model
was loaded, and predictions were generated based on the provided medical data.
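The save-and-load cycle described above can be illustrated with a short sketch. The synthetic data, the temporary file name example_model.sav, and the predict_from_file helper are assumptions made for this example; the web application itself is not involved here.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Train a small model on synthetic data (stand-in for a disease model)
X, y = make_classification(n_samples=100, n_features=8, random_state=2)
model = SVC(kernel="linear").fit(X, y)

# Serialize the trained model to disk
model_path = os.path.join(tempfile.mkdtemp(), "example_model.sav")
with open(model_path, "wb") as model_file:
    pickle.dump(model, model_file)


def predict_from_file(path, values):
    """Load a pickled model and predict the class for one row of feature values."""
    with open(path, "rb") as model_file:
        loaded = pickle.load(model_file)
    # predict expects a 2-D input, so wrap the single row in a list
    return int(loaded.predict([values])[0])


prediction = predict_from_file(model_path, X[0].tolist())
print(prediction)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.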
IMPLEMENTATION –
1 Feature Engineering:
• Features relevant to each disease were identified and included in the respective
datasets.
• Feature engineering techniques such as feature scaling, encoding categorical
variables, and feature selection were applied to prepare the data for modelling.
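A possible way to combine feature scaling and feature selection is sketched below. The report does not state which selection method was used, so SelectKBest with the ANOVA F-test and k=5 is an assumption chosen only for illustration, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, of which only 3 carry signal
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=2
)

# Scale the features, then keep the 5 with the highest F-test scores
prep = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=5)),
])
X_selected = prep.fit_transform(X, y)

# Column indices of the features that were kept
kept = prep.named_steps["select"].get_support(indices=True)
print(X_selected.shape, kept)
```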
2 Model Training:
• The pre-processed datasets were split into training and testing sets.
• Each disease prediction model was trained using the training data.
• Cross-validation techniques were employed to ensure the robustness of the
models.
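The split and cross-validation steps above can be sketched as follows, assuming synthetic data and a linear SVM like the one used in the notebooks. The fold count of 5 is an assumption; the report does not state how many folds were used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic data standing in for a pre-processed disease dataset
X, y = make_classification(n_samples=250, n_features=8, random_state=2)

# Hold out 20% of the data for final testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# 5-fold stratified cross-validation on the training data only
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(SVC(kernel="linear"), X_train, y_train, cv=cv, scoring="accuracy")

print(f"Cross-validated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A large spread between the fold scores suggests that the model's performance depends heavily on which samples it was trained on.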
3 Model Evaluation:
• The trained models were evaluated using the testing data.
• Evaluation metrics such as accuracy, precision, recall, and F1-score were
computed to measure the performance of each model.
• Confusion matrices and ROC curves were generated to visualize each model's
performance.
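The confusion matrix and ROC curve mentioned above can be computed as in the sketch below. It uses synthetic data and a linear SVM, and takes the SVM's decision_function output as the ranking score for the ROC curve; plotting is left out so that the example needs only Scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data standing in for a pre-processed disease dataset
X, y = make_classification(n_samples=300, n_features=8, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

model = SVC(kernel="linear").fit(X_train, y_train)

# Confusion matrix: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, model.predict(X_test))

# ROC curve from the signed distance to the decision boundary
scores = model.decision_function(X_test)
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

print(cm)
print(f"ROC AUC: {auc:.3f}")
```

The fpr and tpr arrays can be passed directly to a plotting library to draw the curve.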
RESULT –
• The developed machine learning models demonstrated promising performance in
predicting heart disease, Parkinson's disease, and diabetes.
• The accuracy and reliability of the models were assessed by comparing accuracy on the
training data with accuracy on a held-out test set.
• The web application provided an intuitive platform for users to access the prediction
models and obtain personalized health insights.
SOURCE CODE –
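import os
import pickle

import streamlit as st
from streamlit_option_menu import option_menu

# Set the page configuration (this must be the first Streamlit call)
st.set_page_config(page_title="Health Assistant",
                   layout="wide",
                   page_icon="🧑‍⚕️")

# Directory containing this script, used to locate the saved models
working_dir = os.path.dirname(os.path.abspath(__file__))

# Load the saved diabetes model. The lines that load the heart disease and
# Parkinson's models are not shown in this listing.
with open(f'{working_dir}/train_models/diabetes_model.sav', 'rb') as model_file:
    diabetes_model = pickle.load(model_file)

# Navigation menu. The start of this call is not shown in this listing, so the
# menu title is left as '...'.
selected = option_menu('...',
                       ['Diabetes Prediction',
                        'Heart Disease Prediction',
                        'Parkinsons Prediction'],
                       menu_icon='hospital-fill',
                       icons=['activity', 'heart', 'person'],
                       default_index=0)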
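# Diabetes Prediction page
# (the page check, column setup, and button are not shown in this listing;
# the lines marked "assumed" are reconstructed from the rest of the code)
if selected == 'Diabetes Prediction':  # assumed
    col1, col2, col3 = st.columns(3)  # assumed

    with col1:
        Pregnancies = st.text_input('Number of Pregnancies')
    with col2:
        Glucose = st.text_input('Glucose Level')
    with col3:
        BloodPressure = st.text_input('Blood Pressure value')
    with col1:
        SkinThickness = st.text_input('Skin Thickness value')
    with col2:
        Insulin = st.text_input('Insulin Level')
    with col3:
        BMI = st.text_input('BMI value')
    with col1:
        DiabetesPedigreeFunction = st.text_input('Diabetes Pedigree Function value')
    with col2:
        Age = st.text_input('Age of the Person')

    diab_diagnosis = ''

    # Predict only when requested (button and its label assumed)
    if st.button('Diabetes Test Result'):
        # Collect the inputs in the order of the fields above;
        # st.text_input returns strings, so convert each value to a float
        user_input = [Pregnancies, Glucose, BloodPressure, SkinThickness,
                      Insulin, BMI, DiabetesPedigreeFunction, Age]
        user_input = [float(x) for x in user_input]
        diab_prediction = diabetes_model.predict([user_input])
        if diab_prediction[0] == 1:
            diab_diagnosis = 'The person is diabetic'
        else:
            diab_diagnosis = 'The person is not diabetic'
        st.success(diab_diagnosis)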
# Heart Disease Prediction page
if selected == 'Heart Disease Prediction':
    st.title('Heart Disease Prediction using ML')

    # Three-column layout for the input fields (column setup assumed; not shown in this listing)
    col1, col2, col3 = st.columns(3)

    with col1:
        age = st.text_input('Age')
    with col2:
        sex = st.text_input('Sex')
    with col3:
        cp = st.text_input('Chest Pain types')
    with col1:
        trestbps = st.text_input('Resting Blood Pressure')
    with col2:
        chol = st.text_input('Serum Cholesterol in mg/dl')
    with col3:
        fbs = st.text_input('Fasting Blood Sugar > 120 mg/dl')
    with col1:
        restecg = st.text_input('Resting Electrocardiographic results')
    with col2:
        thalach = st.text_input('Maximum Heart Rate achieved')
    with col3:
        exang = st.text_input('Exercise Induced Angina')
    with col1:
        oldpeak = st.text_input('ST depression induced by exercise')
    with col2:
        slope = st.text_input('Slope of the peak exercise ST segment')
    with col3:
        ca = st.text_input('Major vessels colored by fluoroscopy')
    with col1:
        thal = st.text_input('thal: 0 = normal; 1 = fixed defect; 2 = reversible defect')

    heart_diagnosis = ''

    # Predict only when requested (button and its label assumed; not shown in this listing)
    if st.button('Heart Disease Test Result'):
        user_input = [age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal]
        # st.text_input returns strings, so convert each value to a float
        user_input = [float(x) for x in user_input]
        heart_prediction = heart_disease_model.predict([user_input])
        if heart_prediction[0] == 1:
            heart_diagnosis = 'The person is having heart disease'
        else:
            heart_diagnosis = 'The person does not have any heart disease'
        st.success(heart_diagnosis)
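# Parkinson's Prediction page
# (the page check, column setup, button, and prediction call are not shown in
# this listing; the lines marked "assumed" are reconstructed from the rest of the code)
if selected == 'Parkinsons Prediction':  # assumed
    col1, col2, col3, col4, col5 = st.columns(5)  # assumed

    with col1:
        fo = st.text_input('MDVP:Fo(Hz)')
    with col2:
        fhi = st.text_input('MDVP:Fhi(Hz)')
    with col3:
        flo = st.text_input('MDVP:Flo(Hz)')
    with col4:
        Jitter_percent = st.text_input('MDVP:Jitter(%)')
    with col5:
        Jitter_Abs = st.text_input('MDVP:Jitter(Abs)')
    with col1:
        RAP = st.text_input('MDVP:RAP')
    with col2:
        PPQ = st.text_input('MDVP:PPQ')
    with col3:
        DDP = st.text_input('Jitter:DDP')
    with col4:
        Shimmer = st.text_input('MDVP:Shimmer')
    with col5:
        Shimmer_dB = st.text_input('MDVP:Shimmer(dB)')
    with col1:
        APQ3 = st.text_input('Shimmer:APQ3')
    with col2:
        APQ5 = st.text_input('Shimmer:APQ5')
    with col3:
        APQ = st.text_input('MDVP:APQ')
    with col4:
        DDA = st.text_input('Shimmer:DDA')
    with col5:
        NHR = st.text_input('NHR')
    with col1:
        HNR = st.text_input('HNR')
    with col2:
        RPDE = st.text_input('RPDE')
    with col3:
        DFA = st.text_input('DFA')
    with col4:
        spread1 = st.text_input('spread1')
    with col5:
        spread2 = st.text_input('spread2')
    with col1:
        D2 = st.text_input('D2')
    with col2:
        PPE = st.text_input('PPE')

    parkinsons_diagnosis = ''

    # Predict only when requested (button and its label assumed)
    if st.button("Parkinson's Test Result"):
        # Collect the inputs in the order of the fields above and convert them to floats
        user_input = [fo, fhi, flo, Jitter_percent, Jitter_Abs, RAP, PPQ, DDP,
                      Shimmer, Shimmer_dB, APQ3, APQ5, APQ, DDA, NHR, HNR,
                      RPDE, DFA, spread1, spread2, D2, PPE]
        user_input = [float(x) for x in user_input]
        # parkinsons_model is an assumed name for the loaded Parkinson's model
        parkinsons_prediction = parkinsons_model.predict([user_input])
        if parkinsons_prediction[0] == 1:
            parkinsons_diagnosis = "The person has Parkinson's disease"
        else:
            parkinsons_diagnosis = "The person does not have Parkinson's disease"
        st.success(parkinsons_diagnosis)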
1 --> Diabetic
diabetes_dataset.groupby('Outcome').mean()
# separating the data and labels
X = diabetes_dataset.drop(columns='Outcome')
Y = diabetes_dataset['Outcome']
print(X)
print(Y)
Train Test Split
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2, stratify=Y, random_state=2)
print(X.shape, X_train.shape, X_test.shape)
Training the Model
classifier = svm.SVC(kernel='linear')
#training the support vector Machine Classifier
classifier.fit(X_train, Y_train)
Model Evaluation
Accuracy Score
# accuracy score on the training data (true labels first, then predictions)
X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score of the training data : ', training_data_accuracy)
# accuracy score on the test data
X_test_prediction = classifier.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score of the test data : ', test_data_accuracy)
Making a Predictive System
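import numpy as np

input_data = (5,166,72,19,175,25.8,0.587,51)
# convert the input to a NumPy array and reshape it to one sample (1 row, 8 features)
input_data_reshaped = np.asarray(input_data).reshape(1, -1)
prediction = classifier.predict(input_data_reshaped)
print(prediction)
if prediction[0] == 0:
    print('The person is not diabetic')
else:
    print('The person is diabetic')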
Saving the trained model
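import pickle
import numpy as np

filename = 'diabetes_model.sav'
with open(filename, 'wb') as model_file:
    pickle.dump(classifier, model_file)
# loading the saved model
with open('diabetes_model.sav', 'rb') as model_file:
    loaded_model = pickle.load(model_file)
input_data = (5,166,72,19,175,25.8,0.587,51)
# reshape the input to one sample (1 row, 8 features) before predicting
input_data_reshaped = np.asarray(input_data).reshape(1, -1)
prediction = loaded_model.predict(input_data_reshaped)
print(prediction)
if prediction[0] == 0:
    print('The person is not diabetic')
else:
    print('The person is diabetic')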
for column in X.columns:
    print(column)
prediction = model.predict(input_data_reshaped)
print(prediction)
if prediction[0] == 0:
    print('The Person does not have a Heart Disease')
else:
    print('The Person has Heart Disease')
Saving the trained model
import pickle
filename = 'heart_disease_model.sav'
with open(filename, 'wb') as model_file:
    pickle.dump(model, model_file)
# loading the saved model
with open('heart_disease_model.sav', 'rb') as model_file:
    loaded_model = pickle.load(model_file)
for column in X.columns:
    print(column)
Importing the Dependencies
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
Data Collection & Analysis
# loading the data from csv file to a Pandas DataFrame
parkinsons_data = pd.read_csv('./dataset/parkinsons.csv')
# printing the first 5 rows of the dataframe
parkinsons_data.head()
# number of rows and columns in the dataframe
parkinsons_data.shape
# getting more information about the dataset
parkinsons_data.info()
# checking for missing values in each column
parkinsons_data.isnull().sum()
# getting some statistical measures about the data
parkinsons_data.describe()
# distribution of target Variable
parkinsons_data['status'].value_counts()
1 --> Parkinson's Positive
0 --> Healthy
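# Keep only the numeric columns (this drops the text column 'name'),
# then group them by the 'status' column and calculate the mean for each group
numeric_columns = parkinsons_data.select_dtypes(include='number')
grouped_mean = numeric_columns.groupby('status').mean()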
Data Pre-Processing
Separating the features & Target
X = parkinsons_data.drop(columns=['name', 'status'])
Y = parkinsons_data['status']
print(X)
print(Y)
Splitting the data to training data & Test data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
print(X.shape, X_train.shape, X_test.shape)
Model Training
Support Vector Machine Model
model = svm.SVC(kernel='linear')
# training the SVM model with training data
model.fit(X_train, Y_train)
Model Evaluation
Accuracy Score
# accuracy score on training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score of training data : ', training_data_accuracy)
# accuracy score on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score of test data : ', test_data_accuracy)
Building a Predictive System
input_data = (197.07600,206.89600,192.05500,0.00289,0.00001,0.00166,0.00168,0.00498,
              0.01098,0.09700,0.00563,0.00680,0.00802,0.01689,0.00339,26.77500,
              0.422229,0.741367,-7.348300,0.177551,1.743867,0.085569)
# convert the input to a NumPy array and reshape it to one sample (1 row, 22 features)
input_data_reshaped = np.asarray(input_data).reshape(1, -1)
prediction = model.predict(input_data_reshaped)
print(prediction)
if prediction[0] == 0:
    print("The Person does not have Parkinsons Disease")
else:
    print("The Person has Parkinsons")
FUTURE WORK –
• Incorporate additional features and data sources to further improve the accuracy of the
prediction models.
• Explore advanced machine learning algorithms and techniques to enhance model
performance.
• Conduct real-world validation studies to assess the practical utility of the prediction
models.
• Continuously update and maintain the web application to ensure compatibility with the
latest technologies and medical guidelines.
REFERENCES –
1. Dataset – https://www.kaggle.com
2. Research Papers –
• https://www.sciencedirect.com/science/article/pii/S1877050920300557
• https://link.springer.com/article/10.1007/s42979-020-00365-y
• https://ieeexplore.ieee.org/abstract/document/6921958/
• https://www.sciencedirect.com/science/article/pii/S2214785321052202
3. Libraries –
• Scikit-learn – https://pypi.org/project/scikit-learn/
• Streamlit – https://streamlit.io
• NumPy – https://pypi.org/project/numpy/