Machine Learning

This document outlines a machine learning workflow to predict the stages of chronic kidney disease. It describes using various machine learning models like logistic regression and KNN to classify patients into CKD stages based on features like age, weight, gender, race, and serum creatinine levels. The workflow involves data cleaning, exploratory data analysis to understand the distributions and correlations between features, training and evaluating models on split datasets, and finding the best performing model like KNN with K=3 to classify patients into the 5 stages of CKD.


Final Project
Predicting the Stages of Chronic Kidney Disease:
Using Machine Learning Workflow

HCIN-620-01-SP21 – Machine Learning


Mojdeh Amini
University of San Diego
Dr. Reza Afra
May 17, 2021
Outlines and Outcomes
Predicting the stages of Chronic Kidney Disease (CKD)
Glomerular Filtration Rate (GFR): Age, Weight (kg), Gender, Race
GFR (mL/min/1.73 m²) = 175 × (Scr)^(-1.154) × (Age)^(-0.203) × (0.742 if female) × (1.212 if African American)
Normal Serum Creatinine: 0.7 - 1.2 milligrams per deciliter (mg/dL)
Considered elevated: a Serum Creatinine level greater than 1.2 mg/dL for women and greater than 1.4 mg/dL for men
Stages of CKD:
1- Normal? eGFR >=90
2- Mild eGFR 60 - 89
3- Moderate eGFR 30 - 59
4- Severe eGFR 15 - 29
5- Failure eGFR <15
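The formula and stage cut-offs above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the slides: the function names `egfr_mdrd` and `ckd_stage` and the example patient are assumptions, and the stage boundaries are treated as continuous thresholds (90, 60, 30, 15).

```python
def egfr_mdrd(scr_mg_dl, age_years, female=False, african_american=False):
    """Estimate GFR (mL/min/1.73 m^2) with the formula quoted on this slide."""
    gfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        gfr *= 0.742
    if african_american:
        gfr *= 1.212
    return gfr


def ckd_stage(egfr):
    """Map an eGFR value to the five CKD stages listed on this slide."""
    if egfr >= 90:
        return 1  # Normal
    if egfr >= 60:
        return 2  # Mild
    if egfr >= 30:
        return 3  # Moderate
    if egfr >= 15:
        return 4  # Severe
    return 5      # Failure


# Hypothetical patient: 60-year-old woman with serum creatinine 1.5 mg/dL.
example_gfr = egfr_mdrd(1.5, 60, female=True)
print(round(example_gfr, 1), ckd_stage(example_gfr))
```

Applying `ckd_stage` to every row's eGFR is one way the CKD stage column used as the prediction target could be built.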

Machine learning Workflow:


1: Environment: Importing Libraries
2: Data Cleaning: Uploading and Reading Data
3: Exploratory Data Analysis: Handling Missing Data
4: Build & Evaluate the Models: Identifying Outliers
Dataset: http://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease
Step 1: Environment
Importing Libraries
Step 2: Data Cleaning
Uploading & Reading Data
Step 3: Exploratory Data Analysis (EDA)
Preprocessing and adding CKD Stages Column
Step 3: EDA –Cont.
The Distribution of Serum Creatinine Level

Histogram: Serum Creatinine
Scatterplot: GFR & Serum Creatinine


Step 3: EDA –Cont.
Isolate Features from Target
Step 3: EDA –Cont.: The Preprocessing
Transforming: Encode Variables
Encode: Changing the categorical data into numbers before fitting and evaluating models.
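As a rough illustration of encoding, the sketch below maps each category to an integer code using only the standard library. The helper names and the toy gender column are assumptions; the slides do not show which encoder was actually used.

```python
def fit_label_encoder(values):
    """Assign each distinct category an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Replace each category with its integer code."""
    return [mapping[v] for v in values]


# Hypothetical gender column; the real column names and values may differ.
gender = ["female", "male", "male", "female"]
gender_codes = fit_label_encoder(gender)
print(gender_codes)                  # {'female': 0, 'male': 1}
print(encode(gender, gender_codes))  # [0, 1, 1, 0]
```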
Step 3: EDA –Cont.: Heatmap & Feature Correlation
Step 3: EDA –Cont.
Splitting the Data into Training and Testing Sets
Splitting the splits

The train-test split: a technique for evaluating model performance:
• It minimizes the effects of data discrepancies and helps to better understand the characteristics of the model.
• A split that puts more data in the training set will most likely give better accuracy.
• Split size: keep enough data in the training set for an effective mapping of inputs to outputs.

https://www.kdnuggets.com/2020/05/dataset-splitting-best-practices-python.html
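A train-test split can be sketched with the standard library alone. The function name `split_train_test`, the 80/20 ratio, the seed, and the toy rows are all assumptions for illustration; the slides do not state the split size that was used.

```python
import random


def split_train_test(rows, labels, test_size=0.2, seed=42):
    """Shuffle the row indices once, then hold out the first share for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Toy data: ten rows of [age, serum creatinine] with made-up stage labels.
X = [[40 + i, 0.8 + 0.3 * i] for i in range(10)]
y = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.2)
print(len(X_train), len(X_test))  # 8 2
```

Fixing the seed keeps the split reproducible, so different models are compared on the same held-out rows.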
Step 4: Building & Evaluating the Models
Logistic Regression
Used for classifying the data: a predictive analysis algorithm based on the concept of probability.
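To make the "concept of probability" concrete, here is a minimal from-scratch sketch of binary logistic regression on one feature. It is a simplification under stated assumptions: the slides classify five stages, while this toy example only separates two classes, and the data, learning rate, and function names are made up.

```python
import math


def sigmoid(z):
    """Squash a linear score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a weight and bias for one feature by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Made-up, standardized serum creatinine values and a binary "reduced eGFR" flag.
scr_z = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
flag = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(scr_z, flag)


def predict(x):
    """Predict class 1 when the modeled probability is at least 0.5."""
    return int(sigmoid(w * x + b) >= 0.5)


print([predict(x) for x in scr_z])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The model outputs a probability for each row, and the 0.5 threshold turns that probability into a class label.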
Step 4 Cont.: Logistic Regression & Confusion Matrix
Confusion Matrix: Predicted vs Labels
Confusion matrix:
• A tabular summary of the number of correct and incorrect predictions made by a classifier.
• Used to evaluate the performance of a classification model through the calculation of performance metrics such as accuracy and F-score.
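The sketch below builds a confusion matrix by hand and derives accuracy and a per-class F-score from it. The labels are made up and cover only three stages to keep the table small; the helper names are assumptions, not the functions used in the slides.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are true labels, columns are predicted labels."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of all predictions that fall on the diagonal (correct ones)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def f1_for_class(matrix, i):
    """F-score (F1) for class index i: harmonic mean of precision and recall."""
    tp = matrix[i][i]
    if tp == 0:
        return 0.0
    precision = tp / sum(row[i] for row in matrix)
    recall = tp / sum(matrix[i])
    return 2 * precision * recall / (precision + recall)


# Made-up true and predicted stages.
y_true = [1, 1, 2, 2, 2, 3, 3, 3]
y_pred = [1, 2, 2, 2, 3, 3, 3, 3]
cm = confusion_matrix(y_true, y_pred, classes=[1, 2, 3])
print(cm)            # [[1, 1, 0], [0, 2, 1], [0, 0, 3]]
print(accuracy(cm))  # 0.75
```

Off-diagonal cells show which stages get confused with each other, which a single accuracy number hides.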
Step 4 Cont.: K-Nearest Neighbors (KNN): K=3
The KNN algorithm assumes similarity between the new data point and the available data,
and puts the new point into the category it most resembles among the available categories.
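A minimal KNN classifier with K=3 can be sketched as a majority vote among the three closest training rows. The training rows of [age, serum creatinine], their stage labels, and the function name are made up for illustration; in practice the features would usually be scaled first so that age does not dominate the distance.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    by_distance = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up training rows of [age, serum creatinine] with their CKD stage.
rows = [[30, 0.8], [35, 0.9], [50, 1.1], [55, 1.2], [70, 1.6], [75, 1.8]]
stages = [1, 1, 2, 2, 3, 3]
print(knn_predict(rows, stages, [52, 1.15], k=3))  # 2
```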
Step 4 Cont.: Finding the Best K
Using the error plot or accuracy plot to find the most favorable K value
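The search for a favorable K can be sketched by scoring several K values on held-out rows and keeping the best one; plotting these scores against K gives the accuracy plot mentioned above. The toy one-feature data (including one deliberately noisy training label) and the helper names are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k):
    """Majority vote among the k training rows closest to `query`."""
    nearest = sorted(zip(train_rows, train_labels),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy_by_k(train_rows, train_labels, test_rows, test_labels, k_values):
    """Return {k: accuracy on the test rows} for each candidate k."""
    scores = {}
    for k in k_values:
        hits = sum(knn_predict(train_rows, train_labels, row, k) == label
                   for row, label in zip(test_rows, test_labels))
        scores[k] = hits / len(test_rows)
    return scores


# Made-up serum creatinine values; the point at 0.86 carries a noisy label.
train_X = [[0.7], [0.8], [0.86], [0.9], [1.0], [1.4], [1.5],
           [2.4], [2.6], [2.8], [3.0]]
train_y = [1, 1, 2, 1, 1, 2, 2, 3, 3, 3, 3]
test_X = [[0.85], [1.45], [2.7]]
test_y = [1, 2, 3]

scores = accuracy_by_k(train_X, train_y, test_X, test_y, k_values=[1, 3, 5, 7])
best_k = max(scores, key=scores.get)  # ties go to the smallest K tried
print(best_k)  # 3
```

In this toy data K=1 is misled by the noisy neighbor and K=7 is swamped by the larger class, which is the usual trade-off the plot is meant to reveal.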
Q&A

Thank You!