Kidney Disease Early-Stage Identification and Prevention Using Supervised Machine Learning
1 author:
Shanti Verma
Lokmanya College
All content following this page was uploaded by Shanti Verma on 26 July 2024.
Abstract— Chronic Kidney Disease (CKD) is a long-term medical condition in which the kidneys gradually lose their function over time. The main function of the kidneys is to filter waste products and excess fluids from the blood, which are then excreted from the body as urine. CKD is a featureless disorder of the kidney that continues for years without any major symptoms. It can be detected only when a person undergoes a laboratory test for kidney disease. There has been a rapid increase in the number of people who suffer from CKD in India over the past few years. For this reason, early diagnosis and effective treatment are essential. In this research study, the authors use various machine learning techniques, namely K-Nearest Neighbour (KNN), Decision Tree (DT), Naive Bayesian Classifier (NB), and Supreme Boosting Classifier (SB), for timely detection and prediction of CKD. The authors used 400 samples from the University of California, Irvine (UCI) dataset, which has 25 attributes of CKD [18]. The results of the study suggest that the Supreme Boosting classifier achieves 99% accuracy, the highest among the classifiers used in the study.

Keywords— Chronic Kidney Disease (CKD), Supervised learning, Machine learning, Supreme Boosting Classifier (SB), Classifier accuracy

I. INTRODUCTION

... stones and acute kidney injury. In the case of kidney failure, the affected kidney can be replaced by a kidney transplant from an organ donor, which is a very costly as well as risky procedure. There are various stages of kidney disease in the human body. The progression from a normal kidney to kidney failure happens in 3 stages: mild, moderate, and severe [9]. If we can identify the disease and start treatment in the mild stage, we can reduce the chances of mortality by providing the patient with appropriate treatment and preventing them from eventually reaching the severe stage.

The rapid advancement in technology is evident in our daily lives. The healthcare sector in India is booming due to advancements in new technologies like Artificial Intelligence, IoT, Blockchain, etc. They are used in the healthcare sector to aid precise and accurate decisions about diseases as well as to predict the possibility of diseases. Machine learning is a subset of Artificial Intelligence. At present, machine learning algorithms are being used in various domains for classification, clustering and prediction [10].
979-83503-6684-6/24/$31.00 ©2024 IEEE
Authorized licensed use limited to: Charotar University of Science and Technology. Downloaded on July 26,2024 at 09:42:14 UTC from IEEE Xplore. Restrictions apply.
2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE)
II. RELATED WORK

A great deal of work has already been done on chronic kidney disease (CKD). For this paper, the authors reviewed research published by IEEE, Springer, ACM and Science Direct between 2018 and 2024. From these publications, the authors identified and collected 25 papers for this study. The papers were then filtered to those that used supervised learning algorithms for early-stage detection and prediction of kidney disease. This left 15 papers, all of which used one or more supervised machine learning algorithms to classify the stages of kidney disease and to predict its severity.

TABLE I: RELATED WORK DONE BY AUTHORS IN THE YEARS 2020-2024 [11-17]

1 Dataset Used / Year / Reference: UCI-CKD / 2024 [11]
Algorithms Used: KNN, Random Forest, Decision Tree, SVM, Gradient Boosting, XGBoost, AdaBoost and Ensemble
Findings: Prediction and correlation between the parameters of the CKD dataset. The Decision Tree algorithm gives better accuracy than the other algorithms used.
Evaluation Criteria Used: Accuracy, Precision, Recall, F1 Score, Weighted Average

2 Dataset Used / Year / Reference: UCI-CKD / 2024 [12]
Algorithms Used: KNN, Random Forest, Decision Tree, Gradient Boost, XGBoost with feature selection
Findings: Impact of the feature selection step on the performance of the algorithms used and on the prediction value.
Evaluation Criteria Used: Jaccard coefficient, Accuracy, Precision, Recall, F1 Score

3 Dataset Used / Year / Reference: UCI-CKD / 2024 [13]
Algorithms Used: Probability reweighted AdaBoost (PRAB)
Findings: Proposed an enhanced version of the AdaBoost algorithm, called Probability reweighted AdaBoost (PRAB), with 99% accuracy, the highest among the algorithms compared.
Evaluation Criteria Used: Accuracy, Precision, Recall, F1 Score

4 Dataset Used / Year / Reference: UCI-CKD / 2023 [14]
Algorithms Used: AdaBoost, Decision Tree, KNN, XGBoost, CatBoost, Random Forest, Gradient Boosting, Stochastic gradient boosting, Light gradient boosting machine (LGBM), Extra tree, ANN, SVM, HML
Findings: Compared the results of the classifier algorithms on the CKD dataset with PCA. The proposed hybrid algorithm takes less time in prediction than the other algorithms used.
Evaluation Criteria Used: Accuracy, Precision, Recall, F1 Score, Specificity

5 Dataset Used / Year / Reference: Diagnostic test reports of patients at the Medical Complex, Buner, Khyber Pakhtunkhwa, Pakistan / 2023 [15]
Algorithms Used: Logistic Regression, KNN, SVM, Decision Tree, Random Forest
Findings: The authors conclude that the Random Forest model performs better than the other models used.
Evaluation Criteria Used: Accuracy, Brier Score, Sensitivity, Specificity, Youden index

6 Dataset Used / Year / Reference: Patients with kidney disease admitted to St. Paul hospital, Ethiopia, between 2018 and 2022 / 2022 [16]
Algorithms Used: Random Forest, SVM, Decision Tree
Findings: Multiclass classification is used for better accuracy. Feature extraction is also used.
Evaluation Criteria Used: Accuracy, Precision, Recall, F1 Score, Specificity, Sensitivity

7 Dataset Used / Year / Reference: National Health Insurance Sharing Service (NHISS) / 2020 [17]
Algorithms Used: Random Forest, XGBoost, ResNet, Regression
Findings: The dataset used is very unbalanced, so the authors used an under-sampling method and MSE. The AUC value is 0.76.
Evaluation Criteria Used: R-squared Score, Cost-sensitive loss function, ROC Curve

III. DATASET AND METHODS

Dataset: The dataset used in the study is a public dataset available online in the UCI repository. It contains 400 samples divided into two classes. These classes are used as the dependent variable in the classification algorithms. There are 25 attributes in the dataset: 11 are quantitative and 13 are nominal, and the remaining attribute is the class label. The parameters used by the authors include age, blood pressure, specific gravity, albumin, sugar, red blood cells, etc. The dataset used includes demographics, medical reports, lab test reports, medication and image reports of patients having kidney disease. The authors used the dataset to forecast the severity of kidney disease (high, low, moderate) and to estimate kidney function in the human body. CKD datasets in general have the following characteristics:

Comprehensiveness of data: CKD datasets combine demographics, medical history, lab tests, medications, and occasionally imaging data, providing a detailed picture of individuals with the disease.

Diverse target variables: These datasets can be used to answer different questions about CKD, from predicting its presence (yes/no) to estimating how well the kidneys are functioning (through eGFR).

Variety of data: CKD datasets range from small, focused studies to large national surveys, serving specific research needs and offering insights into broad trends.
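The mixed attribute types described in the dataset section can be prepared for a classifier along the following lines. This is a minimal sketch, not the authors' pipeline. The three attributes are taken from the list in the text, but the record values, the `ckd`/`notckd` label strings, the min-max scaling, the 0/1 encoding of the nominal attribute, and the 20% test fraction are all assumptions made for illustration. The paper states only that the data were randomly divided into training and testing samples.

```python
import random

# Hypothetical records using a few attribute names listed in the text.
# The values are invented for illustration; they are not rows from the UCI file.
records = [
    {"age": 48, "blood_pressure": 80, "red_blood_cells": "normal", "class": "ckd"},
    {"age": 62, "blood_pressure": 90, "red_blood_cells": "abnormal", "class": "ckd"},
    {"age": 35, "blood_pressure": 70, "red_blood_cells": "normal", "class": "notckd"},
    {"age": 51, "blood_pressure": 60, "red_blood_cells": "normal", "class": "notckd"},
    {"age": 70, "blood_pressure": 100, "red_blood_cells": "abnormal", "class": "ckd"},
]

QUANTITATIVE = ["age", "blood_pressure"]
NOMINAL = {"red_blood_cells": {"normal": 0.0, "abnormal": 1.0}}  # assumed encoding
LABELS = {"notckd": 0, "ckd": 1}  # assumed label strings


def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def encode(rows):
    """Turn mixed-type records into numeric feature vectors and 0/1 labels."""
    columns = [min_max([row[name] for row in rows]) for name in QUANTITATIVE]
    for name, mapping in NOMINAL.items():
        columns.append([mapping[row[name]] for row in rows])
    features = [list(vector) for vector in zip(*columns)]
    labels = [LABELS[row["class"]] for row in rows]
    return features, labels


def random_split(features, labels, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and hold out a fraction for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))

    def pick(idx):
        return [features[i] for i in idx], [labels[i] for i in idx]

    return pick(indices[n_test:]), pick(indices[:n_test])


X, y = encode(records)
(train_X, train_y), (test_X, test_y) = random_split(X, y)
```

Scaling the quantitative attributes matters mainly for distance-based methods such as KNN, where an unscaled attribute with a large range would otherwise dominate the distance.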
... along with the findings and model evaluation parameters formed the basis of this study. With reference to Table I, the authors concluded that almost all of the authors used precision, recall, accuracy (PRA) and F1 score as model evaluation parameters. They primarily used the Decision Tree, SVM, KNN, XGBoost and AdaBoost supervised machine learning algorithms for classifying the images of chronic kidney disease. Therefore, the authors of this paper used the public UCI-CKD dataset and applied four supervised learning algorithms: Naive Bayes, KNN, Decision Tree and SBC.

IV. PROPOSED WORK

K-Nearest Neighbour (KNN): K-Nearest Neighbour (KNN) is a supervised machine learning algorithm used for classification and prediction. For supervised learning algorithms, the classes of the outcome variable are predefined [3]. The KNN method is based on locating neighbours by distance; in this paper, the authors used Euclidean distance. To improve accuracy, the authors chose the value of K, the number of neighbours, by trial and error: model construction and evaluation were done in a loop. For this study, the authors iterated K from 1 to 10, as shown in Fig 3, and found that K=3 is best for the CKD dataset.
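The K search described for KNN can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the two-dimensional points are invented toy data standing in for scaled CKD attributes, the function names are hypothetical, and accuracy is assumed to be the selection criterion. The `scores` helper also computes precision, recall and F1 score, the evaluation parameters named in the paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (assumption): class 0 clusters near the origin, class 1 near (0.8, 0.8).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.2), (0.25, 0.35), (0.2, 0.25),
           (0.8, 0.9), (0.9, 0.8), (0.7, 0.75), (0.85, 0.7)]
train_y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
test_X = [(0.2, 0.2), (0.75, 0.8), (0.9, 0.9), (0.1, 0.1)]
test_y = [0, 1, 1, 0]

# Build and evaluate one model per K, for K from 1 to 10.
results_by_k = {}
for k in range(1, 11):
    predictions = [knn_predict(train_X, train_y, query, k) for query in test_X]
    results_by_k[k] = scores(test_y, predictions)

# Keep the K with the highest accuracy; ties go to the smallest K.
best_k = max(results_by_k, key=lambda k: results_by_k[k]["accuracy"])
```

On this toy data several small values of K tie on accuracy, so the selected K says nothing about the CKD dataset, for which the paper reports K=3. Large K values degrade here because the majority class outvotes the true neighbours, which is the effect the loop is meant to expose.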
VII. CONCLUSION
This work deals with the early detection and prediction of Chronic Kidney Disease in humans. Random sampling was used to divide the data taken from the UCI repository into training and testing samples. Classification and prediction were done using the machine learning techniques KNN, NB, DT and SBC. These were evaluated on four parameters: accuracy, precision, recall and F1 score. The results of this study show that the SB classifier, a supervised meta-heuristic algorithm used for classification and prediction, was the most accurate in classifying patients with CKD, with an accuracy rate of 99%.

REFERENCES

[1] Siddeshwar Tekale, "Prediction of Chronic Kidney Disease Using Machine Learning", International Journal of Advanced Research in Computer and Communication Engineering, 2018.
[2] Baisakhi Chakraborty, "Development of Chronic Kidney Disease Prediction Using Machine Learning", International Conference on Intelligent Data Communication Technologies, 2019.
[3] J. Snegha, "Chronic Kidney Disease Prediction using Data Mining", International Conference on Emerging Trends, 2020.
[4] Ventrella, Piervincenzo, et al. "Supervised machine learning for the assessment of chronic kidney disease advancement." Computer Methods and Programs in Biomedicine 209 (2021): 106329.
[5] Nishat, Mirza Muntasir, et al. "A comprehensive analysis on detecting chronic kidney disease by employing machine learning algorithms." EAI Endorsed Transactions on Pervasive Health and Technology 7.29 (2021): e1-e1.
[6] Sawhney, Rahul, et al. "A comparative assessment of artificial intelligence models used for early prediction and evaluation of chronic kidney disease." Decision Analytics Journal 6 (2023): 100169.
[7] Y. R. Prajapati, D. G. Hihoriya and S. Verma, "Early Detection and Prediction of Diabetes Using Ensemble Classifier," 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 2023, pp. 1-6, doi: 10.1109/ICCCNT56998.2023.10306942.
[8] Sadariya, T., Verma, S. (2023). Early Prediction and Detection of Anxiety Level Using Support Vector Machine. In: Swaroop, A., Polkowski, Z., Correia, S.D., Virdee, B. (eds) Proceedings of Data Analytics and Management. ICDAM 2023. Lecture Notes in Networks and Systems, vol 787. Springer, Singapore. https://doi.org/10.1007/978-981-99-6550-2_22.
[9] B. S. Rao, A. D, R. R, S. Verma, P. V. Nandankar and R. P, "Machine Learning and Deep Transfer Learning approaches were used to create a Face Mask Identification model for COVID-19," 2022 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), Chennai, India, 2022, pp. 1-6, doi: 10.1109/ICSES55317.2022.9914141.
[10] A. P, A. B. Dorothy, N. Kamalraj, S. Pundir, S. Verma and G. Jakka, "Real-Time Intelligent Information Protection Using AI and Machine Learning Model," 2023 Eighth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), Chennai, India, 2023, pp. 1-5, doi: 10.1109/ICONSTEM56934.2023.10142296.
[11] Nidadavolu, Vani, and Navya Pagadala. "Prediction of Chronic Kidney Disease using Machine Learning Techniques." Proceedings of the Fourth International Conference on Advances in Computer Engineering and Communication Systems (ICACECS 2023). Vol. 18. Springer Nature, 2024.
[12] Hema, K., and Ramaraj Pandian. "Analyze the Impact of Feature Selection Techniques in the Early Prediction of CKD." International Journal of Cognitive Computing in Engineering (2024).
[13] S. M. Imran, N. Prakash, K. Hazeena, and S. Sivasubramanian, "Robust Chronic Kidney Impact Identification System Using Prab Algorithm", Int J Intell Syst Appl Eng, vol. 12, no. 7s, pp. 404–411, Dec. 2023.
[14] Islam, Md Ariful, Md Ziaul Hasan Majumder, and Md Alomgeer Hussein. "Chronic kidney disease prediction based on machine learning algorithms." Journal of Pathology Informatics 14 (2023): 100189.
[15] Iftikhar, Hasnain, et al. "A Comparative Analysis of Machine Learning Models: A Case Study in Predicting Chronic Kidney Disease." Sustainability 15.3 (2023): 2754.
[16] Debal, Dibaba Adeba, and Tilahun Melak Sitote. "Chronic kidney disease prediction using machine learning techniques." Journal of Big Data 9.1 (2022): 1-19.
[17] Wang, Weilun, Goutam Chakraborty, and Basabi Chakraborty. "Predicting the risk of chronic kidney disease (ckd) using a machine learning algorithm." Applied Sciences 11.1 (2020): 202.
[18] Rubini, L., Soundarapandian, P., and Eswaran, P. (2015). Chronic Kidney Disease. UCI Machine Learning Repository. https://doi.org/10.24432/C5G020.
[19] Bharati, J. and Jha, V. (2020). Global Dialysis Perspective: India. Kidney360, doi: https://doi.org/10.34067/kid.0003982020.
[20] Cahyani, N., Cahyani, N. and Muslim, M. (2020). Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis.
[21] Sinha, P. and Sinha, P. (2015). Comparative Study of Chronic Kidney Disease Prediction using KNN and SVM. Bhopal, India: International Journal of Engineering Research & Technology (IJERT).
[22] M. M. Shabtari, V. Kumar Shukla, H. Singh and I. Nanda, "Analyzing PIMA Indian Diabetes Dataset through Data Mining Tool 'RapidMiner'," 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 2021, pp. 560-574, doi: 10.1109/ICACITE51222.2021.940474.
[23] M. Sanghar, V. K. Shukla, A. Verma and P. Sharma, "Implementation of Support Vector Machines Algorithm through R-Language for Diabetes Database Testing," 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 2021, pp. 746-751, doi: 10.1109/Confluence51648.2021.9377124.
[24] U. Thange, V. K. Shukla, R. Punhani and W. Grobbelaar, "Analyzing COVID-19 Dataset through Data Mining Tool 'Orange'," 2021 2nd International Conference on Computation, Automation and Knowledge Management (ICCAKM), Dubai, United Arab Emirates, 2021, pp. 198-203, doi: 10.1109/ICCAKM50778.2021.9357754.