

Jurnal Sangkil & Mangkus

Vol. 1, No. 16, January 2024, pp. 1~1x


ISSN: XXX-XXXX, DOI: -----

Diagnosing Chronic Kidney Disease Using the C4.5 Algorithm

Nawa Ismail 1,*, Rasianto 2

1,2 State Vocational School 1 Tegineneng
1,3 Computer and Network Engineering Laboratory
1,4 Pesawaran Regency, Lampung Province, Indonesia

Article Info

Article history:

Keywords:
Chronic Kidney Disease
Prediction
Algorithm C4.5

ABSTRACT

Chronic Kidney Disease (CKD) is a condition in which declining kidney function leaves the kidneys unable to remove toxins and waste products from the blood; it is marked by the presence of protein in the urine and a decrease in glomerular filtration rate. According to the WHO, chronic kidney disease contributes to the global burden of disease with a death rate of 850,000 people per year. Given this high mortality, accurate diagnosis is needed for proper management of chronic kidney disease. One method that can be used to diagnose chronic kidney disease is the C4.5 algorithm, which is capable of classifying chronic kidney disease data with high accuracy. In this study, the chronic kidney disease dataset was obtained from https://www.kaggle.com/datasets/mansoordaku/ckdisease and classified with the C4.5 algorithm. The algorithm was run in RapidMiner version 10.01 through the following stages: pre-processing, setting roles, modeling the C4.5 algorithm on the training data, applying the model to the testing data, and testing to calculate the accuracy of the model on the testing data. Testing with the confusion matrix resulted in an accuracy of 97.5%.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Tegineneng,
Computer and Network Engineering Laboratory,
State Vocational School 1 Tegineneng,
Batanghari Ogan Village, Tegineneng District, Pesawaran Regency, Lampung Province, Indonesia
Email: 1 [email protected], 2 rasianto.2221210052@mail.darmajaya.ac.id

1. INTRODUCTION

Health is the most important thing in life, so people do many things every day to keep their bodies healthy. A healthy body lets people carry out daily activities more comfortably. In reality, however, many people begin to suffer from diseases such as Chronic Kidney Disease when they reach adulthood, which is caused by an unhealthy lifestyle. The kidneys are a very important organ: they maintain blood composition by preventing the accumulation of waste or impurities and regulating fluid balance in the body, keep electrolyte levels such as sodium, potassium and phosphate stable, and produce hormones and enzymes that help the body control blood pressure, make red blood cells and keep bones strong and healthy [1].
Chronic Kidney Disease is a health problem suffered by many people throughout the world, especially in low- and middle-income countries. Chronic kidney disease is a condition in which declining kidney function leaves the kidneys unable to remove toxins and waste products from the blood, as indicated by the presence of protein in the urine and a decrease in the glomerular filtration rate. The disease is progressive and generally irreversible. Its symptoms generally include loss of appetite, nausea, vomiting, dizziness, shortness of breath, fatigue, edema in the feet and hands, and uremia [2].
From a global perspective, the prevalence of chronic kidney disease increased by 87% from 1990 to 2016 [2]. According to the World Health Organization (WHO), chronic kidney failure contributes to the global burden of disease with a death rate of 850,000 people per year (Pongsifeld, 2016). According to the Global Burden of Disease 2010 study conducted by the International Society of Nephrology, chronic kidney disease has been identified as an important cause of death worldwide, with the number of deaths increasing by 82.3% over the last two decades. In Indonesia alone, the number of sufferers has increased by 20%. The Indonesian Hospital Association Data and Information Center (PDPERSI) states that the number of chronic kidney failure sufferers is estimated at around 50 people per one million population. Based on data from the Indonesia Renal Registry, a registry activity of the Indonesian nephrology association, the number of hemodialysis (dialysis) patients reached 2260 in 2008, up from 2146 in 2007 [3]. Chronic kidney disease is also associated with other deadly diseases such as diabetes, high blood pressure, heart disease and lupus [4]. These figures show that chronic kidney disease requires more attention, including fast handling through an accurate prediction system [2]. Early diagnosis is considered the most appropriate step toward a decision regarding kidney disease. Diagnosing chronic kidney disease requires classification according to criteria that indicate the disease [5].
Much research has been carried out on predicting chronic kidney disease with data mining techniques. Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technology as well as statistical and mathematical techniques. In essence, data mining is a scientific discipline whose main goal is to discover, explore, or mine knowledge from the data or information at hand [6]. It is used to extract hidden information from large datasets, and its techniques, such as classification, clustering, regression and association, are applied to data in the medical field [5]. This study performs classification, which assigns data records to predefined classes [6]. The C4.5 algorithm is used for prediction because it can estimate the possibility of disease from the attributes available in the data; the study also examines how effective the C4.5 algorithm is at detecting chronic kidney disease.

2. METHOD

This research follows the stages illustrated in Figure 1.

Figure 1. Research stages


2.1. Data Collection
At this stage, the data to be processed is determined: available data is searched for, any additional data required is obtained, and all the data is integrated into a single data set that includes the variables required in the process [8]. Data and material relevant to chronic kidney disease were collected; the dataset is secondary data obtained from https://www.kaggle.com/datasets/mansoordaku/ckdisease [9]. The data consists of 25 attributes and 400 records [10], of which 11 attributes are numerical and 14 are categorical [11]. The class attribute has 2 values, namely ckd and notckd [12].

2.2. Data Preprocessing
At this stage, the data is prepared for processing by filling in empty values, labeling, and forming the training and testing data. This stage is important for achieving the research objectives, for example more accurate prediction results [13].
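
The preprocessing step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `impute_missing` and `encode_label`, the use of `None` for an empty cell, the median/mode fill strategy and the toy rows are all hypothetical, since the paper does not say how empty values were filled. Only the class values ckd and notckd come from the dataset description.

```python
from collections import Counter
from statistics import median

def impute_missing(rows, numeric_cols):
    """Fill None cells: median for numeric columns, most frequent value otherwise.

    Hypothetical helper; the fill strategy is an assumption, not taken from the paper.
    """
    columns = list(rows[0].keys())
    fill = {}
    for col in columns:
        present = [r[col] for r in rows if r[col] is not None]
        if col in numeric_cols:
            fill[col] = median(present)
        else:
            fill[col] = Counter(present).most_common(1)[0][0]
    return [{c: (fill[c] if r[c] is None else r[c]) for c in columns} for r in rows]

def encode_label(value):
    """Map the class attribute to 1 (ckd) or 0 (notckd)."""
    return 1 if value == "ckd" else 0

# Toy rows with invented values, only to show the mechanics.
toy_rows = [
    {"age": 48, "htn": "yes", "classification": "ckd"},
    {"age": None, "htn": "no", "classification": "notckd"},
    {"age": 62, "htn": None, "classification": "ckd"},
    {"age": 50, "htn": "no", "classification": "notckd"},
]
clean_rows = impute_missing(toy_rows, numeric_cols={"age"})
labels = [encode_label(r["classification"]) for r in clean_rows]
```

Median and mode are only one reasonable choice here; dropping incomplete records or using a model-based fill would change the resulting tree.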

2.3. Data Splitting
The processed dataset is then divided into training data and testing data [14]: 70% of the records are used for training and 30% for testing.
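
A 70/30 split of this kind can be sketched as follows. The function name `split_train_test`, the fixed seed and the shuffling before the cut are assumptions made for illustration; the paper states only the 70% and 30% proportions.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 400 placeholder record ids stand in for the 400 dataset rows.
train, test = split_train_test(range(400))
print(len(train), len(test))  # 280 120
```

With 400 records this yields 280 training and 120 testing records.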

2.4. Classification
Classification is grouping based on existing data. At this stage, classification is carried out using the C4.5 algorithm on the Chronic Kidney Disease data. The goal is to analyze the input data and develop an accurate description or model for each class using the features in the data [15].
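
C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by split information. The sketch below computes it for a single categorical attribute. It is an illustration only: the function names and the toy attribute values are hypothetical, and a full C4.5 implementation also handles continuous attributes through thresholds, missing values and pruning, none of which is shown here.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute: information gain / split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0

# Invented toy data: htn separates the classes perfectly, appet not at all.
htn = ["yes", "yes", "no", "no"]
appet = ["good", "poor", "good", "poor"]
cls = ["ckd", "ckd", "notckd", "notckd"]
print(gain_ratio(htn, cls), gain_ratio(appet, cls))  # 1.0 0.0
```

At each node the tree would split on the attribute with the highest gain ratio, here `htn`.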

2.5. Evaluation of Results
Testing the analysis is very important to determine whether its results match the expected decisions. At this stage, the data mining process applies the C4.5 decision tree algorithm using the Python programming language in Google Colab.

3. RESULTS AND DISCUSSION

3.1. Datasets
In this study, the Chronic Kidney Disease dataset consists of 25 attributes, namely age, blood pressure, specific gravity, albumin, sugar, red blood cells, pus cell, pus cell clumps, bacteria, random blood glucose, blood urea, serum creatinine, sodium, potassium, hemoglobin, packed cell volume, white blood cell count, red blood cell count, hypertension, diabetes, cad, appetite, edema, anemia and classification, and it contains 400 records [6].

Figure 2. Initial data before preprocessing


3.2. Data Preprocessing

Figure 3. Data has been preprocessed

Figure 4. Classification Distribution Diagram


The diagram above shows that 250 records (62.5%, class 1) were identified as chronic kidney disease, while 150 records (37.5%, class 0) were not.
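
The class percentages can be checked with a few lines of Python. The helper name `class_distribution` is hypothetical; the counts 250 and 150 are the ones reported in the text.

```python
from collections import Counter

def class_distribution(labels):
    """Return {class: (count, percentage)} for a list of class labels."""
    total = len(labels)
    return {c: (n, 100 * n / total) for c, n in Counter(labels).items()}

# Label counts as reported in the text: 250 ckd (1) and 150 notckd (0).
labels = [1] * 250 + [0] * 150
dist = class_distribution(labels)
print(dist[1], dist[0])  # (250, 62.5) (150, 37.5)
```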

3.3. Data Splitting

3.4. Classification
The C4.5 algorithm is a data mining classification method; here it is used to build the model whose classification accuracy is measured [16].


3.5. Evaluation of Results
The testing data is used to measure the accuracy obtained from processing the chronic kidney disease dataset with the C4.5 algorithm. Run in Google Colab, the model achieves an accuracy of 97.5%.

Figure 5. C4.5 Algorithm Test Results

The confusion matrix is used to evaluate the classification quality of a classifier. It is calculated by comparing the number of correct predictions with the number of incorrect predictions [17]. This method uses a matrix table to compare the number of TP against the number of positive records with the number of TN against the number of negative records [18]. The variables and formulas for calculating the level of accuracy from the confusion matrix are as follows:
a. TP = True Positive
b. TN = True Negative
c. FP = False Positive
d. FN = False Negative
Recall = TP / (FN + TP) [16]
Precision = TP / (FP + TP) [19]
F1 = 2 * Recall * Precision / (Recall + Precision) [20]
Accuracy = (TP + TN) / (TP + FN + FP + TN) [7]

Figure 6. Test results in the form of a confusion matrix

From the confusion matrix above, we get Recall = 44 / (3 + 44) = 0.9362 (93.62%), Precision = 44 / (0 + 44) = 1 (100%), F1 = (2 x 0.9362 x 1) / (0.9362 + 1) = 0.967 (96.7%) and Accuracy = (44 + 73) / (44 + 0 + 3 + 73) = 0.975 (97.5%).
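
The calculation above can be reproduced with a short Python function. The function name `classification_metrics` is hypothetical; the counts TP = 44, FN = 3, FP = 0 and TN = 73 are read from the worked calculation in the text, and the formulas are the standard recall, precision, F1 and accuracy definitions.

```python
def classification_metrics(tp, fn, fp, tn):
    """Recall, precision, F1 and accuracy from confusion-matrix counts (as fractions)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}

# Counts taken from the worked calculation in the text.
metrics = classification_metrics(tp=44, fn=3, fp=0, tn=73)
print({name: round(value, 4) for name, value in metrics.items()})
# {'recall': 0.9362, 'precision': 1.0, 'f1': 0.967, 'accuracy': 0.975}
```

All four values are kept as fractions between 0 and 1; mixing a percentage with a fraction inside the F1 formula would give a wrong result.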

The ROC (Receiver Operating Characteristic) curve visualizes classifier accuracy and allows classifiers to be compared visually. ROC is a two-dimensional graph with the false positive rate on the horizontal axis and the true positive rate on the vertical axis, used to measure differences in performance between the methods used. The ROC curve is another way to test classification performance [21] and is commonly used to express the confusion matrix [20].
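
How the points of a ROC curve arise can be sketched as follows: records are ranked by predicted score, the decision threshold is lowered one score at a time, and each step yields one (false positive rate, true positive rate) point. The function names and the six scores are invented for illustration and are not the study's predictions; for a decision tree, the score would typically be the class proportion in the leaf a record falls into, which is an assumption about the setup.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the decision threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last record sharing this score (handles ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Invented scores for three positive (1) and three negative (0) records.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

An area of 1 corresponds to a perfect ranking and 0.5 to a random one; the toy data above gives 8/9 because one negative record is ranked above one positive record.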


Figure 7. Test results in the form of a ROC curve

Figure 8. Test results in the form of a decision tree structure

4. CONCLUSION

From the explanation above, the following conclusions can be drawn. The available dataset contains 400 records with 25 attributes, consisting of 24 feature attributes and 1 label attribute (classification). The data was processed with the C4.5 classification algorithm in Google Colab through pre-processing, splitting the data into training and testing data, C4.5 classification of the training data, and testing. After the tree model was built, testing gave an accuracy of 97.5%. The test results can be displayed as a ROC curve and as a decision tree structure. However, the decision tree algorithm still has several shortcomings. It is unstable: small changes in the data can result in large changes in the structure of the decision tree. It is also less effective at predicting the results of continuous variables. Further research can therefore apply other classification algorithms in the hope of obtaining better accuracy.

ACKNOWLEDGEMENTS
This journal article was written by Nawa Ismail and Rasianto, SMKN 1 Tegineneng, based on the results of research on Diagnosing Chronic Kidney Disease Using the C4.5 Algorithm, which was funded independently as a form of Community Service. The contents are the sole responsibility of the authors.

REFERENCES
[1] H. Amalia, “PERBANDINGAN METODE DATA MINING SVM DAN NN UNTUK KLASIFIKASI
PENYAKIT GINJAL KRONIS,” 2018.


[2] A. R. S. Darwanto, Taza Luzia Viarindita, and Yekti Widyaningsih, “Analisis Regresi Logistik Binomial
dan Algoritma Random Forest pada Proses Pengklasifikasian Penyakit Ginjal Kronis,” JSA, vol. 5, no. 1,
pp. 1–14, Jun. 2021, doi: 10.21009/JSA.05101.
[3] W. Yunus, “Algoritma K-Nearest Neighbor Berbasis Particle Swarm Optimization Untuk Prediksi
Penyakit Ginjal Kronik”.
[4] H. Amalia, “PERBANDINGAN METODE DATA MINING SVM DAN NN UNTUK KLASIFIKASI
PENYAKIT GINJAL KRONIS,” 2018.
[5] H. Harmayani and L. Sitorus, “Diagnosa Penyakit Ginjal Kronis Menggunakan Metode Klasifikasi
Naïve,” mib, vol. 4, no. 3, p. 850, Jul. 2020, doi: 10.30865/mib.v4i3.2292.
[6] W. Yunus, “Algoritma K-Nearest Neighbor Berbasis Particle Swarm Optimization Untuk Prediksi
Penyakit Ginjal Kronik”.
[7] N. Sunanto and G. Falah, “PENERAPAN ALGORITMA C4.5 UNTUK MEMBUAT MODEL
PREDIKSI PASIEN YANG MENGIDAP PENYAKIT DIABETES,” rabit, vol. 7, no. 2, pp. 208–216,
Jul. 2022, doi: 10.36341/rabit.v7i2.2435.
[8] “VOL.VI NO.1 FEBRUARI 2017,” JURNAL SISTEM INFORMASI.
[9] H. Amalia, “PERBANDINGAN METODE DATA MINING SVM DAN NN UNTUK KLASIFIKASI
PENYAKIT GINJAL KRONIS,” 2018.
[10] “Diagnosis Penyakit Ginjal Kronis dengan Algoritma C4.5, a.pdf.”
[11] I. G. A. Mahardika Pratama, L. G. Astuti, I. M. Widiartha, I. G. N. A. Cahyadi Putra, C. R. Adi
Pramartha, and I. D. M. B. Atmaja Darmawan, “Diagnosis Penyakit Ginjal Kronis dengan Algoritma
C4.5, K-Means dan BPSO,” JLK, vol. 10, no. 4, p. 371, Jul. 2022, doi: 10.24843/JLK.2022.v10.i04.p07.
[12] S. F. N. Aini and A. S. Sunge, “PERBANDINGAN METODE KLASIFIKASI DALAM
MEMPREDIKSI PENYAKIT GINJAL KRONIS”.
[13] L. Sari, A. Romadloni, and R. Listyaningrum, “Penerapan Data Mining dalam Analisis Prediksi Kanker
Paru Menggunakan Algoritma Random Forest,” infotekmesin, vol. 14, no. 1, pp. 155–162, Jan. 2023,
doi: 10.35970/infotekmesin.v14i1.1751.
[14] T. Arifin and D. Ariesta, “PREDIKSI PENYAKIT GINJAL KRONIS MENGGUNAKAN
ALGORITMA NAIVE BAYES CLASSIFIER BERBASIS PARTICLE SWARM OPTIMIZATION,”
JTI, vol. 13, no. 1, pp. 26–30, Apr. 2019, doi: 10.36787/jti.v13i1.97.
[15] Y. Widiastiwi and I. Ernawati, “Klasifikasi Penyakit Batu Ginjal Menggunakan Algoritma Decision
Tree C4.5 Dengan Membandingkan Hasil Uji Akurasi,” vol. 5, no. 2, 2021.
[16] “IMPLEMENTASI ALGORITMA C4.5 DAN K-MEANS.pdf.”
[17] A. Budiman, “Cronic Kidney Disease Prediction Using C4.5 Algorithm and K-Means,” vol. 1, no. 1,
2020.
[18] “12 Analisis Penyakit Jantung Menggunakan Metode KNN Dan Random Forest.pdf.”
[19] S. Hendrian, “Algoritma Klasifikasi Data Mining Untuk Memprediksi Siswa Dalam Memperoleh
Bantuan Dana Pendidikan,” FaktorExacta, vol. 11, no. 3, Oct. 2018, doi:
10.30998/faktorexacta.v11i3.2777.
[20] E. C. P. Witjaksana, R. Saedudin, and V. P. Widartha, “PERBANDINGAN AKURASI ALGORITMA
RANDOM FOREST DAN ALGORITMA ARTIFICIAL NEURAL NETWORK UNTUK
KLASIFIKASI PENYAKIT DIABETES”.
[21] T. Retnasari and E. Rahmawati, “DIAGNOSA PREDIKSI PENYAKIT JANTUNG DENGAN MODEL
ALGORITMA NAÏVE BAYES DAN ALGORITMA C4.5,” 2017.
