KNN Diabetes Internasional 2
e-ISSN: 2654-4644
Vol. 6, No. 2, pp. 134-139, June 2023
Accredited by KEMENDIKBUDRISTEK, No. 230/E/KPT/2022
http://www.ijeepse.ejournal.unri.ac.id
Abstract: Diabetes is a chronic disease characterized by blood sugar (glucose) levels above the normal range. It can occur when the body is no longer able to absorb glucose properly or when glucose intake is higher than needed. Glucose is the main energy source for the cells of the human body. Glucose that accumulates in the body over the long term can lead to complications and to more serious, life-threatening diseases. As a result, diabetes should be predicted before complications develop. Machine learning is a branch of artificial intelligence that can be used to make predictions from datasets of diabetic patients. The tested dataset has 390 observations, with cholesterol level, glucose, HDL cholesterol, cholesterol ratio, age, gender, blood pressure, BMI, waist and hip measurements with their ratio, and the patient's height and weight as variables. Predictions are made using the K-Nearest Neighbor method, which shows an accuracy of 93.58% with a k value of 3, using 20% of all data as test data.

Keywords: Diabetes, K-Nearest Neighbor, Prediction, Machine Learning

I. INTRODUCTION

Nowadays, technology is developing more rapidly and providing more and more benefits to human life. One of these benefits is computer technology, which can implement a human way of thinking in a computer system. One example is a machine learning system used to detect or predict. Diabetes is a chronic disease characterized by abnormally high levels of glucose (blood sugar). In people with diabetes, the body is unable to properly process food for use as energy. The pancreas makes a hormone called insulin, which helps glucose enter the cells of the body; in diabetes, the body makes too little insulin or none at all. As a result, glucose stays in the blood and over time causes health problems [1]. Diabetes is one of the most dangerous and deadly diseases in Indonesia, after stroke and coronary heart disease. Early prediction of diabetes risk is needed for early treatment of this disease. According to Sidartawan Soegondo, Indonesia ranks fourth in the world by number of people with diabetes, a number that has increased to 14 million. This is based on a report from the World Health Organization (WHO), in which the number of people with diabetes in Indonesia in 2000 was 8.4 million, after India (31.7 million), China (20.8 million), and the United States (17.7 million). Worldwide, the WHO reports more than 143 million people with diabetes, a number projected to double by 2030 [2], with 77% of cases occurring in developing countries [3].

The increase in diabetes cases is due to delays in establishing a diagnosis of the disease. Patients may die from complications before a diagnosis is made. The cause of the delay is the variety of factors that influence the diagnostic decision. Therefore, we need a prediction tool that helps determine whether a person has diabetes mellitus or not. The disease is associated with a lifestyle that combines low physical activity with a diet that is high in calories and fat and low in fiber. Identification of diabetes is needed as a prevention strategy. By utilizing a data mining approach, it is possible to extract previously unknown information [4]. It is a great challenge for healthcare organizations to provide cost-effective and high-quality clinical care for patients. This can be done only by analyzing large healthcare databases to extract knowledge about disease and to make decisions. This is an important application for major diseases such as heart disease, cancer and diabetes [5]. The diagnosis of diabetes is very important, and many machine learning techniques can be used effectively for the prediction and diagnosis of diabetes. These machine learning algorithms prove to be cost-effective and time-saving for diabetic patients [6]. Therefore, machine learning algorithms are now used to identify and diagnose diseases in order to minimize the risk of death and improve a patient's health status, as machine learning contributes to specific decisions [7].

II. METHODOLOGY

A. Machine Learning
Machine learning is a branch of computer science that examines how a machine can solve problems without being explicitly programmed [8]. Peter Harrington (2012) describes several steps in a machine learning workflow, namely:
• Collect data, in the form of Excel, MS Access, text files, and so on.
Received: April 9, 2023 | Revised: May 10, 2023 | Accepted: June 1, 2023 134
• Prepare the data, by determining the quality of the data and then taking steps to correct problems such as data loss.
• Train a model on data split into two parts, namely training data used for model development and test data used as a reference.
• Evaluate the model, using the test results to guide the selection of algorithms.
• Improve performance, by choosing a different model or introducing more variables to increase efficiency.

B. Data Mining
Data mining is the process of looking for interesting patterns or information in selected data using certain techniques or methods. Techniques, methods, or algorithms in data mining vary widely [9]. According to Rerung (2018), data mining has several stages, each of which is explained in the following:

Fig. 1. Data Mining [10]

D. Confusion Matrix
The confusion matrix is a method that is usually used to perform accuracy calculations in data mining. It is presented as a table that states the amount of test data that is correctly classified and the amount of test data that is misclassified [12]. Accuracy is the ratio of the data that is classified correctly to the entire data. The accuracy value can be obtained from the following equation [13]:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{1} $$

Precision is defined as the ratio of the selected relevant items to all selected items. Precision can be obtained by using the following equation [13]:

$$ \text{Precision} = \frac{TP}{TP + FP} \times 100\% \tag{2} $$

$$ \text{Recall} = \frac{TP}{TP + FN} \times 100\% \tag{3} $$

$$ \text{Error} = \frac{FP}{TP} \times 100\% \tag{4} $$
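The metrics in this section can be sketched in a few lines of Python. This is only an illustration: TP, TN, FP and FN are taken to mean true positives, true negatives, false positives and false negatives (the standard reading of a confusion matrix), the function name `classification_metrics` is hypothetical, and the counts in the usage lines are made up rather than taken from the study. The error rate is computed here as the misclassified share of all test data, which is an assumption on our part; it is the reading under which the reported 93.58% accuracy and 6.4% error rate add up to about 100%.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and error rate as percentages."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = 100.0 * (tp + tn) / total
    # Precision and recall are undefined when their denominator is zero;
    # this sketch reports 0.0 in that case.
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    # Assumption: error rate = share of misclassified test data (100% - accuracy).
    error_rate = 100.0 * (fp + fn) / total

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "error_rate": error_rate,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = classification_metrics(tp=8, tn=10, fp=2, fn=0)
print(metrics)
# accuracy 90.0, precision 80.0, recall 100.0, error_rate 10.0
```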
The third stage is to classify the training data based on the value of k. Once the k training samples nearest to the test data have been obtained, they are separated according to their class, namely Diabetes or No diabetes. The fourth stage is to count the class labels among these k training samples, that is, how many of them belong to the Diabetes class and how many belong to the No diabetes class. These counts are used in the next stage to draw a conclusion.
The final stage is drawing a conclusion. If the number of Diabetes samples among the k nearest training samples is greater than the number of No diabetes samples, the test data is classified as Diabetes. If the No diabetes class is more dominant, the test data is classified as No diabetes.
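The stages above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes Euclidean distance for the distance and sorting steps, which this passage does not specify; the function name `knn_predict` is hypothetical; and the two-feature samples (glucose, BMI) are invented values, not rows of the study's dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Distance from the test sample to every training sample
    # (Euclidean distance is an assumption of this sketch).
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    ]
    # Keep the k training samples closest to the test sample.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Count how many of the k neighbors fall in each class.
    votes = Counter(nearest_labels)
    # The class with the most votes is the prediction.
    predicted = votes.most_common(1)[0][0]
    return predicted, dict(votes)


# Invented (glucose, BMI) samples for illustration only.
train_X = [(85, 22.0), (90, 24.5), (95, 23.0), (150, 31.0), (165, 33.5), (180, 35.0)]
train_y = ["No diabetes", "No diabetes", "No diabetes", "Diabetes", "Diabetes", "Diabetes"]

label_a, votes_a = knn_predict(train_X, train_y, (160, 32.0), k=3)
label_b, votes_b = knn_predict(train_X, train_y, (120, 27.0), k=3)
print(label_a, votes_a)  # Diabetes {'Diabetes': 3}
print(label_b, votes_b)  # No diabetes {'No diabetes': 2, 'Diabetes': 1}
```

Using an odd k with two classes, as here, avoids tied votes. In practice the features would usually be scaled first, since an unscaled feature with a large range (such as glucose) dominates the distance.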
14 variables
TABLE II. POST PREPROCESSED DATASET
As shown in Table VII, the K-Nearest Neighbor method gives its best prediction result with a 20%/80% test/training split and a k value of 3: its accuracy of 93.58% is the highest accuracy score and its error rate of 6.4% is the lowest error rate.

E. Interface Discussion
Load Dataset Form View: this view is used to load the raw dataset for the machine learning model to use. As shown in Fig. 3, the file must be in .csv format in order to run.
Training Test Split View: in this view, the user can set the training/test split ratio by changing the value of 'test-size'. The amounts of training and test data are displayed below the code box after running the code, as shown in Fig. 5.
K-Nearest Neighbor Model View: in this view, the value of k can be changed to obtain different prediction results, as shown in Fig. 6.
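To make the split ratio concrete, here is a minimal sketch of a shuffled training/test split in plain Python. The function name `split_train_test`, the `seed` parameter and the rounding rule are assumptions for illustration, not the code behind the interface. The placeholder rows stand in for the 390 observations, so a test size of 0.2 yields 78 test rows and 312 training rows.

```python
import random


def split_train_test(rows, test_size=0.2, seed=0):
    """Shuffle `rows` and split them into (train, test) lists."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(len(rows)))
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    # Assumption: the test count is rounded to the nearest whole row.
    n_test = round(len(rows) * test_size)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


rows = list(range(390))  # placeholder for the 390 observations
train, test = split_train_test(rows, test_size=0.2)
print(len(train), len(test))  # 312 78
```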
Error Rate View: this view displays a graph of the error rate against the value of k. Its purpose is to check which k value has the lowest error rate; a lower error rate corresponds to higher accuracy.
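The idea behind this view can be sketched by computing the error rate for several candidate k values and picking the lowest. The sketch below is an assumption-laden illustration: the helper names `knn_predict` and `error_rate` are hypothetical, Euclidean distance is assumed, the candidate values 1, 3, 5 and 7 are arbitrary, and the small (glucose, BMI) samples are invented, so the resulting numbers say nothing about the study's dataset.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """`train` is a list of (features, label) pairs; returns the majority label."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, test, k):
    """Percentage of test samples that are misclassified for a given k."""
    wrong = sum(
        1 for features, label in test if knn_predict(train, features, k) != label
    )
    return 100.0 * wrong / len(test)


# Invented samples for illustration only: 0 = No diabetes, 1 = Diabetes.
train = [
    ((85, 22.0), 0), ((90, 24.5), 0), ((95, 23.0), 0), ((110, 26.0), 0),
    ((100, 25.0), 1),  # deliberately noisy sample
    ((140, 30.0), 1), ((150, 31.0), 1), ((165, 33.5), 1), ((180, 35.0), 1),
]
test = [((98, 24.0), 0), ((105, 25.5), 0), ((155, 32.0), 1), ((170, 34.0), 1)]

# Odd k values avoid tied votes with two classes.
rates = {k: error_rate(train, test, k) for k in (1, 3, 5, 7)}
best_k = min(rates, key=rates.get)  # on ties, the smallest k listed first wins

for k, rate in rates.items():
    print(f"k={k}: error rate {rate:.1f}%")
print("lowest error rate at k =", best_k)
```

Plotting `rates` against k would give the kind of curve this view displays.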
Fig. 5. Training Test Split View

IV. CONCLUSION
Based on the research conducted, the K-Nearest Neighbor algorithm performs well in predicting diabetes, with a fairly high accuracy of 93.58% and a fairly low prediction error rate of 6.4%.
ACKNOWLEDGMENT
The authors would like to thank the Faculty of Computer Science, Institut Bisnis dan Teknologi Pelita Indonesia, for the facilities that have been provided and for its support.

REFERENCES
[1] S. Kumar, “Detailed Analysis Of Classifiers For Prediction Of Diabetes,” vol. 11, no. 09, pp. 209–212, 2022.
[2] R. Saxena and S. Kumar Sharma Manali Gupta, “Role Of K-Nearest Neighbour In Detection Of Diabetes Mellitus,” Turkish J. Comput. Math. Educ., vol. 12, no. 10, pp. 373–376, 2021.
[3] J. J. Pangaribuan, “Mendiagnosis Penyakit Diabetes Melitus Dengan Menggunakan Metode Extreme Learning Machine,” J. Isd, vol. 2, no. 2, pp. 69–76, 2016.
[4] M. S. Mustafa and I. W. Simpen, “Implementasi Algoritma K-Nearest Neighbor (KNN) Untuk Memprediksi Pasien Terkena Penyakit Diabetes Pada Puskesmas Manyampa Kabupaten Bulukumba,” Semin. Ilm. Sist. Inf. Dan Teknol. Inf., vol. VIII, no. 1, pp. 1–10, 2019.
[5] P. C. Thirumal and N. Nagarajan, “Applying Average K Nearest Neighbour Algorithm To Detect Type-2 Diabetes,” Aust. J. Basic Appl. Sci., vol. 8, no. 7, pp. 128–134, 2014.
[6] S. V. M and U. K, “Type 2 Diabetic Prediction Using Machine Learning Algorithm,” Am. Sci. Res. J. Eng. Technol. Sci., vol. 45, no. 1, pp. 299–307, 2018.
[7] M. Panda, D. P. Mishra, S. M. Patro, and S. R. Salkuti, “Prediction Of Diabetes Disease Using Machine Learning Algorithms,” IAES Int. J. Artif. Intell., vol. 11, no. 1, pp. 284–290, 2022.
[8] M. Ula and A. Faridhatul Ulva, “Implementasi Machine Learning Dengan Model Case Based Reasoning Dalam Mendagnosa Gizi Buruk Pada Anak,” J. Inform. Kaputama, vol. 5, no. 2, pp. 333–339, 2021.
[9] Y. Mardi, “Data Mining : Klasifikasi Menggunakan Algoritma C4.5,” Edik Inform., vol. 2, no. 2, pp. 213–219, 2017.
[10] R. R. Rerung, “Penerapan Data Mining Dengan Memanfaatkan Metode Association Rule Untuk Promosi Produk,” J. Teknol. Rekayasa, vol. 3, no. 1, p. 89, 2018.
[11] Yeni Kustiyahningsih and N. Syafa’ah, “Sistem Pendukung Keputusan Untuk Menentukan Jurusan Pada Siswa SMA Menggunakan Metode KNN Dan SMART,” J. Istek, vol. VI, no. 1, pp. 40–42, 2013.
[12] M. F. Rahman, D. Alamsah, M. I. Darmawidjadja, and I. Nurma, “Klasifikasi Untuk Diagnosa Diabetes Menggunakan Metode Bayesian Regularization Neural Network (RBNN),” J. Inform., vol. 11, no. 1, p. 36, 2017.
[13] B. P. Pratiwi and A. Silvia, “Pengukuran Kinerja Sistem Kualitas Udara Dengan Teknologi WSN Menggunakan Confusion Matrix,” vol. 6, no. 2, pp. 66–75, 2020.

BIOGRAPHIES OF AUTHORS
JACK BILLIE CHANDRA was born in Pekanbaru, Indonesia, and is a student at the Faculty of Computer Science, Institut Bisnis dan Teknologi Pelita Indonesia, Pekanbaru. He graduated in 2022. He also received his A.P in 2019 from the same institute.

DEWI NASIEN received her Ph.D. in 2012 and worked at Universiti Teknologi Malaysia, Johor Bahru, Malaysia, from 2012 to 2016. She is currently a lecturer at Pelita Indonesia Institute of Business and Technology, a private university. Moreover, she is also an adjunct lecturer at several universities. Her areas of expertise include image processing, pattern recognition, machine learning, and soft computing.