Liver Disease Prediction Using Different Machine Learning Algorithms
2023 International Conference on Advanced & Global Engineering Challenges (AGEC) | 979-8-3503-4096-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/AGEC57922.2023.00034
Abhilash Kumar#
Department of Physics
National Institute of Technology, Jamshedpur
Jamshedpur, India
[email protected]

Kapil Dev Mahato#
Department of Physics
National Institute of Technology, Jamshedpur
Jamshedpur, India
[email protected]

Chandrashekhar Azad
Department of Computer Science & Engineering
National Institute of Technology, Jamshedpur
Jamshedpur, India
[email protected]

Uday Kumar*
Department of Physics
National Institute of Technology, Jamshedpur
Jamshedpur, India
[email protected]

ORCID: 0000-0001-8448-2729, 0000-0002-0927-2454

# Abhilash Kumar and Kapil Dev Mahato have contributed equally to this research.
* Corresponding author: Uday Kumar
Abstract- The liver is a vital organ in the human body since it is in charge of metabolism, detoxification, storage, digestion, blood sugar regulation, and immunological function. As a result, any harm or dysfunction to it can result in serious illness. Hence, early detection enables effective treatment, prevents further damage, is cost-effective, and increases the overall survival rate of the individual. In this paper, we have taken the BUPA Liver Disease (LD) dataset from the UCI repository. We worked with several machine learning algorithms (MLAs): Decision Tree (DT), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Adaptive Boosting (AB), Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Logistic Regression (LR), Gaussian Naive Bayes (NB), Extra Tree (ET), Light Gradient-Boosting Machine (LGBM), and Support Vector Machine (SVM). After applying these twelve methods, we predicted whether or not a patient has liver disease. We found DT to be the best among the twelve algorithms, with accuracy, precision, recall, and F1-Score of 86.67%, 0.87, 0.87, and 0.86, respectively.

Keywords- Decision tree, Random Forest, Logistic regression, liver disease prediction, Machine learning algorithms, Supervised learning

I. INTRODUCTION

Over the past few years, many researchers have worked on MLAs and tried to predict liver-related disease. Early detection has become essential, as it increases the scope of effective treatment and increases the survival rate. The liver is a very important organ in the human body. It is responsible for metabolism, detoxification, storage, digestion, blood sugar regulation, and immune function. Any harm or dysfunction can lead to a variety of diseases, including diabetes and heart disease. According to a report published in 2016, chronic liver disease and cirrhosis accounted for nearly 2.1% of total deaths in India [1].

Liver disease can be caused by several factors, such as alcohol abuse, viral hepatitis, non-alcoholic fat deposition, and many more [2]. By combining a medical history, physical exam, blood tests, imaging tests such as an ultrasound or CT scan, and a liver biopsy, doctors can detect liver disease. Jaundice, discomfort in the abdomen, and weariness are signs of liver disease. To prevent future liver damage and its consequences, it is crucial to seek early detection and treatment. The instruments used for this purpose are costly and not widely available in many hospitals. In many developing nations like India, it is observed that getting regular check-ups is not very affordable or convenient for the common people. To address this shortcoming, research groups are gravitating toward MLA approaches. The machine learning (ML) approach is highly helpful since it entails collecting information from raw data, figuring out how dependent and independent variables relate to one another in a given dataset, and making predictions based on what it learns. The applied algorithms are very useful for the diagnosis of heart disease, COVID-19, cancer, tumours, diabetes, LD, and so on [3], [4]. Furthermore, we chose a liver disease patient dataset to improve prediction accuracy using different MLAs. The main objective of this paper is to predict the disease accurately, which helps in its diagnosis. In an effort to improve diagnosis and treatment, healthcare organizations have recently begun utilizing contemporary and automated technologies like machine learning, data mining, and artificial intelligence. This has made it possible to offer patients top-notch medical options. According to Park et al., healthcare management is one area where predictive analytics is widely used for many goals, including disease detection, patient care, patient recovery, and drug formulation [5].

The current study looks at predicting the existence of liver disease using the BUPA dataset, which contains medical records for 345 male patients. This work expands on earlier researchers' work on the studied dataset (see Section II) and focuses on optimizing the model and improving liver disease diagnosis by utilizing seven attributes (see Table 2). We have used twelve different models to analyze the given data and predict the output. These models do give accurate
results (accuracy, precision, recall, and F1-Score), but sometimes they overfit or underfit the data given to them. We also try to predict the disease more accurately with fewer resources and obtain the optimum result. This study tested twelve predictive classifier algorithms on the BUPA liver disorders dataset [6] and identified that DT has the highest accuracy, 86.67%. This value is about 26.51, 25.91, and 17.33 percentage points higher than the published values of 60.16% [7], 60.76% [8], and 69.34% [9], respectively. Other models, LR (73.33%) and KNN (83.34%), have performed better than the recently reported accuracies of 69.86% and 70.72%, respectively [10], an improvement of 3.47 and 12.62 percentage points over the published results.

Moreover, the applied algorithms have the potential to aid clinicians in the diagnosis and management of liver disease, thereby enhancing patient outcomes and reducing the strain on healthcare systems. Validating the model's generalizability and investigating its clinical utility in the real world can be the focus of future research. Ultimately, the incorporation of machine learning-based tools into clinical practice could revolutionize the diagnosis and management of liver disease, resulting in enhanced patient outcomes and quality of life.

II. LITERATURE SURVEY

As can be seen in Table 1, numerous researchers have examined the BUPA dataset over the past decades to predict liver disease. S. Thaiparnit and a colleague used five classifier methods to compare how well these models performed. RF (Trees.RF) had the best accuracy of the reported classifiers (75.76%) [7]. T. M. Kamruzzaman et al., a Bangladeshi research team, advanced their work using two datasets related to liver disease and four classifier algorithms: SVM, NB, KNN, and DT. That study also analyzed the dataset using the four evaluation parameters considered in the present study. They estimated the SVM, NB, KNN, and DT accuracies on the studied dataset to be 73.26%, 68%, 75.19%, and 60.76%, respectively [8]. Another researcher, M. K. Ram, and members of his team used ten of the methods considered in our research to work with the BUPA and ILPD datasets. The accuracies of these algorithms on the BUPA dataset were 69.54%, 73.50%, 72.30%, 74.60%, 69.34%, 83.65%, 67.23%, 65.67%, and 71.34% (see Table 1) [9]. Recently (2022), A. Pan and her colleagues conducted research on two datasets: Indian and American. They examined a number of ML classification-based parameters. On the American dataset, LR, NB, KNN, SVM, GB, and RF achieved accuracies (the most important parameter) of 69.86%, 77.1%, 70.72%, 75.94%, 88.7%, and 100%, respectively [10]. B. V. Ramana et al. reported the accuracy of NB, KNN, and SVM on two datasets, one of which was BUPA, as 51.59%, 57.97%, and 62.5%, respectively [11]. E. M. Hashem and M. S. Mabrouk, on the other hand, used the SVM model on the BUPA and ILPD datasets. On the tested dataset (BUPA), the proposed model attained 70% accuracy [12]. S. Bahramirad and her colleagues carried out their study using two datasets, one of which was the BUPA database. Only three of the twelve ML approaches they used (SVM, LR, and MLP) matched our findings, with accuracies of 69.23%, 69.57%, and 68.84% [13]. Md. R. Haque et al. identified diseases of the liver with 80% and 85.2% accuracy using RF and artificial neural networks, respectively [14]. J. Nasir and colleagues worked with Chinese and Pakistani scientists to examine four distinct datasets on liver disorders. The accuracy on the BUPA dataset was determined to be 66.7% for LR, 71.6% for SVM, and 67% for KNN [15]. Using SVM and KNN, Md. R. Reza and his colleagues predicted liver disease with 70% and 79.61% accuracy, respectively. For diabetic disease prediction, four measures (accuracy, precision, recall, and F1-Score) were calculated at the same time [16]. M. Fathi and colleagues applied three types of SVM classifiers to two independent datasets, ILPD and BUPA, and found linear SVM accuracies of 82.9% and 83.5%, respectively [17]. C. Vlachas and colleagues expanded on their findings by using three different medical datasets: BUPA liver disease, PIMA India diabetes, and UCI heart disease. Using the same (BUPA) dataset, they calculated the RF accuracy to be 66.65% [18]. Additionally, the results for SVM, XGB, and RF were 79.47%, 76.83%, and 80.35%, respectively, on the same dataset that J. Zhao and his colleagues used in 2022 to predict liver disease problems [19]. J. Singh and K. Kangra, two Indian scientists, expanded their research to include a greater number of predictive classifiers. They achieved SVM accuracy of 58%, NB accuracy of 55%, DT accuracy of 68%, RF accuracy of 73%, LR accuracy of 68%, and KNN accuracy of 62% [20]. The present experiment continues this line of work with a larger number of classifiers and improved results. The DT result reported here outperforms earlier work on the investigated dataset.

TABLE 1. ACCURACY OF LD PREDICTION USING DIFFERENT MACHINE LEARNING ALGORITHMS

Author Name                           Method   Accuracy (%)
Sattarpoom Thaiparnit et al. [7]      RF       75.77
T. M. Kamruzzaman et al. [8]          SVM      73.26
                                      NB       68
                                      KNN      75.19
                                      DT       60.76
Mylavarapu Kalyan Ram et al. [9]      MLP      69.54
                                      KNN      73.50
                                      LR       72.30
                                      DT       69.34
                                      RF       74.60
                                      GB       69.40
                                      SVM      83.65
                                      NB       67.23
                                      AB       65.67
                                      XGB      71.34
Aritra Pan et al. [10]                LR       69.86
                                      NB       77.1
                                      KNN      70.72
                                      SVM      75.94
                                      GB       88.7
                                      RF       100
Bendi Venkata Ramana et al. [11]      NB       51.59
                                      KNN      57.97
                                      SVM      62.6
Esraa M. Hashem et al. [12]           SVM      70
Sina Bahramirad et al. [13]           SVM      69.23
                                      LR       69.57
                                      MLP      68.84
TABLE 1 (continued)

Author Name                           Method   Accuracy (%)
Md. Rezwanul Haque et al. [14]        RF       80
Junaid Nasir et al. [15]              LR       66.7
                                      SVM      71.6
                                      KNN      67.0
Md. Reshad Reza et al. [16]           SVM      70
                                      KNN      79.61
Mohammad Fathi et al. [17]            SVM      83.5
Christodoulos Vlachas et al. [18]     RF       66.65
Jing Zhao et al. [19]                 SVM      79.47
                                      XGB      76.83
                                      RF       80.35
Jaswinder Singh and Kirti Kangra [20] SVM      58
                                      NB       55
                                      DT       68
                                      RF       73
                                      LR       68
                                      KNN      62
The applied algorithms                DT       86.67
                                      KNN      83.34
                                      RF       80.00
                                      MLP      76.67
                                      NB       76.67
                                      XGB      76.67
                                      AB       76.67
                                      GB       76.67
                                      ET       73.33
                                      LR       73.33
                                      SVM      73.33
                                      LGBM     70.00

III. PROPOSED METHODOLOGY

Fig. 1. Architecture of the algorithms.

We first collected the data and then pre-processed it by checking for missing values. In this dataset, there were no missing values. Then we split the data into two sets: one for testing and another for training. Following that, we ran our training set through various algorithms and quantified the performance parameter values. Fig. 1 shows the workflow of the work.

A. Dataset collection

The BUPA dataset, taken from the UCI repository, has a total of 7 attributes and 345 entries of male patients. Information about the dataset is given in Table 2.

B. Data pre-processing

First, we checked the collected data for missing values, duplicate values, outliers, or incorrect entries that could impede the desired accuracy. Before we began, we carefully removed the faulty entries from the BUPA dataset. There were four rows that were duplicated. Following that, we checked for outliers and handled them by transforming the data with a log or power transformation.

TABLE 2. ATTRIBUTES AND DESCRIPTION OF THE BUPA DATASET

Attribute  Attribute      Data   Attribute Information                          Range
number     Name           Type
1          MCV            Int    Mean Corpuscular Volume                        [65, 103]
2          Alkphos        Int    Alkaline phosphatase                           [23, 138]
3          Sgpt           Int    Alanine Aminotransferase                       [4, 155]
4          Sgot           Int    Aspartate Aminotransferase                     [5, 82]
5          Gammagt        Int    Gamma-glutamyl Transpeptidase                  [5, 297]
6          Drinks         Float  Number of half-pint equivalents of             [0.0, 20.0]
                                 alcoholic beverages
7          Liver_problem  Int    1: for Liver problem, 0: for no Liver problem  [0, 1]

C. Algorithms Implemented

1) Logistic regression: It is a classifier that predicts the output in binary form (i.e., 0 and 1). It uses the sigmoid function to turn the value of a linear equation into a probability. The equation is formed by applying weights to the input features and adding a bias. It is simple to interpret, and the weight assigned to each feature reflects the contribution of that feature to the outcome. However, one should make sure all missing values are imputed before training the model [21].

2) Decision tree: The decision tree is a classification algorithm that works on the principle of repeatedly splitting the dataset on the basis of the features. It follows a tree-like model and selects the best split until the stopping criteria are met. It has many advantages, including being very simple to understand and interpret. It can also use both categorical and continuous data, but it is not suited to datasets with multiple relationships [21].

3) Random Forest: RF is an ensemble classifier that first creates a bootstrap dataset from the original dataset by choosing variable subsets randomly. Then it creates multiple decision trees by taking data randomly from the bootstrap dataset and predicts the result by analyzing the outcomes of all the decision trees. This is very helpful as it reduces overfitting compared to a single decision tree, and it can use both categorical and continuous data. It suffers from the challenge of interpretability: one cannot determine the importance of a particular feature based on the outcome [21].

4) SVM: SVM is a machine learning technique used to classify the objects in a given dataset into two or more categories; here, the two categories are 1 for a person with liver disease and 0 for a healthy person. The basic principle is that it uses a separating boundary, a line in 2D and a hyperplane in 3D, chosen by maximizing
the margin; here, each object is a point in n-dimensional space whose coordinates represent the features of the given dataset. It uses a kernel function to map the data. The advantage of using SVM is that it is robust, flexible, and efficient, and it generalizes well [16].

5) KNN: KNN is one of the classification algorithms used in machine learning. It uses the distance between the data points for its prediction. The Euclidean distance method is used to compute distance. It works by selecting the nearest neighbors of a new data point and using their classes to determine the class of the new point. It is one of the simplest methods and works on both continuous and categorical data. However, a poor choice of 'k' can cause overfitting or underfitting, which hinders its accuracy [21].

6) Naive Bayes: The algorithm works by calculating probabilities. Based on the input features, the probability of each class is calculated, and the class with the maximum probability is reported as the outcome. It is simple, fast, and robust to noise, but there are some shortcomings, such as naive assumptions, limited expressive power, overfitting, and poor estimation [21].

7) AdaBoost: It is a sequential process of classification, and the combined models do not all carry the same weight in the final outcome. In this algorithm, each grown tree has one root node and two leaf nodes. Weak models are combined to give the final result and improve accuracy. It works well with both continuous and categorical data, but it is sensitive to noise and outliers [22].

8) Gradient boosting: In this classifier, learning happens by optimizing the loss function. The sequential decision trees are built with leaf nodes ranging from 8 to 32. This model can handle both categorical and continuous data and can balance imbalanced data. It can also handle an unprocessed dataset, but its hyperparameters can sometimes greatly affect the prediction accuracy of the model [22].

9) XGB: It is a very popular framework, as it provides language support and can be easily integrated with many systems. Furthermore, it is an advancement of gradient boosting in terms of speed and performance. It uses parallelization, cache optimization, and out-of-memory computation to increase its speed. It also works with regularization, which prevents overfitting, and with auto-pruning of the trees. In this way, the algorithm is more

then calculates the weight of the hidden layer, and works with a non-linear activation function to give the final output. It works well with both simple and complex data. Also, it can handle high-dimensional datasets. However, it is sensitive to the selection of hyperparameters, which may affect the result [21].

12) LGBM: LightGBM is a powerful ML technique that performs well on large-scale and high-dimensional datasets. Its efficiency, scalability, and accuracy make it a favorite choice for researchers for many types of applications, such as image classification and natural language processing. However, it needs more memory to compute [22].

F. Performance Measurement

Performance measurement is based on the constituents of the confusion matrix: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

\[ \text{Accuracy} = \frac{TN + TP}{FP + TP + TN + FN} \tag{1} \]

\[ \text{Precision} = \frac{TP}{FP + TP} \tag{2} \]

\[ \text{Recall (sensitivity)} = \frac{TP}{TP + FN} \tag{3} \]

\[ \text{F1-Score} = \frac{2}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} \tag{4} \]

IV. MODEL EVALUATION RESULTS

In this section, we discuss the results obtained based on different parameters.
UREXVW HYHQ
robust, WKRXJK LW
even though FDQ handle
it can KDQGOH missing
PLVVLQJ values
YDOXHV HDVLO\ DQG
easily and
HIIHFWLYHO\>@
effectively [22].
10) ([WUDWUHH,WLVEDVHGRQDKHDYLO\UDQGRPL]HGVDPSOH
Extra tree: It is based on a heavily randomized sample
VHOHFWLRQ+HUHXQOLNHWKHWUDGLWLRQDOGHFLVLRQWUHHDOJRULWKP
selection. Here, unlike the traditional decision tree algorithm, App lied Algorithms
GDWD is
data LV taken
WDNHQ LQin aD randomized
UDQGRPL]HG manner,
PDQQHU which
ZKLFK UHGXFHV
reduces the WKH )LJ&RPSDULVRQVRIDSSOLHGDOJRULWKPVDFFXUDFLHV
Fig. 2. Comparisons of applied algorithms accuracies
YDULDQFHRIWKHGHFLVLRQWUHH7KXVE\FUHDWLQJPXOWLSOHVXFK
variance of the decision tree. Thus, by creating multiple such
WUHHV LW $ Accuracy
A. $FFXUDF\
trees, it LQFUHDVHV
increases DFFXUDF\
accuracy DQG SUHYHQWV overfitting.
and prevents RYHUILWWLQJ ,W ZRUNV
It works
ZHOO with
well ZLWK both
ERWK FDWHJRULFDO
categorical and DQG continuous
FRQWLQXRXV GDWD
data DQG DOVR
and also ,Q this
In WKLV VWXG\ ZH performed
study, we SHUIRUPHG RQ WZHOYH VXSHUYLVHG
on twelve supervised OHDUQLQJ
learning
GHWHUPLQHVDQRQOLQHDUUHODWLRQVKLSDPRQJIHDWXUHV+RZHYHU
determines a non-linear relationship among features. However, DOJRULWKPV
algorithms (DT,'7 (7
ET, .11
KNN, /5
LR, 0/3 1% 690
MLP, NB, ;*% AB,
SVM, XGB, $%
LW is
it LV computationally
FRPSXWDWLRQDOO\ time-consuming
WLPHFRQVXPLQJ as DV there
WKHUH are
DUH multiple
PXOWLSOH *%/*%0DQG5)
GB, LGBM, and RF),DQGWKHLUDFFXUDFLHV LQSHUFHQWDJH
and their accuracies (in percentage) DUH
are
FUHDWLRQVRIGHFLVLRQWUHHVDQGGXHWRVXFKUDQGRPQHVVLWODFNV
creations of decision trees, and due to such randomness, it lacks
86.67, 73.33, 83.34, 73.33, 76.67, 76.67, 73.33, 76.67, 76.67,
WKHVDPHLQWHUSUHWDELOLW\DVWUDGLWLRQDO'7>@
the same interpretability as traditional DT [23]. DQGUHVSHFWLYHO\,QWKLVFDVH'7KDGWKHKLJKHVW
76.67,70, and 80, respectively. In this case, DT had the highest
DFFXUDF\RIIROORZHGE\.11DWDQG5)DW
accuracy of 86.67%, followed by KNN at 83.74%, and RF at
11) 0/3,WLVDIHHGIRUZDUGQHXUDOQHWZRUNWKDWFRPSULVHV
MLP: It is a feed-forward neural network that comprises
/*%0KDGWKHORZHVWDFFXUDF\RI7KHUHPDLQLQJ
80%. LGBM had the lowest accuracy of 70%. The remaining
aDODUJHQXPEHURIOD\HUV+HUHRQHILUVWFKHFNVWKHLQSXWYDOXHV
large number oflayers. Here, one first checks the input values
121
Authorized licensed use limited to: ULAKBIM UASL - GAZI UNIV. Downloaded on May 09,2024 at 10:30:37 UTC from IEEE Xplore. Restrictions apply.
algorithms, including MLP, NB, XGB, AB, and GB, were at 76.67%. ET, LR, and SVM had 73.33% accuracy. Fig. 2 provides a pictorial representation of the data in the form of a bar graph.

B. Precision
We evaluated twelve models (DT, ET, KNN, LR, MLP, NB, SVM, XGB, AB, GB, LGBM, and RF), and their precision values were 0.87, 0.74, 0.87, 0.74, 0.86, 0.77, 0.79, 0.79, 0.77, 0.79, 0.70, and 0.82, respectively. In this case, we found that DT, along with KNN, had the highest precision of 0.87, while LGBM had the lowest precision of 0.70. The remaining algorithms' precision values ranged from 0.74 to 0.86. Fig. 3 shows a pictorial representation of the data in the form of a bar graph.

C. Recall
Here, we evaluated twelve models (DT, ET, KNN, LR, MLP, NB, SVM, XGB, AB, GB, LGBM, and RF), and their recall values were 0.87, 0.73, 0.83, 0.73, 0.77, 0.77, 0.73, 0.77, 0.77, 0.77, 0.70, and 0.80, respectively. We found that DT had the highest recall of 0.87, followed by KNN at 0.83. The lowest value of 0.70 was found in LGBM. The remaining algorithms ranged from 0.73 to 0.80. Fig. 4 shows a pictorial representation of the data in the form of a bar graph.

Fig. 4. Comparisons of recall values (bar graph; x-axis: Applied Algorithms).

D. F1-Score
We quantified the F1-Score of twelve models (DT, ET, KNN, LR, MLP, NB, SVM, XGB, AB, GB, LGBM, and RF), and their values were 0.86, 0.72, 0.82, 0.72, 0.79, 0.76, 0.72, 0.75, 0.76, 0.75, 0.69, and 0.79, respectively. We found that DT had the highest F1-Score of 0.86, followed by KNN at 0.82. The lowest F1-Score, 0.69, was found in LGBM. For the remaining algorithms, results ranged from 0.72 to 0.79. Fig. 5 depicts a pictorial representation of the data in the form of a bar graph.

Fig. 5. Comparisons of F1-Score values (bar graph; x-axis: Applied Algorithms).

V. DISCUSSION

TABLE 3: COMPARATIVE ANALYSIS OF ALGORITHMS BASED ON DIFFERENT PARAMETERS
Applied Models   Accuracy (%)   Precision   Recall   F1-Score
DT               86.67          0.87        0.87     0.86
KNN              83.34          0.87        0.83     0.82
RF               80.0           0.82        0.80     0.79
MLP              76.67          0.86        0.77     0.79
NB               76.67          0.77        0.77     0.76
XGB              76.67          0.79        0.77     0.75
AB               76.67          0.77        0.77     0.76
GB               76.67          0.79        0.77     0.75
ET               73.33          0.74        0.73     0.72
LR               73.33          0.74        0.73     0.72
SVM              73.33          0.79        0.73     0.72
LGBM             70.0           0.70        0.70     0.69
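The measures compared in Table 3 are the ones defined under Performance Measurement, Eqs. (1)-(4). As a minimal sketch of how they follow from the four confusion counts, the Python snippet below evaluates each equation directly. The class name ConfusionCounts, the function names, and the example counts are illustrative assumptions; none of them is taken from the paper or its dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for illustration)."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def accuracy(c: ConfusionCounts) -> float:
    """Eq. (1): (TN + TP) / (FP + TP + TN + FN)."""
    return (c.tn + c.tp) / (c.fp + c.tp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Eq. (2): TP / (FP + TP). Raises ZeroDivisionError if nothing was predicted positive."""
    return c.tp / (c.fp + c.tp)


def recall(c: ConfusionCounts) -> float:
    """Eq. (3): TP / (TP + FN). Raises ZeroDivisionError if there are no actual positives."""
    return c.tp / (c.tp + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    """Eq. (4): harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    if p == 0.0 or r == 0.0:
        # The harmonic mean is taken as 0 when either term is 0.
        return 0.0
    return 2.0 / (1.0 / p + 1.0 / r)


# Made-up counts, chosen only to exercise the formulas.
example = ConfusionCounts(tp=40, tn=45, fp=5, fn=10)

if __name__ == "__main__":
    for name, fn in [("accuracy", accuracy), ("precision", precision),
                     ("recall", recall), ("f1_score", f1_score)]:
        print(f"{name}: {fn(example):.4f}")
```

Applying Eq. (4) to the rounded precision and recall entries of Table 3 does not always reproduce the F1-Score column (for MLP, 0.86 and 0.77 give about 0.81, against a tabulated 0.79). The paper does not state how per-class values were averaged, so the sketch should be read as an illustration of the equations rather than a reproduction of the table.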
In this study, we implemented twelve models and conducted a performance parameter-based comparative analysis. DT gave the highest accuracy (86.67%), recall (0.87), and F1-Score (0.86) values. However, in terms of precision, both KNN and DT had the same value of 0.87. Table 3 shows that the accuracy and recall values for MLP, NB, XGB, AB, and GB were 76.67% and 0.77, respectively. ET, LR, and SVM all had 73.33% accuracy, 0.73 recall, and 0.72 F1-Score values. However, precision values differ; for models such as XGB, SVM, and GB, the reported value was 0.79, while for models such as NB and AB, the value was 0.77. We found that LGBM showed the lowest values in all four parameters.

VI. CONCLUSION
The prediction of liver illness has the potential to save many lives and to have a significant impact on its treatment. In this paper, liver disease prediction is carried out on the UCI dataset using twelve MLAs: DT, KNN, MLP, AB, RF, GB, XGB, LR, NB, ET, SVM, and LGBM. The dataset is appropriately pre-processed before being used in these models. The objective of the study is to obtain highly accurate predictions with fewer resources, including less time and money, using all the attributes and tests. Comparing the algorithms on the dataset with the attributes presented in Table 2, we conclude that DT is the best one, with an accuracy of 86.67%, which is about 26.51, 25.91, and 17.33 percentage points higher than the published values of 60.16% [7], 60.76% [8], and 69.34% [9], respectively. Other models, such as LR (73.33%) and KNN (83.34%), have also performed better than the recently reported accuracies of 69.86% and 70.72%, respectively [10], by 3.47 and 12.62 percentage points.

VII. ACKNOWLEDGMENT
The work was supported constantly by CSIR, India. K.D.M. acknowledges the CSIR NET-JRF fellowship (File No. 09/1277(0001)/2019-EMR-I) for financial support.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

REFERENCES
[1] D. Mondal, K. Das, and A. Chowdhury, "Epidemiology of Liver Diseases in India," Clin. Liver Dis., vol. 19, no. 3, pp. 114–117, Mar. 2022, doi: 10.1002/cld.1177.
[2] "Mayo Clinic. (2021, January 22). Liver disease." p. 20374502, 2021, [Online]. Available: https://www.mayoclinic.org/diseases-conditions/liver-problems/symptoms-causes/syc-20374502.
[3] A. Bhowmick, K. D. Mahato, C. Azad, and U. Kumar, "Heart Disease Prediction Using Different Machine Learning Algorithms," in 2022 IEEE World Conference on Applied Intelligence and Computing (AIC), Jun. 2022, vol. 2418, pp. 60–65, doi: 10.1109/AIC55036.2022.9848885.
[4] T. Agarwal, K. D. Mahato, C. Azad, and U. Kumar, "Predicting Happiness Score During Covid-19 Using Machine Learning," in 4th International Conference on Artificial Intelligence and Speech Technology (AIST2022), 2022, pp. 1–6.
[5] J. Park, K.-Y. Kim, and O. Kwon, "Comparison of machine learning algorithms to predict psychological wellness indices for ubiquitous healthcare system design," in Proceedings of the 2014 International Conference on Innovative Design and Manufacturing (ICIDM), Aug. 2014, vol. 17, pp. 263–269, doi: 10.1109/IDAM.2014.6912705.
[6] "Liver Disease Dataset UCI link.pdf." [Online]. Available: https://archive.ics.uci.edu/ml/machine-learning-databases/liver-disorders/bupa.data.
[7] S. Thaiparnit, N. Chumuang, and M. Ketcham, "A Comparitive Study of Clasification Liver Dysfunction with Machine Learning," in 2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Nov. 2018, pp. 1–4, doi: 10.1109/iSAI-NLP.2018.8692808.
[8] T. M. Kamruzzaman, M. S. Mahbub, and M. A. Hakim, "A Structured Method For Predicting Liver Disease Using Machine Learning Techniques & Improvements In Correctness," in 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), Jul. 2021, pp. 01–07, doi: 10.1109/ICCCNT51525.2021.9579809.
[9] M. K. Ram, C. Sujana, R. Srinivas, and G. S. N. Murthy, "A Fact-Based Liver Disease Prediction by Enforcing Machine Learning Algorithms," vol. 1318, S. Smys, J. M. R. S. Tavares, R. Bestak, and F. Shi, Eds. Singapore: Springer Singapore, 2021, pp. 567–586.
[10] A. Pan, S. Mukhopadhyay, and S. Samanta, "Liver Disease Detection," Int. J. Healthc. Inf. Syst. Informatics, vol. 17, no. 2, pp. 1–19, Jun. 2022, doi: 10.4018/IJHISI.299956.
[11] B. Venkata Ramana, M. S. P. Babu, and N. Venkateswarlu, "A Critical Study of Selected Classification Algorithms for Liver Disease Diagnosis," Int. J. Database Manag. Syst., vol. 3, no. 2, pp. 101–114, May 2011, doi: 10.5121/ijdms.2011.3207.
[12] E. M. Hashem and M. S. Mabrouk, "A Study of Support Vector Machine Algorithm for Liver Disease Diagnosis," Am. J. Intell. Syst., vol. 2014, no. 1, pp. 9–14, 2014, doi: 10.5923/j.ajis.20140401.02.
[13] S. Bahramirad, A. Mustapha, and M. Eshraghi, "Classification of liver disease diagnosis: A comparative study," in 2013 Second International Conference on Informatics & Applications (ICIA), Sep. 2013, pp. 42–46, doi: 10.1109/ICoIA.2013.6650227.
[14] M. R. Haque, M. M. Islam, H. Iqbal, M. S. Reza, and M. K. Hasan, "Performance Evaluation of Random Forests and Artificial Neural Networks for the Classification of Liver Disorder," in 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2), Feb. 2018, pp. 1–5, doi: 10.1109/IC4ME2.2018.8465658.
[15] J. Nasir et al., "Classification and Prediction Analysis of Diseases and Other Datasets Using Machine Learning," vol. 1198, I. S. Bajwa, T. Sibalija, and D. N. A. Jawawi, Eds. Singapore: Springer Singapore, 2020, pp. 432–442.
[16] M. R. Reza et al., "Automatic Diabetes and Liver Disease Diagnosis and Prediction Through SVM and KNN Algorithms," vol. 2, 2021, pp. 589–599.
[17] M. Fathi, M. Nemati, S. M. Mohammadi, and R. Abbasi-Kesbi, "A MACHINE LEARNING APPROACH BASED ON SVM FOR CLASSIFICATION OF LIVER DISEASES," Biomed. Eng. Appl. Basis Commun., vol. 32, no. 03, p. 2050018, Jun. 2020, doi: 10.4015/S1016237220500180.
[18] C. Vlachas et al., "Random forest classification algorithm for medical industry data," SHS Web Conf., vol. 139, p. 03008, May 2022, doi: 10.1051/shsconf/202213903008.
[19] J. Zhao, P. Wang, and Y. Pan, "Predicting liver disorder based on machine learning models," J. Eng., vol. 2022, no. 10, pp. 978–984, Oct. 2022, doi: 10.1049/tje2.12184.
[20] J. Singh and K. Kangra, "Prediction and Analysis of Liver Disorder Disease using Machine Learning Algorithms," Int. J. Comput. Sci. Commun., vol. 13, no. 2, pp. 26–32, 2022, [Online]. Available: www.csjournals.com.
[21] J. Brownlee, Machine learning algorithms from scratch with Python. Machine Learning Mastery, 2016.
[22] J. Tanha, Y. Abdi, N. Samadi, N. Razzaghi, and M. Asadpour, "Boosting methods for multi-class imbalanced data classification: an experimental review," J. Big Data, vol. 7, no. 1, p. 70, Dec. 2020, doi: 10.1186/s40537-020-00349-y.
[23] L. Thiberville, "Classification of Endomicroscopic Images of the Lung Based on Random Subwindows," vol. 59, no. 9, pp. 2677–2683, 2012, doi: 10.1109/TBME.2012.2204747.
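Note on the comparison in the Conclusion: the margins quoted there (26.51, 25.91, 17.33, 3.47, and 12.62) are plain differences between two accuracies, i.e. percentage points, not relative improvements. The minimal Python sketch below recomputes them from the accuracies quoted in the Conclusion. The variable and function names are illustrative assumptions, and the relative-gain helper is an addition for contrast that the paper itself does not report.

```python
# Accuracies (%) exactly as quoted in the Conclusion.
THIS_STUDY = {"DT": 86.67, "LR": 73.33, "KNN": 83.34}

# (model in this study, cited reference, published accuracy in %)
PUBLISHED = [
    ("DT", "[7]", 60.16),
    ("DT", "[8]", 60.76),
    ("DT", "[9]", 69.34),
    ("LR", "[10]", 69.86),
    ("KNN", "[10]", 70.72),
]


def margin_points(ours: float, theirs: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return round(ours - theirs, 2)


def relative_gain_percent(ours: float, theirs: float) -> float:
    """Relative improvement over the published value, in percent (not reported in the paper)."""
    return round(100.0 * (ours - theirs) / theirs, 2)


margins = [margin_points(THIS_STUDY[model], value) for model, _, value in PUBLISHED]

if __name__ == "__main__":
    for (model, ref, value), margin in zip(PUBLISHED, margins):
        rel = relative_gain_percent(THIS_STUDY[model], value)
        print(f"{model} vs {ref}: +{margin} points ({rel}% relative)")
```

The recomputed point differences agree with the figures stated in the Conclusion; the relative gains are larger numbers and should not be confused with them.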