A Comparative Study On Liver Disease Prediction Using Supervised Machine Learning Algorithms
5 authors, including:
All content following this page was uploaded by F M Javed Mehedi Shamrat on 08 December 2020.
Abstract: Chronic liver disease is a leading cause of death worldwide and affects a large number of people around the world. It is caused by a variety of factors that damage the liver, for example obesity, undiagnosed hepatitis infection, and alcohol misuse, and it can lead to abnormal nerve function, coughing up or vomiting blood, kidney failure, liver failure, jaundice, hepatic encephalopathy, and many other conditions. Diagnosing the disease is costly and complicated. The goal of this work is therefore to evaluate the performance of different machine learning algorithms in order to reduce the high cost of chronic liver disease diagnosis through prediction. We used six algorithms: Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine, Naïve Bayes, and Random Forest. The performance of the classification techniques was evaluated with several measures: accuracy, precision, recall, F1 score, and specificity. We found accuracies of 75%, 74%, 69%, 64%, 62%, and 53% for LR, RF, DT, SVM, KNN, and NB, respectively. The analysis showed that LR achieved the highest accuracy. Our present study mainly focuses on the use of clinical data for liver disease prediction and explores different ways of representing such data through our analysis.
Keywords: Machine Learning, Liver Disease, Classification, Supervised learning, Computational Intelligence, Regression, Random Forest, Decision
Tree, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes.
419
IJSTR©2019
www.ijstr.org
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 8, ISSUE 11, NOVEMBER 2019 ISSN 2277-8616
$$p = \frac{1}{1 + e^{-(b_0 + b_1 x)}} \tag{1}$$
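The probability p in Equation (1) becomes a class decision by thresholding. The sketch below assumes the standard single-feature logistic form p = 1/(1 + e^{-(b_0 + b_1 x)}). The names `b0`, `b1`, `x`, the 0.5 threshold, the plain batch gradient-descent fit, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import math


def logistic_probability(b0: float, b1: float, x: float) -> float:
    """P(y = 1 | x) under the assumed one-feature logistic model."""
    z = b0 + b1 * x
    # Evaluate the sigmoid in a numerically stable way for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_label(b0: float, b1: float, x: float, threshold: float = 0.5) -> int:
    """Return 1 (positive class) when p reaches the threshold, else 0."""
    return 1 if logistic_probability(b0, b1, x) >= threshold else 0


def fit_logistic(xs, ys, lr: float = 0.1, epochs: int = 2000):
    """Fit b0 and b1 by batch gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = logistic_probability(b0, b1, x) - y  # d(log-loss)/dz
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


if __name__ == "__main__":
    # Hypothetical toy data: one feature, labels switch between x = 2 and x = 3.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [0, 0, 0, 1, 1, 1]
    b0, b1 = fit_logistic(xs, ys)
    print([predict_label(b0, b1, x) for x in xs])
```

The 0.5 threshold is only the conventional default. In a screening setting it could be lowered, which trades specificity for sensitivity.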
minimization of the error function:

$$\frac{1}{2} w^{T} w + C \sum_{i=1}^{N} \zeta_i \tag{2}$$

In contrast to Classification SVM Type 1, the Classification SVM Type 2 model minimizes the error function:

$$\frac{1}{2} w^{T} w - \nu\rho + \frac{1}{N} \sum_{i=1}^{N} \zeta_i \tag{3}$$

3.5. K-Nearest Neighbors (KNN)
KNN is one of the most fundamental instance-based classification algorithms in machine learning. It works on the idea that samples lying close to one another tend to belong to the same class [13]. KNN assigns a sample to the class that is most common among its K nearest neighbors, where K is a parameter used to tune the classifier [14].

3.6. Naive Bayes (NB)
Naive Bayes is one of the simplest, most effective, and most widely used machine learning techniques. It is a probabilistic classifier that makes predictions under the assumption of conditional independence, with probabilities estimated from the training data [15]. Naive Bayes classifiers are therefore a standard approach to classification problems such as spam detection, and they are also well suited to medical problems. Bayes' theorem gives the probability of an event occurring given that another event has already occurred. It is expressed mathematically as:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{4}$$

patients. It is also called the True Negative Rate (TNR).

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Precision is also called the positive predictive value. It is the proportion of outcomes predicted as positive by the classifier that are actually positive.

$$\text{Precision} = \frac{TP}{TP + FP}$$

The F1 score combines precision and recall into a single measure (their harmonic mean), so it reflects both the false positives (FP) and the false negatives (FN) of a model.

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$

4.2. Analysis of the Result
In this experiment, we carried out several analyses to examine the six machine learning classifiers on the liver disease dataset. In terms of accuracy, LR achieved the highest accuracy (75%) and NB the lowest (53%). With respect to precision, LR achieved the highest score (91%) and NB the lowest (36%). For sensitivity, SVM achieved the highest value (88%) and KNN the lowest (76%). Logistic Regression was also the best performer on the F1 measure (83%), while NB obtained the lowest (53%). For specificity, DT achieved the highest value (48%) and LR the lowest (47%). Comparing these measurement criteria, the LR classification technique is more effective than the other classifiers for predicting chronic liver disease. The confusion matrix of the prediction results is shown in Figure 5, and the performance comparison of the six supervised machine learning techniques is presented in Figure 6.
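All of the threshold-based measures discussed in this section (accuracy, precision, recall or sensitivity, specificity, and F1) follow from the four confusion-matrix counts. The sketch below is minimal and rests on a few assumptions: labels are binary, and 1 marks liver disease. The function names and the eight example labels are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # diseased patients predicted as diseased
    fp: int  # healthy patients predicted as diseased
    tn: int  # healthy patients predicted as healthy
    fn: int  # diseased patients predicted as healthy


def confusion_from_labels(y_true, y_pred, positive=1) -> ConfusionCounts:
    """Count TP, FP, TN, FN for a binary task with `positive` as the positive label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(num: float, den: float) -> float:
    # Report 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute the threshold-based measures from confusion-matrix counts."""
    precision = _ratio(c.tp, c.tp + c.fp)    # positive predictive value
    recall = _ratio(c.tp, c.tp + c.fn)       # sensitivity, true positive rate
    specificity = _ratio(c.tn, c.tn + c.fp)  # true negative rate
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        # Harmonic mean of precision and recall, as in the F1 formula.
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical labels for eight patients (1 = liver disease, 0 = no disease).
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
    counts = confusion_from_labels(y_true, y_pred)
    print(counts)  # ConfusionCounts(tp=4, fp=1, tn=2, fn=1)
    for name, value in evaluate(counts).items():
        print(f"{name}: {value:.3f}")
```

For these hypothetical labels the sketch reports the following values:

- accuracy 0.750
- precision 0.800
- recall 0.800
- specificity 0.667
- F1 0.800

Returning 0.0 for an empty denominator is a design choice. One example is a classifier that never predicts the positive class. Raising an error or returning NaN would be equally defensible.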
Figure 7 shows the Receiver Operating Characteristic (ROC) curves. The ROC curve represents the performance of the machine learning techniques in terms of the true positive rate (TPR) and false positive rate (FPR) of their classification results. SVM achieved the highest AUC (area under the ROC curve).

Fig. 7: Receiver Operating Characteristics (ROC).

5 CONCLUSION
The main goal of this work is to build an effective diagnosis system for patients with chronic liver disease using six different supervised machine learning classifiers. We evaluated the performance of every classifier on the patients' data parameters. The LR classifier gave the highest classification accuracy (75%) and was also the best performer on the F1 measure for predicting liver disease, while NB gave the lowest accuracy (53%). The best-performing classification technique can therefore be used in a decision support system for the diagnosis of chronic liver disease. Such an application would be able to predict liver disease at an early stage and report the patient's health condition. It could be especially beneficial in low-income countries, where medical facilities and specialist doctors are scarce. There are several directions for future work in this field. We explored only some popular supervised machine learning algorithms; more algorithms could be used to build a more accurate liver disease prediction model, and performance could be further improved. This work could also play a significant role in health care research and in medical centers for predicting liver disease.

ACKNOWLEDGMENT
The authors are grateful to all the researchers involved in this study.

REFERENCES
[1] K. Sumeet, J.J. Larson, B. Yawn, T.M. Therneau, W.R. Kim, Underestimation of liver-related mortality in the United States. Gastroenterology 145 (2013); 375–382, e371–372.
[2] A.A. Mokdad, A.D. Lopez, S. Shahraz, R. Lozano, A.H. Mokdad, J. Stanaway, et al., Liver cirrhosis mortality in 187 countries between 1980 and 2010: a systematic analysis. BMC Medicine 12 (2014); 145.
[3] P. Byass, The global burden of liver disease: a challenge for methods and for public health. BMC Medicine 12.1 (2014); 159.
[4] L. A. Auxilia, Accuracy Prediction Using Machine Learning Techniques for Indian Patient Liver Disease. 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI). IEEE (2018).
[5] Hashem, M. Esraa, S. Mai, A study of support vector machine algorithm for liver disease diagnosis. American Journal of Intelligent Systems 4.1 (2014); 9-14.
[6] P. Sajda, Machine learning for detection and diagnosis of disease. Annu. Rev. Biomed. Eng. 8 (2006); 537-565.
[7] UCI Machine Learning Repository. ILPD (Indian Liver Patient Dataset) Data Set. https://archive.ics.uci.edu/ml/datasets/ILPD+(Indian+Liver+Patient+Dataset)
[8] Logistic Regression. Retrieved from: https://www.saedsayad.com/logistic_regression.htm, Last Accessed: 5 October 2019.
[9] L. Breiman, Random Forests. Machine Learning 45(1) (2001); 5–32. https://doi.org/10.1023/A:1010933404324
[10] Decision Trees. Retrieved from: https://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/, Last Accessed: 5 October 2019.
[11] Support vector machine. Retrieved from: http://www.statsoft.com/textbook/support-vector-machines, Last Accessed: 5 October 2019.
[12] V. Vapnik, I. Guyon, T. H.-M, Learn, Support vector machines. statweb.stanford.edu (1995).
[13] M. Zhang, Z. Zhou, ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition 40.7 (2007); 2038-2048.
[14] G. Guo, H. Wang, D. Bell, Y. Bi, K. Greer, KNN Model-Based Approach in Classification (pp. 986–996). Springer, Berlin, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39964-3_62
[15] Naive Bayes. Retrieved from: https://www.geeksforgeeks.org/naive-bayes-classifiers/, Last Accessed: 5 October 2019.
[16] S. M. Mahmud, et al., Machine Learning Based Unified Framework for Diabetes Prediction. Proceedings of the 2018 International Conference on Big Data Engineering and Technology. ACM (2018).
[17] S. Safavian, D. Landgrebe, A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21(3) (1991); 660-674.
[18] A. Chervonenkis, Early history of support vector machines. In Empirical Inference (pp. 13-20). Springer, Berlin, Heidelberg (2013).
[19] K.M. Leung, Naive Bayesian classifier. Polytechnic University Department of Computer Science/Finance and Risk Engineering (2007).