A Comparative Study On Liver Disease Prediction Using Supervised Machine Learning Algorithms

This document summarizes a research paper that compares different machine learning algorithms for predicting liver disease from clinical data. The researchers used a dataset of 583 patient records from the UCI Machine Learning Repository, with attributes such as age, gender, and bilirubin levels, and applied six machine learning techniques: logistic regression, K-nearest neighbors, decision tree, support vector machine, naive Bayes, and random forest. They evaluated the classifiers using accuracy, precision, recall, F1 score, specificity, and ROC analysis. Logistic regression achieved the highest accuracy, 75%. The paper aims to reduce the cost of liver disease diagnosis through machine learning and explores different ways of representing clinical data for prediction.

ResearchGate: https://www.researchgate.net/publication/338924601

A Comparative Study On Liver Disease Prediction Using Supervised Machine Learning Algorithms

Article · November 2019

78 citations · 10,404 reads

5 authors, including:

A. K. M. Sazzadur Rahman, Daffodil International University (6 publications, 176 citations)
F M Javed Mehedi Shamrat, University of Malaya (60 publications, 1,056 citations)
Zarrin Tasnim, Daffodil International University (21 publications, 359 citations)
Joy Roy, Daffodil International University (4 publications, 84 citations)


All content following this page was uploaded by F M Javed Mehedi Shamrat on 08 December 2020.



INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 8, ISSUE 11, NOVEMBER 2019 ISSN 2277-8616

A Comparative Study On Liver Disease Prediction Using Supervised Machine Learning Algorithms
A.K.M Sazzadur Rahman, F. M. Javed Mehedi Shamrat, Zarrin Tasnim, Joy Roy, Syed Akhter Hossain

Abstract: Chronic liver disease is a leading cause of death worldwide and affects a large number of people. The disease is caused by a variety of factors that damage the liver, for example obesity, undiagnosed hepatitis infection, and alcohol misuse. It can lead to abnormal nerve function, coughing up or vomiting blood, kidney failure, liver failure, jaundice, hepatic encephalopathy, and other complications. Diagnosing the disease is costly and complicated. Therefore, the goal of this work is to evaluate the performance of different machine learning algorithms in order to reduce the high cost of chronic liver disease diagnosis through prediction. In this work, we used six algorithms: Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine, Naïve Bayes, and Random Forest. The performance of the classification techniques was evaluated with several measures, namely accuracy, precision, recall, F1 score, and specificity. We obtained accuracies of 75%, 74%, 69%, 64%, 62%, and 53% for LR, RF, DT, SVM, KNN, and NB, respectively. The analysis shows that LR achieved the highest accuracy. Moreover, the present study focuses on the use of clinical data for liver disease prediction and explores different ways of representing such data through our analysis.

Keywords: Machine Learning, Liver Disease, Classification, Supervised learning, Computational Intelligence, Regression, Random Forest, Decision
Tree, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes.
——————————  ——————————

A. K. M. Sazzadur Rahman is currently pursuing a master's degree in Computer Science and Engineering at Daffodil International University, Bangladesh.
F. M. Javed Mehedi Shamrat is currently pursuing a bachelor's degree in Software Engineering at Daffodil International University, Bangladesh.
Zarrin Tasnim is currently pursuing a bachelor's degree in Software Engineering at Daffodil International University, Bangladesh.
Joy Roy is currently studying Software Engineering at Daffodil International University, Bangladesh.
Syed Akhter Hossain is Professor and Head of the Department of Computer Science and Engineering at Daffodil International University, Bangladesh.

1. INTRODUCTION
The liver is the largest internal organ of the body, and it is essential for digesting food and for ridding the body of toxic substances. Viral infections and alcohol use can damage the liver and bring a person to a life-threatening condition. There are many types of liver disease, such as hepatitis, cirrhosis, liver tumors, and liver cancer; among them, chronic liver disease and cirrhosis are among the main causes of death [1]. Liver disease is therefore one of the major health problems in the world: every year, around 2 million people die of liver disease worldwide [2]. According to the Global Burden of Disease (GBD) project, published in BMC Medicine, about one million people died of cirrhosis in 2010, and … million were affected by liver cancer [3]. Machine learning has made a significant impact on liver disease prediction and diagnosis in the biomedical field [4-6]. It promises to improve the detection and prediction of disease, which has attracted interest in the biomedical field, and it can also increase the objectivity of the decision-making process [16]. Machine learning techniques can therefore help to solve such medical problems and to reduce the cost of diagnosis. The main aim of this study is to predict results more efficiently and to reduce the cost of diagnosis in the medical sector. We therefore used different classification techniques to classify whether or not patients have liver disease. Six machine learning techniques were applied, namely LR, KNN, DT, SVM, NB, and RF, and their performance was estimated from various perspectives such as accuracy, precision, recall, and F1 score. Moreover, their performance was compared using the receiver operating characteristic (ROC).
The rest of the paper is organized as follows. Section 2 presents the dataset details, data preprocessing, and methodology. Section 3 describes the classification algorithms. Section 4 presents the results and discussion, including the measurement of the classification techniques, the analysis of the results, and the performance evaluation. Finally, Section 5 concludes the paper.

2 MATERIALS AND METHODOLOGY

2.1. Data Collection
In this experiment, we used a dataset from the UCI Machine Learning Repository; the original data were collected in the northeast of Andhra Pradesh, India [7]. The dataset contains the records of 583 patients, of whom 75.64% are male and 24.36% are female. It has 11 attributes, of which we chose 10 as input parameters for further analysis and 1 as the target class:

I. Age: Age of the patient
II. Gender: Gender of the patient
III. TB: Total Bilirubin
IV. DB: Direct Bilirubin
V. Alkphos: Alkaline Phosphatase
VI. Sgpt: Alanine Aminotransferase
VII. Sgot: Aspartate Aminotransferase
VIII. TP: Total Proteins
IX. ALB: Albumin
X. AG Ratio: Albumin and Globulin Ratio
XI. Selector: field used to split the data into two sets (labeled by the experts)
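The attribute list above maps naturally onto one flat record per patient. The following Python sketch (Python 3.7 is the language used in this study) shows one way such rows might be parsed into numeric features and a binary label. It assumes a header-less comma-separated file with the columns in the order I to XI, Gender stored as "Male"/"Female", and a Selector value of 1 for liver patients and 2 for non-patients; these encodings, the column names, and the helper functions are assumptions for illustration, not details given in the paper. Rows with an empty field are simply skipped here, which is only one possible way to handle missing values.

```python
import csv
import io

# Column order assumed to follow attributes I to XI (no header row in the file).
COLUMNS = ["Age", "Gender", "TB", "DB", "Alkphos", "Sgpt",
           "Sgot", "TP", "ALB", "AG_Ratio", "Selector"]


def parse_row(row):
    """Convert one CSV row into (features, label), or None if it is incomplete.

    Assumed encodings (not stated in the paper): Gender is "Male"/"Female";
    Selector is 1 for a liver patient and 2 for a non-liver patient.
    """
    if len(row) != len(COLUMNS) or any(field.strip() == "" for field in row):
        return None
    record = dict(zip(COLUMNS, (field.strip() for field in row)))
    features = {
        "Age": float(record["Age"]),
        # Encode gender numerically: 1.0 for "Male", 0.0 otherwise (assumed "Female").
        "Gender": 1.0 if record["Gender"] == "Male" else 0.0,
    }
    # The eight laboratory measurements (TB through AG_Ratio) are numeric.
    for name in COLUMNS[2:10]:
        features[name] = float(record[name])
    # Binary target: 1 = liver disease, 0 = no liver disease.
    label = 1 if int(record["Selector"]) == 1 else 0
    return features, label


def load(text):
    """Parse all complete rows of a header-less CSV string."""
    parsed = (parse_row(row) for row in csv.reader(io.StringIO(text)))
    return [item for item in parsed if item is not None]


def percent(part, whole):
    """Share of `part` in `whole` as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)
```

As a check on the proportions reported in Section 2.1, `percent(441, 583)` gives 75.64, and the 24.36% female share corresponds to 583 - 441 = 142 records, since `percent(142, 583)` gives 24.36.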

419
IJSTR©2019
www.ijstr.org

2.2. Data Preprocessing
In this study, we analyzed the records of 583 patients, of which 416 samples are liver patients and 167 samples are non-liver patients. The proportion of liver patients is presented in Fig. 1. Of these records, 441 male and 142 female samples were taken for analysis (Fig. 2).

Fig. 1: Count plot showing the proportion of liver patients.

Fig. 2: Count plot showing the gender ratio of the patients.

The heatmap in Fig. 3 shows that some parameters are correlated, while some columns have low correlation. Therefore, we omitted some of the features for better prediction of liver disease.

Fig. 3: Heat map for checking correlated columns in the liver dataset.

2.3. Tool and Language
In this study, we used Jupyter Notebook as the tool and Python 3.7 as the programming language.

3 DESCRIPTION OF THE CLASSIFICATION ALGORITHMS

3.1. Logistic Regression (LR)
Logistic regression was mostly used in natural-science research and applications in the mid-20th century [8]. It can handle any number of numerical as well as categorical variables, and it models a binary outcome through a probability between 0 and 1. Logistic regression captures the relationship between the independent variables and the outcome by estimating probabilities (p) with the logistic function:

$$p = \frac{1}{1 + e^{-(b_0 + b_1 x)}} \tag{1}$$

where $b_0$ is the intercept and $b_1$ is the coefficient of the predictor $x$.

3.2. Random Forest (RF)
Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks. They work by constructing a large number of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' tendency to overfit their training set. There is a direct relationship between the number of trees in the forest and the results it can achieve. To obtain more effective and precise predictions, random forest adds an additional layer of randomness to bagging [9].

3.3. Decision Tree (DT)
The Decision Tree algorithm belongs to the family of supervised learning algorithms [10], and it can be used for solving both regression and classification problems. The general motive for using a decision tree is to create a training model that can predict the class or value of the target variable by learning decision rules inferred from prior data (training data). Fig. 4 shows a sample decision tree.

Fig. 4: Sample of the process of Decision Trees.

3.4. Support Vector Machine (SVM)
SVM is a supervised learning algorithm. It can be used for both classification and regression problems, but it is mostly used for classification. SVM works well for many healthcare problems and can solve both linear and non-linear problems. SVM classification attempts to find a hyperplane that separates the dataset into two classes [11-12]. The trained model can then predict the target classes (labels) of new cases. For Classification SVM Type 1, training involves the
minimization of the error function:

$$\frac{1}{2} w^{T} w + C \sum_{i=1}^{N} \zeta_i \tag{2}$$

where $w$ is the weight vector, $C$ is the capacity constant, $\zeta_i \ge 0$ are slack variables for training cases that cannot be separated, and $i$ indexes the $N$ training cases. In contrast to Classification SVM Type 1, the Classification SVM Type 2 model minimizes the error function:

$$\frac{1}{2} w^{T} w - \nu \rho + \frac{1}{N} \sum_{i=1}^{N} \zeta_i \tag{3}$$

where $\nu \in (0, 1]$ bounds the fraction of margin errors and $\rho$ is the margin variable.

3.5. K-Nearest Neighbors (KNN)
KNN is one of the most fundamental instance-based classification algorithms in machine learning. It works on the idea that examples that are close to each other belong to the same class [13]. A KNN classifier assigns an example to the class that is most common among its K nearest neighbors, where K is a parameter for tuning the classification algorithm [14].

3.6. Naive Bayes (NB)
Naive Bayes is one of the simplest, most effective, and most commonly used machine learning techniques. It is a probabilistic classifier that assumes the features are conditionally independent given the class, with probabilities estimated from the training data [15]. Consequently, Naive Bayes classifiers are used for common classification problems such as spam detection, and they are also well suited to medical problems. Bayes' theorem gives the probability of an event given that another event has already occurred. It is expressed mathematically as:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{4}$$

4 RESULTS AND DISCUSSION

4.1. Measurement of Classification Techniques
In this work, we used several statistical measures to evaluate the test performance of the classification algorithms, namely accuracy, sensitivity, specificity, precision, and F1 measure. These measures are derived from the confusion matrix, whose entries are defined as follows. True Positive (TP): the prediction correctly identifies that a patient has liver disease. False Positive (FP): the prediction incorrectly identifies that a patient has liver disease. True Negative (TN): the prediction correctly rejects that a patient has liver disease. False Negative (FN): the prediction incorrectly rejects that a patient has liver disease.

Accuracy gives the proportion of all patients, with or without liver disease, that the prediction model classifies correctly. It is determined by the true positives, true negatives, false positives, and false negatives:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

The sensitivity test gives the rate at which patients with liver disease are correctly identified; it mainly reflects the positive instances of the test. It is also known as Recall or the True Positive Rate (TPR).

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{6}$$

Specificity reflects the negative results of the test: it gives the proportion of patients without the disease who are correctly identified as such. It is also called the True Negative Rate (TNR).

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{7}$$

Precision, also called the positive predictive value, gives the proportion of positive predictions made by the classifier that are correct.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$

The F1 measure combines precision and recall as their harmonic mean, so it accounts for both FP and FN.

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$

4.2. Analysis of the Result
In this experiment, we carried out several analyses to examine the six machine learning classifiers on the liver disease dataset. In terms of accuracy, LR achieved the highest accuracy of 75%, and NB performed worst with 53%. With respect to precision, LR achieved the highest score of 91%, and NB performed worst with 36%. Considering sensitivity, SVM achieved the highest value of 88%, and KNN obtained the lowest value of 76%. Logistic Regression was also the best performer in terms of F1 measure (83%), and NB performed worst (53%). Considering specificity, DT achieved the highest value of 48% and LR the lowest, 47%. Comparing these measurement criteria, the LR classification technique is more effective than the other classifiers for predicting chronic liver disease. The confusion matrices of the prediction results are shown in Fig. 5. The performance comparison of the six supervised machine learning techniques is presented in Fig. 6.

Fig. 5: The confusion matrix of prediction results.

Fig. 6: The performance comparison of six supervised machine learning techniques.
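Equations (5) to (9) depend only on the four confusion-matrix counts, so they can be computed directly from true and predicted labels. The Python sketch below assumes liver disease is encoded as label 1 and its absence as 0; the function names are illustrative, and the paper does not say which library it used to obtain its scores. Ratios with a zero denominator are returned as 0.0 here, which is one common convention rather than the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem (positive = liver disease)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # disease predicted and present
            else:
                fp += 1  # disease predicted but absent
        else:
            if actual == positive:
                fn += 1  # disease missed
            else:
                tn += 1  # absence correctly predicted
    return tp, fp, tn, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_scores(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision and F1 from labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)  # Eq. (8)
    recall = safe_div(tp, tp + fn)     # sensitivity / TPR, Eq. (6)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),  # Eq. (5)
        "sensitivity": recall,
        "specificity": safe_div(tn, tn + fp),  # TNR, Eq. (7)
        "precision": precision,
        "f1": safe_div(2 * precision * recall, precision + recall),  # Eq. (9)
    }
```

For example, with the made-up labels y_true = [1, 1, 1, 1, 0, 0, 0, 1] and y_pred = [1, 1, 0, 1, 0, 1, 0, 1], the counts are TP = 4, FP = 1, TN = 2, FN = 1, giving accuracy 0.75, sensitivity 0.8, specificity 2/3 (about 0.667), precision 0.8, and F1 0.8.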
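An ROC curve, such as those in Fig. 7, is traced by sweeping a decision threshold over a classifier's scores and recording the false positive rate (FPR) and true positive rate (TPR) at each step; the AUC is the area under that curve. The Python sketch below assumes each classifier produces a real-valued score per patient (for example, a predicted probability of liver disease) and that label 1 means liver disease. It is an illustrative implementation with hypothetical function names, not the code behind the reported results.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold over the scores.

    Cases with tied scores are admitted together, so each distinct score adds
    exactly one point. Assumes label 1 = liver disease, 0 = no disease.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative case")
    # Highest scores first: lowering the threshold admits cases in this order.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every case whose score equals the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear (FPR, TPR) curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

For the made-up labels [1, 1, 0, 1, 0] with scores [0.9, 0.8, 0.7, 0.6, 0.2], the AUC is 5/6 (about 0.833). This equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, which is a useful sanity check on any AUC implementation. Grouping tied scores makes ties contribute a diagonal segment instead of an order-dependent staircase.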

Figure 7 shows the Receiver Operating Characteristic (ROC) curves. ROC curves represent the performance of the machine learning techniques in terms of the true positive rate (TPR) and false positive rate (FPR) of their classification results. SVM achieved the highest AUC (area under the ROC curve).

Fig. 7: Receiver Operating Characteristics (ROC).

5 CONCLUSION
The principal aim of this work is to build an effective diagnosis system for chronic liver disease patients using six different supervised machine learning classifiers. We investigated the performance of all classifiers on the patients' data parameters: the LR classifier gives the highest classification accuracy (75%) and the highest F1 measure (83%) for predicting liver disease, while NB gives the lowest accuracy (53%). In the future, the best-performing classification technique could be used in a decision support system for the diagnosis of chronic disease. Such an application would be able to predict liver disease early and advise on the patient's health condition. It could be especially beneficial in low-income countries that lack medical infrastructure and specialists. There are several directions for future work in this field. We explored only some popular supervised machine learning algorithms; more algorithms could be chosen to build a more accurate model of liver disease prediction, and performance could be further improved. This work may also play an important role in healthcare research and help medical centers anticipate liver disease.

ACKNOWLEDGMENT
The authors are grateful to all the researchers involved in this study.

REFERENCES
[1] S. K. Asrani, J. J. Larson, B. Yawn, T. M. Therneau, W. R. Kim, Underestimation of liver-related mortality in the United States. Gastroenterology 145 (2013); 375–382, e371–372.
[2] A. A. Mokdad, A. D. Lopez, S. Shahraz, R. Lozano, A. H. Mokdad, J. Stanaway, et al., Liver cirrhosis mortality in 187 countries between 1980 and 2010: a systematic analysis. BMC Medicine 12 (2014); 145.
[3] P. Byass, The global burden of liver disease: a challenge for methods and for public health. BMC Medicine 12(1) (2014); 159.
[4] L. A. Auxilia, Accuracy prediction using machine learning techniques for Indian patient liver disease. 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI). IEEE (2018).
[5] Hashem, M. Esraa, S. Mai, A study of support vector machine algorithm for liver disease diagnosis. American Journal of Intelligent Systems 4(1) (2014); 9–14.
[6] P. Sajda, Machine learning for detection and diagnosis of disease. Annual Review of Biomedical Engineering 8 (2006); 537–565.
[7] UCI Machine Learning Repository, ILPD (Indian Liver Patient Dataset) Data Set. https://archive.ics.uci.edu/ml/datasets/ILPD+(Indian+Liver+Patient+Dataset)
[8] Logistic Regression. Retrieved from: https://www.saedsayad.com/logistic_regression.htm, last accessed: 5 October 2019.
[9] L. Breiman, Random forests. Machine Learning 45(1) (2001); 5–32. https://doi.org/10.1023/A:1010933404324
[10] Decision Trees. Retrieved from: https://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/, last accessed: 5 October 2019.
[11] Support vector machine. Retrieved from: http://www.statsoft.com/textbook/support-vector-machines, last accessed: 5 October 2019.
[12] V. Vapnik, I. Guyon, T. Hastie, Support vector machines. Mach. Learn. (1995). statweb.stanford.edu.
[13] M. Zhang, Z. Zhou, ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition 40(7) (2007); 2038–2048.
[14] G. Guo, H. Wang, D. Bell, Y. Bi, K. Greer, KNN model-based approach in classification (pp. 986–996). Springer, Berlin, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39964-3_62
[15] Naive Bayes. Retrieved from: https://www.geeksforgeeks.org/naive-bayes-classifiers/, last accessed: 5 October 2019.
[16] S. M. Mahmud, et al., Machine learning based unified framework for diabetes prediction. Proceedings of the 2018 International Conference on Big Data Engineering and Technology. ACM (2018).
[17] S. Safavian, D. Landgrebe, A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21(3) (1991); 660–674.
[18] A. Chervonenkis, Early history of support vector machines. In Empirical Inference (pp. 13–20). Springer, Berlin, Heidelberg (2013).
[19] K. M. Leung, Naive Bayesian classifier. Polytechnic University, Department of Computer Science/Finance and Risk Engineering (2007).
