Applying Machine Learning in Liver Disease & Transplantation: A Comprehensive Review
Ashley Spann1, Angeline Yasodhara2, Justin Kang3, Kymberly Watt4, Bo Wang2, Anna Goldenberg2,
Mamatha Bhat3,5
Abstract
Machine learning utilizes artificial intelligence to generate predictive models efficiently and
more effectively than conventional methods through detection of hidden patterns within large data
sets. With this in mind, there are several areas within hepatology where these methods can be applied.
In this review, we examine the literature pertaining to machine learning in hepatology and liver
transplant medicine. We provide an overview of the strengths and limitations of machine learning
tools, and their potential applications to both clinical and molecular data in hepatology. Machine
learning has been applied to various types of data in liver disease research, including clinical,
demographic, molecular, radiologic and pathologic data. We anticipate that the use of ML tools to
generate predictive algorithms will change the face of clinical practice in Hepatology and
transplantation. This review will provide readers with the opportunity to learn about the ML tools
available, and potential applications to questions of interest in Hepatology.
This article has been accepted for publication and undergone full peer review but has not been
through the copyediting, typesetting, pagination and proofreading process, which may lead to
differences between this version and the Version of Record. Please cite this article as doi:
10.1002/hep.31103
This article is protected by copyright. All rights reserved
Introduction
In the changing atmosphere of healthcare and information technology, there is an increasing
opportunity for the use of data science and technology to personalize healthcare and improve delivery
of patient care. At its core, machine learning utilizes artificial intelligence to generate predictive
models efficiently and more effectively than conventional methods through detection of hidden
patterns within large data sets. With this in mind, there are several areas within hepatology where
these methods can be applied. In this review, we examine the literature on applications of machine
learning that have already been tested in hepatology and liver transplant medicine. We provide an
overview of the strengths and limitations of machine learning tools, and their potential applications to
both clinical and molecular data in hepatology.
Artificial intelligence (AI) and machine learning (ML) algorithms have been increasingly
applied to questions in hepatology in recent years. Electronic health records are a rich source of data,
as are registries and clinically-annotated biobanks. Efforts such as The Cancer Genome Atlas continue
to produce layers of molecular data. A large proportion of the research literature in hepatology
stems from the use of traditional biostatistical methods. These hypothesis-driven studies consist of
examination of preselected variables and their impact on liver-related outcomes such as cirrhosis,
liver cancer, transplantation and mortality. These studies have included prediction models that have
revolutionized clinical practice in hepatology1. Machine learning stands in contrast to this as an
approach that does not depend on preselected variables, using any number of variables to permit data-driven discovery. This
hypothesis-free approach has led to identification of similarities and differences in clinical
Liver diseases are complex and heterogeneous in nature, developing under the influence of
various factors that affect susceptibility to disease. These include sex, ethnicity, genetics,
environmental exposures (viruses, alcohol, diet, chemical), body mass index, and comorbid conditions
such as diabetes. Various types of complex data are generated in hepatology practice and research that
could benefit from AI-based approaches: electronic health record data, transient elastography, other
imaging technologies, histology, biobank data, data from clinical trials, clinical sensors, wearables,
and a variety of molecular data (genomics, transcriptomics, proteomics, metabolomics, immunomics,
microbiomics).
Deep neural networks (DNN) have been a tremendous breakthrough in ML, enabling
machines to learn patterns of data by modeling them through a combination of simple non-linear
elementary operations. Neural networks have been applied to predict 3-month graft survival and to
assist with donor-recipient matching for patients with end-stage liver disease as well as predicting the
presence of liver disease from imaging20,21. The basic architecture can be extended into convolutional neural
networks (CNN) and recurrent neural networks (RNN), which handle local structure and sequential
data, respectively22,23. Local structure can be important in data, e.g. in images, and it is important for the model to
incorporate this existing structure. A CNN uses multiple convolution filters, learned by the network, at
different layers to aggregate information from neighboring pixels. An RNN captures temporal
dependencies across different timepoints by modifying the architecture to receive input from its past
The application of machine learning extends beyond the setting of supervised learning.
Unsupervised learning algorithms have been widely used to discover patterns automatically,
without any labeled data. Classic unsupervised learning methods range from clustering algorithms
such as k-means and graph-based spectral clustering, to dimensionality reduction methods such as
principal component analysis or kernel-based methods27-29. Deep neural networks generalize some of
these approaches by learning the dataset distribution, whether explicitly or implicitly, and generating
samples from the learned distribution. For example, a variational autoencoder (VAE) parameterizes
the distribution of the dataset and trains the neural network to learn the distribution that fits the
training dataset best by maximizing a lower bound on its likelihood30. A generative adversarial network (GAN) uses two
separate networks, one to generate fake samples (generator) and another to discriminate whether the
given input is fake or real (discriminator)31. These networks learn adversarially: the goal of one is to
generate samples that are closer to the true distribution, while the other wants to better differentiate
the generated and the true training samples. This method of training results in a model able to
generate samples that are very similar to the training distribution. This method can also be further
extended to impute missing data32,33.
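As an illustration of the clustering methods mentioned above, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not taken from any of the reviewed studies. The function names, the deterministic farthest-point initialization (used here instead of the more common random or k-means++ initialization) and the toy two-feature data are all assumptions made for the example.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the cluster labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical, unlabeled two-feature measurements forming two groups.
points = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
centroids, labels = kmeans(points, k=2)
```

On this toy data the two groups are recovered without any labels being supplied. With real clinical data, features would normally be standardized first, and the choice of k and of the initialization would need to be justified.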
Methods
A comprehensive literature review was conducted by two independent reviewers (ALS and
JK). Two biomedical databases – MEDLINE (PubMed) and Embase (Elsevier) – were searched for
relevant studies through 01/15/2019. The primary search strategy was created in PubMed and
included a combination of text word and Medical Subject Heading (MeSH) terms. Primary search
concepts included machine learning, predictive modeling, deep learning, and liver transplantation as
well as specific etiologies for liver disease, such as Hepatitis C and non-alcoholic fatty liver disease
(NAFLD). The itemized search strategy was then translated to the additional database Embase. The
Traditional serum-based biomarker indices such as APRI, Fibrosis-4 (FIB-4), AST to ALT
ratio, and NAFLD fibrosis score have been used in clinical practice to identify significant fibrosis and
cirrhosis in patients already diagnosed with different etiologies of chronic liver disease, with modest
NAFLD
Perhaps the best-studied condition in which machine learning has been explored for pattern
recognition is non-alcoholic fatty liver disease (NAFLD). Early detection and identification of
patients with NAFLD at risk of disease progression is paramount. Towards this end, efforts have been
made to utilize machine learning methods to not only more accurately and efficiently identify these
patients through image analyses and pathology review, but also to help differentiate severity of the
underlying steatosis.
Viral Hepatitis
In the absence of genetic data, machine learning has also played a role in clinical data-based
predictive assessments. Wei et al compared machine learning methods with the FIB-4 score for
detection of viral hepatitis-related cirrhosis34. Gradient boosting outperformed the other machine learning algorithms (MLAs), with an
AUROC of 0.87 as compared to 0.83 for the FIB-4 score34. In a cohort of chronic hepatitis C
patients from the HALT-C trial, Konerman et al utilized longitudinal clinical data to improve upon
standard statistical methods (AUC 0.79) with random forest (AUC 0.86) and boosting MLAs (AUC
0.84) for prediction of fibrosis progression, liver-related death, hepatic decompensation, an increase in
Child-Turcotte-Pugh score ≥ 7, hepatocellular carcinoma or liver transplantation within 1 year49. The
success of these algorithms was later validated in a more heterogeneous cohort of 1,007 chronic
hepatitis C patients for the prediction of both 1-year (AUROC 0.78) and 3-year (AUROC 0.76)
outcomes50. However, the algorithm could not be validated for prediction of fibrosis
progression due to small sample size. In a follow-up study utilizing a cohort of 72,683 veterans with
Hepatocellular Carcinoma
In a study of 59 tissue samples obtained from explanted livers of liver transplant recipients,
Kim et al. generated cDNA microarrays with over 9,000 genes per sample16. Through utilization of k-
nearest neighbors and support vector machine methods, they were able to identify a molecular
signature of 30 genes significantly altered in cirrhotic patients at high risk for HCC16. However,
molecular data are not readily available for the evaluation of patient risk; other methods therefore need
to be employed to identify those at greatest risk for development of HCC. In a prospective
evaluation of 442 patients with Child A or B cirrhosis, Singal et al. sought to develop and compare
predictive models utilizing regularly collected clinical data and conventional predictive methods in
comparison to ML algorithms54. A random forest approach for decision tree-based analysis
performed significantly better (C-statistic 0.71 [CI 0.63-0.79]) than conventional regression modeling
Further to this, a more recent study used an ML tool known as Optimal Classification Trees
(OCT) to predict a given individual’s 3-month mortality on the waiting list or risk of delisting58. This
ML has also been used to predict outcomes such as acute kidney injury (AKI) and diabetes
after transplant. ML tools were employed on preoperative and intraoperative anesthesia and surgery-
related variables to predict post-operative AKI, which has been associated with increased mortality59.
A gradient boosting machine performed best among all ML methods in predicting AKI of all stages (AUC
0.90, 95% CI 0.86–0.93), as compared to AUC for standard logistic regression analysis of 0.61 (95%
CI 0.56–0.66). Post-transplant diabetes mellitus (PTDM) is a major complication associated with a 2-
fold higher risk of cardiovascular events, graft loss and infections in the long term60. Different MLAs
were used to identify key predictors of PTDM in the U.S. Scientific Registry for Transplant
Recipients (SRTR)61. Increasing age, male sex, and obesity were recipient factors correlating with
increased risk of PTDM. Sirolimus as primary immunosuppressant carried a 33% higher risk of
PTDM than tacrolimus61.
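The comparisons above are reported as areas under the receiver operating characteristic curve (AUC, AUROC). As a hedged illustration of what this statistic measures, the sketch below computes it from its rank interpretation: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The function name and the toy labels and scores are assumptions made for the example and do not come from the cited studies.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and
    negatives (equivalent to the Mann-Whitney U statistic, normalized)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical outcome labels (1 = event occurred) and predicted risks
# from two hypothetical models evaluated on the same four patients.
outcome = [0, 0, 1, 1]
risk_model_a = [0.10, 0.40, 0.35, 0.80]
risk_model_b = [0.10, 0.20, 0.70, 0.90]

auc_a = auroc(outcome, risk_model_a)  # one of four pairs is mis-ranked
auc_b = auroc(outcome, risk_model_b)  # every positive outranks every negative
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch. A sorting-based implementation or an established statistics library would be preferable for large cohorts, and confidence intervals such as those quoted above require an additional procedure (for example bootstrapping) that is not shown here.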
Even though ML is often perceived as a black-box approach, considerable work has been done to make
ML models interpretable, and tools are available to build models that are both interpretable and
powerful62-63. It is important to look carefully at what the model is learning. For example, some
questions one might ask are: (1) What factors does the model find to be important? Are they
confounders? (2) Which subpopulation of patients is the model consistently misclassifying? (3) Is
there a distribution shift between the training data and the population where the model would be applied?
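Questions (2) and (3) can be examined with very simple checks. The sketch below is one possible way to do so, under stated assumptions: it reports the misclassification rate within each patient subgroup, and it uses the standardized mean difference of a single feature as a crude indicator of shift between training and deployment data. The function names, the subgroup labels and the toy values are hypothetical, and the standardized mean difference is only one of many possible shift measures.

```python
from collections import defaultdict
from statistics import mean, pstdev


def error_rate_by_group(y_true, y_pred, groups):
    """Fraction of misclassified cases within each subgroup."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        errors[group] += int(truth != pred)
    return {group: errors[group] / totals[group] for group in totals}


def standardized_mean_difference(train_values, deploy_values):
    """(mean_deploy - mean_train) divided by the pooled standard deviation.
    Population standard deviations are used here for simplicity."""
    pooled_sd = ((pstdev(train_values) ** 2 + pstdev(deploy_values) ** 2) / 2) ** 0.5
    if pooled_sd == 0:
        raise ValueError("both samples are constant; the SMD is undefined")
    return (mean(deploy_values) - mean(train_values)) / pooled_sd


# Hypothetical labels, predictions and subgroup membership for four patients.
rates = error_rate_by_group(
    y_true=[1, 0, 1, 0],
    y_pred=[1, 1, 1, 0],
    groups=["F", "F", "M", "M"],
)

# Hypothetical ages in the training cohort and in the deployment population.
smd_age = standardized_mean_difference([50, 60], [60, 70])
```

A subgroup with a markedly higher error rate, or a feature with a large standardized difference between the two populations, would prompt closer inspection before the model is applied. Neither check addresses question (1), which requires a feature-attribution method and clinical judgment about confounding.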
Conclusion
Machine learning approaches may improve upon biostatistical methods to address
questions across medicine. Though standard biostatistics can be sufficient for many questions in
hepatology, ML can add value, particularly when studying questions of clinical prediction.
ML approaches have been increasingly used in recent years in hepatology to examine the wealth of
clinical, molecular, radiologic and pathologic data available in liver disease. One can anticipate that
using these tools to decipher the complexity of liver disease could enhance the identification of
better biomarkers and therapeutic strategies, and ultimately support a precision medicine approach to
the practice of hepatology.
Acknowledgements: The authors acknowledge the assistance of Marc Angeli and Elisa Pasini in
formatting and the submission process of this manuscript.
Figure 1: Schematic diagram of machine learning tools and applications to liver disease
Supplemental Figure 1: Growth of publications in PubMed with “hepatology” and “machine learning”.
Data compiled using Medline (PubMed) trend (http://dan.corlan.net/medline-trend.html).
Study | Population / dataset | Main findings
Kuppili et al, 2017 | 63 patients (27 normal, 36 abnormal) in US liver database | Superior performance of ELM compared to SVM for all protocols, and all types of US data sets in terms of sensitivity, specificity, accuracy
Yip et al, 2017 | 922 subjects from a population screening study were randomly divided into training and validation groups | Laboratory parameter-based ML model (NAFLD ridge score) is a robust reference for detecting NAFLD in the general population
Byra et al, 2018 | 55 severely obese patients admitted for bariatric surgery (mean age 40.1 ± 9.1, mean BMI 45.9 ± 5.6, 20% of males) | AUROC with proposed approach (Inception-ResNet-v2 deep convolutional neural network) was higher than both the hepatorenal index method and the gray-level co-occurrence matrix algorithm
Islam et al, 2018 | Dataset developed with 10 attributes that included 994 liver patients (553 female, 461 male) | Logistic Regression (LR) proved to be the best technique among other techniques (RF, SVM, and ANN) when taking into account accuracy, sensitivity, specificity, positive predictive value, and negative predictive value in prediction of FLD
Ma et al, 2018 | 10,508 patients of which 2522 patients met the diagnostic criteria of NAFLD | Bayesian network model improves F-measure scores and is the best performing in accuracy, specificity, and sensitivity among 11 total MLAs. Novel ML techniques can have screening and predictive value for NAFLD
Perveen et al, 2018 | Multiclass labeled dataset of 7 risk factors for 40,637 individuals over a period of 10 years | Proposed Decision Tree based method could help with management of NAFLD patients evaluating risk and progression
HEPATOCELLULAR CARCINOMA
Kim et al, 2004 | Tissue samples from 59 patients with end stage CLD who received liver transplantation | Identification of gene signatures that can be used as markers for diagnosing early onset HCC in high-risk populations using BRB ArrayTools – an Excel-based platform
 | Tumor and matched nontumor tissue samples from 74 patients | 
[Flow diagram, hep_31103_f1.eps]
Records screened (n = 182)
  Records excluded (n = 122):
  - No relation to liver disease, or minor/no application of machine learning techniques by title/abstract (n = 121)
Full text articles assessed for eligibility (n = 60)
  Excluded (n = 17):
  - Not specific to liver patient populations (n = 17)
Studies included (n = 43)
  Excluded after manuscript review and application of inclusion criteria (n = 3):
  - Abstract only (n = 1)
  - No ML technique used (n = 1)
  - Mice models (n = 1)
Studies included (n = 40)
[Figure, hep_31103_f2.eps. Panels: (A) Naïve Bayes Classifier & the risk of Cirrhosis in Viral Hepatitis (Posterior = Prior × Likelihood / Evidence; classes: Cirrhosis, No Cirrhosis). (B) Gradient Boosting Machines To Predict Primary Sclerosing Cholangitis Survival (Low Survival, High Survival). (C) Generative Adversarial Model (Noise, Generator, Synthetic Data, Real Data, Discriminator: Real or Synthetic Data?). (D) Support Vector Machine to Classify Shear Wave Elastography (SWE) Images (Chronic Liver Disease, Healthy). (E) Neural Networks & Random Forest Predict PTDM Risk in Transplant Population (Input Layer, Hidden Layer, Output Layer; Risk of PTDM).]