HHS Public Access

Author manuscript
Clin Biochem. Author manuscript; available in PMC 2023 May 01.

Published in final edited form as:


Clin Biochem. 2022 May ; 103: 1–7. doi:10.1016/j.clinbiochem.2022.02.011.

Applications of Machine Learning in Routine Laboratory Medicine: Current State and Future Directions
Naveed Rabbani1,2, Grace Y. E. Kim3, Carlos J. Suarez4, Jonathan H. Chen5,6
1Department of Clinical Informatics, Lucile Packard Children’s Hospital, Palo Alto, CA
2Department of Pediatrics, Stanford University School of Medicine, Stanford, CA
3Department of Computer Science, Stanford University, Stanford, CA
4Department of Pathology, Stanford University School of Medicine, Stanford, CA
5Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA
6Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA

Abstract
Machine learning is able to leverage large amounts of data to infer complex patterns that are
otherwise beyond the capabilities of rule-based systems and human experts. Its application to
laboratory medicine is particularly exciting, as laboratory testing provides much of the foundation
for clinical decision making. In this article, we provide a brief introduction to machine learning
for the medical professional in addition to a comprehensive literature review outlining the current
state of machine learning as it has been applied to routine laboratory medicine. Although still in
its early stages, machine learning has been used to automate laboratory tasks, optimize utilization,
and provide personalized reference ranges and test interpretation. The published literature leads
us to believe that machine learning will be an area of increasing importance for the laboratory
practitioner. We envision the laboratory of the future will utilize these methods to make significant
improvements in efficiency and diagnostic precision.

Keywords
Artificial Intelligence; Clinical Pathology; Biochemistry; Precision Medicine; Clinical Decision
Support

Corresponding author: Naveed Rabbani; [email protected].


Conflicts of Interest: Jonathan H Chen is the co-founder of Reaction Explorer LLC, which develops and licenses organic chemistry
software. He has received consulting fees from Sutton Pierce and Younker Hyde MacFarlane PLLC. Naveed Rabbani has received
consulting fees from Atropos LLC.
Ethics Statement: This study does not involve the use of human subjects.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our
customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review
of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered
which could affect the content, and all legal disclaimers that apply to the journal pertain.
Rabbani et al. Page 2

1. Introduction
The application of machine learning in medicine has garnered enormous attention over the
past decade [1–3]. Novel computational methods provide a way to learn from past examples
in order to infer complex patterns beyond the capabilities of rule-based algorithms. Along
with this attention comes expectations and promises that advances in computation will
transform the way that medicine is practiced. In fact, there are already several examples
of machine learning methods that have been approved for use by the US Food and
Drug Administration (FDA), most recognizably in the field of radiology, cardiology, and
pathology [4,5].

The use of machine learning in laboratory medicine has also gained traction and is an
increasingly important area of which practitioners should stay abreast [6–8]. The numerical
and structured format of data in laboratory medicine lends itself well to computational
methods such as machine learning. Such advances harbor promise for the future of
medicine, where laboratory testing provides much of the basis for clinical decision making.

In this review we provide a practical introduction to machine learning for the laboratory
medicine specialist and a survey of ongoing work using machine learning in routine
laboratory testing and laboratory information systems. While there has been extensive work
in the use of machine learning in the greater field of clinical pathology, this review will
focus on its application in routine laboratory testing including clinical chemistries and
common laboratory tests such as blood counts and urinalysis [9,10]. Similarly excluded
are machine learning algorithms that rely on laboratory data to make clinical predictions
[11–14]. Although this is another growing interest in the medical application of machine
learning, we believe such algorithms pertain more to the clinical specialty related to the
model’s use case rather than the practice of laboratory medicine. The use of machine
learning in these related fields is briefly covered in section 4.3 of the text, to serve as a
reference for readers who may be interested in further exploring these areas.

2. A Brief Primer on Machine Learning


2.1. Overview
In contrast to traditional programs that are defined by precoded rules, machine learning
refers to computer algorithms that learn from prior examples. The objective for most
supervised machine learning models is to take input data and output a predicted result.
The algorithm that performs this prediction is trained on large datasets of prior observations.
These observations (often referred to as samples) usually consist of features (or predictors),
which are the input variables, and a label, which is the dependent variable (the outcome of
interest that we wish to predict in the future).

In order to train a model, large amounts of structured data are required. Processing this
data involves cleaning and organizing data tables, imputing missing values, and reshaping or
combining observations so that they can be summarized and fed into a model. Furthermore,
these prior observations must be labeled so that the computer can learn from them (Figure
1). The majority of the work of developing a machine learning model is typically spent in
the data preparation phase. Important decisions must be made about which data to include
as model inputs and how they should be processed. Furthermore, the creation of accurate
labels also requires careful consideration and time. In many cases, labels are created through
manual enumeration by an expert; for example, a physician who reviews prior cases and
assigns diagnoses.

To put this all in context, consider a prior study that attempted to predict serum ferritin
levels based on other iron panel components [15]. In this example, the ferritin value was the
label. The other laboratory components were predictors. In cases where necessary laboratory
results were missing, the value was imputed using a statistical formula.
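
The roles of features, label, and imputation can be sketched in a few lines of Python. This is a minimal illustration only: the column names and values are hypothetical, and simple median imputation stands in for the statistical formula used in the cited study, which is not reproduced here.

```python
from statistics import median

# Hypothetical iron-panel rows; None marks a missing result.
# Column names and values are illustrative, not taken from the cited study.
rows = [
    {"iron": 60.0, "tibc": 350.0, "transferrin_sat": 17.0, "ferritin": 25.0},
    {"iron": None, "tibc": 410.0, "transferrin_sat": 9.0, "ferritin": 8.0},
    {"iron": 110.0, "tibc": None, "transferrin_sat": 35.0, "ferritin": 140.0},
    {"iron": 85.0, "tibc": 300.0, "transferrin_sat": 28.0, "ferritin": 90.0},
]

FEATURES = ["iron", "tibc", "transferrin_sat"]  # predictors (model inputs)
LABEL = "ferritin"                              # outcome we wish to predict


def impute_median(rows, columns):
    """Replace missing values in each column with that column's median."""
    medians = {
        col: median(r[col] for r in rows if r[col] is not None)
        for col in columns
    }
    return [
        {**r, **{c: (r[c] if r[c] is not None else medians[c]) for c in columns}}
        for r in rows
    ]


clean = impute_median(rows, FEATURES)
X = [[r[c] for c in FEATURES] for r in clean]  # feature matrix, one row per sample
y = [r[LABEL] for r in clean]                  # label vector
```

In practice the imputation statistics would be derived from the training data only, so that no information from the testing data leaks into the model.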

The retrospective data that is used to develop the machine learning model is broken up into
a training set of data and a testing set of data. The algorithm learns from the training set,
and then its performance is evaluated based on how well it runs on the testing data. This
is similar to how a student might study for a test based on published practice questions,
but a set of new questions is reserved for the actual evaluation—critical in preventing the
computer from simply memorizing the “practice questions” in the training data observations.
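
As a minimal sketch of this split, assuming a hypothetical helper `train_test_split` and a 25% hold-out fraction chosen only for illustration:

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(20))  # stand-ins for 20 labeled observations
train, test = train_test_split(samples)
```

The model is fitted on `train` only; `test` is touched once, at evaluation time.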

2.2. Subcategories within Machine Learning


Machine learning can largely be broken down into two subcategories: supervised machine
learning and unsupervised machine learning. Supervised machine learning, which is the most
common application of machine learning in medicine and what is described above, is when
a computer infers patterns from prior labeled data—data where the target label is known.
These labels provide feedback to the computer program as to what the correct answer is so
that the model can improve its predictions.

There is a broad suite of supervised machine learning models suitable for a variety of tasks
in medicine, including linear and logistic regression, support vector machines, and tree-
based models such as random forest and XGBoost. Tree-based algorithms were commonly
encountered in this literature review and in general have achieved good performance in
medical applications [16]. Such models use a decision tree that consists of a complex series
of decision points. The decision points are inferred during the model development based on
training data used to develop the model (Figure 2).
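
To make the idea of an inferred decision point concrete, the sketch below learns a single split (a "decision stump") from hypothetical training data. It is a simplification under stated assumptions: real tree-based models such as random forest and XGBoost chain many such splits and typically choose them with impurity or gain criteria, whereas this example uses plain accuracy, and the function name `fit_stump` is our own.

```python
def fit_stump(values, labels):
    """Find the threshold on one feature that best separates two classes.

    Predicts 1 when value > threshold, else 0. Returns (threshold, accuracy).
    """
    candidates = sorted(set(values))
    best = (None, -1.0)
    # Try the midpoint between each pair of adjacent observed values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        correct = sum((v > threshold) == bool(y) for v, y in zip(values, labels))
        accuracy = correct / len(values)
        if accuracy > best[1]:
            best = (threshold, accuracy)
    return best


# Hypothetical training data: one analyte value per sample, 1 = abnormal.
values = [12, 15, 18, 22, 35, 40, 44, 51]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, accuracy = fit_stump(values, labels)
```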

In contrast, unsupervised machine learning is when models are provided with an unlabeled
dataset. The model is left to describe relationships in the data according to patterns or
trends that it observes. Unsupervised machine learning can be used to discover previously
unknown patterns [17,18]. Examples of unsupervised machine learning methods are k-means
clustering, hierarchical clustering, and principal component analysis.
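
As an illustration of learning without labels, the following sketch runs k-means clustering on one-dimensional data. The values and starting centers are hypothetical, and the helper `kmeans_1d` is a bare-bones version of the standard iterative procedure rather than a production implementation.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep a center where it is if no points were assigned to it.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled measurements that happen to form two groups.
values = [4.8, 5.1, 5.0, 5.3, 9.7, 10.2, 10.0, 9.9]
centers, clusters = kmeans_1d(values, centers=[min(values), max(values)])
```

No label is supplied at any point; the two groups emerge from the spacing of the values alone.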

Finally, it is important to place machine learning in context alongside artificial intelligence
and deep learning. Artificial intelligence (AI) refers to the broader field of using computers
to perform human-like tasks such as problem-solving. Machine learning is a subset of AI.
Meanwhile, deep learning is a subset of machine learning inspired by neuronal networks
of the brain. In deep learning algorithms, each layer of the neural network builds upon
the last to extract increasingly complex insights from the input data (Figure 2). The
emergent property of these methods allows for the performance of complex tasks, such as
image recognition and language interpretation, explaining its rising popularity in healthcare
[19,20].

2.3. Evaluation Metrics


In scenarios where a model is predicting a binary output (typically referred to as binary
classification), evaluation metrics are similar to what is used to evaluate diagnostic tests in
medicine: sensitivity, specificity, positive predictive value, and so on. The area under the
receiver operating characteristic curve (AUROC), another commonly used evaluation metric, illustrates how
a model balances its false positive rate (1 - specificity) with its true positive rate (sensitivity).
The closer the AUROC is to 1, the higher the performance of the model. Table 1 outlines
some of the most common metrics used by studies in this review.
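
These metrics can be computed directly from labels and model outputs, as in the sketch below. The data are hypothetical, the 0.5 decision threshold is an arbitrary choice for illustration, and the AUROC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV from binary labels and predictions.

    Assumes both classes occur among the labels and among the predictions.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auroc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = disease present) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.3, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
area = auroc(y_true, scores)
```

Unlike sensitivity and specificity, the AUROC does not depend on the chosen threshold, which is why it is often reported as a single summary of model discrimination.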

3. Current State of Machine Learning in Routine Laboratory Medicine


3.1. Search Methodology
In order to identify relevant articles, a comprehensive PubMed query was designed. The
query, shown below, utilizes text word and Medical Subject Headings (MeSH) matching
to capture articles at the intersection of machine learning and general clinical laboratory
techniques and clinical laboratory information systems. The addition of MeSH inclusion
criteria captures articles that may not explicitly mention “machine learning” or “artificial
intelligence” in the title or abstract but are indexed into PubMed as pertaining to these
topics. The exclusion criteria of the query exclude MeSH children of “Clinical Laboratory
Techniques” that do not pertain to common laboratory tests such as serum chemistries,
urinalysis, and routine hematologic tests.

( (“Artificial intelligence”[MeSH Major Topic] OR “Artificial intelligence”[Title/Abstract] OR “machine learning”[MeSH Major Topic] OR “machine learning”[Title/Abstract] OR “deep learning”[Title/Abstract])

AND

(“clinical lab*”[tw] OR “clinical chemistr*”[tw] OR “Laboratory medicine”[tw] OR “Clinical Laboratory Techniques”[majr] OR “Clinical Laboratory Information Systems”[majr]) NOT (“covid-19 testing”[mesh] OR “genetic testing”[mesh] OR “histological techniques”[mesh] OR “immunologic tests”[mesh] or “metabolic clearance rate”[mesh] or “microbiological techniques”[mesh] or “molecular diagnostic techniques”[mesh] or “neonatal screening”[mesh] or “occult blood”[mesh] or “parasite load”[mesh] or “pregnancy tests”[mesh] or “radioligand assay”[mesh] or “semen analysis”[mesh] or “sex determination analysis”[mesh] or “specimen handling”[mesh]) )

The query was executed with a date range filter from 1 October 2011 to 30 September 2021.
Only English-language articles were included. This search returned 583 articles within the
10-year search period. As evidenced by the number of articles returned by our literature
search query over the past decade, this topic has received increasing attention over recent
years (Figure 3). Through manual title and abstract review, 544 of the original 583 articles
were excluded. Of those excluded articles, 108 did not primarily pertain to the field of
laboratory medicine or clinical pathology. Other major excluded themes included laboratory
imaging such as microscopy and cytology (166 articles), clinical prediction algorithms (90
articles), molecular medicine (47 articles), and microbiology (21 articles). The remaining 39
articles underwent full manuscript review, after which a total of 18 were included (Figure 4).

3.2. Literature Review


Our review of the current state of machine learning in laboratory medicine reveals
several exciting applications including predicting laboratory test values, improving
laboratory utilization, automating laboratory processes, promoting precision laboratory
test interpretation, and finally improving laboratory medicine information systems. In
these studies, tree-based learning algorithms and neural networks often achieved the best
performance. Table 2 summarizes the characteristics and themes from this literature review.
Articles are described below, by application.

3.2.1. Laboratory Test Value Prediction and Laboratory Utilization

One of the prevailing themes in how machine learning has been applied to laboratory medicine is
the prediction of laboratory results based on other clinically available data. Authors of these
studies propose that such models can be used to power clinical decision support tools for
ordering providers and optimize laboratory testing utilization.

One of the first examples of this type of work is the study by Azarkhish et al. [21] in which
a neural network model predicted iron deficiency anemia and serum iron levels based on
features from a routine complete blood count. The model achieved an impressive AUROC
of 98% for the binary classification of iron-deficiency anemia. It predicted the actual serum
iron level with less accuracy, achieving a root mean squared error of 0.136 mcg/dL and
R² of 0.93. It is important to note, however, that this study was limited by the relatively small
number of participants, with 149 subjects in the training group and 54 subjects in the testing
group.
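
For readers less familiar with regression metrics, the sketch below shows how root mean squared error (RMSE) and R² are computed from measured and predicted values. The numbers are hypothetical and are not drawn from the cited study.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the measurement."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Fraction of the variance in y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured and predicted serum iron values (mcg/dL).
measured = [50, 70, 90, 110]
predicted = [55, 65, 95, 105]

error = rmse(measured, predicted)
explained = r_squared(measured, predicted)
```

RMSE should always be read against the scale of the analyte, since an error of a given size may be negligible for one test and clinically meaningful for another.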

This work continued with Luo et al. [15], who conceived a clinical decision support tool
capable of predicting laboratory test results from related laboratory results and other clinical
information. As a proof of concept, they demonstrated a machine learning algorithm that
was capable of predicting whether serum ferritin level was abnormal with considerable
accuracy—achieving an AUROC of 97% using a random forest imputation method to fill
in required missing laboratory features that were then fed into a logistic regression model.
Meanwhile, Lidbury et al. [22] also studied the redundancy of laboratory test panels, with
a focus on liver function tests. They were able to predict whether γ-glutamyl transferase
(GGT) was normal or abnormal using other components of the liver function panel,
achieving an accuracy of 90% with a tree-based machine learning model. They concluded
that GGT offered little additional value beyond the other components of a typical liver
function panel.

Along similar lines of test result prediction and lab utilization, Xu et al. [23] studied
a machine learning model to predict laboratory test results as normal or abnormal in
order to identify low-yield, repetitive laboratory tests. Their group performed a multi-site
study of nearly 200,000 inpatient laboratory testing orders to identify the most repetitive
laboratory tests, and then attempted to predict each one. They were able to achieve an
AUROC of >90% for 20 common laboratory tests including sodium, hemoglobin, and
lactate dehydrogenase. They proposed a sensitive decision threshold corresponding to a negative
predictive value of 95% to power a clinical decision support tool aimed at reducing
low-yield, repetitive testing.
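
One way such a threshold could be chosen is sketched below. This is an assumption-laden illustration rather than the authors' procedure: scores are taken to be the predicted probability of an abnormal result, a result is predicted normal when its score falls below the threshold, and the helper names are hypothetical.

```python
def npv_at(threshold, y_true, scores):
    """NPV when results with score < threshold are predicted normal (negative)."""
    predicted_negative = [t for t, s in zip(y_true, scores) if s < threshold]
    if not predicted_negative:
        return None  # nothing predicted normal at this threshold
    return sum(t == 0 for t in predicted_negative) / len(predicted_negative)


def pick_threshold(y_true, scores, target_npv=0.95):
    """Return the highest candidate threshold whose NPV meets the target."""
    best = None
    for threshold in sorted(set(scores)):
        npv = npv_at(threshold, y_true, scores)
        if npv is not None and npv >= target_npv:
            best = threshold
    return best


# Hypothetical data: 1 = abnormal result, scores = predicted probability of abnormal.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.60, 0.80, 0.90]

threshold = pick_threshold(y_true, scores)
```

A higher threshold labels more tests as likely normal and therefore avoidable, so the search favors the largest threshold that still satisfies the NPV target.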

In the same realm of clinical decision support and laboratory utilization, Islam et al.
[24] developed a deep learning model capable of recommending which
laboratory tests a provider should order. Rather than predicting specific test results, their
model predicted what tests should be ordered in the first place. Using features such as
clinical diagnoses, medications, prior laboratory tests, and demographic information, their
neural network model was able to achieve moderate performance with a macro-averaged AUROC of
0.76 and a micro-averaged AUROC of 0.87. One important limitation of this study, however, is that the
algorithm learned from prior testing patterns, but no expert determination was made about
whether these prior ordering behaviors were optimal in the first place. Thus a model like this
is prone to learning undesirable practices from historic testing patterns.
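
The difference between the two averaged AUROC values reported above can be illustrated as follows. In this sketch, which uses hypothetical test names and scores, the macro average computes one AUROC per laboratory test and averages them equally, while the micro average pools every patient-test pair before computing a single AUROC.

```python
def auroc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data for four patients: 1 = the test was actually ordered.
ordered = {"cbc": [1, 1, 1, 0], "tsh": [0, 1, 0, 0]}
scores = {"cbc": [0.9, 0.8, 0.7, 0.1], "tsh": [0.3, 0.2, 0.1, 0.1]}

# Macro: every test contributes equally, however rarely it is ordered.
macro = sum(auroc(ordered[t], scores[t]) for t in ordered) / len(ordered)

# Micro: pool all patient-test pairs, so frequently ordered tests dominate.
pooled_true = [y for t in ordered for y in ordered[t]]
pooled_scores = [s for t in ordered for s in scores[t]]
micro = auroc(pooled_true, pooled_scores)
```

A micro average that exceeds the macro average, as in this toy example, suggests that performance is stronger on commonly ordered tests than on rarely ordered ones.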

Lee et al. [25] from South Korea proposed a deep neural network model
to predict low-density lipoprotein cholesterol (LDL-C) from high-density lipoprotein
cholesterol (HDL-C), total cholesterol, and triglycerides, with predictions compared to a ground
truth of fractionated LDL-C measurement. They showed that their model achieved better
performance than the historical Friedewald equation [26] and Martin’s “novel method” [27],
with a root mean squared error of 8.1 mg/dL versus 10.8 mg/dL and 8.3 mg/dL, respectively.
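
For context, the Friedewald baseline can be written in a few lines. The formula itself is not stated in this review; the sketch uses its commonly cited form in mg/dL (LDL-C = total cholesterol - HDL-C - triglycerides/5), and the lipid panels and measured values below are hypothetical.

```python
from math import sqrt


def friedewald_ldl(total_chol, hdl, triglycerides):
    """Friedewald estimate of LDL-C in mg/dL: TC - HDL-C - TG/5.

    The estimate is generally not used when triglycerides exceed 400 mg/dL.
    """
    if triglycerides > 400:
        raise ValueError("Friedewald estimate is unreliable for TG > 400 mg/dL")
    return total_chol - hdl - triglycerides / 5


# Hypothetical panels: (total cholesterol, HDL-C, triglycerides, measured LDL-C).
panels = [
    (200, 50, 150, 118),
    (180, 60, 100, 103),
    (240, 40, 200, 156),
]

estimates = [friedewald_ldl(tc, hdl, tg) for tc, hdl, tg, _ in panels]
measured = [ldl for _, _, _, ldl in panels]
rmse = sqrt(sum((e - m) ** 2 for e, m in zip(estimates, measured)) / len(panels))
```

A learned model is evaluated the same way: its predictions replace `estimates`, and the resulting RMSE is compared with that of the fixed equation.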

Finally, Dunn et al. [28] completed an experimental study using a machine learning
regression model to predict common laboratory tests using data from wearable devices such
as accelerometers and electrodermal sensors. Unfortunately, this futuristic take on
laboratory test prediction was unable to achieve meaningful performance. For example, their
model using wearable data was able to explain only 21% of the variability in hematocrit
level, which was the laboratory test for which the model performed best.

3.2.2. Validation and Quality Assurance/Quality Control

Our review of the literature also revealed many examples of using machine learning for test result validation
and quality control in routine laboratory medicine. For example, Demirci et al. [29]
developed a neural network machine learning algorithm capable of classifying whether a
critical lab result was valid or invalid. They studied several common biochemical assays
such as electrolytes and liver function tests and used prior test results, lab indices such as
hemolytic index, and demographic information in order to predict whether a critical result
was valid. Model results were compared to expert opinion of a group of biochemists. The
model was able to correctly classify critical values as valid with a sensitivity of 91% at
a specificity of 100%, meaning the model could drastically reduce the number of critical
results requiring manual validation while keeping the rate of incorrectly validated tests to a
minimum.

Similarly, Wang et al. [30] developed an ensemble of tree-based algorithms to automate
the test result verification process. Their model was able to automatically verify laboratory
results with a sensitivity of 99.9% and specificity of 98%. On their retrospective data, this
would have led to an 80% reduction in laboratory reports requiring manual verification
compared to their current rule-based verification system.

Meanwhile, Cao et al. [31] used a tree-based machine learning model to reduce the volume
of samples flagged for manual review. Their model, which used features from a 10-point
dipstick and urine cytometry measurements, called for a manual review rate of 32%, which
corresponded to a sensitivity of 92% and specificity of 81.5% compared against expert-labeled
ground-truth manual urine microscopy results.

As for quality assurance/quality control, Farrell et al. [32] showed that a machine
learning algorithm for identifying mislabeled lab samples was able to outperform manual
verification. Their best performing algorithm was a neural network that achieved an AUROC
of 98%. A limitation of this study, however, is that the authors did not compare their performance
against rule-based delta checks, which are the current gold standard. Meanwhile, a neural
network machine learning algorithm by Fang et al. [33] was able to classify whether a blood
specimen was clotted with moderate accuracy (AUROC 91%). Their algorithm used
coagulation testing results from the sample and compared model outputs to a ground truth of
manual inspection for clotting by laboratory technicians.
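
For readers unfamiliar with the rule-based delta checks mentioned above, the following sketch shows the basic idea: a result is flagged when it differs from the same patient's previous result by more than an analyte-specific limit. The limits and results here are hypothetical; actual limits are set by each laboratory.

```python
# Hypothetical absolute delta limits per analyte; real laboratories set their own.
DELTA_LIMITS = {"sodium": 10.0, "potassium": 1.5, "hemoglobin": 3.0}


def delta_check(analyte, previous, current, limits=DELTA_LIMITS):
    """Return True when the change from the prior result exceeds the limit."""
    return abs(current - previous) > limits[analyte]


# Hypothetical (analyte, previous result, current result) triples for one patient.
results = [
    ("sodium", 140.0, 138.0),
    ("potassium", 4.1, 6.2),
    ("hemoglobin", 13.5, 9.8),
]
flagged = [a for a, prev, cur in results if delta_check(a, prev, cur)]
```

A machine learning approach replaces these fixed, per-analyte limits with a model that weighs many results jointly, which is why a direct comparison against delta checks would be informative.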

3.2.3. Test Result Interpretation and Personalized Reference Ranges

Additionally, there are several recent studies aimed at using machine learning in
laboratory medicine for test interpretation and personalized reference ranges—efforts
towards achieving precision medicine. With regard to test interpretation, Wilkes et al.
[34] developed a tree-based machine learning model capable of classifying a urine
steroid profile as either normal or potentially abnormal (compared to manually expert-
labeled interpretations) with an AUROC of 96%. They then sought to interpret urinary
steroid profiles into specific pathophysiologic conditions such as “adrenal suppression” or
“congenital adrenal hyperplasia.” Their model was able to achieve modest performance at
this more complicated multiclass classification problem with an accuracy of 87%.

Peng et al. [35] were able to achieve significant test performance improvements with a
tree-based (random forest) machine learning model capable of reducing false positives from
newborn screening, a common issue with the highly sensitive assay. Their model, which
used 39 metabolic analytes and clinical variables such as weight and gestational age, was
able to reduce false positives by 98% for ornithine transcarbamylase deficiency and 89% for
glutaric acidemia type 1, without sacrificing any test sensitivity. They published their tool
online for providers to use freely.

Finally, one of the most promising applications of machine learning in medicine is the
general development of “personalized” medical diagnosis and interpretation. To this effect,
Poole et al. [36] demonstrated that a series of statistical learning methods can be used
to create more personalized reference ranges by analyzing test result distributions against
clinical features such as diagnosis codes. In an earlier study from China, Yang et al. [37] also
demonstrated a neural network model capable of predicting reference ranges for erythrocyte
sedimentation rate (ESR) testing, which is known to vary based on geographic factors such
as altitude. Their algorithm used a number of environmental variables and was able to predict
ESR reference ranges for laboratories across China, differing by at most 3% from established
reference ranges (which vary from 4 to 21 mm/hr).
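
The simplest form of a stratified reference range can be sketched as follows. This is not the method of either cited study: it assumes the conventional definition of a reference interval as the central 95% of results (2.5th to 97.5th percentile) and applies it separately to hypothetical patient groups with synthetic values.

```python
from statistics import quantiles


def reference_interval(values):
    """Central 95% interval: the 2.5th and 97.5th percentiles."""
    cuts = quantiles(values, n=40, method="inclusive")  # cut points every 2.5%
    return cuts[0], cuts[-1]


# Synthetic hemoglobin results (g/L) grouped by a hypothetical diagnosis flag.
results_by_group = {
    "no_ckd": list(range(120, 161)),
    "ckd": list(range(90, 131)),
}
intervals = {g: reference_interval(v) for g, v in results_by_group.items()}
```

A result of 115 g/L would fall below the interval for the first group but within the interval for the second, which is the essence of interpreting a value against a more comparable population.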

3.2.4. Laboratory Information Systems

Machine learning has also been applied to laboratory information systems to improve operations and enable clinical research.
For example, two studies in this review proposed machine learning methods to map
laboratory data to standard Logical Observation Identifiers Names and Codes (LOINC) codes to
enable interoperability and clinical research. LOINC codes are the current industry standard
ontology for representing measurement data in healthcare, including laboratory testing
[38].

In one such article, Fillmore et al. [39] studied a group of models for mapping 7 common
laboratory concepts to the United States Department of Veterans Affairs (US VA) medical
records system, where LOINC mappings are imperfect. The best performance was achieved
by a tree-based (random forest) model with an accuracy of 98%, presenting a significant
improvement over what was an otherwise tedious task of manually reviewing hundreds of
possible conceptual links.

Similarly, Parr et al. [40] developed a machine learning model to assign missing LOINC
codes and improve the accuracy of existing codes in the US VA medical records data
warehouse. Their tree-based machine learning algorithm was able to correctly identify the
LOINC code with a rate of 85% in unlabeled laboratory tests and correctly identify the
LOINC code in 96% of randomly selected previously labeled laboratory tests. In cases
where the algorithm differed from the currently assigned LOINC code, manual review
revealed that the machine learning algorithm was correct 83% of the time, compared to the
72% accuracy rate of the incumbent label.

4. Discussion
4.1. Reflections and Future Direction
Machine learning is able to leverage large amounts of data to infer complex relationships
and patterns that may otherwise be beyond the capabilities of a rule-based system or human
expert. Furthermore, while static rule-based algorithms are based on previously established
knowledge, machine learning can identify new patterns and applications, and continuously
use new data to improve its performance.

Along those lines, one of the most promising aspects of artificial intelligence in laboratory
medicine has been its success in automation. The reviewed work demonstrates significant
advancements in using machine learning algorithms to improve upon current rule-based
methods for identifying samples for manual verification or validation. Such algorithms have
already achieved excellent performance and, we anticipate, will soon be commonplace in the
modern laboratory.

Another exciting application of machine learning is its ability to leverage large amounts of
prior medical data to create more personalized interpretation of test results. Although still
in its early stages, we see the foundations for this in the work by Poole et al., who propose
a relatively simple method of using diagnosis codes to achieve slightly more personalized
reference ranges [36]. We envision future work into more comprehensive algorithms that
consider the entire clinical context of the patient to provide personalized laboratory test
reference ranges to enable precision diagnostics. The paradigm will shift from “What is a
normal hemoglobin?” to “What is a normal hemoglobin for you?”

Finally, while it is the focus of many articles considered in this review, considerable work
must still be done before machine learning prediction of laboratory test results can be utilized
to make changes in clinical practice. While studies have demonstrated a high level of
redundancy in lab panels and ordering practices, attempts at predicting laboratory test results
still fail to achieve consistently high performance across a variety of tests. Despite the lack
of a generalizable solution in this space, there are opportunities for smaller gains to be made
by optimizing testing utilization in specific situations.

4.2. Challenges to the Field


There is great excitement regarding the future of machine learning in laboratory medicine;
however, there are significant challenges that must be addressed as well. From a technical
perspective, one of the biggest limitations faced by machine learning algorithms in medicine
is data quality. Laboratory information systems are plagued with incorrectly labeled or
missing data, limiting the maximum performance that any algorithm can achieve. Another
challenge, both technical and financial, is the cost of the computational infrastructure,
along with the cost of personnel with the right computational expertise to develop, deploy,
maintain, and update the machine learning algorithms and the software tools needed to run
them.

From a clinical point of view, as a relatively young field, machine learning in laboratory
medicine requires standardization and regulation. Currently, there are no guidelines
regarding the best practices for the clinical validation of machine learning algorithms.
Even in pathology fields where clinical machine learning tools are developing rapidly,
such as digital pathology, there are no well-established guidelines for laboratory-developed
applications or for the verification of vendor-developed software [41]. In fact, just recently,
the College of American Pathologists (CAP) assembled a committee to start addressing this
gap, including the creation of laboratory standards for AI applications [42].

Similarly, regulatory entities, such as the FDA, have not completely determined what their
role will be in the regulation of laboratory-developed machine learning applications. In
2021, the FDA published an action plan to update the proposed regulatory framework
for artificial intelligence/machine learning-based software as a medical device [43]. This
is an important step to regulate the market of such software in medicine. However, it is
still unclear what the position of federal and state regulators will be in regard to laboratory-
developed machine learning tools. Finally, more studies assessing the actual implementation
of these applications in clinical laboratories and demonstrating their safety and reliability
will be necessary to have laboratory medicine professionals fully embrace this technology.

4.3. Machine Learning in Related Fields


This review focuses on the use of machine learning in routine laboratory testing. However,
there has been much attention given to the application of machine learning to the broader
field of pathology. Several related applications are briefly discussed here to serve as a
reference for laboratory medicine practitioners who may wish to explore these topics further.

The digitization of histopathology slides has allowed for the widespread utilization of
computer vision and other artificial intelligence methods for image interpretation. Many
studies in this area focus on histopathology in cancer [44–46]. Along these lines, the FDA
recently authorized the first artificial intelligence product in pathology, which can identify
areas of interest in prostate biopsy slides [47].

Similarly, digital image acquisition in microscopy has spawned additional work in this
field. Identification of cellular events such as mitosis or apoptosis can be used to flag
areas of dysregulated growth or quantify response to chemotherapy [48,49]. Applications of
artificial intelligence in specialized cytometry have also allowed for more precise detection
of neoplastic cell lines and cellular markers of interest [50,51].

Finally, related to laboratory medicine is the use of machine learning for point-of-care
testing. In this field, there has been considerable emphasis on the use of predictive
algorithms in continuous glucose monitoring for patients with diabetes [52].

5. Conclusion
Machine learning promises exciting advancements in medicine, but its application in
laboratory medicine is still nascent. Because the field is young, further standardization is
needed in how these algorithms are developed and presented. Regardless, several
machine learning models have achieved excellent performance in automating test result
validation and triaging samples for manual review. There is also exciting, ongoing work
in using machine learning for optimizing laboratory utilization, predicting laboratory test
results, and providing personalized laboratory test interpretation.

Acknowledgements:
The authors wish to thank Connie Wong, medical education librarian, for her help in forming our literature search
query.

Funding:
Jonathan H Chen was supported in part by the NIH/National Library of Medicine Award R56LM013365, the
Stanford Artificial Intelligence in Medicine and Imaging and Human-Centered Artificial Intelligence (AIMI-HAI)
Partnership Grant, Stanford Aging and Ethnogeriatrics Research Center (under NIH/National Institute on Aging
grant P30AG059307), the Stanford Clinical Excellence Research Center (CERC), and the Stanford Departments of
Medicine and Pathology.

References
[1]. Darcy AM, Louie AK, Roberts LW. Machine Learning and the Profession of Medicine. JAMA
2016;315:551–2. [PubMed: 26864406]

[2]. Obermeyer Z, Emanuel EJ. Predicting the Future - Big Data, Machine Learning, and Clinical
Medicine. N Engl J Med 2016;375:1216–9. [PubMed: 27682033]
[3]. Beam AL, Kohane IS. Big Data and Machine Learning in Health Care. JAMA 2018;319:1317–8.
[PubMed: 29532063]
[4]. Benjamens S, Dhunnoo P, Meskó B. The state of artificial intelligence-based FDA-approved
medical devices and algorithms: an online database. NPJ Digit Med 2020;3:118. [PubMed:
32984550]
[5]. Cui M, Zhang DY. Artificial intelligence and computational pathology. Lab Invest 2021;101:412–
22. [PubMed: 33454724]
[6]. Lippi G, Bassi A, Bovo C. The future of laboratory medicine in the era of precision medicine. J
Lab Precis Med 2016;1:1–5.
[7]. Cabitza F, Banfi G. Machine learning in laboratory medicine: waiting for the flood? Clin Chem
Lab Med 2018;56:516–24. [PubMed: 29055936]
[8]. Paranjape K, Schinkel M, Hammer RD, Schouten B, Nannan Panday RS, Elbers PWG, et al.
The Value of Artificial Intelligence in Laboratory Medicine. Am J Clin Pathol 2021;155:823–31.
[PubMed: 33313667]
[9]. Pillay TS. Artificial intelligence in pathology and laboratory medicine. J Clin Pathol 2021;74:407–
8. [PubMed: 34031137]
[10]. Niazi MKK, Parwani AV, Gurcan MN. Digital pathology and artificial intelligence. Lancet Oncol
2019;20:e253–61. [PubMed: 31044723]
[11]. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JPA. Opportunities and challenges in
developing risk prediction models with electronic health records data: a systematic review. J
Am Med Inform Assoc 2017;24:198–208. [PubMed: 27189013]
[12]. Brnabic A, Hess LM. Systematic literature review of machine learning methods used in the
analysis of real-world data for patient-provider decision making. BMC Med Inform Decis Mak
2021;21:54. [PubMed: 33588830]
[13]. Tomašev N, Glorot X, Rae JW, Zielinski M, Askham H, Saraiva A, et al. A clinically applicable
approach to continuous prediction of future acute kidney injury. Nature 2019;572:116–9.
[PubMed: 31367026]
[14]. Fleuren LM, Klausch TLT, Zwager CL, Schoonmade LJ, Guo T, Roggeveen LF, et al. Machine
learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test
accuracy. Intensive Care Med 2020;46:383–400. [PubMed: 31965266]
[15]. Luo Y, Szolovits P, Dighe AS, Baron JM. Using Machine Learning to Predict Laboratory Test
Results. Am J Clin Pathol 2016;145:778–88. [PubMed: 27329638]
[16]. Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning
algorithms for disease prediction. BMC Med Inform Decis Mak 2019;19:281. [PubMed:
31864346]
[17]. Wang Y, Zhao Y, Therneau TM, Atkinson EJ, Tafti AP, Zhang N, et al. Unsupervised machine
learning for the discovery of latent disease clusters and patient subgroups using electronic health
records. J Biomed Inform 2020;102:103364. [PubMed: 31891765]
[18]. Roohi A, Faust K, Djuric U, Diamandis P. Unsupervised Machine Learning in Pathology: The
Next Frontier. Surg Pathol Clin 2020;13:349–58. [PubMed: 32389272]
[19]. Yu K-H, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng
2018;2:719–31. [PubMed: 31015651]
[20]. Shrestha A, Mahmood A. Review of Deep Learning Algorithms and Architectures. IEEE Access
2019;7:53040–65.
[21]. Azarkhish I, Raoufy MR, Gharibzadeh S. Artificial intelligence models for predicting iron
deficiency anemia and iron serum level based on accessible laboratory data. J Med Syst
2011;36:2057–61. [PubMed: 21503744]
[22]. Lidbury BA, Richardson AM, Badrick T. Assessment of machine-learning techniques on large
pathology data sets to address assay redundancy in routine liver function test profiles. Diagnosis
(Berl) 2015;2:41–51. [PubMed: 29540013]

[23]. Xu S, Hom J, Balasubramanian S, Schroeder LF, Najafi N, Roy S, et al. Prevalence
and Predictability of Low-Yield Inpatient Laboratory Diagnostic Tests. JAMA Netw Open
2019;2:e1910967. [PubMed: 31509205]
[24]. Islam MM, Yang H-C, Poly TN, Li Y-CJ. Development of an Artificial Intelligence-Based
Automated Recommendation System for Clinical Laboratory Tests: Retrospective Analysis of the
National Health Insurance Database. JMIR Med Inform 2020;8:e24163. [PubMed: 33206057]
[25]. Lee T, Kim J, Uh Y, Lee H. Deep neural network for estimating low density lipoprotein
cholesterol. Clin Chim Acta 2018;489:35–40. [PubMed: 30448282]
[26]. Friedewald WT, Levy RI, Fredrickson DS. Estimation of the concentration of low-density
lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clin Chem
1972;18:499–502. [PubMed: 4337382]
[27]. Martin SS, Blaha MJ, Elshazly MB, Toth PP, Kwiterovich PO, Blumenthal RS, et al. Comparison
of a novel method vs the Friedewald equation for estimating low-density lipoprotein cholesterol
levels from the standard lipid profile. JAMA 2013;310:2061–8. [PubMed: 24240933]
[28]. Dunn J, Kidzinski L, Runge R, Witt D, Hicks JL, Schüssler-Fiorenza Rose SM, et al.
Wearable sensors enable personalized predictions of clinical laboratory measurements. Nat Med
2021;27:1105–12. [PubMed: 34031607]
[29]. Demirci F, Akan P, Kume T, Sisman AR, Erbayraktar Z, Sevinc S. Artificial Neural Network
Approach in Laboratory Test Reporting: Learning Algorithms. Am J Clin Pathol 2016;146:227–
37. [PubMed: 27473741]
[30]. Wang H, Wang H, Zhang J, Li X, Sun C, Zhang Y. Using machine learning to develop an
autoverification system in a clinical biochemistry laboratory. Clin Chem Lab Med 2020;59:883–
91. [PubMed: 33554565]
[31]. Cao Y, Cheng M, Hu C. UrineCART, a machine learning method for establishment of review
rules based on UF-1000i flow cytometry and dipstick or reflectance photometer. Clin Chem Lab
Med 2012;50:2155–61. [PubMed: 23093270]
[32]. Farrell C-J. Identifying mislabelled samples: Machine learning models exceed human
performance. Ann Clin Biochem 2021:45632211032991.
[33]. Fang K, Dong Z, Chen X, Zhu J, Zhang B, You J, et al. Using machine learning to identify
clotted specimens in coagulation testing. Clin Chem Lab Med 2021;59:1289–97. [PubMed:
33660491]
[34]. Wilkes EH, Rumsby G, Woodward GM. Using Machine Learning to Aid the Interpretation of
Urine Steroid Profiles. Clin Chem 2018;64:1586–95. [PubMed: 30097499]
[35]. Peng G, Tang Y, Cowan TM, Enns GM, Zhao H, Scharfe C. Reducing False-Positive Results in
Newborn Screening Using Machine Learning. Int J Neonatal Screen 2020;6:16. doi:10.3390/ijns6010016.
[36]. Poole S, Schroeder LF, Shah N. An unsupervised learning method to identify reference intervals
from a clinical database. J Biomed Inform 2015;59:276–84. [PubMed: 26707631]
[37]. Yang Q, Mwenda KM, Ge M. Incorporating geographical factors with artificial neural networks
to predict reference values of erythrocyte sedimentation rate. Int J Health Geogr 2013;12:11.
[PubMed: 23497145]
[38]. Huff SM, Rocha RA, McDonald CJ, De Moor GJ, Fiers T, Bidgood WD Jr, et al. Development
of the Logical Observation Identifier Names and Codes (LOINC) vocabulary. J Am Med Inform
Assoc 1998;5:276–92. [PubMed: 9609498]
[39]. Fillmore N, Do N, Brophy M, Zimolzak A. Interactive Machine Learning for Laboratory Data
Integration. Stud Health Technol Inform 2019;264:133–7. [PubMed: 31437900]
[40]. Parr SK, Shotwell MS, Jeffery AD, Lasko TA, Matheny ME. Automated mapping of laboratory
tests to LOINC codes using noisy labels in a national electronic health record system database. J
Am Med Inform Assoc 2018;25:1292–300. [PubMed: 30137378]
[41]. Baxi V, Edwards R, Montalto M, Saha S. Digital pathology and artificial intelligence in
translational medicine and clinical practice. Mod Pathol 2022;35:23–32. [PubMed: 34611303]
[42]. College of American Pathologists. Artificial Intelligence (AI) Committee. College
of American Pathologists 2021. https://www.cap.org/member-resources/councils-committees/artificial-intelligence-ai-committee/
(accessed January 24, 2022).


[43]. US Food and Drug Administration. Artificial Intelligence/Machine Learning (AI/ML)-Based
Software as a Medical Device (SaMD) Action Plan 2021. https://www.fda.gov/media/145022/download
(accessed January 24, 2022).
[44]. Korbar B, Olofson AM, Miraflor AP, Nicka CM, Suriawinata MA, Torresani L, et al. Deep
Learning for Classification of Colorectal Polyps on Whole-slide Images. J Pathol Inform
2017;8:30. [PubMed: 28828201]
[45]. Nagpal K, Foote D, Liu Y, Chen P-HC, Wulczyn E, Tan F, et al. Development and validation
of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit Med
2019;2:48. [PubMed: 31304394]
[46]. Couture HD, Williams LA, Geradts J, Nyante SJ, Butler EN, Marron JS, et al. Image analysis
with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic
subtype. NPJ Breast Cancer 2018;4:30. [PubMed: 30182055]
[47]. Office of the Commissioner. FDA Authorizes Software that Can Help Identify Prostate Cancer
2021. https://www.fda.gov/news-events/press-announcements/fda-authorizes-software-can-help-identify-prostate-cancer
(accessed October 27, 2021).
[48]. Grys BT, Lo DS, Sahin N, Kraus OZ, Morris Q, Boone C, et al. Machine learning and computer
vision approaches for phenotypic profiling. J Cell Biol 2016;216:65–71. [PubMed: 27940887]
[49]. Falk T, Mai D, Bensch R, Çiçek Ö, Abdulkadir A, Marrakchi Y, et al. U-Net: deep learning for
cell counting, detection, and morphometry. Nat Methods 2018;16:67–70. [PubMed: 30559429]
[50]. Wang S, Zhou Y, Qin X, Nair S, Huang X, Liu Y. Label-free detection of rare circulating tumor
cells by image analysis and machine learning. Sci Rep 2020;10:12226. [PubMed: 32699281]
[51]. Syed-Abdul S, Firdani R-P, Chung H-J, Uddin M, Hur M, Park JH, et al. Artificial Intelligence
based Models for Screening of Hematologic Malignancies using Cell Population Data. Sci Rep
2020;10:4583. [PubMed: 32179774]
[52]. Perkins BA, Sherr JL, Mathieu C. Type 1 diabetes glycemic management: Insulin therapy,
glucose monitoring, and automation. Science 2021;373:522–7. [PubMed: 34326234]

Figure 1.
Machine learning models are trained using prior observations (samples). Features from prior
observations are extracted and processed into a data matrix. In supervised machine learning,
each observation is labeled with an outcome.
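As a minimal sketch of the data layout described in this caption, the snippet below turns a few prior observations into a data matrix and, for the supervised case, a matching vector of outcome labels. The feature names, values, and outcome coding are hypothetical and chosen only for illustration; they are not taken from any study in this review.

```python
# Hypothetical prior observations (samples); every value is illustrative only.
observations = [
    {"hemoglobin": 13.5, "mcv": 88.0, "outcome": 0},
    {"hemoglobin": 9.8, "mcv": 71.5, "outcome": 1},
    {"hemoglobin": 12.1, "mcv": 80.2, "outcome": 0},
]

# Assumed feature order; it fixes the column order of the data matrix.
feature_names = ["hemoglobin", "mcv"]

# Data matrix: one row per observation, one column per feature.
X = [[obs[name] for name in feature_names] for obs in observations]

# Supervised learning: one outcome label per row of X.
y = [obs["outcome"] for obs in observations]
```

In unsupervised learning the same matrix `X` would be used without the label vector `y`.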

Figure 2.
Graphical representation of types of machine learning models: (A) a simple decision tree
and (B) a deep learning neural network.

Figure 3.
Bar plot showing PubMed query results by year, adjusted by the number of months included in
the queried year (i.e., only October to December of 2011 and January to September of 2021
are included in the search query date range).
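The caption does not state how the adjustment was computed. One plausible reading, sketched below purely as an assumption, is a linear scaling of each partial-year count to a 12-month equivalent. The function name and all counts are hypothetical; only the month coverage (3 months of 2011, 9 months of 2021) comes from the caption.

```python
def annualized_count(raw_count: int, months_included: int) -> float:
    """Scale a partial-year count to a 12-month equivalent (assumed linear)."""
    if not 1 <= months_included <= 12:
        raise ValueError("months_included must be between 1 and 12")
    return raw_count * 12 / months_included


# Hypothetical (count, months covered) per year; the counts are made up.
raw_results = {2011: (5, 3), 2020: (120, 12), 2021: (90, 9)}

adjusted = {
    year: annualized_count(count, months)
    for year, (count, months) in raw_results.items()
}
```

Under this assumption a full year is left unchanged, while a 3-month window is multiplied by four.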

Figure 4.
Diagram showing manuscript inclusion and exclusion criteria for review.

Table 1.

Common evaluation metrics used to describe machine learning model performance.

| Evaluation Metric | Definition |
|---|---|
| Accuracy | Proportion of correct predictions over total number of predictions |
| Positive Predictive Value | Probability that a predicted positive is truly positive |
| Negative Predictive Value | Probability that a predicted negative is truly negative |
| True Positive Rate (Sensitivity) | Probability that a truly positive result is predicted to be positive |
| True Negative Rate (Specificity) | Probability that a truly negative result is predicted to be negative |
| False Positive Rate = 1 − Specificity | Probability that a truly negative result is falsely predicted to be positive |
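
The definitions in Table 1 can be illustrated with a short calculation from the four cells of a binary confusion matrix. This is a minimal sketch: the class and function names are hypothetical, the counts are invented for illustration, and the code assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing binary predictions against true labels."""
    tp: int  # truly positive, predicted positive
    fp: int  # truly negative, predicted positive
    tn: int  # truly negative, predicted negative
    fn: int  # truly positive, predicted negative


def evaluation_metrics(c: ConfusionCounts) -> dict:
    """Compute the Table 1 metrics; assumes no denominator is zero."""
    total = c.tp + c.fp + c.tn + c.fn
    specificity = c.tn / (c.tn + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
    }


# Hypothetical counts, chosen only for illustration.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
metrics = evaluation_metrics(example)
```

Note that positive and negative predictive values depend on how common the positive class is in the evaluated samples, whereas sensitivity and specificity do not.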

Table 2.

Summary of characteristics of machine learning algorithms included for review.

| Author and Year | Objective and Machine Learning Task | Best Model | Major Themes |
|---|---|---|---|
| Azarkhish (2012) | Predict iron deficiency anemia and serum iron levels from CBC indices | Neural Network | Prediction |
| Cao (2012) | Triage manual review for urinalysis samples | Tree-based | Automation |
| Yang (2013) | Predict normal reference ranges of ESR for various laboratories based on geographic and other clinical features | Neural Network | Interpretation |
| Lidbury (2015) | Predict liver function test results from other tests in the panel, highlighting redundancy in the liver function panel | Tree-based | Prediction, Utilization |
| Demirci (2016) | Classify whether critical lab result is valid or invalid using other lab values and clinical information | Neural Network | Automation, Interpretation, Validation |
| Luo (2016) | Predict ferritin from other tests in iron panel | Tree-based | Prediction, Utilization |
| Poole (2016) | Create personalized reference ranges that take into account patients’ diagnoses | Unsupervised learning | Interpretation |
| Parr (2018) | Automate mapping of Veterans Affairs laboratory data to LOINC codes | Tree-based | Information systems, Automation |
| Wilkes (2018) | Classify urine steroid profiles as normal or abnormal, and further interpret into specific disease processes | Tree-based | Interpretation, Automation |
| Fillmore (2019) | Automate mapping of Veterans Affairs laboratory data to LOINC codes | Tree-based | Information systems, Automation |
| Lee (2019) | Predict LDL-C levels from a limited lipid panel more accurately than current gold standard equations | Neural Network | Interpretation, Prediction |
| Xu (2019) | Identify redundant laboratory tests and predict their results as normal or abnormal | Tree-based | Prediction, Utilization |
| Islam (2020) | Use prior ordering patterns to create an algorithm that can recommend best practice tests for specific diagnoses | Neural Network | Utilization |
| Peng (2020) | Interpret newborn screening assays based on gestational age and other clinical information to reduce false positives | Tree-based | Interpretation, Utilization |
| Wang (2020) | Automatically verify if lab test result is valid or invalid | Tree-based | Validation, Automation |
| Dunn (2021) | Predict laboratory test results from wearable data | Tree-based | Prediction |
| Fang (2021) | Classify blood specimen as clotted or not clotted based on coagulation indices | Neural Network | Quality control |
| Farrell (2021) | Automatically identify mislabelled laboratory samples | Neural Network | Quality control, Automation |