Leading Edge

Perspective

How Machine Learning Will Transform Biomedicine


Jeremy Goecks,1,* Vahid Jalili,1 Laura M. Heiser,1 and Joe W. Gray1
1Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
*Correspondence: [email protected]
https://doi.org/10.1016/j.cell.2020.03.022

This Perspective explores the application of machine learning toward improved diagnosis and treatment. We outline a vision for how machine learning can transform three broad areas of biomedicine: clinical diagnostics, precision treatments, and health monitoring, where the goal is to maintain health through a range of diseases and the normal aging process. For each area, early instances of successful machine learning applications are discussed, as well as opportunities and challenges for machine learning. When these challenges are met, machine learning promises a future of rigorous, outcomes-based medicine with detection, diagnosis, and treatment strategies that are continuously adapted to individual and environmental differences.

Machine learning leverages sophisticated algorithms operating on large-scale, heterogeneous datasets to uncover useful patterns that would be difficult or impossible for even well-trained individuals to identify. There already are many applications of this approach throughout science and society ranging from game playing (Silver et al., 2018), to product recommendations (Batmaz et al., 2019), to controlling self-driving cars (Bojarski et al., 2016). In biomedicine, work in the human genome project (Venter et al., 2001), efforts in cancer omics (e.g., The Cancer Genome Atlas [Tomczak et al., 2015], the International Cancer Genome Consortium [Zhang et al., 2019], and the Clinical Proteomic Tumor Analysis Consortium [Ellis et al., 2013]), and numerous international machine learning competitions such as DREAM challenges (Saez-Rodriguez et al., 2016; Sage Bionetworks, 2020) and the Critical Assessment of Genome Interpretation (Andreoletti et al., 2019) have shown the power of this approach. The ability to collect and analyze large datasets related to medical treatments and outcomes promises to transform medicine into a data-driven, outcomes-oriented discipline with profound implications for disease detection, diagnosis, and treatment. Collection of molecular and phenotypic data has become pervasive and includes genomic testing for personalized treatment of cancer, high-resolution two- and three-dimensional anatomical imaging of organs, histological analyses of tissue biopsies, and smart watches that monitor heart rates and notify wearers of irregularities (Shilo et al., 2020). These and many other collected data provide the raw material for a future of early, more accurate diagnoses, personalized treatments, and ongoing monitoring in support of overall health.

Machine learning will help realize a future of improved health care by unlocking the potential of large biomedical and patient datasets. Early uses of machine learning in diagnosis and treatment have shown promise to diagnose breast cancer from X-rays (McKinney et al., 2020; Wu et al., 2019), discover new antibiotics (Stokes et al., 2020), predict onset of gestational diabetes from electronic health records (Artzi et al., 2020), and identify clusters of patients that share a molecular signature of treatment response (Zitnik et al., 2019). Automated pattern recognition through machine learning is essential due to the enormity and complexity of biomedical data; manual analysis is both inefficient and untenable. Equally important, many human diseases involve a complex constellation of changes that occur dynamically and vary from patient to patient. Understanding this complexity requires analysis of large-scale heterogeneous data to identify novel patterns that, after rigorous evaluation, can be used for diagnosis and treatment. Machine learning, then, can assist biomedical scientists and medical professionals by identifying and summarizing meaningful patterns from large datasets (Rajkomar et al., 2019). Careful evaluation of the patterns found and predictions made by machine learning applications in diagnosis and treatment is essential. "Ground truth" data, in which associations between data and outcome are known, can be used to rigorously evaluate the performance of novel algorithms. Such evaluation data may be quantitative, such as biomarker reduction on treatment, or more qualitative, such as overall patient health. It is also important to appreciate that ground truth may change depending on individual characteristics such as age, gender, and environmental exposures.

Recognizing this, there are a growing number of research programs designed to collect and organize large-scale datasets linking variables to health status, which can be used to train and evaluate machine learning approaches. Programs in cancer that aggregate molecular profiles from experimental model systems or patient samples together with diagnostic, prognostic, and therapeutic responses provide examples of these valuable data repositories. For example, the Cancer Dependency Map (Tsherniak et al., 2017) has collected multimodal molecular profiles, drug response, and genetic viability data on more than 1,000 cancer cell lines. The AACR Project GENIE (AACR Project GENIE Consortium, 2017) has collected genomic profiles and clinical data for more than 19,000 patients, and the ASCO CancerLinQ is building a similar database of hundreds of thousands of patients. Coupled with advanced algorithms, such programs have the potential to transform our understanding of diseases and improve our ability to predict disease outcomes.
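
As a minimal sketch of the ground-truth evaluation described above, the following Python snippet scores binary predictions against known outcomes. It is an illustration only and is not taken from the article: the function name, the 1 = disease / 0 = normal coding, and the ten-patient data are all hypothetical assumptions.

```python
from typing import Dict, Sequence


def evaluate_against_ground_truth(
    truth: Sequence[int], predicted: Sequence[int]
) -> Dict[str, float]:
    """Score binary predictions (1 = disease, 0 = normal) against known outcomes."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # disease correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed disease

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a class is absent from the data.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical ground truth and model output for ten patients.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = evaluate_against_ground_truth(truth, predicted)
print(metrics)
```

Because ground truth may vary with characteristics such as age, gender, and environmental exposures, the same function could be applied separately to each subgroup to check whether performance holds across them.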

92 Cell 181, April 2, 2020 © 2020 Elsevier Inc.


Table 1. Key Concepts in Machine Learning

Concept | Definition
Supervised, unsupervised, and semi-supervised learning | Supervised learning predicts labels or classes on future data based on past data that includes labels/classes. Unsupervised learning identifies structure, usually clusters, among unlabeled data. Semi-supervised learning first performs unsupervised learning, and humans label structures found from unsupervised learning.
Classification and regression | Both are supervised learning methods. Classification predicts discrete categories such as normal versus diseased while regression predicts real-valued outputs such as response to therapy.
Ensemble learning | Ensemble methods build many models and use the average of all models to produce predictions. Common ensemble approaches include random forests, gradient-boosting, and stacking/meta-ensembles.
Deep learning | Multi-layer artificial neural networks that can learn complex non-linear functions. Very useful for unstructured data such as images, speech, or text but typically do not provide insights into the aspects of the data that are driving the functions.
Bayesian learning | Methods that combine prior knowledge in addition to data to perform machine learning.
Dimensionality reduction | Reduces the number of attributes or features of a dataset by selecting important features or combining features to capture variance in a dataset. Often used to improve performance of machine learning models and to aid visualization.
Federated learning | Approaches for incrementally learning from data distributed in multiple locations and which cannot be combined into a single dataset. Federated learning is useful when data are located in multiple clinical systems or when learning from sensitive personal data.

Machine learning is a subdiscipline of artificial intelligence, and the main conceptual approaches in machine learning are summarized in Table 1. Whereas artificial intelligence includes all methods for enabling computers to display human-like understanding and intelligence, machine learning is focused specifically on developing algorithms to learn from data. General classes of machine learning methods include: (a) supervised learning, in which data groups are associated with a specific outcome; categorical data (e.g., disease versus normal) rely on classification methods whereas continuous values (e.g., strength of response to therapy) are used in regression methods; (b) unsupervised or semi-supervised methods to cluster data into discrete groups that can then be manually labeled and associated with outcome; (c) ensemble learning, where results from multiple computational models are combined to produce a final prediction, which can lead to more accurate predictions by enabling models to generalize better to new data; (d) deep learning, which uses artificial neural networks, a formalization modeled on the human brain, to recognize patterns or associations in the data and is especially useful when working with unstructured data such as images, speech, and text; and (e) Bayesian learning, in which prior knowledge is encoded into the learning process and which is especially useful in data-poor situations.

There are two complementary approaches that can be used with any of these learning methods and are especially useful for biomedical applications. Many biomedical datasets have a large number of features (dimensions), and the number of features may exceed the number of data points. Dimensionality reduction can help improve the performance of machine learning approaches by selecting a subset of relevant attributes of a dataset or combining attributes into a smaller number that capture variability in a dataset. Reducing the dimensions of a dataset is also useful for visualizing data or model predictions. When data are distributed across multiple sites and cannot be moved to create a single dataset for machine learning, federated learning approaches are used to learn incrementally across all the data (Konečný et al., 2016; Yang et al., 2019). Federated learning is especially important in many biomedical applications where data contain sensitive or protected health information that cannot be easily shared. Most of these approaches are conceptually mature but are now finding increased use as structured biomedical data become available and as computer technology becomes sufficiently powerful to enable discovery of subtle but important patterns in large datasets. A recent review provides a brief tutorial on machine learning approaches in the life sciences (Camacho et al., 2018). The application goals and available data dictate appropriate machine learning methods to use. Table 2 lists prototypic examples of machine learning applications for medical diagnosis and treatment.

We expect that applications of machine learning will have a profound impact on many aspects of health management as computers optimized for machine learning increase in power and as infrastructure for accurate data collection and curation becomes more widely deployed. Immediate biomedical opportunities summarized in the following sections include earlier and more accurate disease detection, better diagnosis, and more durable and tolerable treatments. Of course, the accuracy of the underlying "learned" relationships depends on the accuracy and magnitude of the data on which learning is based. This can be enhanced substantially by widely deploying standardized electronic medical record systems designed specifically to support machine learning and by supporting their widespread use. Acquisition of data "at home" using smartphones, commercial home assistant devices (e.g., Amazon Echo, Google Home), and other electronic devices will further enhance robust biomedical machine learning. Looking ahead, we envision these trends merging to enable outcomes-based personalized management of patient health (Figure 1) using algorithms that increase in accuracy as the quantity and quality of data grow.

In this Perspective, we outline a vision for how machine learning can be applied to make critical advances in
biomedicine. We focus on three biomedical areas: improved clinical diagnostics, precision treatment, and health management and monitoring. For each area, we describe opportunities for machine learning applications to enable new insights or improve on current state-of-the-art approaches, discuss successful early applications of machine learning, and highlight unmet needs to be addressed. We conclude by identifying several cross-cutting challenges that, if solved, will help realize the full potential of machine learning in biomedicine.

Table 2. Example Applications of Machine Learning for Diagnosis and Treatment

Dataset | Goals | Successes | Data Type | ML Method
Patient molecular profiles without clinical data | (1) Discover subtypes or stratify patients; (2) Identify similarities among clustered patients | Cancer subtyping (Curtis et al., 2012; Gao et al., 2019) | High-dimensional, structured data; Unlabeled data | Unsupervised clustering for cluster discovery; supervised learning and deep learning for subtype assignment
Patient or laboratory molecular profiles with clinical data | Predict most efficacious therapies | Cancer cell line drug response prediction (Chiu et al., 2019; Costello et al., 2014) | High-dimensional, structured data; Labeled data | Supervised learning, deep learning, and ensemble learning
Images and associated diagnoses | Automated diagnoses | Medical imaging diagnostics (Liu et al., 2019) | Unstructured data; Labeled data | Deep learning
EMR data + clinical outcomes | Predict clinical outcomes | Diagnosis of gestational diabetes (Artzi et al., 2020); patient similarity (Lee et al., 2018) | Structured and unstructured data; Labeled data | Traditional machine learning on structured data with labels; deep learning/natural language processing to mine unstructured data; federated learning
Wearable and home device ambient data collection | Early diagnoses | Detection of atrial fibrillation (Bumgarner et al., 2018) and agonal breathing, an audible biomarker of cardiac arrest (Chan et al., 2019) | Unstructured, longitudinal data; Labeled data | Both supervised and deep learning approaches, with adjustments made for time-series analyses
Deep longitudinal data | Ongoing health management | None yet due to lack of available datasets | Structured and unstructured data; Unlabeled data | Continuous learning

Improved Diagnostics from Clinical Imaging and Molecular Tests
Technological advances in clinical testing are generating orders of magnitude more data than tests in the past. High-fidelity imaging tests now produce large two-, three-, or four-dimensional (the fourth dimension being time) images of tissue and organs, and molecular tests can provide assessment of hundreds or even thousands of genes and proteins. Machine learning is both essential and ubiquitous for automated analysis of diagnostic features in these data that are strongly associated with disease type, status, or response to treatment.

The use of deep learning to extract meaning from biomedical images is one of the most active areas of current research. Several recent publications have shown that computer-aided detection (CAD) software using machine learning can interpret radiologic images on par with medical professionals, indicating the power of this approach. For example, deep learning-based CAD software was able to detect diabetic retinopathy at high levels of accuracy (Gulshan et al., 2016) and to retrospectively identify invasive and in situ breast cancer of all grades similar to radiologists (McKinney et al., 2020; Wu et al., 2019). A recent review found that deep learning-based approaches performed as well as medical professionals across a range of medical imaging diagnostic tasks, although many of these studies are small and have yet to perform a prospective evaluation (Liu et al., 2019). Importantly, deep learning approaches benefit from large datasets and will increase continually in accuracy as the sizes of the training datasets grow.

Molecular assays can identify genetic mutations and quantify gene expression levels and protein abundance from a variety of samples, including blood, saliva, and tissue. Machine learning has the potential to increase the utility of these data by discovering complex sets of biomarkers associated with various disease states, which ultimately can inform patient outcome and identify effective treatment strategies. Some examples from cancer biology include using DNA methylation (Kang et al., 2017) and nucleosome positioning (Heitzer et al., 2019) from blood to predict tumor tissue of origin, quantifying cellular pathway activation levels in biopsies and other tissue samples (Way and Greene, 2019; Way et al., 2018), predicting genomic features of brain cancers using magnetic resonance images (Chang et al., 2018a), and forecasting cancer patient outcomes based on multi-omics (Chaudhary et al., 2018) or imaging-omics integrations (Mobadersany et al., 2018). Beyond cancer, machine learning has been used to identify individuals with sleep deprivation through analysis of mRNA in the blood, informing how sleep insufficiencies negatively affect health (Laing et al., 2019). Through integration of multiple data types
and biomarkers, machine learning models are likely to be substantially more accurate than current practice, which is often limited to a few markers and reflects only a narrow view of complex diseases.

Figure 1. How Machine Learning Applications Could Help Individuals Maintain Health
At home, machine learning may help in early detection of disease, monitoring response to treatment, and adherence to treatment regimens. In the clinic or hospital, machine learning may aid medical professionals to diagnose and tune treatment for an individual patient. The dashed line shows how a patient moves between home and clinical settings and how machine learning can help at each step to maintain health.

Joint human-computer diagnostic approaches, such as those illustrated in Figure 1, are likely to become common because they take advantage of the strengths of both humans and computers. In this collaborative approach, physicians will make a final diagnosis by integrating all available information, including that provided by machine learning systems (Ahuja, 2019). Machine learning systems will have a key role by automating routine diagnosis, flagging challenging cases that require more human input, and providing additional information useful in making diagnoses (e.g., Ardila et al., 2019). Moreover, machine learning systems may use different features than medical professionals to make diagnoses, though care will be required to assess the biological utility of such features. As a result, approaches that integrate knowledge from both medical professionals and advanced algorithms will lead to improved diagnoses. Ensuring that machine learning software is transparent will be critical before widespread deployment and adoption. "Transparency" in this context includes description of the optimized objectives, strengths, quantitative performance, and limitations of a particular algorithm (Cai et al., 2019) as well as the procedures used to validate the algorithm. These attributes will help medical professionals decide when and how to use machine learning applications to obtain valid results and improve decision making. Applications that use machine learning can help build trust in the system and facilitate deeper understanding of the underlying biological mechanism of disease by explaining predictions, such as by highlighting the most important features used (Ching et al., 2018; Litjens et al., 2016).

As more advanced clinical testing technologies are coupled with machine learning, it will be important to consider tradeoffs between disease detection rates, patient outcomes, and other factors that impact patient health and quality of life. Disease detection rates may increase with the use of machine learning technologies, and disease-specific research will be needed to
differentiate indolent versus fatal disease to avoid over-treatment and to identify disease subtypes in order to guide the selection of the most effective treatments for each subtype. Careful framing of clinical goals that can be connected to evaluation and validation metrics will ensure that machine learning improves patient care and overall health (Chen et al., 2019b).

Precision Treatment through Multiscale Modeling and Expert Guidance
One of the most promising application areas for machine learning is precision medicine, where a patient receives medical care and treatment tailored to their personal disease profile. Precision oncology, where the goal is to prescribe cancer treatments based on tumor molecular characteristics, is a prime example of the challenges and opportunities for machine learning in precision medicine. In current practice, individual molecular markers such as somatic mutations and gene expression levels are often used to inform treatment selection. However, responses are often highly variable between patients due to differences at other genomic and epigenomic loci as well as anatomic disease distribution (Brown et al., 2019; Kobayashi and Mitsudomi, 2016; Rotow and Bivona, 2017). Further complicating precision oncology is that there are hundreds of potential drugs, and not every combination can be tested for every disease profile (Gerstung et al., 2017; Kurnit et al., 2018).

One way that machine learning can help overcome these challenges is through the development of multifactorial predictive models that are robust against individual diversity. For example, single-purpose models have been built to forecast the functional consequences of biological changes, such as how genetic mutations influence splicing and gene expression (Xiong et al., 2015) as well as transcription factor binding (Chen et al., 2019a). Machine learning models have also been built to predict drug response in cancer cell lines (Chang et al., 2018b), transfer predictions from cell lines to patient tumors (Chiu et al., 2019), and forecast patient response to therapies based on clinical response data (Huang et al., 2018). Future advancements in modeling for precision therapeutics are likely to operate over multiple scales and serve multiple purposes. Multiscale modeling will use large biological datasets to investigate the growth and development of an organism across diverse temporal and spatial domains. Already there are computational models of human-virus interactions (Lasso et al., 2019), cell-cell interplay such as tumor-immune cell interactions, and even whole cells (Metzcar et al., 2019; Rahman et al., 2017; Sakamoto et al., 2018). Eventually, we anticipate that computational models of organs and entire individuals, so-called "digital twins" (Björnsson et al., 2019), will be developed. The goal of digital twins will be multifaceted, such as predicting the efficacy of different combination therapies that have never been used together and modeling the impact of disease on different organs.

While multiscale models may become accurate enough that their predictions can be used directly for treatment, we envision an intermediate stage in which machine learning approaches generate a ranked list of suggested therapies that can be used by expertly trained physicians to help guide treatment decisions. For instance, patient-derived laboratory models could be used to test predictions from computational models, with the best-performing predictions recommended for use in treatment. This hybrid approach has many advantages: machine learning models can dramatically reduce the space of potential treatment combinations to be considered and identify others that might otherwise be overlooked. An experimental validation step could be added to provide additional evidence that a predicted therapy is likely to be effective.

Precision medicine will also be advanced by using machine learning to automatically mine and search expert knowledge in published literature and patient databases (Rajkomar et al., 2019). Patient databases, usually in the form of electronic health records (EHRs), represent a rich source of information about diagnosis, treatment, and treatment response for large patient cohorts. Early efforts have attempted to use natural language processing algorithms to mine publications (Dong et al., 2018), EHRs (Shickel et al., 2018), and clinical reports (Kreimeyer et al., 2017; Pons et al., 2016) for useful knowledge, such as biomarker-therapy associations and biological pathways of interest. Other applications have used structured information from EHRs to predict disease onset (Artzi et al., 2020). Machine learning will help harness this information and make it useful for precision medicine through advanced approaches that address the unstructured nature of data and metadata in publications and EHRs. Of course, the EHR mining approach assumes that the information needed to establish a useful association is accurately and completely captured. Unfortunately, this is not always the case, and future work will be needed to increase the utility of EHR analyses.

Health Management and Monitoring
We envision a shift in how complex diseases are treated, moving from the goal of a cure to one of disease management. This comprehensive health management approach will strive to maintain health through a range of diseases and the normal aging process. Health management is demanding, because it requires ongoing monitoring of all aspects of health for potential disease, choosing treatments suited to individual patients, and adapting treatments based on patient response (Figure 2). Here, machine learning has a key role to play, largely by integrating many of the ideas already discussed for diagnosis and treatment into a continuous learning approach.

Outside of clinical settings, wearable devices and at-home smart electronic devices provide a new avenue for health management. These devices can collect large amounts of fine-grained data on patient health status that can be used by machine learning applications to suggest one-time actions, changes in daily activities, or referral to a physician for assessment and testing. Wearable devices now include sensors for motion, pulse, respiratory rate, body temperature, blood pressure, oxygen levels, and other biometrics. Prototype applications show how data from wearables might be useful, including diabetes management (Chang et al., 2016), detection of atrial fibrillation (Bumgarner et al., 2018), blood cholesterol monitoring (Fu and Guo, 2018), early detection of Parkinson's disease (Lonini et al., 2018), self-adherence to medications (Car et al., 2017; Toh et al., 2016), and early warning of heart attack (Sahoo et al., 2017). Speech-driven home assistants have been used to detect agonal breathing, an audible biomarker that is an early
sign of cardiac arrest (Chan et al., 2019). In the future, machine learning software is likely to be used to identify new biomarkers from wearable and audio sensor data, perhaps by integrating data across different types of devices. Both traditional supervised learning and deep learning are likely to play roles in developing models from wearable data.

Figure 2. Integrating Data and Machine Learning Models for Continuous and Personalized Health Management
Combining data collected from both home (left) and clinical settings (right), or combining predictive models built at home and in the clinic, has the potential to lead to comprehensive and integrated models that support personalized health management. Comprehensive models are more likely to perform well as they incorporate more information about an individual, and these models have the potential to be applied in the home, clinic, or wherever an individual may be.

Using machine learning together with data collected from smartphones provides new opportunities for diagnostics as well. Deep learning approaches have been applied to analyze pictures from smartphone cameras to identify different types of skin cancers (Esteva et al., 2017) and also to diagnose diabetic retinopathy (Micheletti et al., 2016). Recent studies have found that sensory data (e.g., voice, tapping, response time, and accelerometer data) collected from smartphones and processed using machine learning can be used to monitor symptoms and progression of Parkinson's disease (Arora et al., 2015; Espay et al., 2016; Ginis et al., 2016; Pereira et al., 2016). These prototype applications suggest a role for machine learning where wearables, home devices, and smartphones are used to capture all kinds of data, including biometric measurements, photos, dietary intake, and even environmental information (i.e., the "exposome" [Vermeulen et al., 2020]). By connecting this information with diagnoses, machine learning will be used to identify patterns within the data that suggest a particular diagnosis.

The foundation of health management is the ongoing monitoring of individual behavior and body functioning through home and wearable devices together with readouts from routine blood sampling. Personalized models of baseline functions and activity will be built by customizing population-level models with data collected for each individual. A key advantage of this approach is that personal baselines can be established and deviations from baselines that may indicate a change in health status can be detected. Using personalized models, machine learning applications will monitor individuals for any changes from normal and notify individuals when a change requires consultation with a medical professional. An interesting possibility along these lines is suggested by recent work showing that monitoring of individual internet symptom searches (in essence, self-reported health issues such as weight loss, bronchitis, cough, chest pain, etc.) coupled with machine-learned tendencies from many individuals can enable early detection of lung (White and Horvitz, 2017) and pancreatic (Paparrizos et al., 2016) cancers. This could lead to a physician or patient alert system that recommends medical attention when a more serious issue may explain the seemingly innocuous symptoms searched for. Of course, many issues regarding privacy would have to be overcome to make this possible.

Once in a clinical setting, high-fidelity imaging and molecular testing will be interpreted by medical professionals with the help of machine learning to identify noteworthy biomarkers and make a final diagnosis. Disease diagnoses that require treatment will use multiscale modeling and automated search results for similar patients to inform treatment selection.

After diagnosis and treatment, health management begins again with ongoing monitoring of individual health. This time, however, there are multiple goals that a machine learning system must meet: monitor how the individual is responding to treatment, watch for any adverse effects, and monitor overall health and
changes from baseline not accounted for by treatment. Machine learning will help adapt the initial personalized model to include the new diagnosis and therapy information, creating an expected trajectory on treatment that will serve as the new baseline.

Health management across a person’s lifespan will require data integration and modeling at a level of sophistication and automation that is only possible with machine learning. Each step in health management (building personalized models and using them to monitor for and accurately detect anomalies, aiding physicians in diagnosis and treatment through automated processing of large datasets and patient databases, and updating individual models for new diagnoses and treatments) is data intensive and requires automated pattern recognition of complex datasets. Health management systems will also learn continuously, as models are updated when new data become available. Two general approaches for continuous learning are to build new predictive models or to update existing models, and more work is needed to understand the strengths and limitations of these approaches for different applications.

Challenges and Concluding Thoughts

For machine learning to play a transformative role in diagnosis and treatment, it is necessary to develop high-quality, well-curated datasets. High-quality datasets have several important benefits: they improve the predictive power of machine learning methods while reducing the size of the data needed for training and the complexity of the learned representations. Famously, machine learning approaches for image recognition accelerated when ImageNet (Deng et al., 2009), a corpus of labeled and ontologically linked images, was introduced. Similar efforts in biomedicine are needed across the variety of tasks where machine learning may be applied.

Creating high-quality datasets for machine learning applications in diagnosis and treatment will require addressing technical, legal, and economic issues that often result in siloed biomedical data that are not standardized. As discussed above, federated learning provides a technical solution for combining data among siloed systems because no actual data movement is necessary and individual privacy can be protected. Wearables and home devices provide a way to collect accurate data, and machine learning can be used as a preprocessing step to extract accurate analytic and clinical data from unstructured sources such as electronic health records and publications. Legal procedures must be developed for the secure management and analysis of private health information (PHI), and community and legal standards that define the performance of these procedures must be established. Biomedical institutions and individuals must be incentivized to engage in data standardization and sharing. Similarly, insurers, the pharmaceutical industry, and agencies that support biomedical research must be willing to invest in the infrastructure, data acquisition, and data curation required to generate high-quality data.

Approaches and incentives for data sharing that promote diversity in the datasets used for learning are needed as well. This includes national and international data sharing standards that make it possible to obtain data from both major medical centers and community clinics. It is likely, for instance, that machine learning applications that improve patient treatment response in major medical centers may not perform well in community settings due to differences in overall care and patient populations. However, the ultimate goal of biomedical data collection for machine learning is to obtain suitable representative data from patient cohorts to develop accurate machine learning models that will generalize to diverse populations. Therefore, there must be a concerted effort to also account for variables such as patient status prior to treatment, treatment regimes, age, gender, race, ethnicity, and environmental exposures.

Rigorous evaluation approaches are needed for biomedical machine learning applications, especially in settings where continuous learning is required. In our view, the performance of a machine learning system is best measured by the accuracy of its predictions in a prospective setting. We advocate for an iterative approach to machine learning that includes training with retrospective data, algorithm lock-down and deployment, and then assessment of the application’s accuracy based on predictions obtained during deployment. Data collected during deployment, coupled with additional or larger retrospective datasets, can then be used to retrain and optimize the algorithm, followed by a subsequent deployment-evaluation cycle. Evaluating continuous learning systems, such as those we envision for health monitoring that must adapt to changes in health status or habits, will likely require tightening this loop and using data collected during the deployment phase to detect limitations or failures. Quantifying not only accuracy but also confidence intervals is critical, as some uses of machine learning will be more tolerant to inexact predictions than others and confidence intervals can be used by physicians to inform decision making. Iteratively training and deploying machine learning applications poses regulatory challenges, as most diagnostic and therapeutic tests assume that models and data are fixed. When models are updated in response to new data or adapted for new diagnoses or treatments, ongoing evaluation is needed to ensure that predictions remain accurate. Real or simulated datasets that are multi-modal, expansive, and longitudinal will be needed to ensure robust evaluation of biomedical machine learning applications.

While the challenges outlined above are significant, we are optimistic that they can be overcome. Further, we believe the effort is worthwhile, as success promises a future of rigorous, outcomes-based medicine with detection, diagnosis, and treatment strategies that are continuously adapted via machine learning to individual and environmental differences and that enable comprehensive health management.

ACKNOWLEDGMENTS

J.G. and J.W.G. are supported by National Cancer Institute (U2C CA233280). J.G. is supported by National Cancer Institute (U24 CA231877). L.M.H. and J.W.G. are supported by the National Institutes of Health (U54 HG008100) and National Cancer Institute (U54 CA209988). J.G., L.M.H., and J.W.G. are supported by the Prospect Creek Foundation. J.W.G. is supported by the Susan G. Komen Foundation. L.M.H. is supported by the Breast Cancer Research Foundation and The Jayne Koskinas Ted Giovanis Foundation.

DECLARATION OF INTERESTS

J.W.G. receives research support from Micron and ThermoFisher and has stock in NVIDIA, Microsoft, Amazon, Google (Alphabet), and GE.



REFERENCES

AACR Project GENIE Consortium (2017). AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 7, 818–831.

Ahuja, A.S. (2019). The impact of artificial intelligence in medicine on the future role of the physician. PeerJ 7, e7702.

Andreoletti, G., Pal, L.R., Moult, J., and Brenner, S.E. (2019). Reports from the fifth edition of CAGI: The Critical Assessment of Genome Interpretation. Hum. Mutat. 40, 1197–1201.

Ardila, D., Kiraly, A.P., Bharadwaj, S., Choi, B., Reicher, J.J., Peng, L., Tse, D., Etemadi, M., Ye, W., Corrado, G., et al. (2019). End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25, 954–961.

Arora, S., Venkataraman, V., Zhan, A., Donohue, S., Biglan, K.M., Dorsey, E.R., and Little, M.A. (2015). Detecting and monitoring the symptoms of Parkinson’s disease using smartphones: A pilot study. Parkinsonism Relat. Disord. 21, 650–653.

Artzi, N.S., Shilo, S., Hadar, E., Rossman, H., Barbash-Hazan, S., Ben-Haroush, A., Balicer, R.D., Feldman, B., Wiznitzer, A., and Segal, E. (2020). Prediction of gestational diabetes based on nationwide electronic health records. Nat. Med. 26, 71–76.

Batmaz, Z., Yurekli, A., Bilge, A., and Kaleli, C. (2019). A review on deep learning for recommender systems: challenges and remedies. Artif. Intell. Rev. 52, 1–37.

Björnsson, B., Borrebaeck, C., Elander, N., Gasslander, T., Gawel, D.R., Gustafsson, M., Jörnsten, R., Lee, E.J., Li, X., Lilja, S., et al.; Swedish Digital Twin Consortium (2019). Digital twins to personalize medicine. Genome Med. 12, 4.

Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L.D., Monfort, M., Muller, U., Zhang, J., et al. (2016). End to End Learning for Self-Driving Cars. arXiv, arXiv:1604.07316.

Brown, B.P., Zhang, Y.-K., Westover, D., Yan, Y., Qiao, H., Huang, V., Du, Z., Smith, J.A., Ross, J.S., Miller, V.A., et al. (2019). On-target resistance to the mutant-selective EGFR inhibitor osimertinib can develop in an allele specific manner dependent on the original EGFR activating mutation. Clin. Cancer Res. Published online February 22, 2019. https://doi.org/10.1158/1078-0432.CCR-18-3829.

Bumgarner, J.M., Lambert, C.T., Hussein, A.A., Cantillon, D.J., Baranowski, B., Wolski, K., Lindsay, B.D., Wazni, O.M., and Tarakji, K.G. (2018). Smartwatch Algorithm for Automated Detection of Atrial Fibrillation. J. Am. Coll. Cardiol. 71, 2381–2388.

Cai, C.J., Winter, S., Steiner, D., Wilcox, L., and Terry, M. (2019). “Hello AI”: Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-Making. Proc. ACM Hum. Comput. Interact. 3, 104.

Camacho, D.M., Collins, K.M., Powers, R.K., Costello, J.C., and Collins, J.J. (2018). Next-Generation Machine Learning for Biological Networks. Cell 173, 1581–1592.

Car, J., Tan, W.S., Huang, Z., Sloot, P., and Franklin, B.D. (2017). eHealth in the future of medications management: personalisation, monitoring and adherence. BMC Med. 15, 73.

Chan, J., Rea, T., Gollakota, S., and Sunshine, J.E. (2019). Contactless cardiac arrest detection using smart devices. NPJ Digit. Med. 2, 52.

Chang, S., Chiang, R., Wu, S., and Chang, W. (2016). A Context-Aware, Interactive M-Health System for Diabetics. IT Prof. 18, 14–22.

Chang, P., Grinband, J., Weinberg, B.D., Bardis, M., Khy, M., Cadena, G., Su, M.-Y., Cha, S., Filippi, C.G., Bota, D., et al. (2018a). Deep-Learning Convolutional Neural Networks Accurately Classify Genetic Mutations in Gliomas. AJNR Am. J. Neuroradiol. 39, 1201–1207.

Chang, Y., Park, H., Yang, H.-J., Lee, S., Lee, K.-Y., Kim, T.S., Jung, J., and Shin, J.-M. (2018b). Cancer Drug Response Profile scan (CDRscan): A Deep Learning Model That Predicts Drug Effectiveness from Cancer Genomic Signature. Sci. Rep. 8, 8857.

Chaudhary, K., Poirion, O.B., Lu, L., and Garmire, L.X. (2018). Deep Learning-Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer. Clin. Cancer Res. 24, 1248–1259.

Chen, K.M., Cofer, E.M., Zhou, J., and Troyanskaya, O.G. (2019a). Selene: a PyTorch-based deep learning library for sequence data. Nat. Methods 16, 315–318.

Chen, P.C., Liu, Y., and Peng, L. (2019b). How to develop machine learning models for healthcare. Nat. Mater. 18, 410–414.

Ching, T., Himmelstein, D.S., Beaulieu-Jones, B.K., Kalinin, A.A., Do, B.T., Way, G.P., Ferrero, E., Agapow, P.-M., Zietz, M., Hoffman, M.M., et al. (2018). Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 20170387.

Chiu, Y.-C., Chen, H.H., Zhang, T., Zhang, S., Gorthi, A., Wang, L.-J., Huang, Y., and Chen, Y. (2019). Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC Med. Genomics 12 (Suppl 1), 18.

Costello, J.C., Heiser, L.M., Georgii, E., Gönen, M., Menden, M.P., Wang, N.J., Bansal, M., Ammad-ud-din, M., Hintsanen, P., Khan, S.A., et al.; NCI DREAM Community (2014). A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol. 32, 1202–1212.

Curtis, C., Shah, S.P., Chin, S.F., Turashvili, G., Rueda, O.M., Dunning, M.J., Speed, D., Lynch, A.G., Samarajiwa, S., Yuan, Y., et al.; METABRIC Group (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352.

Deng, J., Dong, W., Socher, R., Li, L., Kai, L., and Li, F.-F. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE), pp. 248–255.

Dong, W., Wang, X., Xia, Z., Zhang, X., and Yang, H. (2018). A legacy of the “1% program” - The “Chinese Chapter” of the human genome reference sequence. J. Genet. Genomics 45, 565–568.

Ellis, M.J., Gillette, M., Carr, S.A., Paulovich, A.G., Smith, R.D., Rodland, K.K., Townsend, R.R., Kinsinger, C., Mesri, M., Rodriguez, H., and Liebler, D.C.; Clinical Proteomic Tumor Analysis Consortium (CPTAC) (2013). Connecting genomic alterations to cancer biology with proteomics: the NCI Clinical Proteomic Tumor Analysis Consortium. Cancer Discov. 3, 1108–1112.

Espay, A.J., Bonato, P., Nahab, F.B., Maetzler, W., Dean, J.M., Klucken, J., Eskofier, B.M., Merola, A., Horak, F., Lang, A.E., et al.; Movement Disorders Society Task Force on Technology (2016). Technology in Parkinson’s disease: Challenges and opportunities. Mov. Disord. 31, 1272–1282.

Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., and Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118.

Fu, Y., and Guo, J. (2018). Blood Cholesterol Monitoring With Smartphone as Miniaturized Electrochemical Analyzer for Cardiovascular Disease Prevention. IEEE Trans. Biomed. Circuits Syst. 12, 784–790.

Gao, F., Wang, W., Tan, M., Zhu, L., Zhang, Y., Fessler, E., Vermeulen, L., and Wang, X. (2019). DeepCC: a novel deep learning-based framework for cancer molecular subtype classification. Oncogenesis 8, 44.

Gerstung, M., Papaemmanuil, E., Martincorena, I., Bullinger, L., Gaidzik, V.I., Paschka, P., Heuser, M., Thol, F., Bolli, N., Ganly, P., et al. (2017). Precision oncology for acute myeloid leukemia using a knowledge bank approach. Nat. Genet. 49, 332–340.

Ginis, P., Nieuwboer, A., Dorfman, M., Ferrari, A., Gazit, E., Canning, C.G., Rocchi, L., Chiari, L., Hausdorff, J.M., and Mirelman, A. (2016). Feasibility and effects of home-based smartphone-delivered automated feedback training for gait in people with Parkinson’s disease: A pilot randomized controlled trial. Parkinsonism Relat. Disord. 22, 28–34.

Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T., Cuadros, J., et al. (2016). Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA 316, 2402–2410.



Heitzer, E., Haque, I.S., Roberts, C.E.S., and Speicher, M.R. (2019). Current and future perspectives of liquid biopsies in genomics-driven oncology. Nat. Rev. Genet. 20, 71–88.

Huang, C., Clayton, E.A., Matyunina, L.V., McDonald, L.D., Benigno, B.B., Vannberg, F., and McDonald, J.F. (2018). Machine learning predicts individual cancer patient responses to therapeutic drugs with high accuracy. Sci. Rep. 8, 16444.

Kang, S., Li, Q., Chen, Q., Zhou, Y., Park, S., Lee, G., Grimes, B., Krysan, K., Yu, M., Wang, W., et al. (2017). CancerLocator: non-invasive cancer diagnosis and tissue-of-origin prediction using methylation profiles of cell-free DNA. Genome Biol. 18, 53.

Kobayashi, Y., and Mitsudomi, T. (2016). Not all epidermal growth factor receptor mutations in lung cancer are created equal: Perspectives for individualized treatment strategy. Cancer Sci. 107, 1179–1186.

Konecný, J., McMahan, H.B., Yu, F.X., Richtárik, P., Suresh, A.T., and Bacon, D. (2016). Federated learning: Strategies for improving communication efficiency. arXiv, arXiv:1610.05492.

Kreimeyer, K., Foster, M., Pandey, A., Arya, N., Halford, G., Jones, S.F., Forshee, R., Walderhaug, M., and Botsis, T. (2017). Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J. Biomed. Inform. 73, 14–29.

Kurnit, K.C., Ileana Dumbrava, E.E., Litzenburger, B.C., Khotskaya, Y.B., Johnson, A., Yap, T.A., Rodon, J., Zeng, J., Shufean, M.A., Bailey, A., et al. (2018). Precision Oncology Decision Support: Current Approaches And Strategies For The Future. Clin. Cancer Res. 24, 2719–2731.

Laing, E.E., Möller-Levet, C.S., Dijk, D.-J., and Archer, S.N. (2019). Identifying and validating blood mRNA biomarkers for acute and chronic insufficient sleep in humans: a machine learning approach. Sleep 42. https://doi.org/10.1093/sleep/zsy186.

Lasso, G., Mayer, S.V., Winkelmann, E.R., Chu, T., Elliot, O., Patino-Galindo, J.A., Park, K., Rabadan, R., Honig, B., and Shapira, S.D. (2019). A Structure-Informed Atlas of Human-Virus Interactions. Cell 178, 1526–1541.

Lee, J., Sun, J., Wang, F., Wang, S., Jun, C.-H., and Jiang, X. (2018). Privacy-Preserving Patient Similarity Learning in a Federated Environment: Development and Analysis. JMIR Med. Inform. 6, e20.

Litjens, G., Sánchez, C.I., Timofeeva, N., Hermsen, M., Nagtegaal, I., Kovacs, I., Hulsbergen-van de Kaa, C., Bult, P., van Ginneken, B., and van der Laak, J. (2016). Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep. 6, 26286.

Liu, X., Faes, L., Kale, A.U., Wagner, S.K., Fu, D.J., Bruynseels, A., Mahendiran, T., Moraes, G., Shamdas, M., Kern, C., et al. (2019). A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet. Digit. Health 1, e271–e297.

Lonini, L., Dai, A., Shawen, N., Simuni, T., Poon, C., Shimanovich, L., Daeschler, M., Ghaffari, R., Rogers, J.A., and Jayaraman, A. (2018). Wearable sensors for Parkinson’s disease: which data are worth collecting for training symptom detection models. NPJ Digit. Med. 1, 64.

McKinney, S.M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., Back, T., Chesus, M., Corrado, G.C., Darzi, A., et al. (2020). International evaluation of an AI system for breast cancer screening. Nature 577, 89–94.

Metzcar, J., Wang, Y., Heiland, R., and Macklin, P. (2019). A Review of Cell-Based Computational Modeling in Cancer Biology. JCO Clin. Cancer Inform. 3, 1–13.

Micheletti, J.M., Hendrick, A.M., Khan, F.N., Ziemer, D.C., and Pasquel, F.J. (2016). Current and Next Generation Portable Screening Devices for Diabetic Retinopathy. J. Diabetes Sci. Technol. 10, 295–300.

Mobadersany, P., Yousefi, S., Amgad, M., Gutman, D.A., Barnholtz-Sloan, J.S., Velázquez Vega, J.E., Brat, D.J., and Cooper, L.A.D. (2018). Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl. Acad. Sci. USA 115, E2970–E2979.

Paparrizos, J., White, R.W., and Horvitz, E. (2016). Screening for Pancreatic Adenocarcinoma Using Signals From Web Search Logs: Feasibility Study and Results. J. Oncol. Pract. 12, 737–744.

Pereira, C.R., Weber, S.A.T., Hook, C., Rosa, G.H., and Papa, J.P. (2016). Deep Learning-Aided Parkinson’s Disease Diagnosis from Handwritten Dynamics. In Proceedings of the 29th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) (IEEE), pp. 340–346.

Pons, E., Braun, L.M.M., Hunink, M.G.M., and Kors, J.A. (2016). Natural Language Processing in Radiology: A Systematic Review. Radiology 279, 329–343.

Rahman, M.M., Feng, Y., Yankeelov, T.E., and Oden, J.T. (2017). A fully coupled space-time multiscale modeling framework for predicting tumor growth. Comput. Methods Appl. Mech. Eng. 320, 261–286.

Rajkomar, A., Dean, J., and Kohane, I. (2019). Machine Learning in Medicine. N. Engl. J. Med. 380, 1347–1358.

Rotow, J., and Bivona, T.G. (2017). Understanding and targeting resistance mechanisms in NSCLC. Nat. Rev. Cancer 17, 637–658.

Saez-Rodriguez, J., Costello, J.C., Friend, S.H., Kellen, M.R., Mangravite, L., Meyer, P., Norman, T., and Stolovitzky, G. (2016). Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat. Rev. Genet. 17, 470–486.

Sage Bionetworks (2020). DREAM Challenges. http://dreamchallenges.org/.

Sahoo, P.K., Thakkar, H.K., and Lee, M.-Y. (2017). A Cardiac Early Warning System with Multi Channel SCG and ECG Monitoring for Mobile Health. Sensors (Basel) 17, E711.

Sakamoto, M., Ikeyama, N., Yuki, M., and Ohkuma, M. (2018). Draft Genome Sequence of Faecalimonas umbilicata JCM 30896T, an Acetate-Producing Bacterium Isolated from Human Feces. Microbiol. Resour. Announc. 7, e01091-18.

Shickel, B., Tighe, P.J., Bihorac, A., and Rashidi, P. (2018). Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE J. Biomed. Health Inform. 22, 1589–1604.

Shilo, S., Rossman, H., and Segal, E. (2020). Axes of a revolution: challenges and promises of big data in healthcare. Nat. Med. 26, 29–38.

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144.

Stokes, J.M., Yang, K., Swanson, K., Jin, W., Cubillos-Ruiz, A., Donghia, N.M., MacNair, C.R., French, S., Carfrae, L.A., Bloom-Ackerman, Z., et al. (2020). A Deep Learning Approach to Antibiotic Discovery. Cell 180, 688–702.

Toh, X., Tan, H., Liang, H., and Tan, H. (2016). Elderly medication adherence monitoring with the Internet of Things. In Proceedings of the 2016 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops) (IEEE), pp. 1–6.

Tomczak, K., Czerwińska, P., and Wiznerowicz, M. (2015). The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp. Oncol. (Pozn.) 19 (1A), A68–A77.

Tsherniak, A., Vazquez, F., Montgomery, P.G., Weir, B.A., Kryukov, G., Cowley, G.S., Gill, S., Harrington, W.F., Pantel, S., Krill-Burger, J.M., et al. (2017). Defining a Cancer Dependency Map. Cell 170, 564–576.

Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. (2001). The sequence of the human genome. Science 291, 1304–1351.

Vermeulen, R., Schymanski, E.L., Barabási, A.-L., and Miller, G.W. (2020). The exposome and health: Where chemistry meets biology. Science 367, 392–396.

Way, G.P., and Greene, C.S. (2019). Discovering Pathway and Cell Type Signatures in Transcriptomic Compendia with Machine Learning. Annu. Rev. Biomed. Data Sci. 2, 1–17.

Way, G.P., Sanchez-Vega, F., La, K., Armenia, J., Chatila, W.K., Luna, A., Sander, C., Cherniack, A.D., Mina, M., Ciriello, G., et al.; Cancer Genome Atlas Research Network (2018). Machine Learning Detects Pan-cancer Ras Pathway Activation in The Cancer Genome Atlas. Cell Rep. 23, 172–180.

White, R.W., and Horvitz, E. (2017). Evaluation of the Feasibility of Screening Patients for Early Signs of Lung Carcinoma in Web Search Logs. JAMA Oncol. 3, 398–401.

Wu, N., Phang, J., Park, J., Shen, Y., Huang, Z., Zorin, M., Jastrzebski, S., Fevry, T., Katsnelson, J., Kim, E., et al. (2019). Deep Neural Networks Improve Radiologists’ Performance in Breast Cancer Screening. IEEE Trans. Med. Imaging PP, 1-1.

Xiong, H.Y., Alipanahi, B., Lee, L.J., Bretschneider, H., Merico, D., Yuen, R.K.C., Hua, Y., Gueroussov, S., Najafabadi, H.S., Hughes, T.R., et al. (2015). RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806.

Yang, Q., Liu, Y., Chen, T., and Tong, Y. (2019). Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 10, 12.

Zhang, J., Bajari, R., Andric, D., Gerthoffert, F., Lepsa, A., Nahal-Bose, H., Stein, L.D., and Ferretti, V. (2019). The International Cancer Genome Consortium Data Portal. Nat. Biotechnol. 37, 367–369.

Zitnik, M., Nguyen, F., Wang, B., Leskovec, J., Goldenberg, A., and Hoffman, M.M. (2019). Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Inf. Fusion 50, 71–91.
