
Clinical Infectious Diseases

INVITED ARTICLE
HEALTHCARE EPIDEMIOLOGY: Robert A. Weinstein, Section Editor

Machine Learning for Healthcare: On the Verge of a Major Shift in Healthcare Epidemiology

Jenna Wiens (1) and Erica S. Shenoy (2,3,4)

(1) Computer Science and Engineering, University of Michigan, Ann Arbor; (2) Infection Control Unit and (3) Division of Infectious Diseases, Department of Medicine, Massachusetts General Hospital, and (4) Harvard Medical School, Boston, Massachusetts

Received 1 May 2017; editorial decision 4 August 2017; accepted 14 August 2017; published online August 21, 2017.
Correspondence: J. Wiens, 2260 Hayward Street, University of Michigan, Ann Arbor, MI 48109 ([email protected]).
Clinical Infectious Diseases 2018;66(1):149–53
© The Author(s) 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: [email protected].
DOI: 10.1093/cid/cix731

The increasing availability of electronic health data presents a major opportunity for both discovery and practical applications to improve healthcare. However, for healthcare epidemiologists to best use these data, computational techniques that can handle large, complex datasets are required. Machine learning (ML), the study of tools and methods for identifying patterns in data, can help. The appropriate application of ML to these data promises to transform patient risk stratification broadly in the field of medicine and especially in infectious diseases. This, in turn, could lead to targeted interventions that reduce the spread of healthcare-associated pathogens. In this review, we begin with an introduction to the basics of ML. We then move on to discuss how ML can transform healthcare epidemiology, providing examples of successful applications. Finally, we present special considerations for those healthcare epidemiologists who want to use and apply ML.

Keywords. machine learning, patient risk stratification, healthcare epidemiologist, data-driven, computation.

INTRODUCTION

Increasingly, healthcare epidemiologists must process and interpret large amounts of complex data [1]. As the role of healthcare epidemiologists has expanded, so too has the pervasiveness of electronic health data [2]. The availability of large quantities of high-quality patient- and facility-level data has generated new opportunities. In particular, these data could lead to an improved understanding of risk factors for development of healthcare-associated infections (HAIs), improved patient risk stratification, and identification of pathways for intra- and interfacility spread of infectious diseases, all of which would allow for targeted prevention approaches.

In the past, a large fraction of clinical data were ignored (or not collected at all). This limitation was due to both the size and complexity of the data and the absence of techniques for collecting and storing such data. These data are frequently underused and undervalued; however, new and improved methods for data collection and storage (eg, electronic health records) provide opportunities to tackle the issue of analysis. In particular, machine learning (ML) has begun to infiltrate the clinical literature broadly. The appropriate application of ML in healthcare epidemiology (HE) promises returns on the field's investment in data collection.

In this review, we begin by describing the basics of ML and then move on to discuss how it applies to HE, providing examples of successful research applications. Finally, we describe some of the practical considerations for design and implementation of ML applied to HE.

WHAT IS MACHINE LEARNING?

The definition of ML is broad. ML is the study of tools and methods for identifying patterns in data. These patterns can then be used to either increase our understanding of the current world (eg, identify risk factors for infection) or make predictions about the future (eg, predict who will become infected). ML draws on concepts from many fields including computer science, statistics, and optimization. At their core, almost all ML problems can be formulated as an optimization problem with respect to a dataset. In such settings, the goal is to find (or "learn" in ML parlance) a model that best explains the data (Figure 1). While there are many different types of ML, most applications fall into 1 of 3 categories: supervised, unsupervised, or reinforcement learning.

Figure 1. Traditional vs. machine learning (ML) approach. In a traditional approach to data analysis, one starts with the model as input to the machine. In an ML (or data-driven) approach, one starts with the data and outputs a model that can then be applied to new data.

Here, we focus on supervised learning, a setting in which the data are "labeled" according to a specific outcome of interest (eg, patients are either infected or not infected). The algorithm then learns a mapping from a set of covariates (eg, patient demographics) to the outcome. This step is performed on the training data. Once learned, this mapping can be applied to new test data for either identification or prediction tasks. For example, given a dataset of patients described by their demographics and admission details, one can try to predict the specific outcome of 30-day readmission.

Many different learning algorithms exist to accomplish this task (eg, logistic regression, decision trees, ensemble approaches, and deep neural networks). These techniques differ in their underlying objective function and constraints. While closely tied to traditional statistics, ML-based analyses often seek nonlinear relationships among hundreds or thousands of covariates. Unsurprisingly, such techniques do best when a large amount of "training" data is available (ie, when there are many examples to learn from). Here, one aims to learn a model that will generalize beyond the data one has already seen. The goal is generalization, not memorization. In many cases, especially in settings with hundreds or thousands of covariates (ie, high-dimensional settings), it may be straightforward to learn a model that works well when applied to the training data but fails when applied to never-before-seen data. In such cases, the model is said to have "overfit" the training data (ie, it has simply memorized the data). Different regularization methods exist to deal with such issues and depend on the underlying learning framework. For example, in a least squares regression setting, L2 regularization is commonly applied (ie, ridge regression). These techniques push algorithms toward simpler models. The optimization loosely follows Occam's razor, preferring simpler models over more complex ones.
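
As a concrete sketch of this workflow (not drawn from the article itself), the snippet below fits an L2-regularized logistic regression to synthetic labeled data using scikit-learn, one of the open-source tools referenced later [39, 40]; the dataset, number of covariates, and regularization strength are arbitrary choices for illustration only.

```python
# A minimal supervised-learning sketch with L2 regularization (scikit-learn).
# The data here are synthetic stand-ins for labeled clinical covariates.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic "patients": 1000 examples, 50 covariates, binary outcome (eg, infected or not).
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10, random_state=0)

# Hold out data the model never sees during training (see "Keep a Held-Out Test Set" below).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Logistic regression with an L2 penalty; smaller C means stronger regularization,
# ie, a stronger push toward simpler models.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
model.fit(X_train, y_train)

# Apply the learned mapping to never-before-seen data.
test_probabilities = model.predict_proba(X_test)[:, 1]
print("Held-out AUROC:", round(roc_auc_score(y_test, test_probabilities), 3))
```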


As a field, ML has experienced a number of successes in recent years and continues to have an impact across several disciplines. The common thread across these disciplines is the availability of data. For example, the computer vision community (ie, the field of computer science focused on image-related tasks) has benefited tremendously from recent advances in ML. For many image recognition tasks, the performance of ML algorithms has approached or even surpassed that of humans [3, 4]. These advances have, in part, been driven by large image databases (eg, "ImageNet" consists of more than 14 million images [5]). In addition to providing training data, such databases serve as a resource against which researchers can benchmark their proposed algorithms. ML has also led to recent breakthroughs in machine translation [6, 7], with models that take an input sequence (eg, an English sentence) and generate the target sequence (eg, the sentence translated to French). Such models are trained on tens of millions of sentence pairs. Here, the success came from training "deeper" models (ie, more complex models) capable of capturing context within sequences.

With the recent increase in availability of clinically relevant datasets, researchers have applied ML techniques to a wide range of clinical tasks [8–22], from identification/diagnostic tasks (eg, automatic classification of skin lesions or arrhythmia detection [23, 24]) to prediction tasks (eg, predicting 30-day readmissions [25]). While more research is required before we will understand the full clinical impact of this work, efforts are already underway to integrate ML tools into clinical practice [26–29].

As we continue to amass more data in HE, we will be better positioned to take advantage of these data and recent advances in ML. Below, we describe the impact ML is beginning to have on the field of infectious disease and HE more specifically.

HOW WILL ML AFFECT INFECTIOUS DISEASE (AND MORE SPECIFICALLY HE)?

The applications of ML in infectious disease are diverse and include risk stratification for specific infections (eg, specific HAIs), identifying the relative contribution of specific risk factors to overall risk, understanding pathogen–host interactions, and predicting the emergence and spread of infectious diseases. Here, we review 4 recent projects that highlight the diversity of applications in HE and infectious disease.

Predicting Risk of Nosocomial Clostridium difficile Infection (CDI) [30–33]

Despite efforts to reduce incidence, HAIs remain prevalent, in part, because we lack an effective clinical tool for accurately measuring patient risk. Along these lines, researchers have sought to develop models for predicting patient risk of CDI. ML-driven approaches can successfully leverage the entire contents of the electronic health record (EHR). These clinical data contain information regarding medications, procedures, locations, healthcare staff, lab results, vital signs, demographics, patient history, and admission details. ML techniques learn to map these data to a value that estimates the patient's probability of CDI. Although more complex than low-dimensional tools for calculating patient risk, models that leverage the richness of the EHR can be significantly more accurate [33]. Such models, based on thousands of variables, have been extended to change over the course of an admission, capturing how risk factors change over time [34]. These time-varying models could be incorporated into an EHR system, using streaming data to generate daily risk estimates for each inpatient.
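
To make the idea of daily risk estimation concrete, the sketch below scores each patient-day with an already-fitted model. The feature names and the toy model are hypothetical placeholders for illustration, not elements of the published CDI models.

```python
# A minimal sketch of generating daily risk estimates from per-day EHR features.
# `fitted_model` stands in for any previously trained risk model exposing predict_proba();
# the feature columns below are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["on_antibiotics", "recent_gi_surgery", "days_since_admission"]

def daily_risk_scores(patient_days, fitted_model):
    """Return one risk estimate per patient per calendar day."""
    scored = patient_days.copy()
    scored["risk"] = fitted_model.predict_proba(scored[FEATURES].to_numpy())[:, 1]
    return scored[["patient_id", "date", "risk"]]

# Example usage with a toy model and two patient-days of synthetic data.
toy_model = LogisticRegression(max_iter=1000).fit(
    [[1, 0, 1], [0, 1, 5], [1, 1, 3], [0, 0, 2]], [1, 0, 1, 0]
)
example = pd.DataFrame({
    "patient_id": [101, 101],
    "date": pd.to_datetime(["2017-01-01", "2017-01-02"]),
    "on_antibiotics": [1, 1],
    "recent_gi_surgery": [0, 0],
    "days_since_admission": [1, 2],
})
print(daily_risk_scores(example, toy_model))
```
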
Predicting Reservoirs of Zoonotic Diseases [35]

Zoonotic diseases account for billions of human infections and millions of deaths per year globally [36]. Researchers have applied ML to datasets that contain information on rodent species that carry zoonotic pathogens [35]. Using nearly 100 predictor variables (eg, lifespan, habitat), the authors identified reservoir status with high accuracy. Furthermore, their model predicted new hyperreservoir species (ie, those identified as harboring 2 or more zoonotic pathogens). The ability to identify geographic areas with higher likelihood of harboring rodent reservoirs of new or emerging zoonoses could help direct surveillance, vector control, and research into vaccines and therapeutics.


Predicting Clinical Outcomes in Ebola Virus Disease (EVD) Infection Using Initial Clinical Symptoms [37]

Though ML is often applied to large datasets, one recent study successfully applied ML techniques to a limited clinical dataset from a small patient cohort. The authors learned a model to predict clinical outcomes in patients presenting with Ebola virus disease during the 2013–2016 West African epidemic. Using a publicly available de-identified dataset, they accurately predicted outcome of infection with only a few clinical symptoms and laboratory results.

Predicting Patients at Greatest Risk of Developing Septic Shock [18]

Prediction holds the promise of early intervention. In sepsis, early intervention can reduce mortality in patients who go on to develop septic shock [38]. Using publicly available data stored in the MIMIC-II Clinical Database (described more below), researchers learned to predict with high sensitivity which patients were likely to develop septic shock. Importantly, the prediction could be made at a median of more than a day prior to onset of septic shock, providing clinicians sufficient time to potentially prevent disease or mitigate its severity.

These applications of ML could, in theory, facilitate the targeting of specific interventions to high-risk groups. However, the potential positive impact of ML on the field of infectious disease goes well beyond facilitating targeted interventions. In particular, such models could be used to design more efficient clinical trials. Often, clinical trials can be underpowered and inconclusive because a small fraction of the study population experiences the outcome of interest. ML-based risk stratification models could help identify patients at several times the baseline risk, making it possible to obtain adequately powered efficacy results with fewer enrolled patients. ML models can also be used to help generate testable hypotheses. Although the relationships uncovered by ML models are not necessarily causal, study of the model can generate hypotheses. Further investigation of such hypotheses could then lead to new findings related to disease risk.
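
As a rough, hypothetical illustration of the trial-enrichment point above (the event rates and effect size below are invented for the example, not drawn from any study), a standard two-proportion sample-size approximation shows how concentrating enrollment on higher-risk patients shrinks the required cohort:

```python
# Back-of-the-envelope sample-size sketch: enrolling higher-risk patients
# (identified, eg, by an ML risk model) reduces the enrollment needed for a trial.
# All rates and the assumed effect size are hypothetical.
from scipy.stats import norm

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for comparing two proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2

# Hypothetical intervention that halves the event rate.
unselected = n_per_arm(p_control=0.05, p_treatment=0.025)  # 5% baseline risk
enriched = n_per_arm(p_control=0.15, p_treatment=0.075)    # cohort at 3x baseline risk
print(f"Per-arm enrollment: {unselected:.0f} unselected vs {enriched:.0f} enriched")
```
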
MACHINE LEARNING IN HE: A USER'S GUIDE - SPECIAL CONSIDERATIONS, CHALLENGES, AND PITFALLS

Technical and methodological details about performing ML analyses are beyond the scope of this review. For an in-depth introduction to ML, we refer the reader to several excellent resources [39–42]. Rather than presenting a detailed user's guide, here we focus on a few special considerations and requirements in the context of patient risk stratification and HE (ie, for settings in which the goal is to map patient data to a continuous value representing patient risk for a specific outcome).

It Starts with Data

Data may come from a number of sources. The examples mentioned in the section above are not specific to one particular dataset or even one data type. Researchers have successfully applied ML to clinical notes [20], physiological waveforms [10, 43], structured EHR data [44], radiologic images [45], and even unstructured data from publications [13]. Clinicians interested in using EHR data for ML may engage leadership within their institution both to obtain access to institutional data and to establish the resources to organize the data. Clinicians should not underestimate the amount of time required for this step and should also be aware that clinical insight throughout the process is essential.

Sharable Data Are Key

Shared datasets serve an important purpose by facilitating comparison of different ML approaches to specific clinical problems. Without a shared dataset, it becomes difficult to compare methods in a meaningful way.

The Data Will Be Messy

Healthcare practitioners are well aware of the extent of inconsistencies, inaccuracies, and errors present in health data, in particular clinical notes. Often, the vast majority of the effort in such projects is dedicated to "data wrangling," that is, data extraction and preprocessing. The adage "garbage in, garbage out" still holds: no amount of ML can identify relationships that are not present in the data. As the size of the data grows, in particular the number of examples, it can still be possible to identify a signal despite the presence of noise. Techniques like regularization (see above) and a held-out test set (see below) can help identify whether there is enough of a signal to learn meaningful relationships.
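
The brief sketch below illustrates the kind of wrangling this step involves, using pandas on a small invented table; the column names and cleaning rules are hypothetical and would differ for any real extract.

```python
# A minimal data-wrangling sketch: deduplicate, normalize units, and handle
# missing values before any modeling. The table and the rules are invented.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "patient_id": [1, 1, 2, 3],
    "temp": [98.6, 98.6, 38.2, np.nan],  # mixed Fahrenheit/Celsius entries
    "wbc": [11.2, 11.2, np.nan, 7.5],    # white blood cell count, some missing
})

clean = raw.drop_duplicates()                              # remove exact duplicate rows
is_fahrenheit = clean["temp"] > 45                         # crude unit heuristic
clean.loc[is_fahrenheit, "temp"] = (clean.loc[is_fahrenheit, "temp"] - 32) * 5 / 9
clean["wbc"] = clean["wbc"].fillna(clean["wbc"].median())  # simple imputation
print(clean)
```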

Choose the Right Target

When choosing the target (ie, outcome of interest), it is important that one has access to accurate data regarding that target. For example, if predicting the development of CDI is the target, one must know which patients developed CDI in order to develop the model. Sometimes, it is impossible to obtain complete certainty (eg, not all laboratory tests are 100% accurate). ML techniques can, however, handle the presence of some uncertainty in the data. In addition, it is important to remember that the outcome used during training is the outcome that the model is learning to predict. For example, we may want to predict risk of CDI. However, since not all patients are tested, in reality we are predicting risk of a positive laboratory result for CDI. This distinction is subtle but important. In particular, if the hospital were to change its testing protocol, then the predictive performance of an existing model may change.
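
The sketch below makes that training-label choice explicit: the label is derived from test results, so it encodes "tested and positive" rather than "truly developed CDI." The column names are hypothetical.

```python
# A minimal label-construction sketch: the outcome we can actually observe is
# "positive laboratory result," which is only a proxy for true infection.
# Column names are hypothetical.
import pandas as pd

admissions = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "cdi_test_ordered": [True, False, True],
    "cdi_test_result": ["positive", None, "negative"],
})

# Label = tested AND positive; untested patients are labeled 0 by construction,
# which is exactly why a change in testing protocol can shift model performance.
admissions["label"] = (
    admissions["cdi_test_ordered"] & (admissions["cdi_test_result"] == "positive")
).astype(int)
print(admissions[["patient_id", "label"]])
```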
rithms, the misapplication of ML to clinical data can lead to
Keep a Held-Out Test Set

As mentioned above, because such analyses often deal with a large number of covariates, one can easily overfit, obtaining a model that works well on the training set but does not generalize. Thus, it is important to split one's data into separate training and test sets (eg, 80% of the data is used for training and 20% for testing). Use the training set for model selection, and hold the test set aside for final model evaluation. One may split the data multiple times at random or choose a temporal split (eg, training on data from 2010–2014, testing on data from 2015). By splitting the data temporally, one can estimate how changes over time may affect predictive performance.
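
Either splitting strategy is only a few lines of code; the sketch below shows a random 80/20 split and a temporal split on a hypothetical admission-year column.

```python
# A minimal train/test split sketch: random 80/20 split vs a temporal split.
# `data` and its columns are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "admission_year": [2010, 2011, 2012, 2013, 2014, 2015, 2015, 2015],
    "covariate": [0.2, 1.1, 0.7, 0.3, 0.9, 1.4, 0.5, 0.8],
    "outcome": [0, 1, 0, 0, 1, 1, 0, 1],
})

# Random split: 80% training, 20% held-out test.
train_random, test_random = train_test_split(data, test_size=0.2, random_state=0)

# Temporal split: train on 2010-2014, test on 2015, which better reflects
# how the model would be used prospectively.
train_temporal = data[data["admission_year"] <= 2014]
test_temporal = data[data["admission_year"] == 2015]
print(len(train_temporal), "training rows;", len(test_temporal), "test rows")
```
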
Beware of Data Leakage

Beware of results that are too good to be true. In cases where the discriminative power is well beyond that of humans, there is often some form of "data leakage." For example, one of the covariates may accidentally encode the outcome (eg, receipt of empiric oral vancomycin probably indicates that a clinician has already diagnosed CDI). This type of potential pitfall makes it important to "look inside" the model to try to understand why it is making the predictions it is, or to test the model prospectively.
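
One simple way to "look inside" a linear model is to inspect its largest coefficients; the sketch below does this for a fitted logistic regression with hypothetical feature names and a deliberately leaky synthetic outcome (tree ensembles expose analogous feature importances).

```python
# A minimal "look inside the model" sketch: list the most influential covariates
# of a fitted logistic regression. Feature names and data are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

feature_names = ["age", "on_oral_vancomycin", "prior_admissions", "icu_stay"]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(feature_names)))
# Deliberately leaky synthetic outcome: driven almost entirely by one covariate.
y = (X[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
coefficients = pd.Series(model.coef_[0], index=feature_names)
# A single dominant coefficient on a treatment-like covariate is a red flag for leakage.
print(coefficients.reindex(coefficients.abs().sort_values(ascending=False).index))
```
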
Good Accuracy Isn't Enough

When evaluating predictive performance, it is important to keep in mind the clinical task of interest. For example, if the goal is to learn a model to predict daily risk of CDI, then the model should be applied daily to the test data, rather than just prior to the event of interest. In addition, both calibration (ie, how well the estimated risk maps to actual risk) and discriminative performance (ie, how well the model distinguishes high-risk from low-risk patients) are important to consider. Finally, the transparency (ie, interpretability) of one's model can be as important as its accuracy. A black-box model that only tells a user who is at risk may be less actionable than a transparent model that tells a user why the patient is at risk. A model's ability to explain its predictions can help identify "bugs" or data leakage. In addition, it can point researchers to testable hypotheses that have biological plausibility.
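
Both properties can be checked on the held-out set; the sketch below computes discrimination (AUROC) and a simple calibration summary with scikit-learn, using placeholder arrays for the held-out labels and predicted risks.

```python
# A minimal evaluation sketch: discrimination (AUROC) and calibration on a
# held-out test set. y_test and predicted_risk are placeholder arrays.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss, roc_auc_score

y_test = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 1])
predicted_risk = np.array([0.1, 0.2, 0.8, 0.3, 0.6, 0.9, 0.4, 0.7, 0.2, 0.5])

# Discrimination: how well the model separates high-risk from low-risk patients.
print("AUROC:", round(roc_auc_score(y_test, predicted_risk), 3))

# Calibration: how well predicted risk matches observed risk.
print("Brier score:", round(brier_score_loss(y_test, predicted_risk), 3))
observed, predicted = calibration_curve(y_test, predicted_risk, n_bins=2)
for obs, pred in zip(observed, predicted):
    print(f"predicted risk ~{pred:.2f} -> observed event rate {obs:.2f}")
```
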
Hospital-specific Models Using Generalizable Methods

In the past, researchers have most often aimed to learn models that generalize across hospitals or healthcare settings. Such models may do well on average but can perform poorly when applied to specific institutions. This limitation is, in part, because of institutional differences in the way data are collected and stored [46]. Rather than seeking models that generalize across all hospitals, we should seek generalizable methods that can be used to generate institution-specific models. Such an approach allows institutions to train models specific to their data collection practices and patient populations.
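
In code, a generalizable method can be as simple as one shared training pipeline applied separately to each institution's data; the sketch below is schematic, with synthetic data and hypothetical hospital identifiers.

```python
# A minimal sketch of "generalizable methods, institution-specific models":
# one shared pipeline, fit separately on each hospital's own data (synthetic here).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
hospitals = {
    "hospital_A": (rng.normal(size=(300, 20)), rng.integers(0, 2, size=300)),
    "hospital_B": (rng.normal(size=(150, 20)), rng.integers(0, 2, size=150)),
}

models = {}
for name, (X, y) in hospitals.items():
    # Same method everywhere; each fitted model reflects local data and practices.
    models[name] = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

print({name: type(model).__name__ for name, model in models.items()})
```
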
It Takes a Team

Finally, and most importantly, applied ML in HE requires teams composed of experts from a variety of disciplines. Leadership of these teams is likely to comprise individuals with expertise in infectious disease, statistics, optimization, and computer science. For studies that use EHR data, individuals with expertise in clinical data architecture are essential to team success. Such work cannot take place in the isolation of a single department or discipline. ML experts are unlikely to make a meaningful clinical contribution without close collaboration with a clinical expert. Conversely, although open-source ML tools exist, without a good understanding of the underlying algorithms, the misapplication of ML to clinical data can lead to misleading results and incorrect conclusions.

While the increased availability of data and ML tools holds the promise of improved patient outcomes, we should proceed cautiously. To date, ML has featured more prominently in research than in practice. At the point at which ML has proven efficacy in HE, additional questions will remain in translation to practice. These include training and education of healthcare epidemiologists and creation and maintenance of ML tools and applications. Barriers to implementation may be expected to vary by institution size, resources, and interest in the technology. More research is required before we will fully understand the good, the bad, and the unintended consequences of ML in HE [47].

CONCLUSIONS

ML has resulted in important contributions to a number of disciplines in recent years, including vision and natural language processing. In these fields, more complex models can take advantage of the large amount of existing training data (eg, images in vision or sentences in natural language). Similarly, we are on the verge of a major shift in HE. Through the appropriate application of ML to increasingly available electronic health data, including genomic data, healthcare epidemiologists will be able to better understand the underlying risk for acquisition of infectious diseases and transmission pathways, develop targeted interventions, and reduce HAIs. While powerful, it is important to remember that ML cannot identify relationships that are not present in the data. Moreover, ML does not replace the need for standard statistical analyses or randomized, controlled trials. Instead, ML can serve as a tool to augment HE's current toolbox. Going forward, the greatest impact will come from interdisciplinary teams that work together to make sense of the data.

Notes

Acknowledgments. We acknowledge John Guttag, PhD, and David C. Hooper, MD, for thoughtful review of the manuscript, and Erin Ryan, MPH, CCRP, for assistance with manuscript preparation.

Financial support. This work was supported by the National Institute of Allergy and Infectious Diseases (NIAID) of the National Institutes of Health (NIH; K01AI110524); the Massachusetts General Hospital-Massachusetts Institute of Technology Grand Challenge to E. S. S.; and the National Science Foundation (IIS-1553146) and NIAID of NIH (U01AI124255) to J. W.

Potential conflict of interest. All authors: No reported conflicts of interest. All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.

References

1. Kaye KS, Anderson DJ, Cook E, et al. Guidance for infection prevention and healthcare epidemiology programs: healthcare epidemiologist skills and competencies. Infect Control Hosp Epidemiol 2015; 36:369–80.
2. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff (Millwood) 2014; 33:1123–31.
3. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: IEEE International Conference on Computer Vision (ICCV), 2015.
4. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
5. ImageNet. 2016 [cited 2017]. Available from: http://www.image-net.org/.
6. Zhou J, Cao Y, Wang X, Li P, Xu W. Deep recurrent models with fast-forward connections for neural machine translation. arXiv preprint arXiv:1606.04199, 2016.
7. Lewis-Kraus G. The great AI awakening. The New York Times Magazine, 2016.
8. Opmeer BC. Electronic health records as sources of research data. JAMA 2016; 315:201–2.
9. Roski J, Bo-Linn GW, Andrews TA. Creating value in health care through big data: opportunities and policy implications. Health Aff (Millwood) 2014; 33:1115–22.
10. Shoeb AH, Guttag J. Application of machine learning to epileptic seizure detection. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010: 975–82.
11. Saria S, Rajani AK, Gould J, Koller D, Penn AA. Integration of early physiological responses predicts later illness severity in preterm infants. Sci Transl Med 2010; 2:48ra65.
12. Ghassemi M, et al. A multivariate timeseries modeling approach to severity of illness assessment and forecasting in ICU with sparse, heterogeneous clinical data. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2015: 446–453.
13. Wallace BC, Dahabreh IJ, Trikalinos TA, Barton Laws M, Wilson I, Charniak E. Identifying differences in physician communication styles with a log-linear transition component model. In: Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
14. Visweswaran S, Cooper GF. Learning instance-specific predictive models. J Mach Learn Res 2010; 11:3333–69.
15. Wang X, Sontag D, Wang F. Unsupervised learning of disease progression models. In: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
16. Kale DC, Gong D, Che Z, et al. An examination of multivariate time series hashing with applications to health care. In: IEEE International Conference on Data Mining (ICDM), 2014: 260–69.
17. Wiens J, Guttag J. Active learning applied to patient-adaptive heartbeat classification. Advances in Neural Information Processing Systems (NIPS), 2010: 2442–50.
18. Henry KE. A targeted real-time early warning score (TREWScore) for septic shock. Sci Transl Med 2015; 7:299ra122.
19. Wallace BC, Kuiper J, Sharma A, Zhu MB, Marshall IJ. Extracting PICO sentences from clinical trial reports using supervised distant supervision. J Mach Learn Res 2016; 17.
20. Rumshisky A, Ghassemi M, Naumann T, et al. Predicting early psychiatric readmission with natural language processing of narrative discharge summaries. Transl Psychiatry 2016; 6:e921.
21. Wallace BC, Kuiper J, Sharma A, Zhu MB, Marshall IJ. Extracting PICO sentences from clinical trial reports using supervised distant supervision. J Mach Learn Res 2016; 17:1–25.
22. Rumshisky A, Ghassemi M, Naumann T, et al. Predicting early psychiatric readmission with natural language processing of narrative discharge summaries. Transl Psychiatry 2016; 6:e921.
23. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017; 542:115–8.
24. Rajpurkar P, Hannun AY, Haghpanahi M, Bourn C, Ng AY. Cardiologist-level arrhythmia detection with convolutional neural networks. arXiv preprint arXiv:1707.01836, 2017.
25. Bayati M, Braverman M, Gillam M, et al. Data-driven decisions for reducing readmissions for heart failure: general methodology and case study. PLoS One 2014; 9:e109264.
26. DeepMind Health: helping clinicians get patients from test to treatment, faster. 2017. Available from: https://deepmind.com/applied/deepmind-health/.
27. Watson for Oncology. Oncology and Genomics, 2017. Available from: https://www.ibm.com/watson/health/oncology-and-genomics/oncology/.
28. Caradigm: a GE Healthcare company. 2017. Available from: https://www.caradigm.com/en-us/.
29. InnerEye - Assistive AI for Cancer Treatment. 2017 [cited 2017]. Available from: https://www.microsoft.com/en-us/research/project/medical-image-analysis/#.
30. Wiens J, Guttag J, Horvitz E. Patient risk stratification with time-varying parameters: a multitask learning approach. J Mach Learn Res 2016; 17:1–23.
31. Wiens J, Guttag J, Horvitz E. A study in transfer learning: leveraging data from multiple hospitals to enhance hospital-specific predictions. J Am Med Inform Assoc 2014; 21:699–706.
32. Wiens J, Horvitz E, Guttag JV. Patient risk stratification for hospital-associated C. diff as a time-series classification task. In: Advances in Neural Information Processing Systems, 2012.
33. Wiens J, Campbell WN, Franklin ES, Guttag JV, Horvitz E. Learning data-driven patient risk stratification models for Clostridium difficile. Open Forum Infect Dis 2014; 1:ofu045.
34. Wiens J, Guttag J, Horvitz E. Patient risk stratification with time-varying parameters: a multitask learning approach. J Mach Learn Res 2016; 17:2797–819.
35. Han BA, Schmidt JP, Bowden SE, Drake JM. Rodent reservoirs of future zoonotic diseases. Proc Natl Acad Sci U S A 2015; 112:7039–44.
36. The World Bank. Zoonotic Disease Prevention and Control, One Health, and the Role of the World Bank. 2015.
37. Colubri A, Silver T, Fradet T, Retzepi K, Fry B, Sabeti P. Transforming clinical data into actionable prognosis models: machine-learning framework and field-deployable app to predict outcome of Ebola patients. PLoS Negl Trop Dis 2016; 10:e0004549.
38. Dellinger RP, Levy MM, Rhodes A, et al; Surviving Sepsis Campaign Guidelines Committee including the Pediatric Subgroup. Surviving sepsis campaign: international guidelines for management of severe sepsis and septic shock: 2012. Crit Care Med 2013; 41:580–637.
39. An introduction to machine learning with scikit-learn. 2016 [cited 2017]. Available from: http://scikit-learn.org/stable/tutorial/basic/tutorial.html.
40. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12:2825–30.
41. In-depth introduction to machine learning in 15 hours of expert videos. 2014 [cited 2017]. Available from: http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/.
42. Ng A. Machine learning [online course; cited 2017]. Available from: https://www.coursera.org/learn/machine-learning.
43. Saria S, Rajani AK, Gould J, Koller D, Penn AA. Integration of early physiological responses predicts later illness severity in preterm infants. Sci Transl Med 2010; 2:48ra65.
44. Ghassemi M, Pimentel MAF, Naumann T, Brennan T, Clifton DA, Szolovits P, Feng M. A multivariate time series modeling approach to severity of illness assessment and forecasting in ICU with sparse, heterogeneous clinical data. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2015.
45. Ahmed B, Thesen T, Blackmon KE, Kuzniecky R, Devinsky O, Brodley CE. Decrypting cryptogenic epilepsy: semi-supervised hierarchical conditional random fields for detecting cortical lesions in MRI-negative patients. J Mach Learn Res 2016; 17:3885–914.
46. Wiens J, Guttag J, Horvitz E. A study in transfer learning: leveraging data from multiple hospitals to enhance hospital-specific predictions. J Am Med Inform Assoc 2014; 21:699–706.
47. Cabitza F, Rasoini R, Gensini G. Unintended consequences of machine learning in medicine. JAMA 2017; 318:517–8.
