A Machine Learning Approach To Management of Heart Failure Populations


JACC: HEART FAILURE VOL. -, NO. -, 2020
© 2020 BY THE AMERICAN COLLEGE OF CARDIOLOGY FOUNDATION
PUBLISHED BY ELSEVIER

A Machine Learning Approach to Management of Heart Failure Populations
Linyuan Jing, PHD,a Alvaro E. Ulloa Cerna, PHD,a Christopher W. Good, DO,b Nathan M. Sauers, PHARMD,c
Gargi Schneider, MD,d Dustin N. Hartzel, BS,e Joseph B. Leader, BA,e H. Lester Kirchner, PHD,f Yirui Hu, PHD,f
David M. Riviello, BS,g Joshua V. Stough, PHD,a,h Seth Gazes, MS,c Allyson Haggerty, MBA,a
Sushravya Raghunath, PHD,a Brendan J. Carry, MD,b Christopher M. Haggerty, PHD,a,b,*
Brandon K. Fornwalt, MD, PHDa,b,i,*

ABSTRACT

BACKGROUND Heart failure is a prevalent, costly disease for which new value-based payment models demand
optimized population management strategies.

OBJECTIVES This study sought to generate a strategy for managing populations of patients with heart failure by
leveraging large clinical datasets and machine learning.

METHODS Geisinger electronic health record data were used to train machine learning models to predict 1-year
all-cause mortality in 26,971 patients with heart failure who underwent 276,819 clinical episodes. The inputs comprised
26 clinical variables (demographics, laboratory test results, medications), 90 diagnostic codes, 41 electrocardiogram
measurements and patterns, 44 echocardiographic measurements, and 8 evidence-based “care gaps”: flu vaccine, blood
pressure of <130/80 mm Hg, A1c of <8%, cardiac resynchronization therapy, and active medications (active
angiotensin-converting enzyme inhibitor/angiotensin II receptor blocker/angiotensin receptor-neprilysin inhibitor,
aldosterone receptor antagonist, hydralazine, and evidence-based beta-blocker). Care gaps represented actionable
variables for which associations with all-cause mortality were modeled from retrospective data and then used to predict
the benefit of prospective interventions in 13,238 currently living patients.

RESULTS Machine learning models achieved areas under the receiver-operating characteristic curve (AUCs) of 0.74 to
0.77 in a split-by-year training/test scheme, with the nonlinear XGBoost model (AUC: 0.77) outperforming linear logistic
regression (AUC: 0.74). Out of 13,238 currently living patients, 2,844 were predicted to die within a year, and closing all
care gaps was predicted to save 231 of these lives. Prioritizing patients for intervention by using the predicted reduction
in 1-year mortality risk outperformed all other priority rankings (e.g., random selection or Seattle Heart Failure risk score).

CONCLUSIONS Machine learning can be used to priority-rank patients most likely to benefit from interventions to
optimize evidence-based therapies. This approach may prove useful for optimizing heart failure population health
management teams within value-based payment models. (J Am Coll Cardiol HF 2020;-:-–-)
© 2020 by the American College of Cardiology Foundation.

From the aDepartment of Translational Data Science and Informatics, Geisinger, Danville, Pennsylvania; bHeart Institute,
Geisinger, Danville, Pennsylvania; cCenter for Pharmacy Innovation and Outcomes, Geisinger, Danville, Pennsylvania;
dDepartment of Medicine, Geisinger, Danville, Pennsylvania; ePhenomic Analytics and Clinical Data Core, Geisinger,
Danville, Pennsylvania; fDepartment of Population Health Sciences, Geisinger, Danville, Pennsylvania; gSteele Institute
for Health Innovation, Geisinger, Danville, Pennsylvania; hDepartment of Computer Science, Bucknell University,
Lewisburg, Pennsylvania; and the iDepartment of Radiology, Geisinger, Danville, Pennsylvania. *Drs. Haggerty and
Fornwalt contributed equally to this work and are joint senior authors. This work was supported by a Quality Fund Award
from Geisinger Health Plan. Geisinger receives funding from Tempus for ongoing development of predictive modeling
technology and commercialization. Tempus and Geisinger have jointly applied for a patent related to the work. None of
the Geisinger authors has ownership interest in any of the intellectual property resulting from the partnership. The
authors have reported that they have no relationships relevant to the contents of this paper to disclose.

Manuscript received July 10, 2019; revised manuscript received January 2, 2020, accepted January 2, 2020.

ISSN 2213-1779/$36.00 https://doi.org/10.1016/j.jchf.2020.01.012


2 Jing et al. JACC: HEART FAILURE VOL. -, NO. -, 2020
Data-Driven Heart Failure Population Management - 2020:-–-

ABBREVIATIONS AND ACRONYMS

ACE = active angiotensin-converting enzyme
ARB = angiotensin II receptor blocker
AUC = area under the receiver-operating characteristic curve
BP = blood pressure
CRT = cardiac resynchronization therapy
EBBB = evidence-based beta-blocker
ECG = electrocardiogram
EHR = electronic health record
HF = heart failure
HFpEF = heart failure with preserved ejection fraction
HFrEF = heart failure with reduced ejection fraction

In response to the rising cost of chronic conditions, such as heart failure (HF), new models of health care and reimbursement are being developed (1). In these value-based care models, management is extending beyond single patient-physician encounters to instead treat disease at a population scale. The general goal of such models is to improve patient outcomes while reducing/containing costs by delivering care that keeps patients optimally managed and reduces the frequency of high cost/high acuity encounters. Optimizing this kind of management at a population level requires an effective means to identify and stratify patients in need of intervention and, ideally, identify the appropriate intervention to deploy. At present, there is a critical lack of validated, data-driven models to support these population health goals.

Data science approaches, including machine learning, are well-suited to assist with these tasks. For example, one of the first papers on this subject in 1995 showed that a neural network could use echocardiography data to predict 1-year mortality in 95 patients with HF with accuracy that was superior to a linear model or clinical judgment (2). Since then, numerous additional studies with thousands of patients have shown significant promise for machine learning to predict hospitalization (3), readmission (4), or death (3,5,6) in patients with HF.

Previously published models using machine learning for risk predictions in patients with HF have 2 primary limitations with regard to their utility in optimizing clinical population health management. First, most models have used small, systematically collected and annotated datasets, such as from a clinical trial (7), or focused on an important, but narrow, clinical setting, such as in-hospital mortality during HF hospitalization for acute decompensation (8). Such approaches, although valid and appropriate within their respective constraints, are not necessarily generalizable to a broad and heterogeneous HF population, as characterized in real-world clinical data. The second limitation is that none of the published findings using machine learning models have led to clinically relevant, actionable results. Ahmad et al. (6) reported associations between therapies and outcomes in 44,886 patients with HF, which showed that retrospective associations can be used to drive prospective predictions. Levy et al. (5) took this 1 step further to model predicted hazard ratios at a population level for the addition of therapies such as angiotensin-converting enzyme (ACE) inhibitors and beta-blockers. This type of approach may be able to generate individual patient-level predictions about those most likely to benefit from certain therapies and, thus, can be used to direct resources at a population level.

In the present study, we leverage a large 20-year retrospective dataset derived from a health system (Geisinger) that was an early adopter of electronic health record (EHR) technology to develop a predictive model for all patients with HF using machine learning. This model included a comprehensive set of input variables, including 8 care gap indicators. Importantly, this novel incorporation of evidence-based care gaps into a predictive model represents a methodology for driving clinical action from a machine learning model (not just predicting risk but predicting modifiable reduction in risk, or benefit, as a result of action). Moreover, we demonstrate how such insights might be used through population health management efforts to simultaneously stratify risk and therapeutic benefit at an individual patient level to optimally deploy health care resources.

METHODS

EHR DATA COLLECTION. Patients with HF over the last 19 years (January 2001 through November 2019) were identified from the Geisinger EHR, comprising data from 13 regional hospitals and a network of primary and specialty clinic sites. HF was defined by using the validated Electronic Medical Records and Genomics phenotype (9) and the “definite” category (i.e., probable or possible HF were not included). All clinical encounters since 6 months before the HF diagnosis date, including outpatient office visits, hospital admissions, emergency department visits, laboratory tests, and cardiac diagnostic studies (e.g., echocardiograms or electrocardiograms [ECGs]), were identified, grouped into episodes, and used as independent samples (see details in the Supplemental Appendix). Briefly, episodes were defined by consolidating data across encounters within 2 weeks of each other, up to a maximum 2-month span. To provide an external validation set, all patient data from 1 member hospital were excluded from model training and used exclusively for validation of the final model.

MODEL INPUTS. A total of 209 variables were collected from the EHR (see Central Illustration): 26 clinical variables: age, sex, height, weight, smoking status, heart rate, systolic and diastolic blood pressures (BPs), use of loop diuretics, antihypertensive and antidiabetic medications, and laboratory test values (hemoglobin, estimated glomerular filtration rate, creatine kinase-muscle/brain, lymphocytes,

CENTRAL ILLUSTRATION Overall Schematic

[Schematic. Input: electronic health record (276,819 episodes from 26,971 heart failure patients), grouped as clinical variables (n = 26: demographics, vitals, laboratory tests, medications), diagnostic codes (n = 90), diagnostic data (electrocardiograms [n = 41], echocardiograms [n = 44]), and evidence-based care gap variables (n = 8: flu, A1c, blood pressure, cardiac resynchronization therapy, evidence-based beta-blocker, aldosterone receptor antagonist, active angiotensin-converting enzyme inhibitor/angiotensin II receptor blocker/angiotensin receptor-neprilysin inhibitor, hydralazine). Training (retrospective): machine learning models (logistic regression, random forest, XGBoost) predict 1-year all-cause mortality under a split-by-year training scheme, with model performance measured by the area under the receiver-operating characteristic curve. Prediction (most recent encounter from alive patients, N = 13,238): the best-performing model (XGBoost) gives the original predicted risk and the predicted risk after care gap closure simulation; the difference is the predicted risk reduction (benefit), which is used for patient stratification.]

Jing, L. et al. J Am Coll Cardiol HF. 2020;-(-):-–-.

We studied 1-year all-cause mortality in a large cohort of patients with heart failure using machine learning models to integrate clinical variables, measures from
diagnostic studies (e.g., echocardiography and electrocardiography) and evidence-based care gap variables from electronic health records. Mean area under the
receiver-operating characteristic curve from a split-by-year training scheme was reported to evaluate model performance. The best-performing model was then used to
estimate risk reduction (potential benefit) by artificially closing care gaps in a prospective prediction set and to evaluate the efficiency of benefit-driven patient
prioritization.
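The care gap closure step summarized in this caption can be illustrated with a minimal sketch. It follows the closure rules stated in the Methods: binary gaps change from 1 (open) to 0 (closed), continuous values above goal are set to the goal with the matching medication flag set to 1, and gaps that are excluded or have missing values are left unchanged. Everything else is an assumption made for illustration: the field names, the `excluded` argument, and the `toy_risk` function, which is a stand-in and not the study's trained XGBoost model.

```python
import math

# Hypothetical field names; the study's real variable names are not given in the text.
BINARY_GAPS = ("flu_gap", "ebbb_gap", "acei_arb_arni_gap",
               "ara_gap", "hydralazine_gap", "crt_gap")
# Continuous variable -> (goal value, medication flag set to 1 on closure).
CONTINUOUS_GOALS = {
    "a1c": (8.0, "antidiabetic_med"),
    "systolic_bp": (130.0, "antihypertensive_med"),
    "diastolic_bp": (80.0, "antihypertensive_med"),
}

def close_care_gaps(patient, excluded=frozenset()):
    """Return a copy of `patient` with every eligible care gap closed."""
    closed = dict(patient)
    for gap in BINARY_GAPS:
        # 1 = open/untreated, 0 = closed/treated; excluded or missing gaps stay as-is.
        if gap not in excluded and closed.get(gap) == 1:
            closed[gap] = 0
    for var, (goal, med) in CONTINUOUS_GOALS.items():
        value = closed.get(var)
        if var in excluded or value is None:
            continue  # excluded, or no measurement in the lookback window
        if value > goal:
            closed[var] = goal
            closed[med] = 1
    return closed

def toy_risk(patient):
    """Illustrative logistic risk score; NOT the paper's model or coefficients."""
    z = -2.0 + 0.8 * (patient.get("ebbb_gap") or 0) + 0.5 * (patient.get("flu_gap") or 0)
    if patient.get("a1c") is not None:
        z += 0.1 * (patient["a1c"] - 8.0)
    return 1.0 / (1.0 + math.exp(-z))

patient = {"flu_gap": 1, "ebbb_gap": 1, "a1c": 9.5,
           "antidiabetic_med": 0, "systolic_bp": None}
# This hypothetical patient meets an exclusion criterion for EBBB (e.g., bradycardia).
after = close_care_gaps(patient, excluded={"ebbb_gap"})
benefit = toy_risk(patient) - toy_risk(after)  # predicted risk reduction
```

The benefit is the baseline risk minus the risk after closure, so a positive value means the model predicts lower mortality risk once the eligible gaps are closed.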

high-density lipoprotein, low-density lipoprotein, uric acid, sodium, potassium, N-terminal pro–B-type natriuretic peptide, troponin T, hemoglobin A1c, troponin I, creatinine, and total cholesterol); 90 cardiovascular-related International Classification of Diseases-Tenth Revision diagnostic codes (Supplemental Table 1); 44 nonredundant echocardiographic variables; 41 ECG measurements (such as QRS duration) and patterns (such as atrial fibrillation); and 8 care gap variables (described in the next

TABLE 1 Care Gap Definitions

Care Gap | Inclusion | Exclusion | Gap Closure
Flu vaccine | All patients eligible | Allergy | Received flu vaccine in the current flu season
BP in goal | All patients eligible | N/A | Open (not in goal) if the most recent reading and ≥1 of the 4 prior readings in the past 12 months are >130 mm Hg for systolic or >80 mm Hg for diastolic
A1c in goal | Diagnosis of diabetes* | N/A | Most recent A1c within the last 6 months: <8%
EBBB | Diagnosis of heart failure with most recent LVEF of <40% | Bradycardia (heart rate of <60 beats/min by averaging up to 5 most recent readings in last 6 months); on inotropic therapy; history of second or third degree heart block without implantable cardioverter-defibrillator or pacemaker; hypotension (systolic pressure of <100 mm Hg by averaging last 5 readings in past 6 months); severe COPD or asthma; allergy or contraindications | Currently taking EBBB
ACE inhibitor/ARB/ARNI | Diagnosis of heart failure with most recent LVEF of <40% | Pregnancy; history of angioedema; hypotension; serum creatinine of >2 in any of preceding 3 measurements; potassium of >5 in any of previous 3 measurements; allergy or contraindications; currently taking hydralazine and isosorbide dinitrate/mononitrate | Currently taking ACE inhibitor or ARB or ARNI
Hydralazine | Diagnosis of heart failure with most recent LVEF of <40%; not currently taking ACE inhibitor/ARB/ARNI; having any of the following conditions: pregnancy, history of angioedema, allergy or contraindications for ACE inhibitor/ARB/ARNI, serum creatinine of >1.6 in any of preceding 3 measurements, and potassium of >5 in any of preceding 3 measurements | Hypotension; allergy or contraindications | Currently taking combination of hydralazine and isosorbide dinitrate/mononitrate
ARA | Diagnosis of heart failure with most recent LVEF of <35% | Hypotension; serum creatinine of >2 in any of preceding 3 measurements; potassium of >5 in any of preceding 3 measurements; on dialysis; allergy or contraindications | Currently taking ARA
CRT | Diagnosis of heart failure; LVEF of ≤35%; left bundle branch block from ECG in last 12 months; QRS duration ≥150 ms from ECG in last 12 months | N/A | Currently have a CRT-D or CRT-P device

*Diabetes definition is described in the Supplemental Appendix.

ACE = active angiotensin-converting enzyme; ARB = angiotensin II receptor blocker; ARNI = angiotensin receptor-neprilysin inhibitor; ARA = aldosterone receptor antagonist; BP = blood pressure; COPD = chronic obstructive pulmonary disease; CRT = cardiac resynchronization therapy; CRT-D = cardiac resynchronization therapy defibrillator; CRT-P = cardiac resynchronization therapy pacemaker; EBBB = evidence-based beta-blocker; LVEF = left ventricular ejection fraction; N/A = not applicable.
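As an illustration of how a Table 1 rule could be turned into code, the sketch below encodes the "BP in goal" row under one reading of its wording: the gap is open when the most recent reading is above goal and at least 1 of the up-to-4 prior readings in the past 12 months is also above goal, where "above goal" means systolic >130 mm Hg or diastolic >80 mm Hg. The function name, the tuple layout, the 365-day window, and returning `None` when no reading falls in the window are assumptions for this sketch, not the study's implementation.

```python
from datetime import date, timedelta

def bp_gap_open(readings, as_of):
    """Evaluate the 'BP in goal' care gap for one patient.

    readings: list of (date, systolic_mm_hg, diastolic_mm_hg) tuples.
    Returns True (gap open), False (gap closed), or None (no readings
    in the past 12 months, i.e., the variable is treated as missing).
    """
    window_start = as_of - timedelta(days=365)
    in_window = sorted(
        (r for r in readings if window_start <= r[0] <= as_of),
        key=lambda r: r[0],
        reverse=True,  # most recent first
    )
    if not in_window:
        return None

    def above_goal(reading):
        _, systolic, diastolic = reading
        return systolic > 130 or diastolic > 80

    latest, prior = in_window[0], in_window[1:5]  # up to 4 prior readings
    return above_goal(latest) and any(above_goal(r) for r in prior)

today = date(2019, 11, 16)
history = [
    (date(2019, 11, 1), 142, 78),   # most recent: systolic above goal
    (date(2019, 8, 15), 128, 84),   # prior: diastolic above goal
    (date(2019, 5, 2), 122, 76),
]
gap_is_open = bp_gap_open(history, today)
```

Requiring a second elevated reading, as the table does, keeps a single high measurement from opening the gap on its own.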

section). Two of the care gap variables were defined to modify other variables (BPs and A1c), leaving 207 true input variables used in the machine learning models. Laboratory values, vital signs, ECG and echocardiographic measurements (recorded in the Xcelera database) within 12 months before the encounter date were extracted. If no measurements were available within the specified time window, the variable was set to missing. ECG measures were extracted from clinical ECG reports, which are rules reviewed and verified by physicians after clinical standards. EHR data preprocessing and cleaning are further detailed in the Supplemental Appendix.

CARE GAP VARIABLES. We introduced 8 evidence-based, actionable interventions (care gap variables) to study their association with patient outcomes: 1) flu vaccine administration; 2) hemoglobin A1c in goal range (<8%); 3) BP in goal (<130/80 mm Hg); 4) active evidence-based beta-blocker (EBBB); 5) active ACE

TABLE 2 Basic Demographics and Patient Characteristics

Column groups: All (276,819 Episodes From 26,971 Patients); Validation Set (548 Episodes/Patients); Prediction Set (13,238 Episodes From Living Patients)

Characteristic | All: Value | All: % Missing | Validation: Value | Validation: % Missing | Prediction: Value | Prediction: % Missing
Age, yrs | 76 (67–84) | 0 | 73 (64–83) | 0 | 75 (65–84) | 0
Male | 53 | 0 | 57 | 0 | 53 | 0
Smoking history* | 63 | 0 | 59 | 0 | 62 | 0
Height, cm | 168 (157–175) | 10 | 168 (160–176) | 2 | 168 (160–175) | 5
Weight, kg | 84 (70–102) | 2 | 89 (72–111) | 1 | 87 (72–105) | 2
Diastolic pressure, mm Hg | 68 (60–75) | 1 | 70 (64–78) | 1 | 70 (62–78) | 2
Systolic pressure, mm Hg | 124 (112–137) | 1 | 124 (112–138) | 1 | 124 (113–138) | 2
Heart rate, beats/min | 72 (64–80) | 2 | 72 (64–81) | 1 | 72 (64–82) | 2
Left ventricular ejection fraction, % | 52 (35–57) | 48 | 47 (32–57) | 44 | 52 (37–57) | 48
HDL, mg/dl | 42 (35–52) | 41 | 41 (35–51) | 47 | 43 (35–53) | 45
LDL, mg/dl | 74 (56–97) | 35 | 79 (57–102) | 47 | 73 (55–97) | 42
NT-proBNP, pg/ml | 2,137 (766–5,567) | 63 | 2,044 (804–4,026) | 53 | 1,755 (591–4,695) | 56
Troponin T, ng/ml | 0.02 (0.01–0.06) | 58 | 0.02 (0.01–0.05) | 51 | 0.03 (0.02–0.06) | 58

Values are median (interquartile range) or %. *Smoking history: current or ever smoking.
HDL = high-density lipoprotein; LDL = low-density lipoprotein; NT-proBNP = N-terminal-pro hormone B-type natriuretic peptide.

inhibitor, angiotensin II receptor blocker (ARB), or angiotensin receptor-neprilysin inhibitor; 6) active aldosterone receptor antagonist; 7) active hydralazine and isosorbide dinitrate; and 8) cardiac resynchronization therapy (CRT). These care gap variables were defined with assistance from a cardiologist, a physician trained in medical informatics, and a pharmacist with HF expertise following national guidelines and recommendations for evidence-based HF therapies (10). Detailed inclusion/exclusion criteria are listed (Table 1). By definition, gaps 4 through 8 apply only to HF with reduced ejection fraction (HFrEF), whereas gaps 1 through 3 apply to all patients with HF. A blinded chart review validation of each care gap variable is detailed in Supplemental Table 2.

FIGURE 1 Care Gap Prevalence
Number of patients with an open/untreated gap (orange) and with a closed/treated gap (blue) as of the most recent encounter date in living patients (N = 13,238). The sum of the orange and blue bars represents the total number of patients eligible (i.e., that fit the inclusion criteria) for that gap. These numbers are summarized in Supplemental Table 4. ACEI = active angiotensin-converting enzyme inhibitor; ARA = aldosterone receptor antagonist; ARB = angiotensin II receptor blocker; ARNI = angiotensin receptor-neprilysin inhibitor; BP = blood pressure; CRT = cardiac resynchronization therapy; EBBB = evidence-based beta-blocker.

PRIMARY OUTCOME. We used machine learning models to predict all-cause mortality 1 year after the end of an episode. Survival duration was calculated from the date of death (cross-referenced with national death index databases monthly) or last living encounter from the EHR.

MACHINE LEARNING MODEL TRAINING AND EVALUATION. We compared performances between a linear logistic regression classifier and nonlinear models, including random forest and XGBoost (11) (a scalable gradient tree boosting system). These nonlinear models were hypothesized to improve predictive accuracy by capturing more complex, nonlinear relationships among input variables. The best-performing model was selected for subsequent analysis of care gap closure effect estimation. We evaluated models using a split-by-year cross-validation to simulate clinical deployment, as described in the Supplemental Appendix and illustrated by Supplemental Figure 1.

BENEFIT PREDICTION IN LIVING PATIENTS BY SIMULATION OF CARE GAP CLOSURE. To study the predicted effect of closing care gaps on reducing 1-year all-cause mortality, we artificially closed care gaps while keeping all other variables unchanged as follows: for binary gap variables, changing the value
from 1 (open/untreated) to 0 (closed/treated); for continuous variables, changing the value to goal (A1c: 8% or BP: 130/80 mm Hg) if the original value was above the goal and the corresponding medication (antidiabetic or antihypertensive) to 1 (taking medication) if original value was 0. A care gap was not closed for patients who met the exclusion criteria for that care gap (i.e., a patient with bradycardia who could not be treated with EBBB) or who had a missing value (no A1c test result in last 6 months or BP in last 12 months).

After the simulation, we calculated the change in risk score for each patient, that is, the difference between the baseline risk score with care gaps unchanged and the updated risk score with care gaps closed, which was further translated into an estimated benefit of reduction in estimated mortality rate. The cumulative sum of the benefit from all patients was then used to provide an estimated number of lives that could be saved by closing care gaps.

Model evaluation and care gap simulation were performed for all patients with HF and separately on patients with HF with HFrEF and preserved ejection fraction (HFpEF). Detailed methods and results of this subgroup analysis are provided in the supplement.

RESULTS

STUDY POPULATION. Within our EHR, 26,971 patients with HF who collectively underwent 276,819 episodes (median age: 76 years; 53% male) satisfied the inclusion criteria. On average, each patient had 10 episodes (interquartile range: 4 to 14). The median follow-up duration was 3.4 years (interquartile range: 1.3 to 6.4 years) using reverse Kaplan-Meier estimation (12), and 13,733 (51%) patients had a recorded death. The external validation set contained 548 episodes/patients from a separate Geisinger hospital as of January 1, 2018 (the cutoff date for the 2018 model), of which 42 (8%) died within 1 year. Table 2 and Supplemental Table 3 show summary statistics.

Of the 13,238 patients who were living as of November 16, 2019, there were 3,772 (28%) with HFrEF and 6,784 (51%) with HFpEF. The remaining patients had either midrange EF (40% to 50%; 1,424; 11%) or no available EF measurement (1,258; 10%). A total of 10,516 (79%) had at least 1 open care gap, and 788 (6%) had 4 or more care gaps open as of their most recent clinical episode. Figure 1 shows the number of patients for each gap for which the gap was open/untreated (orange) or closed/treated (blue). The sum of orange and blue represents the number of patients who were eligible for the gap (i.e., who fit the inclusion criteria). Depending on the gap, 25% to 91% of eligible patients had an open gap (additional details available in Supplemental Table 4).

FIGURE 2 Model Evaluation Using Split-by-Year Cross-Validation
(A) Receiver operating characteristic curves for year 2018 and (B) areas under the receiver-operating characteristic (ROC) curves (AUCs) for all years using linear logistic regression (LR) and nonlinear random forest (RF) and XGBoost (XGB).

ACCURACY FOR PREDICTING ALL-CAUSE MORTALITY USING MACHINE LEARNING. The split-by-year cross-validation showed that performance was highly variable in early years (before 2009) for all 3 machine learning methods (Figure 2), likely due to small sample sizes (Supplemental Table 5) and more missingness (Supplemental Table 3). The performance improved in subsequent years, and the nonlinear
XGBoost model consistently achieved the best area under the receiver-operating characteristic curve (AUC).

FIGURE 3 Relationship Between Predicted Mortality Risk and Benefit (Risk Reduction)

In the final and largest test set (year 2018), all 3
models predicted 1-year all-cause mortality with
AUCs above 0.70, superior to the performance of the
Seattle HF Model (AUC: 0.57). The nonlinear models
achieved higher AUCs (random forest: 0.76; XGBoost:
0.77) compared with linear logistic regression (0.74).
Results were similar in patients with HFrEF and
HFpEF (Supplemental Figure 2). Additionally, in the
holdout validation set of 548 episodes/patients from
a separate hospital, the XGBoost model had an AUC of
0.78. Calibration on the 2018 test set showed that the
XGBoost model had a tendency to slightly over-
estimate risk (Supplemental Figure 3).
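The evaluation reported above can be sketched in two pieces: a split-by-year generator and a rank-based AUC. The exact scheme is defined in the Supplemental Appendix, which is not reproduced here, so the rule used below (train on episodes from years before the test year, test on episodes from that year) is an assumption suggested by the phrase "the cutoff date for the 2018 model". The names `split_by_year` and `auc` and the `year` field are hypothetical, and the pairwise AUC is written for clarity, not speed.

```python
def split_by_year(episodes, test_years):
    """Yield (year, train, test) with train strictly earlier than the test year.

    Assumed scheme, not the paper's published code: a model for year Y only
    sees episodes dated before Y, which mimics deploying it at the start of Y.
    """
    for year in test_years:
        train = [e for e in episodes if e["year"] < year]
        test = [e for e in episodes if e["year"] == year]
        if train and test:  # skip years with nothing to train or test on
            yield year, train, test

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case (death
    within 1 year, label 1) scores higher than a randomly chosen negative
    case (label 0); ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

episodes = [
    {"year": 2016, "died_1y": 0}, {"year": 2016, "died_1y": 1},
    {"year": 2017, "died_1y": 0}, {"year": 2018, "died_1y": 1},
]
folds = list(split_by_year(episodes, test_years=[2016, 2017, 2018]))
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this toy call, 3 of the 4 positive-negative pairs are ordered correctly, so `example_auc` is 0.75; an AUC of 0.5 corresponds to random ranking.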
PREDICTING BENEFIT OF CLOSING CARE GAPS. A
final model was fit on all training samples (all but the
most recent episodes for currently living patients) by
using XGBoost to predict the benefit of closing care
gaps in the living patients based on data from their
most recent episode. The distribution of risk scores is
available in the supplement (Supplemental Figure 4).
Of the 13,238 living patients, based on the estimated
mortality rate, 2,844 (21.5%) patients were predicted
to die within 1 year. Simulating closure of the 8 care
gaps resulted in 2,613 (19.7%) patients being pre-
dicted to die within 1 year. Hence, the aggregate
predicted absolute mortality rate reduction was 1.7%
(individual patient range –35% to 48%, absolute), and
231 (8.1% of 2,844) additional patients were predicted
to survive beyond 1 year assuming all 8 care gaps
could be closed. Of these 231 patients, 102 of them
had HFrEF and 87 had HFpEF.
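The population-level figures in this paragraph follow from simple arithmetic on the reported counts. The sketch below re-derives the reported percentages and then shows, on made-up risk values, how summing per-patient risk reductions gives an expected number of lives saved, which is how the Methods describe the aggregation. The helper name and the toy risk values are assumptions; the 3 counts are taken from the text.

```python
def expected_lives_saved(baseline_risks, closed_risks):
    """Sum of per-patient risk reductions (baseline risk minus risk after closure)."""
    return sum(b - c for b, c in zip(baseline_risks, closed_risks))

# Counts reported in the text.
n_living = 13_238
deaths_before = 2_844   # predicted 1-year deaths with care gaps unchanged
deaths_after = 2_613    # predicted 1-year deaths with all 8 care gaps closed

saved = deaths_before - deaths_after       # 231 patients
abs_reduction = saved / n_living           # about 0.017, reported as 1.7%
rel_reduction = saved / deaths_before      # about 0.081, reported as 8.1%

# Toy illustration (made-up numbers): individual benefits can be negative,
# matching the reported individual range of -35% to 48%.
baseline = [0.60, 0.20, 0.75]
closed = [0.45, 0.20, 0.78]
toy_saved = expected_lives_saved(baseline, closed)  # 0.15 + 0.00 - 0.03
```

Because the aggregate is a sum of signed differences, patients whose predicted risk rises after closure reduce the total.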
We further investigated the relationship between risk and benefit by comparing the predicted benefits among several subgroups. Figure 3 shows that the overall average benefit (“Overall Average”) was predicted to be relatively small and was primarily driven by the large group of patients with low mortality risk at baseline (risk score: ≤0.5) and low benefit after closing the care gaps (≤10% reduction in mortality rate) (“Low Risk, Low Benefit”). There was, however, a subgroup of patients predicted to have high mortality risk (risk score: >0.5) who were also predicted to have high benefit after closing gaps (>10% reduction in mortality rate; “High Risk, High Benefit”). However, not all high-risk patients were predicted to have high benefit, as evidenced by another subgroup of patients with similarly high baseline risk but minimal risk reduction after closing the care gaps (“High Risk, Low Benefit”). The distribution of patients with HFrEF and HFpEF according to this categorization is shown in Supplemental Figure 5. Compared with the low-benefit group, high-benefit patients had, in general, more open care gaps, poorer blood pressure and A1c control, lower EF, and higher left ventricular dimensions (Supplemental Table 6).

Figure 3 legend: (A) Scatterplot of risk score and corresponding benefit for individual patients in the prediction set (N = 13,238). Negative reductions in mortality rate reflect a detrimental effect of closing care gaps on mortality risk, as predicted by the XGBoost model, in a small proportion of patients. (B) Average mortality rate before and after care gap closure simulation in selected groups. Risk is not equivalent to benefit because patients at similarly high mortality risk levels do not have the same predicted benefit of closing care gaps. Additionally, 81 patients had relatively low risk and high benefit (data not shown).

PATIENT PRIORITIZATION TO EFFICIENTLY CLOSE CARE GAPS THROUGH POPULATION HEALTH MANAGEMENT. Assuming that a population health management team could be assembled and deployed to
close care gaps, the efficiency of its efforts would depend on effective guidance as to which patients to target first in a rank-ordered fashion. To demonstrate the potential value of machine learning to optimize care team resource deployment in this setting, we plotted the number of lives predicted to be saved versus the number of patients receiving an intervention (in which all eligible gaps were subsequently assumed closed) for several different prioritization strategies:

1. Random prioritization.
2. Randomly prioritizing any patient with at least 1 open care gap.
3. Rank-ordering patients by the number of open care gaps.
4. Stratifying patients using the Seattle Heart Failure risk score (5).
5. Stratifying patients according to the XGBoost model’s predicted benefit (i.e., mortality risk reduction).

Figure 4 shows that the proposed machine learning benefit stratification model (strategy 5) was the most efficient. That is, benefit stratification had the steepest slope of any prioritization strategy and, thus, in a resource-constrained environment, maximized the predicted total number of lives saved for a given number of patient interventions.

FIGURE 4 Simulation of Care Gap Closure Using XGBoost
Prioritization of patients according to predicted benefit is the most optimal resource allocation method based on having the highest predicted patient survival (y-axis) relative to the number of patients needed to treat (x-axis). The slopes of the plotted lines are inversely proportional to the number needed to treat, and thus, steeper lines represent more efficient patient prioritization. The small drop in lives saved at the far right-hand side of the line corresponding to the “Benefit Driven” model reflects the patients for whom closing the care gaps had a predicted negative impact on mortality risk, as shown in Figure 3A.

DISCUSSION

Optimized population health management demands novel, data-driven approaches for allocating health care resources, particularly within new value-based care models. This study has made considerable advances toward the development of such an approach for HF that combines extensively and carefully curated clinical data and machine learning. The model incorporates important clinical variables, quantitative measures from common diagnostic studies such as echocardiography and ECG, and evidence-based interventions in the form of care gaps. Our results show that a machine learning model with these inputs can achieve good accuracy to predict 1-year all-cause mortality in patients with HF.

Several studies have been published in recent years using machine learning to predict outcomes (mostly survival) in patients with HF, as summarized by Tripoliti et al. (13). These studies used various methods, from traditional classification (e.g., logistic regression, random forest) to custom-developed algorithms (contrast pattern–aided logistic regression with probabilistic loss function). The reported performances (AUC) vary from 0.61 to 0.94, while mostly centered around 0.75 to 0.8. Our model performance is, therefore, comparable to these prior studies, but with critical differences that enhance the clinical utility and generalizability of our model. Namely, we used large-scale EHR data with comprehensive features from a general HF population and leveraged machine learning with a novel split-by-year design for optimal evaluation and deployment in a real-world clinical setting. Detailed comparison with previous studies is included in the Supplemental Appendix.

Furthermore, our explicit representation of clinical care gaps in the model represents a new paradigm for guiding clinical action with machine learning. Specifically, we showed how these care gap inputs can be used to predict risk reduction associated with specific interventions on an individual patient level. These model predictions can provide guidance to integrated health systems working to optimally distribute scarce clinical resources (e.g., care teams) to patients who need them the most. Importantly, most published models and clinical scoring systems rely heavily on risk prediction, which could be used to prioritize distribution of health care resources. However, risk is not equivalent to modifiable risk (i.e., benefit), and thus, patients with identical risk of 1-year mortality can have very different predicted benefit from interventions. Therefore, deployment of resources based simply on risk is unlikely to be optimal. We demonstrated support for this claim by showing the
superiority of our model’s predicted performance over the Seattle Heart Failure score for prioritizing patient interventions.

Despite the fact that these interventions (care gaps) are recommended in national guidelines based on demonstrated benefit (for example, even flu vaccination has been associated with decreased all-cause mortality in HF [14]), the prevalence of open care gaps remains a significant problem in medicine. For example, in patients with HF, therapies proven to prolong life are used at staggeringly low rates: only 57% are receiving ACE inhibitors, 34% are receiving evidence-based beta blockers, and 32% are receiving mineralocorticoid antagonists (15). Additionally, although some gaps can be easily addressed in a single setting (e.g., flu vaccine), others are more difficult to manage and require close monitoring and frequent follow-up (e.g., BP and A1c control). This problem is highly complex and unlikely to be solved by relying on individual providers to change practice. However, new value-based care models can likely address this problem more effectively by creating organized care teams. These teams will require accurate, reliable data science, such as that presented in this article, to successfully allocate resources.

A small portion of patients had a negative predicted reduction in mortality rate (1.8% of the living patients with benefit of <−5%), indicating that closing gaps for these patients was predicted to have a detrimental effect on mortality risk. This is likely due to the nonlinear relationship between care gaps such as A1c or BP and mortality. Despite the evidence-based guidelines showing that lower BPs are associated with reduced risk of adverse events in HF (16), the so-called BP paradox has been noted in multiple studies where lower BP or pronounced changes in BP (increases or decreases) were associated with poor outcomes (17,18). Additional investigation is warranted to fully understand the role of BP and to determine optimal BP targets in HF.

It is critically important to note that this concept needs prospective evaluation because, currently, our models are based on association, not causation. It is logical to infer some causative relationship between evidence-based therapies and survival, which has been demonstrated in many prior studies (10,19–21). However, this needs to be evaluated prospectively, and we have thus launched a randomized prospective study (NCT03804606).

STUDY LIMITATIONS. First, by treating each episode as an independent training sample, longitudinal information for individual patients was not captured, which could compromise model performance. However, by using the split-by-year cross-validation approach, a patient’s historic information was still used in training to make predictions from a current episode. Future work will explore approaches that leverage longitudinal information more effectively.

Second, the care gaps selected in the current study primarily focus on patients with HFrEF, with a few exceptions, because of the lack of evidence-based treatment for HFpEF. Some important treatments for HF were not included for various reasons, such as newer therapies (e.g., ivabradine) for which we do not yet have enough retrospective data or difficulty in capturing the therapy from structured EHR data (e.g., implantable cardioverter-defibrillator devices). Additionally, we did not account for optimized dosing of medications in the current study. Future studies will leverage natural-language processing to more accurately capture these features from unstructured data and explore their potential impact on mortality risk.

Third, the predicted benefit is for short-term mortality risk reduction, which is highly relevant because of the high 1-year mortality rate in HF. However, this may not account for longer-term benefits of treatment, and future investigations are required. Finally, we used EHR data from a single health system, which may limit generalizability. An external, independent dataset from another large health care system may help further validate and improve our model. However, our dataset represents a broad, heterogeneous population covering a large geographic area populated by approximately 3 million people served by 13 hospitals and >100 clinics. Moreover, we validated our model using a holdout from a geographically separate independent hospital within our system.

CONCLUSIONS

We presented a machine learning model to predict 1-year all-cause mortality with good accuracy in a large cohort of patients with HF. Our results leveraging 276,819 episodes from nearly 27,000 patients show that these models can be used to not only risk-stratify
patients but also efficiently prioritize patients based on predicted benefits of clinically relevant evidence-based interventions. This approach will likely prove useful for assisting HF-population health management teams within new value-based payment models.

ACKNOWLEDGMENTS The authors acknowledge Paul M. Berry, Susan A. Kilbride, and Christopher D. Nevius for their assistance with the manual chart reviews for care gap validation.

ADDRESS FOR CORRESPONDENCE: Dr. Brandon K. Fornwalt, Department of Imaging Science and Innovation, Geisinger, 100 North Academy Avenue, Danville, Pennsylvania 17822-4400. E-mail: [email protected].

PERSPECTIVES

COMPETENCY IN PATIENT CARE AND PROCEDURAL SKILLS: A machine learning model can integrate vast amounts of clinically acquired electronic health record data to optimally direct heart failure population health management in new value-based care models.

TRANSLATIONAL OUTLOOK: The clinical adoption of data science and machine learning models to optimize population health management may ultimately improve survival and care delivery in populations of patients with HF.
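
The comparison of prioritization strategies in the Results (ranking by predicted benefit versus ranking by risk) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the patients are synthetic, the field names `risk_open` and `risk_closed` and both helper functions are hypothetical, and the uniform distributions are arbitrary. In the study these two risks would come from the trained XGBoost model evaluated with care gaps as observed and with all eligible gaps assumed closed.

```python
import random


def make_synthetic_patients(n, seed=0):
    """Build n synthetic patients (illustrative only, not study data).

    risk_open   : predicted 1-year mortality risk with care gaps as observed
    risk_closed : predicted risk if all eligible gaps were closed
    """
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        risk_open = rng.uniform(0.05, 0.60)
        # Assumed: the modifiable fraction of risk varies by patient and can
        # be slightly negative (closing gaps predicted to be detrimental).
        modifiable = rng.uniform(-0.05, 0.40)
        patients.append(
            {"risk_open": risk_open, "risk_closed": risk_open * (1.0 - modifiable)}
        )
    return patients


def lives_saved_curve(patients, key):
    """Cumulative predicted lives saved when patients receive the
    intervention in descending order of key(patient)."""
    ordered = sorted(patients, key=key, reverse=True)
    curve, total = [], 0.0
    for p in ordered:
        total += p["risk_open"] - p["risk_closed"]  # predicted benefit
        curve.append(total)
    return curve


patients = make_synthetic_patients(1000)

# Analogous to strategy 5: rank by predicted benefit (risk reduction).
by_benefit = lives_saved_curve(
    patients, key=lambda p: p["risk_open"] - p["risk_closed"]
)
# Risk-only ranking, analogous in spirit to stratifying by a risk score.
by_risk = lives_saved_curve(patients, key=lambda p: p["risk_open"])

# Predicted lives saved after intervening on the first 100 patients.
print(round(by_benefit[99], 2), round(by_risk[99], 2))
```

Because the first k patients under benefit ranking are the k patients with the largest predicted benefit, the benefit-ranked curve is at least as high as any other ordering at every k, and all orderings converge once every patient has been treated. This mirrors the steeper slope reported for benefit stratification, under the stated assumption that the predicted benefits are taken at face value.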

REFERENCES

1. O’Connor CM. Bundle up for value-based heart failure care. J Am Coll Cardiol HF 2015;3:931–2.

2. Ortiz J, Ghefter CG, Silva CE, Sabbatini RM. One-year mortality prognosis in heart failure: a neural network approach based on echocardiographic data. J Am Coll Cardiol 1995;26:1586–93.

3. Shah SJ, Katz DH, Selvaraj S, et al. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation 2015;131:269–79.

4. Mortazavi BJ, Downing NS, Bucholz EM, et al. Analysis of machine learning techniques for heart failure readmissions. Circ Cardiovasc Qual Outcomes 2016;9:629–40.

5. Levy WC, Mozaffarian D, Linker DT, et al. The Seattle Heart Failure Model: prediction of survival in heart failure. Circulation 2006;113:1424–33.

6. Ahmad T, Lund LH, Rao P, et al. Machine learning methods improve prognostication, identify clinically distinct phenotypes, and detect heterogeneity in response to therapy in a large cohort of heart failure patients. J Am Heart Assoc 2018;7:e008081.

7. Subramanian D, Subramanian V, Deswal A, Mann DL. New predictive models of heart failure mortality using time-series measurements and ensemble models. Circ Heart Fail 2011;4:456–62.

8. Fonarow GC, Adams KF, Abraham WT, Yancy CW, Boscardin WJ. Risk stratification for in-hospital mortality in acutely decompensated heart failure: classification and regression tree analysis. JAMA 2005;293:572–80.

9. Bielinski SJ. Heart Failure (HF) With Differentiation Between Preserved and Reduced Ejection Fraction. PheKB. Available at: https://phekb.org/phenotype/heart-failure-hf-differentiation-between-preserved-and-reduced-ejection-fraction. Accessed December 19, 2019.

10. Yancy CW, Jessup M, Bozkurt B, et al. 2017 ACC/AHA/HFSA focused update of the 2013 ACCF/AHA guideline for the management of heart failure. J Am Coll Cardiol 2017;70:776–803.

11. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proc 22nd ACM SIGKDD Int Conf Knowl Discov Data Min - KDD ’16. March 2016:785–94.

12. Schemper M, Smith TL. A note on quantifying follow-up in studies of failure time. Control Clin Trials 1996;17:343–6.

13. Tripoliti EE, Papadopoulos TG, Karanasiou GS, Naka KK, Fotiadis DI. Heart failure: diagnosis, severity estimation and prediction of adverse events through machine learning techniques. Comput Struct Biotechnol J 2017;15:26–47.

14. Modin D, Jørgensen ME, Gislason G, et al. Influenza vaccine in heart failure. Circulation 2019;139:575–86.

15. Callender T, Woodward M, Roth G, et al. Heart failure care in low- and middle-income countries: a systematic review and meta-analysis. PLoS Med 2014;11:e1001699.

16. SPRINT Research Group, Wright JT, Williamson JD, et al. A randomized trial of intensive versus standard blood-pressure control. N Engl J Med 2015;373:2103–16.

17. Ventura HO, Messerli FH, Lavie CJ. Observations on the blood pressure paradox in heart failure. Eur J Heart Fail 2017;19:843–5.

18. Schmid FA, Schlager O, Keller P, et al. Prognostic value of long-term blood pressure changes in patients with chronic heart failure. Eur J Heart Fail 2017;19:837–42.

19. Klapholz M. β-Blocker use for the stages of heart failure. Mayo Clin Proc 2009;84:718–29.

20. Cheng JWM, Nayar M. A review of heart failure management in the elderly population. Am J Geriatr Pharmacother 2009;7:233–49.

21. Fonarow GC. A review of evidence-based beta-blockers in special populations with heart failure. Rev Cardiovasc Med 2008;9:84–95.

KEY WORDS data science, electronic health records, population health

APPENDIX For an expanded Methods section and supplemental tables and figures, please see the online version of this paper.
