Aligning EHR Data For Pediatric Leukemia With Standard Protocol Therapy


DATA ARCHITECTURE AND MODELS

ORIGINAL REPORTS
Nicole M. Wood, DO1,2,3; Sierra Davis, MBA2; Karen Lewing, MD1,3; Janelle Noel-MacDonnell, PhD1,2,3; Earl F. Glynn, MS2;
Doina Caragea, PhD4; and Mark A. Hoffman, PhD1,2,3,5
abstract

PURPOSE Children with acute lymphoblastic leukemia (ALL) are treated according to risk-based protocols
defined by the Children’s Oncology Group (COG). Alignment between real-world clinical practice and protocol
milestones is not widely understood. Aggregate deidentified electronic health record (EHR) data offer a useful
resource to evaluate real-world clinical practice.
METHODS A cohort of children with ALL was identified in the Cerner Health Facts deidentified aggregate EHR
data. Manual review identified candidate procedural milestones. Automated methods were developed to classify
likely standard-risk precursor B-cell ALL patients. Milestone procedures were adjusted relative to initiation of
therapy and then aligned to the COG protocols for standard induction therapy.
RESULTS We identified 7,728 patients with pediatric ALL with 188,187 encounters. Records for lumbar
punctures (LP) and bone marrow biopsies were frequently present in the data and were appropriate targets to
evaluate guideline performance. Alluvial graph analysis of 14 health systems indicated that none of the systems
had data from all three COG-recommended lumbar procedures for all patients, but alignment demonstrated that
most systems test at the recommended times.
CONCLUSION Source-system variation introduces inconsistency and incompleteness into aggregate EHR data.
Data visualization was helpful in characterizing and interpreting the data. Health systems with patients meeting
the inclusion criteria demonstrated strong alignment with the recommended milestones for LP. Large-scale
aggregate EHR data are useful to evaluate alignment of recommended versus actual clinical milestones in
support of treating children with ALL. This work can inform other guideline- and protocol-driven care.
JCO Clin Cancer Inform 5:239-251. © 2021 by American Society of Clinical Oncology
Creative Commons Attribution Non-Commercial No Derivatives 4.0 License

ASSOCIATED CONTENT
Appendix
Author affiliations and support information (if applicable) appear at the end of this article.
Accepted on January 22, 2021 and published at ascopubs.org/journal/cci on March 3, 2021: DOI https://doi.org/10.1200/CCI.20.00144

CONTEXT
Key Objective
How can data science methods using aggregate, deidentified, electronic health record (EHR) data inform oncologists about the alignment between protocol defined milestones and real-world clinical practice?
Knowledge Generated
Using lumbar puncture timing specified by the Children’s Oncology Group (COG) as an example, we found a strong level of alignment between real-world practice and protocol recommendations for children with standard-risk acute lymphoblastic leukemia.
Working with large, deidentified data sets benefits from the application of data science methods and visualizations to account for missing data and other challenges.
Relevance
This work establishes data science methods that can be reapplied to aggregate analysis of other guideline and protocol-based events and serves as a precursor to evaluating the impact of deviation from guidelines on clinical outcomes.

INTRODUCTION
Variation in healthcare delivery affects patient outcomes when real-world practice deviates from widely accepted evidence-based protocols. Pediatric oncology is distinguished by the widespread use of protocols provided by the Children’s Oncology Group (COG) for the management of common childhood cancers.1 The American Academy of Pediatrics recommends treatment of pediatric cancer at a tertiary center, with board-certified pediatric oncologists.2 Most tertiary pediatric academic institutions are members of the COG, and more than 90% of patients with pediatric cancer are cared for at COG sites.3 In addition to treatment guidelines, COG protocols include detailed information regarding the timing of procedures, laboratory and diagnostic evaluations, treatment, and follow-up after therapy.

Large-scale analysis of the alignment between real-world clinical practice and standardized protocols is challenging. Patients with cancer experience complex care that includes evaluation with lab and other diagnostic tests followed by treatment that often includes chemotherapy, radiation therapy, surgery, and/or transplantation. The treatment protocol assigned to a patient depends on the patient’s risk level. For example, pediatric precursor B-cell acute lymphoblastic leukemia (ALL) cases are classified into standard- or high-risk categories based on National Cancer Institute (NCI) criteria.4 Patients with NCI standard-risk ALL must be ≥ 1 and < 10 years of age and have an initial WBC count of < 50 × 10³/mcL. Patients are considered high risk if they are 10 years of age or older or have a WBC count ≥ 50 × 10³/mcL. Children younger than 1 year of age are considered a distinct ALL risk category with a distinct protocol. Patients with ALL and Down syndrome have inferior survival compared with those without Down syndrome5-7 and are also a separate risk cohort.

Several database resources are used for pediatric cancer research, each with its own strengths and limitations. For example, the SEER registry is an established source of cancer data and has been used to evaluate pediatric leukemia.8 Registries such as SEER are standardized and can include details from pathology and radiology. However, registries are episodic and are often populated by manual data entry, limiting the number of patients and volume of data. SEER has limited information about comorbidities and represents only 12 states.9 Data derived from billing and claims can provide a view into patient interactions within the healthcare system. For example, the Pediatric Health Information System database has been used extensively to characterize pediatric cancer10,11 and has demonstrated the value of combining diagnosis data with medication information.12 Key limitations of registries and billing data are the lack of temporal specificity, the absence of results, and limited ability to scale up.

Electronic health record (EHR) systems have become widely adopted following the Meaningful Use funding of the American Recovery and Reinvestment Act (ARRA).13 EHR systems include results, serve as the legally binding medical record, and are rich in date- and time-stamped details. Data from EHR systems can be applied to clinical research, including the use of EHR data to characterize the trajectories of patients treated within a single organization.14 One major EHR vendor, Cerner, has developed a large-scale aggregate data warehouse, Health Facts (HF), in which a subset of its client base has provided data rights to assemble and analyze a subset of their EHR data. The data are deidentified to Health Insurance Portability and Accountability Act (HIPAA) standards and are scrubbed of all protected health information. The HF data have been applied to research in cardiology and other disease states.15-19 A comparison of HF to the National Inpatient Survey demonstrated high correlation in the frequency of diagnoses between HF and a nationally representative survey.20 The HF data include laboratory data, inpatient medications, demographics, surgical data, billing data, and a wide variety of clinical events including vitals.

We demonstrate processes to use large-scale aggregate EHR data from healthcare organizations in the United States to compare real-world clinical practice to COG treatment protocols for managing NCI standard-risk ALL cases.

METHODS
To develop a representation of the COG ALL protocols, we reviewed the COG pre-B-cell ALL protocols for standard- and high-risk regimens1 to develop a reference framework against which to map scheduled events in data from the HF database (Cerner Corporation, Kansas City, MO). HF includes deidentified, HIPAA-compliant EHR data from Cerner clients who agree to participate. The version of HF data in use at Children’s Mercy includes more than 68 million patients from 664 facilities associated with 100 nonaffiliated organizations, 4 billion lab results, 734 million inpatient medication orders, and other data. Significantly, the data do not include text reports, as those cannot be reliably deidentified. Children’s Mercy received the 2018 version of the HF data and installed the data into Microsoft Azure (Redmond, WA). Queries were performed with Microsoft SQL Server Management Studio version 17.9 and R Studio version 1.1.453 with R version 3.5.2.21,22 Queries evaluated data from 2000 to 2017.

A preliminary query to identify pediatric ALL diagnoses was performed using International Classification of Diseases (ICD)-9 diagnosis codes (204.0, 204.00, and 204.01) and ICD-10-CM diagnosis codes (C91.0, C91.00, and C91.01) and was restricted to patients 0 to 18 years of age. We excluded nonclinical patient encounters and ALL diagnosis codes related to relapse. To align standard-risk (SR-ALL) patient trajectories with standard COG protocols at a large scale, we needed to develop analytical methods, in the absence of risk-specific ICD codes and text notes, to exclude patients from other risk categories before inferring compliance with COG guidelines. We also sought reliable, consistent, and widely available milestone procedures in the EHR data.
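The preliminary diagnosis query can be sketched as a filter over flattened diagnosis rows. This is a minimal Python illustration under stated assumptions, not the authors’ SQL: the `DiagnosisRow` record and its fields are hypothetical stand-ins for the HF encounter and diagnosis tables, and only the ICD code lists and the 0-to-18-year age range come from the text. In this sketch, relapse codes (for example, ICD-10-CM C91.02) drop out simply because they are not on the inclusion list.

```python
from dataclasses import dataclass

# ALL diagnosis codes named in the preliminary query.
ALL_ICD9 = {"204.0", "204.00", "204.01"}
ALL_ICD10_CM = {"C91.0", "C91.00", "C91.01"}
ALL_CODES = ALL_ICD9 | ALL_ICD10_CM
MAX_AGE_YEARS = 18


@dataclass(frozen=True)
class DiagnosisRow:
    """Hypothetical flattened diagnosis row; not the actual HF schema."""
    patient_id: str
    code: str
    age_years: float
    clinical_encounter: bool  # False for nonclinical encounter types


def preliminary_all_cohort(rows):
    """Return IDs of patients aged 0-18 with a listed ALL code on a clinical encounter."""
    cohort = set()
    for row in rows:
        if not row.clinical_encounter:
            continue  # nonclinical encounters are excluded
        if row.code not in ALL_CODES:
            continue  # relapse codes and non-ALL diagnoses are not on the list
        if 0 <= row.age_years <= MAX_AGE_YEARS:
            cohort.add(row.patient_id)
    return cohort


rows = [
    DiagnosisRow("p1", "C91.00", 4.0, True),   # kept
    DiagnosisRow("p2", "C91.02", 6.0, True),   # relapse code: not on the list
    DiagnosisRow("p3", "204.00", 25.0, True),  # older than 18: excluded
    DiagnosisRow("p4", "C91.01", 9.0, False),  # nonclinical encounter
]
print(preliminary_all_cohort(rows))  # {'p1'}
```

Returning a set of patient IDs keeps the cohort definition separate from the encounter counts, which can then be recomputed for any later subset.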

240 © 2021 by American Society of Clinical Oncology


EHR Data Pediatric Leukemia

A. Standard-Risk Precursor B-Cell Acute Lymphoblastic Leukemia (ALL)
Cycle 1 = Induction (35 days):
LP Plus IT Chemotherapy days: 0/1, 8, 29
Systemic IV Chemotherapy: day 1 (VCR), day 4 (PEG), day 8 (VCR), day 15 (VCR), day 22 (VCR)
Oral Chemotherapy: Dexamethasone days 1-28
Cycle 2 = Consolidation (28 days):
LP Plus IT Chemotherapy days: 1, 8, 15
Systemic Chemotherapy: day 1 (VCR)
Oral Chemotherapy: 6-MP days 1-28

B. Higher-Risk Precursor B-Cell Acute Lymphoblastic Leukemia (ALL)
Cycle 1 = Induction (35 days):
LP Plus IT Chemotherapy days: 0/1, 8, 29
If CNS disease, get twice weekly LP Plus IT chemo until cleared and add days 15 and 22
Systemic Chemotherapy: day 1 (VCR, DAUNO), day 4 (PEG), day 8 (VCR, DAUNO), day 15 (VCR, DAUNO), day 22 (VCR, DAUNO)
Oral Chemotherapy: dexamethasone days 1-14 if less than 10 years; prednisone days 1-28 if 10 years or older
Cycle 2 = Consolidation (56 days):
LP Plus IT Chemotherapy days: 1, 8, 15, 22
Systemic Chemotherapy: day 1 (CPM), day 15 (VCR, PEG), day 22 (VCR), day 29 (CPM), day 43 (VCR, PEG), day 50 (VCR)
Cytarabine IV or SQ: days 1-4, 8-11, 29-32, 36-39
Oral Chemotherapy: 6-MP days 1-14 and 29-42

FIG 1. Milestone events for B-cell ALL. (A) Milestone events for standard-risk ALL, treatment cycles 1 (induction) and 2 (consolidation). (B) Milestone events for higher-risk ALL, treatment cycles 1 (induction) and 2 (consolidation). ALL, acute lymphoblastic leukemia; CPM, cyclophosphamide; DAUNO, daunorubicin; IT, intrathecal; LP, lumbar puncture; MP, mercaptopurine; PEG, pegaspargase; VCR, vincristine.

Data from 70 pediatric patients with ALL in HF were randomly selected for manual evaluation. Oncologists from Children’s Mercy reviewed each patient to assess the availability of laboratory data, diagnoses, diagnostic procedures, inpatient medications, and demographic and clinical data. This information was used to identify the milestone events likely to be well represented in the data. The Children’s Mercy Institutional Review Board has deemed work with HF to be nonhuman subjects research.

RESULTS
We developed a reference framework representing the milestone events in the care of a child treated following the COG protocols for NCI SR-ALL versus those with NCI high-risk ALL (HR-ALL) (Fig 1). This framework was used to define machine-readable inclusion and exclusion criteria and served as the reference timeline against which date- and time-stamped data found in HF would be aligned.

JCO Clinical Cancer Informatics 241



Health Facts Data Warehouse: Patients = 68 million; Encounters = 506 million; Diagnosis Records = 988 million; Health Systems = 100
↓ Exclusions: Age > 18 Years and Patient Types
Base Population: Patients = 15 million; Encounters = 72 million; Diagnosis Records = 137 million; Health Systems = 89
↓ ALL Diagnosis Inclusion
Leukemia Cohort: Patients = 11,476; Encounters = 270,190; Diagnosis Records = 165,362; Health Systems = 80
↓ Excluded Health Systems Without Lab, Medication, and Procedure Modules
Patient Cohort: Patients = 7,252; Encounters = 188,187; Diagnosis Records = 152,960; Health Systems = 30

FIG 2. ALL cohort development. Iterative inclusions and exclusions to develop the preliminary ALL cohort. Data from organizations that do not consistently capture labs, medications, or procedures in Cerner were excluded. ALL, acute lymphoblastic leukemia.

The preliminary query to identify pediatric ALL diagnoses yielded 11,476 patients with pediatric ALL in HF from 80 nonaffiliated health systems (Fig 2). These patients have 270,190 ALL-diagnosis-related encounters in the data. We then excluded health systems without adequate lab, medication, or procedure data,23 resulting in a subset of 7,252 patients with pediatric ALL from 30 health systems, with 188,187 diagnosis-related encounters.

Manual review of data from 70 patients from this group was instructive in identifying data from procedures required by the COG protocol that are well represented in HF. Lumbar puncture (LP) procedures or a related CSF lab were found in 83% (58/70) of these cases. Current Procedural Terminology (CPT) and ICD procedure codes were used to directly or indirectly identify lumbar procedures (Appendix Table A1). The timing for LP recommended by COG protocols did not change during the period covered by HF, 2000-2017.

Treatment protocols are based on risk category. To include the greatest number of patients, we narrowed our protocol alignment work to likely SR-ALL and excluded patients with additional risk factors (Fig 3). Risk status is not explicitly documented, requiring us to develop analytical methods based on patient, lab, and medication factors to infer risk status. Patient-level exclusions were age younger than 1 year, age 10 years or older, and a diagnosis code for Down syndrome (ICD-9 758.0 and ICD-10 Q90.0). The lab-based exclusion was a WBC result of > 50,000 within 30 days of initiation of therapy. Patients without a WBC result available within 30 days of initiation of treatment were excluded; this requirement reduced our initial cohort from 7,252 to 2,652.

Patients with SR-ALL receive a three-drug induction chemotherapy regimen: vincristine, dexamethasone, and pegaspargase. Patients with HR-ALL receive an additional chemotherapy agent, daunorubicin, during induction and cyclophosphamide during consolidation. Mesna is often used to provide chemoprotection to patients with HR-ALL during consolidation. We queried the medication table, coded with National Drug Code values, and the procedure table, coded with CPT and Healthcare Common Procedure Coding System (HCPCS) values, to exclude patients who received the HR-ALL drugs (Appendix Table A2). These filters narrowed the cohort to 1,313 patients with SR-ALL from 16 nonaffiliated health systems (Fig 3). Other information potentially indicative of high risk, for example, molecular markers or cytogenetics, was not available.
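The risk inference above amounts to a chain of exclusions. The sketch below is a hypothetical Python rendering of those rules, assuming a pre-assembled per-patient summary (`PatientSummary` is illustrative, not an HF table) and assuming the 30-day WBC window precedes the inferred day 1, as the Discussion describes. It follows the text’s exclusion of WBC > 50,000; note that the NCI definition in the Introduction places exactly 50 × 10³/mcL in the high-risk group.

```python
from dataclasses import dataclass, field

DOWN_SYNDROME_CODES = {"758.0", "Q90.0"}  # ICD-9, ICD-10
HIGH_RISK_AGENTS = {"daunorubicin", "cyclophosphamide", "mesna"}
WBC_LIMIT_PER_MCL = 50_000  # i.e., 50 x 10^3/mcL
WBC_WINDOW_DAYS = 30        # assumption: the 30 days before inferred day 1


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary; field names are illustrative."""
    age_years: float
    diagnosis_codes: set = field(default_factory=set)
    # (days before inferred day 1, WBC per mcL) pairs
    wbc_results: list = field(default_factory=list)
    # drug names ordered during induction or consolidation
    drugs: set = field(default_factory=set)


def likely_standard_risk(p: PatientSummary) -> bool:
    """Apply the age, Down syndrome, WBC, and high-risk-drug exclusions."""
    if not 1 <= p.age_years < 10:
        return False
    if p.diagnosis_codes & DOWN_SYNDROME_CODES:
        return False
    in_window = [wbc for days, wbc in p.wbc_results if 0 <= days <= WBC_WINDOW_DAYS]
    if not in_window:
        return False  # no WBC available in the window
    if any(wbc > WBC_LIMIT_PER_MCL for wbc in in_window):
        return False
    if {d.lower() for d in p.drugs} & HIGH_RISK_AGENTS:
        return False
    return True


sr = PatientSummary(4, wbc_results=[(2, 12_000)],
                    drugs={"Vincristine", "Dexamethasone", "Pegaspargase"})
hr = PatientSummary(4, wbc_results=[(2, 12_000)], drugs={"Vincristine", "Daunorubicin"})
print(likely_standard_risk(sr), likely_standard_risk(hr))  # True False
```

Keeping each rule as a separate early return makes it easy to count how many patients each exclusion removes, which is how a cohort funnel such as Fig 3 is usually assembled.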




Patients with ALL: Patients = 7,252; Encounters = 188,187; Health Systems = 30
↓
Patients with Initial WBC: Patients = 2,652; Health Systems = 16
↓ Exclusions: Patients With Down Syndrome; Initial WBC > 50,000; Age < 1 and ≥ 10 Years; Chemotherapy Agents in Induction or Consolidation: Daunorubicin, Cyclophosphamide, and Mesna
Standard-Risk Patients: Patients = 1,313; Procedure and Chemo Encounters = 29,924; Health Systems = 16
↓ Inclusions (Occurring Between 0 and 5 Days Before Chemotherapy Event): Chemotherapy Agent Plus LP Plus Bone Marrow Aspiration and/or Biopsy, OR Chemotherapy Agent Plus LP Plus Central Venous Line Insertion, OR Chemotherapy Agent Plus LP Plus Blast
Newly Diagnosed Cohort in Induction Cycle: Patients = 410; Encounters = 2,459; Health Systems = 14

FIG 3. Likely standard-risk ALL cohort development. Iterative inclusions and exclusions to develop a cohort of patients likely to have standard-risk ALL and likely to have been newly diagnosed with ALL. ALL, acute lymphoblastic leukemia; LP, lumbar puncture.

Initiation of therapy is not provided as a discrete event in HF. To infer the start of induction chemotherapy (day 1) and to exclude relapsed patients, we used the earliest date of chemotherapy in combination with other events. Start of therapy was defined as the first chemotherapy event occurring in the same period as a lumbar procedure and at least one other identifier (bone marrow aspiration and/or biopsy, central venous line insertion, or blasts) on the same day as or within 5 days before the first chemotherapy event (Appendix Table A1). Of the 1,313 patients with SR-ALL, 1,005 patients had codes related to LP or lab tests involving CSF as a surrogate (126 lumbar only, 342 CSF only, and 540 both). The medication codes used to identify the earliest date of chemotherapy included cytarabine, intravenous vincristine, injection or infusion of cancer chemotherapeutic substance, and dexamethasone (Appendix Table A3). Using the methods described above, and excluding patients missing data for the events associated with day 1 of treatment, we identified 410 patients with an LP event (32% of the SR-ALL cohort) (Fig 3). Based on the available data, these patients met the criteria for SR-ALL and had documentation in HF of the treatment or procedures needed to infer the date on which treatment was initiated. The remaining patients with SR-ALL were not analyzed in this study.

The 410 patients with at least one LP were analyzed using an UpSet plot24 to characterize the availability of data for LP events at predicted times (Fig 4A). For day 1, we used the date on which standard-of-care induction therapy is first noted, and we positioned all other dates relative to day 1. For clarity, we grouped LP dates into three windows: diagnostic (all days before treatment, up to and including day 1), days 7-9, and days 28-32. In the UpSet plot, vertical lines connect the windows that make up each combination, the number of unique patients with that combination is shown at the top of each bar, and a light gray circle marks a window with no data. The number of patients with an LP in each window is shown in the bar chart on the left.

We found that 206 patients had data for all three COG-recommended LP times (on or before day 1, day 8, and day 29 of induction chemotherapy) (Fig 4A). Fifty-eight patients had data only for the diagnostic or day 1 milestone.
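The day-1 rule can be expressed as a search over each patient’s dated events. In this minimal Python sketch, event kinds such as `"chemo"`, `"lp"`, and `"bone_marrow"` are hypothetical labels for the coded records in Appendix Tables A1 and A3, treating a CSF lab as an LP surrogate is an assumption, and the function returns the earliest chemotherapy date that has both corroborating events within the 5-day lookback (the text does not say whether a non-qualifying earlier chemotherapy record should instead disqualify the patient). Relative days are then computed so that day 1 is the inferred start and day 0 is the day before.

```python
from datetime import date

LUMBAR_KINDS = {"lp", "csf_lab"}  # assumption: CSF lab as an LP surrogate
SECOND_IDENTIFIERS = {"bone_marrow", "central_line", "blasts"}
LOOKBACK_DAYS = 5  # same day up to 5 days before the chemotherapy event


def _has_event_near(events, start, kinds):
    """True if any event of the given kinds falls 0-5 days before `start`."""
    return any(kind in kinds and 0 <= (start - when).days <= LOOKBACK_DAYS
               for when, kind in events)


def infer_day1(events):
    """events: list of (date, kind). Return the inferred day-1 date, or None."""
    for start in sorted(when for when, kind in events if kind == "chemo"):
        if (_has_event_near(events, start, LUMBAR_KINDS)
                and _has_event_near(events, start, SECOND_IDENTIFIERS)):
            return start
    return None


def relative_days(events, day1, kind="lp"):
    """Re-express event dates so that day1 is day 1 and the day before is day 0."""
    return sorted((when - day1).days + 1 for when, k in events if k == kind)


events = [
    (date(2015, 3, 2), "bone_marrow"),
    (date(2015, 3, 3), "lp"),
    (date(2015, 3, 4), "chemo"),
    (date(2015, 3, 11), "lp"),
    (date(2015, 4, 1), "lp"),
]
day1 = infer_day1(events)
print(day1, relative_days(events, day1))  # 2015-03-04 [0, 8, 29]
```

Only within-patient date differences are used, so the result does not depend on absolute calendar dates.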




[Figure 4 image: (A) UpSet plot of LP data availability; sets: Diagnostic or Day 1 (n = 371), Day 7 or 8 or 9 (n = 260), Day 28, 29, 30, 31, or 32 (n = 300); axes: Intersection Size, Set Size. (B) UpSet plot of bone marrow data availability; sets: Diagnostic or Day 1, Day 8, Day 15, Day 29; axes: Intersection Size, Set Size.]

FIG 4. UpSet graph analysis of data availability. Horizontal bars represent the sets being compared; vertical bars represent the intersections. The values in blue represent the combination of milestone events expected from the protocol. A procedure classified as diagnostic occurred within 7 days before the inferred date of initial treatment. (A) UpSet graph of LP data availability. Dates are relative to initiation of treatment. Rows represent COG milestone date ranges. (B) Bone marrow data availability. LP, lumbar puncture.
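The window grouping and the counts displayed in an UpSet plot can be reproduced with simple set operations. This Python sketch is illustrative only: the input format is hypothetical, and the lower bound of the diagnostic window (7 days before day 1) is an assumption taken from the Fig 4 caption. Set sizes count every patient with an LP in a window, whereas intersection sizes count each patient once, under the exact combination of windows in which that patient has LP data.

```python
from collections import Counter

# LP windows on the relative-day scale (day 1 = start of induction).
# Assumption: "diagnostic" starts 7 days before day 1, per the Fig 4 caption.
WINDOWS = {
    "Diagnostic or Day 1": (-6, 1),
    "Day 7-9": (7, 9),
    "Day 28-32": (28, 32),
}


def windows_hit(lp_days):
    """Return the set of windows containing at least one LP day."""
    return frozenset(name for name, (lo, hi) in WINDOWS.items()
                     if any(lo <= d <= hi for d in lp_days))


def upset_counts(patients):
    """patients: dict of patient ID -> relative LP days.

    Returns (set sizes, exclusive intersection sizes) as shown in an UpSet plot.
    """
    hits = [windows_hit(days) for days in patients.values()]
    set_sizes = Counter(name for h in hits for name in h)
    intersections = Counter(h for h in hits if h)  # each patient counted once
    return set_sizes, intersections


patients = {"a": [0, 8, 29], "b": [1], "c": [1, 30], "d": [8, 29], "e": [15]}
sizes, inter = upset_counts(patients)
print(sizes["Diagnostic or Day 1"], inter[frozenset(WINDOWS)])  # 3 1
```

Patient "e" has an LP only on day 15, outside every window, so it contributes to neither set sizes nor intersections.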

We performed a similar UpSet data availability analysis for bone marrow procedure codes in the newly diagnosed cohort. Of the 410 patients, 363 had a bone marrow procedure. Most of these (324, 89%) had data available from a bone marrow encounter during the diagnostic or day 1 milestone period, whereas only 85 (23%) had a bone marrow encounter on day 29 (Fig 4B). We identified 66 patients who aligned with the COG protocol for disease evaluation at days 0, 1, and 29. We also noted that 19 patients had a bone marrow procedure on day 29. The UpSet graph indicated that bone marrow procedures were less consistently available, and less often present in the recommended sequence, than LP procedures.

One potential explanation for the variation in data availability demonstrated in the UpSet graphs is that some contributing organizations consistently lack data from particular COG-recommended dates. To examine this, we created categorical variables representing whether a patient did or did not have an LP for each of the three recommended dates. We then used an Alluvial graph25 in R to investigate the patterns of LP usage for each of the 14 organizations with the 410 patients (Fig 5). All 14 organizations had multiple pathways. For example, most patients at organization 1 follow a trajectory that includes all three recommended lumbar dates. One minor track of patients at organization 1 (uppermost track) has data for the second date but not for the first or the third. Another small group has data for the first and third dates but not the second. Organization 3 has major groups following distinct trajectories.

To further examine variation between organizations, we analyzed the granular timing of LP, relative to treatment initiation, for eight HF health systems with at least 40 encounters including an LP (Fig 6). Here, we show alignment of the LP milestone timing with the standard of care during induction therapy, highlighting the LPs that align with the protocol LPs on days 1, 8, and 29. Because minor shifts in the timing of therapy are expected from scheduling or patient-related delays, these days were grouped as follows: days 0-1, days 7-9, and days 28-32.

DISCUSSION
Evidence-based protocols guide patient care. Many organizations perform internal analyses comparing real-world practice to protocol recommendations within their institution using resources such as an enterprise data warehouse.




[Figure 5 image: Alluvial graph with one row per Health System (1-14) and columns Day 1, Day 8, and Day 29; at each column, patients are split into LP and no LP strata.]

FIG 5. Alluvial graph of patient trajectories at 14 organizations. Data from 14 organizations representing the number of patients in the LP cohort with each milestone procedure: days 0-1, days 7-9, and days 28-32. The width of each band represents the proportional number of patients following each trajectory. LP, lumbar puncture.
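The categorical variables behind the Alluvial graph can be tabulated as LP / no LP patterns per organization. The sketch below is a hypothetical Python version of that preprocessing, using the day 0-1, 7-9, and 28-32 groupings from the text; the input format is illustrative, the band widths of an Alluvial graph correspond to these pattern counts, and the plotting step itself (done in R in the study) is not reproduced.

```python
from collections import Counter, defaultdict

# Milestone windows as grouped in the text: days 0-1, 7-9, and 28-32.
MILESTONES = {"Day 1": (0, 1), "Day 8": (7, 9), "Day 29": (28, 32)}


def lp_pattern(lp_days):
    """Categorical LP / no LP value for each milestone, in milestone order."""
    return tuple("LP" if any(lo <= d <= hi for d in lp_days) else "no LP"
                 for lo, hi in MILESTONES.values())


def alluvial_table(rows):
    """rows: (organization, relative LP days) per patient -> org -> pattern counts."""
    table = defaultdict(Counter)
    for org, days in rows:
        table[org][lp_pattern(days)] += 1
    return table


def lp_share(table):
    """Fraction of all patients with an LP at each milestone."""
    total = sum(sum(c.values()) for c in table.values())
    return [sum(n for c in table.values() for pattern, n in c.items()
                if pattern[i] == "LP") / total
            for i in range(len(MILESTONES))]


rows = [(1, [0, 8, 29]), (1, [1, 8]), (1, [8]), (3, [0, 29])]
table = alluvial_table(rows)
print(table[3][("LP", "no LP", "LP")], lp_share(table))  # 1 [0.75, 0.75, 0.5]
```

A single organization with several distinct patterns, as at organization 1 here, is the situation the text describes as multiple pathways.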

Deeper understanding of alignment between clinical behavior and recommended practices will be enabled by cross-organization analyses. For pediatric leukemia, 90% of patients are treated at COG member institutions. We identified a large group of patients with pediatric ALL using aggregate deidentified EHR data from nonaffiliated organizations. The outcomes of pediatric ALL correlate with the risk level. Unfortunately, risk level is not currently documented as a discrete diagnosis code, and the text notes that would clarify risk level are not accessible in a deidentified data resource. Likewise, the date of initiation of treatment is not clearly available. To develop the capabilities needed to map real-world machine-readable clinical data to the events required by COG treatment protocols for SR-ALL, we developed a reference timeline visualizing representative milestones of care.

Beginning with manual review of a small subset of these patients with ALL, and then extending that to an automated data extraction, we developed informatics methods to infer risk classification and the date of treatment initiation. We applied stringent criteria to maximize the likelihood of accurately classifying patients. For example, we rejected 3,338 patients whose data did not include a WBC count within the 30 days before the initiation of treatment.

Manual data review suggested widespread availability of LP data, an event with timing specified by the COG protocol. We also found wide availability of laboratory tests, but their frequency is not as explicitly articulated in the COG protocol. We used UpSet graphs to evaluate the availability of LP data from the three required times. Procedures on days 0 and 1 were the most widely available. A substantial group of patients had records for all three COG-required LPs, but there are also gaps in the data. These gaps could reflect later phases of care provided at facilities not contributing to HF, variations in coding practices, and other process factors. HF does not provide the care setting (eg, infusion center) or medications that were not ordered from an in-house pharmacy, preventing us from further investigating the gaps.

A key challenge in analyses of EHR data is missing data; this issue is amplified when the data are deidentified and not traceable to the source. Our use of UpSet and Alluvial graphs to understand data availability for pediatric cancer was helpful in characterizing the complexity of these real-world data. The Alluvial graph demonstrated that the gaps in data are not caused by contributing organizations consistently failing to follow COG standards for the timing of LP, because some threads through each milestone were visible. For example, at organization 3 a majority of patients missed the day 8 milestone, but significant strands traverse the day 8 milestone. Possible explanations for the data gaps include patients receiving care at a separate organization, discrepancies in documentation practices between providers or coders, and patient mortality. Although missing data are problematic, the use of visualization to place missingness in context is helpful to the researcher; likewise, familiarity with the nuances of EHR source systems and workflows is particularly important.




[Figure 6 image: number of LPs on each day relative to treatment (x-axis: Day, 0-32) for each of eight health systems (y-axis: Health System).]

FIG 6. Alignment of lumbar puncture (LP) timing. Dates of lumbar punctures, relative to day 0 before treatment, were aligned for all qualifying patients
from eight nonaffiliated health systems. Boxes indicate the COG recommended timing for LPs.
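The per-system alignment in Fig 6 reduces to counting LPs by relative day for each qualifying health system. This Python sketch assumes one record per LP encounter and treats the 40-encounter threshold as a count over all of a system’s LP encounters; both the input format and the summary statistic `fraction_in_windows` are illustrative additions rather than measures reported in the study.

```python
from collections import Counter, defaultdict

COG_LP_WINDOWS = [(0, 1), (7, 9), (28, 32)]  # grouped days 1, 8, and 29
MIN_LP_ENCOUNTERS = 40                        # systems with >= 40 LP encounters


def day_histograms(lp_records, max_day=32, min_encounters=MIN_LP_ENCOUNTERS):
    """lp_records: (system, relative day) per LP encounter.

    Returns system -> Counter of LPs per day 0..max_day for qualifying systems.
    """
    totals = Counter()
    hist = defaultdict(Counter)
    for system, day in lp_records:
        totals[system] += 1
        if 0 <= day <= max_day:
            hist[system][day] += 1
    return {s: hist[s] for s, n in totals.items() if n >= min_encounters}


def fraction_in_windows(day_counts):
    """Share of plotted LPs that fall inside a COG-recommended window."""
    total = sum(day_counts.values())
    if total == 0:
        return float("nan")  # undefined when no LPs fall in the plotted range
    inside = sum(n for day, n in day_counts.items()
                 if any(lo <= day <= hi for lo, hi in COG_LP_WINDOWS))
    return inside / total


records = [("A", 0)] * 20 + [("A", 8)] * 15 + [("A", 15)] * 5 + [("B", 1)] * 10
hists = day_histograms(records)
print(sorted(hists), fraction_in_windows(hists["A"]))  # ['A'] 0.875
```

System "B" is dropped because it contributes only 10 LP encounters, below the 40-encounter threshold.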

Having identified methods to impute day 1 of care, we developed an alignment method to map all other events to a COG-based timeline of care. We demonstrated that LPs performed at eight independent healthcare organizations align closely with the required timing of this procedure. This novel approach of aligning time-based events harvested from fully deidentified (date-shifted) data from nonaffiliated organizations against protocol-recommended events can be extended to other required and ad hoc procedures. It is analogous to aligning unknown DNA sequences to a reference sequence. Although there was little deviation from the recommended timing of LP, future work can evaluate higher-risk cohorts and alignment with other milestone events.

Working with a large-scale aggregate data resource derived from EHR data has a number of inherent challenges based on factors specific to each contributor and to the process of aggregating and deidentifying the data. EHR systems are generally designed and implemented to support clinical workflow and documentation, and generating high data quality for secondary analysis has generally been a limited focus. Data quality concerns are well known in EHR-derived data.26 For example, variations in the quality of diagnosis coding for brain neoplasms have been shown to correlate with workflow, care setting, and personnel.27 Resolving these challenges requires emerging systems that monitor EHR data for data quality issues28,29 and the inclusion of data quality considerations during EHR system implementations.

The strict deidentification process used to generate HF removes text notes that could confirm the risk status and the date of treatment initiation. Likewise, EHR implementation variations among contributing organizations affect the data. For example, although procedure codes for LP were available from 65 organizations, a time series analysis focusing exclusively on a lab test might yield more qualifying patients because the EHR laboratory modules are widely used across the Cerner organizations contributing to HF.23 By combining several factors (first chemotherapy event, presence of blasts, and procedures), we were able to raise the likelihood that our initiation-of-therapy phenotype is specific to initiation of SR-ALL therapy and unlikely to represent similar sequences of events during reinduction therapy for relapsed ALL.

Despite these limitations, the power of using large-scale data to understand real-world health care is significant. The process of aligning patient experiences to a widely accepted protocol establishes the basis for future outcomes research. For example, do children whose care deviates from the protocol have different outcomes from those whose care aligns with the guideline? Likewise, the methods developed for this work have broad utility for additional data science research to evaluate the trajectories of patients with cancer using EHR data.

AFFILIATIONS
1 Department of Pediatrics, Children’s Mercy Hospital, Kansas City, MO
2 Children’s Mercy Research Institute, Kansas City, MO
3 Department of Pediatrics, University of Missouri, Kansas City, MO
4 Kansas State University, Manhattan, KS
5 Department of Biomedical and Health Informatics, University of Missouri, Kansas City, MO

CORRESPONDING AUTHOR
Mark A. Hoffman, PhD, Children’s Mercy Hospital, 2401 Gilham Rd, Kansas City, MO 64108; Twitter: @markhoffmankc; @_sierradavis; @EarlGlynn; @WoodNikkiM; @dcaragea; e-mail: [email protected].

SUPPORT
Supported by the Masonic Cancer Alliance, Partners Advisory Board Funding.

AUTHOR CONTRIBUTIONS
Conception and design: Nicole M. Wood, Karen Lewing, Janelle Noel-MacDonnell, Doina Caragea, Mark A. Hoffman
Collection and assembly of data: Sierra Davis, Karen Lewing, Earl F. Glynn, Mark A. Hoffman, Nicole M. Wood
Data analysis and interpretation: All authors
Manuscript writing: All authors
Final approval of manuscript: All authors
Accountable for all aspects of the work: All authors

AUTHORS’ DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST
The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO’s conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.
Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Karen Lewing
Stock and Other Ownership Interests: St Luke’s Surgicenter, Centerpointe Surgicenter, Independence

Janelle Noel-MacDonnell
Research Funding: Merck, Genzyme

Mark A. Hoffman
Stock and Other Ownership Interests: Various

No other potential conflicts of interest were reported.

ACKNOWLEDGMENT
The authors would like to acknowledge the contributions of Bourke Hutchinson.

REFERENCES
1. Hunger SP, Loh ML, Whitlock JA, et al: Children’s Oncology Group’s 2013 blueprint for research: acute lymphoblastic leukemia. Pediatr Blood Cancer 60:957-963, 2013
2. Corrigan JJ, Feig SA: Guidelines for pediatric cancer centers. Pediatrics 113:1833-1835, 2004
3. O’Leary M, Krailo M, Anderson JR, et al: Progress in childhood cancer: 50 years of research collaboration, a report from the Children’s Oncology Group. Semin Oncol 35:484-493, 2008
4. Smith M, Arthur D, Camitta B, et al: Uniform approach to risk classification and treatment assignment for children with acute lymphoblastic leukemia. J Clin Oncol 14:18-24, 1996
5. Athale UH, Puligandla M, Stevenson KE, et al: Outcome of children and adolescents with Down syndrome treated on Dana-Farber Cancer Institute Acute Lymphoblastic Leukemia Consortium Protocols 00-001 and 05-001. Pediatr Blood Cancer 65:e27256, 2018
6. Bassal M, La MK, Whitlock JA, et al: Lymphoblast biology and outcome among children with Down syndrome and ALL treated on CCG-1952. Pediatr Blood Cancer 44:21-28, 2005
7. Whitlock JA, Sather HN, Gaynon P, et al: Clinical characteristics and outcome of children with Down syndrome and acute lymphoblastic leukemia: A Children’s Cancer Group study. Blood 106:4043-4049, 2005
8. Nasir SS, Giri S, Nunnery S, et al: Outcome of adolescents and young adults compared with pediatric patients with acute myeloid and promyelocytic leukemia. Clin Lymphoma Myeloma Leuk 17:126-132.e1, 2017
9. Yu JB, Gross CP, Wilson LD, et al: NCI SEER public-use data: Applications and limitations in oncology research. Oncology (Williston Park) 23:288-295, 2009
10. Desai AV, Kavcic M, Huang YS, et al: Establishing a high-risk neuroblastoma cohort using the Pediatric Health Information System Database. Pediatr Blood Cancer 61:1129-1131, 2014
11. Winestone LE, Getz KD, Miller TP, et al: The role of acuity of illness at presentation in early mortality in black children with acute myeloid leukemia. Am J Hematol 92:141-148, 2017
12. Fisher BT, Harris T, Torp K, et al: Establishment of an 11-year cohort of 8733 pediatric patients hospitalized at United States free-standing children’s hospitals with de novo acute lymphoblastic leukemia from health care administrative data. Med Care 52:e1-e6, 2014
13. Adler-Milstein J, Jha AK: HITECH Act drove large gains in hospital electronic health record adoption. Health Aff (Millwood) 36:1416-1422, 2017
14. Pham T, Tran T, Phung D, et al: Predicting healthcare trajectories from medical records: A deep learning approach. J Biomed Inform 69:218-229, 2017
15. Campbell R, Dean B, Nathanson B, et al: Length of stay and hospital costs among high-risk patients with hospital-origin Clostridium difficile-associated diarrhea. J Med Econ 16:440-448, 2013
16. Campbell RS, Chaudhari P, Hays HD, et al: Outcomes associated with conventional versus lipid-based formulations of amphotericin B in propensity-matched groups. Clinicoecon Outcomes Res 5:507-517, 2013

JCO Clinical Cancer Informatics 247



17. Goyal A, Spertus JA, Gosch K, et al: Serum potassium levels and mortality in acute myocardial infarction. JAMA 307:157-164, 2012
18. Vogel TR, Kruse RL: Risk factors for readmission after lower extremity procedures for peripheral artery disease. J Vasc Surg 58:90-97.e1-4, 2013
19. Shafiq A, Goyal A, Jones PG, et al: Serum magnesium levels and in-hospital mortality in acute myocardial infarction. J Am Coll Cardiol 69:2771-2772, 2017
20. DeShazo JP, Hoffman MA: A comparison of a multistate inpatient EHR database to the HCUP Nationwide Inpatient Sample. BMC Health Serv Res 15:384, 2015
21. R Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria, R Foundation for Statistical Computing, 2019
22. RStudio Team: RStudio: Integrated Development for R. Boston, MA, RStudio, 2015
23. Glynn EF, Hoffman MA: Heterogeneity introduced by EHR system implementation in a de-identified data resource from 100 non-affiliated organizations. JAMIA Open 2:554-561, 2019
24. Lex A, Gehlenborg N, Strobelt H, et al: UpSet: Visualization of intersecting sets. IEEE Trans Vis Comput Graph 20:1983-1992, 2014
25. Rosvall M, Bergstrom CT: Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci U S A 105:1118-1123, 2008
26. Botsis T, Hartvigsen G, Chen F, et al: Secondary use of EHR: Data quality issues and informatics opportunities. Summit Transl Bioinform 2010:1-5, 2010
27. Diaz-Garelli F, Strowd R, Lawson VL, et al: Workflow differences affect data accuracy in oncologic EHRs: A first step toward detangling the diagnosis data Babel. JCO Clin Cancer Inform 4:529-538, 2020
28. Dziadkowiec O, Callahan T, Ozkaynak M, et al: Using a data quality framework to clean data extracted from the electronic health record: A case study. EGEMS (Wash DC) 4:1201, 2016
29. Feder SL: Data quality in electronic health records research: Quality domains and assessment methods. West J Nurs Res 40:753-766, 2018


248 © 2021 by American Society of Clinical Oncology



APPENDIX

TABLE A1. Codes Used to Infer Treatment Initiation


Code Type Description
Procedures performed at diagnosis for ALL: Central line placement
36560 CPT4 Insertion of tunneled centrally inserted central venous access device, with subcutaneous port; age < 5 years
36557 CPT4 Insertion of tunneled centrally inserted central venous catheter, without subcutaneous port or pump; age < 5 years
36561 CPT4 Insertion of tunneled centrally inserted central venous access device, with subcutaneous port; age ≥ 5 years
36566 CPT4 Insertion of tunneled centrally inserted central venous access device, requiring two catheters via two separate venous access sites; with subcutaneous port(s)
36571 CPT4 Insertion of peripherally inserted central venous access device, with subcutaneous port; age ≥ 5 years
36563 CPT4 Insertion of tunneled centrally inserted central venous access device with subcutaneous pump
36555 CPT4 Insertion of nontunneled centrally inserted central venous catheter; age < 5 years
36569 CPT4 Insertion of PICC, without subcutaneous port or pump; age ≥ 5 years
36565 CPT4 Insertion of tunneled centrally inserted central venous access device, requiring two catheters via two separate venous access sites; without subcutaneous port or pump (eg, Tesio-type catheter)
36556 CPT4 Insertion of nontunneled centrally inserted central venous catheter; age ≥ 5 years
36570 CPT4 Insertion of peripherally inserted central venous access device, with subcutaneous port; age < 5 years
36568 CPT4 Insertion of PICC, without subcutaneous port or pump; age < 5 years
36558 CPT4 Insertion of tunneled centrally inserted central venous catheter, without subcutaneous port or pump; age ≥ 5 years
Procedures performed at diagnosis for ALL: Bone marrow evaluation
38220 CPT4 Bone marrow; aspiration only
38221 CPT4 Bone marrow; biopsy, needle or trocar
41.31 ICD9 Biopsy of bone marrow
Procedures performed at diagnosis for ALL: LP with IT chemotherapy
3.92 ICD9 Injection of other agent into spinal canal
62270 CPT4 Spinal puncture, lumbar, diagnostic
96450 CPT4 Chemotherapy administration, into CNS (eg, IT), requiring and including spinal puncture
Procedures performed at diagnosis for ALL: Blast
26446-5 LOINC Blast NFr Bld
708-8 LOINC Diff blast
709-6 LOINC Diff blast%
Procedures performed at diagnosis for ALL: Surrogate markers used to identify the LP procedure
26517-3 LOINC Diff, CBC: Polys cell WBC CSF
14107-7 LOINC Diff, CBC: Neutrophil seg NFr CSF manual
29584-0 LOINC Diff, CBC: Diff CSF
26447-3 LOINC General test: Blasts NFr CSF
792-2 LOINC General test: RBC CSF manual
26454-9 LOINC General test: RBC CSF
19075-1 LOINC General test, CSF: Total cells counted CSF
21024-5 LOINC General test, CSF: Pathologist review CSF
55794-2 LOINC General test, CSF: Other cells CSF manual
29584-0 LOINC General test, CSF: Cell count plus diff CSF
34563-7 LOINC General test, CSF: Cell count CSF
2352-3 LOINC Glucose test: Glucose CSF/SerPl
2342-4 LOINC Glucose test, CSF: Glucose CSF quant
42209-7 LOINC Cytology test: Cytology, CSF
2880-3 LOINC Protein test, CSF: Protein CSF
26465-5 LOINC Hematology test: WBC count CSF
791-4 LOINC Hematology test: RBC count, CSF

Abbreviations: ALL, acute lymphoblastic leukemia; CPT, Current Procedural Terminology; CSF, cerebrospinal fluid; IT, intrathecal; LP, lumbar puncture; PICC, peripherally inserted central venous catheter.
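Table A1 pairs codes that record an LP directly (ICD9 3.92, CPT4 62270 and 96450) with CSF laboratory codes that serve as surrogate markers of an LP. The sketch below shows one way such a list could be turned into LP milestones aligned to therapy start. It is not the authors' code. The event tuple layout, the function names `infer_lp_dates` and `protocol_days`, the rule that all LP evidence on one calendar day counts as a single LP, and the convention that the start date is protocol Day 1 are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# (code type, code) pairs from Table A1 that record an LP directly.
LP_PROCEDURE_CODES = {
    ("ICD9", "3.92"),   # injection of other agent into spinal canal
    ("CPT4", "62270"),  # spinal puncture, lumbar, diagnostic
    ("CPT4", "96450"),  # chemotherapy into CNS, including spinal puncture
}

# CSF laboratory LOINC codes from Table A1, used as surrogate LP markers.
CSF_SURROGATE_LOINC = {
    ("LOINC", c) for c in (
        "26517-3", "14107-7", "29584-0", "26447-3", "792-2", "26454-9",
        "19075-1", "21024-5", "55794-2", "34563-7", "2352-3", "2342-4",
        "42209-7", "2880-3", "26465-5", "791-4",
    )
}

LP_EVIDENCE = LP_PROCEDURE_CODES | CSF_SURROGATE_LOINC


def infer_lp_dates(events):
    """Map patient_id -> sorted distinct dates with evidence of an LP.

    `events` holds (patient_id, code_type, code, event_date) tuples
    (a hypothetical layout). All LP evidence on one day collapses to one LP.
    """
    lp_days = defaultdict(set)
    for patient_id, code_type, code, event_date in events:
        if (code_type, code) in LP_EVIDENCE:
            lp_days[patient_id].add(event_date)
    return {pid: sorted(days) for pid, days in lp_days.items()}


def protocol_days(lp_dates, start_dates):
    """Express each LP as a protocol day, assuming the start date is Day 1."""
    return {
        pid: [(d - start_dates[pid]).days + 1 for d in dates]
        for pid, dates in lp_dates.items()
        if pid in start_dates
    }


events = [
    ("p1", "CPT4", "96450", date(2015, 3, 2)),    # LP with IT chemotherapy
    ("p1", "LOINC", "2880-3", date(2015, 3, 2)),  # CSF protein, same day
    ("p1", "LOINC", "26465-5", date(2015, 3, 9)), # CSF WBC one week later
    ("p2", "CPT4", "38221", date(2015, 4, 1)),    # bone marrow biopsy, not an LP
]
lp = infer_lp_dates(events)
print(lp["p1"])  # [datetime.date(2015, 3, 2), datetime.date(2015, 3, 9)]
print(protocol_days(lp, {"p1": date(2015, 3, 2)}))  # {'p1': [1, 8]}
```

Keying each code by its code system avoids accidental matches between code systems, and the same-day collapse keeps a panel of several CSF labs from being counted as several procedures.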

TABLE A2. Codes Used to Exclude Likely HR-ALL Patients From the Cohort
Daunorubicin hydrochloride
  NDCs: 55390010810, 55390010801, 55390014210, 55390028110, 55390080510, 00703523313, 00008415501
  HCPCS: J9150
Cyclophosphamide
  NDCs: 10019095501, 10019095601, 10019095701, 00013560693, 00013561693, 00013563670, 00015050241, 00015050301, 00015050302, 00015050401, 00015050541, 00015050641, 00015053941, 00015054641, 00015054712, 00015054741, 00015054812, 00015054841, 00015054912, 00015054941, 00054038225, 00054413025, 00781324494
  HCPCS: J9070
Mesna
  NDCs: 10019095301, 00015355626, 00015356302, 00015356303, 00015356415, 00015356512, 25021020110, 25021020111, 00338130501, 00338130503, 55390004501, 55390034701, 63323073310, 63323073311, 67108356509 (oral tablet)
  HCPCS: J9209

Abbreviations: ALL, acute lymphoblastic leukemia; HCPCS, Healthcare Common Procedure Coding System; HR-ALL, high-risk acute lymphoblastic leukemia; NDC, national drug code.
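Table A2 lists the daunorubicin, cyclophosphamide, and mesna codes whose presence was used to exclude likely HR-ALL patients from the cohort. The sketch below shows the shape of such an exclusion filter. It is illustrative only. It uses a small subset of the listed codes, the tuple layout and the name `exclude_likely_hr_all` are hypothetical, and it flags a patient if a code appears at any time. Whether the authors restricted this lookup to a time window is not stated in this appendix.

```python
# Illustrative subset of Table A2; a full filter would load every listed NDC.
HR_ALL_EXCLUSION_CODES = {
    ("HCPCS", "J9150"),      # daunorubicin hydrochloride
    ("HCPCS", "J9070"),      # cyclophosphamide
    ("HCPCS", "J9209"),      # mesna
    ("NDC", "55390010810"),  # daunorubicin hydrochloride
    ("NDC", "10019095501"),  # cyclophosphamide
    ("NDC", "10019095301"),  # mesna
}


def exclude_likely_hr_all(cohort, medication_events):
    """Return cohort members with no Table A2 medication code.

    `medication_events` holds (patient_id, code_type, code) tuples
    (a hypothetical layout). No time window is applied.
    """
    flagged = {
        pid
        for pid, code_type, code in medication_events
        if (code_type, code) in HR_ALL_EXCLUSION_CODES
    }
    # Preserve the original cohort order.
    return [pid for pid in cohort if pid not in flagged]


cohort = ["p1", "p2", "p3"]
meds = [
    ("p2", "HCPCS", "J9150"),      # daunorubicin: flagged as likely HR-ALL
    ("p3", "NDC", "00000000000"),  # hypothetical code not in the list
]
print(exclude_likely_hr_all(cohort, meds))  # ['p1', 'p3']
```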


TABLE A3. Medication Codes Used to Infer Treatment Initiation


Vincristine
  NDCs: 0002719601, 0002719909, 0002719401, 0002719501, 00703441211, 61703030906, 00703440211, 61703030916
  HCPCS: J9370
Cytarabine
  NDCs: 00013710678, 55390080610, 63323012020, 61703031922, 67457045450, 00069015501, 55390013301, 00364246854, 00009329501, 61703030346, 00069015202, 00364246753, 55390013401, 67457045220, 61703030538, 55390013110, 55390013210, 61703030436, 00009047301, 00009037301, 55390080801, 00009329601, 55390080710
  HCPCS: J9100
Dexamethasone
  NDCs: 00054417925, 00054418025, 00054418125, 00054418225, 00054418325, 00054418425, 00054418625, 00054817425, 00054817525, 00054817625, 00054817925, 00054818025, 00054818125, 00054818325, 00364039701, 00603319111
  HCPCS: J1100

Abbreviations: HCPCS, Healthcare Common Procedure Coding System; NDC, national drug code.
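Table A3, like Table A1, lists codes used to infer treatment initiation. This appendix does not give the exact rule for turning these codes into a start date. The sketch below assumes the simplest rule: the earliest date on which any listed vincristine, cytarabine, or dexamethasone code appears. The code subset, the tuple layout, and the name `infer_initiation_dates` are hypothetical.

```python
from datetime import date

# Illustrative subset of Table A3 medication codes (HCPCS plus one NDC per drug).
INITIATION_MED_CODES = {
    ("HCPCS", "J9370"),      # vincristine
    ("HCPCS", "J9100"),      # cytarabine
    ("HCPCS", "J1100"),      # dexamethasone
    ("NDC", "00703441211"),  # vincristine
    ("NDC", "00013710678"),  # cytarabine
    ("NDC", "00054417925"),  # dexamethasone
}


def infer_initiation_dates(events):
    """Map patient_id -> earliest date of any Table A3 medication code.

    `events` holds (patient_id, code_type, code, event_date) tuples
    (a hypothetical layout). Treating the earliest qualifying code as
    therapy start is an assumption of this sketch.
    """
    start = {}
    for pid, code_type, code, event_date in events:
        if (code_type, code) not in INITIATION_MED_CODES:
            continue
        if pid not in start or event_date < start[pid]:
            start[pid] = event_date
    return start


events = [
    ("p1", "HCPCS", "J1100", date(2015, 3, 3)),      # dexamethasone
    ("p1", "HCPCS", "J9100", date(2015, 3, 2)),      # cytarabine, earlier
    ("p1", "HCPCS", "J9150", date(2015, 3, 1)),      # daunorubicin: not a Table A3 code
    ("p2", "NDC", "00000000000", date(2015, 1, 1)),  # hypothetical unrelated code
]
print(infer_initiation_dates(events))  # {'p1': datetime.date(2015, 3, 2)}
```

A real pipeline might also require agreement with the Table A1 procedures (for example, an LP on the same day). That combination is left out here because the appendix does not specify it.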
