NCI Cancer Research Data Commons: Cloud-Based Analytic Resources
ABSTRACT
The NCI’s Cloud Resources (CR) are the analytical components of the Cancer Research Data Commons (CRDC) ecosystem. This review describes how the three CRs (Broad Institute FireCloud, Institute for Systems Biology Cancer Gateway in the Cloud, and Seven Bridges Cancer Genomics Cloud) provide access and availability to large, cloud-hosted, multimodal cancer datasets, as well as offer tools and workspaces for performing data analysis where the data resides, without download or storage. In addition, users can upload their own data and tools into their workspaces, allowing researchers to create custom analysis workflows and integrate CRDC-hosted data with their own.
See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Kim et al., p. 1404
1 General Dynamics Information Technology, Falls Church, Virginia. 2 Velsera (Seven Bridges), Charlestown, Massachusetts. 3 Broad Institute, Cambridge, Massachusetts. 4 Frederick National Laboratory for Cancer Research, Frederick, Maryland. 5 Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland. 6 Trans Divisional Research Program, Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland.

Introduction
Collaboration and agreement on shared standards and formats are required across the medical and scientific community to collect, organize, and analyze the large amounts of valuable diverse clinical and molecular data created on a daily basis. The NCI’s Cancer Research Data Commons (CRDC) is a cloud-based data science infrastructure that provides secure access to a large, comprehensive, and expanding collection of cancer research data. CRDC focuses on providing high-quality curated cancer data that adheres to Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Use of FAIR principles enables different parts of the CRDC ecosystem to combine detailed clinical, molecular (e.g., -omic), and imaging data obtained through various technologies where researchers can explore and analyze multimodal cancer datasets, and share results and insights with the greater scientific community (1).
Here, we describe the analytic components of the CRDC, the NCI Cloud Resources (CR). Three separate CRs, the Broad Institute FireCloud, Institute for Systems Biology Cancer Gateway in the Cloud (ISB-CGC), and Seven Bridges Cancer Genomics Cloud (SB-CGC), each provide common features to access and analyze cloud-based CRDC data, as well as user-provided data, in workspaces utilizing both common and user-provided tools and pipelines. Each houses cloud-scale analysis tools that researchers have leveraged to interrogate large datasets to make new discoveries. However, each CR also has unique features for use by different types of cancer researchers (Fig. 1).
This Review highlights: each of the three NCI CRs (with details provided in the Supplementary Data), how they compare and complement each other, available datasets, tools serving differing researcher types, their biological success as well as teaching successes, and proposed future direction to continue serving cancer research efforts across national and international communities.

Data availability
NCI has long invested in making large, consistently collected datasets available, such as The Cancer Genome Atlas (TCGA). The CRDC extends these efforts by enabling researchers to perform multimodal analysis across many data types using the Cloud Resources. CRDC’s Genomic Data Commons (GDC; ref. 2), Proteomic Data Commons (PDC; ref. 3), Imaging Data Commons (IDC; ref. 4), Integrated Canine Data Commons (ICDC), and Cancer Data Service (CDS) all currently connect to the various CRs described in Table 1 (5). Through the three CRs, 9.4 PB of cancer data is currently available for analysis.
Searching through the individual data commons portals, researchers can select and combine data of interest from various datasets for coanalysis. Although combining datasets remains challenging due to the current lack of harmonization, the data commons and CRs provide ways to coanalyze and harmonize depending on the researcher’s needs. These data commons include several data modalities, including genomics, proteomics, imaging, and epigenomics, among others, that, using the CRs, can be leveraged for multiomics cancer research. For analysis within SB-CGC and FireCloud, a user creates a study manifest with metadata and file location information to be uploaded for analysis. ISB-CGC ingests tabular data (Supplementary Table S1) into Google’s BigQuery for interactive and scalable analysis and also allows researchers to analyze their data in a private workspace.
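As a rough illustration of the study manifest mentioned above for SB-CGC and FireCloud, the sketch below writes a small CSV that pairs per-file metadata with a file location. The column names (`file_name`, `case_id`, `data_type`, `file_location`), the bucket paths, the case identifiers, and the helper `build_manifest` are all hypothetical and chosen only for illustration; each CR documents its own required manifest format, which should be followed in practice.

```python
import csv
import io

# Hypothetical manifest columns; the real schema is defined by each Cloud Resource.
MANIFEST_COLUMNS = ["file_name", "case_id", "data_type", "file_location"]


def build_manifest(records):
    """Return CSV text with one row per file, rejecting rows with missing fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=MANIFEST_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Every file needs both its metadata and its location to be usable.
        missing = [col for col in MANIFEST_COLUMNS if not record.get(col)]
        if missing:
            raise ValueError(f"record {record!r} is missing fields: {missing}")
        writer.writerow({col: record[col] for col in MANIFEST_COLUMNS})
    return buffer.getvalue()


# Illustrative records only; identifiers and storage paths are made up.
example_records = [
    {
        "file_name": "sample_a.bam",
        "case_id": "CASE-0001",
        "data_type": "WGS alignment",
        "file_location": "gs://example-bucket/sample_a.bam",
    },
    {
        "file_name": "sample_b.bam",
        "case_id": "CASE-0002",
        "data_type": "WGS alignment",
        "file_location": "s3://example-bucket/sample_b.bam",
    },
]

manifest_text = build_manifest(example_records)
print(manifest_text)
```

Validating each row before writing it is a design choice in this sketch: a manifest with a missing file location would otherwise only fail later, at upload or analysis time.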
D. Pot, Z. Worman, and A. Baumann contributed equally to this article.
Corresponding Author: Erin Beck, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850. E-mail: [email protected]
Cancer Res 2024;84:1396–403
doi: 10.1158/0008-5472.CAN-23-2657
This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
2024 The Authors; Published by the American Association for Cancer Research

Figure 1. The NCI Cloud Resources. Each CR provides unique features to collectively support users across varying levels of technical expertise and access to diverse sets of NCI data. FireCloud and SB-CGC offer extensive repositories of prebuilt tools, tutorials, and workflows in CWL and WDL that provide more assistance to beginners to the cloud, while ISB-CGC is designed for the more advanced user to easily combine new data with tabulated derived data to gain new insights. Users can bring their own data to “Secure Workspaces” and combine it with NCI cloud-hosted “Data” using the analysis “Cloud-Based Tools” readily available at each CR.

The data from CRDC fall into two categories: Open Access and Controlled Access (see Table 1). Open Access data includes aggregated information such as gene expression levels, as well as information like disease type, stage, and tissue type. Controlled Access data includes information that could lead to identification of an individual and requires authorization, in most cases from the NIH Database of Genotypes and Phenotypes (dbGaP). Data from multiple commons can be combined together and coanalyzed within the CRs. In all cases, the underlying data files are protected through authorization provided
by the CRDC Data Commons Framework (DCF; ref. 5). Below, we highlight some of the data types currently available via the CRDC for analysis with the NCI Cloud Resources.

Table 1. Data availability: summary representation of data available to account holders in the Cloud Resources.
Column headings: Broad FireCloud, ISB-CGC, SB-CGC.
Note: The cloud(s) hosting each data node is also provided. Refer to Supplementary Table S3 for a complete list of acronyms and definitions. Of note, the datasets represent the most commonly requested and used data by cancer researchers.
(a) More data is available than the ones highlighted on this table. Please refer to the individual websites for a full list of datasets available.
(b) Data portals include both controlled and open-access data. To access controlled data, researchers must obtain the appropriate dbGaP permissions. CRDC provides a list of key datasets on their website.

Genomics, Transcriptomics, and Other Molecular Data
Some examples of molecular alterations, which often underlie cancer development, include mutations, copy-number or structural variants, changes in gene expression and posttranscriptional modifications, and changes in DNA methylation. Within the CRDC, researchers can access this molecular data through the GDC and the Cancer Data Service (CDS), which enable the search and discovery of genomic, transcriptomic, and epigenomic sequencing modalities. In particular, the GDC contains some of the largest and most comprehensive cancer genomic datasets, including TCGA and The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program. GDC’s data release v39.0 included
44,541 cases spanning 79 projects, and 69 primary tissue sites. The GDC provides harmonized and standardized molecular, biospecimen, and clinical data. The physical location of the GDC data is replicated on both the Amazon Web Services (AWS; used by SB-CGC) and Google Cloud Platform (GCP; used by FireCloud and ISB-CGC) for CR access. Tens of thousands of GDC raw data files and hundreds of higher level files are available in all three CRs for further analysis. In addition, genomic data from programs including Human Tumor Atlas Network (HTAN) and Childhood Cancer Data Initiative (CCDI) are available on the CDS. CDS data is stored in the AWS cloud, can be searched on the CDS Portal, and is available for analysis on the SB-CGC.

Proteomics Data
The NCI PDC serves as one of the most comprehensive proteomic data repositories currently available. The PDC provides highly curated and standardized biospecimen, clinical, and proteomic data. Reflecting the broad range of proteomic analysis, the PDC houses data representing diverse analytical fractions including global proteome, phosphoproteome, glycoproteome, acetylome, lipidome and ubiquitylome derived from multiple experimental technologies. The PDC is currently hosting 134 studies, encompassing data from 19+ cancer types and more than 3,000 cases. Both raw and processed PDC data are openly accessible and available through all three CRs for further analysis. The PDC’s cloud-based infrastructure and application programming interface (API) facilitate interoperability.

Imaging Data
Imaging data within the CRDC represents a wide range of applications from clinical and preclinical imaging, radiological images such as CT, MRI, PET, digital pathology, and multispectral microscopy. Raw imaging data is processed, annotated, and modeled to support cross comparison and study. The IDC includes imaging data from several projects such as the TCGA, HTAN, and CCDI, with plans to add more in the future. The December 2023 data release from IDC included 142 collections representing more than 511,000 image series from 65,066 cases in the standardized Digital Imaging and Communications in Medicine (DICOM) format. IDC data can be accessed directly on IDC’s portal and, for TCGA images, via ISB-CGC. CDS also hosts raw imaging data files from HTAN that are in non-DICOM formats. All imaging data available have been deidentified of any patient information.

Multispecies data
The fourth data commons linked to CRs is the ICDC. The canine’s accelerated aging process and breed-specific cancer predisposition provide an interesting backdrop in which to study human disease. As of August 2023, the ICDC provides access to canine data consisting of genomic and transcriptomic data, as well as clinical and biospecimen metadata from nearly 700 cancer cases representing more than 80 different breeds. Studies include the PRE-medical Cancer Immunotherapy Network Canine Trials (PRECINCT) and the Comparative Oncology Program. All ICDC data is open access and can be accessed via SB-CGC.

Supporting multiple data modalities and analyses
The types of data generated in the course of biomedical research are diverse and wide ranging. To accommodate situations where data does not fit in the above data commons, and to support researchers’ compliance with data sharing policies, the NCI developed the CDS. This solution provides a flexible and responsive approach for researchers to quickly and securely share data, without the need to meet the requirements from the data commons. The CDS includes primarily molecular characterization, genomic profiling, and imaging data. As of August 2023, numerous datasets from the CCDI (https://www.cancer.gov/research/areas/childhood/childhood-cancer-data-initiative) as well
as HTAN (https://humantumoratlas.org/) are available through https://dataservice.datacommons.cancer.gov/, and are updated frequently.

Specialized Datasets
ISB-CGC hosts two specialized databases: The Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer (https://mitelmandatabase.isb-cgc.org/) and the TP53 Database (https://tp53.isb-cgc.org/). In addition, ISB-CGC maintains another separately located database, caNanoLab (https://cananolab.cancer.gov/). The Mitelman Database is the largest catalog of acquired chromosome aberrations available today, presently comprising >70,000 cases across multiple cancer types (6). The TP53 Database is a comprehensive database on variations in the tumor protein p53 gene (TP53), one of the most frequently mutated genes in human cancer (7). caNanoLab is a data sharing portal designed to facilitate information sharing across the international biomedical nanotechnology research community to expedite and validate the use of nanotechnology in biomedicine (8).

Interoperating with datasets from other NIH data commons
Researchers benefit from the breadth of cancer datasets described above but can also gain access, within the CRDC, to many other high impact datasets across NIH. Other NIH Institutes and Centers (IC) have made similar investments in global standards and IC-specific, cloud-based data commons over the past decade (e.g., NHGRI, NHLBI, NCBI, NIH Common Fund). The NIH Cloud Platform Interoperability (NCPI) program was established to drive key standards and policy discussions across NIH to ensure researchers can analyze cloud-based datasets from each of the participating NCPI data commons without the need to download or move the data. Today this means that, within FireCloud and SB-CGC, authorized researchers with the appropriate dbGaP credentials are able to connect to other NIH data ecosystems [e.g., NHGRI’s AnVIL, NIH Common Fund’s Gabriella Miller Kids First, NHLBI’s BioData Catalyst, and NCBI’s Sequence Read Archive (SRA)] and seamlessly analyze the many datasets within these other NIH data commons alongside CRDC data, as well as their own. CRDC spans multiple cloud service providers (AWS, GCP), which means this external data can be accessed within an analysis workspace specific to that cloud service provider without incurring additional storage or access costs. In addition to allowing access to other NIH data commons’ datasets, both the CRDC and NCPI have invested in interoperability and standards. Specifically, CRDC and NCPI have actively participated in standards including Global Alliance for Genomics and Health (https://www.ga4gh.org/), NIH Researcher Auth Service (https://datascience.nih.gov/researcher-auth-service-initiative), and Fast Healthcare Interoperability Resources (https://fhir.org/), adopting those standards into production interfaces over time, and allowing for more seamless integration of data across NIH data ecosystems.

Cloud analysis workspaces and tools
The NCI Cloud Resources provide secure analytic capabilities for open and controlled access datasets within the CRDC. Here we outline shared and unique features related to workspaces, tools, analysis capabilities and performance, credits and billing for the CRs.

Workspaces
All three CRs provide user-controlled analytic sandbox environments that allow researchers to store and manage their data, tools, and pipelines, and run secure computations on all manner of data including open access, controlled access, and private data. For FireCloud (Supplementary Fig. S1) and SB-CGC (Supplementary Fig. S2) workspaces, users can invite collaborators to view (read-only permissions) or participate in their analysis (write/execute permissions). Collaborators must also be given appropriate access by workspace owners to enter a workspace containing controlled data, along with being authorized by dbGaP for any controlled data access. Analysts can choose from existing analysis tools and pipelines, as described below, or bring their own analytic tools and queries to their workspace, and create their own pipelines. All three CRs have extensive documentation on creating novel tools, including in writing [ISB-CGC doc (https://isb-cgc.appspot.com/programmatic_access/); FireCloud doc (https://support.terra.bio/hc/en-us/sections/7182576252315-Advanced-workflow-documentation); SB-CGC doc (https://docs.cancergenomicscloud.org/page/bring-your-own-tools-to-the-cancer-genomics-cloud)] and videos [Building an App (https://www.youtube.com/watch?v=x1YS0u1jtPg) and Editing a Workflow (https://www.youtube.com/watch?v=689JGWpjyH4)]. For ISB-CGC the analytic sandbox access environment is controlled by the researchers through GCP native tools (Supplementary Fig. S3). Researchers acquire copies of NCI dbGaP controlled data through ISB-CGC and can add their own data, software tools, and collaborators to their own GCP project. For all three CRs, with the exception of free cloud credits, users are charged for their data storage and computation (see below), but CRDC-hosted data that resides outside of the CR workspaces (e.g., CRDC or NCPI data) is free to access.

Tools
Depending on the needs and computational skill set of the user, analysis can be carried out using publicly available analytic tools, and/or bespoke analysis. In addition to the analytic tools themselves, utility tools and cloud-native application support are provided that enable users to both take advantage of command-line and GUI-based tools for management of data and resources, as well as expand analytic capabilities beyond those provided by the resources through the use of tools such as highly scalable cloud-native machine learning. These apps and tools are regularly updated and evolve based on user feedback. Different versions of curated tools are available on the cloud platforms, and researchers are able to select the most up to date version or go back to a previous one as needed. The cloud compute costs for these analytic tools vary widely as they range from smaller scale data visualization to complex and highly parallelized data processing for calling variants from raw sequencing data. Each CR works closely with a researcher to provide cost information to develop a budget for their analyses. Users of the CRs can also upload their own tools to their CR workspaces. A detailed breakdown of analysis tool capabilities is shown in Table 2.

Table 2. Tool availability: summary representation of tools available to account holders in the Cloud Resources.
Note: Tools are broken down by category and status of tool availability within each CR.

Secondary analysis capabilities, often referred to as pipelines or workflows, are provided in all three CRs through workflow languages such as Common Workflow Language (CWL), NextFlow, and Workflow Description Language (WDL). Each of these workflow systems has different benefits and drawbacks and is adopted by different research communities. Popular publicly available pipelines include analytical support for variant calling (e.g., whole genome DNA-seq), RNA sequencing (RNA-seq), machine learning, imaging, genome-wide association studies (GWAS), long-read data (copy-number variations/structural variants), and proteomics. Both platforms provide example analysis packages that can be used as tutorials to show users how to use such tools, and documentation about considerations such as cost. In addition to these curated public pipelines in FireCloud and SB-CGC, within all three CRs users are able to write their own
pipelines, or bring in additional pipelines through the Dockstore tool repository (https://dockstore.org/). These pipelines make use of the elastic scalability of the cloud to support resources well beyond what researcher computers or often institutional High Performance Computing clusters are capable of providing, thus reducing cost and democratizing the use of data by users who are working independently or at smaller institutions.
Tertiary analysis capabilities, often referred to as interactive analysis, are provided in FireCloud, ISB-CGC, and SB-CGC through both GUI and command-line tools that support rapid iterations by researchers to explore secondary data and derived scientific results. Many of the commonly used tools within the bioinformatics community are provided, including BigQuery, Galaxy, Jupyter notebooks, RStudio/RShiny, and SAS. Like pipelines, these tools provide the ability to both make use of publicly available analytic methods, as well as write customized analyses using languages such as Python, R, and SQL, including the enormously scalable analytic capabilities provided by Google’s BigQuery. Community-driven tools and libraries such as Bioconductor, Numpy, and Pandas are often preinstalled to simplify the development of use-case-specific analyses. As with pipelines, these tools can make use of elastic compute within the cloud to scale up analyses and provide cost savings to the researcher.

Cloud-native tool support
This enables researchers to make use of functionality that is specific to a given cloud that goes beyond those provided by the CRs. This includes tools to manage data movement such as gsutil, docker image storage and retrieval, and cloud-specific GUI interfaces for billing and resource monitoring. In addition, users are able to go beyond the out-of-the-box capabilities provided by these resources through tools such as cloud databases, cloud-native machine learning, and automation. These tools are managed by the cloud providers, have active communities and documentation, and continue to expand over time, and many researchers prefer to use them directly, even if not natively provided by the CRs.

Performance, credits, and billing
To help researchers estimate their cloud-based computational costs, each CR provides sample cost information. Some common pipelines within the respective platforms, as well as their time to complete and associated costs, include:

* ISB-CGC - performing six billion statistical correlations using BigQuery for $2 in 3 hours
* FireCloud - whole genome variant alignment and calling pipeline using 65 GB of data for $5 in 20 hours
* SB-CGC - bulk RNA-seq Transcription Profiling with differential expression analysis for $2 in 2 hours

In addition, to encourage cost-free experimentation on the CRs and to lower the barrier to cloud adoption, each CR provides access to free credits for new users. After the credits are used the researcher may
continue to utilize the CRs through a billing platform. Additional details on performance, free credits, and billing for each CR can be found in the supplementary material. Each CR has staff members available to answer any questions and work with researchers to address their individual needs.

* Verification of the enrichment of multiple investigational and hypothetical resistance mechanisms in treated and nontreated patients from a pan-cancer cohort of 1,031 refractory metastatic tumors. The verification of these mechanisms confirmed their putative role in treatment resistance (17)
Several organizations outside of the United States have also shown interest in the CRDC infrastructure and have requested training sessions. The ISB-CGC participated in four half-day events educating researchers at the European Molecular Biology Laboratory (EMBL) about the CRDC and CRs. EMBL consists of more than 80 independent research groups with expertise in molecular biology. The ISB-CGC demonstrated how to utilize BigQuery to access data, and how to access SQL and R to interact with the data on the cloud platform. Likewise, the SB-CGC participated in the Data Science for Health Discovery and Innovation in Africa Initiative (DS-I Africa), which supports a robust pan-continental network of data scientists and technologies to apply advanced data science skills and transform health. At this training the attendees performed a bulk RNA-seq analysis using publicly available data, and ran a machine learning imaging analysis using Python/Jupyter Labs. All attendees were successful at running their analysis and several continued using the SB-CGC for their research.

that are used to describe the datasets so that more powerful analyses can be performed by more easily combining datasets and analyzing them. Availability of easily findable, interoperable and computable data that feeds readily into already existing or newly created Artificial Intelligence and Machine Learning algorithms are key to advancing the understanding of cancer. The NCI Cloud Resources will continue to work with the research community to make the CRDC datasets more available in order to combine these with new data using novel analysis techniques for unique insights into cancer.

Authors’ Disclosures
D.A. Pot reports other support from GDIT during the conduct of the study. Z.F. Worman reports other support from Velsera during the conduct of the study. B.N. Davis-Dusenbery reports grants and other support from NCI during the conduct of the study, and employee and equity holder in Velsera. J. Otridge reports other support from NCI during the conduct of the study. J.S. Barnholtz-Sloan reports other support from NIH/NCI during the conduct of the study. No disclosures were reported by the other authors.

References
1. Kim E, Davidsen T, Davis-Dusenbery BN, Baumann A, Maggio A, Chen Z, et al. NCI cancer research data commons: lessons learned and future state. Cancer Res 2024;84:1404–9.
2. Heath AP, Ferretti V, Agrawal S, An M, Angelakos JC, Arya R, et al. The NCI genomic data commons. Nat Genet 2021;53:257–62.
3. Thangudu RR, Rudnick PA, Holck M, Singhal D, MacCoss MJ, Edwards NJ, et al. Proteomic data commons: a resource for proteogenomic analysis [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27–28 and Jun 22–24. Philadelphia (PA): AACR; 2020. Abstract nr LB-242.
4. Fedorov A, Longabaugh WJR, Pot D, Clunie DA, Pieper S, Aerts HJWL, et al. NCI imaging data commons. Cancer Res 2021;81:4188–93.
5. Wang Z, Davidsen T, Kuffel G, Addepalli K, Bell A, Casas-Silva E, et al. NCI cancer research data commons: resources to share key cancer data. Cancer Res 2024;84:1388–95.
6. Wang J, Zheng J, Lee EE, Aguilar B, Phan J, Abdilleh K, et al. A cloud-based resource for genome coordinate-based exploration and large-scale analysis of chromosome aberrations and gene fusions in cancer. Genes Chromosomes Cancer 2023;62:441–8.
7. Andrade KCd, Lee EE, Tookmanian EM, Kesserwan CA, Manfredi JJ, Hatton JN, et al. The TP53 database: transition from the international agency for research on cancer to the US national cancer institute. Cell Death Differ 2022;29:1071–3.
8. Ke W, Crist RM, Clogston JD, Stern ST, Dobrovolskaia MA, Grodzinski P, et al. Trends and patterns in cancer nanotechnology research: a survey of NCI’s CaNanoLab and nanotechnology characterization laboratory. Adv Drug Deliv Rev 2022;191:114591.
9. McKerrow W, Wang X, Mendez-Dorantes C, Mita P, Cao S, Grivainis M, et al. LINE-1 expression in cancer correlates with P53 mutation, copy number alteration, and S phase checkpoint. Proc Natl Acad Sci U S A 2022;119:e2115999119.
10. Erwin GS, Gürsoy G, Al-Abri R, Suriyaprakash A, Dolzhenko E, Zhu K, et al. Recurrent repeat expansions in human cancer genomes. Nature 2023;613:96–102.
11. Yang A, Shao T-J, Bofill-De Ros X, Lian C, Villanueva P, Dai L, et al. AGO-bound mature MiRNAs are oligouridylated by TUTs and subsequently degraded by DIS3L2. Nat Commun 2020;11:2765.
12. Morton LM, Karyadi DM, Stewart C, Bogdanova TI, Dawson ET, Steinberg MK, et al. Radiation-related genomic profile of papillary thyroid carcinoma after the chernobyl accident. Science 2021;372:eabg2538.
13. Gillani R, Camp SY, Han S, Jones JK, Chu H, O’Brien S, et al. Germline predisposition to pediatric ewing sarcoma is characterized by inherited pathogenic variants in DNA damage repair genes. Am J Hum Genet 2022;109:1026–37.
14. Katzir R, Rudberg N, Yizhak K. Estimating tumor mutational burden from RNA-sequencing without a matched-normal sample. Nat Commun 2022;13:3092.
15. Ko C, Brody JP. A genetic risk score for glioblastoma multiforme based on copy number variations. Cancer Treat Res Commun 2021;27:100352.
16. Toh C, Brody JP. Genetic risk score for ovarian cancer based on chromosomal-scale length variation. BioData Mining 2021;14:18.
17. Pradat Y, Viot J, Yurchenko AA, Gunbin K, Cerbone L, Deloger M, et al. Integrative pan-cancer genomic and transcriptomic analyses of refractory metastatic cancer. Cancer Discov 2023;13:1116–43.
18. Pages M, Rotem D, Gydush G, Reed S, Rhoades J, Ha G, et al. Liquid biopsy detection of genomic alterations in pediatric brain tumors from cell-free DNA in peripheral blood, CSF, and urine. Neuro-oncol 2022;24:1352–63.
19. O’Grady N, Gibbs DL, Abdilleh K, Asare A, Asare S, Venters S, et al. PRoBE the cloud toolkit: finding the best biomarkers of drug response within a breast cancer clinical trial. JAMIA Open 2021;4:ooab038.
20. Koc S, Lloyd MW, Grover JW, Xiao N, Seepo S, Subramanian SL, et al. PDXNet portal: patient-derived xenograft model, data, workflow and tool discovery. NAR Cancer 2022;4:zcac014.