CANCER RESEARCH | RESOURCE REPORT

CellMinerCDB: NCATS Is a Web-Based Portal Integrating Public Cancer Cell Line Databases for Pharmacogenomic Explorations
William C. Reinhold1, Kelli Wilson2, Fathi Elloumi1, Katie R. Bradwell3, Michele Ceribelli2, Sudhir Varma1,4,
Yanghsin Wang1,5, Damien Duveau2, Nikhil Menon2, Jane Trepel1, Xiaohu Zhang2,
Carleen Klumpp-Thomas2, Samuel Micheal2, Paul Shinn2,†, Augustin Luna6, Craig Thomas2, and
Yves Pommier1

ABSTRACT

Major advances have been made in the field of precision medicine for treating cancer. However, many open questions remain that need to be answered to realize the goal of matching every patient with cancer to the most efficacious therapy. To facilitate these efforts, we have developed CellMinerCDB: National Center for Advancing Translational Sciences (NCATS; https://discover.nci.nih.gov/rsconnect/cellminercdb_ncats/), which makes available activity information for 2,675 drugs and compounds, including multiple nononcology drugs and 1,866 drugs and compounds unique to the NCATS. CellMinerCDB: NCATS comprises 183 cancer cell lines, with 72 unique to NCATS, including some from previously understudied tissues of origin. Multiple forms of data from different institutes are integrated, including single and combination drug activity, DNA copy number, methylation and mutation, transcriptome, protein levels, histone acetylation and methylation, metabolites, CRISPR, and miscellaneous signatures. Curation of cell lines and drug names enables cross-database (CDB) analyses. Comparison of the datasets is made possible by the overlap between cell lines and drugs across databases. Multiple univariate and multivariate analysis tools are built-in, including linear regression and LASSO. Examples have been presented here for the clinical topoisomerase I (TOP1) inhibitors topotecan and irinotecan/SN-38. This web application provides both substantial new data and significant pharmacogenomic integration, allowing exploration of interrelationships.

Significance: CellMinerCDB: NCATS provides activity information for 2,675 drugs in 183 cancer cell lines and analysis tools to facilitate pharmacogenomic research and to identify determinants of response.

1Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, Maryland. 2National Center for Advancing Translational Sciences, NIH Bethesda, Maryland. 3Palantir Technologies, Denver, Colorado. 4HiThru Analytics LLC, Princeton, New Jersey. 5ICF International Inc., Fairfax, Virginia. 6cBio Center, Dana-Farber Cancer Institute and Department of Cell Biology, Harvard Medical School, Boston, Massachusetts.
†Deceased.
W.C. Reinhold and K. Wilson contributed equally to this article.
Corresponding Author: William C. Reinhold, NIH, 9000 Rockville Pike, Building 37, Room 5041, Bethesda, MD 20892. Phone: 240-760-7339; E-mail: [email protected]
Cancer Res 2023;83:1941–52
doi: 10.1158/0008-5472.CAN-22-2996
©2023 American Association for Cancer Research

Introduction

The approach to molecular biology and pharmacology, commonly referred to as precision medicine, has been significantly changed over approximately the last 25 years by the introduction of omics data and the conceptual shift to the use of computer analyses of large datasets with a combination of statistics, machine learning, omics visualizations, and integration of multiple disparate forms of data. Starting with the pioneering work of the Developmental Therapeutics Program (DTP) at the NCI (1), many projects have been and are contributing sizable blocks of data, prominently including (but not limited to) the large (1,000 cell line) panels of the Cancer Cell Line Encyclopedia (CCLE) from the Broad/Novartis, the Genomics of Drug Sensitivity in Cancer (GDSC) from Sanger and Massachusetts General Hospital, and the Cancer Therapeutics Response Portal (CTRP) from the Broad Institute.

The Genomics and Pharmacology Facility (GPF) has pioneered omics data acquisition and integration since the mid-1990s (1–9). Its efforts have led to the CellMiner and CellMinerCDB web applications (2–7, 9, 10), allowing pharmacogenomic database access and integrative analyses across all public cancer cell line genomics and drug response databases (2).

NCATS has established an automated compound screening platform for large compound libraries using quantitative high-throughput (qHTS) format across multiple different disease models since 2008 (11–13). For cancer cell line viability screening, NCATS created the Mechanism Interrogation PlatEs (MIPE) compound library comprising approved and investigational chemotherapeutic agents, as well as common medications for noncancer indications. An additional design feature of the MIPE library is compound mechanistic redundancy, allowing analyses across multiple compounds reported to hit the same target. Compound screening data using the MIPE library has demonstrated value for multiple cancer types, such as diffuse intrinsic pontine glioma (DIPG), Hodgkin lymphoma, Ewing sarcoma, small-cell lung cancer (SCLC), glioblastoma, and others (9, 14–17). Published and unpublished MIPE library compound screening data have been aggregated into a unified dataset called the NCATS–NCI Cytotoxicity Dataset, shared internally with the NCI through the Palantir Foundry platform. A subset of this unified dataset is now being made public through CellMinerCDB.

Here we introduce the public databases and web portal of CellMinerCDB: NCATS (https://discover.nci.nih.gov/rsconnect/cellminercdb_ncats/). CellMinerCDB: NCATS enables individual users to access and explore the large NCATS drug response database, with an emphasis on


pharmacology and its relationships to molecular genomics. CellMinerCDB: NCATS is integrated with 33 datasets from multiple projects from DTP, GPF, CCLE, GDSC, CTRP, the NCI DTP SCLC, NCI60-DTP Almanac, MD Anderson, and the Project Achilles from the Cancer Dependency Map Portal (DepMap; see Supplementary Materials and Methods for a full listing; refs. 4, 5, 7, 18–28). The omics analyses include single and two-drug activities, DNA copy number, methylation, and sequencing, whole genome transcriptome, mRNA and selected protein expression, metabolite levels, and clustered regularly interspaced short palindromic repeats (CRISPR) knockouts, allowing explorations of the relationships between those data and pharmacologic responses. Functionalities of the new CellMinerCDB: NCATS web application are introduced and discussed here with multiple examples validating the database. Details about general functionalities of the CellMinerCDB (https://discover.nci.nih.gov/rsconnect/cellminercdb/) platforms have been reviewed recently (2) and a 10-minute tutorial is on YouTube (Fig. 1A).

CellMinerCDB: NCATS is a public web application hosted in the Genomics and Pharmacology Facility of the Developmental Therapeutics Branch of the NCI Center for Cancer Research, and of the NCATS of the NIH.

Materials and Methods

The NCATS screening data contained within the CellMinerCDB: NCATS web application utilize RSTUDIO-2022.12.0–353 and were generated as previously described (29). Cells were treated with compounds for 48 hours in 1,536-well plates and assessed for viability using CellTiter-Glo (Promega). Data were normalized to plate controls of DMSO-treated cells as 100% viability and no cells at 0% viability. A four-parameter curve fit was used to generate an IC50 and AUC. Z-score AUC (across cell lines) was calculated by subtracting the mean AUC and dividing by the SD of each drug across all cell lines screened.

All compounds were matched using SMILES and InChIKey to external databases to pull clinical status. NCATS Inxight, DrugBank, and ChEMBL were used as references for compound structure matching and global clinical status (30–32). Structure matching was done within the Palantir Foundry platform (Palantir Technologies) utilizing RDKit: Open-source cheminformatics (2021-09-4; Q3 2021 Release) and NCATSFind Resolver. NCATS cell lines were annotated internally using Cellosaurus for disease and tissue type and matched to the other cell line sets (33). The NCATS web application is an R/shiny app hosted on an NCI server.

Information sources for the cell lines and drugs include the NCI Thesaurus, PubChem, and the scientific literature. The large amount of data coming from the included omics efforts and the platforms used to develop them has been previously described. Compound and cell line name variations across the different institutions' cell line sets were resolved internally. An example is a single compound with the names 122958 (NCI-60), ATRA (GDSC), tretinoin (CTRP), and isotretinoin (NCATS). Another example is a single cell line with the names CO:COLO 205 (NCI-60), COLO 205 (CCLE), COLO-205 (GDSC), and COLO205 (MD Anderson). All datasets have instances of missing data for specific cell lines, drugs, or genes.

Univariate analysis and multivariate analysis shown throughout were done using CellMinerCDB: NCATS functionalities or using data downloaded directly from CellMinerCDB: NCATS. The scatterplots, tables, and heatmaps produced by the web application were generated using the selections described in the input boxes and figure legends. Drug versus drug activity comparisons not generated by the web application were done by Pearson correlation using R version 3.6.3. Bar charts were generated using GraphPad PRISM version 7.0. Violin plots were generated using ggplot2 version 3.3.5.

Bimodal drug activity density distributions were identified using a combination of a Gaussian mixture model-based approach (norm1mix package; version 1.3), a kurtosis test, and visual inspection. Both these calculations and the density plots were done using The R Project for Statistical Computing.

Prediction of NCATS IC50 activity using CCLE microarray transcript expression by both univariate and multivariate analysis used Pearson correlation between drug response and gene expression of the target. The multivariate models use stepwise forward regression. Each model was initiated with a target for a given drug; multiple targets generated multiple models. Possible regression features included genes from Onco500 (34). A maximum of 10 features were added to each model and then pruned. For each iteration step, the feature with the lowest partial correlation P value after removing the effects of already included features was added using rcellminer 2.9.1 (35). A 10-fold cross-validated predicted response was calculated at each step using rcellminerElasticNet 0.1.1. Models were pruned by examining the statistical difference in the correlation of predicted versus observed response with each added feature using cocor 1.1–3. CCLE microarray expression data from CellMinerCDB was used (2).

Data availability

The data analyzed in this study were obtained from multiple sources. Within the application, the source of each data set is accessible within the Metadata tab, both within the “select here to learn more about. . .” link and from the “download footnotes” tab. A description of all data sources used in CellMinerCDB: NCATS is provided in the Supplementary Materials and Methods.

Results

The CellMinerCDB: NCATS web application

The CellMinerCDB: NCATS publicly accessible web application was created to both access the NCATS drug response data and enrich and expand its usefulness by integrating multiple other forms and sources of genomics, proteomics, and metabolomics data from the other public cancer cell line datasets using the CellMinerCDB platform (2).

A screenshot of the site, banner, and tabs for the CellMinerCDB: NCATS web application is presented in Fig. 1A. CellMinerCDB: NCATS allows drug comparisons and emphasizes cross-database (CDB) analyses with the other public cancer cell line databases. The univariate analyses tab allows generation of on-the-fly bivariate scatter plots and correlation analyses from a single input to compare all profiles within selected data sets. The multivariate analyses tab allows the exploration of multivariate models predictive of an observed profile. Analyzing selected tissues of origin is an option for both univariate and multivariate analyses. The metadata tab allows the download of datasets of interest for further processing and archiving. The search IDs tab provides the identifiers within each cell line set by data type. The help tab provides explanations and descriptions of the various functionalities within the web application. In addition, the video tutorial tab provides a description and explanation of the CellMinerCDB functionalities. Thus, CellMinerCDB: NCATS provides new data, multiple functionalities, and data integration, allowing users to mine the NCATS data independently without having to seek support from bioinformatics teams.
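The plate normalization and Z-score AUC steps described in Materials and Methods can be illustrated with a short Python sketch. This is not the authors' pipeline (which, per the text, ran in R and Palantir Foundry); the function names, the example values, and the choice of the sample standard deviation are assumptions made only for illustration.

```python
from statistics import mean, stdev


def percent_viability(signal, dmso_mean, no_cell_mean):
    """Scale a raw well signal so DMSO controls read 100% and empty wells 0%."""
    return 100.0 * (signal - no_cell_mean) / (dmso_mean - no_cell_mean)


def zscore_across_cell_lines(auc_by_cell_line):
    """Z-score one drug's AUC values across all cell lines screened.

    Assumption: the sample SD (n - 1) is used; the text does not say which.
    Cell lines with missing data (None) are skipped and stay None.
    """
    observed = [v for v in auc_by_cell_line.values() if v is not None]
    mu = mean(observed)
    sd = stdev(observed)
    return {
        line: None if v is None else (v - mu) / sd
        for line, v in auc_by_cell_line.items()
    }


# Hypothetical AUC values for one drug in four cell lines (one missing).
auc = {"line_A": 1.0, "line_B": 2.0, "line_C": 3.0, "line_D": None}
z = zscore_across_cell_lines(auc)
```

Because the Z-score is computed per drug across cell lines, it expresses how sensitive a given cell line is relative to the rest of the panel for that drug, which is what makes activities comparable between drugs with very different potency ranges.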


[Figure 1 graphic not reproduced; panel contents as text.
A, URL (https://discover.nci.nih.gov/rsconnect/cellminercdb_ncats/) and tabs: Univariate analyses, Multivariate analyses, Metadata, Search IDs, Help, Video tutorial.
B, MIPE v4.0 (2013–2015; 1,912 compounds), MIPE v4.1 (2016–2017; 1,978 compounds), and MIPE v5.0 (2018–present; 2,498 compounds) feed the NCATS–NCI cytotoxicity dataset. Filtering steps: remove cell lines with introduced genetic modifications; remove cell lines with pretreatment conditions or nonstandard media additives; remove experiments with organoid or cell matrix conditions; remove experiments done after May 2019. Result: CellMinerCDB NCATS dataset with 183 cell lines, 2,675 compounds, and IC50 and AUC metrics.
C, Left pie: approved 36%, in clinical trials 30%, not in clinical trials 34%. Right pie: present in other collections 30%, unique to NCATS 70%. Box: Anti infectives: dolutegravir, piperaquine...; Antifungals; Antimalarials; Antivirals; Antibacterials; Metabolic modulators: saxagliptin, evacetaib; Cardiovascular system: tideglusib, citalopram; Antibiotics; Diuretics; Muscle-affecting; Nitric oxide donors.
D, Pie: overlapping with existing CellMinerCDB datasets 61%, novel cell lines 38%. Novel examples: CNS tumors (glioblastoma oncospheres, diffuse intrinsic pontine glioma); RCC (Birt-Hogg-Dubé syndrome, hereditary leiomyomatosis and RCC, RCC associated with TFE3 gene fusions); blood tumors (blastic plasmacytoid dendritic cell neoplasm).]

Figure 1.
The CellMinerCDB: NCATS web application, NCATS dataset, drugs, and cell lines. A, URL, banner, and tabs for the CellMinerCDB: NCATS web application. B, Schematic of the creation of the NCATS–NCI cytotoxicity dataset. Multiple versions of the MIPE library were combined into a single dataset to make the “NCATS–NCI cytotoxicity dataset.” This dataset was trimmed down to remove cell lines with introduced genetic modifications, pretreatment conditions, nonstandard media additives, and data not meeting the sharing embargo date of 18 months. C, Left, pie chart showing the clinical status of the 2,675 CellMinerCDB: NCATS compounds: 36% are FDA-approved, 30% have entered clinical trials, and 34% are experimental. Right, pie chart showing the compounds overlapping between CellMinerCDB: NCATS and all other datasets included in CellMinerCDB 1.4. Thirty percent (837) of NCATS compounds overlap with at least one of the other CellMinerCDB datasets and 70% (1,860) do not. Of those compounds found only in the NCATS datasets, there are multiple noncancer drug types included (see box). D, Pie chart showing the cell line overlaps between CellMinerCDB: NCATS and all other datasets included in CellMinerCDB 1.4.
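The trimming steps summarized for Fig. 1B amount to a sequence of exclusion filters applied to screening records. The Python sketch below shows that logic only; the `Experiment` record, all of its field names, and the exact cutoff day (taken here as the end of May 2019) are hypothetical and are not drawn from the NCATS data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Experiment:
    # All field names are hypothetical, chosen only for this sketch.
    cell_line: str
    genetically_modified: bool
    pretreated_or_special_media: bool
    organoid_or_matrix: bool
    run_date: date


def trim_for_release(experiments, cutoff=date(2019, 5, 31)):
    """Apply the four exclusion steps in order and return what remains."""
    kept = []
    for exp in experiments:
        if exp.genetically_modified:
            continue  # cell line carries an introduced genetic modification
        if exp.pretreated_or_special_media:
            continue  # pretreatment or nonstandard media additives
        if exp.organoid_or_matrix:
            continue  # organoid or cell matrix conditions
        if exp.run_date > cutoff:
            continue  # run after the assumed embargo cutoff
        kept.append(exp)
    return kept


records = [
    Experiment("line_A", False, False, False, date(2018, 3, 1)),
    Experiment("line_B", True, False, False, date(2018, 3, 1)),
    Experiment("line_C", False, False, True, date(2017, 6, 9)),
    Experiment("line_D", False, False, False, date(2020, 1, 15)),
]
released = trim_for_release(records)
```

In this toy input only `line_A` passes all four filters; the other three records each fail exactly one step.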

The NCATS input data

CellMinerCDB: NCATS comprises 2,675 drugs and compounds tested in 183 cell lines, of which 2,667 have mechanism of action designations. The dataset was created as described in Materials and Methods and Fig. 1B. The output is fully compatible and integrated with CellMinerCDB (2). An asset of CellMinerCDB: NCATS is the unique compounds and cancer cell lines included (Fig. 1C and D).

NCATS contains two drug sensitivity metrics, Z-score AUC and IC50 values. These cover a wide range of screening concentrations, routinely using 11 concentrations between 0.79 nanomolar and 47 micromolar, which is an asset of NCATS drug testing (12). The drugs include 952 (36%) clinically approved, 790 (30%) that have entered clinical trials, and 908 drugs (34%) that are preclinical (Fig. 1C, left). Notably, 1,877 (70%) drugs and compounds are unique to NCATS (Fig. 1C, right). They have been annotated with their commonly accepted mechanisms of action. A feature of the NCATS dataset is the inclusion of 518 approved nononcology drugs not found in the other public databases (Supplementary Table S1). Those include 103 antiinfectives (antibacterial, mycobacterial, viral, or fungal) for systemic use, 86 cardiovascular or nervous system drugs, and 72 alimentary tract and metabolism compounds.

The distribution of the 183 NCATS cell lines by tissue of origin is detailed in Supplementary Table S2. They include 72 (38%) unique cancer cell lines absent in other public cancer cell line databases (Fig. 1D; Supplementary Table S3). Figure 1D shows several of the rare disease


subtypes including DIPG, renal Birt-Hogg-Dubé syndrome, hereditary leiomyomatosis, and TFE3 fusion cancer cell lines. Thus, CellMinerCDB: NCATS provides the user with substantial new drug and cell line data.

Cell line and drug overlaps of NCATS with other cancer cell line datasets

The cell line overlaps for CellMinerCDB: NCATS as well as all other cell line sets are listed in Fig. 2A. As in our other CellMinerCDB websites (https://discover.nci.nih.gov/), cell lines are matched with common tissue of origin terms based on the OncoTree ontology levels developed by the Memorial Sloan Kettering Cancer Center (New York, NY) and Dana-Farber Cancer Institute (Boston, MA), primarily version 1.1 as described previously (2). Additional information, such as the gender or age of the patient from whom the cell line originated, is also included. Comparison between drug responses in cell lines is made possible by the overlap of cell lines across databases (Fig. 2A).

The drug and compound activity overlap between the multiple cell line sets is presented in Fig. 2B. Information on each cell line set's activity measurements is accessible in the “data type” input box, the Metadata “units” description or footnotes, or the provided URLs. An asset for the user is that CellMinerCDB: NCATS automatically matches cell line and drug data across any cell line sets queried, which allows their comparison for identical, related by mechanism of action, or disparate drugs.

Omics data available for cross-comparisons in CellMinerCDB: NCATS

Figure 2C summarizes by cell line set and measurement type the profiles available in CellMinerCDB: NCATS, including 31,617 drug (and compound) activities, 261,848 molecular measurements, and 18,119 miscellaneous signatures. All 28 included datasets are available for download from the Metadata tab (Fig. 1A). Our curation and standardization of these datasets minimizes the task of name matching. The data types available for exploration based on the databases with overlapping cell lines include single-drug activities, two-drug combination activities, gene copy number, methylation and mutation levels, transcript expression, protein expression, metabolite levels, the DepMap Achilles (Achilles) CRISPR genetic dependencies, and miscellaneous molecular signatures. Those miscellaneous phenotypic signatures include the antigen presenting machinery (APM), epithelial–mesenchymal transition (EMT) status, replication stress (RepStress), genomic instability (HRD_LOH, HRD-SUM, NtAI, LST), and neuroendocrine status (NE). The metadata phenotypic signatures are accessible in the univariate analyses\data type\mda: miscellaneous phenotypic data. The number of data explorations one might pursue,

A Cell line overlap between NCATS and other cell line sets

Cell line set      NCATS   CCLE  CTRP   GDSC  MD Anderson  Achilles  PRISM  NCI SCLC  DTP  Almanac
NCATS                183    102    90     81           59        56     30        12    8        8
CCLE                   -  1,089   823    687          389       580    480        42   52       52
CTRP                   -      -   823    595          327       497    441        33   40       40
GDSC                   -      -     -  1,080          364       424    360        55   55       55
MD Anderson            -      -     -      -          651       245    198        11   55       55
Project Achilles       -      -     -      -            -       769    343        17   31       31
PRISM                  -      -     -      -            -         -    480         9   44       44
NCI SCLC               -      -     -      -            -         -      -        77    1        1
DTP                    -      -     -      -            -         -      -         -   60       60
NCI Almanac            -      -     -      -            -         -      -         -    -       60

B Drug overlap between NCATS and other cell line sets

Cell line set    NCATS  PRISM     DTP  NCI SCLC  GDSC  CTRP  Almanac  CCLE
NCATS            2,675    795     666       400   198   165       94    22
PRISM                -  1,413     380       286   134   134       77    22
DTP                  -      -  24,360       327   143   128       89    21
NCI SCLC             -      -       -       526   115   128      100    21
GDSC                 -      -       -         -   297    77       46    16
CTRP                 -      -       -         -     -   481       46    14
Almanac              -      -       -         -     -     -      104     9
CCLE                 -      -       -         -     -     -        -    24

C Additional data types added to NCATS

Columns, left to right: Cell line set; Drug activities (Single, Combo); DNA, gene level (Copy number, Methylation, Mutation); RNA, gene level (Microarray z score, Microarray log2, RNAseq, miRNA); Proteins (RPPA, MS); H3K27ac; H3K4me3; Metabolites; CRISPR; Signatures
NCI-60 24,047 23,232 17,553 9,307 25,040 25,040 23,826 417 94 3,167 22,073 19,625 71
CCLE 24 23,316 19,880 1,667 19,851 52,604 734 167 225 6
GDSC 297 24,502 19,846 18,099 19,562 7
CTRP 481
NCI SCLC 526 25,568 25,568 17,804 800
PRISM 1,413
NCI Almanac 5,355
MD Anderson 364
Project Achilles 18,119

Figure 2.
Cell line and drug overlap, and data types in CellMinerCDB: NCATS. A, Cell line overlap between NCATS and the nine other cell line datasets. Project Achilles is from the DepMap; PRISM is from Broad-MIT; NCI Almanac is the NCI60-DTP Almanac. B, Drug overlap between NCATS and the seven other cell line datasets. The number of drugs is based on the comparison of NCATS AUC overlap and the seven other cell line sets. The MD Anderson and DepMap Achilles cell line datasets are not included as they have no drug activities. The NCI Almanac has two-drug activity measurements. The drugs with data for inhibitory concentration 50% (IC50) are slightly fewer in number. For acronym definitions see A. C, Available data in CellMinerCDB: NCATS. For the drug activities columns, the “single” numbers are compounds or drugs. The “combo” drugs are two-drug combinations for 105 FDA-approved drugs. For the DNA, RNA, and CRISPR columns, the numbers are genes with information for that cell line set. For the “protein” columns, the numbers are epitopes for the reverse phase protein arrays (RPPA) and protein fragments for the mass spectrometry (MS). For the “metabolite” column, the numbers are metabolites. For the “signatures” column, the number is signatures of various types. CTRP DNA copy number and mutation, microarray log2, and signatures data are identical to those in CCLE, and so are not included here.
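Cross-database comparison rests on two steps noted in the text: reconciling cell line names between sets, then correlating measurements over the cell lines the two sets share. The Python sketch below illustrates both under loud assumptions. The `harmonize` rule (drop a tissue prefix, strip punctuation, uppercase) is a simplification that handles the COLO 205 name variants quoted in Materials and Methods but is not the authors' curation procedure; the activity values and the `min_overlap` default are hypothetical.

```python
import re
from math import sqrt


def harmonize(name):
    """Reduce a cell line label to an uppercase alphanumeric key."""
    core = name.split(":", 1)[-1]  # drop a tissue prefix such as "CO:"
    return re.sub(r"[^A-Za-z0-9]", "", core).upper()


def pearson_on_shared(x_by_line, y_by_line, min_overlap=3):
    """Return (Pearson r, n) over cell lines present in both sets.

    r is None when fewer than min_overlap lines are shared or when
    either variable has zero variance.
    """
    x = {harmonize(k): v for k, v in x_by_line.items() if v is not None}
    y = {harmonize(k): v for k, v in y_by_line.items() if v is not None}
    shared = sorted(set(x) & set(y))
    n = len(shared)
    if n < min_overlap:
        return None, n
    xs = [x[k] for k in shared]
    ys = [y[k] for k in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / sqrt(sxx * syy), n


# Hypothetical activities for one drug in two differently named sets.
set_a = {"CO:COLO 205": 1.0, "line B": 2.0, "line-C": 3.0, "line D": 4.0}
set_b = {"COLO-205": 2.0, "LINE_B": 4.0, "lineC": 6.0, "line E": 1.0}
r, n = pearson_on_shared(set_a, set_b)
```

Returning the overlap size alongside the coefficient matters in practice, because a correlation computed on a handful of shared cell lines carries little weight; a real analysis would set a much higher minimum than the illustrative default here.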


depending on one’s interest, easily jumps into the billions. The NCATS drug data can be compared to genomics data for the same cell lines in other datasets, allowing one to relate the drug responses to omics features using CellMinerCDB: NCATS. The following examples illustrate the basic use of CellMinerCDB: NCATS.

Drug comparisons

The overlaps between cell lines and drugs across the “cell line sets” facilitate multiple forms of drug comparisons. Figure 3A shows a univariate analyses/plot data output for two structurally related TOP1 inhibitors commonly used in clinical oncology (36), topotecan (x-axis) versus SN-38 (y-axis), the active metabolite of irinotecan. Both are measured by NCATS and displayed using CellMinerCDB: NCATS. The highly significant correlation between the two drugs (P = 9.1 × 10⁻⁵²) demonstrates internal assay consistency.

Similarly, Fig. 3B shows a univariate analyses/compare patterns output comparing the NCATS anaplastic lymphoma kinase (ALK) inhibitor TAE-684 to other NCATS ALK inhibitors (by entering ALK inhibitor in the output MOA column). Of the 12 ALK inhibitors in the NCATS database, 10 show significant correlations, demonstrating assay and mechanism of action reproducibility across cell lines within the NCATS drug response database.

Comparison of NCATS with GDSC and CTRP drug activities in Fig. 3C and D, respectively, shows the top 15 correlated compounds for each. Four protein kinase inhibitors are common between these two (shown as red bars): linifanib, sorafenib, AZD-7762, and tivozanib. Figure 3E is a univariate analyses/plot data analysis of one of these comparisons, AZD-7762 as measured by both NCATS (x-axis) and CTRP (y-axis), yielding a P value of 1.1 × 10⁻¹⁰. These observations demonstrate ways of comparing drug activities across databases to determine consistency across common cell line sets.

Compared globally, the average Pearson correlation for NCATS versus either GDSC or CTRP across all compounds using Z-AUC or IC50 is 0.4. Violin plots (Fig. 3F) visualize significant correlations between NCATS and 102/265 compounds (38.4%) for CTRP and 71/212 compounds (33.5%) for GDSC. The NCATS versus the PRISM drug data are not included in this analysis as none had a minimum of 16 cell lines with overlap. The Fig. 3 examples are only a small sampling of the types of informative comparisons one might do.

Exploration of NCATS drug responses with omics or CRISPR data

The integration of the NCATS drug responses with a wide range of molecular, phenotypic, and signature data from the other omics databases (CCLE, GDSC, and NCI) allows correlation queries for overlapping cell lines. We next present a small group of these as illustrations with outputs and screenshots from CellMinerCDB: NCATS.

Figure 4A validates SN-38 activity (in NCATS) versus SLFN11 gene transcript expression (in GDSC) using CellMinerCDB: NCATS univariate analyses/plot data. The scatter plot confirms the expected significant correlation between these causally linked parameters (36). Figure 4B presents additional examples between NCATS and GDSC, all showing significant correlation between a drug’s activity and the transcript levels of that drug target.

A second form of omics data comparison is given in Fig. 4C, comparing activity of the mTOR inhibitor VS-5584 from NCATS and MTOR DNA copy number from CCLE, demonstrating significant correlation. CellMinerCDB: NCATS also shows that mTOR DNA copy number is significantly correlated to its transcript level (r = 0.49, P = 1.6E–61), providing the logical link between the drug activity and DNA copy number. Figure 4D provides additional examples of significant correlations between drug activities and DNA copy numbers, all linked through having the same gene both as drug target and molecular measurement. All have significant correlations between gene DNA copy number and transcript levels.

Figure 4E and F exemplify the possibility of testing NCATS drug activity versus genetic inactivation of the drug target. Figure 4E compares the growth inhibitory activity of vemurafenib (a BRAF inhibitor) to cell survival with BRAF CRISPR knockdown (as measured by Project Achilles). The resultant scatter plot demonstrates a significant correlation between the two. Figure 4F lists other examples showing significant correlations between drug activities and CRISPR knockdown, in each case linked through having the same gene both as the drug and CRISPR target. As for the drugs in Fig. 3, Fig. 4 provides only a small sampling of the types of informative comparisons one might do.

To compare the predictive value of different genomics parameters, the IC50 activities of the NCATS approved and clinical trial drugs were each compared with the different genomics evaluations of their gene targets (transcript expression, gene copy number, methylation, mutations, and CRISPR) across the other nine platforms in CellMinerCDB: NCATS, resulting in 1,100 drug versus gene pairings (Supplementary Table S4). The percent significant correlations by platform were: (i) 5.3% for the CCLE DNA copy number, (ii) 8.8% for the GDSC methylation, (iii) 6.5% for the CCLE mutation, (iv) 5.1% for GDSC mutation, (v) 11.8% for the CCLE transcript microarray, (vi) 10.8% for the GDSC microarray, (vii) 12.8% for the CCLE RNA sequencing (RNA-seq), (viii) 9.2% for the CCLE protein, and (ix) 6.2% for the Achilles CRISPR. These results demonstrate the value of RNA-seq and proteomic analyses for predicting drug activity.

Although determination of protein levels remains limited in clinical samples, we found that both protein expression and gene expression of the proapoptotic factor BAX in CCLE are significantly correlated with the IC50 activity of SN-38 in NCATS (Supplementary Fig. S1; P = 0.0013 and 0.0026 for 88 and 95 common cell lines, respectively). Thus, on the basis of the analysis of drugs tested in NCATS, we conclude that RNA-seq is currently the most practical predictor of drug response.

Multivariate and miscellaneous phenotypic signature (mda) analyses using CellMinerCDB: NCATS

Presuming that multiple factors are involved in drug response (36), we present two approaches for clinical TOP1 inhibitors (topotecan and SN-38, the active metabolite of irinotecan) using CellMinerCDB: NCATS.

The first utilizes the prior knowledge that the cytotoxicity of TOP1 inhibitors is dependent on SLFN11, apoptosis, and transcription (36). Combining transcript expression of SLFN11 (Fig. 5A), BPTF (Fig. 5B), and high mobility group nucleosome-binding domain-containing protein 1 (HMGN1; Fig. 5C) shows how the predictive value of SLFN11 can be strengthened by using the multivariate analysis tool of CellMinerCDB: NCATS (Fig. 5D and 5E).

The second multivariate analysis available in CellMinerCDB: NCATS (and the other CellMinerCDB websites) uses previously described multigene expression signatures, which can be retrieved using the “mda” tab in the “data type” pull-down menu at the left of the website (Fig. 6). Together, these examples demonstrate the increased power of aggregating multiple genomic parameters to predict drug activity.

Drug activity distributions and additional multivariate analysis

Figure 7 presents another form of exploration generated from the NCATS drug database: drug activity distributions with consideration of tissues of origin. Bimodal drug distributions were identified,


[Figure 3, panels A–D; graphics not reproduced.
A, Univariate analyses/plot data. Scatter plot of SN-38 (y-axis) vs. topotecan (x-axis) activity, NCATS, n = 174; Pearson cor. (r) = 0.86, P = 9.1e-52; points colored by tissue of origin.
B, Univariate analyses/compare patterns. Pattern comparison results for TAE-684 (NCATS drug activity) against other NCATS drugs:

Drug ID        MOA                       Correlation  P
asp-3026       Alk inhibitor             0.995        3.4e-169
alectinib      Alk inhibitor             0.953        4.2e-87
LDK-378        Alk inhibitor             0.989        7.1e-47
Ensartinib     Trk,ros1,Alk inhibitor    0.945        6.4e-29
Ap-26113       Alk inhibitor             0.913        1.7e-23
MPS1-IN-1      Alk inhibitor             0.845        1.6e-17
Ensartinib     Alk inhibitor             0.717        9.9e-9
Ldn-214117     Alk inhibitor             0.674        1.5e-7
tpx-0005       Alk inhibitor             0.418        0.0031
Cep28122       Alk inhibitor             0.392        0.0058
Belizatinib    Alk inhibitor             0.212        0.149
Pf-06439015    Alk inhibitor             0.160        0.277

C, NCATS vs. GDSC, and D, NCATS vs. CTRP. Bar charts of Pearson correlation (x-axis, 0 to 0.8) by drug name (target).]

E Univariate analyses/plot data F


NCATS vs. CTRP AZD-7762 activity, n = 85
x-axis cell line set Pearson cor. (r) = 0.63, P = 1.1e-10
NCATS CT IC50
x-axis data type 30 Biliary-tract
Bladder-urinary-tract Correlation of significant IC50s across
act: Drug activity IC50 Blood
−log10 Bone screening datasets
Bowel
Identifier Brain-CNS
AZD-7762 (act, CTRP)

AZD-7762 Breast
25 Head-neck
NCATS vs. CTRP
Lung
y-axis cell line set Lymph 90 cell lines
Ovary
CTRP Pancreas 102/265 compounds
y-axis data type Prostate
act: Drug activity (AUC) Skin

−log10 20 Soft-tissue
NCATS vs. GDSC
Identifier 80 cell lines
AZD-7662
71/212 compounds
Select tissue/s of origin
All 15
Select tissues to color 0.3 0.6 0.9
Pearson correlation
−1 0 1 2 3
AZD-7762 (act, NCATS)

Figure 3.
Comparisons of drugs in CellMinerCDB: NCATS. A, Scatter plot of the activities of topotecan (x-axis) versus SN-38 (y-axis), both measured by NCATS. The plot is a screenshot from CellMinerCDB-NCATS (Fig. 1A, univariate analyses). B, Comparison of the ALK inhibitor TAE-684 with the other ALK inhibitors tested by NCATS. The results were generated using CellMinerCDB-NCATS (univariate analyses/compare patterns tab selections), including a filter to output only “ALK inhibitor” in the mechanism of action (MOA) column, and ordered by P value. C, Bar graph showing the top 15 compounds with the highest positive correlation for IC50 value comparisons between NCATS and GDSC. Red bars highlight the compounds that are also highly correlated between NCATS and CTRP (D): linifanib, sorafenib, AZD-7762, and tivozanib. The primary target of each compound is shown in parentheses. D, Bar graph showing the top 15 compounds with the highest positive correlation for IC50 values between NCATS and CTRP. Red bars highlight the compounds that are also highly correlated between NCATS and GDSC (C). The primary target of each compound is shown in parentheses. E, A scatter plot of AZD-7762 activity as measured by NCATS (x-axis) versus CTRP (y-axis). The plot is a screenshot generated using the univariate analyses/plot data tab selections. For the scatter plots A, B, and E, individual dots are cell lines with color coding by tissue of origin. F, Violin plot showing all compounds whose IC50 values are positively correlated with P < 0.05 between either NCATS and CTRP or NCATS and GDSC. All compounds shown had an overlap of at least 16 cell lines between datasets. The box plot overlay shows a median correlation of 0.4. All correlations presented are Pearson.


[Figure 4 panels]
A, Univariate analyses/plot data. SN-38 (act, NCATS) vs. SLFN11 (exp, GDSC): n = 79, Pearson cor. (r) = 0.55, P value = 8.3e-5. x-axis: SLFN11 mRNA expression (log2), GDSC-MGH-Sanger; y-axis: SN-38 activity (−log10 IC50), NCATS. Tissues excluded: leukemias, lymphomas, other.
B, NCATS drug activities vs. drug target expression correlations (drug activity vs. transcript expression).

Name                 Mechanism of action   Gene     Correlation   P value
Elacridar            ABCB1 inhibitor       ABCB1     0.48         0.024
Bosutinib            ABL1 inhibitor        ABL1      0.30         0.015
asp-3026             ALK inhibitor         ALK       0.34         0.004
at-9283              AURKA inhibitor       AURKA    -0.26         0.022
Venetoclax           BCL2 inhibitor        BCL2      0.46         1.1e-4
Voruciclib           CDK inhibitor         CDK4     -0.53         0.016
Neratinib            EGFR inhibitor        EGFR      0.34         0.004
Sunitinib malate     FLT3 inhibitor        FLT3      0.53         6.3e-4
Adefovir dipivoxil   POL inhibitor         POLH      0.29         0.018
Irinotecan           TOP1 inhibitor        TOP1      0.23         0.048

C, Univariate analyses/plot data. VS-5584 (act, NCATS) vs. MTOR (cop, CCLE): n = 30, Pearson cor. (r) = -0.58, P value = 8.3e-4. x-axis: MTOR DNA copy number (1N, 2N, 4N), CCLE-Broad-MIT; y-axis: VS-5584 activity (−log10 IC50), NCATS.
D, NCATS drug activities vs. DNA copy-number correlations (drug activity vs. DNA copy number).

Name           Mechanism of action     Gene     Correlation   P value
Alectinib      ALK inhibitor           ALK      -0.31         0.009
Finasteride    AR Antagonist           AR       -0.35         0.015
Venetoclax     BCL2 inhibitor          BCL2      0.28         0.023
Voruciclib     Cdk 1/2/4/9 inhibitor   CDK2     -0.54         0.008
Gefitinib      EGFR inhibitor          EGFR      0.25         0.041
Quizartinib    FLT3 inhibitor          FLT3      0.39         0.008
As-703026      MEK1/2 inhibitor        MAP2K7   -0.28         0.037
AZD-8055       MTOR inhibitor          MTOR     -0.24         0.031
Entecavir      DNA POL inhibitor       POLG     -0.35         0.046
Daunorubicin   TOP2 inhibitor          TOP2A    -0.24         0.016

E, Univariate analyses/plot data. Vemurafenib (act, NCATS) vs. BRAF (cri, Achilles): n = 34, r = -0.60, P value = 1.5e-4. x-axis: BRAF CRISPR knockout screen, Achilles project; y-axis: vemurafenib activity (−log10 IC50), NCATS.
F, NCATS IC50 drug activities vs. drug target CRISPR correlations (drug activity vs. cell survival).

Name           Mechanism of action   Gene     Correlation   P value
azd-5363       Pkb/akt inhibitor     AKT2     -0.39         0.013
Venetoclax     BCL2 inhibitor        BCL2     -0.64         2.3e-5
sch-900776     CDK1 inhibitor        CDK1     -0.30         0.040
Panobinostat   HDAC inhibitor        HDAC8    -0.40         0.033
bms-754807     IGF1R inhibitor       IGF1R    -0.52         3.4e-4
As-703026      MEK 1/2 inhibitor     MAP2K1   -0.43         0.014
Milademetan    MDM2 inhibitor        MDM2     -0.89         2.4e-4
Buparlisib     PIK3CA inhibitor      PIK3CA   -0.36         0.008
Entecavir      POL inhibitor         POLE     -0.54         0.012
Idarubicin     TOP2A inhibitor       TOP2     -0.37         0.005

Figure 4.
NCATS:CDB univariate comparisons of drug activities to transcript, DNA copy number, and CRISPR signatures. A, Scatter plot of SLFN11 transcript expression from GDSC (x-axis) versus SN-38 activity measured by NCATS (y-axis). The plot is a snapshot from CellMinerCDB-NCATS (univariate analyses). B, Additional examples of significantly correlated and biologically linked NCATS IC50 drug activities versus GDSC transcript expression levels. All gene examples are targets for the corresponding drugs. C, Scatter plot of MTOR DNA copy number as measured by CCLE (x-axis) versus VS-5584 activity as measured by NCATS (y-axis). The plot is a screenshot from CellMinerCDB-NCATS (univariate analyses/plot data tab selections), with the specific inputs used detailed in the boxes to the left. The vertical line was added at 0 intensity or 2N DNA copy number. The units for the x-axis were converted from intensity to ploidy (copy number = 2 × 2^intensity) for biological clarity. D, Additional examples of significantly correlated and biologically linked NCATS IC50 drug activities versus CCLE DNA copy number from plots generated as in C. Genes are targets of the corresponding drugs. E, Scatter plot of BRAF CRISPR knockdown cell survival from the Achilles Project (x-axis) versus vemurafenib activity as measured by NCATS (y-axis). The plot is a screenshot from CellMinerCDB (univariate analyses/plot data tab), with the specific inputs used detailed in the input boxes to the left. The vertical line was added at 0 to indicate that the cell lines to the left of the line have decreased survival following BRAF knockdown. F, Additional examples of significant correlations of drug activities versus CRISPR knockdown of the target genes. The CRISPR knockdown cell survival data are from the Achilles Project. All correlations presented in the figure are Pearson. For all scatter plots, dots are cell lines with color coding by tissue of origin indicated to the right.
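
The intensity-to-ploidy conversion mentioned for panel C can be written out explicitly. The short Python sketch below assumes the conversion is copy number = 2 × 2^intensity, with intensity read as a log2 ratio relative to a diploid baseline, which is consistent with the vertical line at 0 intensity marking 2N. The function name is hypothetical.

```python
def intensity_to_copy_number(intensity):
    """Convert a copy-number intensity to an estimated DNA copy number.

    Assumes copy number = 2 * 2**intensity, so that an intensity of 0
    corresponds to the diploid state (2N).
    """
    return 2.0 * 2.0 ** intensity


# The three values below correspond to the 1N, 2N, and 4N axis ticks.
for intensity in (-1.0, 0.0, 1.0):
    print(intensity, intensity_to_copy_number(intensity))
```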


[Figure 5 panels]
A, SLFN11 (RNA-seq, CCLE) vs. SN-38 act. (NCATS IC50): R = 0.43, P = 1.6e-5.
B, BPTF (RNA-seq, CCLE) vs. SN-38 act. (NCATS IC50): R = 0.15, P = 0.15.
C, HMGN1 (RNA-seq, CCLE) vs. SN-38 act. (NCATS IC50): R = 0.48, P = 6.5e-7.
D, Cluster image map. Rows: SN-38 (act., NCATS), xsqSLFN11_ccle, xsqBPTF_ccle, xsqHMGN1_ccle, xsqBAX_ccle; columns: individual cell lines; color scale: 0.00 to 1.00.
E, Observed versus 10x cross-validation for SN-38 act. (NCATS IC50): r = 0.6, P = 1.4e-10. x-axis: 10-fold cross-validation SN-38 act. (NCATS IC50); y-axis: observed SN-38 act. (NCATS IC50).

Figure 5.
Multivariate analysis of SN-38 activity in NCATS using the expression of SLFN11, BPTF, HMGN1, and BAX in the overlapping cell lines in CCLE is a better predictor of SN-
38 activity than any of the four genes taken individually. A, Predictive value of SLFN11 expression. B, Predictive value of BPTF (encoding a protein regulating
chromatin remodeling as a regulator of ATP hydrolysis of the NURF complex). C, Predictive value of HMGN1 (encoding HMGN1) associated with active transcription.
D, Cluster image map of the multivariate analysis of SN-38 activity predicted by the expression of four genes together. See Supplementary Fig. S1 for BAX univariate
data. E, Scatter plot of the observed versus 10-fold cross-validation for SN-38 using the same predictor genes as in D.

demonstrating both sensitive and resistant cancer cell line responses. Enrichment for specific tissues of origin in the activity peaks demonstrates novel prospective therapeutic indications. Multivariate analyses using CCLE transcriptomics visualize multivariate molecular predictors. The first example given is for filanesib, with its bimodal activity distribution visualized in Fig. 7A and the significant prediction of that activity by KIF11, MYBBP1A, and TNFRSF10D (P = 1.2 × 10^−7) in Fig. 7B. The second example given is for epothilone A, with its bimodal activity distribution visualized in Fig. 7C and the significant prediction of that activity by TUBB6, ABCG1, GSK3B, and MLH1 (P = 1.2 × 10^−7) in Fig. 7D. Drugs with diverse mechanisms of action show enhanced activities for bladder, blood (leukemia), bone (sarcoma), bowel, brain, and lymphatic cancer cells in Fig. 7E.
Supplementary Table S5 presents an example of a more systematic pharmacologic prediction approach of NCATS IC50 drug activity distributions using CCLE microarray transcript levels. Included are 63 significant gene–drug combinations in which the genes are known targets for those drugs. In the case of ABT-737 (a BH3 mimetic and BCL2 gene family inhibitor), the generated multivariate model includes two known targets: BCL2L2 and BCL2 (as given by NCATS annotation).

Discussion
Making the NCATS drug activities publicly available is a significant addition to the omics arena. CellMinerCDB: NCATS gathers the NCATS drug response database and integrates it with nine other genomic and proteomic projects (see Fig. 1). The NCATS collection of 2,675 drugs and compounds is second in number only to the large NCI/DTP activity screen (Figs. 1 and 2; ref. 2). Its high proportion of novel drugs, large number of nononcology drugs, and inclusion of many novel cell lines, including those from rare tumors, add significantly to the omics cancer cell line field.
Our curation of both the cell line and drug names enables integration with our previous CellMiner databases (2, 3, 9). It also resolves differences in naming, making data retrieval and comparisons available with an intuitive web application. This, combined with the molecular, metabolic, phenotypic, and signature data from NCI, CCLE, GDSC, and other databases, adds myriad informative molecular parameters for the purposes of exploration, discovery, prediction, and verification of either previously known or novel relationships.
We find that the activity of drugs with similar mechanisms of action is in general internally consistent within NCATS and across the other drug databases (CCLE, CTRP, GDSC, and NCI), as shown in Fig. 3.


[Figure 6 panels]
Left, RepStress (mda, CCLE) vs. SN-38 act. (NCATS IC50): R = 0.25, P = 0.0014. x-axis: SN-38 act. (NCATS IC50); y-axis: RepStress (mda, CCLE).
Right, EMT (mda, CCLE) vs. SN-38 act. (NCATS IC50): R = 0.06, P = 0.69. x-axis: SN-38 act. (NCATS IC50); y-axis: EMT (mda, CCLE).

Figure 6.
Genomic signature analysis identifies RepStress, but not EMT, as a predictor of SN-38 activity in the overlapping cell lines of NCATS and CCLE. Left and right, snapshots of CellMinerCDB: NCATS for RepStress and EMT, respectively.

Activity variability for overlapping drugs between institutes is recognized and presumably comes from a combination of the type of robotics and biological techniques employed (37). NCATS uses 1,536-well plates, with compounds added immediately after cell plating and 48-hour drug exposure. CTRP and GDSC use 384-well plates, with compounds added 24 hours after cell plating and 72-hour drug incubation. All three projects use CellTiter-Glo. It is unsurprising that drug activity assays done under different conditions might give different results. However, our analyses show that multiple drugs and compounds perform similarly regardless of differences in assay parameters. Thus, our recommendation for pharmacogenomics exploration with CellMinerCDB: NCATS is to first perform interdatabase analyses with drugs present in at least two platforms and prioritize drugs with consistent cytotoxicity response across databases.
CellMinerCDB: NCATS comprises two main analysis tools: “univariate analyses” and “multivariate analyses” (Fig. 1A). The pharmacogenomics analyses shown in Figs. 3–7, all generated within the CellMinerCDB: NCATS web application, provide examples of the many types of analysis possible. With 14.7 billion drug activity versus gene-molecular or phenotypic (CRISPR) measurements, one is practically limited only by the number of questions and the knowledge one has. This number does not include the many intergene molecular and interdrug activity comparisons one might do.
Figures 3A, 4A, 5A–E, and Supplementary Fig. S1 provide pharmacogenomic and proteomic explorations for SN-38, as prior work has causally related SLFN11 expression to the activity of TOP1 inhibitors (6, 38–40). The additional transcript examples in Fig. 4B and the DNA copy-number examples in Fig. 4C and D link various NCATS drugs to their molecular targets. The ability to perform gene knockdown (CRISPR) comparisons reflects how a gene knockdown measured in Project Achilles relates to response to drugs measured in NCATS. None of the 33 drug-target examples listed are FDA-approved biomarkers for their respective drugs, so each of them provides possible incentive for their development and use. One might easily expand this type of analysis to nontarget but biologically relevant genes based on domain knowledge.
When using “univariate analyses,” we find the transcript data are stronger predictors of pharmacologic response than the other genomic data (gene mutations, copy-number variation, or methylation) available in the cancer cell lines (Supplementary Table S4). Currently, DNA mutation is a predominant biomarker used for drug prediction. Although we see the expected predictive value of BRAF mutations with the activity of vemurafenib and dabrafenib (Supplementary Fig. S2), mutations only predict the activity of a relatively small subset of drugs routinely used in oncology. In addition to having reliable gene coverage and being implemented clinically, RNA-seq data are advantageous for the construction of multigene signatures. The superiority of transcript data observed in cell lines for the prediction of pharmacologic response is likely to translate clinically over time, leading to transcript data gaining dominance for that purpose.
Because pharmacologic response is a product of multiple molecular factors, drug activity prediction or exploration is expected to be improved and tested using the “multivariate analyses” tools of CellMinerCDB: NCATS. Figure 5 provides examples of how building multigene analyses can be explored. This approach requires an understanding of the pathways and targets that determine drug response. Taking the example of SN-38 (the active metabolite of irinotecan) and topotecan (36), Fig. 5 shows how “multivariate analyses” can be generated. CellMinerCDB also provides preexisting gene signatures. Fig. 6 uses a precomputed multigene signature, the 18-transcript RepStress signature (29). An increased level of this stress parameter is significantly correlated with topotecan and SN-38 response, providing proof-of-principle and a testable preclinical model for RepStress as predictive for patient response to TOP1 inhibitors. Having precomputed signatures avoids looking up the reference, finding the genes involved, determining the algorithm, and then applying it to the cell line set of interest.
Downloading the data of CellMinerCDB: NCATS reveals drug activity distribution enrichments for some tissues of origin within the cancer cell line panels. All the cancer types enriched indicate prospective novel applications for those drugs, presumably with responsive subsets. Nononcology drugs might also be studied. An example from Fig. 7E is disulfiram, a drug used to discourage alcohol intake. Response to this drug is bimodal across the NCATS cancer cell lines, with improved activity in bone (sarcoma) cell lines. This result expands our prior work on the discovery of acetalax, another noncancer drug, with activity in triple-negative breast cancer cell lines (3).


[Figure 7 panels]
A, Filanesib (KIF11 inh.). Density plot; x-axis: drug activities; y-axis: density. Enrichment label: Brain (6/7).
B, Multivariate analyses/Observed vs. predicted. n = 54, Pearson cor. (r) = 0.68, P value = 1.6e-8. Response: filanesib (act: Drug activity, NCATS IC50). Predictors: KIF11, MYBBP1A, TNFRSF10D (xsq: RNA-seq expression, CCLE). Tissue/s of origin: All. Algorithm: linear regression. x-axis: predicted filanesib (act., NCATS IC50); y-axis: observed filanesib activity.
C, Epothilone A (TUB stabilizer). Density plot; x-axis: drug activities; y-axis: density. Enrichment labels: Blood (19/21*), Brain (7/7).
D, Multivariate analyses/Observed vs. predicted. n = 42, Pearson cor. (r) = 0.77, P value = 2.1e-9. Response: epothilone A (act: Drug activity, NCATS IC50). Predictors: TUBB6, ABCG1, GSK3B, MLH1 (exp: mRNA expression (log2), CCLE). Tissue/s of origin: All. Algorithm: linear regression. x-axis: predicted epothilone A (act., NCATS IC50); y-axis: observed epothilone A activity.
E, Density plots for disulfiram (ALDH1A2 inh.), mithramycin (DNA/RNA polymerase inh.), sapanisertib (MTOR complex inh.), and demecolcine (TUBB-poly. inh.); x-axis: drug activities; y-axis: density. Enrichment labels (all four plots combined): Bone (5/7*), Lymph (25/35*), Blood (30/34*), Bladder (6/6*), Bowel (6/6*).

Figure 7.
Drug distributions, tissue of origin enrichments, and molecular predictors of drug activity. A, A density plot of filanesib activity (IC50 z-scores from NCATS; x-axis) versus distribution of the cell lines plotted as density (y-axis). B, Multivariate analysis for filanesib activity as the response variable and CCLE transcript expression of three genes as predictor variables. C, Density plot of epothilone A activity (x-axis) versus density (y-axis). For the brain enrichment, P = 0.082. D, Multivariate analysis for epothilone A activity as the response variable and CCLE transcript expression of four genes as predictor variables. E, Density plots for four NCATS drugs showing drug activity IC50 z-scores (x-axis) versus distribution of the cell lines plotted as density (y-axis). For the density plots in A, C, and E, drug activities are z-scores calculated across cell lines for IC50s (x-axis). Enriched tissues of origin are included (if present) with both the number of cell lines present within the peak (first number) and the total number of cell lines of that type (second number). The asterisks indicate significance (P < 0.05). All other P values are less than 0.07. In the scatter plots B and D, the predicted drug activity is on the x-axis and the observed drug activity is on the y-axis. All correlations presented are Pearson. Dots are cell lines with color coding by tissue of origin. The plots were created using the CellMinerCDB: NCATS\multivariate analyses\plot data tab selections, with the specific inputs used detailed in the input boxes to the left.
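
The density plots rest on two simple steps: standardizing IC50-based activities to z-scores across cell lines, and asking whether one tissue of origin is over-represented within an activity peak. The Python sketch below illustrates both on toy data. The legend does not name the enrichment test, so the one-sided hypergeometric test used here is an assumption, as are the z-score cutoff that defines the peak, the function names, and all data values.

```python
from math import comb
from statistics import mean, stdev


def z_scores(values):
    """Standardize activities across cell lines (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def enrichment_p(in_peak_and_tissue, tissue_total, peak_total, n_total):
    """One-sided hypergeometric P(X >= observed) for a tissue within a peak.

    This test is an assumption; the source does not state which test was used.
    """
    upper = min(tissue_total, peak_total)
    tail = sum(
        comb(tissue_total, k) * comb(n_total - tissue_total, peak_total - k)
        for k in range(in_peak_and_tissue, upper + 1)
    )
    return tail / comb(n_total, peak_total)


# Toy panel of (tissue, -log10 IC50) pairs; all values are made up.
panel = [("bone", 2.1), ("bone", 1.9), ("bone", 2.3), ("bone", 0.2),
         ("lung", 0.1), ("lung", 0.3), ("lung", 0.0), ("lung", 2.0),
         ("skin", 0.2), ("skin", -0.1), ("skin", 0.4), ("skin", 0.1)]
zs = z_scores([activity for _, activity in panel])

PEAK_CUTOFF = 0.5  # hypothetical z-score boundary for the sensitive peak
in_peak = [tissue for (tissue, _), z in zip(panel, zs) if z > PEAK_CUTOFF]
bone_in_peak = in_peak.count("bone")
bone_total = sum(1 for tissue, _ in panel if tissue == "bone")
p = enrichment_p(bone_in_peak, bone_total, len(in_peak), len(panel))
print(f"bone: {bone_in_peak}/{bone_total} in peak, P = {p:.3f}")
```

The printed pair of counts mirrors the legend's notation, in which the first number is the cell lines of a tissue within the peak and the second is the total number of cell lines of that tissue.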


In summary, the wealth of information in the CellMinerCDB: NCATS web application, albeit with its own limitations, allows basic and clinical researchers to explore pharmacogenomic relationships in either univariate or multivariate fashion. One may consider drug response in the context of multiple forms or combinations of outputs that easily run into the billions. The web application facilitates the user’s ability to explore those relationships and explore potential pharmacogenomic parameters applicable to clinical studies.
Limitations of the data come in multiple forms requiring multiple solutions. Missing data might be addressed by simply carrying out the salient form of analysis to fill those gaps. More complete analysis of variability between platforms might be done by adding overlapping cell lines, drugs, or assays of interest. Algorithmic approaches that better consider the limitations and proper interpretation of datasets can improve results at that level, including the expansion of multivariate analysis functionality and approach selection. Recognition of signatures predictive of pharmacologic response should yield improved success in that area. It should be noted that the relationships found do not constitute proof of causality. The continued exploration and definition of how best to integrate cancer cell line omics data with that from patients and to integrate clinical data into the omics format remain fields in their infancy.

Authors’ Disclosures
K.R. Bradwell reports other support from Palantir Technologies during the conduct of the study and other support from Palantir Technologies outside the submitted work. No disclosures were reported by the other authors.

Authors’ Contributions
W.C. Reinhold: Conceptualization, formal analysis, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing. K. Wilson: Resources, data curation, software, formal analysis, investigation, visualization, project administration, writing–review and editing. F. Elloumi: Resources, data curation, software, formal analysis, visualization, writing–review and editing. K.R. Bradwell: Resources, data curation, software, writing–review and editing. M. Ceribelli: Resources, data curation, and software. S. Varma: Resources, data curation, formal analysis, and investigation. Y. Wang: Resources, data curation, and software. D. Duveau: Resources, data curation, and software. N. Menon: Resources and data curation. J. Trepel: Conceptualization, resources, data curation, software, investigation, writing–review and editing. X. Zhang: Conceptualization, resources, data curation, software, investigation, writing–review and editing. C. Klumpp-Thomas: Resources. S. Michael: Resources. P. Shinn: Resources, data curation, and software. A. Luna: Data curation, software, formal analysis, writing–review and editing. C. Thomas: Conceptualization, resources, data curation, software, writing–review and editing. Y. Pommier: Conceptualization, resources, supervision, investigation, visualization, writing–review and editing.

Acknowledgments
Our studies are supported by the Center for Cancer Research, the Intramural Program of the NCI, NIH, Bethesda, MD (Z01 BC 006150).
The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Note
Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Received November 4, 2022; revised February 27, 2023; accepted April 25, 2023; published first May 4, 2023.

References
1. Weinstein JN, Myers TG, O’Connor PM, Friend SH, Fornace AJ Jr, Kohn KW, et al. An information-intensive approach to the molecular pharmacology of cancer. Science 1997;275:343–9.
2. Luna A, Elloumi F, Varma S, Wang Y, Rajapakse VN, Aladjem MI, et al. CellMiner cross-database (CellMinerCDB) version 1.2: exploration of patient-derived cancer cell line pharmacogenomics. Nucleic Acids Res 2021;49:D1083–D93.
3. Rajapakse VN, Luna A, Yamade M, Loman L, Varma S, Sunshine M, et al. CellMinerCDB for integrative cross-database genomics and pharmacogenomics analyses of cancer cell lines. iScience 2018;10:247–64.
4. Reinhold WC, Sunshine M, Liu H, Varma S, Kohn KW, Morris J, et al. CellMiner: a web-based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the NCI-60 cell line set. Cancer Res 2012;13.
5. Reinhold WC, Sunshine M, Varma S, Doroshow JH, Pommier Y. Using cellminer 1.6 for systems pharmacology and genomic analysis of the NCI-60. Clin Cancer Res 2015;21:3841–52.
6. Reinhold WC, Thomas A, Pommier Y. DNA-targeted precision medicine; have we been caught sleeping? Trends Cancer 2017;3:2–6.
7. Reinhold WC, Varma S, Sousa F, Sunshine M, Abaan OD, Davis SR, et al. NCI-60 whole exome sequencing and pharmacological cellminer analyses. PLoS One 2014;9:e101670.
8. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, et al. A gene expression database for the molecular pharmacology of cancer. Nat Genet 2000;24:236–44.
9. Tlemsani C, Takahashi N, Pongor L, Rajapakse VN, Tyagi M, Wen X, et al. Whole-exome sequencing reveals germline-mutated small cell lung cancer subtype with favorable response to DNA repair-targeted therapies. Sci Transl Med 2021;13:eabc7488.
10. Pongor LS, Tlemsani C, Elloumi F, Arakawa Y, Jo U, Gross JM, et al. Integrative epigenomic analyses of small cell lung cancer cells demonstrates the clinical translational relevance of gene body methylation. iScience 2022;25:105338.
11. Allison M. NCATS launches drug repurposing program. Nat Biotechnol 2012;30:571–2.
12. Huang R, Zhu H, Shinn P, Ngan D, Ye L, Thakur A, et al. The NCATS pharmaceutical collection: a 10-year update. Drug Discov Today 2019;24:2341–9.
13. Mathews Griner LA, Guha R, Shinn P, Young RM, Keller JM, Liu D, et al. High-throughput combinatorial screening identifies drugs that cooperate with ibrutinib to kill activated B-cell-like diffuse large B-cell lymphoma cells. Proc Natl Acad Sci U S A 2014;111:2349–54.
14. Heske CM, Davis MI, Baumgart JT, Wilson K, Gormally MV, Chen L, et al. Matrix screen identifies synergistic combination of PARP inhibitors and nicotinamide phosphoribosyltransferase (NAMPT) inhibitors in ewing sarcoma. Clin Cancer Res 2017;23:7301–11.
15. Ju W, Zhang M, Wilson KM, Petrus MN, Bamford RN, Zhang X, et al. Augmented efficacy of brentuximab vedotin combined with ruxolitinib and/or Navitoclax in a murine model of human Hodgkin’s lymphoma. Proc Natl Acad Sci U S A 2016;113:1624–9.
16. Lin GL, Wilson KM, Ceribelli M, Stanton BZ, Woo PJ, Kreimer S, et al. Therapeutic strategies for diffuse midline glioma from high-throughput combination drug screening. Sci Transl Med 2019;11:eaaw0064.
17. Wilson KM, Mathews-Griner LA, Williamson T, Guha R, Chen L, Shinn P, et al. Mutation profiles in glioblastoma 3D oncospheres modulate drug efficacy. SLAS Technol 2019;24:28–40.
18. Holbeck SL, Camalier R, Crowell JA, Govindharajulu JP, Hollingshead M, Anderson LW, et al. The National Cancer Institute ALMANAC: a comprehensive screening resource for the detection of anticancer drug pairs with enhanced therapeutic activity. Cancer Res 2017;77:3564–76.
19. Varma S, Pommier Y, Sunshine M, Weinstein JN, Reinhold WC. High resolution copy number variation data in the NCI-60 cancer cell lines from whole genome microarrays accessible through CellMiner. PLoS One 2014;9:e92047.


20. Reinhold WC, Varma S, Sunshine M, Rajapakse V, Luna A, Kohn KW, et al. The NCI-60 methylome and its integration into cellminer. Cancer Res 2017;77:601–12.
21. Reinhold WC, Varma S, Sunshine M, Elloumi F, Ofori-Atta K, Lee S, et al. RNA sequencing of the NCI-60: integration into cellminer and cellminer CDB. Cancer Res 2019;79:3514–24.
22. Liu H, D’Andrade P, Fulmer-Smentek S, Lorenzi P, Kohn KW, Weinstein JN, et al. mRNA and microRNA expression profiles integrated with drug sensitivities of the NCI-60 human cancer cell lines. MCT 2010;9:1080–91.
23. Nishizuka S, Chen ST, Gwadry FG, Alexander J, Major SM, Scherf U, et al. Diagnostic markers that distinguish colon and ovarian adenocarcinomas: identification by genomic, proteomic, and tissue array profiling. Cancer Res 2003;63:5243–50.
24. Guo T, Luna A, Rajapakse VN, Koh CC, Wu Z, Liu W, et al. Quantitative proteome landscape of the NCI-60 cancer cell lines. iScience 2019;21:664–80.
25. Gopi LK, Kidder BL. Integrative pan cancer analysis reveals epigenomic variation in cancer type and cell specific chromatin domains. Nat Commun 2021;12:1419.
30. Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Felix E, et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 2019;47:D930–D40.
31. Siramshetty VB, Grishagin I, Nguyen Eth T, Peryea T, Skovpen Y, Stroganov O, et al. NCATS inxight drugs: a comprehensive and curated portal for translational research. Nucleic Acids Res 2022;50:D1307–D16.
32. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 2018;46:D1074–D82.
33. Bairoch A. The cellosaurus, a cell-line knowledge resource. J Biomol Tech 2018;29:25–38.
34. Zhao C, Jiang T, Ju JH, Zhang S, Tao J, Fu Y, et al. TruSight oncology 500: enabling comprehensive genomic profiling and biomarker reporting with targeted sequencing. Biorxiv 2020.
35. Luna A, Rajapakse VN, Sousa FG, Gao J, Schultz N, Varma S, et al. rcellminer: exploring molecular profiles and drug response of the NCI-60 cell lines in R. Bioinformatics 2016;32:1272–4.
36. Thomas A, Pommier Y. Targeting topoisomerase i in the era of precision medicine. Clin Cancer Res 2019;25:6581–9.
26. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. 37. Niepel M, Hafner M, Mills CE, Subramanian K, Williams EH, Chung M, et al. A

Downloaded from http://aacrjournals.org/cancerres/article-pdf/83/12/1941/3339402/1941.pdf by guest on 09 January 2025


The cancer cell line encyclopedia enables predictive modelling of anticancer drug multi-center study on the reproducibility of drug-response assays in mammalian
sensitivity. Nature 2012;483:603–7. cell lines. Cell Syst 2019;9:35–48.
27. Ghandi M, Huang FW, Jane-Valbuena J, Kryukov GV, Lo CC, McDonald ER III, 38. Zoppoli G, Regairaz M, Leo E, Reinhold WC, Varma S, Ballestrero A, et al.
et al. Next-generation characterization of the cancer cell line encyclopedia. Putative DNA/RNA helicase Schlafen-11 (SLFN11) sensitizes cancer cells to
Nature 2019;569:503–8. DNA-damaging agents. Proc Natl Acad Sci U S A 2012;109:15030–5.
28. Heimerdinger P, Rosin A, Danzer MA, Gerdes T. A novel method for humidity- 39. Rees MG, Seashore-Ludlow B, Cheah JH, Adams DJ, Price EV, Gill S, et al.
dependent through-plane impedance measurement for proton conducting Correlating chemical sensitivity and basal gene expression reveals mechanism of
polymer membranes. Membranes 2019;9:62. action. Nat Chem Biol 2016;12:109–16.
29. Thomas A, Takahashi N, Rajapakse VN, Zhang X, Sun Y, Ceribelli M, et al. 40. Jo U, Murai Y, Takebe N, Thomas A, Pommier Y. Precision oncology with drugs
Therapeutic targeting of ATR yields durable regressions in small cell lung cancers targeting the replication stress, ATR, and Schlafen 11. Cancers (Basel) 2021;13:
with high replication stress. Cancer Cell 2021;39:566–79. 4601.

