CellMinerCDB: NCATS Is A Web-Based Portal Integrating Public Cancer Cell Line Databases For Pharmacogenomic Explorations
ABSTRACT
Major advances have been made in the field of precision medicine for treating cancer. However, many open questions remain that [...]
[...] methylation, metabolites, CRISPR, and miscellaneous signatures. Curation of cell lines and drug names enables cross-database (CDB) [...]
1Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, Maryland. 2National Center for Advancing Translational Sciences, NIH Bethesda, Maryland. 3Palantir Technologies, Denver, Colorado. 4HiThru Analytics LLC, Princeton, New Jersey. 5ICF International Inc., Fairfax, Virginia. 6cBio Center, Dana-Farber Cancer Institute and Department of Cell Biology, Harvard Medical School, Boston, Massachusetts.
†Deceased.
W.C. Reinhold and K. Wilson contributed equally to this article.
Corresponding Author: William C. Reinhold, NIH, 9000 Rockville Pike, Building 37, Room 5041, Bethesda, MD 20892. Phone: 240-760-7339; E-mail: [email protected]
Cancer Res 2023;83:1941–52
doi: 10.1158/0008-5472.CAN-22-2996
2023 American Association for Cancer Research

Introduction
The approach to molecular biology and pharmacology, commonly referred to as precision medicine, has been significantly changed over approximately the last 25 years by the introduction of omics data and the conceptual shift to the use of computer analyses of large datasets with a combination of statistics, machine learning, omics visualizations, and integration of multiple disparate forms of data. Starting with the pioneering work of the Developmental Therapeutics Program (DTP) at the NCI (1), many projects have been and are contributing sizable blocks of data, prominently including (but not limited to) the large (1,000 cell line) panels of the Cancer Cell Line Encyclopedia (CCLE) from the Broad/Novartis, the Genomics of Drug Sensitivity in Cancer (GDSC) from Sanger and Massachusetts General Hospital and the Cancer Therapeutics Response Portal (CTRP) from the Broad Institute.
The Genomics and Pharmacology Facility (GPF) has pioneered omics data acquisition and integration since the mid 1990s (1–9). Its efforts have led to the CellMiner and CellMinerCDB web application (2–7, 9, 10) allowing pharmacogenomic database access and integrative analyses across all public cancer cell line genomics and drug response databases (2).
NCATS has established an automated compound screening platform for large compound libraries using quantitative high-throughput (qHTS) format across multiple different disease models since 2008 (11–13). For cancer cell line viability screening, NCATS created the Mechanism Interrogation PlatEs (MIPE) compound library comprising approved and investigational chemotherapeutic agents, as well as common medications for noncancer indications. An additional design feature of the MIPE library is compound mechanistic redundancy allowing analyses across multiple compounds reported to hit the same target. Compound screening data using the MIPE library has demonstrated value for multiple cancer types, such as diffuse intrinsic pontine glioma (DIPG), Hodgkin lymphoma, Ewing sarcoma, small-cell lung cancer (SCLC), glioblastoma, and others (9, 14–17). Published and unpublished MIPE library compound screening data have been aggregated into a unified dataset called the NCATS–NCI Cytotoxicity Dataset shared internally with the NCI through the Palantir Foundry platform. A subset of this unified dataset is now being made public through CellMinerCDB.
Here we introduce the public databases and web portal of CellMinerCDB: NCATS (https://discover.nci.nih.gov/rsconnect/cellminercdb_ncats/). CellMinerCDB: NCATS enables individual users to access and explore the large NCATS drug response database, with an emphasis on pharmacology and its relationships to molecular genomics. CellMinerCDB: NCATS is integrated with 33 datasets from multiple projects from DTP, GPF, CCLE, GDSC, CTRP, the NCI DTP SCLC, NCI60-DTP Almanac, MD Anderson, and the Project Achilles from the Cancer Dependency Map Portal (DepMap; see Supplementary Materials and Methods for a full listing; refs. 4, 5, 7, 18–28). The omics analyses include single and two-drug activities, DNA copy number, methylation, and sequencing, whole genome transcriptome, mRNA and selected protein expression, metabolite levels, and clustered regularly interspaced short palindromic repeats (CRISPR) knockouts, allowing explorations of the relationships between those data and pharmacologic responses. Functionalities of the new CellMinerCDB: NCATS web application are introduced and discussed here with multiple examples validating the database. Details about general functionalities of the CellMinerCDB (https://discover.nci.nih.gov/rsconnect/cellminercdb/) platforms have been reviewed recently (2) and a 10-minute tutorial is on YouTube (Fig. 1A).

[...] generated using GraphPad PRISM version 7.0. Violin plots were generated using ggplot version 3.3.5.
Bimodal drug activity density distributions were identified using a combination of a Gaussian Mixed Model-based (norm1mix package; version 1.3), a kurtosis test and visual inspection. Both these calculations and the density plots were done using The R Project for Statistical Computing.
Prediction of NCATS IC50 activity using CCLE microarray transcript expression by both univariate and multivariate analysis used the Pearson correlation between drug response and gene expression of the target. The multivariate models use stepwise forward regression. Each model was initiated with a target for a given drug; multiple targets generated multiple models. Possible regression features included genes from Onco500 (34). A maximum of 10 features were added to each model and then pruned. For each iteration step, the feature with the lowest partial correlation P value after removing the effects of already included features was added using rcellminer [...]
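The forward stepwise selection described in the Methods (start from the drug target, then repeatedly add the feature with the strongest partial correlation after removing the effects of features already in the model) can be sketched as follows. This is a minimal illustration in Python, not the rcellminer implementation: the function names, the toy data, and the numerical tolerances are assumptions, and the pruning step is omitted. Candidates are ranked by the absolute partial correlation, which gives the same order as ranking by P value when every candidate is measured on the same cell lines.

```python
from math import sqrt

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualize(v, basis):
    # Remove from v its projection onto an already selected (residualized) feature.
    bb = dot(basis, basis)
    if bb == 0:
        return list(v)
    coef = dot(v, basis) / bb
    return [x - coef * b for x, b in zip(v, basis)]

def corr(a, b):
    denom = sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom > 1e-12 else 0.0

def forward_stepwise(response, features, seed, max_features=10, min_abs_r=0.0):
    """Return feature names in the order they enter the model (hypothetical helper)."""
    y = center(response)
    total = dot(y, y)
    resid = {name: center(vals) for name, vals in features.items()}
    selected = []
    current = seed  # each model is initiated with the drug target
    while current is not None and len(selected) < max_features:
        selected.append(current)
        basis = resid.pop(current)
        y = residualize(y, basis)
        resid = {n: residualize(v, basis) for n, v in resid.items()}
        if dot(y, y) <= 1e-12 * total:
            break  # nothing left to explain
        current, best = None, min_abs_r
        for name, vals in resid.items():
            r = abs(corr(y, vals))  # partial correlation given the selected features
            if r > best:
                current, best = name, r
    return selected

# Toy data (invented for illustration): response = target + 3 * g1.
toy_features = {
    "target": [1, 2, 3, 4, 5, 6, 7, 8],
    "g1": [1, -1, 1, -1, 1, -1, 1, -1],
    "g2": [0, 0, 1, 0, 0, 1, 0, 0],
}
toy_response = [4, -1, 6, 1, 8, 3, 10, 5]
order = forward_stepwise(toy_response, toy_features, seed="target")
```

Because each selected feature is residualized against the earlier ones before it is used as a basis, the residuals of the response and of every remaining candidate stay orthogonal to all selected features, which is what makes their correlation a partial correlation.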
[Figure 1 panel labels. A, URL: https://discover.nci.nih.gov/rsconnect/cellminercdb_ncats/; tabs: Univariate analyses, Multivariate analyses, Metadata, Search IDs, Help, Video tutorial. B, MIPE library versions: MIPE v4.0 (2013−2015; 1,912 compounds), MIPE v4.1 (2016-2017; 1,978 compounds), MIPE v5.0 (2018-present; 2,498 compounds). C, Pie chart labels: Approved (36%), in clinical trials, not in clinical trials (34%); present in other collections versus unique to NCATS. Box: anti-infectives (dolutegravir, piperaquine...), antifungals, antimalarials, antivirals, antibacterials, metabolic modulators.]
Figure 1.
The CellMinerCDB: NCATS web application, NCATS dataset, drugs, and cell lines. A, URL, banner, and tabs for the CellMinerCDB: NCATS web application. B, Schematic of the creation of the NCATS–NCI cytotoxicity dataset. Multiple versions of the MIPE library were combined into a single dataset to make the "NCATS–NCI cytotoxicity dataset." This dataset was trimmed down to remove cell lines with introduced genetic modifications, pretreatment conditions, nonstandard media additives, and data not meeting the sharing embargo date of 18 months. C, Left, pie chart showing the clinical status of the 2,675 CellMinerCDB: NCATS compounds: 36% are FDA-approved, 30% have entered clinical trials, and 34% are experimental. Right, pie chart showing the compounds overlapping between CellMinerCDB: NCATS and all other datasets included in CellMinerCDB 1.4. Thirty percent (837) of NCATS compounds overlap with at least one of the other CellMinerCDB datasets and 70% (1,860) do not. Of those compounds found only in the NCATS datasets, there are multiple noncancer drug types included (see box). D, Pie chart showing the cell line overlaps between CellMinerCDB: NCATS and all other datasets included in CellMinerCDB 1.4.
The NCATS input data
CellMinerCDB: NCATS comprises 2,675 drugs and compounds tested in 183 cell lines, of which 2,667 have mechanism of action designations. The dataset was created as described in Materials and Methods and Fig. 1B. The output is fully compatible and integrated with CellMinerCDB (2). An asset of CellMinerCDB: NCATS is the unique compounds and cancer cell lines included (Fig. 1C and D).
NCATS contains two drug sensitivity metrics, Z-AUC and IC50 values. These boast a large range of screening concentrations, routinely using 11 concentrations between 0.79 nanomolar and 47 micromolar, which is an asset of NCATS drug testing (12). The drugs include 952 (36%) clinically approved, 790 (30%) that have entered clinical trials, and 908 drugs (34%) that are preclinical (Fig. 1C, left). Notably, 1,877 (70%) drugs and compounds are unique to NCATS (Fig. 1C, right). They have been annotated with their commonly accepted mechanisms of action. A feature of the NCATS dataset is the inclusion of 518 approved nononcology drugs not found in the other public databases (Supplementary Table S1). Those include 103 antiinfectives (antibacterial, mycobacterial, viral, or fungal) for systemic use, 86 cardiovascular or nervous system drugs, and 72 alimentary tract and metabolism compounds.
The distribution of the 183 NCATS cell lines by tissue of origin is detailed in Supplementary Table S2. They include 72 (38%) unique cancer cell lines absent in other public cancer cell line databases (Fig. 1D; Supplementary Table S3). Figure 1D shows several of the rare disease subtypes including DIPG, renal Birt-Hogg-Dube syndrome, hereditary leiomyomatosis, and TFE3 fusion cancer cell lines. Thus, CellMinerCDB: NCATS provides the user with substantial new drug and cell line data.

Cell line and drug overlaps of NCATS with other cancer cell line datasets
The cell line overlaps for CellMinerCDB: NCATS as well as all other cell line sets are listed in Fig. 2A. As in our other CellMinerCDB websites (https://discover.nci.nih.gov/), cell lines are matched with common tissue of origin terms based on the OncoTree ontology levels developed by the Memorial Sloan Kettering Cancer Center (New York, NY) and Dana-Farber Cancer Institute (Boston, MA), primarily version 1.1 as described previously (2). Additional information, such as the gender or age of the patient from which the cell line originated, is also included. Comparison between drug responses in cell lines is made possible by the overlap of cell lines across databases (Fig. 2A). [...] allows their comparison for identical, related by mechanism of action, or disparate drugs.

Omics data available for cross-comparisons in CellMinerCDB: NCATS
Figure 2C summarizes by cell line set and measurement type the profiles available in CellMinerCDB: NCATS, including 31,617 drug (and compound) activities, 261,848 molecular measurements and 18,119 miscellaneous signatures. All 28 included datasets are available for download from the Metadata tab (Fig. 1A). Our curation and standardization of these datasets minimizes the task of name matching. The data types available for exploration based on the databases with overlapping cell lines include single-drug activities, two-drug combination activities, gene copy number, methylation and mutation levels, transcript expression, protein expression, metabolite levels, the DepMap Achilles (Achilles) CRISPR genetic dependencies, and miscellaneous molecular signatures. Those miscellaneous phenotypic signatures [...]
A, Cell line overlap between NCATS and other cell line sets. Overlaps are shown once, in the upper triangle.
Cell line sets    NCATS  CCLE   CTRP  GDSC   MD Anderson  Achilles  PRISM  NCI SCLC  DTP  Almanac
NCATS             183    102    90    81     59           56        30     12        8    8
CCLE              -      1,089  823   687    389          580       480    42        52   52
CTRP              -      -      823   595    327          497       441    33        40   40
GDSC              -      -      -     1,080  364          424       360    55        55   55
MD Anderson       -      -      -     -      651          245       198    11        55   55
Project Achilles  -      -      -     -      -            769       343    17        31   31
PRISM             -      -      -     -      -            -         480    9         44   44
NCI SCLC          -      -      -     -      -            -         -      77        1    1
DTP               -      -      -     -      -            -         -      -         60   60
NCI Almanac       -      -      -     -      -            -         -      -         -    60

B, Drug overlap between NCATS and other cell line sets. Overlaps are shown once, in the upper triangle.
Cell line sets  NCATS  PRISM  DTP     NCI SCLC  GDSC  CTRP  Almanac  CCLE
NCATS           2,675  795    666     400       198   165   94       22
PRISM           -      1,413  380     286       134   134   77       22
DTP             -      -      24,360  327       143   128   89       21
NCI SCLC        -      -      -       526       115   128   100      21
GDSC            -      -      -       -         297   77    46       16
CTRP            -      -      -       -         -     481   46       14
Almanac         -      -      -       -         -     -     104      9
CCLE            -      -      -       -         -     -     -        24
Figure 2.
Cell line and drug overlap, and data types in CellMinerCDB-NCATS. A, Cell lines overlap between NCATS and the nine other cell line datasets. Project Achilles is from
the DepMap; PRISM from Broad-MIT; NCI Almanac is the NCI60-DTP Almanac. B, Drug overlap between NCATS and the seven other cell line datasets. Number of
drugs is as based on the comparison of NCATS AUC overlap and the seven other cell line sets. The MD Anderson and DepMap Achilles cell line datasets are not
included as they have no drug activities. The NCI Almanac has two-drug activities measurements. The drugs with data for inhibitory concentration 50% (IC50) are
slightly less in number. For acronym definitions see A. C, Available data in CellMinerCDB: NCATS. For the drug activities columns, the “single” numbers are compounds
or drugs. The “combo” drugs are two-drug combinations for 105 FDA-approved drugs. For the DNA, RNA, and CRISPR columns, the numbers are genes with
information for that cell line set. For the “protein” columns, the numbers are epitopes for the reverse phase protein arrays (RPPA) and protein fragments for the mass
spectrometry (MS). For the “metabolite” column, the numbers are metabolites. For the “signatures” column, the number is signatures of various types. CTRP DNA
copy number and mutation, microarray log2, and signatures data are identical to that in CCLE, and so are not included here.
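The overlap counts in Fig. 2A and B amount to set intersections computed after names have been standardized. The sketch below shows that idea on invented identifiers. The `normalize` rule (uppercase, strip non-alphanumeric characters) is an assumption made for illustration and is not the curation procedure used by CellMinerCDB, and the set names are hypothetical.

```python
import re

def normalize(name):
    # Assumed standardization rule: uppercase and drop punctuation/whitespace.
    return re.sub(r"[^A-Z0-9]", "", name.upper())

def overlap_counts(named_sets):
    """Count shared members for every pair of sets (upper triangle, diagonal included)."""
    cleaned = {key: {normalize(n) for n in names} for key, names in named_sets.items()}
    keys = list(cleaned)
    return {
        (a, b): len(cleaned[a] & cleaned[b])
        for i, a in enumerate(keys)
        for b in keys[i:]
    }

# Invented identifiers; real cell line or drug names would go here.
toy_sets = {
    "set_A": ["Line-1", "LINE 2", "line_3"],
    "set_B": ["line1", "Line-3", "Line-4"],
}
counts = overlap_counts(toy_sets)
```

The diagonal entries give the size of each set and the off-diagonal entries give the pairwise overlaps, matching the triangular layout of the figure.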
[...] depending on one's interest, easily jumps into the billions. The NCATS drug data can be compared to genomics data for the same cell lines in other datasets, allowing one to relate the drug responses to omics features using CellMinerCDB: NCATS. The following examples illustrate the basic use of CellMinerCDB: NCATS.

Drug comparisons
The overlaps between cell lines and drugs across the "cell line sets" facilitate multiple forms of drug comparisons. Figure 3A shows a univariate analyses/plot data output for two structurally related TOP1 inhibitors commonly used in clinical oncology (36), topotecan (x-axis) versus SN-38 (y-axis), the active metabolite of irinotecan. Both are measured by NCATS and displayed using CellMinerCDB-NCATS. The highly significant correlation between the two drugs (P = 9.1 × 10^-52) demonstrates internal assay consistency.
Similarly, Fig. 3B shows a univariate analyses/compare patterns comparing the NCATS anaplastic lymphoma kinase (ALK) inhibitor [...]

[...] significant correlations between drug activities and DNA copy numbers; all linked through having the same gene both as drug target and molecular measurement. All have significant correlations between gene DNA copy number and transcript levels.
Figure 4E and F exemplify the possibility of testing NCATS drug activity versus genetic inactivation of the drug target. Figure 4E compares the growth inhibitory activity of vemurafenib (a BRAF inhibitor) to cell survival with BRAF CRISPR knockdown (as measured by Project Achilles). The resultant scatter plot demonstrates a significant correlation between the two. Figure 4F lists other examples showing significant correlations between drug activities and CRISPR knockdown; in each case linked through having the same gene both as the drug and CRISPR target. As for the drugs in Fig. 3, Fig. 4 provides only a small sampling of the types of informative comparisons one might do.
To compare the predictive value of different genomics parameters, the NCATS approved and clinical trial drugs IC50 activities were each [...]
[Figure 3 panel labels. Scatter plot inputs: x-axis, AZD-7762 (act, NCATS); y-axis cell line set, CTRP; y-axis data type, act: Drug activity (AUC), −log10; identifier, AZD-7762; tissues of origin, All; cell lines colored by tissue of origin. Violin plot (Pearson correlation axis): NCATS vs. CTRP, 90 cell lines, 102/265 compounds; NCATS vs. GDSC, 80 cell lines, 71/212 compounds.]
Figure 3.
Comparisons of drugs in CellMinerCDB: NCATS. A, Scatter plot of the activities of topotecan (x-axis) versus SN38 (y-axis), both measured by NCATS. The plot is a
screenshot from CellMinerCDB-NCATS (Fig. 1A, univariate analyses). B, Comparison of the ALK inhibitor TAE-684 with the other ALK inhibitors tested by NCATS. The
results were generated using CellMinerCDB-NCATS (univariate analyses/compare patterns tab selections) including a filter to output only “ALK inhibitor” in the
mechanism of action (MOA) column and ordered by P value. C, Bar graph showing the top 15 compounds with the highest positive correlation for IC50 value
comparisons between NCATS and GDSC. Red bars highlight the compounds highly correlated between NCATS and CTRP (D): linifanib, sorafenib, AZD-7762, and
tivozanib. The primary target of each compound is shown in parenthesis. D, Bar graph showing the top 15 compounds with the highest positive correlation for IC50
values between NCATS and CTRP. Red bars highlight the compounds highly correlated between NCATS and GDSC (C). The primary target of each compound is
shown in parenthesis. E, A scatter plot of AZD-7762 activity as measured by NCATS (x-axis) versus CTRP (y-axis). The plot is a screenshot generated using the
univariate analyses/plot data tab selections. For the scatter plots A, B, and E, individual dots are cell lines with color coding by tissue of origin. F, Violin plot showing all
compounds with IC50 with positive correlations and with P < 0.05 either between NCATS and CTRP or NCATS and GDSC. All compounds shown had a minimum of 16
cell lines overlap between datasets. The box plot overlay shows a median correlation of 0.4. All correlations presented are Pearson.
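The cross-dataset comparisons in this figure correlate one compound's activity in two datasets over the cell lines they share, keeping only compounds with enough overlap (a minimum of 16 cell lines in F). A minimal sketch of that calculation follows. The function names and toy values are invented for illustration, and real activity profiles would come from the datasets downloadable from the Metadata tab.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant profile
    return sxy / sqrt(sxx * syy)

def cross_dataset_correlation(activity_a, activity_b, min_overlap=16):
    """Pearson r over shared cell lines; returns (None, n) when the overlap is too small."""
    shared = sorted(set(activity_a) & set(activity_b))
    if len(shared) < min_overlap:
        return None, len(shared)
    r = pearson([activity_a[c] for c in shared], [activity_b[c] for c in shared])
    return r, len(shared)

# Toy activity profiles keyed by (invented) cell line identifiers.
toy_a = {f"line_{i}": float(i) for i in range(20)}
toy_b = {f"line_{i}": 2.0 * i + 1.0 for i in range(4, 30)}
r, n_shared = cross_dataset_correlation(toy_a, toy_b)
```

Returning the overlap size alongside the correlation makes it easy to see why a compound was excluded.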
[Figure 4 panel labels. A, x-axis: SLFN11 (exp, GDSC). E, x-axis: BRAF (cri, Achilles); y-axis cell line set: NCATS CT IC50; y-axis data type: act: Drug activity, −log10; identifier: Vemurafenib; cell lines colored by tissue of origin.]
Tabulated examples from Figure 4 (panel label not recoverable):
Drug          Mechanism           Gene    Pearson r  P value
Venetoclax    BCL2 inhibitor      BCL2    -0.64      2.3e-5
sch-900776    CDK1 inhibitor      CDK1    -0.30      0.040
Panobinostat  HDAC inhibitor      HDAC8   -0.40      0.033
bms-754807    IGF1R inhibitor     IGF1R   -0.52      3.4e-4
As-703026     MEK 1/2 inhibitor   MAP2K1  -0.43      0.014
Milademetan   MDM2 inhibitor      MDM2    -0.89      2.4e-4
Buparlisib    PIK3CA inhibitor    PIK3CA  -0.36      0.008
Entecavir     POL inhibitor       POLE    -0.54      0.012
Idarubicin    TOP2A inhibitor     TOP2    -0.37      0.005
Figure 4.
NCATS: CDB univariate comparisons of drug activities to transcript, DNA copy number, and CRISPR signatures. A, Scatter plot of SLFN11 transcript expression from
GDSC (x-axis) versus SN-38 activity measured by NCATS (y-axis). The plot is a snapshot from CellMinerCDB-NCATS (univariate analyses). B, Additional examples of
significantly correlated and biologically linked NCATS IC50 drug activities versus GDSC transcript expression levels. All gene examples are targets for the
corresponding drugs. C, Scatter plot of MTOR DNA copy number as measured by CCLE (x-axis) versus -5584 activity as measured by NCATS (y-axis). The plot is a
screenshot from CellMinerCDB-NCATS (univariate analyses/plot data tab selections), with the specific inputs used detailed in the boxes to the left. The vertical line
was added at 0 intensity or 2N DNA copy number. The units for the x-axis were converted from intensity to ploidy (copy number = 2 × 2^intensity) for biological clarity. D,
Additional examples of significantly correlated and biologically linked NCATS IC50 drug activities versus CCLE DNA copy number from plots generated as in C. Genes
are targets of the corresponding drugs. E, Scatter plot of BRAF CRISPR knockdown cell survival from the Achilles Project (x-axis) versus vemurafenib activity as
measured by NCATS (y-axis). The plot is a screenshot from CellMinerCDB (univariate analyses/plot data tab), with the specific inputs used detailed in the input boxes
to the left. The vertical line was added at 0 to indicate that the cell lines to the left of line have decreased survival following knocking down BRAF. F, Additional
examples of significant correlations of drug activities versus CRISPR knockdown of the target genes. The CRISPR knockdown cell survival data are from the Achilles
Project. All correlations presented in the figure are Pearson. For all scatter plots, dots are cell lines with color coding by tissue of origin indicated to the right.
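The intensity-to-ploidy conversion used for the x-axis in C can be written out explicitly. The sketch below reads the caption's formula as copy number = 2 × 2^intensity, which is the reading consistent with an intensity of 0 marking 2N; the function names are illustrative.

```python
from math import log2

def intensity_to_copy_number(intensity):
    # Intensity is treated as a log2 ratio relative to the diploid (2N) state.
    return 2.0 * 2.0 ** intensity

def copy_number_to_intensity(copy_number):
    # Inverse conversion, defined for positive copy numbers only.
    return log2(copy_number / 2.0)
```

Under this reading an intensity of 1 corresponds to four copies and an intensity of -1 to a single copy.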
[Figure 5 panel labels. A, SLFN11 (RNA-seq, CCLE): R = 0.43, P = 1.6e-5. B, BPTF (RNA-seq, CCLE): R = 0.15, P = 0.15. E, Observed versus 10x cross-validation for SN-38 act. (NCATS IC50): r = 0.6, P 1.4e-10. Cell lines colored by tissue of origin.]
Figure 5.
Multivariate analysis of SN-38 activity in NCATS using the expression of SLFN11, BPTF, HMGN1, and BAX in the overlapping cell lines in CCLE is a better predictor of SN-
38 activity than any of the four genes taken individually. A, Predictive value of SLFN11 expression. B, Predictive value of BPTF (encoding a protein regulating
chromatin remodeling as a regulator of ATP hydrolysis of the NURF complex). C, Predictive value of HMGN1 (encoding HMGN1) associated with active transcription.
D, Cluster image map of the multivariate analysis of SN-38 activity predicted by the expression of four genes together. See Supplementary Fig. S1 for BAX univariate
data. E, Scatter plot of the observed versus 10-fold cross-validation for SN-38 using the same predictor genes as in D.
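The observed versus 10-fold cross-validated comparison in E can be illustrated with a small ordinary least-squares model: each cell line's activity is predicted by a model fitted without that cell line's fold, and the predictions are then compared with the observed values. This is a sketch under stated assumptions, not the web application's code. Folds are assigned deterministically by index rather than at random, the helper names are invented, and the data are synthetic.

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting for a small square system.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(X, y):
    # Least-squares fit with an intercept, via the normal equations.
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def cross_val_predict(X, y, k=10):
    """Out-of-fold predictions; sample i belongs to fold i % k (an assumed scheme)."""
    n = len(y)
    preds = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = predict(beta, X[i])
    return preds

# Synthetic, noise-free example: y = 1 + 2*x1 - x2.
toy_X = [[float(i), float((i * i) % 7)] for i in range(20)]
toy_y = [1.0 + 2.0 * a - b for a, b in toy_X]
toy_preds = cross_val_predict(toy_X, toy_y, k=10)
```

Correlating `toy_preds` with `toy_y` would give the cross-validated r reported in this kind of plot. With real expression data the predictions would not match the observations exactly as they do in this noise-free toy case.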
[Figure 6 panel labels. Left, RepStress (mda, CCLE) vs. SN-38 act. (NCATS IC50): R = 0.25, P = 0.0014. Right, EMT (mda, CCLE) vs. SN-38 act. (NCATS IC50): R = 0.06, P = 0.69.]
Figure 6.
Genomic signature analysis identifies RepStress but not EMT as predictor of SN-38 activity in the overlapping cell lines of NCATS and CCLE. Left and right, snapshots
of CellMinerCDB: NCATS for RepStress and EMT, respectively.
Activity variability for overlapping drugs between institutes is recognized and presumably comes from a combination of the type of robotics and biological techniques employed (37). NCATS uses 1,536-well plates, with compounds added immediately after cell plating and 48-hour drug exposure. CTRP and GDSC use 384-well plates, with compounds added 24 hours after cell plating and 72-hour drug incubation. All three projects use CellTiter-Glo. It is unsurprising that drug activity assays done under different conditions might give different results. However, our analyses show that multiple drugs and compounds perform similarly regardless of differences in assay parameters. Thus, our recommendation for pharmacogenomics exploration with CellMinerCDB: NCATS is to first perform interdatabase analyses with drugs present in at least two platforms and prioritize drugs with consistent cytotoxicity response across databases.
CellMinerCDB: NCATS comprises two main analysis tools: "univariate analyses" and "multivariate analyses" (Fig. 1A). The pharmacogenomics analyses shown in Figs. 3–7, all generated within the CellMinerCDB: NCATS web application, provide examples of the many types of analysis possible. With 14.7 billion drug activity versus gene-molecular or phenotypic (CRISPR) measurements, practically, one is limited only by the number of questions and knowledge one has. This number does not include the many intergene molecular and interdrug activity comparisons one might do.
Figures 3A, 4A, 5A–E, and Supplementary Fig. S1 provide pharmacogenomic and proteomic explorations for SN-38, as prior work has causally related SLFN11 expression to the activity of TOP1 inhibitors (6, 38–40). The additional transcript examples in Fig. 4B, and DNA copy-number examples in Fig. 4C and D link various NCATS drugs to their molecular targets. The ability to perform gene knockdown (CRISPR) comparisons reflects how a gene knockdown measured in Project Achilles relates to response to drugs measured in NCATS. None of the 33 drug-target examples listed are FDA-approved biomarkers for their respective drugs; so each of them provides possible incentive for their development and use. One might easily expand this type of analysis to nontarget, but biologically relevant genes based on domain knowledge.
When using "univariate analyses," we find the transcript data are stronger predictors of pharmacologic response than the other genomic data (gene mutations, copy-number variation, or methylation) available in the cancer cell lines (Supplementary Table S4). Currently DNA mutation is a predominant biomarker used for drug prediction. Although we see the expected predictive value of BRAF mutations with the activity of vemurafenib and dabrafenib (Supplementary Fig. S2), mutations only predict the activity of a relatively small subset of drugs routinely used in oncology. In addition to having reliable gene coverage and being implemented clinically, RNA-seq data are advantageous for the construction of multigene signatures. The superiority of transcript data seen in cell lines for the prediction of pharmacologic response is likely to translate clinically over time, leading to its gaining dominance for that purpose.
Because pharmacologic response is a product of multiple molecular factors, drug activity prediction or exploration is expected to be improved and tested using the "multivariate analyses" tools of CellMinerCDB: NCATS. Figure 5 provides examples of how building multigene analyses can be explored. This approach requires an understanding of the pathways and targets that determine drug response. Taking the example of SN-38 (the active metabolite of irinotecan) and topotecan (36), Fig. 5 shows how "multivariate analyses" can be generated. CellMinerCDB also provides preexisting gene signatures. Fig. 6 uses a precomputed multigene signature, the 18-transcript RepStress signature (29). Increased level of this stress parameter is significantly correlated with topotecan and SN-38 response, providing proof-of-principle and a testable preclinical model for RepStress as predictive for patient response to TOP1 inhibitors. Having precomputed signatures avoids looking up the reference, finding the genes involved, and determining and then applying the algorithm for the cell line set of interest.
Downloading the data of CellMinerCDB: NCATS reveals drug activity distribution enrichments for some tissues of origin within the cancer cell line panels. All the cancer types enriched indicate prospective novel applications for those drugs, presumably with responsive subsets. Nononcology drugs might also be studied. An example from Fig. 7E is disulfiram, a drug used to discourage alcohol intake. Response to this drug is bimodal across the NCATS cancer cell lines, with improved activity in bone (sarcoma) cell lines. This result expands our prior work on the discovery of acetalax, another noncancer drug, with activity in triple-negative breast cancer cell lines (3).
[Figure 7 panel labels. B, predictor cell line set: CCLE; predictor data type: xsq: RNA-seq expression; predictor identifiers: KIF11, MYBBP1A, TNFRSF10D; tissues of origin: All; algorithm: linear regression. D, predictor cell line set: CCLE; predictor data type: exp: mRNA expression (log2); predictor identifiers: TUBB6, ABCG1, GSK3B, MLH1; tissues of origin: All; algorithm: linear regression. Density plots: x-axis, drug activities; y-axis, density. Cell lines colored by tissue of origin.]
Figure 7.
Drug distributions, tissue of origin enrichments and molecular predictors of drug activity. A, A density plot of filanesib activity (IC50 z-scores from NCATS; x-axis)
versus distribution of the cell lines plotted as density (y-axis). B, Multivariate analysis for filanesib activity as the response variable and CCLE transcript expression of
three genes as predictor variables. C, Density plot of epothilone A activity (x-axis) versus density (y-axis). The brain enrichment, P = 0.082. D, Multivariate analysis for
epothilone A activity as the response variable and CCLE transcript expression of four genes as predictor variables. E, Density plots for four NCATS drugs showing drug
activity IC50 z-scores versus distribution of the cell lines plotted as density (y-axis). For the density plots in A, C, and E, drug activities are z-scores calculated across
cell lines for IC50s (x-axis). Enriched tissue of origins are included (if present) with both the number of cell lines present within the peak (first number) and total number
of cell lines of that type (second number). The asterisks indicate significant P < 0.05. All other P values are less than 0.07. In the scatter plots B and D, the predicted
drug activity is on the x-axis and the observed drug activity is on the y-axis. All correlations presented are Pearson. Dots are cell lines with color coding by tissue of
origin. The plots were created using the CellMinerCDB: NCATS\multivariate analyses\plot data tab selections, with the specific inputs used detailed in the input boxes
to the left.
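The density plots use drug activities expressed as z-scores across cell lines, and the Methods mention a kurtosis test as one of the screens for bimodal distributions. The sketch below shows both calculations on invented values. It does not reproduce the Gaussian mixture model step or the visual inspection. The use of the sample standard deviation for z-scores and of the moment-based excess kurtosis are assumptions.

```python
from statistics import mean, stdev

def zscores(values):
    # Standardize activities across cell lines (sample standard deviation assumed).
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def excess_kurtosis(values):
    # Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution).
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m4 = mean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0

# Invented activities: two well separated groups of cell lines.
toy_bimodal = [-1.0] * 10 + [1.0] * 10
toy_kurtosis = excess_kurtosis(toy_bimodal)
```

Strongly negative excess kurtosis, as in this two-group toy example, is the kind of signal that would flag a drug for closer inspection. On its own it does not establish bimodality, which is why it is combined with other checks.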
References
1. Weinstein JN, Myers TG, O’Connor PM, Friend SH, Fornace AJ Jr, Kohn KW, et al. An information-intensive approach to the molecular pharmacology of cancer. Science 1997;275:343–9.
2. Luna A, Elloumi F, Varma S, Wang Y, Rajapakse VN, Aladjem MI, et al. CellMiner cross-database (CellMinerCDB) version 1.2: exploration of patient-derived cancer cell line pharmacogenomics. Nucleic Acids Res 2021;49:D1083–D93.
3. Rajapakse VN, Luna A, Yamade M, Loman L, Varma S, Sunshine M, et al. CellMinerCDB for integrative cross-database genomics and pharmacogenomics analyses of cancer cell lines. iScience 2018;10:247–64.
4. Reinhold WC, Sunshine M, Liu H, Varma S, Kohn KW, Morris J, et al. CellMiner: a web-based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the NCI-60 cell line set. Cancer Res 2012;13.
5. Reinhold WC, Sunshine M, Varma S, Doroshow JH, Pommier Y. Using cellminer 1.6 for systems pharmacology and genomic analysis of the NCI-60. Clin Cancer Res 2015;21:3841–52.
6. Reinhold WC, Thomas A, Pommier Y. DNA-targeted precision medicine; have we been caught sleeping? Trends Cancer 2017;3:2–6.
7. Reinhold WC, Varma S, Sousa F, Sunshine M, Abaan OD, Davis SR, et al. NCI-60 whole exome sequencing and pharmacological cellminer analyses. PLoS One 2014;9:e101670.
8. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, et al. A gene expression database for the molecular pharmacology of cancer. Nat Genet 2000;24:236–44.
9. Tlemsani C, Takahashi N, Pongor L, Rajapakse VN, Tyagi M, Wen X, et al. Whole-exome sequencing reveals germline-mutated small cell lung cancer subtype with favorable response to DNA repair-targeted therapies. Sci Transl Med 2021;13:eabc7488.
10. Pongor LS, Tlemsani C, Elloumi F, Arakawa Y, Jo U, Gross JM, et al. Integrative epigenomic analyses of small cell lung cancer cells demonstrates the clinical translational relevance of gene body methylation. iScience 2022;25:105338.
11. Allison M. NCATS launches drug repurposing program. Nat Biotechnol 2012;30:571–2.
12. Huang R, Zhu H, Shinn P, Ngan D, Ye L, Thakur A, et al. The NCATS pharmaceutical collection: a 10-year update. Drug Discov Today 2019;24:2341–9.
13. Mathews Griner LA, Guha R, Shinn P, Young RM, Keller JM, Liu D, et al. High-throughput combinatorial screening identifies drugs that cooperate with ibrutinib to kill activated B-cell-like diffuse large B-cell lymphoma cells. Proc Natl Acad Sci U S A 2014;111:2349–54.
14. Heske CM, Davis MI, Baumgart JT, Wilson K, Gormally MV, Chen L, et al. Matrix screen identifies synergistic combination of PARP inhibitors and nicotinamide phosphoribosyltransferase (NAMPT) inhibitors in ewing sarcoma. Clin Cancer Res 2017;23:7301–11.
15. Ju W, Zhang M, Wilson KM, Petrus MN, Bamford RN, Zhang X, et al. Augmented efficacy of brentuximab vedotin combined with ruxolitinib and/or Navitoclax in a murine model of human Hodgkin’s lymphoma. Proc Natl Acad Sci U S A 2016;113:1624–9.
16. Lin GL, Wilson KM, Ceribelli M, Stanton BZ, Woo PJ, Kreimer S, et al. Therapeutic strategies for diffuse midline glioma from high-throughput combination drug screening. Sci Transl Med 2019;11:eaaw0064.
17. Wilson KM, Mathews-Griner LA, Williamson T, Guha R, Chen L, Shinn P, et al. Mutation profiles in glioblastoma 3D oncospheres modulate drug efficacy. SLAS Technol 2019;24:28–40.
18. Holbeck SL, Camalier R, Crowell JA, Govindharajulu JP, Hollingshead M, Anderson LW, et al. The National Cancer Institute ALMANAC: a comprehensive screening resource for the detection of anticancer drug pairs with enhanced therapeutic activity. Cancer Res 2017;77:3564–76.
19. Varma S, Pommier Y, Sunshine M, Weinstein JN, Reinhold WC. High resolution copy number variation data in the NCI-60 cancer cell lines from whole genome microarrays accessible through CellMiner. PLoS One 2014;9:e92047.
20. Reinhold WC, Varma S, Sunshine M, Rajapakse V, Luna A, Kohn KW, et al. The NCI-60 methylome and its integration into cellminer. Cancer Res 2017;77:601–12.
21. Reinhold WC, Varma S, Sunshine M, Elloumi F, Ofori-Atta K, Lee S, et al. RNA sequencing of the NCI-60: integration into cellminer and cellminer CDB. Cancer Res 2019;79:3514–24.
22. Liu H, D’Andrade P, Fulmer-Smentek S, Lorenzi P, Kohn KW, Weinstein JN, et al. mRNA and microRNA expression profiles integrated with drug sensitivities of the NCI-60 human cancer cell lines. MCT 2010;9:1080–91.
23. Nishizuka S, Chen ST, Gwadry FG, Alexander J, Major SM, Scherf U, et al. Diagnostic markers that distinguish colon and ovarian adenocarcinomas: identification by genomic, proteomic, and tissue array profiling. Cancer Res 2003;63:5243–50.
24. Guo T, Luna A, Rajapakse VN, Koh CC, Wu Z, Liu W, et al. Quantitative proteome landscape of the NCI-60 cancer cell lines. iScience 2019;21:664–80.
25. Gopi LK, Kidder BL. Integrative pan cancer analysis reveals epigenomic variation in cancer type and cell specific chromatin domains. Nat Commun 2021;12:1419.
26. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al.
30. Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Felix E, et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 2019;47:D930–D40.
31. Siramshetty VB, Grishagin I, Nguyen Eth T, Peryea T, Skovpen Y, Stroganov O, et al. NCATS inxight drugs: a comprehensive and curated portal for translational research. Nucleic Acids Res 2022;50:D1307–D16.
32. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 2018;46:D1074–D82.
33. Bairoch A. The cellosaurus, a cell-line knowledge resource. J Biomol Tech 2018;29:25–38.
34. Zhao C, Jiang T, Ju JH, Zhang S, Tao J, Fu Y, et al. TruSight oncology 500: enabling comprehensive genomic profiling and biomarker reporting with targeted sequencing. Biorxiv 2020.
35. Luna A, Rajapakse VN, Sousa FG, Gao J, Schultz N, Varma S, et al. rcellminer: exploring molecular profiles and drug response of the NCI-60 cell lines in R. Bioinformatics 2016;32:1272–4.
36. Thomas A, Pommier Y. Targeting topoisomerase i in the era of precision medicine. Clin Cancer Res 2019;25:6581–9.
37. Niepel M, Hafner M, Mills CE, Subramanian K, Williams EH, Chung M, et al. A