Molecular Characterization of Richter Syndrome Identifies de Novo Diffuse Large B-Cell Lymphomas With Poor Prognosis
1038/s41467-022-34642-6
A full list of affiliations appears at the end of the paper. *A list of authors and their affiliations appears at the end of the paper.

Chronic lymphocytic leukemia (CLL) is the most frequent leukemia in Western countries1. Although generally considered an indolent B cell disease, CLL is in fact associated with a highly heterogeneous clinical course. CLLs are classified into two major molecular subtypes that differ in their degree of somatic hypermutation in the immunoglobulin heavy chain variable (IGHV) domains. IGHV-unmutated CLLs (U-CLL) are associated with a worse prognosis than IGHV-mutated CLLs (M-CLL)2. CLL transformation into a more aggressive histology is
termed Richter syndrome (RS)3. The diffuse large B cell lymphoma (DLBCL) subtype accounts for 90–95% of RS cases. Around 80% of RS cases are IGHV-unmutated, while the remainder are IGHV-mutated4. In contrast, most de novo DLBCLs (hereafter called DLBCL) are IGHV-mutated, as they originate from germinal center (GC) or post-GC B cells. Based on gene expression patterns, the cell-of-origin (COO) classification of DLBCL distinguishes GC B cell-like (GCB) and activated B cell-like (ABC) DLBCL5. Recent genomic studies combining DNA and RNA sequencing extended DLBCL subtyping beyond COO6–10, identifying DLBCL subgroups defined by their genomic alteration patterns and associated clinical courses, but a notable proportion of cases remains unclassified7,8,10. Moreover, although studies have shown some association between genetically defined groups and transcriptionally defined COO signatures, current classifications do not make full use of the transcriptome as a whole.

Compared with other lymphoid malignancies, few in vitro or in vivo models are available to study RS11–15, and our current knowledge of RS biology therefore remains incomplete. The few genomic studies attempting to decipher the oncogenic mechanisms underlying RS described disabled DNA damage response and cell cycle control through TP53 abnormalities and CDKN2A deletions, chronic B cell receptor (BCR) signaling, and NOTCH, MYC, and MAPK pathway deregulation16–20. A recent report using multiome and single-cell approaches in sequential CLL-RS samples showed that the increased molecular complexity of RS does not seem to result from clonal evolution over time but rather from the selection of minute subclones present at CLL diagnosis, years before overt transformation21. Additionally, recent studies focusing on DNA methylation (DNAm) further captured the genomic complexity of CLL22–26, RS27, de novo DLBCL28–30, and other B cell neoplasms31–33. A better understanding of epigenetic signatures, whether related to B cell development or to tumor transformation mechanisms, is needed.

Distinguishing between CLL-derived RS and de novo DLBCL in a diagnostic setting based on histology and immunochemistry alone is challenging. Around 80% of RS cases are clonally related to the CLL disease stage, while the remainder are unrelated (i.e., independent de novo DLBCL). This dichotomy matters for treatment decisions: de novo DLBCLs are chemosensitive in most patients, whereas CLL-derived RS is mainly characterized by chemoresistance and poor outcome, with a median overall survival (OS) of around 12 months.

In this study, we perform genome-wide DNAm analysis and whole-transcriptome profiling for a large series of primary human RS samples, and comprehensively compare our findings to those in CLL and DLBCL. We extensively characterize the epigenetic architecture of the RS samples and find that the majority retain a CLL imprint. Remarkably, applying DNAm- and gene expression-based classifiers to datasets from landmark studies identifies a subset of “RS-type” DLBCL that has not previously been described at the genomic level, is enriched in cases with an ABC-like COO signature, and has an unfavorable prognosis7,8,10.

Results
Data quality controls
The study workflow is described in Fig. 1. We investigated DNAm using array-based technologies, exploring a total of 433 samples, including 58 RS samples, 25 CLLs paired with RS (i.e., tumor samples were available at both CLL and RS stages; hereafter “paired-CLLs”), 68 DLBCLs, and additional published methylomes from 190 other CLLs and 92 samples representing normal B cell subpopulations (Supplementary Fig. 1)22,25,26,34,35. Limiting batch effects is critical when comparing large cohorts explored with different platforms in different facilities. We therefore used the EPIC and 450K Illumina microarray platforms, as these provide accurate, robust, and reproducible genome-wide coverage of CpG sites36,37. We extensively explored potential batch effects and showed that they were completely removed after applying strict quality
controls (see “Methods”). In addition, we applied a bioinformatic deconvolution method to separate methylation data attributable to five subtypes of normal white blood cells (CD4+ T lymphocytes, CD8+ T lymphocytes, neutrophils, monocytes, and B cells). Using the respective cell composition data as covariates in supervised analyses limited the influence of tumor cell content in our samples.

Fig. 1 | Study workflow. Genome-wide DNA methylation data were available for 58 RS, 25 CLLs paired with RS (tumor DNA samples were available at both CLL and RS stages), 190 other CLLs, 68 de novo DLBCLs, and 92 samples from normal B cells spanning the entire B lineage. All 58 RS samples were also documented for mutations in a custom panel of 13 CLL driver genes, and RNA-sequencing data were concomitantly available for 41 RS samples, allowing integrative analyses and detailed exploration of oncogenic processes and epigenetic network deregulations. RNA-sequencing data were obtained for another 6 RS, 28 de novo DLBCLs, and 10 non-tumoral lymph nodes. Data acquired from normal B cell control groups were used for methodologic purposes only (see “Methods”). CLL chronic lymphocytic leukemia, DLBCL de novo diffuse large B cell lymphoma, NGS next-generation sequencing, RS Richter syndrome.

RS is a DNA-hypomethylated entity compared with CLL and de novo DLBCL
Unsupervised principal component analysis (PCA) showed a clear partitioning between RS, CLL, and DLBCL samples in the most variable components, highlighting distinct DNAm patterns in each group (Fig. 2a and Supplementary Figs. 2 and 3). Principal component 1 separated CLL from RS and DLBCL, while principal component 2 separated DLBCL from RS. However, some RS clustered within the DLBCLs or the CLLs. Decreased DNAm was observed in RS compared to CLL, DLBCL, and normal B cells (Fig. 2b and Supplementary Figs. 4 and 5). DNAm levels of the paired-CLLs were intermediate between those of RS and the other CLLs (Supplementary Fig. 6). Hypomethylated and hypermethylated CpGs in RS were differentially distributed with respect to CpG islands but similarly distributed with respect to genomic context (Supplementary Figs. 7 and 8).

Next, we annotated CpGs differentially methylated between RS, CLL, and DLBCL according to 12 chromatin states reported in 7 CLL reference epigenomes26. The 102,614 CpGs differentially methylated between RS and CLL (two-way moderated t test adjusted for a false
discovery rate (FDR) < 0.01; 90.8% hypomethylated in RS) were depleted (ratio < 0.75) in active promoters, poised promoters, promoter-associated strong enhancers, and weak promoters, and enriched (ratio > 1.5) in transcription transition regions and heterochromatin (Fig. 2c). The 82,940 CpGs differentially methylated between RS and DLBCL (96.4% hypomethylated in RS) were depleted in active promoters and enriched in poised promoters and in regions repressed by H3K27me3. Differentially methylated regions (DMRs; see “Methods”) between RS and DLBCL were strongly enriched in targets of the polycomb complex components SUZ12 (p = 1.2e−121) and EZH2 (p = 1.5e−30), which likely reflects the derivation of DLBCL from GC or post-GC B cells. Notably, genes associated with the extracellular matrix were overrepresented in this subset (Supplementary Fig. 9 and Supplementary Data 1). DMRs between RS and CLL were linked to the NOTCH and Wnt pathways and to the adaptive immune system, with PD-1 signaling and T cell/B cell co-stimulation (Fig. 2d and Supplementary Data 2), which likely reflects the driver role of NOTCH and PD-1 signaling in RS onset.

Fig. 2 | DNA methylation comparative analysis with CLL and de novo DLBCL shows that RS is a heterogeneous and hypomethylated entity. a Unsupervised principal component analysis of the adjusted DNAm values of RS, CLL, and DLBCL. Geometrical centers are represented by bigger circles of the same color. b Boxplots of sample-averaged methylation levels across all 397,769 CpGs. RS (n = 58) versus U-CLL (n = 112): p = 7.74e−11; RS versus M-CLL (n = 103): p = 4.46e−12; RS versus DLBCL (n = 68): p = 6.07e−12. c Distribution of differential CpGs (FDR < 0.01; methylation differential >10%) according to the reported chromatin states in 7 CLL reference epigenomes26. Enrichments are shown as a heatmap and were calculated from the position of the selected CpGs. Their distribution was reported among 12 different chromatin state categories. Barplots in the right part of each panel show the methylation status difference in RS versus CLL or DLBCL. Differentially methylated CpGs are distributed among 3 methylation level categories. Upward bars indicate a comparative gain of CpGs in RS for the corresponding category, while downward bars indicate a comparative loss in RS. d RS versus CLL top annotations network (ReactomePA) from 238 differential DMRs computed with DMRcate (Fisher’s multiple comparison statistics: min_smoothed_FDR and HMFDR both <0.01; max beta-value differential >30%; at least 3 CpGs in the DMR with no gap >1 kb between CpGs). e DNAm-based linear predictor score (LPS) CpG architecture. Hierarchical clustering of 4863 CpGs differential between CLL and DLBCL (FDR < 0.01; beta-value differential >30%; moderated t test). f Density map of DNAm between highCLL-derived and DLBCL-like RS. Smoothed beta-value densities from the EPIC dataset. Scale from blue (no density) to yellow (medium density) and red (high density). g Boxplots showing general methylation levels for highCLL-derived (n = 33), lowCLL-derived (n = 12), and DLBCL-like RS (n = 13), de novo DLBCLs (n = 68), and CLLs (n = 215). CLL versus highCLL-derived RS: p = 2.2e−16; highCLL-derived RS versus DLBCL-like RS: p = 5e−3; lowCLL-derived RS versus DLBCL-like RS: p = 9.9e−3; DLBCL-like RS versus DLBCL: p = 3.5e−2. BCP B cell precursors, CLL chronic lymphocytic leukemia, DLBCL de novo diffuse large B cell lymphoma, DNAm DNA methylation, EBV Epstein–Barr virus, FDR false discovery rate, gcBC germinal center B cells, highCLL-derived RS CLL-derived RS with a high LPS, HMFDR harmonic mean of the individual components FDR, MBC memory B cells, M-CLL IGHV-mutated CLL, lowCLL-derived RS CLL-derived RS with an LPS score below threshold, LPS linear predictor score, naiBC naive B cells, PC plasma cells, PC1/2 principal component 1/2, RS Richter syndrome, U-CLL IGHV-unmutated CLL. p values were derived from two-sided t tests. **p < 0.01; ***p < 0.001; ns not significant. For all box plots, center line indicates median; box limits indicate upper and lower quartiles; whiskers indicate 1.5× interquartile range; points indicate outliers. Source data are provided as a Source data file.

DNA methylation separates CLL-derived and DLBCL-like RS subgroups
Principal component 2 of the PCA split the RS samples into two subgroups, one with a profile similar to CLL, the other closer to DLBCL (Fig. 2a). We postulated that “CLL-derived RS” (maintaining a CLL imprint) could be separated from “DLBCL-like RS” (distinct from the preceding CLL and closer to DLBCL). To test this, we modeled a linear predictor score (LPS)38 that computes two underlying probabilities (p): one labeling samples according to their CLL-derived RS profile (pCLL-derived) and one for DLBCL-like RS (pDLBCL-like). Thresholds of pCLL-derived ≥ 98% and pDLBCL-like ≥ 98% were set to obtain highly specific and homogeneous groups (see “Methods”; Supplementary Fig. 10). The statistical model used to compute the LPS was constructed with 4863 CpGs that robustly separate CLL from DLBCL. Because de novo DLBCLs are usually IGHV-mutated whereas CLL may be IGHV-mutated or -unmutated, we excluded CpGs highly differential according to IGHV status22,24 from the LPS calculation, to focus on other distinctive features between CLL and DLBCL. The LPS scoring system was confirmed with hierarchical clustering (Fig. 2e), non-negative matrix factorization (NMF), and PCA (Supplementary Figs. 11 and 12), and displayed differential patterns in normal cells spanning the B cell lineage (Supplementary Fig. 13). The scoring system identified 33 CLL-derived RS (57%) and 13 DLBCL-like RS (22%), leaving 12 intermediate samples (21%). This latter subgroup clustered, albeit marginally, within the CLL and CLL-derived RS branch (Fig. 2e), and was therefore referred to as “low-LPS score CLL-derived RS” (lowCLL-derived RS), in contrast to the “high-LPS score CLL-derived RS” (highCLL-derived RS). Comparing highCLL-derived RS and DLBCL-like RS confirmed the global hypomethylation of highCLL-derived RS. In addition, the genomic distribution of DNAm in DLBCL-like RS did not coincide with that of DLBCL, with most locations hypomethylated in DLBCL-like RS (Fig. 2f, g and Supplementary Fig. 14). This subgrouping was not influenced by tumor cell content (Supplementary Fig. 10).

RS homogeneous subgrouping is corroborated by gene expression
Among the 58 RS samples investigated for DNAm, 41 also underwent whole-transcriptome profiling. RNA samples from 6 independent RS cases were also sequenced. In total, the RNA-sequencing experiment included lymph node samples from 47 RS, 2 paired CLLs, and 28 DLBCLs, plus 10 non-tumoral samples for methodologic validation purposes (see “Methods”). Hierarchical clustering of the 23,508 identified genes confirmed a clear subgrouping among RS samples (Fig. 3a). All RS classified as DLBCL-like RS by DNAm clustered with DLBCL (predominantly with the non-GCB subtype) and separated from CLL-derived RS. This cross-validation with an orthogonal technique (>95% concordance) supports the existence of CLL-derived RS and DLBCL-like RS. Annotation of gene clusters showed that CLL-derived RS shared a strong CLL gene expression signature, with upregulated genes involved in the BCR pathway and downregulated genes involved in the immune response, p53 signaling, and JAK-STAT pathways. Furthermore, K-means gene clustering of the 47 RS samples ranked according to the LPS gradient revealed two main clusters of genes differentially expressed between highCLL-derived and DLBCL-like RS (Fig. 3b). One cluster is downregulated in highCLL-derived RS, is related to the extracellular matrix and TLR signaling, and notably includes methylation-regulated p53 activity (Supplementary Data 3). The other cluster is reminiscent of a CLL signature, is overexpressed in highCLL-derived RS, and is linked with NOTCH, PI3K signaling, and DNAm metabolism (Supplementary Data 4).

RS subgroups correlate with IGHV mutational status and CLL-RS clonal relationship
To reduce the influence of IGHV mutational status on the LPS, CpGs highly differential between U-CLL and M-CLL were filtered out of the scoring CpGs. However, IGHV mutational status is associated with major DNAm changes in CLL22,24,34. We therefore performed PCA on the 10,000 most variable CpGs, whether or not they were associated with the GC reaction, tagging samples with IGHV annotations (Fig. 3c). CLL-derived RS accounted for nearly 80% of our RS samples and displayed a high
prevalence of IGHV-unmutated samples. In contrast, 12/13 (93%) DLBCL-like RS were IGHV-mutated. RS subgrouping was thus highly associated with IGHV mutational status (p = 6.3e−9). This raises the possibility that RS subgroup partitioning simply reflects the DNAm patterns of U-CLL and M-CLL. However, while most CLL-derived RS samples grouped with U-CLL, DLBCL-like RS samples grouped with DLBCL, well separated from M-CLL (Fig. 3c). Moreover, none of the DLBCL-like RS were clonally related to their respective CLL component (n = 5 pairs), confirming that DLBCL-like RS were not M-CLL-derived RS but rather de novo DLBCLs. In contrast, the CLL epigenetic imprint is a feature of CLL-derived RS, an entity that likely arises from CLL cells (Supplementary Fig. 15). This CLL-RS clonal relationship was further confirmed by identical IGHV-CDR3 sequences in paired CLL and RS samples (n = 26 pairs; p = 5.8e−6). To confirm the ability of the LPS to identify CLL-derived RS, we set up an independent validation EPIC 850K experiment investigating 52 samples (see “Methods” and Supplementary Fig. 16): (i) 44 new samples, including 18 new RS, the CLL component of 14 of these, 6 new DLBCLs, and 6 new CLLs; and (ii) 8 samples from the first series: 4 RS samples (3 clonally related and 1 clonally unrelated) with their 4 respective CLL components. The LPS classified 5/22 RS samples (22.7%; including the clonally unrelated RS from the first series) as DLBCL-like RS. Absence of a clonal relationship with the preceding CLL was confirmed by IGHV sequencing for 3 of these (data unavailable for the other 2 cases). The other 17 RS samples were identified as CLL-derived RS, with an IGHV-assessed clonal relationship for 15/15 samples with concomitant CLL (Supplementary Fig. 16). These findings clearly indicate that DNAm is a powerful tool to determine the cellular origin of cases diagnosed as RS, as it differentiates DLBCL arising in a patient with CLL from true morphological transformations of CLL.

Fig. 3 | RS gene expression profiles corroborate DNA methylation subgrouping. a Unsupervised hierarchical clustering of RS and de novo DLBCL transcriptomes (RNA-Seq; 23,508 genes). b K-means consensus clustering of RS transcriptomes according to the DNA methylation-based LPS gradient. Expression level statistics for each cluster are displayed as barplots. Barplot: data are presented as mean values +/− standard deviation from the mean. Cluster 1: n = 1657 genes; p = 1.29e−5. Cluster 6: n = 2203 genes; p = 2.56e−7. p values were derived from two-sided t tests. Source data are provided as a Source data file. Differential clusters are functionally annotated to the right. Mutational statuses as reported with NGS, or abnormalities determined with CNV analysis on DNAm data, are added below the sample annotation for a selected panel frequently described in CLL and RS. c Sample partitioning according to IGHV mutational status. Unsupervised PCA clustering of U-RS, M-RS, U-CLL, M-CLL, and DLBCL according to the 10,000 most variable CpGs in the dataset. The analysis focuses on the most variable CpGs because these are highly representative of the IGHV signature in CLL (59% of these CpGs are strongly differential between U-CLL and M-CLL). Indeed, PC1 separates IGHV-unmutated from IGHV-mutated B cell malignancies, with U-CLLs and U-RS segregating in the same area. Conversely, M-RS partition with DLBCLs, clearly separated from M-CLLs on PC2. CLL chronic lymphocytic leukemia, COO cell of origin, DLBCL de novo diffuse large B cell lymphoma, DLBCL-like RS DLBCL-like Richter syndrome, e enrichment, EBV Epstein–Barr virus, GCB germinal center B cell, highCLL-derived RS CLL-derived RS with a high LPS, LN lymph node, lowCLL-derived RS CLL-derived RS with an LPS score below threshold, LPS linear predictor score, M-CLL IGHV-mutated CLL, M-RS IGHV-mutated Richter syndrome, PC1/2 principal component 1/2, q q-value (corrected p value), RS Richter syndrome, U-CLL IGHV-unmutated CLL, U-RS IGHV-unmutated Richter syndrome.

To further characterize our RS samples, we sequenced a panel of 13 CLL driver genes. Integrating these data with copy number variations obtained from DNAm showed a high prevalence of CLL-driver mutations in RS samples harboring a CLL methylation signature (Supplementary Fig. 17). The clinical features of CLL-derived RS and DLBCL-like RS are displayed in Table 1. Both RS groups were uniformly treated with rituximab-based chemotherapy regimens, yet outcome was inferior for CLL-derived RS (p = 1.7e−3). This was further confirmed with gene expression profiling: RS samples aggregating in the CLL-derived branch of the dendrogram (Fig. 3a) were associated with a median OS of only 8 months, whereas RS samples clustering with the DLBCLs were associated with a longer median OS (35.5 months; p = 0.018) (Supplementary Fig. 18).

CLL-derived and DLBCL-like RS feature different epigenetic networks
To better understand the epigenetic architecture of RS subgroups, we performed an integrative analysis based on correlations between DNAm and gene expression data (see “Methods”). The resulting integrome associated 674,567 transcripts with methylation loci. From these, 63,305 (9.4%) significant correlations (p < 0.01; Spearman’s rho < −0.33 or > 0.33) were first selected. Compared with DLBCL-like RS, highCLL-derived RS were mainly hypomethylated, which translated into predominant overexpression (Fig. 4a). Matching density maps were observed for highCLL-derived and lowCLL-derived RS, with only slight differences. In contrast, the DLBCL-like epigenomic programs differed markedly (Supplementary Fig. 19), so we undertook an in-depth comparison of their integrome against that of highCLL-derived RS. Significant correlations between the two RS groups accumulated at regulatory locations and were mostly negative (77.3%; Fig. 4b). Genes under the control of these regions were related to cell proliferation (cell cycle, NOTCH pathway, PLCγ-mediated BCR signaling), epigenetic regulation and RNA processing, immune response (T- and B-lymphocyte activation and differentiation), and transcriptional regulation, including STAT family transcription factors (TFs). Negative correlations between promoter methylation levels and gene expression (rho < −0.33; at least three hits in the same regulatory region; Supplementary Data 5) yielded a list of 666 unique associations showing enrichment in TF binding sites of SUZ12, TP63, and TP53, and in target genes of early B cell development TFs. Conversely, 234 regions showed positive correlations between DNAm and gene expression levels (22.7%; 3 hits with rho > 0.33; Supplementary Data 6 and Fig. 4b). These were involved in controlling cellular proliferation and differentiation, regulation of transcription, protein metabolism, and immune response. Taken together, positively and negatively correlated locations amounted to 861 unique genes summarizing the most prominent features of highCLL-derived compared to DLBCL-like RS in terms of transcriptional mechanisms. Substantial differences in B cell development programs were highlighted, including lower expression of the B-lymphocyte-associated TFs EBF1 and E2F partner MSC/ABF1, and higher expression of CD5, CCND1, ZAP70, ID3, BLK, WNT3, PRKCZ, and MGMT in highCLL-derived RS (Fig. 4c, Supplementary Fig. 20, and Supplementary Data 7).

Methylome and transcriptome integration provide insights into RS regulatory features
Key players in RS epigenetic deregulation were further identified in highCLL-derived RS, using DLBCL-like RS as a reference and the 861 genes transcriptionally controlled through methylation. Among these, 156 were identified as TFs (18.1%; 2.3-fold enrichment; p < 1e−16)39. The regulatory network reconstructed in silico from these genes showed a central role for p53-like TFs and STAT proteins, with extensive control emanating from master regulators such as TP53, NF-KB1, and FOXC1, an essential developmental TF in many tissues that may act as a tumor suppressor. Over-represented target genes included those of the transcriptional repressors ZNF418 (6.1-fold; FDR = 1.87e−21) and ZNF217 (2.1-fold; FDR = 1.56e−8), involved in differentiation and in antagonizing cell death, respectively. In the network, downstream effectors were mainly involved in epigenetic repression via polycomb repressive complex 2 (PRC2) (Supplementary Fig. 21 and Supplementary Data 8), for which we noted a SUZ12 signature (FDR = 5.68e−4) and an EZH2 target enrichment (FDR = 2.84e−4) in B cells, also linked with H3K9me3, H3K27me, and H3K27me3 epigenetic marks (FDR < 3.85e−6 in the GM12878 cell line). The 156 TFs were strongly enriched in KRAB domain/C2H2-ZF-type TFs defining homeobox developmental proteins40. We observed favored P300 interactions (4.2-fold increase; FDR = 3.1e−3), pointing to enhancers as enriched targets41. These results support our previous findings and highlight critical pathway reprogramming, through the epigenetic control of selected key TFs, as an important mechanism in RS.

RS-based classifiers uncover “RS-type” DLBCLs with poor outcome
Distinguishing RS with DLBCL histology from de novo DLBCL is essential because the two differ greatly in prognosis. We thus developed a gene expression-based linear classifier score
(LCS) to discriminate CLL-derived RS cases among DLBCL samples. We used a set of 215 genes selected from the transcriptomic CLL-derived RS signature (Supplementary Data 9; see “Methods”) to screen external datasets of supposedly de novo DLBCL for the CLL-derived RS imprint. We first explored an independent gene expression dataset containing RS samples, untransformed CLLs, and EBV-positive DLBCL cell lines (GSE103265). The 215-gene set allowed unequivocal clustering of RS and CLL samples, well separated from DLBCLs (Supplementary Fig. 22). To cross-validate the DNAm-based classifier (LPS) described above with the gene expression-derived classifier (LCS), we explored array-based DNAm and transcriptome-sequencing data from the ICGC MMML-Seq consortium (both classifiers can be used independently). Four (6.2%) DLBCL samples with classical DLBCL morphology showed extreme DNAm and gene expression scores, suggesting a CLL-like RS profile (Supplementary Fig. 23). Applying the gene expression-based classifier to array-based gene expression data from 430 DLBCL from the MMML network identified 31 samples (7.2%) with a statistically significant score (see “Methods”). Next, we mined four large external cohorts of de novo DLBCL, comprising 1342 samples8–10,42. As with the previous datasets, gene expression-based LCS distributions were skewed toward overrepresentation of extreme positive values (Supplementary Fig. 24). Our transcriptomic classifier identified 35/420 (8.3%; series from Lenz and colleagues)42, 8/137 (5.8%; series from Chapuy and colleagues)8, 13/223 (5.8%; series from Dubois and colleagues)9, and 24/562 (4.3%; series from Wright and colleagues)10 samples harboring the CLL-derived RS signature with a score above the threshold, for a total of 80/1342 (5.9%) samples. In the four datasets, 91.6% to 100% of these samples were of the ABC-like subtype. We cross-compared this 215-gene signature with a discriminant 44-gene signature of ABC-type DLBCL38, identifying LMO2 as the only common gene. LMO2 is an important gene of the ABC signature but carries no more weight in our classifier than the other 214 genes. Indeed, rather than simply outlining every ABC-subtype DLBCL, our classifier extracted DLBCL with outstanding features, enriched in, but not restricted to, ABC-subtype DLBCL, with an overlap between ABC and GCB DLBCL and a subset of ABC-subtype DLBCL associated with a low LCS (Supplementary Fig. 25). DLBCL sharing extreme score values with CLL-like RS showed shorter progression-free survival (PFS) and/or OS (p values ranging from <10−3 to 0.02 depending on the cohort) compared to all other samples, and compared to other ABC-subtype […] confirmed strong associations with survival, independently of other covariates. In particular, this shorter survival was unrelated to the International Prognostic Index distribution (Supplementary Fig. 30).

We next explored whether this effect might be due to the enrichment of a previously described genomic subgroup of ABC-like DLBCL associated with unfavorable prognosis10. In the 562-sample dataset from Wright and colleagues, the 25 cases with top LCS scores were enriched in formerly unassigned (1.5-fold relative enrichment; p = 0.04) and N1 subgroups (6-fold; p = 6e−3), while depleted in the EZB subtype (p = 0.03). These cases were also strongly enriched (6.74-fold; p = 4.1e−4) in samples collected at relapse, raising the possibility that our classifier identifies DLBCL prone to relapse. Thus, extreme LCS values seemed to characterize a distinct subset of ABC-type DLBCL, accounting for 4.3–8.3% of de novo DLBCL, with poor prognosis. The top 25% of scores in the series from Wright and colleagues showed biased distributions across genomic subgroups, dominated by unclassified cases, and were associated with shorter PFS and OS (Fig. 6). These findings suggest that the LCS classifier can (i) identify high-scoring DLBCL samples as a separate DLBCL entity within de novo DLBCL, associated with ABC phenotypes and other features comparable to RS; and (ii) linearly classify other samples according to survival and overall prognosis (Supplementary Figs. 29 and 31). Interestingly, although absent from the 215-gene list, the CLL-associated marker CD5 was overexpressed in RS versus DLBCL (2.4-fold; FDR = 2.13e−3) and in highCLL-derived versus DLBCL-like RS (2.3-fold; FDR = 0.01). In the dataset from Wright and colleagues10, CD5 expression was higher in samples within the top 25% of LCS than in other samples (p = 5.8e−7), corroborating our results. Finally, in a dataset with concomitant transcriptome and CD5 immunochemistry staining (GSE66770), the majority (17/22; 77.2%) of the top 25% samples were CD5+ DLBCL (2.1-fold enrichment), whereas this proportion was significantly lower (16/68; 23.5%) in the rest of the cohort (p = 4.73e−3).

Table 1 | Biological characteristics of the different RS subgroups, according to DNA methylation profiling
Characteristic | Full cohort, n/N (%) | CLL-derived RS, n/N (%) | DLBCL-like RS, n/N (%) | CLL-derived versus DLBCL-like RS
Clinical features at CLL diagnosis
Age at diagnosis (years), median (range) | 60 (35–82) | 59 (35–80) | 64 (52–82) | p = 0.1 (NS)
Number of CLL treatment lines before RS transformation
  0 | 18/56 (32) | 10/44 (23) | 8/12 (66) | p = 0.02
  1 | 14/56 (25) | 12/44 (27) | 2/12 (17) |
  ≥2 | 24/56 (43) | 22/44 (50) | 2/12 (17) |
Clinical and biologic features at RS diagnosis
Male | 39/58 (67) | 31/45 (69) | 8/13 (62) | p = 0.73 (NS)
Age at diagnosis (years), median (range) | 66 (42–88) | 65 (42–83) | 69 (59–88) | p = 0.12 (NS)
Time to RS transformation
  <2 years | 15/56 (27) | 10/44 (23) | 5/12 (42) | p = 0.44 (NS)
  2 ≤ time ≤ 5 years | 10/56 (18) | 8/44 (18) | 2/12 (16) |
  >5 years | 31/56 (55) | 26/44 (59) | 5/12 (42) |
CLL status at RS diagnosis
  Binet A | 34/50 (68) | 27/40 (68) | 7/10 (70) | p = 0.41 (NS)
  Binet B | 10/50 (20) | 7/40 (17) | 3/10 (30) |
  Binet C | 6/50 (12) | 6/40 (15) | 0/10 (0) |
  Response | 13/52 (25) | 12/43 (28) | 1/9 (11) | p = 0.42 (NS)
  Progression | 39/52 (75) | 31/43 (72) | 8/9 (89) |
ECOG PS > 1 | 28/52 (54) | 21/42 (50) | 7/10 (70) | p = 0.30 (NS)
Ann Arbor stage I–II | 8/55 (15) | 7/43 (16) | 1/12 (8) | p = 0.67 (NS)
Ann Arbor stage III–IV | 47/55 (85) | 36/43 (84) | 11/12 (92) |
RS score
  0–1 | 30/49 (61) | 21/39 (54) | 9/10 (90) | p = 0.07 (NS)
  2–3 | 19/49 (39) | 18/39 (46) | 1/10 (10) |
Rossi score17
  High risk | 28/50 (56) | 21/40 (52) | 7/10 (70) | p = 0.67 (NS)
  Intermediate risk | 17/50 (34) | 15/40 (38) | 2/10 (20) |
  Low risk | 5/50 (10) | 4/40 (10) | 1/10 (10) |
First-line RS treatment
  R-CHOP/R-ACVBP | 46/53 (87) | 37/43 (86) | 9/10 (90) | p = 1 (NS)
  Platinum-based immuno-chemotherapies | 7/53 (13) | 6/43 (14) | 1/10 (10) |
Response to RS first-line treatment
  Complete remission | 15/53 (28) | 10/42 (24) | 5/11 (45) | p = 0.35 (NS)
  Partial remission | 2/53 (4) | 2/42 (5) | 0/11 (0) |
  Stable disease progression | 36/53 (68) | 30/42 (71) | 6/11 (55) |
OS < 12 months | 42/56 (75) | 35/44 (80) | 7/12 (58) | p = 1.7 × 10−3
12 ≤ OS ≤ 48 months | 8/56 (14) | 8/44 (18) | 0/12 (0) |
OS > 48 months | 6/56 (11) | 1/44 (2) | 5/12 (42) |
EBV positive | 3/21 (14) | 1/16 (6) | 2/5 (40) | p = 0.12 (NS)
IGHV unmutated | 43/58 (74) | 42/45 (93) | 1/13 (7) | p = 6.3 × 10−9
Stereotyped IGHV | 12/58 (21) | 10/45 (22) | 2/13 (15) | p = 0.71 (NS)
CLL clonally related | 26/31 (84) | 26/26 (100) | 0/5 (0) | p = 5.8 × 10−6
Large cell component (%), median [range] | 80 [50–95] | 80 [50–95] | 80 [50–90] | p = 0.44 (NS)
Del 17p (13.1) | 26/58 (45) | 23/45 (51) | 3/13 (23) | p = 0.11 (NS)
Del 11q (22.3) | 6/58 (10) | 6/45 (13) | 0/13 (0) | p = 0.32 (NS)
Trisomy 12 | 11/58 (19) | 9/45 (20) | 2/13 (15) | p = 1 (NS)
Del 13q (14.3) | 10/58 (17) | 10/45 (22) | 0/13 (0) | p = 0.09 (NS)
TP53 | 21/58 (36) | 17/45 (38) | 4/13 (31) | p = 0.75 (NS)
NOTCH1 | 21/58 (36) | 18/45 (40) | 3/13 (23) | p = 0.33 (NS)
SF3B1 | 12/58 (22) | 12/45 (27) | 0/13 (0) | p = 0.05 (NS)
EGR2 | 11/58 (19) | 11/45 (24) | 0/13 (0) | p = 0.055 (NS)
XPO1 | 7/58 (12) | 7/45 (16) | 0/13 (0) | p = 0.33 (NS)
MYD88 | 5/58 (8) | 1/45 (2) | 4/13 (31) | p = 7 × 10−3
ATM | 4/58 (7) | 4/45 (11) | 0/13 (0) | p = 1 (NS)
POT1 | 3/58 (5) | 3/45 (7) | 0/13 (0) | p = 0.1 (NS)
RPS15 | 2/58 (3.5) | 2/45 (4) | 0/13 (0) | p = 1 (NS)
FBXW7 | 1/58 (2) | 0/45 (0) | 1/13 (8) | p = 0.22 (NS)
BIRC3 | 1/58 (2) | 1/45 (2) | 0/13 (0) | p = 0.4 (NS)
BRAF | 1/58 (2) | 1/45 (2) | 0/13 (0) | p = 0.4 (NS)
Two-sided Student’s t tests.
CLL chronic lymphocytic leukemia, DLBCL diffuse large B cell lymphoma, EBV Epstein–Barr virus, ECOG PS Eastern Cooperative Oncology Group performance status, NS non-significant, OS overall survival, RS Richter syndrome.

Discussion
In this study, by using genome-wide DNAm analysis and whole-transcriptome gene expression profiling, we extensively characterized the epigenetic architecture of primary human RS samples. We identified a CLL epigenetic imprint that can act as a surrogate for identifying whether an RS is clonally related to CLL or has arisen de novo. Discovery of the CLL imprint in an RS sample avoids reliance on obtaining
DLBCL (p values ranging from <10−3 to 0.07) (Fig. 5a, b and Supple- tumor DNA at the CLL stage. Considering de novo DLBCL, DNAm- and
mentary Figs. 26–28). We next conducted a multivariate analysis with gene expression-based classifiers delineated an RS-like subset in
Cox Proportional Hazards models, including all available covariates to datasets from several landmark studies that was not previously
evaluate the association of gene expression-based LCS with survival described at the genomic level, was enriched in cases with an ABC-like
(OS and PFS). This association was set up in binary (top 25% versus the COO signature, and had an unfavorable prognosis7,8,10.
rest) as well as linear (as a continuous variable) models and provided Previous extensive explorations with exome or full genome-
estimates and effect size for each covariate (IPI, TP53 and MYC/BCL2 sequencing had found differences in genomic landscapes between
double hit status; Supplementary Fig. 29). This systematic analysis DLBCL-subtype RS and de novo DLBCL16–20. Here we used a different
a c
highCLL-derived vs DLBCL-like RS
Hypomethylated Hypermethylated
Overexpressed
Underexpressed
-2
Negative
-4 correlation
Homodirectional
Opposite change
rho = -0.5
rho = -0.33
Negative
correlations
627
(72.8%)
39
(4.5%)
195
(22.6%)
Positive
correlations
rho = -0.5
rho = -0.33
Fig. 4 | Integrative analysis of DNA methylation and transcriptome data correlations over sliding windows of 1 Mb. Series of vertically aligned dots indicate
highlights different epigenetic programs in highCLL-derived and DLBCL-like RS. DMRs (of at least 3 CpGs with a hit in TSS-associated location) significantly corre-
a Density map (smoothed density scatterplot) representing overall DNA methyla- lated with gene expression. Upper part: negative correlations, amounting to 666
tion versus gene expression changes between highCLL-derived RS and DLBCL-like unique genes; bottom part: positive correlations, amounting to 234 unique genes; a
RS. Scale ranges from blue (no density), to yellow (medium density) and red (high VENN diagram indicates the overlap between negative and positive correlations.
density). Only genes with at least one significant correlation (Spearman’s test; p c Quadrant scatterplot displaying methylation levels of regulatory sequences and
value <0.01) were retained. Locations of the corresponding CpGs were mainly corresponding expression levels for the 861 selected genes (overall absolute cor-
distributed in proximal and distal regulatory regions, with specific enrichments in relations: rho = 0.72; p < 2.2e−16; Spearman’s tests). The upper left and lower right
TSS features for negative (TSS200: 2.6-fold, TSS1500: 2.2-fold) and positive quadrants show genes with a negative correlation between methylation and
(TSS200: 1.2-fold, TSS1500: 1.6-fold) correlations. Hypo/hyper-methylations and expression. Lower left and upper right areas: genes with positive correlations. CLL
under/over-expressions are indicated relatively to the highCLL-derived RS sub- chronic lymphocytic leukemia, DLBCL de novo diffuse large B cell lymphoma,
group. b Manhattan plots of negatively and positively correlated regulatory regions DLBCL-like RS DLBCL-like Richter syndrome, DMR differentially methylated region,
and associated transcript expressions. Chromosomes are displayed at the bottom highCLL-derived RS CLL-derived RS with a high linear predictor score, RS Richter
of each plot, with a color code (from green to red) indicating the density of syndrome, TSS transcription start site.
study design and methodological approach to expand this knowledge. rather than pinpointing a limited number of specific targets. Thirdly,
Firstly, we studied epigenetic deregulations using robust and proven we compared the RS epigenetic profile to that of large cohorts of
methods, and profiled the RS molecular landscape beyond gene diverse CLL and de novo DLBCL, which contrasts with previous work
mutations and copy number variations. Secondly, we conducted a mostly focusing on the RS transformation process.
comprehensive analysis of RS pathophysiology which combined the Human-derived xenograft mouse models and cell lines were
analysis of genome-wide DNAm and whole transcriptome profiling, recently reported to study RS biology and test drug response11–15.
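The gene-level integration summarized in Fig. 4 computes a Spearman correlation between methylation and expression for each CpG-gene association, keeps associations with p < 0.01, and splits genes into those with negative correlations, positive correlations, or both. The pure-Python sketch below illustrates that logic and is not the authors' pipeline. Several parts are assumptions: the function names, the input format (a list of (gene, rho, p) records with p values already computed by a standard Spearman test), and the gene labels in the example.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired vectors of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    if ssx == 0 or ssy == 0:
        raise ValueError("rho is undefined for a constant vector")
    return cov / (ssx * ssy) ** 0.5


def classify_genes(associations, alpha=0.01):
    """Group genes by the sign of their significant CpG-gene correlations.

    associations: iterable of (gene, rho, p_value) records, one per
    CpG-gene association (hypothetical input format; p values are assumed
    to come from a standard Spearman test run beforehand).
    """
    signs = defaultdict(set)
    for gene, rho, p in associations:
        if p < alpha and rho != 0:
            signs[gene].add("neg" if rho < 0 else "pos")
    groups = {"negative_only": set(), "positive_only": set(), "both": set()}
    for gene, found in signs.items():
        if found == {"neg"}:
            groups["negative_only"].add(gene)
        elif found == {"pos"}:
            groups["positive_only"].add(gene)
        else:
            groups["both"].add(gene)
    return groups
```

The three groups correspond to the three areas of the Venn diagram in Fig. 4. Counting genes rather than associations is why one gene with several CpGs can land in "both".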
[Figure 5: a, b Kaplan–Meier curves (y-axis: survival probability) comparing patients with top LCS scores with the rest of the cohorts.]
Fig. 5 | DLBCLs harboring the CLL-derived RS epigenetic signature are associated with ABC phenotype and worse outcome. a Kaplan–Meier estimates of progression-free survival for n = 429 patients from three combined and clinically annotated public DLBCL datasets8–10. Comparative PFS between patients with top LCS and the rest of the cohorts, according to COO (p = 8.4e−8). b Kaplan–Meier estimates of overall survival for n = 780 patients from four combined and clinically annotated public DLBCL datasets8–10,42. Comparative OS between patients with top LCS and the rest of the cohorts, according to COO (p = 1.1e−11). Statistical comparisons were performed with the log-rank test. The Bonferroni method was used for multiple-testing adjustments. Datasets: from Lenz et al. (n = 420; microarray, accession GSE10846; PMID: 21546504); from Chapuy et al. (n = 137; microarray, accession GSE98588; PMID: 29713087); from Dubois et al. (n = 223; microarray, accession GSE87371; PMID: 31648986); from Wright et al. (n = 562; RNA-Seq; PMID: 32289277). ABC activated B cell, CLL chronic lymphocytic leukemia, COO cell of origin, DLBCL de novo diffuse large B cell lymphoma, GCB germinal center B cell, LCS linear classifier score, OS overall survival, PFS progression-free survival, RS Richter syndrome.
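The top-LCS groups compared in Fig. 5 come from the linear classifier score defined in Methods (Eq. 3). The sketch below follows steps (ii) and (iv) to (vii) of that procedure and then takes the top 25% of samples. It is a minimal illustration under stated assumptions, not the authors' code. The assumptions are: scaling is done per gene across samples; positive outliers are capped at the 99.9th percentile of all pooled scaled values; and the "one-way test" is read as an upper-tail standard normal p value. All function names and the toy data are hypothetical.

```python
import math
import statistics


def zscore(values):
    """Mean-center and scale to unit sample standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def lcs_scores(expr, weights):
    """Standardized linear classifier score for each sample (Eq. 3).

    expr:    dict gene -> per-sample expression values, already aggregated
             to one value per gene (step iii); same sample order for all genes.
    weights: dict gene -> +1 (upregulated cluster) or -1 (downregulated cluster).
    """
    genes = [g for g in weights if g in expr]  # step (ii): signature genes only
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    # Step (iv): scale each gene across samples (assumed per-gene scaling).
    scaled = {g: zscore(expr[g]) for g in genes}
    # Step (v): cap positive outliers at the 99.9th percentile of all scaled
    # values (assumed pooled; only the upper tail is trimmed).
    pooled = [v for g in genes for v in scaled[g]]
    cap = statistics.quantiles(pooled, n=1000, method="inclusive")[-1]
    n_samples = len(expr[genes[0]])
    raw = []
    for s in range(n_samples):
        # Steps (vi)-(vii): mean of weighted gene expressions for sample s.
        weighted = [min(scaled[g][s], cap) * weights[g] for g in genes]
        raw.append(sum(weighted) / len(genes))
    # Standardize so scores are comparable between datasets (Z-scores).
    return zscore(raw)


def upper_tail_p(z):
    """One-sided p value P(Z >= z) under a standard normal (assumed test)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def top_fraction(scores, frac=0.25):
    """Indices of the samples in the top `frac` of scores (e.g. top 25%)."""
    k = max(1, round(len(scores) * frac))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

For example, with `expr = {"UP1": [1, 2, 3, 10], "UP2": [2, 3, 4, 9], "DOWN1": [9, 8, 7, 1]}` and weights +1, +1, −1, the fourth sample scores highest and is the only one in the top 25%. Taking the mean of signed values, rather than fitting a model, keeps the score usable on any platform that measures the signature genes. This matches the dataset-independence the text emphasizes.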
However, the availability of these models is limited, and they cannot recapitulate the full heterogeneity of RS, as they were generated from a limited number of tumor samples. Our approach, using large cohorts of primary human RS samples and comparative tumor material, also holds promise for new discoveries and for better characterizing the wide epigenetic complexity of RS. We cross-validated our epigenetic findings using DNAm patterns, which were largely and independently corroborated by transcriptome data.
[Figure 6: bar charts of genomic subgroup proportions (Other, BN2, EZB, MCD, A53, ST2, N1, Mixture) among the top 25 (4.4%) and top 25% LCS samples versus the lower LCS samples, with enrichment and depletion marks; Kaplan–Meier curves of progression-free survival (years).]
Fig. 6 | The gene expression-based LCS linearly classifies de novo DLBCL samples, with high scores enriched in N1 and unclassified genomic profiles10 and associated with shorter progression-free survival. Dataset from Wright et al. (n = 562; RNA-Seq; PMID: 32289277). Two-sided t tests were used to assess statistical significance. Top 25 LCS scores: enrichment in "other" subtype (e = 1.51; p = 4.6e−2); depletion in EZB subtype (e = 0; p = 3.0e−2); enrichment in N1 subtype (e = 5.99; p = 6.4e−3). Top 141 (25%) LCS scores: enrichment in "other" subtype (e = 1.28; p = 1.4e−2); depletion in BN2 subtype (e = 0.56; p = 1.9e−2); enrichment in MCD subtype (e = 1.57; p = 1.7e−2); depletion in EZB subtype (e = 0; p = 1.7e−8); enrichment in A53 subtype (e = 1.62; p = 7.5e−2); depletion in ST2 subtype (e = 0; p = 2.6e−3); enrichment in N1 subtype (e = 2.92; p = 7.0e−3). Survival curves: Kaplan–Meier estimates of progression-free survival for n = 233 patients from a clinically and genomically annotated dataset from Wright and colleagues. Comparative PFS between patients with top 25% LCS and the rest of the cohort. Statistical comparisons were performed with the log-rank test (p = 1e−4). Source data are provided as a Source Data file. ABC activated B cell like, A53 TP53 mutations/deletions-associated DLBCL subgroup, BN2 DLBCL subgroup associated with lesions of BCL6 and/or NOTCH2, COO cell of origin, DLBCL de novo diffuse large B cell lymphoma, EZB DLBCL subgroup associated with abnormalities of epigenetic regulators KMT2D, CREBBP, EP300, and/or EBF1, GCB germinal center B cell, LCS linear classifier score, MCD DLBCL subgroup associated with lesions of MYD88 and/or CD79B, N1 DLBCL subgroup associated with NOTCH1 gain of function, PFS progression-free survival, RS Richter syndrome, ST2 DLBCL subgroup associated with lesions of SGK1 and/or TET2. *p value <0.05; **p value <0.01; ++: enrichment >1.2; +++: enrichment >5; –: depletion <0.6.
Our genome-wide DNAm data provide a more complete description of the RS hypomethylation profile. The DNAm patterns confirm previous findings that RS is a DNA-hypomethylated entity compared with CLL and de novo DLBCL27. Such global hypomethylation may in part reflect a more extensive proliferative history of the RS subclone21, as measured by the epiCMIT mitotic clock33. Using a reproducible DNAm microarray uniformly spanning the vast majority of regulatory regions at a whole-genome scale37, we characterized the epigenetic architecture underlying the commonly accepted dichotomy of whether a primary RS is clonally related to CLL or has arisen de novo17. As expected, around 80% of our RS samples harbored a CLL epigenetic imprint (likely derived from a pre-existing CLL clone). This was confirmed by identical IGHV-CDR3 sequences for all CLL-RS follow-ups. As nearly all de novo DLBCLs harbor a mutated IGHV, we propose that RS clonally related to the underlying CLL clone are: (i) IGHV-unmutated DLBCL; and (ii) IGHV-mutated DLBCL with a CLL imprint. Determining CLL history by identifying a CLL imprint from DNAm and gene expression, independently of the availability of matched CLL samples, is a step forward and is essential for clinical and therapeutic management. Interestingly, DLBCL-like RS would conversely be DLBCL without a clonal relationship with the CLL counterpart. However, DNAm of DLBCL-like RS differed from that of de novo DLBCL in terms of
increased cell cycle activity and IGF1, ERK/MAPK, PI3K/AKT, and PD-1 signaling pathways. These differences suggest that the CLL-invaded microenvironment influences the development of a specific DLBCL pathogenesis43.
Moreover, by integrating the DNAm and transcriptomic data, we identified different epigenetic networks in CLL-derived and DLBCL-like RS. Epigenetic architecture remodeling and the subsequent deregulation of EZH2 and Wnt pathways, as well as PI3K/AKT and IGFR1 signaling cascades, reveal mechanisms underlying CLL-derived RS that are potentially responsible for chemotherapy resistance. These mechanisms are potentially druggable through EZH2, PI3K/AKT, or IGFR1 inhibitors. IGFR1 pathway activation was recently described as a resistance mechanism to targeted therapy in CLL44. Interestingly, the regulatory sequences of O6-methylguanine-DNA methyltransferase (MGMT) are hypomethylated, and MGMT is consequently overexpressed in CLL-derived RS. MGMT promoter hypomethylation status is a known negative prognostic marker in glioblastoma45 and de novo DLBCL46, as well as an actionable target. This marker is easily assessable in the context of DLBCL diagnosis and is routinely used to guide therapeutic decisions.
Our results show the involvement of B cell-specific TFs and an epigenetic imprint in CLL-derived RS, and emphasize the previously described important role of TP53, FOXC1, NF-κB, and epigenetic regulators in oncogenic mechanisms. Strikingly, genes involved in the regulation of TP53 activity through methylation were overexpressed in CLL-derived RS, confirming the central role of TP53 in clonally related RS and the primary importance of epigenetic deregulation in the transformation process. An interesting finding of this study is the putative role of the FOXC1 TF in the RS regulatory network. FOXC1 has previously been described as cooperating with HOX family members to orchestrate mesenchymal tissue development through NF-κB signaling47. FOXC1 is repressed by PRC2 during hematopoietic development but is frequently derepressed in hematopoietic progenitors in acute myeloid leukemia48. Our data identified FOXC1 derepression as a hallmark of CLL-derived RS, likely associated with the blockade of B cell development and with proliferation due to unleashed NF-κB signaling. We also observed hypomethylation of DMRs regulating the expression of genes involved in extracellular matrix organization and in the immune system. These observations suggest a strong influence of the microenvironment on RS development.
Notably, our findings directly translate into the classification and prognostication of de novo DLBCL, the most common human B cell lymphoma. We provide a stable, reproducible, and potentially widely applicable gene expression-based classifier built on a CLL-derived RS epigenetic imprint. The classifier distinguishes a particular DLBCL subgroup within supposedly de novo DLBCL datasets. Of clinical importance, cases assigned to this subgroup are frequently not detected by recently described genomic and gene expression classifiers of DLBCL, and they are associated with an unfavorable prognosis. These cases were ABC-like DLBCL, enriched in unclassified or N1 DLBCL genomic subtypes7,10. This is in line with the association of RS with a particular gene expression profile and with NOTCH1 mutations and NOTCH pathway activation. Given the efficacy of ibrutinib plus R-CHOP chemotherapy in N1 subtype DLBCL49, the enrichment in the N1 profile within RS samples supports research into whether these patients may also benefit from BTK inhibition combined with R-CHOP chemotherapy. However, a recent single-cell transcriptome analysis of sequential CLL-RS samples revealed that, compared with CLL cells, RS cells downregulate genes related to BCR signaling and upregulate those involved in oxidative phosphorylation21; RS may therefore be less sensitive to ibrutinib. By applying a stringent cut-off to our transcriptomic score, generalized to all studied DLBCL datasets8–10,42, we identified a separate de novo DLBCL subset associated with a median PFS comparable to that of clonally related RS. Based on our observations, 4–8% of DLBCL diagnosed as de novo DLBCL, not otherwise specified, may in fact be a subgroup of DLBCLs sharing common epigenetic and transcriptional features with clonally related RS, and with a similar unfavorable outcome. We propose a stable and reproducible expression-based classifier, widely applicable to transcriptomic data, that enables the identification of this specific entity within supposedly de novo DLBCL, termed "RS-type DLBCL." Limitations of the transcriptome scoring method are dataset size and composition (DLBCL features associated with outcome), which by design prevent the exploration of single samples independently and may introduce biases. However, the method also demonstrated a linear association of DLBCL scores with poor outcome and with clinical variables of cancer aggressiveness, and thus constitutes a means of improving the current DLBCL classification system.
In conclusion, our study has revealed several relevant aspects of RS biology, including the complete RS hypomethylation profile and the differentiation of clonal versus non-clonal RS according to DNAm patterns and gene expression profiles. The discovery of a CLL imprint allows clonal relationship assessment without the need for tumor DNA at the CLL stage. Subgrouping of primary RS samples according to extensive characterization of the epigenetic architecture has provided information on the underlying oncogenic processes, with clear clinical implications. In particular, identification of RS-type DLBCL cases helps to advance the current DLBCL classification system and could be incorporated into treatment decisions, potentially improving disease management. Our findings also enable the evaluation of larger cohorts recruited in clinical trials and the development of novel treatment approaches, which are urgently needed in RS.
Methods
Our methods and results made extensive use of data from previous landmark studies26,31. Care was taken to follow good practices in the analyses of methylome and transcriptome data, employing widely approved procedures previously used in other high-standard studies. Regarding the handling of large cohorts, we used sample correlations, performed genotype checks between omics data, and added technical and biological replicates wherever possible.
Ethics statement
This study complies with all relevant ethical regulations, and we obtained written informed consent from all participants. No
compensation was provided. We obtained consent to use and publish information that identifies individuals, including indirect identifiers such as gender and age. Individuals recruited for this study can no longer be identified by the information provided, owing to sample anonymization and processing of the genomic data. All procedures were in accordance with the Declaration of Helsinki. The study protocol was approved by the Institutional Review Boards and Ethics Committees of Nancy, Kiel (#A150/10), Ulm (#349/11; #459/19 and #96/08) and Barcelona university hospitals, and by the French national ethics committee (Comité de Protection des Personnes Ouest IV 09/05/2017).
Patients and materials
A multicenter registry of RS accrual was established across nine centers affiliated with the French Innovative Leukemia Organization (ClinicalTrials.gov Identifier: NCT03619512). Sixty-four patients diagnosed with DLBCL-subtype RS were enrolled. Fresh frozen biopsies were gathered at RS diagnosis and met the criteria for DLBCL, including diffuse patterns of large B cells with the same size as macrophages or twice the size of normal lymphocytes3,50. For all patients, diagnoses were reviewed and confirmed by two independent pathologists. Only RS samples with at least 50% (median 80%, range 50–95%) high-grade component, as assessed by pathology review, were selected for analysis. The same process was applied to assemble a validation cohort of 58 samples, further reduced to 52 QC-passed samples, which we processed in an independent EPIC 850K experiment. This 52-sample validation cohort included 44 new samples: 18 new RS samples, the CLL component of 14 of these, 6 new DLBCL samples, and 6 additional CLLs. In addition, 8 samples from the training series were used as controls: 4 RS samples (3 clonally related and 1 clonally unrelated), with the 4 respective CLL components (Supplementary Fig. 16). Thus, this EPIC 850K experiment investigated 22 RS, 6 new DLBCLs, and 24 CLLs, including 18 paired CLLs.
Fifty-eight of the 64 enrolled patients with RS were from a previously described cohort, for which both targeted NGS and DNAm exploration were performed; 56 of these 58 patients with RS underwent 18F-fluorodeoxyglucose positron emission tomography/computed tomography for initial diagnosis51. For the other six patients with RS, the fresh frozen biopsy was too small for extracting both DNA and RNA; given the large cellular component (>70%), we prioritized gene expression data and performed only RNA sequencing. The minimal tumor purity was raised to 70% for RNA analysis, as contamination by signal from residual normal cells strongly influences global gene expression, especially for a subset of transcripts with very low expression in tumor cells but high expression in residual normal cells.
Additional data for CLL (n = 215) and for 92 normal B cells spanning the entire B lineage development were obtained as part of previously published studies22,25,26,34,35. DNA methylation data from 68 de novo DLBCL cases were also used as a reference. These DLBCLs originate from a larger lymphoma cohort gathered by the ICGC MMML-seq consortium52. Finally, 10 lymph nodes from healthy subjects were analyzed as a control group for transcriptome sequencing.
Methylome data analyses
EPIC microarray. The DNAm status of 866,562 CpG sites was interrogated on the Infinium Methylation EPIC array (Illumina, San Diego, CA, USA; see Supplementals), later referred to as the EPIC 850K platform.
Dataset generation. Datasets were created using the minfi package53. The EPIC set comprises 90 distinct samples (58 RS, 25 CLL, plus a subset of 7 DLBCL replicates also available on 450K), interrogated on EPIC 850K. DNA methylation data from the control groups (215 CLL, 68 DLBCL, 92 normal B cells spanning the entire B lineage) were acquired with the Illumina Infinium® HumanMethylation450 BeadChip (later referred to as the 450K platform)22,25,26,34. These and the EPIC 850K data were processed from IDAT files. Analyses were run under R 3.6 with Bioconductor 3.10 and later versions. The FULL dataset comprises 433 distinct samples (92 benign B cells, 215 CLLs, 68 DLBCLs and 58 RS), combined into a single 450K object containing the probes shared by the 850K and 450K microarrays: (i) raw IDAT files corresponding to 96 and 377 samples for the 850K (866,091 CpGs) and 450K (485,512 CpGs) platforms, respectively, including technical replicates; (ii) each subset was loaded independently and stored in a dedicated RGChannelSet minfi object, along with full sample annotations, then both were combined into a third subset containing 473 samples × 452,567 CpGs using the combineArrays function with output type "IlluminaHumanMethylation450k"; (iii) the EPIC dataset stems from the first (850K) subset alone, whereas the FULL dataset is obtained from the combined subsets.
To reduce technological issues and biases, the same preparation protocol was applied to both the EPIC (850K) and FULL (combined) subsets. The main stages of the filtering and quality control pipeline are as follows: (i) technical checks, filtering, and evaluations; (ii) data normalization with SWAN54; (iii) removal of probes located on the X and Y chromosomes, flagged as cross-hybridizing, or located near known SNPs, with the rmSNPandCH function (parameters dist = 2 and mafcut = 0.05) available from the DMRcate package55; (iv) imputation of the remaining failed β-value positions with imputePCA from the R missMDA package56,57; (v) 2 × 2 sample correlation checks (Supplementary Figs. 32 and 33), with correlation heat maps rendered using the R corrplot package; (vi) an extended quality control step to remove sample outliers and check for residual post-normalization batch effects (Supplementary Fig. 34); (vii) finally, technical replicates were averaged into unique samples, as all replicates were found comparable (Supplementary Fig. 35). These filtering steps led to the final EPIC (90 samples × 794,927 CpGs) and FULL (433 samples × 397,769 CpGs) datasets.
Technical checks, filtering, evaluations, and quality control. These steps included removal of failed CpGs (detection p value >0.01 in >10% of samples), a gender check between clinical data and the gender returned by the getSex function, and genotype checks (Supplementary Data 10) between RNA-seq data (see Supplementals) and genotypes inferred with the beta2genotype function available from the R OmicsPrint package58.
Cell composition deconvolution. Cell type composition was estimated for each sample with the estimateCellCounts function against a reference library of 6 normal white blood cell types (CD8 T cells, CD4 T cells, NK cells, B cells, monocytes, and granulocytes) (Supplementary Data 11). The estimated proportion of each cell type was reported and later used as a covariate in statistical models to adjust for B cell representation in the mixes. Blood samples depleted of B cells (<30%) were discarded from further analyses.
Downstream bioinformatics
Supervised analyses. As a rule, β-values were used for direct interpretation and graphical representation, while M-values were favored for statistics and computations. Linear modeling based on empirical Bayes methods was used to assess CpG differential methylation. When applicable, these models included cell deconvolution results as added covariates to correct for B cell content. Additionally, at this point, unwanted methylation variation such as residual batch effects was removed using the RUVm function from the missMethyl package59. The overall dispersion was calculated on the entire dataset, then p values for each comparison were obtained with a two-way moderated t test and adjusted for FDR following the Benjamini–Hochberg procedure. At the probe level, an FDR < 0.01 indicated statistical significance. Differentially methylated region (DMR) determination was performed on the same linear models with dmrcate (package DMRcate), with lambda = 1000 and C = 3. The FDR cut-off for an individual CpG to initiate a DMR was set to 0.01, and DMRs were considered statistically significant if both min_smoothed_fdr and HMFDR output probabilities were <0.01.
Finally, from each of these 4863 CpGs and for each sample S of the cohort, the score
Raw abundance filtering and normalization. Raw counts were filtered by applying a minimum expression threshold to each gene or transcript. Features had to be expressed (non-zero value) in at least two samples and to present an average expression value across all samples higher than 1/5,000,000 of the average library size (64 ± 3 million reads per sample), that is, at least 20 reads per feature. Data were further normalized with the TMM method64 and finally transformed to log2 counts per million (CPM)65. A total of 23,508 genes and 77,491 transcripts were identified and reported at the end of the process. Pearson's correlations for gene expression levels averaged 0.92 for genes and 0.75 for transcripts and were very stable across samples (data not shown).
Gene and transcript annotations. All transcriptomic analyses were performed using the hg38 reference assembly of the human genome. Results were fully annotated with known symbols corresponding to gene and transcript genomic locations whenever possible. Upon completion of the transcript assembly, gene symbols were assigned Ensembl IDs based on overlapping positions with known transcripts (90% overlap minimum). In case of failed overlap, custom unique IDs were used. Gene and transcript assignments were thus based on the Ensembl66 GRCh38 annotations available in both the core and funcgen databases, version 90. These were downloaded from ftp://ftp.ensembl.org/pub/release-90/mysql/ for local installation and queried with in-house custom tools.
Transcriptome explorations. Unsupervised analyses were all carried out with hierarchical and K-means clustering techniques, as previously described67. Expression values were median-centered, and uncentered Pearson's correlation was used as the distance metric. Supervised analyses were performed through linear modeling (empirical Bayes), and differential expression p values were obtained using a two-way moderated t test, then adjusted for FDR following the Benjamini–Hochberg procedure. An FDR < 0.01 indicated statistical significance. Clusters were dissected with functional annotation tools for target-gene associations, such as the Open Targets platform68, and by correlating gene signatures with public datasets from multiple databases, such as GEO (Gene Expression Omnibus), with Enrichr [https://maayanlab.cloud/Enrichr].
Methylome and transcriptome data integrations
Here we focused on the RS cohort, for which 41 RS samples overlapped between the methylome and transcriptome experiments. A subset of the methylome EPIC dataset (M-values, normalized and curated) and part of the transcriptome dataset (gene and transcript CPMs, also normalized and filtered) were integrated to eliminate unwanted signals and pinpoint the functional mechanisms linking DNA methylation of regulatory regions with gene expression in RS.
Both datasets were re-annotated with biomaRt69 and linked using two methods: (i) shared Ensembl identifiers; and (ii) genomic coordinates, for refined feature overlap when the first method failed. We used "TSS200," "TSS1500," and "first exon" CpG information to define associations with promoter regions in the next analysis steps, and overlap was considered successful within 2 kb between CpG and gene transcription start sites (TSS). The integromes generated at this step represented 475,148 and 674,567 associations at the gene and transcript levels, respectively. As described in a similar setup70, Spearman's correlations were calculated for each association. Correlations at the gene level were used for generating density plots and presenting a general view, whereas
to “TSS200,” “TSS1500,” “first exon,” or “TSSoverlap2kb” (each linked to the same transcript identifier).
Manhattan representations were plotted against the background with the R CMplot package. Gene set enrichment and pathway analyses of selected candidate lists were carried out as described in Methylome data analyses. Interaction networks of putative TFs encoded by candidate genes, protein domain enrichments, and effector functions were analyzed with STRING tools [https://string-db.org/]71. A curated database of 1639 human TFs with DNA-binding domain information was obtained from http://humantfs.ccbr.utoronto.ca/. Regulatory networks were built with NetworkAnalyst [www.networkanalyst.ca]72.
Methodology for building the gene expression-based scoring system
CLL-derived RS signature. The signature was derived from two clusters of strongly correlated up- and downregulated profiles extracted from the transcriptome hierarchical clustering tree (Fig. 3a, Supplementary Fig. 36, and Supplementary Data 9). The two initial clusters displayed a very high enrichment in CLL genes and mainly drove the whole sample aggregation process. They were further reduced to protein-coding genes, to avoid biases when applying the signature to transcriptomes of different origins, which may not contain ncRNAs or genes of undefined biotype. The reduced set was then overlapped with genes showing significant integration between transcriptome and methylome. The resulting 215-gene signature contained 93 protein-coding genes underexpressed in CLL-derived RS and 122 protein-coding genes overexpressed in CLL-derived RS.
Linear classifier score (LCS). For each analyzed dataset, scores were obtained according to the following procedure, to make the process as reproducible as possible. (i) When applicable, raw expression data with relevant sample annotations were retrieved from the Gene Expression Omnibus curated database (https://www.ncbi.nlm.nih.gov/geo/) with GEOquery73. Expression matrices were then prepared, described statistically, and normalized according to a well-established protocol74. Otherwise, already normalized expression data were used "as is". (ii) Whole transcriptomes were reduced to the features (genes, transcripts, probes) matching the 215-gene signature. (iii) Expression values of features mapping to the same gene were summed to obtain a single aggregated expression value per gene. (iv) Data were scaled, i.e., mean-centered and reduced to unit standard deviation. (v) Positive outliers were trimmed at the last permille (99.9th percentile) to reduce the impact of extreme gene expression values on the score while preserving values high enough to act as essential markers. Trimmed values were replaced with the last permille value. After a distribution check, no negative outliers were found in any dataset. (vi) Each of the 215 genes was assigned a weight: genes originating from the upregulated cluster were weighted +1, and genes originating from the downregulated cluster were weighted −1. (vii) Finally, the LCS of each sample S of the dataset was computed as the mean of weighted gene expressions:
$$\mathrm{LCS}(S) = \frac{1}{n}\sum_{i=1}^{n} G_i\,W_i \qquad (3)$$
with $n$ the number of genes in the signature, and $G_i$ the expression of gene $i$ weighted by $W_i$. LCS scores were then standardized (mean-centered to 0 and reduced to unit standard deviation) to obtain scores fully comparable between datasets. The obtained Z-scores were compared to a normal distribution in a one-way test to calculate a p value, used to
transcripts were used for precise analyses and final results. These define the initial LCS cutoff (p < 0.05) in each dataset.
were filtered into candidate transcriptional effector locations, by
selecting “promoter regions” containing at least three negatively- Statistics and reproducibility
correlated CpGs (rho < −1/3; p value <0.001) or three positively cor- No statistical method was used to predetermine sample size. Data
related CpGs (rho > 1/3; p value <0.001) with features corresponding exclusion criteria according to quality controls are explained in the
“Methods” section. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Raw DNA methylation, gene expression and targeted NGS data generated in this study from RS samples have been deposited in the European Genome-Phenome Archive (study EGAS00001005495). The accession numbers are EGAD00010002194 for DNA methylation data, EGAD00001007922 for transcriptomic data, and EGAD00001009509 for targeted NGS data. The raw data are protected and available under restricted access. Clinical and genomic data can be obtained by contacting the data access committee, according to the European Genome-Phenome Archive's procedure. Data access will be granted if the intended use complies with the data use conditions. These include a commitment to use the data strictly for a clearly identified academic research program and according to good practice recommendations. The Data Access Committee will respond to requests within 2 weeks. Once access is granted, the data remain available until the end of the research program they support. Previously published DNA methylation datasets from the ICGC MMML-seq consortium that were used in this study are available upon request from the data access committee at the ICGC consortium data portal [https://dcc.icgc.org/]. Published datasets can be found under the following accession codes: GSE103265; GSE66770; GSE10846; GSE98588; GSE87371. All other data supporting the findings of this study are available from the corresponding authors upon request. Source data are provided with this paper.

Code availability
The source code developed for this study is available on GitHub [https://github.com/zetcheuv/RichterOmicsCode]. It covers the design of the DNA methylation and gene expression classifiers and the methylome–transcriptome integrative analyses. All other source data supporting the findings of this study are available from the corresponding authors.

References
1. Siegel, R. L., Miller, K. D., Fuchs, H. E. & Jemal, A. Cancer statistics, 2021. CA Cancer J. Clin. 71, 7–33 (2021).
2. Kipps, T. J. et al. Chronic lymphocytic leukaemia. Nat. Rev. Dis. Prim. 3, 17008 (2017).
3. Hallek, M. et al. iwCLL guidelines for diagnosis, indications for treatment, response assessment, and supportive management of CLL. Blood 131, 2745–2760 (2018).
4. Mao, Z. et al. IgVH mutational status and clonality analysis of Richter's transformation: diffuse large B-cell lymphoma and Hodgkin lymphoma in association with B-cell chronic lymphocytic leukemia (B-CLL) represent 2 different pathways of disease evolution. Am. J. Surg. Pathol. 31, 1605–1614 (2007).
5. Alizadeh, A. A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511 (2000).
6. Reddy, A. et al. Genetic and functional drivers of diffuse large B cell lymphoma. Cell 171, 481.e15–494.e15 (2017).
7. Schmitz, R. et al. Genetics and pathogenesis of diffuse large B-cell lymphoma. N. Engl. J. Med. 378, 1396–1407 (2018).
8. Chapuy, B. et al. Molecular subtypes of diffuse large B cell lymphoma are associated with distinct pathogenic mechanisms and outcomes. Nat. Med. 24, 679–690 (2018).
9. Dubois, S. et al. Refining diffuse large B-cell lymphoma subgroups using integrated analysis of molecular profiles. EBioMedicine 48, 58–69 (2019).
10. Wright, G. W. et al. A probabilistic classification tool for genetic subtypes of diffuse large B cell lymphoma with therapeutic implications. Cancer Cell 37, 551–568.e14 (2020).
11. Vaisitti, T. et al. Novel Richter syndrome xenograft models to study genetic architecture, biology, and therapy responses. Cancer Res. 78, 3413–3420 (2018).
12. Chakraborty, S. et al. B-cell receptor signaling and genetic lesions in TP53 and CDKN2A/CDKN2B cooperate in Richter transformation. Blood 138, 1053–1066 (2021).
13. Iannello, A. et al. Synergistic efficacy of the dual PI3K-δ/γ inhibitor duvelisib with the Bcl-2 inhibitor venetoclax in Richter syndrome PDX models. Blood 137, 3378–3389 (2021).
14. Vaisitti, T. et al. ROR1 targeting with the antibody-drug conjugate VLS-101 is effective in Richter syndrome patient-derived xenograft mouse models. Blood 137, 3365–3377 (2021).
15. Schmid, T. et al. U-RT1 - a new model for Richter transformation. Neoplasia 23, 140–148 (2021).
16. Scandurra, M. et al. Genomic profiling of Richter's syndrome: recurrent lesions and differences with de novo diffuse large B-cell lymphomas. Hematol. Oncol. 28, 62–67 (2010).
17. Rossi, D. et al. The genetics of Richter syndrome reveals disease heterogeneity and predicts survival after transformation. Blood 117, 3391–3401 (2011).
18. Fabbri, G. et al. Genetic lesions associated with chronic lymphocytic leukemia transformation to Richter syndrome. J. Exp. Med. 210, 2273–2288 (2013).
19. Chigrinova, E. et al. Two main genetic pathways lead to the transformation of chronic lymphocytic leukemia to Richter syndrome. Blood 122, 2673–2682 (2013).
20. Klintman, J. et al. Genomic and transcriptomic correlates of Richter's transformation in chronic lymphocytic leukemia. Blood 137, 2800–2816 (2021).
21. Nadeu, F. et al. Detection of early seeding of Richter transformation in chronic lymphocytic leukemia. Nat. Med. 28, 1662–1671 (2022).
22. Kulis, M. et al. Epigenomic analysis detects widespread gene-body DNA hypomethylation in chronic lymphocytic leukemia. Nat. Genet. 44, 1236–1242 (2012).
23. Oakes, C. C. et al. Evolution of DNA methylation is linked to genetic aberrations in chronic lymphocytic leukemia. Cancer Discov. 4, 348–361 (2014).
24. Queirós, A. C. et al. A B-cell epigenetic signature defines three biologic subgroups of chronic lymphocytic leukemia with clinical impact. Leukemia 29, 598–605 (2015).
25. Oakes, C. C. et al. DNA methylation dynamics during B cell maturation underlie a continuum of disease phenotypes in chronic lymphocytic leukemia. Nat. Genet. 48, 253–264 (2016).
26. Beekman, R. et al. The reference epigenome and regulatory chromatin landscape of chronic lymphocytic leukemia. Nat. Med. 24, 868–880 (2018).
27. Rinaldi, A. et al. Promoter methylation patterns in Richter syndrome affect stem-cell maintenance and cell cycle regulation and differ from de novo diffuse large B-cell lymphoma. Br. J. Haematol. 163, 194–204 (2013).
28. Shaknovich, R. et al. DNA methylation signatures define molecular subtypes of diffuse large B-cell lymphoma. Blood 116, e81–e89 (2010).
29. Chambwe, N. et al. Variability in DNA methylation defines novel epigenetic subgroups of DLBCL associated with different clinical outcomes. Blood 123, 1699–1708 (2014).
30. Pan, H. et al. Epigenomic evolution in diffuse large B-cell lymphomas. Nat. Commun. 6, 6921 (2015).
31. Kretzmer, H. et al. DNA methylome analysis in Burkitt and follicular lymphomas identifies differentially methylated regions linked to somatic mutation and transcriptional control. Nat. Genet. 47, 1316–1325 (2015).
32. Queirós, A. C. et al. Decoding the DNA methylome of mantle cell lymphoma in the light of the entire B cell lineage. Cancer Cell 30, 806–821 (2016).
33. Duran-Ferrer, M. et al. The proliferative history shapes the DNA methylome of B-cell tumors and predicts clinical outcome. Nat. Cancer 1, 1066–1081 (2020).
34. Kulis, M. et al. Whole-genome fingerprint of the DNA methylome during human B cell differentiation. Nat. Genet. 47, 746–756 (2015).
35. Lee, S. T. et al. A global DNA methylation and gene expression analysis of early human B-cell development reveals a demethylation signature and transcription factor network. Nucleic Acids Res. 40, 11339–11351 (2012).
36. Bibikova, M. et al. High density DNA methylation array with single CpG site resolution. Genomics 98, 288–295 (2011).
37. Pidsley, R. et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 17, 208 (2016).
38. Wright, G. et al. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc. Natl Acad. Sci. USA 100, 9991–9996 (2003).
39. Lambert, S. A. et al. The human transcription factors. Cell 175, 598–599 (2018).
40. Ecco, G., Imbeault, M. & Trono, D. KRAB zinc finger proteins. Development 144, 2719–2729 (2017).
41. Visel, A., Rubin, E. M. & Pennacchio, L. A. Genomic views of distant-acting enhancers. Nature 461, 199–205 (2009).
42. Lenz, G. et al. Stromal gene signatures in large-B-cell lymphomas. N. Engl. J. Med. 359, 2313–2323 (2008).
43. Augé, H. et al. Microenvironment remodeling and subsequent clinical implications in diffuse large B-cell histologic variant of Richter syndrome. Front. Immunol. 11, 594841 (2020).
44. Scheffold, A. et al. IGF1R as druggable target mediating PI3K-δ inhibitor resistance in a murine model of chronic lymphocytic leukemia. Blood 134, 534–547 (2019).
45. Kitange, G. J. et al. Evaluation of MGMT promoter methylation status and correlation with temozolomide response in orthotopic glioblastoma xenograft model. J. Neurooncol. 92, 23–31 (2009).
46. Esteller, M. et al. Hypermethylation of the DNA repair gene O(6)-methylguanine DNA methyltransferase and survival of patients with diffuse large B-cell lymphoma. J. Natl Cancer Inst. 94, 26–32 (2002).
47. Wang, J. et al. FOXC1 regulates the functions of human basal-like breast cancer cells by activating NF-κB signaling. Oncogene 31, 4798–4802 (2012).
48. Somerville, T. D. et al. Frequent derepression of the mesenchymal transcription factor gene FOXC1 in acute myeloid leukemia. Cancer Cell 28, 329–342 (2015).
49. Wilson, W. H. et al. Effect of ibrutinib with R-CHOP chemotherapy in genetic subtypes of DLBCL. Cancer Cell 39, 1643–1653.e3 (2021).
50. Soilleux, E. J. et al. Diagnostic dilemmas of high-grade transformation (Richter's syndrome) of chronic lymphocytic leukaemia: results of the phase II National Cancer Research Institute CHOP-OR clinical trial specialist haemato-pathology central review. Histopathology 69, 1066–1076 (2016).
51. Moulin, C. et al. Clinical, biological, and molecular genetic features of Richter syndrome and prognostic significance: a study of the French Innovative Leukemia Organization. Am. J. Hematol. 96, E311–E314 (2021).
52. Hübschmann, D. et al. Mutational mechanisms shaping the coding and noncoding genome of germinal center derived B-cell lymphomas. Leukemia 35, 2002–2016 (2021).
53. Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014).
54. Maksimovic, J., Gordon, L. & Oshlack, A. SWAN: subset-quantile within array normalization for illumina infinium HumanMethylation450 BeadChips. Genome Biol. 13, R44 (2012).
55. Peters, T. J. et al. De novo identification of differentially methylated regions in the human genome. Epigenetics Chromatin 8, 6 (2015).
56. Josse, J. & François, H. missMDA: a package for handling missing values in multivariate data analysis. J. Stat. Softw. 70, 1–31 (2016).
57. Lena, P. D., Sala, C., Prodi, A. & Nardini, C. Methylation data imputation performances under different representations and missingness patterns. BMC Bioinformatics 21, 268 (2020).
58. Van Iterson, M., Cats, D., Hop, P., Heijmans, B. T. & Consortium, B. omicsPrint: detection of data linkage errors in multiple omics studies. Bioinformatics 34, 2142–2143 (2018).
59. Phipson, B., Maksimovic, J. & Oshlack, A. missMethyl: an R package for analyzing data from Illumina's HumanMethylation450 platform. Bioinformatics 32, 286–288 (2016).
60. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, 90–97 (2016).
61. Yu, G. & He, Q. Y. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol. Biosyst. 12, 477–479 (2016).
62. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
63. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
64. Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
65. Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
66. Cunningham, F. et al. Ensembl 2019. Nucleic Acids Res. 47, D745–D751 (2019).
67. Pouget, C. et al. Ki-67 and MCM6 labeling indices are correlated with overall survival in anaplastic oligodendroglioma, IDH1-mutant and 1p/19q-codeleted: a multicenter study from the French POLA network. Brain Pathol. 30, 465–478 (2020).
68. Ochoa, D. et al. Open Targets Platform: supporting systematic drug-target identification and prioritisation. Nucleic Acids Res. 49, D1302–D1310 (2021).
69. Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).
70. Zgheib, R. et al. Folate can promote the methionine-dependent reprogramming of glioblastoma cells towards pluripotency. Cell Death Dis. 10, 596 (2019).
71. Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).
72. Zhou, G. et al. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Res. 47, W234–W241 (2019).
73. Davis, S. & Meltzer, P. S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 23, 1846–1847 (2007).
74. Willekens, J. et al. Wnt signaling pathways are dysregulated in rat female cerebellum following early methyl donor deficiency. Mol. Neurobiol. 56, 892–906 (2019).
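The Transcriptome explorations paragraph of the Methods states that expression values were median-centered and that uncentered Pearson correlation served as the clustering distance. The minimal Python sketch below illustrates that distance on a toy matrix. It assumes two things the Methods do not specify: centering is done per gene across samples, and the distance is taken as 1 − r. The function names are hypothetical, and this is not the authors' code.

```python
import math
from statistics import median


def median_center(rows):
    """Subtract each gene's median (computed across samples) from its values."""
    centered = []
    for row in rows:
        m = median(row)
        centered.append([x - m for x in row])
    return centered


def uncentered_pearson(x, y):
    """Pearson correlation without mean subtraction (equivalent to cosine similarity)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den else 0.0


def distance_matrix(vectors):
    """Pairwise distances d = 1 - r (assumed convention); values range from 0 to 2."""
    return [[1.0 - uncentered_pearson(u, v) for v in vectors] for u in vectors]


# Toy matrix (hypothetical): 2 genes (rows) x 3 samples (columns).
expr = [[5.0, 7.0, 6.0],
        [2.0, 2.0, 8.0]]
centered = median_center(expr)                   # [[-1.0, 1.0, 0.0], [0.0, 0.0, 6.0]]
samples = [list(col) for col in zip(*centered)]  # one vector per sample
dist = distance_matrix(samples)                  # input for hierarchical clustering
```

The resulting matrix can be passed to any hierarchical clustering routine that accepts precomputed distances. The uncentered form does not subtract each sample's own mean. Two samples therefore score as similar only if they deviate from the gene medians in the same direction.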
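The Benjamini–Hochberg adjustment applied to the moderated t test p values, followed by the FDR < 0.01 threshold from the Methods, can be sketched in a few lines of Python. This is a minimal illustration that assumes a plain list of p values. It does not reproduce the empirical Bayes moderated t test itself. In R, the same adjustment is available as `p.adjust(p, method = "BH")`. The toy p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


FDR_THRESHOLD = 0.01  # significance threshold stated in the Methods

pvals = [0.001, 0.002, 0.03, 0.5]  # hypothetical p values
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < FDR_THRESHOLD]
```

The backward running minimum is what makes the procedure step-up. A p value can be rescued by a larger p value ranked above it, so each q value is the minimum of m·p/rank over that rank and all higher ranks.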
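The coordinate-based fallback used to link CpGs to transcripts, which keeps a CpG when it lies within 2 kb of a TSS, can be sketched as a sorted-interval lookup. This Python sketch assumes that positions share one genome build. It ignores strand and the TSS200/TSS1500/first-exon annotations, and it uses hypothetical identifiers. It is not the authors' implementation, and the Ensembl-identifier matching step that takes precedence in the Methods is not modeled.

```python
from bisect import bisect_left, bisect_right

WINDOW_BP = 2000  # maximum CpG-to-TSS distance stated in the Methods (2 kb)


def link_cpgs_to_tss(cpgs, transcripts, window=WINDOW_BP):
    """Pair each CpG with every transcript whose TSS lies within `window` bp.

    cpgs:        list of (cpg_id, chromosome, position)
    transcripts: list of (transcript_id, chromosome, tss_position)
    Returns a list of (cpg_id, transcript_id) associations.
    """
    by_chrom = {}
    for tx_id, chrom, tss in transcripts:
        by_chrom.setdefault(chrom, []).append((tss, tx_id))
    for entries in by_chrom.values():
        entries.sort()  # sort by TSS position so binary search can be used
    tss_positions = {c: [tss for tss, _ in e] for c, e in by_chrom.items()}

    links = []
    for cpg_id, chrom, pos in cpgs:
        if chrom not in by_chrom:
            continue
        lo = bisect_left(tss_positions[chrom], pos - window)
        hi = bisect_right(tss_positions[chrom], pos + window)
        links.extend((cpg_id, tx_id) for _, tx_id in by_chrom[chrom][lo:hi])
    return links


# Toy coordinates (hypothetical).
cpgs = [("cg1", "chr1", 10_000), ("cg2", "chr1", 15_000),
        ("cg3", "chr2", 500), ("cg4", "chr2", 1_000)]
transcripts = [("T1", "chr1", 11_500), ("T2", "chr1", 12_100), ("T3", "chr2", 2_600)]
links = link_cpgs_to_tss(cpgs, transcripts)
```

Binary search keeps each lookup logarithmic in the number of TSSs per chromosome. That matters at EPIC scale, roughly 850,000 CpGs, where scanning every TSS for every CpG would be slow.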
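The correlation and filtering rules for candidate promoter regions can be illustrated in two steps. First, compute Spearman's rho between a CpG M-value and a transcript CPM across samples. Second, retain promoters that carry at least three CpGs with |rho| > 1/3 at p < 0.001, all in the same direction.

In this minimal Python sketch, the p values are taken as inputs. They are assumed to come from a standard Spearman test, for instance `scipy.stats.spearmanr` or R's `cor.test(method = "spearman")`. The output labels "negative", "positive" and "both" and all identifiers are hypothetical. This is not the authors' code.

```python
from collections import defaultdict

RHO_CUTOFF = 1 / 3  # |rho| threshold stated in the Methods
P_CUTOFF = 0.001    # p value threshold stated in the Methods
MIN_CPGS = 3        # minimum number of concordant CpGs per promoter


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def candidate_promoters(associations):
    """associations: iterable of (transcript_id, cpg_id, rho, p_value) tuples.
    Returns {transcript_id: "negative" | "positive" | "both"}."""
    neg, pos = defaultdict(set), defaultdict(set)
    for tx, cpg, rho, pval in associations:
        if pval >= P_CUTOFF:
            continue
        if rho < -RHO_CUTOFF:
            neg[tx].add(cpg)
        elif rho > RHO_CUTOFF:
            pos[tx].add(cpg)
    result = {}
    for tx in set(neg) | set(pos):
        hits = [label for label, cpg_set in (("negative", neg[tx]), ("positive", pos[tx]))
                if len(cpg_set) >= MIN_CPGS]
        if hits:
            result[tx] = "both" if len(hits) == 2 else hits[0]
    return result


# Toy example (hypothetical values): one CpG and one transcript over 5 samples.
m_values = [2.1, 1.5, 0.3, -0.8, -1.2]
cpm = [3.0, 5.0, 9.0, 12.0, 20.0]
rho_example = spearman_rho(m_values, cpm)  # -1.0: perfectly inverse ranks

assoc = [
    ("TX1", "cg1", -0.60, 1e-4), ("TX1", "cg2", -0.50, 1e-5), ("TX1", "cg3", -0.40, 5e-4),
    ("TX2", "cg4", 0.70, 1e-6), ("TX2", "cg5", 0.60, 1e-6), ("TX2", "cg6", 0.20, 1e-6),
    ("TX3", "cg7", -0.90, 0.01),
]
promoters = candidate_promoters(assoc)
```

In the toy data, TX2 has only two CpGs beyond the rho cutoff, so it is not retained. TX3 fails the p value cutoff.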
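Steps (iii) to (vii) of the linear classifier score, followed by the final standardization and p value, can be sketched in Python as below. The sketch makes three assumptions that the Methods leave open:

- Scaling is done per gene across samples using the sample standard deviation.
- The 99.9th percentile used for trimming is computed over the whole scaled matrix.
- The one-sided test uses the upper tail of the standard normal distribution.

Function names and toy data are hypothetical. The authors' own implementation is in the GitHub repository listed under Code availability, and this independent sketch should not be taken as a substitute for it.

```python
import math
from statistics import mean, stdev


def aggregate_by_gene(features):
    """Step (iii): sum the features (probes, transcripts) that map to the same gene."""
    genes = {}
    for gene, values in features.values():
        if gene in genes:
            genes[gene] = [a + b for a, b in zip(genes[gene], values)]
        else:
            genes[gene] = list(values)
    return genes


def scale_genes(genes):
    """Step (iv): mean-center each gene across samples and divide by its SD.
    Assumption: sample standard deviation, as in R's scale()."""
    scaled = {}
    for gene, values in genes.items():
        m, s = mean(values), stdev(values)
        scaled[gene] = [(v - m) / s if s > 0 else 0.0 for v in values]
    return scaled


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def trim_positive_outliers(genes, q=0.999):
    """Step (v): replace values above the 99.9th percentile with that percentile.
    Assumption: the percentile is taken over the whole scaled matrix."""
    cap = quantile(sorted(v for values in genes.values() for v in values), q)
    return {g: [min(v, cap) for v in values] for g, values in genes.items()}


def linear_classifier_score(gene_values, weights):
    """Equation (3) for one sample S: LCS(S) = (1/n) * sum_i G_i * W_i."""
    return sum(gene_values[g] * w for g, w in weights.items()) / len(weights)


def lcs_pipeline(features, weights):
    """Steps (iii)-(vii), then standardization and a one-sided normal p value."""
    genes = trim_positive_outliers(scale_genes(aggregate_by_gene(features)))
    # Assumption: signature genes absent from the dataset are dropped.
    weights = {g: w for g, w in weights.items() if g in genes}
    n_samples = len(next(iter(genes.values())))
    raw = [linear_classifier_score({g: genes[g][s] for g in weights}, weights)
           for s in range(n_samples)]
    m, s = mean(raw), stdev(raw)
    z = [(x - m) / s for x in raw]
    # Assumption: upper-tail test, p = P(Z >= z) for a standard normal Z.
    p = [0.5 * math.erfc(v / math.sqrt(2)) for v in z]
    return z, p


# Toy data (hypothetical): 4 features, 3 signature genes, 4 samples.
features = {
    "p1": ("UP1", [1.0, 2.0, 3.0, 10.0]),
    "p2": ("UP1", [0.0, 0.0, 1.0, 2.0]),
    "p3": ("UP2", [2.0, 2.0, 4.0, 8.0]),
    "p4": ("DOWN1", [9.0, 7.0, 5.0, 1.0]),
}
weights = {"UP1": +1, "UP2": +1, "DOWN1": -1}  # step (vi)
z, p = lcs_pipeline(features, weights)
```

In this toy example, the fourth sample combines the highest upregulated-gene and the lowest downregulated-gene expression. It therefore receives the largest Z-score and the smallest p value. The p < 0.05 cutoff from the Methods would then be applied to `p`.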
Acknowledgements
The authors would like to thank the divisions of clinical hematology, hematology laboratory and pathology of Nancy (Dr Hélène Busby, Pr Hervé Sartelet, Dr Ludovic Dubouis), Poitiers, Angers, Reims (Dr Pascale Cornillet-Lefebvre), Clermont-Ferrand (Dr Lauren Veronèse and Dr Albane Ledoux-Pilon), Tours (Dr Flavie Arbion), Avicenne, Saint-Louis (Dr Véronique Meignin) and Pitié-Salpêtrière (Dr Frédéric Charlotte and Pr Isabelle Brocheriou). The authors would like to thank the tumor library biological resource centers that provided the biological material: Nancy (BB-0033-00035), Poitiers (BB-0033-00068), Caen (Pr Xavier Troussard), Tours, Clermont-Ferrand, Angers (BB-0033-00038), Reims-Champagne-Ardenne, Besançon (Franck Monnien, Dr Etienne Daguindau) and Bordeaux (Marie-Pierre Fort, Dr Fontanet Bijou). The authors would like to thank Véronique Saunier (direction of research at University Hospital of Nancy) for supporting the project. RNA sequencing was performed by the GenomEast platform, a member of the "France Génomique" consortium (ANR-10-INBS-0009). The authors would like to thank Louis Staudt (Center for Cancer Genomics, National Cancer Institute, Bethesda, MD 20892, USA) for giving access to the data published in Schmitz and colleagues (2018) and Wright and colleagues (2020). The authors would like to thank the members of the ICGC MMML-seq consortium for their contribution to the generation of the DLBCL omics datasets, and the MMML-seq consortium for data access. The authors would like to thank Pr Catherine Wu and Dr Erin Parry for shared expertise and language editing, and Dr Cath Carsberg for English language editing. This work was supported in part by the Cancéropôle Est (J.B., S.H., P.F.), the Ligue contre le Cancer (J.B., P.F.), the University Hospital of Nancy (J.B., P.F.), the association of SILLC patients (J.B., P.F.), and the Association des Chefs de Services of the University Hospital of Nancy (J.B.). E.T., S.S., and R.S. were supported by the DFG (SFB1074 projects B1, B9, and B10). The ICGC MMML-Seq consortium has been supported by the German Ministry of Science and Education in the framework of the ICGC MMML-Seq consortium (01KU1002) and ICGC DE-Mining (01KU1505).

Author contributions
Conception and design: J.B., S.H., P.F., R.S., S.S. Development and methodology: S.H., J.B., E.T., M.K., P.F., J.I.M.-S., R.S., S.S. Provision of samples and clinical data, acquisition and management of patients: C.D., D.R.W., A.Q., O.B., C. Tomowiak, G.L., G.G., S.L., E.C., F.N.K., F.D., A.R., M.-C.B., A.D., O.T., G.O., M.H., C. Thieblemont, R.G., J.I.M.-S. and F.C. Acquisition of data (data production and techniques, provided facilities): J.B., S.H., J.V., M.K., R.H., P.R., C.C., D.M., E.C., C.S., S.L., E.T., S.B., G.O., J.-L.G., ICGC MMML-seq consortium, MMML consortium, P.F., R.S., S.S. Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S.H., J.B., C.M., C.S., A.M., M.K., R.S., S.S. Writing, review and/or revision of the manuscript: J.B., S.H., P.F., R.S. and S.S. wrote the first and revised versions of the paper. All authors critically reviewed and agreed on the final version of the manuscript. Administrative, technical and material support (i.e., reporting or organizing data, constructing databases): S.H., J.B., J.V., C.M., E.C., S.L., E.T., P.L., O.A., ICGC MMML-seq consortium, MMML consortium, P.F., R.S., S.S.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-022-34642-6.

Correspondence and requests for materials should be addressed to Julien Broséus or Stephan Stilgenbauer.

Peer review information Nature Communications thanks Silvia Deaglio and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2023
1Division of CLL, Department of Internal Medicine III, Ulm University, Ulm, Germany.
2Inserm UMRS1256 Nutrition-Génétique et Exposition aux Risques Environnementaux (N-GERE), Université de Lorraine, Nancy, France.
3Université de Lorraine, CHRU-Nancy, Service d'Hématologie Biologique, Pôle Laboratoires, F54000 Nancy, France.
4Institute of Human Genetics, Ulm University & Ulm University Medical Center, Ulm, Germany.
5Fraunhofer Institute for Cell Therapy and Immunology IZI, Leipzig, Germany.
6Department of Haematology, University Hospital of Tours, Tours, France.
7Department of Hematology, Hôpital de la Pitié-Salpêtrière, AP-HP, Paris, France.
8Université de Reims Champagne-Ardenne, IRMAIC, Centre Hospitalier Universitaire de Reims, Hématologie Clinique, Reims, France.
9Department of Hematology, University Hospital of Nancy, Vandoeuvre-lès-Nancy, France.
10Inserm, CHRU, University of Lorraine, CIC Clinical Epidemiology, Nancy, France.
11Department of Clinical Pathology, Robert-Bosch-Krankenhaus, and Dr. Margarete Fischer-Bosch Institute for Clinical Pharmacology, Stuttgart, Germany.
12CHU Angers, Biological Resource Center of Angers (CRB-CHU Angers), BB-0033-00038, Laboratoire d'Hématologie, Angers, France.
13Department of Hematology, CHU Poitiers, Poitiers, France.
14CIC1402 Inserm Poitiers, Poitiers, France.
15Hematology Laboratory, Avicenne Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France.
16Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany.
17Hematology department, Clermont-Ferrand University Hospital, Clermont-Ferrand, France.
18Department of Biopathology CHRU-ICL, BBB, CHRU Nancy, Vandoeuvre-lès-Nancy, France.
19Biological Resource Center of Nancy, BB-0033-00035, CHRU de Nancy, Nancy, France.
20Sorbonne Université, Cytogénétique Hématologique, Hôpital Pitié-Salpêtrière, AP-HP, Paris, France.
21Centre de Recherche des Cordeliers, INSERM, Université Sorbonne Paris Cite, Université Paris Descartes, Université Paris Diderot, F-75006 Paris, France.
22CHRU of Nancy, Service de Biochimie-Biologie Moléculaire-Nutrition, Pôle Laboratoires, F54000 Nancy, France.
23Hematology Department, Hôpital Pitié-Salpêtrière, AP-HP, Sorbonne University, Paris, France.
24Department of Hematology, University Hospital of Angers, Angers, France.
25Institute of Pathology, University Hospital of Würzburg, Bavaria, Germany.
26Hematology Biology, University Hospital of Nantes, Hôtel-Dieu, France.
27Inserm 1232 Centre de Recherche en Cancérologie et Immunologie Nantes Angers (CRCINA), Nantes, France.
28Department of Hematology, Hôpital Saint-Louis, Paris, France.
29Division of Molecular Genetics, German Cancer Consortium (DKTK) and National Center for Tumor Diseases (NCT) Heidelberg, German Cancer Research Center (DKFZ), Heidelberg, Germany.
30Biomedical Epigenomics Group, Institut d'investigacions Biomèdiques August Pi I Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain.
31Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
32These authors contributed equally: Julien Broséus, Sébastien Hergalant.
33These authors jointly supervised this work: Pierre Feugier, Reiner Siebert, Stephan Stilgenbauer.
Ole Ammerpohl4, Stephan Bernhart16, Markus Kreuz5, Peter Lichter29, German Ott11, Andreas Rosenwald25, Reiner Siebert4,33 & Stephan Stilgenbauer1,33
A full list of members and their affiliations appears in the Supplementary Information.