Nature
https://doi.org/10.1038/s41586-024-07054-3
Received: 13 June 2022
Accepted: 10 January 2024
Published online: xx xx xxxx
Lei Chen1,14 ✉, Chong Zhang2,14, Ruidong Xue3,4,14, Mo Liu5,14, Jian Bai6,14, Jinxia Bao7,14, Yin Wang6,14, Nanhai Jiang5, Zhixuan Li1, Wenwen Wang8, Ruiru Wang6, Bo Zheng1,8, Airong Yang6, Ji Hu1,8, Ke Liu6, Siyun Shen1,8, Yangqianwen Zhang1, Mixue Bai1, Yan Wang6, Yanjing Zhu1,8, Shuai Yang1,8, Qiang Gao9, Jin Gu10, Dong Gao11, Xin Wei Wang12, Hidewaki Nakagawa13, Ning Zhang3,4, Lin Wu6 ✉, Steven G. Rozen5 ✉, Fan Bai2 ✉ & Hongyang Wang1 ✉
Previous genomic analyses of HCC in Chinese individuals are limited in cohort size and focus mainly on the exome11–14, precluding detailed investigations at the whole-genome level. Recently, the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium analysed the genomic complexity of cancer at a considerable scale4–8. Nevertheless, the relatively shallow sequencing depth could not fully resolve the subclonal structure of the HCC genome. Here, in the CLCA, we performed deep whole-genome sequencing (WGS) analysis of 494 HCC tumours (average depth, 120×), as well as of the matched control blood samples (average depth, 36×). Our cohort comprised 427 men (86.4%) and 67 women (13.6%). In comparison to the PCAWG-HCC (n = 248) cohort, the CLCA cohort had higher proportions of HBV infection (94.5% versus 30.6%) and Edmondson–Steiner grades 3 and 4 (85.6% versus 12.1%), but lower proportions of hepatitis C virus (HCV) infection (2.6% versus 55.6%), alcohol drinking (26.7% versus 58.1%) and smoking (36.8% versus 53.6%) (Extended Data Fig. 1a,b, Supplementary Table 1 and Supplementary Note 1). These statistics represent the epidemiology of the Chinese population with liver cancer, highlighting the necessity of the current study. After stringent quality control, a total of 9,287,828 somatic mutations were identified, with a median of 13,735.5 mutations and 95 nonsynonymous mutations for each tumour (Fig. 1). We also performed RNA sequencing (RNA-seq) analysis of 239 tumours from this cohort (Supplementary Table 2).
1National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China. 2Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for
Genomics (ICG), School of Life Sciences, Peking University, Beijing, China. 3Peking University-Yunnan Baiyao International Medical Research Center, International Cancer Institute, Department
of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China. 4Translational Cancer Research Center, Peking University First Hospital,
Beijing, China. 5Centre for Computational Biology and Programme in Cancer & Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore. 6Berry Oncology Corporation, Beijing,
China. 7Model Animal Research Center, Medical School, Nanjing University, Nanjing, China. 8The International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery
Hospital, Shanghai, China. 9Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China. 10MOE Key Laboratory for
Bioinformatics, Department of Automation, Tsinghua University, Beijing, China. 11State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence
in Molecular Cell Science, CAS, Shanghai, China. 12Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA. 13Laboratory for Cancer
Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan. 14These authors contributed equally: Lei Chen, Chong Zhang, Ruidong Xue, Mo Liu, Jian Bai, Jinxia Bao, Yin Wang.
✉e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Nature | www.nature.com | 1
Article
[Figure 1 graphic not reproduced. Recoverable content follows.
Panel a (research strategy): CLCA (HCC, n = 494); deep WGS (~120×, n = 494) and RNA-seq (n = 239); analyses of point mutations (coding and non-coding: promoter, 5′ UTR, 3′ UTR, lncRNA, lncRNA promoter), mutational signatures, rearrangements (HBV, ecDNA, chromothripsis, chromoplexy, kataegis) and evolutionary history.
Panel b (candidate drivers; mutational frequency in %, with the Q value for clonal versus subclonal enrichment in parentheses where one is shown):
Coding (n = 23): TP53 51 (Q = 2.27 × 10^−10), CTNNB1 21, ALB 15 (Q = 1.36 × 10^−4), AXIN1 12, ARID1A 10, RB1 7.3, TSC2 6.3, ARID2 5.9, JAK1 5.7, KEAP1 5.5, BRD7 4.9, FGA 4.5, TSC1 3.8, ACVR2A 3.4, PTEN 3.2, RPS6KA3 3.2, HNF1A 2.8, PRDM11 2.8, CDKN2A 2.2, CDKN1B 2.0, BMP5 1.6, RPL22 1.2, ECHS1 0.4.
Promoter (n = 6): TERT 35, ZNF595 12 (Q = 3.24 × 10^−54), KCNJ12 7.3 (Q = 2.54 × 10^−10), ALB 4.9 (Q = 0.079), KHNYN 3.4, OR2A7 1.2 (Q = 1.36 × 10^−4).
lncRNA (n = 8): NEAT1 37, G035338 8.5, Z95704.4 4.9 (Q = 2.17 × 10^−16), RMRP 2.0, G085970 1.8, RN7SK 1.8, G032906 1.2, RNU12 1.2.
lncRNA promoter (lncP; n = 4): Z95704.4 5.1 (Q = 5.33 × 10^−17), RP11−1151B14.3 4.3, RMRP 4.3, G085970 2.2.
3′ UTR (n = 8): ADH1B 6.3 (Q = 0.044), PPP1R12B 5.9 (Q = 0.056), FGA 4.5, SEC14L2 4.3, SERPINA1 3.4, ADH4 1.8, RABGEF1 1.8, KCTD6 1.2.
5′ UTR (n = 5): PPP1R10 1.6, HIST1H4C 1.2, POLR2A 1.2, SERBP1 1.0, HIST1H1E 0.8.
Clinical annotation tracks: sex, hepatitis virus status (HBV, HCV, NBNC, HBV and HCV), BCLC stage (0, A, B, C), cirrhosis/fibrosis (normal, fibrosis, cirrhosis), Edmondson grade (levels I–IV), multiple lesions, smoking, alcohol, recurrence.
Tumour groups: group 1 (418), group 2 (38), group 3 (38).]
Fig. 1 | Candidate driver landscape. a, The research strategy. The diagram was created using BioRender. WT, wild type. b, The candidate driver landscape of the CLCA. The top two graphs show the number of all mutations and nonsynonymous mutations identified in each tumour, followed by annotation of clinical variables. BCLC, Barcelona Clinic Liver Cancer staging system. In total, 23 candidate drivers identified in coding regions and 31 candidate drivers identified in non-coding regions are listed, and the mutational frequency (%) is shown next to the gene IDs. The mutation types are indicated on the right and n denotes the number of drivers in the category. lncP, lncRNA promoter; NBNC, double negative for HBV and HCV; NA, not available. Orange gene symbols indicate previously undescribed drivers identified in the CLCA. Underlined drivers are those identified as a driver in different forms. Group 1 had drivers in both coding and non-coding regions, whereas group 2 had drivers only in non-coding regions. Tumours in group 3 had no identified drivers but other somatic mutations. The number of individual tumours included is denoted for groups 1–3. The bar plot on the left shows the clonal and subclonal mutational frequencies of each gene. Statistical analysis was performed using two-sided Fisher’s exact tests with the Benjamini–Hochberg multiple-hypothesis test. Q values are shown next to the bars. A threshold of Q < 0.1 was used for significance.
[Figure 2 graphic not reproduced. Recoverable content follows.
Panel a (mutational profiles over C>A, C>G, C>T, T>A, T>C and T>G; y axis, percentage of mutations (%)): SBS_H8 (T>A, 42.5%; T>C, 21.3%; cosine similarity to SBS22, 0.71); SBS_H2 (T>A, 85.4%; T>C, 6.6%; cosine similarity to SBS22, 0.99); AA-exposed cell lines.
Panel b (transcriptional strand bias): Q values are shown for SBS_H8, SBS_H2 and AA-exposed cell lines.
Panel c (pentanucleotide context of T>A mutations): axes, preceding bases (5′ 2 bp) and percentage of mutations (%).
Panel e (ID_H3 (AA) profile): categories are 1 bp deletions, 1 bp insertions, >1 bp deletions at repeats, >1 bp insertions at repeats and deletions with microhomology; y axis, percentage of mutations (%).
Panel f: axes SBS_H2, DBS_H2 and ID_H3; CLCA tumours and AA-exposed cell lines are plotted.]
Fig. 2 | Previously undescribed mutational signatures. a–c, Comparison of the mutational profile (a), transcriptional strand bias (b) and pentanucleotide context of T>A mutations (c) of SBS_H8, SBS_H2 and AA-exposed cell lines. Cosine similarity to COSMIC SBS22 is denoted. Statistical analysis was performed using two-sided binomial tests with Benjamini–Hochberg correction for multiple comparisons. Trans, transcribed strand; Untrans, untranscribed strand. d,e, The mutational profiles of the signatures DBS_H2 (d) and ID_H3 (e), both related to AA. f, The correlation between the numbers of mutations associated with SBS_H2, DBS_H2 and ID_H3. The grey plane is the linear regression plane with projection lines showing residuals (red, positive; blue, negative).
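The cosine similarity used to compare signatures with COSMIC SBS22 can be sketched as follows. The example collapses each signature to six substitution classes; the T>A and T>C fractions are those annotated in Fig. 2a, whereas the even split of the remaining mass over the other four classes is an assumption made only for illustration. Real SBS signatures are compared over 96 trinucleotide channels, so this toy value is not expected to reproduce the 0.71 reported in the figure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length non-negative vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Class order: C>A, C>G, C>T, T>A, T>C, T>G.
# T>A and T>C come from the figure annotation; the other four entries are
# an ASSUMED even split of the remainder (hypothetical, for illustration).
sbs_h8_toy = [0.0905, 0.0905, 0.0905, 0.425, 0.213, 0.0905]
sbs_h2_toy = [0.02, 0.02, 0.02, 0.854, 0.066, 0.02]

similarity = cosine_similarity(sbs_h8_toy, sbs_h2_toy)
```

A larger T>C share in one spectrum pulls the two vectors apart, which is the qualitative effect the text attributes to the T>C component of SBS_H8.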
Two coding drivers, TP53 and ALB, were enriched with clonal mutations. By contrast, 62.5% (5 out of 8) of non-coding drivers were enriched with subclonal mutations, including the promoters of ZNF595, KCNJ12 and OR2A7, and the lncRNA and lncRNA promoter of Z95704.4. No significant association between tumour purity and the percentage of clonal drivers was observed across our cohort (Extended Data Fig. 1h), showing that our clonality analysis is not confounded by tumour purity. The identification of subclonal non-coding drivers highlighted the strength of high-depth WGS data in investigating the non-coding genome, partially explained the low number of non-coding drivers identified in previous low-depth WGS studies, and motivated us to systematically investigate the subclonal events in our cohort. Furthermore, a ratio of mutated nonsynonymous (dN) to synonymous (dS) sites (dN/dS) higher than 1 for all mutations was observed for both clonal and subclonal coding drivers (Extended Data Fig. 1i), confirming that these drivers are shaped by positive selection, consistent with previous pan-cancer analyses17–19.

SBS_H8 is a novel signature

We identified 17 single-base substitution (SBS), 3 doublet-base substitution (DBS) and 8 small insertion-and-deletion (ID) signatures (Extended Data Figs. 1j and 2–4). In comparison to COSMICv3.2, five signatures were novel (Supplementary Table 3 and Supplementary Note 3): one SBS signature, SBS_H8; two DBS signatures, DBS_H1 and DBS_H2; and two ID signatures, ID_H3 and ID_H8. DBS_H1 consisted mainly of [C/G/T]C>NN mutations. This signature was found in most tumours and correlated with age as well as other age-related signatures (Extended Data Fig. 3d,f). ID_H8 showed mostly 1 bp cytosine deletions and thymine insertions. It was exclusively found in SBS_H3-positive (COSMIC SBS24) tumours and correlated with SBS24 (Extended Data Fig. 3e,g), suggesting its relevance to aflatoxin exposure.

Notably, SBS_H8 was dominated by T>[A/C] mutations with significant transcriptional strand bias (Fig. 2a–c). Although the pattern of T>A mutations in SBS_H8 was similar to that of aristolochic acid (AA)-related COSMIC SBS22, SBS_H8 also contained a substantial proportion of T>C mutations (21.3%), together leading to an overall cosine similarity of 0.71 between SBS_H8 and SBS22. The low pentanucleotide context cosine similarity of 0.61 further supported that SBS_H8 was a novel signature rather than a combination of SBS22 and other signatures (Extended Data Fig. 3b). SBS_H8 was present in 57.1% (282 out of 494) of CLCA cases, suggesting the prevalence of this previously undescribed signature of HCC in Chinese individuals. High co-occurrence between SBS_H8 and SBS_H2 (SBS22) indicated that the aetiological factor of SBS_H8 might often co-exist with AA. SBS_H8 is present in only 1 out of 326 (0.31%) PCAWG-HCC cases and potentially in chronic liver disease20. These results supported the existence of this signature and its enrichment in HCCs in Chinese individuals.

As for AA, we not only found the well-established SBS_H2, but also identified two previously undescribed types of AA signatures, DBS_H2 and ID_H3 (Fig. 2d,e). DBS_H2 consisted primarily of TA>NT, TC>AA, TG>AN and TT>AA mutations. ID_H3 showed mainly 1 bp and 2 bp deletions in short repeats. Both DBS_H2 and ID_H3 were almost exclusively found in SBS_H2-positive (SBS22) tumours and were highly correlated with SBS_H2 activity (Fig. 2f). To test whether SBS_H2, DBS_H2 and ID_H3 are directly caused by AA exposure, we treated two cancer cell lines, MCF-10A and HepG2, with sublethal concentrations of AA1 (the major component of AA). The mutational spectrum of each clone showed the presence of SBS_H2, DBS_H2 and ID_H3 (Supplementary Fig. 1), confirming that these mutational signatures can be caused by AA exposure. These findings complemented the AA signature spectrum
[Figure 3 graphic not reproduced. Recoverable content follows.
Panel a: amplicon categories are circular (ecDNA; 27%), BFB, heavily rearranged, linear and no fSCNA.
Panel b (frequency of genes amplified in ecDNA): CCND1, EXT1, MYC, RAD21, NDRG1, UBR5, COX6C, RECQL4, MUC1, TPM3, NCOA2, NTRK1, PBX1, PRCC, FCGR2B, HEY1, SDHC, CHCHD7, ARNT, MET.
Panel c: percentage of PFS versus time (days), stratified by ecDNA (yes or no), with number at risk; P = 0.035.
Panel d: copy number of HBV, ecDNA versus others; P = 0.0079.
Panel e: expression (TPM); P = 0.031.
Panel f: copy number and coverage tracks for HBV-containing amplicons, including CLCA_0109 amplicon 1; labelled loci include TERT (chr. 5) and GATA3 (chr. 10).
Panel g: CIRCLE-seq coverage.]
Fig. 3 | ecDNA analysis. a, The proportion of different amplicons across the CLCA cohort. Circular, breakage–fusion–bridge (BFB), heavily rearranged and linear, and no focal somatic copy-number amplification detected (fSCNA) amplicon categories are shown. b, The top frequently amplified genes detected in ecDNA. c, Progression-free survival (PFS) of patients in the CLCA stratified by the existence of ecDNA. Statistical analysis was performed using log-rank tests. d,e, Comparison of the copy number (d) and RNA expression (e) of HBV between circular amplicons and other amplicons. For the box plots, the centre line shows median, the box limits indicate the upper and lower quartiles, and the whiskers extend to 1.5× the interquartile range; data beyond the end of the whiskers are outlying points that are plotted individually. n denotes biologically independent samples. Statistical analysis was performed using two-sided Student’s t-tests. TPM, transcripts per million. f, Two representative ecDNA amplicons involving HBV segments detected in two patients. g, CIRCLE-seq reads supporting the structure of ecDNA. Chr., chromosome.
and revealed the diverse paths of AA mutagenesis. However, notably, SBS_H8 was not found in the mutational spectrum of AA1-treated cell clones (Fig. 2a–c), which further supported that SBS_H8 was not associated with AA exposure.

Unsupervised hierarchical clustering based on mutational signatures classified 494 tumours into 5 clusters (Extended Data Fig. 3h and Supplementary Note 4). SBS_H8 contributed most to cluster V, which was enriched with CTNNB1 mutations (Extended Data Fig. 3i,j). Higher percentages of SBS_H8 were significantly associated with poorer prognosis (Extended Data Figs. 3k and 5a), implying that the underlying aetiology of SBS_H8 might be a carcinogen of the liver. We also analysed the contribution of mutational processes to driver genes and hotspot mutations (Extended Data Fig. 4). Focusing on SBS_H8, JAK1 and CTNNB1 were the top coding drivers and the ALB promoter was the top non-coding driver. Multiple mutation hotspots of CTNNB1, JAK1 S729C and TP53 H193R were affected by SBS_H8. Moreover, multiple hotspots of TP53 were associated with aflatoxin, while the TP53 H179L hotspot was associated with AA exposure. SBS_H8, as well as other signatures related to exogenous factors such as SBS_H2 (AA), SBS_H3 (aflatoxin), DBS_H2 (AA), ID_H3 (AA), SBS_H10 (tobacco) and ID_H8 (aflatoxin), were enriched for clonal mutations compared with subclonal mutations, suggesting that they occurred at earlier stages of tumorigenesis.

HBV integration in ecDNA

Our deep WGS data enabled a comprehensive profiling of genomic rearrangements, including copy-number alterations (CNAs), structural variations (SVs), HBV integrations, extrachromosomal circular DNA (ecDNA) and three forms of clustered alterations: kataegis, chromothripsis and chromoplexy (Extended Data Figs. 5 and 6 and Supplementary Note 5). ecDNA was detected in 27.3% of CLCA tumours (Fig. 3a and Supplementary Table 4), significantly higher than that reported in PCAWG-HCC (13.1%, P = 3 × 10^−4; two-sided Fisher’s exact test)21. A total of 76 oncogenes was detected in ecDNA, including HCC driver genes such as MYC (Fig. 3b and Extended Data Fig. 5d). Oncogenes in ecDNA had higher copy numbers and elevated gene expression compared with their counterparts not in ecDNA (Extended Data Fig. 5e,f). The presence of ecDNA was associated with a poor prognosis (Fig. 3c and Extended Data Fig. 5a). Notably, we identified ecDNAs incorporating HBV segments (HBV-ecDNA) in seven patients (Fig. 3d–f) affecting well-known oncogenes such as TERT. HBV segments in ecDNA showed an elevated number of copies, as well as increased expression levels. Although HBV-TERT integration has been identified in HCC, our results demonstrated that these integrations can exploit the circular structure of ecDNA and therefore amplify to hundreds of copies. The existence of ecDNA was successfully validated (Fig. 3g). Collectively, these results suggest that ecDNA-based amplification22 may have an important role in HBV-associated HCC.

Subclonal catastrophic events

Clustered mutational processes, including chromothripsis23, chromoplexy7,24 and kataegis25, are genomic alterations that are often generated in a single catastrophic event. These alterations are often described as clonal events and support the punctuated evolution of tumours24,25. Whether these clustered alterations could be subclonal events and occur late during tumour evolution remains less explored. We investigated the clonal status of these events with our high-depth WGS data of the CLCA (Extended Data Fig. 6).

We observed chromothripsis in 30.2% of cases (Supplementary Table 4), comparable to that of PCAWG-HCC (32.2%)26. Among those, 61% of high-confidence events affected multiple chromosomes (for example, CLCA_0119), whereas 22% affected only a single chromosome (for example, CLCA_0090) (Fig. 4a). Chromoplexy was observed in 10.1% of CLCA cases; 8.3% of cases contained a single event (such as CLCA_0489) and 1.8% contained multiple events (for example, CLCA_0232) (Fig. 4b). In total, 364 kataegis events were identified in
[Figure 4 graphic not reproduced. Recoverable content follows.
Panel a (circos plots, chromothripsis): single chromosome (CLCA_0090) and multi-chromosome (CLCA_0119); CN state (CN > 2, CN = 2, CN < 2); patterns of SVs: head to head (+/+), tail to tail (−/−), deletion like (+/−), duplication like (−/+).
Panel b (circos plots, chromoplexy): CLCA_0489 (single) and CLCA_0232 (multiple).
Panel c (rainfall plots): CLCA_0247 chr. 1 (single) and CLCA_0285 chr. 5 (multiple); tracks are reads per Mb, log10(intervariant distance), BAF and copy number against position (Mb).
Panel d: clonal status of kataegis events (clonal, mixed, subclonal); percentages shown are 71%, 16% and 13%.
Panel e: number of mutations against cancer cell fraction for kataegis and other mutations; number of kataegis events against depth (30, 50, 70, 90).
Panel f: timing categories (clonal early, clonal late, clonal unspecified, subclonal) for kataegis, chromothripsis and chromoplexy.
Panel g: odds ratio (axis 0.3, 1.0, 3.0) for clonal/subclonal and early/late, for chromothripsis, chromoplexy and kataegis.]
Fig. 4 | Genomic rearrangement. a, Circos plots for chromothripsis events. CN, copy number. b, Circos plots for chromoplexy events. Arcs in the same colour denote regions that are involved in the same chromoplexy event. c, Rainfall plots for kataegis events and related SVs and CNAs. BAF, B allele frequency. d, The clonal status composition of kataegis events. Mixed events are indicated in grey. e, The clonal status of kataegis events. Top, the cancer cell fraction distribution of non-kataegis and kataegis mutations. Bottom, the detected kataegis events at different sequencing depths (simulated in silico). f, The timing of three types of clustered alteration events. g, The relative odds of clustered alterations being clonal or subclonal are shown with bootstrapped 95% confidence intervals (top). Bottom, the relative odds of the events being early or late clonal are shown as above.
33.6% of CLCA cases, and 14.6% of cases had multiple kataegis events. We observed the occurrence of kataegis and oscillations in copy-number states, suggesting that localized hypermutation could be associated with regional SVs and chromothripsis27 (Fig. 4c and Extended Data Fig. 7a). Kataegis events were highly enriched in cases with APOBEC signatures (Extended Data Fig. 5g). In total, 46 (13%) kataegis events occurring in 32 cases (6.5%) were subclonal events (Fig. 4d). This result was distinct from that reported by PCAWG-HCC, in which all kataegis events were clonal events, suggesting that kataegis may be subclonal and occur late during hepatocarcinogenesis. In silico analysis further showed that the detected number of kataegis events increased along with the sequencing depth (Fig. 4e), corroborating that our high-depth WGS enabled the detection of subclonal kataegis events. Furthermore, timing analysis showed that 15.1% of kataegis, 67.2% of chromothripsis and 62.7% of chromoplexy events were determined to be subclonal events (Fig. 4f). Although all of these forms of clustered alterations tended to be clonal rather than subclonal, the broad distribution of odds ratios suggests that these events could occur at various timings during tumorigenesis (Fig. 4g).

[...] more to the subclonal diversification (Extended Data Fig. 7c). In the CLCA, the earliest events were PPP1R12B 3′ UTR mutation and 17p loss, followed by mutations in TP53, ARID2 and the ADH1B 3′ UTR (Extended Data Fig. 7d). By contrast, TP53 mutation was found to be the earliest mutational event in the PCAWG (ref. 8). Notably, TERT promoter mutations were among the latest events, which was distinct from the observation that TERT promoter mutation was an early event in HCC in European individuals10. These results revealed the distinct evolutionary history of the Chinese CLCA HCC cohort and highlighted the early and pervasive contributions of non-coding mutations during HCC progression. Moreover, the SBS signatures related to tobacco, aflatoxin and AA exposure (SBS_H10, SBS_H3 and SBS_H2), as well as the previously undescribed signature SBS_H8, tended to occur early across all cases (Extended Data Fig. 7e), consistent with that shown in Extended Data Fig. 4h. Furthermore, stratification based on cluster V (SBS_H8), alcohol and smoking revealed distinct evolutionary histories associated with aetiology (Extended Data Fig. 7f and Supplementary Fig. 2). Notably, FGA mutations were among the earliest drivers in patients in cluster V, patients who drink alcohol and patients who smoke.
[Figure 5 graphic not reproduced. Recoverable content follows.
Panels a–d (CLCA): FGA expression as log10[FGA TPM + 1] in altered (n = 134) versus WT (n = 105) tumours, P = 1.06 × 10^−5; paired tumour versus normal comparisons of FGA mRNA, FGA intensity by WB and FGA by IHC (group sizes shown: n = 48, n = 47 and n = 39 pairs), with P = 9.48 × 10^−17, P = 3.86 × 10^−15 and P = 1.58 × 10^−14.
Panel e: FGA prime editing at P289L, P362R and D793H, unedited versus edited traces.
Panel h: relative proliferation rate (CCK8) against time (days) for WT, P289L, P362R and D793H.
Panel i: migration and related assays for WT, P289L, P362R and D793H.
Panel j: tumour volume (mm3) against time (days), shCtrl versus shFGA; P = 8.34 × 10^−9.
Panel k: H&E, Ki-67, FGA and pTYK2 (Y1054) staining, shCtrl versus shFGA.
Panel l: blots for pTYK2 (Y1054), TYK2, pSTAT3 (Y705), STAT3, GAPDH, lamin A and lamin C.
Panel m: IL-6 concentration, shCtrl versus shFGA.
Panel n: FGA intensity against IL-6 concentration in tissue (pg ml–1); R = −0.262, P = 0.0273.]
Fig. 5 | FGA dysfunction facilitates HCC progression. a,b, FGA expression between altered and WT tumours (a) and between paired tumours and normal tissues (b). c,d, FGA protein in paired tumour and normal samples was compared using western blot (WB; c) and immunohistochemistry (IHC; d) analysis. e, Sanger sequencing plots of edited sites in the FGA coding region. f, Quantitative PCR with reverse transcription (RT–qPCR) analysis of FGA mRNA across HepG2 WT and mutated cell lines. n = 3 per group. g, Western blot analysis of FGA. h,i, Comparison of the proliferation (h), and migration, invasion and self-renewal (i) abilities across FGA-edited cell lines. Each assay was repeated three times independently and representative images are shown. For i, scale bars, 100 μm (top and middle) and 3 mm (bottom). j, In vivo cell proliferation assay comparing xenograft tumours of shCtrl (n = 6) and shFGA (n = 7) PLC/PRF/5 cells. Growth curves are shown. k, Representative haematoxylin and eosin (H&E) and immunohistochemistry staining of tumour samples in j. Scale bars, 200 μm (main images) and 25 μm (magnified images). l, The subcellular localization of pTYK2 and pSTAT3. GAPDH (cytoplasmic reference) and lamin A/C (nuclear reference). m, The IL-6 concentration in the supernatant. n = 3 per group. n, Two-tailed Pearson correlation analysis of FGA protein and IL-6 concentration (n = 71). For all panels, n denotes biologically independent samples. For the box plots in a–c, the centre line shows the median, the box limits indicate the upper and lower quartiles, and the whiskers extend to 1.5× the interquartile range; data beyond the whiskers are outlying points. For f, h, j and m, data are mean ± s.e.m. Statistical analysis was performed using two-sided Student’s t-tests (a, f, i and m), two-sided paired t-tests (b–d) and two-way analysis of variance (h and j). Gel source data are provided in Supplementary Figs. 3–5.
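For the Pearson correlation in panel n, the link between a correlation coefficient and its test statistic can be sketched as follows. The sketch uses R = −0.262 as printed in the panel and n = 71 from the caption, and computes only the t statistic on n − 2 degrees of freedom; converting it into a two-tailed P value needs the t distribution, which is not in the Python standard library and is therefore left out. The function names are illustrative, not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Values from panel n and its caption: R = -0.262, n = 71.
t_stat = t_from_r(-0.262, 71)  # approximately -2.26 on 69 degrees of freedom

# Sanity check of pearson_r on a perfectly anti-correlated toy series.
r_toy = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

The magnitude of the t statistic grows with both the strength of the correlation and the sample size, which is why a modest R can still reach significance in a cohort of 71 samples.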
These alterations affected various metabolic programs, including hepatic metabolism (APOB, ALB and HNF1A), oxidative stress (KEAP1 and NFE2L2), urea metabolism (CPS1), alcohol metabolism (ADH1B and ADH4), fatty acid metabolism (SERPINA1 and SERBP1) and hypoxia (ARNT). FGA in the JAK–STAT pathway also has a role in hepatic metabolism. Given that the liver is a key metabolic organ and metabolism dysregulation is an important feature of liver cancer20,28, this result underlined the necessity of weighting the contribution of non-coding alterations to investigate the metabolic status of HCC.

KCNJ12 and PPP1R12B

To investigate whether the candidate non-coding drivers have tumorigenic functions, we selected three representative drivers to perform functional assays, including KCNJ12 (potassium inwardly rectifying channel subfamily J member 12), PPP1R12B (protein phosphatase 1 regulatory subunit 12B) (Extended Data Fig. 9, Supplementary Table 7 and Supplementary Note 8) and FGA (Fig. 5, Extended Data Fig. 10 and Supplementary Figs. 3–5). PPP1R12B is one of the earliest driver events, whereas KCNJ12 is one of the latest driver events during the evolutionary history of HCC. Low expression of PPP1R12B significantly enhanced tumour migration, invasion, self-renewal and cell proliferation (Extended Data Fig. 9a). Using the prime editing technology, we showed that point mutations of PPP1R12B identified in the CLCA led to lower mRNA expression and were sufficient to cause phenotypic changes (Extended Data Fig. 9b–e). KCNJ12 disruption significantly impaired tumour migration, invasion, self-renewal and cell proliferation (Extended Data Fig. 9f). Point mutations in KCNJ12 led to a higher level of mRNA expression and subsequent phenotypic changes (Extended Data Fig. 9g–j). These data validated that PPP1R12B and KCNJ12 are non-coding drivers of HCC.

FGA dysfunction promotes HCC

Next, we investigated the biological functions of a candidate driver, FGA, which was determined independently as both a candidate coding and non-coding driver (Fig. 5 and Extended Data Fig. 10a). In the CLCA, FGA alterations, including point mutations, loss of heterozygosity and copy-number loss, could all result in a reduced expression level (Fig. 5a). Meanwhile, the mRNA and protein levels of FGA were lower in tumours compared with the levels in normal tissues (Fig. 5b–d and Extended Data Fig. 10b–d). Furthermore, the rate of biallelic inactivation for FGA was comparable to other recurrently mutated tumour suppressor genes of HCC in the CLCA (Supplementary Table 1). We therefore speculated that FGA is a tumour suppressor gene and explored the potential role of FGA dysfunction in HCC progression.

Induction of FGA point mutations leads to lower mRNA and protein expression and enhanced tumour progression (Fig. 5e–i and
Extended Data Fig. 10e–h). Consistent phenotypes were confirmed in FGA-disrupted cell lines (Extended Data Fig. 10i,j). Furthermore, an in vivo assay by subcutaneous injection of short hairpin RNA against FGA (shFGA) cells into BALB/c nude mice resulted in larger and more aggressive tumours in comparison to those of mice injected with shCtrl cells (Fig. 5j,k and Extended Data Fig. 10k). Phosphorylated tyrosine kinase 2 (pTYK2) and its target protein signal transducer and activator of transcription 3 (STAT3, Tyr705) were identified as the top downstream signals of FGA (Extended Data Fig. 10l–n). We also found that pTYK2 accumulated more in the cytoplasm than in the nucleus (Fig. 5l). A specific inhibitor of pTYK2 (BMS-986165), rather than AKT inhibitors, attenuated the migration ability of shFGA cells (Extended Data Fig. 10o). These results suggested that FGA dysfunction might not activate AKT signalling in HCC. We further checked the expression of interleukin-6 (IL-6), a downstream signal of STAT3. The levels of IL6 mRNA and cellular supernatant IL-6 protein were significantly higher in shFGA cells than in shCtrl cells (Fig. 5m and Extended Data Fig. 10p,q). Significant negative correlations between FGA and TYK2 phosphorylation, as well as between FGA and IL-6 concentration, were confirmed in an independent HCC cohort (Fig. 5n and Extended Data Fig. 10r). Taken together, our results support that FGA is a tumour suppressor and FGA mutations could promote hepatocarcinogenesis by activating the TYK2–STAT3–IL6 circuit, which could be a potential target for HCC intervention and clinical treatment (Extended Data Fig. 10s).

Discussion

Here we depict a comprehensive whole-genome landscape of HBV-enriched HCC in Chinese individuals. Our high-depth WGS data enabled the identification of previously undescribed candidate non-coding drivers, mutational signatures and subclonal catastrophic events, and the pervasive contribution of non-coding events during HCC evolution. Many of our findings, including the SBS_H8 signature, HBV-ecDNA and distinct aetiology-related evolutionary histories, were highly dependent on the differences between tumours of Chinese and non-Chinese individuals with HCC. These findings shed light on the relative timing of diverse underlying aetiological factors. The identification of subclonal kataegis, chromothripsis and chromoplexy showed that these catastrophic genomic alterations could occur with variable timing during HCC evolution, consistent with the reported combined punctuated and gradual clonal evolution in HCC29. Furthermore, multiple non-coding drivers were mapped to the evolutionary history of CLCA tumours, while the PCAWG reports only one non-coding driver. Our results reconstructed a high-resolution evolutionary history for HCC.

HBV integration has been extensively reported in the HBV-positive tumours of Chinese patients with liver cancer, with hotspots identified in TERT and KMT2B (refs. 12,30). However, the manner in which these integrations localize in the genome has not been comprehensively assessed. Here we showed that these HBV integrations could be cyclized as ecDNAs. ecDNA amplifications lead to higher levels of oncogene transcription in comparison to copy-number-matched linear DNA21 and they are characterized by enhanced chromatin accessibility31. We identified HBV–oncogene–ecDNA structures, and observed consistently elevated copy numbers and gene expression of HBV together with targeted oncogenes. These results revealed a mechanism of HBV integration in HCC tumorigenesis.

We report a comprehensive genomic landscape of HCC in Chinese individuals covering multiple classes of somatic alterations. How these different genetic alterations cooperate with the diverse immune and stromal cell types in the tumour microenvironment32 is worth in-depth investigation. Collectively, our CLCA study is a valuable resource that provides important biological insights into HCC carcinogenesis and clinical implications for HCC diagnosis and treatment.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-024-07054-3.
genomic alterations and processes that are enriched in the tumours
of Chinese individuals with HCC. On the other hand, many potential 1. Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and
mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249
driver events, including candidate driver genes, mutational processes (2021).
and clustered alterations were shared among our CLCA cohort, the 2. Llovet, J. M. et al. Hepatocellular carcinoma. Nat. Rev. Dis. Primers 7, 6 (2021).
PCAWG-HCC and TCGA-HCC cohort, suggesting universal processes 3. Villanueva, A. Hepatocellular Carcinoma. N. Engl. J. Med. 380, 1450–1462 (2019).
4. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis
of HCC pathogenesis. In this regard, our findings of previously unde- of whole genomes. Nature 578, 82–93 (2020).
scribed non-coding candidates, signatures related to AA and aflatoxin, 5. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes.
and subclonal clustered alterations are largely due to the higher depth Nature 578, 102–111 (2020).
6. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature
of the CLCA compared with that of other HCC WGS studies (around 578, 94–101 (2020).
30–40×). These findings should therefore also apply to other HCC 7. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578,
cohorts. Notably, 28 non-coding drivers identified in our cohort were 112–121 (2020).
8. Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128
previously unreported for HCC, suggesting that our understanding of (2020).
HCC genome is still very limited. 9. Fujimoto, A. et al. Whole-genome mutational landscape and characterization of noncoding
Although the PCAWG project has characterized 81 mutational signa- and structural mutations in liver cancer. Nat. Genet. 48, 500–509 (2016).
10. Letouze, E. et al. Mutational signatures reveal the dynamic interplay of risk factors and
tures across human cancers6, we were able to identify five additional cellular processes during liver tumorigenesis. Nat. Commun. 8, 1315 (2017).
previously undescribed signatures in the CLCA cohort. This result 11. Gao, Q. et al. Integrated proteogenomic characterization of HBV-related hepatocellular
suggested that Chinese patients with HCC have a distinct mutational carcinoma. Cell 179, 561–577 (2019).
12. Sung, W. K. et al. Genome-wide survey of recurrent HBV integration in hepatocellular
background in comparison to the members of the cohorts of Japanese carcinoma. Nat. Genet. 44, 765–769 (2012).
and European individuals with HCC. Although SBS_H8 is distinct from 13. Kan, Z. et al. Whole-genome sequencing identifies recurrent mutations in hepatocellular
AA-related SBS_H2, significant co-occurrence between SBS_H8 and carcinoma. Genome Res. 23, 1422–1433 (2013).
14. Xue, R. et al. Variable intra-tumor genomic heterogeneity of multiple lesions in patients
SBS_H2 across the CLCA suggested that the underlying aetiological with hepatocellular carcinoma. Gastroenterology 150, 998–1008 (2016).
factors might often co-exist. Future experiments are needed to identify 15. Schulze, K. et al. Exome sequencing of hepatocellular carcinomas identifies new mutational
the aetiological factors of SBS_H8. signatures and potential therapeutic targets. Nat. Genet. 47, 505–511 (2015).
16. Imielinski, M., Guo, G. & Meyerson, M. Insertions and deletions target lineage-defining
The high-depth data enabled us to accurately determine the clonal genes in human cancers. Cell 168, 460–472 (2017).
composition of 494 tumours, resulting in the identification of a series 17. Dentro, S. C. et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human
of subclonal events. Five out of eight non-coding drivers showed sig- cancer genomes. Cell 184, 2239–2254 (2021).
18. Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of
nificant enrichments of subclonal mutations. Mutational signatures somatic mutations in normal human skin. Science 348, 880–886 (2015).
also exhibited clonality preference, providing important clues for the 19. Tarabichi, M. et al. Neutral tumor evolution? Nat. Genet. 50, 1630–1633 (2018).
20. Ng, S. W. K. et al. Convergent somatic mutations in metabolism genes in chronic liver disease. Nature 598, 473–478 (2021).
21. Kim, H. et al. Extrachromosomal DNA is associated with oncogene amplification and poor outcome across multiple cancers. Nat. Genet. 52, 891–897 (2020).
22. Deshpande, V. et al. Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat. Commun. 10, 392 (2019).
23. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
24. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
25. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
26. Cortes-Ciriano, I. et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. 52, 331–341 (2020).
27. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
28. Satriano, L., Lewinska, M., Rodrigues, P. M., Banales, J. M. & Andersen, J. B. Metabolic rearrangements in primary liver cancers: cause and consequences. Nat. Rev. Gastroenterol. Hepatol. 16, 748–766 (2019).
29. Guo, L. et al. Single-cell DNA sequencing reveals punctuated and gradual clonal evolution in hepatocellular carcinoma. Gastroenterology 162, 238–252 (2022).
30. Xue, R. et al. Genomic and transcriptomic profiling of combined hepatocellular and intrahepatic cholangiocarcinoma reveals distinct molecular subtypes. Cancer Cell 35, 932–947 (2019).
31. Wu, S. et al. Circular ecDNA promotes accessible chromatin and high oncogene expression. Nature 575, 699–703 (2019).
32. Xue, R. et al. Liver tumour immune microenvironment subtypes and neutrophil heterogeneity. Nature 612, 141–147 (2022).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© The Author(s), under exclusive licence to Springer Nature Limited 2024
Methods

Patient cohort of CLCA
Patients with HCC were enrolled from Eastern Hepatobiliary Surgery Hospital and Shanghai Zhongshan Hospital during 2017–2020. No patients received any preoperative anti-cancer treatment. Each specimen was diagnosed by two senior pathologists. Patients with tissue samples that had sufficient and good-quality DNA were selected. In total, samples from 494 patients with HCC were processed for sequencing analysis, including WGS (n = 494) and RNA-seq (n = 239). Since this is an observational study, no statistical methods are used to predetermine sample size and no randomization is performed. The study is not an intervention study and therefore blinding is not required. Detailed clinical information is summarized in Supplementary Table 1. DNA from primary tumours and matched peripheral blood lymphocytes was obtained. The study protocol was reviewed and approved by the institutional review board at Eastern Hepatobiliary Surgery Hospital and Shanghai Zhongshan Hospital. This study was performed in accordance with the principles of the Declaration of Helsinki. All of the participants provided written informed consent. All of the samples were anonymously coded in accordance with local ethical guidelines. All research participants consent to the publication of research results.

Cell lines
For the functional validation of three candidate drivers, the human liver cancer cell lines PLC/PRF/5, PVTT, HepG2, Huh7, SNU387 and SNU182, and the normal liver cell line HHL5 were obtained from Shanghai Cell Bank of the Chinese Academy of Sciences. PLC/PRF/5, PVTT, HepG2 and Huh7 cells were cultured in high-d-glucose Dulbecco’s modified Eagle medium (DMEM, Gibco); and SNU387, SNU182 and HHL5 cells in RPMI 1640 medium (basal medium) containing 10% fetal bovine serum (FBS, Gibco), supplemented with 100 U ml−1 penicillin and 100 μg ml−1 streptomycin.
For the validation of AA-related mutational signatures, MCF-10A and HepG2 cells were obtained from the American Type Culture Collection (ATCC). HepG2 cells were cultured as described above. MCF-10A cells were cultured in DMEM/F12 medium supplemented with 10% FBS, 10 ng ml−1 insulin, 20 ng ml−1 EGF, 0.5 µg ml−1 hydrocortisone, 50 ng µl−1 penicillin and 50 U ml−1 streptomycin. All of the cell lines used in this study were authenticated by applying short-tandem-repeat DNA profiling, and were tested to be mycoplasma negative. All of the cell lines were maintained at 37 °C in a humidified incubator with an atmosphere containing 5% CO2.

WGS
Fresh frozen tumour tissues and matched peripheral blood were collected from each patient. DNA was isolated using the DNeasy Blood & Tissue Kit (Qiagen). RNA was extracted using the RNeasy Mini Kit (Qiagen). The DNA concentration was measured using Qubit 3.0 (Invitrogen). The size of the DNA was checked using the Fragment Analyzer (Advanced Analytical Technologies). DNA (200 ng to 1 μg) was sheared into fragments of approximately 300 bp using the Covaris S2 (Covaris) ultrasonicator. The library was constructed using the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs) according to the manufacturer’s protocol. The library (2 × 150 bp paired-end reads) was quality-checked and sequenced on the Illumina NovaSeq (Illumina) system.

[…] used to process PCR duplicates for mapped BAM files. Somatic mutations, including single-nucleotide variants (SNVs) and small insertions and deletions (indels), were called using two methods: Mutect2 (v.4.0.11.0)33 and Strelka2 (v.2.8.4)34.
For Mutect2, a panel of normals (PON) file was first created and somatic mutations were called by comparing each tumour sample with its matched non-tumour counterpart and the PON file. We filtered any mutations with a ‘fragment_length’, ‘mapping_quality’, ‘strand_artifact’, ‘base_quality’ or ‘read_position’. We selected mutations covered by ≥20 reads in the tumour and 10 reads in the normal samples, and excluded mutations belonging to the ENCODE Data Analysis Consortium blacklisted regions. For Strelka2, somatic mutations were called with the flag ‘PASS’. We added an additional quality filter to tighten filtering for low allelic frequency variants: quality score × allele frequency > 1.3. We filtered any variant that was supported by three or more reads in the reference sample in at least three patients. We also filtered indels that were three bases or longer where there was a PON-filtered indel of three bases or longer within ten bases in the same sample. The intersection of the Mutect2 and Strelka2 results was used as the final set of somatic mutations.

Identification of candidate drivers
We combined P values obtained from independent methods of driver discovery using the empirical Brown’s method as described in the PCAWG study of non-coding drivers5. Three methods of driver discovery were used for coding regions: MutSigCV35, dndsCV36 and OncodriveFML37. We explored potential non-coding drivers by combining four methods: MutSigCV-NC38, NBR5, ActiveDriverWGS39 and OncodriveFML37. All drivers were manually checked to filter false-positive ‘driver’ loci caused by the sequencing and mapping artefacts, inaccurate background models or local increases in mutations due to mutational processes that were unaccounted for, as previously reported5.

dN/dS analysis
The dN/dS is the ratio between the rates of nonsynonymous and synonymous substitutions, and is used for assessing selection in cancer genomes as described previously17,18. In brief, dN/dS ratios can be calculated for different groups of mutations, such as clonal and subclonal mutations in known cancer genes, yielding insights about the density of driver mutations in each group of mutations. Using the dndscv R package36, dN/dS analysis was run on the clonal and subclonal mutations. A dN/dS ratio of more than 1 indicates positive selection, whereas smaller ratios characterize negative selection, and dN/dS ≈ 1 points toward neutral evolutionary dynamics.

TERT promoter mutation
To double check TERT promoter mutations, we performed targeted sequencing of TERT promoter mutations on tumour samples. The library was constructed by two rounds of PCR amplification. The first round of PCR used a barcoded primer targeting the TERT promoter, which yielded a product of 239 bp. The second round of PCR used universal indexed primers, yielding a 333 bp product. The sequencing library was then pooled by mixing the PCR products with the same index but with different barcodes. The library was then processed for quality control and sequenced as described for WGS. The average sequencing depth for the region was 378,535×. Data processing was performed in the same way as for WGS, except that PCR duplicates were not removed.
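The read-depth and low-allele-frequency filters in the somatic mutation calling paragraph, and the final Mutect2/Strelka2 intersection, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Call` record and all of its field names are hypothetical, the normal-sample threshold is assumed to mean "at least 10 reads", and applying the quality score × allele frequency > 1.3 rule to the Strelka2 calls is an assumption based on where the sentence appears in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One somatic call. Every field name here is hypothetical."""
    chrom: str
    pos: int
    ref: str
    alt: str
    tumour_depth: int
    normal_depth: int
    quality: float       # caller-reported quality score
    allele_freq: float   # variant allele frequency in the tumour

    @property
    def key(self):
        # Calls from different callers are matched on site and alleles.
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_depth(call, min_tumour=20, min_normal=10):
    # Coverage rule from the text; ">= 10" in the normal is an assumption.
    return call.tumour_depth >= min_tumour and call.normal_depth >= min_normal


def passes_low_af_filter(call, threshold=1.3):
    # quality score x allele frequency > 1.3, as stated in the text.
    return call.quality * call.allele_freq > threshold


def consensus(mutect2_calls, strelka2_calls):
    # Keep only sites that survive filtering in both callers.
    mutect2_keys = {c.key for c in mutect2_calls if passes_depth(c)}
    strelka2_keys = {c.key for c in strelka2_calls if passes_low_af_filter(c)}
    return mutect2_keys & strelka2_keys
```

The product rule penalizes calls that are both low-frequency and low-quality while leaving high-frequency calls largely unaffected, which matches the stated aim of tightening filtering for low allelic frequency variants.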
Acknowledgements We thank D. Li, S. Yin and C. Zhang for their support in gene editing and the members of the Shanghai Key Laboratory of Hepato-biliary Tumour Biology and the Key Laboratory of Signaling Regulation and Targeting Therapy of Liver Cancer (SMMU) for their technical support. This work was supported by the National Natural Science Foundation of China (81988101, T2125002, 82322047, 82241230, U21A20376, 81830054, 82173035, 82141103 and 82341007), the Innovation Program of Shanghai Municipal Education Commission (21JC1406600 and 22140901000), Beijing Natural Science Foundation (Z220014), Beijing Nova

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41586-024-07054-3.
Correspondence and requests for materials should be addressed to Lei Chen, Lin Wu, Steven G. Rozen, Fan Bai or Hongyang Wang.
Peer review information Nature thanks Lewis Roberts and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Comparison of CLCA with other HCC cohorts. a, Comparison of clinical information between CLCA and PCAWG-HCC. DP, double positive of HBV and HCV; DN, double negative of HBV and HCV. b, Sequencing depth of 494 tumours and their matched normal controls in CLCA. c, Relationships among driver genes using the DISCOVER mutual exclusivity test. d-e, Venn plot showing the comparison of potential driver genes identified in the TCGA-HCC, PCAWG-HCC, and our CLCA cohort. *Potential non-true drivers curated by PCAWG-HCC. f-g, Comparison of frequency of potential drivers between CLCA and PCAWG-HCC (f) and TCGA-HCC (g), respectively. Two-sided Fisher’s exact test, multiple hypothesis test performed with the Benjamini–Hochberg method. A threshold of Q < 0.1 was used for significance and denoted in blue. h, Two-sided Spearman correlation between the ratio of clonal drivers and tumour purity across all CLCA samples. The grey shaded area represents the 95% confidence interval. i, The dN/dS ratios for clonal and subclonal SNVs in 23 cancer coding drivers across our CLCA cohort. n denotes the total number of mutations for each category collected from 494 individual tumours. Centre points denote dN/dS values for missense, nonsense, splice site, and all mutations. Error bars denote the 95% confidence intervals. Red dashed line denotes dN/dS value of 1. j, Workflow for mutational signature analysis in CLCA.
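The Benjamini–Hochberg adjustment named in panels f-g of the caption can be sketched in a few lines of Python. This is a generic textbook implementation for illustration, not the authors' code; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (Q values), in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone Q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical P values from four per-gene Fisher's exact tests.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [value < 0.1 for value in q]  # the Q < 0.1 threshold from the caption
```

Stepping down from the largest P value keeps the adjusted values non-decreasing in rank, so a gene with a smaller P value never receives a larger Q value than one ranked after it.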
Extended Data Fig. 2 | Profiles of all mutational signatures in CLCA. Mutational profiles of all signatures. SBS (single base substitution), DBS (doublet base substitution), and ID (small insertion and deletion). Magnified versions of signatures SBS_H1, DBS_H1 and ID_H1 are shown to illustrate the classification of each mutation subtype in each plot. The cosine similarity between each signature and its matched COSMICv3.2 signature is indicated. Novel signatures are labelled in red.
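The cosine similarity used in this caption to compare each extracted signature with its matched COSMIC signature can be written as a short Python function. This is a generic sketch, assuming each signature is represented as a vector of per-category mutation probabilities in a shared category order (for example, 96 trinucleotide contexts for SBS); the function name is hypothetical.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two mutational profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must use the same mutation categories")
    dot = sum(x * y for x, y in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(x * x for x in profile_a))
    norm_b = math.sqrt(sum(y * y for y in profile_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("profiles must contain at least one non-zero entry")
    return dot / (norm_a * norm_b)
```

Because mutational profiles are non-negative, the value lies between 0 (no shared categories) and 1 (identical shape), independent of the total mutation count in either profile.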
Extended Data Fig. 5 | Survival, CNAs, HBV integrations and ecDNA. a, Multivariate analysis for OS and DFS. Multivariate Cox analysis was performed. Hazard ratios with a 95% confidence interval are shown for each predictor and are plotted on a natural log scale. b, Significant CNAs identified by GISTIC analysis. Red for amplification and blue for deletion. Green lines denote the threshold of Q value = 0.001. c, Hotspots of HBV integrations across CLCA. d, Top frequently amplified genes detected in ecDNA. e, Boxplots comparing the copy number of genes detected in ecDNA to others. f, Higher expression of oncogenes in ecDNA compared with those not in ecDNA. In e-f, n denotes biologically independent samples. Two-sided Wilcoxon rank-sum test. For boxplots, centre line shows median, box limits indicate upper and lower quartiles, and whiskers extend 1.5 times the interquartile range, while data beyond the end of the whiskers are outlying points that are plotted individually. g, Comparison of the frequency of cases with kataegis events (denoted in blue) between patients with or without APOBEC signatures. Two-sided Chi-square test.
Extended Data Fig. 6 | Patterns of SVs and clustered mutational processes. a, The number of SV events, focal CN segments, kataegis events, chromoplexy events, and chromothripsis events in the CLCA. b, Proportions of different categories for each type of alteration. c, The density of breakpoints genome-wide (top) and 2D density of juxtapositions (bottom) of SV, chromothripsis, and chromoplexy. d-e, Examples of chromothripsis (d) and chromoplexy (e) events involving oncogenes.
Extended Data Fig. 7 | Kataegis and evolutionary history. a, Rainfall plots of kataegis events. n denotes the total number of kataegis events detected in the tumour and marked with arrows below. b, Distribution of point mutations over different mutation periods. c, Distribution of mutations across early clonal, late clonal and subclonal stages, for drivers in CLCA. Barplots comparing the distribution of coding and noncoding mutations are shown. Two-sided Chi-square test. d, Relative ordering of CN events and driver mutations across all samples. e, Relative timing of signatures across all patients. n denotes the total number of individual tumours with the presence of the noted signature. For boxplots, centre line shows median, box limits indicate upper and lower quartiles, and whiskers extend 1.5 times the interquartile range, while data beyond the end of the whiskers are outlying points that are plotted individually. Boxplots are ordered by the median and no statistical test is used. f, Preferential ordering diagrams for patients stratified based on Cluster V, alcohol, and smoking. The relative ordering of candidate drivers was compared.
Extended Data Fig. 8 | Dysregulated pathways. Each gene box includes the frequency of patients influenced by different types of somatic alterations affecting the corresponding gene. A total of eight forms of somatic alterations are listed and colour-coded, including coding SNVs, noncoding SNVs (further divided into promoters, lncRNAs and UTRs), CNAs, ecDNA, SVs and HBV integrations. Solid rectangles enclose genes in eight major signalling pathways. Dashed rectangles enclose genes in specific signalling pathways. Interactions between genes are indicated. For each pathway, the frequencies of patients altered by coding mutations only, noncoding mutations only, and both coding and noncoding mutations are denoted, as shown in the Venn diagram.
Extended Data Fig. 9 | Functional validation of PPP1R12B and KCNJ12. a, Comparison of tumour migration, invasion, self-renewal and cell proliferation capacities of PPP1R12B disruption across cell lines. b, Edited sites in PPP1R12B by Prime Editing. c, RT-qPCR analysis of PPP1R12B mRNA expression across wild-type (WT) and point-mutated HepG2 cell lines. d-e, Comparison of the proliferation (d), migration, invasion, and self-renewal (e) capacities across cell lines of indicated genotypes. Representative images of each assay are shown for each cell line. f, Comparison of tumour migration, invasion, self-renewal and cell proliferation capacities of KCNJ12 disruption across cell lines. g, Edited sites in KCNJ12. h, RT-qPCR analysis of KCNJ12 mRNA expression across wild-type (WT) and point-mutated HepG2 cell lines. i-j, Comparison of the proliferation (i), migration, invasion, and self-renewal (j) capacities across cell lines of indicated genotypes. For all panels, each experimental condition was independently repeated three times. Representative images of each assay are shown. Data are presented as mean ± s.e.m. In a, e, f, j, P values for the comparison between a certain group and the control group are denoted on the top of images. Two-way ANOVA test is used for proliferation analysis in (a, d, f, i). For other plots, P value was derived with two-tailed Student’s t-test.
Extended Data Fig. 10 | Functional validation of FGA. a, Lollipop plot of FGA mutations in CLCA. b, Overall survival of TCGA-HCC patients (n = 364) classified by FGA expression levels, Log-rank test. c, Comparison of FGA mRNA expression between tumour and normal tissues in the TCGA-HCC cohort. For boxplots, centre line shows median, box limits indicate upper and lower quartiles, and whiskers extend 1.5 times the interquartile range, while data beyond the end of the whiskers are outlying points that are plotted individually. d, Representative FGA IHC images of paired tumour and normal tissues. Quantitative result is shown in Fig. 5d. e, Schematic of the edited site in the FGA noncoding region. f, Western blot analysis of FGA levels across wild-type and mutated HepG2 cell lines. Source gels in Supplementary Fig. 3. g-h, Comparison of the proliferation (g), migration, invasion, and self-renewal (h) capacities across cell lines of indicated genotypes. i-j, Comparison of tumour migration (i), invasion and self-renewal, and cell proliferation (j) capacities of FGA disruption across cell lines. k, Resected xenograft tumours by sh-Ctrl (n = 6) and sh-FGA cells (n = 7) in PLC/PRF/5. l, Specific phospho-antibody array analysis between PLC/PRF/5-sh-Ctrl and sh-FGA cell lines. Top significantly altered phosphorylation sites among 156 phosphoproteins are listed. m, TYK2 phosphorylation and its unphosphorylated counterpart between PLC/PRF/5-sh-Ctrl and sh-FGA cell lines determined with Cy3-labelled streptavidin via specific phospho-antibody array (n = 2 for each phosphorylated site or unphosphorylated protein). n, Western blot analysis of p-TYK2 (Y1054) and p-STAT3 (Y705) protein levels by FGA knockdown in PLC/PRF/5 and PVTT cell lines. p-, phosphorylated. Source gels in Supplementary Fig. 4. o, Representative images of cell migration assay following inhibitor treatment. p, IL6 mRNA expression of sh-Ctrl and sh-FGA cells. q, IL6 mRNA levels between PLC/PRF/5-sh-Ctrl and sh-FGA cell lines following FBS stimulation. Cells were incubated in DMEM supplemented with 10% FBS for the indicated time intervals after treatment with FBS-free medium overnight. r, Two-tailed Pearson correlation analysis of FGA protein and TYK2 phosphorylation (n = 75) in an independent HCC patient cohort. The relative intensities of FGA and p-TYK2 were normalized to β-actin. Source gels in Supplementary Fig. 5. s, A proposed model illustrating the role of the FGA/TYK2/STAT3 axis during HCC tumorigenesis. Wild-type and mutated forms of FGA are shown, respectively. The diagram was created using BioRender. For all panels, n denotes biologically independent samples. Each experimental condition was independently repeated three to five times. Data are presented as mean ± s.e.m. In h and i, P values for the comparison between a certain group and the control group are denoted on the top of images. Two-tailed Student’s t-test is used in (c, h, i, p, and q). Two-way ANOVA test is used in g and j.
nature portfolio | reporting summary
Corresponding author(s): Lei Chen, Lin Wu, Steven G. Rozen, Fan Bai, and Hongyang Wang
Last updated by author(s): Jan 2, 2024
Reporting Summary
Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
For the experimental section, the CCK8 signals for cell proliferation assay were detected with Synergy Neo microplate reader (BioTek). The
crystal-violet-stained images for colony formation, cell migration, and invasion assays were scanned by an Olympus IX73 microscope equipped
with a DP80 camera (Olympus). IHC slides were scanned by a Leica Aperio AT2. Immunoreactive bands were detected using e-BLOT Touch
Imager XLi or Odyssey Sa Infrared Imaging System (LI-COR Biosciences). Electrochemiluminescence (ECL) signals for IL-6 concentration in
tumor tissues were recorded on 1300 MESO QuickPlex SQ 120MM instrument (Meso Scale Discovery). RT-qPCR was performed on LightCycler
96 PCR platform (Roche). Phospho-specific protein microarray data was obtained with an Axon Instruments GenePix 4000B Microarray
Scanner.
Data analysis The Linux working environment we used for sequencing data analysis is packed into a Singularity container file and published at Zenodo
(https://doi.org/10.5281/zenodo.7260221). The detailed codes for all the software have been deposited at GitHub (https://github.com/ChongJenniferZhang/CLCA_WGS). Statistical analyses were performed using R (version 3.6.0) and GraphPad Prism (version 9.0).
OncodriveFML (v2.3.0) https://oncodrivefml.readthedocs.io/en/latest/index.html
MutSig2CV_NC (v1.0) https://github.com/broadinstitute/getzlab-PCAWG-MutSig2CV_NC
IHC images were analyzed by Aperio ImageScope v12.4.6 (Leica). Band intensity of western blots was assessed by ImageJ 1.53a. RT-qPCR data
were analyzed with LightCycler 96 SW 1.1 (Roche). Electrochemiluminescence (ECL) signals were analyzed with DISCOVERY WORKBENCH
Desktop Analysis Software version 4.0 (Meso Scale Discovery). Phospho-specific protein microarray data was analyzed with GenePix Pro 6.0.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive in BIG Data Center, Beijing Institute of Genomics (BIG), Chinese
Academy of Sciences, under the study accession number PRJCA002666 (https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA002666). We also built an interactive
website (http://lifeome.net/database/liver) for visualizing and analyzing our CLCA data. The data deposited and made public are compliant with the regulations of
the Ministry of Science and Technology of China. Other public data used in this study include the human reference genome of hg19/GRCh37
(https://ftp.ensembl.org/pub/grch37/), PCAWG data (https://dcc.icgc.org/pcawg/#!), TCGA-HCC data (https://portal.gdc.cancer.gov/projects/TCGA-LIHC), and COSMIC
signatures (https://cancer.sanger.ac.uk/signatures/).
Reporting on sex and gender The CLCA cohort comprised 427 men (86.4%) and 67 women (13.6%), with a mean age of 56 years (range, 23–84 years). All
patients were enrolled from Eastern Hepatobiliary Surgery Hospital and Shanghai Zhongshan Hospital between 2017 and 2020.
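As a quick consistency check, the reported sex proportions can be recomputed from the counts given above. The sketch below is illustrative only; the helper name `pct` is hypothetical and not part of the study's analysis code.

```python
# Cohort counts as reported in this summary.
men, women = 427, 67
total = men + women  # 494 patients in the CLCA cohort

def pct(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

print(total, pct(men, total), pct(women, total))
```

The rounded values agree with the 86.4% and 13.6% stated in the text.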
Population characteristics 94.5% of patients had HBV infection. 85.6% of patients had tumours of Edmondson-Steiner grades 3 and 4. 26.7% and 36.8% of
patients had a history of alcohol drinking and smoking, respectively. Detailed clinical information is summarized in
Supplementary Table 1.
Recruitment All patients included were diagnosed with hepatocellular carcinoma. No patients received any pre-operative anti-cancer
treatment. Each specimen was diagnosed by two senior pathologists. Patients with tissue samples that had sufficient and
good-quality DNA were selected.
Ethics oversight The study protocol was reviewed and approved by the institutional review board at all participating hospitals. This study was
performed in accordance with the principles of the Declaration of Helsinki. All participants provided written informed
consent. All samples were anonymously coded in accordance with local ethical guidelines. All research participants consented
to the publication of research results.
Note that full information on the approval of the study protocol must also be provided in the manuscript.
nature portfolio | reporting summary
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Data exclusions No data were excluded from the WGS and RNA-seq analyses.
Replication No replication was needed for the WGS and RNA-seq samples in our study because they are all clinical samples.
For experimental validation of the potential drivers PPP1R12B, KCNJ12, and FGA, dysfunctional cell lines were constructed by either knockdown
with two independent short hairpin RNAs (shRNA, #1, #2) or knockout with two independent short guide RNAs (sgRNA, #1, #2). These cell
lines were then assessed for proliferation, migration, invasion, and self-renewal capacities. Each assay was repeated three times
independently and representative images are shown.
Randomization No randomization was performed for the human tumor samples because this is an observational study. For xenograft models,
body weight-matched mice were randomized for subcutaneous injection of PLC/PRF/5-sh-Ctrl and PLC/PRF/5-sh-FGA cells.
Blinding Our study was not an intervention study and therefore blinding was not required.
Antibodies
Antibodies used Mouse monoclonal anti-β-actin (Cat# AC004, clone AMC0001, RRID:AB_2737399, blot 1:5000; ABclonal)
Mouse monoclonal anti-GAPDH (Cat# AC033, clone AMC0062, RRID:AB_2769570, blot 1:5000; ABclonal)
Rabbit polyclonal anti-TYK2 (Cat# 9312, RRID:AB_2256719, blot 1:1000; Cell Signaling Technology)
Rabbit monoclonal anti-Phospho-Tyk2 (Tyr1054/1055) (D7T8A) (Cat# 68790, clone D7T8A, RRID:AB_2799752, blot 1:1000, staining 1:100; Cell Signaling Technology)
Rabbit monoclonal anti-Phospho-Stat3 (Tyr705) (D3A7) XP (Cat# 9145, clone D3A7, RRID:AB_2491009, blot 1:2000; Cell Signaling Technology)
Mouse monoclonal anti-Lamin A/C (4C11) (Cat# 4777, clone 4C11, blot 1:2000; Cell Signaling Technology)
Mouse monoclonal anti-Fibrinogen α (C-7) (Cat# sc-398806, clone C-7, blot 1:500; Santa Cruz Biotechnology)
Rabbit polyclonal anti-Fibrinogen Alpha Chain (Cat# 20645-1-AP, RRID:AB_2878715, staining 1:100; Proteintech)
Mouse monoclonal anti-STAT3 (Cat# 60199-1-Ig, clone 3G2D12, RRID:AB_10913811, blot 1:2000; Proteintech)
Rabbit polyclonal anti-KI67 (Cat# ab15580, RRID:AB_443209, staining 1:500; abcam)
HRP-conjugated anti-Rabbit (Cat# D-3002; staining 1:1; Supervision)
HRP-conjugated Affinipure Goat Anti-Rabbit IgG(H+L) (Cat# SA00001-2, RRID:AB_2722564, blot 1:5000; Proteintech)
HRP-conjugated Affinipure Goat Anti-Mouse IgG(H+L) (Cat# SA00001-1, RRID:AB_2722565, blot 1:5000; Proteintech)
IRDye 800CW Goat anti-Rabbit IgG (H + L) (Cat# 926-32211, RRID:AB_621843, blot 1:20000; LI-COR)
IRDye 800CW Goat anti-Mouse IgG (H + L) (Cat# 926-32210, RRID:AB_621842, blot 1:20000; LI-COR)
Authentication All cell lines used in this study were authenticated by applying short tandem-repeat DNA profiling.
Mycoplasma contamination We confirm that all cells tested negative for mycoplasma.
Laboratory animals BALB/c nude mice (5-7 weeks old) were obtained from GemPharmatech LLC (JiangSu, China) and used for subcutaneous xenografts. All
mice were housed in pathogen-free conditions at an ambient temperature of 20-26 °C and a humidity of 30-70% with a 12:12 hour
light:dark cycle prior to use. Mice had unrestricted access to regular mouse chow and water. The tumour width (w) and length (l)
were measured every 3 days with a caliper, and the diameter of any single tumour was < 2 cm at the time of sacrifice.
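Caliper measurements of width (w) and length (l) are commonly converted to an approximate tumour volume. The sketch below assumes the widely used modified ellipsoid formula V = (l × w²) / 2. This summary does not state which formula the authors applied, so the formula, the function names and the example measurements are all illustrative assumptions. The size check simply mirrors the < 2 cm diameter limit reported above.

```python
def tumour_volume_mm3(length_mm: float, width_mm: float) -> float:
    """Estimate tumour volume with the modified ellipsoid formula.

    V = (l * w^2) / 2 is an assumed convention, not taken from this summary.
    """
    return length_mm * width_mm ** 2 / 2


def within_size_limit(length_mm: float, width_mm: float, limit_mm: float = 20.0) -> bool:
    """Return True if the largest caliper dimension is below the 2 cm limit."""
    return max(length_mm, width_mm) < limit_mm


# Hypothetical measurement: l = 12 mm, w = 8 mm.
volume = tumour_volume_mm3(12.0, 8.0)
ok = within_size_limit(12.0, 8.0)
print(volume, ok)
```

Keeping the volume estimate and the size check as separate functions allows a different volume convention to be substituted without touching the endpoint check.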
Reporting on sex Preliminary subcutaneous xenograft experiments were performed on both male and female mice. Similar trends were observed in both sexes: mice injected
with sh-FGA cells developed larger and more aggressive tumours than mice injected with sh-Ctrl cells. To
exclude the potential confounding factors of aggression and biting in the male groups, only the female groups were kept and
recorded. The tumorigenic role of FGA dysfunction in HCC applies to both sexes.
Ethics oversight All mouse experiments were approved by the Animal Care and Use Committee at Eastern Hepatobiliary Surgery Hospital.
Note that full information on the approval of the study protocol must also be provided in the manuscript.
March 2021