Nature

The study performed deep whole-genome sequencing of 494 hepatocellular carcinoma tumors and matched control samples from Chinese individuals. This is one of the largest whole-genome analyses of HCC in Chinese patients to date. The sequencing identified over 9 million somatic mutations, including previously undescribed coding and non-coding driver candidates. Novel mutational signatures were also discovered. Pathway analysis linked non-coding mutations to dysregulation of liver metabolism. Further experiments showed that a candidate driver, fibrinogen alpha chain, regulates HCC progression and metastasis. The study provides insights into the genomic landscape and evolution of HCC in Chinese patients.


Article

Deep whole-genome analysis of 494 hepatocellular carcinomas

Lei Chen1,14 ✉, Chong Zhang2,14, Ruidong Xue3,4,14, Mo Liu5,14, Jian Bai6,14, Jinxia Bao7,14, Yin Wang6,14, Nanhai Jiang5, Zhixuan Li1, Wenwen Wang8, Ruiru Wang6, Bo Zheng1,8, Airong Yang6, Ji Hu1,8, Ke Liu6, Siyun Shen1,8, Yangqianwen Zhang1, Mixue Bai1, Yan Wang6, Yanjing Zhu1,8, Shuai Yang1,8, Qiang Gao9, Jin Gu10, Dong Gao11, Xin Wei Wang12, Hidewaki Nakagawa13, Ning Zhang3,4, Lin Wu6 ✉, Steven G. Rozen5 ✉, Fan Bai2 ✉ & Hongyang Wang1 ✉

https://doi.org/10.1038/s41586-024-07054-3
Received: 13 June 2022
Accepted: 10 January 2024
Published online: xx xx xxxx


Over half of hepatocellular carcinoma (HCC) cases diagnosed worldwide are in China1–3. However, whole-genome analysis of hepatitis B virus (HBV)-associated HCC in Chinese individuals is limited4–8, with current analyses of HCC mainly from non-HBV-enriched populations9,10. Here we initiated the Chinese Liver Cancer Atlas (CLCA) project and performed deep whole-genome sequencing (average depth, 120×) of 494 HCC tumours. We identified 6 coding and 28 non-coding previously undescribed driver candidates. Five previously undescribed mutational signatures were found, including aristolochic-acid-associated indel and doublet-base signatures, and a single-base-substitution signature that we termed SBS_H8. Pentanucleotide context analysis and experimental validation confirmed that SBS_H8 was distinct from the aristolochic-acid-associated SBS22. Notably, HBV integrations could take the form of extrachromosomal circular DNA, resulting in elevated copy numbers and gene expression. Our high-depth data also enabled us to characterize subclonal clustered alterations, including chromothripsis, chromoplexy and kataegis, suggesting that these catastrophic events could also occur in late stages of hepatocarcinogenesis. Pathway analysis of all classes of alterations further linked non-coding mutations to dysregulation of liver metabolism. Finally, in vitro and in vivo assays showed that fibrinogen alpha chain (FGA), identified as both a candidate coding and non-coding driver, regulates HCC progression and metastasis. Our CLCA study depicts a detailed genomic landscape and evolutionary history of HCC in Chinese individuals, with important clinical implications.

Previous genomic analyses of HCC in Chinese individuals are limited in cohort size and focus mainly on the exome11–14, precluding detailed investigations at the whole-genome level. Recently, the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium analysed the genomic complexity of cancer at a considerable scale4–8. Nevertheless, its relatively shallow sequencing depth could not fully resolve the subclonal structure of the HCC genome. Here, in the CLCA, we performed deep whole-genome sequencing (WGS) of 494 HCC tumours (average depth, 120×) and of the matched control blood samples (average depth, 36×). Our cohort comprised 427 men (86.4%) and 67 women (13.6%). Compared with the PCAWG-HCC (n = 248) cohort, the CLCA cohort had higher proportions of HBV infection (94.5% versus 30.6%) and Edmondson–Steiner grades 3 and 4 (85.6% versus 12.1%), but lower proportions of hepatitis C virus (HCV) infection (2.6% versus 55.6%), alcohol drinking (26.7% versus 58.1%) and smoking (36.8% versus 53.6%) (Extended Data Fig. 1a,b, Supplementary Table 1 and Supplementary Note 1). These statistics reflect the epidemiology of liver cancer in the Chinese population, highlighting the necessity of the current study. After stringent quality control, a total of 9,287,828 somatic mutations was identified, with a median of 13,735.5 mutations and 95 nonsynonymous mutations per tumour (Fig. 1). We also performed RNA sequencing (RNA-seq) of 239 tumours from this cohort (Supplementary Table 2).
1 National Center for Liver Cancer/Eastern Hepatobiliary Surgery Hospital, Shanghai, China.
2 Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), School of Life Sciences, Peking University, Beijing, China.
3 Peking University-Yunnan Baiyao International Medical Research Center, International Cancer Institute, Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China.
4 Translational Cancer Research Center, Peking University First Hospital, Beijing, China.
5 Centre for Computational Biology and Programme in Cancer & Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore.
6 Berry Oncology Corporation, Beijing, China.
7 Model Animal Research Center, Medical School, Nanjing University, Nanjing, China.
8 The International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, Shanghai, China.
9 Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China.
10 MOE Key Laboratory for Bioinformatics, Department of Automation, Tsinghua University, Beijing, China.
11 State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, CAS, Shanghai, China.
12 Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA.
13 Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
14 These authors contributed equally: Lei Chen, Chong Zhang, Ruidong Xue, Mo Liu, Jian Bai, Jinxia Bao, Yin Wang.
✉e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

Nature | www.nature.com | 1
[Fig. 1 image. a, Research strategy: deep WGS (~120×, n = 494) and RNA-seq (n = 239). b, Candidate driver landscape: coding drivers (n = 23) and non-coding drivers in promoters (n = 6), lncRNAs (n = 8), lncRNA promoters (n = 4), 3′ UTRs (n = 8) and 5′ UTRs (n = 5); tumour groups 1, 2 and 3 contain 418, 38 and 38 tumours, respectively.]

Fig. 1 | Candidate driver landscape. a, The research strategy. The diagram was created using BioRender. WT, wild type. b, The candidate driver landscape of the CLCA. The top two graphs show the numbers of all mutations and of nonsynonymous mutations identified in each tumour, followed by annotation of clinical variables. BCLC, Barcelona Clinic Liver Cancer staging system. In total, 23 candidate drivers identified in coding regions and 31 candidate drivers identified in non-coding regions are listed, and the mutational frequency (%) is shown next to the gene IDs. The mutation types are indicated on the right, and n denotes the number of drivers in each category. lncP, lncRNA promoter; NBNC, double negative for HBV and HCV; NA, not available. Orange gene symbols indicate previously undescribed drivers identified in the CLCA. Underlined drivers are those identified as a driver in different forms. Group 1 had drivers in both coding and non-coding regions, whereas group 2 had drivers only in non-coding regions. Tumours in group 3 had no identified drivers but had other somatic mutations. The number of individual tumours included is denoted for groups 1–3. The bar plot on the left shows the clonal and subclonal mutational frequencies of each gene. Statistical analysis was performed using two-sided Fisher's exact tests with Benjamini–Hochberg correction for multiple-hypothesis testing. Q values are shown next to the bars. A threshold of Q < 0.1 was used for significance.
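The clonal versus subclonal comparison in Fig. 1b uses two-sided Fisher's exact tests followed by Benjamini–Hochberg correction. The Python sketch below illustrates that combination under a hypothetical 2×2 layout: each gene's clonal and subclonal mutation counts compared with those of a background set of mutations. The gene names and counts are invented, and this is not the authors' code. The Benjamini–Hochberg step is written out explicitly so that it does not depend on a particular SciPy version.

```python
import numpy as np
from scipy.stats import fisher_exact


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg Q values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted P value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each Q is the minimum over itself and all larger ranks.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def clonality_enrichment(gene_counts, background):
    """Two-sided Fisher's exact test per gene, then BH correction.

    gene_counts: {gene: (clonal, subclonal)} mutation counts for each driver.
    background: (clonal, subclonal) counts for a comparison set of mutations
    (a hypothetical choice; the study's exact contingency design may differ).
    """
    genes = list(gene_counts)
    pvals = []
    for gene in genes:
        clonal, subclonal = gene_counts[gene]
        table = [[clonal, subclonal], list(background)]
        _, p = fisher_exact(table, alternative="two-sided")
        pvals.append(p)
    qvals = benjamini_hochberg(pvals)
    return {g: (p, q) for g, p, q in zip(genes, pvals, qvals)}


# Hypothetical counts, for illustration only.
counts = {"GENE_A": (90, 10), "GENE_B": (20, 30), "GENE_C": (55, 45)}
results = clonality_enrichment(counts, background=(5500, 4500))
for gene, (p, q) in results.items():
    label = "significant" if q < 0.1 else ""
    print(gene, f"P = {p:.2e}", f"Q = {q:.2e}", label)
```

The Q < 0.1 threshold mirrors the one stated in the caption. Everything else, including the background set, is an assumption made for illustration.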

Candidate coding and non-coding drivers

We identified 23 candidate coding drivers, including TP53, CTNNB1 and ALB (Fig. 1 and Supplementary Note 2). CTNNB1 mutations were mutually exclusive with TP53 or AXIN1 mutations (Extended Data Fig. 1c), consistent with HCC in European individuals15. Compared with other cohorts5, six previously undescribed candidate coding drivers were identified in the CLCA: FGA, HNF1A, PRDM11, CDKN1B, BMP5 and ECHS1 (Extended Data Fig. 1d–g). The mutational frequency of TP53 was significantly higher in the CLCA than in either PCAWG-HCC or TCGA-HCC. By contrast, the mutational frequencies of the six previously undescribed candidate coding drivers were comparable across the three cohorts, indicating that these candidate drivers are prevalent across cohorts. A total of 31 candidate non-coding drivers was identified, including six promoters, eight long non-coding RNAs (lncRNAs), four lncRNA promoters, five 5′ untranslated regions (UTRs) and eight 3′ UTRs. Five genes were identified as drivers in more than one form, indicating convergent evolution. FGA (encoding fibrinogen alpha chain) was identified as both a candidate coding and non-coding (3′ UTR) driver. Mutations in the 3′ UTR of FGA were not enriched for 2–5 bp indels (Q = 0.73) and are therefore not related to the transcription-associated indel signature16. Except for the three non-coding drivers previously reported by PCAWG-HCC (the TERT promoter, lncRNA NEAT1 and the lncRNA promoter of RMRP), all other 28 events (90.3%) were previously undescribed candidate non-coding drivers. These results provide a rich resource for investigating the contributions of non-coding mutations during hepatocarcinogenesis.

Clonality of candidate drivers

Ten candidate drivers showed significant clonality enrichment of mutations, including two coding and eight non-coding drivers (Fig. 1).
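Clonal and subclonal calls like these are usually derived from each mutation's cancer cell fraction (CCF). The sketch below uses the standard point-estimate relation between variant allele fraction (VAF), tumour purity, local tumour copy number and mutation multiplicity. It assumes a normal copy number of 2 and a known multiplicity, and the 0.8 cutoff is hypothetical. The authors' actual clonality method is not described in this section. Published pipelines typically model read-count uncertainty rather than thresholding a single point estimate.

```python
from dataclasses import dataclass


@dataclass
class Mutation:
    alt_reads: int  # reads supporting the variant
    depth: int  # total reads at the locus
    tumour_cn: int  # total copy number at the locus in tumour cells
    multiplicity: int = 1  # mutated copies per cancer cell (assumed known)


def cancer_cell_fraction(mut: Mutation, purity: float, normal_cn: int = 2) -> float:
    """Point estimate of CCF from VAF, purity and copy number."""
    vaf = mut.alt_reads / mut.depth
    # Average number of DNA copies per cell in the tumour/normal mixture.
    total_copies = purity * mut.tumour_cn + (1 - purity) * normal_cn
    return vaf * total_copies / (purity * mut.multiplicity)


def classify(mut: Mutation, purity: float, cutoff: float = 0.8) -> str:
    """Label a mutation clonal or subclonal using a hypothetical CCF cutoff."""
    return "clonal" if cancer_cell_fraction(mut, purity) >= cutoff else "subclonal"


# Hypothetical mutations at ~120x depth (the cohort's average) in a 50%-pure tumour.
mutations = [Mutation(30, 120, 2), Mutation(12, 120, 2), Mutation(24, 120, 3)]
for m in mutations:
    print(round(cancer_cell_fraction(m, purity=0.5), 2), classify(m, purity=0.5))
```

The example illustrates why depth matters. At 120× a subclonal variant with a VAF near 0.1 is still supported by about a dozen reads, whereas at much lower depth it would be harder to distinguish from noise.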

[Fig. 2 image. a–c, SBS_H8 (T>A, 42.5%; T>C, 21.3%; cosine similarity to SBS22, 0.71), SBS_H2 (T>A, 85.4%; T>C, 6.6%; cosine similarity, 0.99) and AA-exposed cell lines (T>A, 94.4%; T>C, 5.6%; cosine similarity, 0.98). d, DBS_H2 (AA). e, ID_H3 (AA). f, Correlation of SBS_H2, DBS_H2 and ID_H3 mutation counts in CLCA tumours and AA-exposed cell lines.]

Fig. 2 | Previously undescribed mutational signatures. a–c, Comparison of the mutational profile (a), transcriptional strand bias (b) and pentanucleotide context of T>A mutations (c) of SBS_H8, SBS_H2 and AA-exposed cell lines. Cosine similarity to COSMIC SBS22 is denoted. Statistical analysis was performed using two-sided binomial tests with Benjamini–Hochberg correction for multiple comparisons. Trans, transcribed strand; Untrans, untranscribed strand. d,e, The mutational profiles of the signatures DBS_H2 (d) and ID_H3 (e), both related to AA. f, The correlation between the numbers of mutations associated with SBS_H2, DBS_H2 and ID_H3. The grey plane is the linear regression plane, with projection lines showing residuals (red, positive; blue, negative).
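The comparisons in Fig. 2 rest on two simple computations: cosine similarity between signature profiles, and a binomial test of transcribed versus untranscribed strand counts. The Python sketch below shows both. It uses toy six-class profiles (C>A, C>G, C>T, T>A, T>C, T>G) and invented counts, whereas real SBS comparisons use 96-channel trinucleotide profiles. The toy vectors only illustrate that moving probability mass from T>A to T>C lowers the similarity to a T>A-dominated profile. This is not the authors' pipeline.

```python
import numpy as np
from scipy.stats import binomtest


def cosine_similarity(a, b):
    """Cosine similarity between two mutational-signature profiles."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def strand_bias_pvalue(transcribed, untranscribed):
    """Two-sided binomial test of strand counts under a null of no bias (p = 0.5)."""
    n = transcribed + untranscribed
    return binomtest(transcribed, n, p=0.5, alternative="two-sided").pvalue


# Hypothetical six-class profiles: C>A, C>G, C>T, T>A, T>C, T>G.
t_a_only = [0.02, 0.02, 0.04, 0.85, 0.05, 0.02]  # dominated by T>A
t_a_plus_t_c = [0.05, 0.05, 0.10, 0.45, 0.30, 0.05]  # substantial T>C share
print(round(cosine_similarity(t_a_only, t_a_plus_t_c), 2))

# Hypothetical strand counts for T>A mutations in one signature.
print(strand_bias_pvalue(700, 300))
```

Across many signatures or mutation classes, the resulting P values would then be corrected with the Benjamini–Hochberg procedure, as the caption states.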

Two coding drivers, TP53 and ALB, were enriched with clonal mutations. By contrast, 62.5% (5 out of 8) of non-coding drivers were enriched with subclonal mutations, including the promoters of ZNF595, KCNJ12 and OR2A7, and the lncRNA and lncRNA promoter of Z95704.4. No significant association between tumour purity and the percentage of clonal drivers was observed across our cohort (Extended Data Fig. 1h), indicating that our clonality analysis was not confounded by tumour purity. The identification of subclonal non-coding drivers highlighted the strength of high-depth WGS data in investigating the non-coding genome, partially explained the low number of non-coding drivers identified in previous low-depth WGS studies, and motivated us to systematically investigate the subclonal events in our cohort. Furthermore, a dN/dS ratio (the ratio of nonsynonymous (dN) to synonymous (dS) mutation rates) greater than 1 for all mutations was observed for both clonal and subclonal coding drivers (Extended Data Fig. 1i), confirming that these drivers are shaped by positive selection, consistent with previous pan-cancer analyses17–19.

SBS_H8 is a novel signature

We identified 17 single-base substitution (SBS), 3 doublet-base substitution (DBS) and 8 small insertion-and-deletion (ID) signatures (Extended Data Figs. 1j and 2–4). Compared with COSMIC v3.2, five signatures were novel (Supplementary Table 3 and Supplementary Note 3): one SBS signature, SBS_H8; two DBS signatures, DBS_H1 and DBS_H2; and two ID signatures, ID_H3 and ID_H8. DBS_H1 consisted mainly of [C/G/T]C>NN mutations. This signature was found in most tumours and correlated with age as well as with other age-related signatures (Extended Data Fig. 3d,f). ID_H8 showed mostly 1 bp cytosine deletions and thymine insertions. It was found exclusively in SBS_H3-positive (COSMIC SBS24) tumours and correlated with SBS24 (Extended Data Fig. 3e,g), suggesting its relevance to aflatoxin exposure.

Notably, SBS_H8 was dominated by T>[A/C] mutations with significant transcriptional strand bias (Fig. 2a–c). Although the pattern of T>A mutations in SBS_H8 was similar to that of the aristolochic acid (AA)-related COSMIC SBS22, SBS_H8 also contained a substantial proportion of T>C mutations (21.3%), together leading to an overall cosine similarity of only 0.71 between SBS_H8 and SBS22. The low pentanucleotide-context cosine similarity of 0.61 further supported that SBS_H8 is a novel signature rather than a combination of SBS22 and other signatures (Extended Data Fig. 3b). SBS_H8 was present in 57.1% (282 out of 494) of CLCA cases, indicating that this previously undescribed signature is common in HCC in Chinese individuals. High co-occurrence of SBS_H8 and SBS_H2 (SBS22) indicated that the aetiological factor of SBS_H8 might often co-exist with AA. SBS_H8 is present in only 1 out of 326 (0.31%) PCAWG-HCC cases and possibly in chronic liver disease20. These results supported the existence of this signature and its enrichment in HCCs in Chinese individuals.

As for AA, we not only found the well-established SBS_H2, but also identified two previously undescribed types of AA signatures: DBS_H2 and ID_H3 (Fig. 2d,e). DBS_H2 consisted primarily of TA>NT, TC>AA, TG>AN and TT>AA mutations. ID_H3 showed mainly 1 bp and 2 bp deletions in short repeats. Both DBS_H2 and ID_H3 were almost exclusively found in SBS_H2-positive (SBS22) tumours and were highly correlated with SBS_H2 activity (Fig. 2f). To test whether SBS_H2, DBS_H2 and ID_H3 are directly caused by AA exposure, we treated two cell lines, MCF-10A and HepG2, with sublethal concentrations of AA1 (the major component of AA). The mutational spectrum of each clone showed the presence of SBS_H2, DBS_H2 and ID_H3 (Supplementary Fig. 1), confirming that these mutational signatures can be caused by AA exposure. These findings complemented the AA signature spectrum

[Fig. 3 image, panels a–g (see caption). Genes shown in b include CCND1, EXT1, MYC, RAD21, NDRG1 and MET; c plots PFS over 0–1,500 days; f and g show coverage and copy-number plots for amplicons from CLCA_0324, CLCA_0203 and CLCA_0109, the last with CIRCLE-seq coverage.]

Fig. 3 | ecDNA analysis. a, The proportion of different amplicons across the CLCA cohort. Circular, breakage–fusion–bridge (BFB), heavily rearranged, linear and no focal somatic copy-number amplification detected (fSCNA) amplicon categories are shown. b, The most frequently amplified genes detected in ecDNA. c, Progression-free survival (PFS) of patients in the CLCA stratified by the presence of ecDNA. Statistical analysis was performed using log-rank tests. d,e, Comparison of the copy number (d) and RNA expression (e) of HBV between circular amplicons and other amplicons. For the box plots, the centre line shows the median, the box limits indicate the upper and lower quartiles, and the whiskers extend to 1.5× the interquartile range; data beyond the end of the whiskers are outlying points that are plotted individually. n denotes biologically independent samples. Statistical analysis was performed using two-sided Student's t-tests. TPM, transcripts per million. f, Two representative ecDNA amplicons involving HBV segments detected in two patients. g, CIRCLE-seq reads supporting the structure of ecDNA. Chr., chromosome.

and revealed the diverse paths of AA mutagenesis. Notably, however, SBS_H8 was not found in the mutational spectrum of AA1-treated cell clones (Fig. 2a–c), further supporting that SBS_H8 is not associated with AA exposure.

Unsupervised hierarchical clustering based on mutational signatures classified the 494 tumours into 5 clusters (Extended Data Fig. 3h and Supplementary Note 4). SBS_H8 contributed most to cluster V, which was enriched with CTNNB1 mutations (Extended Data Fig. 3i,j). Higher percentages of SBS_H8 were significantly associated with poorer prognosis (Extended Data Figs. 3k and 5a), implying that the aetiological agent underlying SBS_H8 might be a liver carcinogen. We also analysed the contribution of mutational processes to driver genes and hotspot mutations (Extended Data Fig. 4). For SBS_H8, JAK1 and CTNNB1 were the top coding drivers and the ALB promoter was the top non-coding driver. Multiple CTNNB1 mutation hotspots, as well as JAK1 S729C and TP53 H193R, were affected by SBS_H8. Moreover, multiple TP53 hotspots were associated with aflatoxin, whereas the TP53 H179L hotspot was associated with AA exposure. SBS_H8, as well as other signatures related to exogenous factors such as SBS_H2 (AA), SBS_H3 (aflatoxin), DBS_H2 (AA), ID_H3 (AA), SBS_H10 (tobacco) and ID_H8 (aflatoxin), was enriched for clonal mutations compared with subclonal mutations, suggesting that these processes operated at earlier stages of tumorigenesis.

HBV integration in ecDNA

Our deep WGS data enabled comprehensive profiling of genomic rearrangements, including copy-number alterations (CNAs), structural variations (SVs), HBV integrations, extrachromosomal circular DNA (ecDNA) and three forms of clustered alterations (kataegis, chromothripsis and chromoplexy) (Extended Data Figs. 5 and 6 and Supplementary Note 5). ecDNA was detected in 27.3% of CLCA tumours (Fig. 3a and Supplementary Table 4), significantly higher than the proportion reported in PCAWG-HCC (13.1%, P = 3 × 10−4; two-sided Fisher's exact test)21. A total of 76 oncogenes was detected in ecDNA, including HCC driver genes such as MYC (Fig. 3b and Extended Data Fig. 5d). Oncogenes in ecDNA had higher copy numbers and elevated gene expression compared with their counterparts not in ecDNA (Extended Data Fig. 5e,f). The presence of ecDNA was associated with poor prognosis (Fig. 3c and Extended Data Fig. 5a). Notably, we identified ecDNAs incorporating HBV segments (HBV-ecDNA) in seven patients (Fig. 3d–f), affecting well-known oncogenes such as TERT. HBV segments in ecDNA showed elevated copy numbers as well as increased expression levels. Although HBV–TERT integration has previously been identified in HCC, our results demonstrate that these integrations can exploit the circular structure of ecDNA and thereby amplify to hundreds of copies. The existence of ecDNA was validated by CIRCLE-seq (Fig. 3g). Collectively, these results suggest that ecDNA-based amplification22 may have an important role in HBV-associated HCC.

Subclonal catastrophic events

Clustered mutational processes, including chromothripsis23, chromoplexy7,24 and kataegis25, are genomic alterations that are often generated in a single catastrophic event. These alterations are often described as clonal events and support the punctuated evolution of tumours24,25. Whether these clustered alterations can be subclonal events that occur late during tumour evolution remains less explored. We investigated the clonal status of these events with our high-depth WGS data of the CLCA (Extended Data Fig. 6).

We observed chromothripsis in 30.2% of cases (Supplementary Table 4), comparable to that of PCAWG-HCC (32.2%)26. Among those, 61% of high-confidence events affected multiple chromosomes (for example, CLCA_0119), whereas 22% affected only a single chromosome (for example, CLCA_0090) (Fig. 4a). Chromoplexy was observed in 10.1% of CLCA cases; 8.3% of cases contained a single event (for example, CLCA_0489) and 1.8% contained multiple events (for example, CLCA_0232) (Fig. 4b). In total, 364 kataegis events were identified in

[Fig. 4 image. a, Chromothripsis in CLCA_0090 (single chromosome) and CLCA_0119 (multi-chromosome); b, chromoplexy in CLCA_0489 (single) and CLCA_0232 (multiple); c, rainfall plots for CLCA_0247 chr. 1 and CLCA_0285 chr. 5 (log10 intervariant distance, BAF and copy number against position in Mb); d, clonal status of kataegis events; e, cancer cell fraction for CLCA_0009 and CLCA_0428, and detected kataegis events versus sequencing depth; f, timing (clonal early, clonal late, clonal unspecified, subclonal); g, odds ratios.]

Fig. 4 | Genomic rearrangement. a, Circos plots for chromothripsis events. CN, copy number. b, Circos plots for chromoplexy events. Arcs in the same colour denote regions that are involved in the same chromoplexy event. c, Rainfall plots for kataegis events and related SVs and CNAs. BAF, B allele frequency. d, The clonal status composition of kataegis events. Mixed events are indicated in grey. e, The clonal status of kataegis events. Top, the cancer cell fraction distribution of non-kataegis and kataegis mutations. Bottom, the detected kataegis events at different sequencing depths (simulated in silico). f, The timing of three types of clustered alteration events. g, The relative odds of clustered alterations being clonal or subclonal, shown with bootstrapped 95% confidence intervals (top). Bottom, the relative odds of the events being early or late clonal, shown as above.

33.6% of CLCA cases, and 14.6% of cases had multiple kataegis events. We observed co-occurrence of kataegis and oscillating copy-number states, suggesting that localized hypermutation could be associated with regional SVs and chromothripsis27 (Fig. 4c and Extended Data Fig. 7a). Kataegis events were highly enriched in cases with APOBEC signatures (Extended Data Fig. 5g). In total, 46 (13%) kataegis events occurring in 32 cases (6.5%) were subclonal (Fig. 4d). This result differed from that reported by PCAWG-HCC, in which all kataegis events were clonal, suggesting that kataegis may be subclonal and occur late during hepatocarcinogenesis. In silico analysis further showed that the number of detected kataegis events increased with sequencing depth (Fig. 4e), corroborating that our high-depth WGS enabled the detection of subclonal kataegis events. Furthermore, timing analysis showed that 15.1% of kataegis, 67.2% of chromothripsis and 62.7% of chromoplexy events were subclonal (Fig. 4f). Although all of these forms of clustered alterations tended to be clonal rather than subclonal, the broad distribution of odds ratios suggests that these events could occur at various times during tumorigenesis (Fig. 4g).

Pervasive non-coding drivers

Reconstruction of the evolutionary history of the CLCA classified 44.98% of point mutations as subclonal, compared with 8% in PCAWG-HCC (Extended Data Fig. 7b, Supplementary Table 5 and Supplementary Note 6). Compared with candidate coding drivers, candidate non-coding drivers were more enriched in the subclonal category, suggesting that candidate non-coding drivers may contribute more to subclonal diversification (Extended Data Fig. 7c). In the CLCA, the earliest events were PPP1R12B 3′ UTR mutation and 17p loss, followed by mutations in TP53, ARID2 and the ADH1B 3′ UTR (Extended Data Fig. 7d). By contrast, TP53 mutation was found to be the earliest mutational event in the PCAWG cohort8. Notably, TERT promoter mutations were among the latest events, in contrast to the observation that TERT promoter mutation is an early event in HCC in European individuals10. These results revealed the distinct evolutionary history of HCC in the Chinese CLCA cohort and highlighted the early and pervasive contributions of non-coding mutations during HCC progression. Moreover, the SBS signatures related to tobacco, aflatoxin and AA exposure (SBS_H10, SBS_H3 and SBS_H2), as well as the previously undescribed signature SBS_H8, tended to occur early across all cases (Extended Data Fig. 7e), consistent with Extended Data Fig. 4h. Furthermore, stratification based on cluster V (SBS_H8), alcohol and smoking revealed distinct evolutionary histories associated with aetiology (Extended Data Fig. 7f and Supplementary Fig. 2). Notably, FGA mutations were among the earliest drivers in patients in cluster V, patients who drink alcohol and patients who smoke.

Metabolic dysregulation

Signalling pathway analysis revealed higher contributions of non-coding mutations than of coding mutations to RTK–RAS–MAPK (22.1% versus 6.5%), telomere maintenance (34% versus 1.4%) and liver metabolism (23.1% versus 18.2%) (Extended Data Fig. 8, Supplementary Table 6 and Supplementary Note 7). For liver metabolism in particular, 15 potential driver genes were included.
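The kataegis events counted above (Fig. 4c–e) are clusters of closely spaced mutations. In rainfall plots of log10 intervariant distance against genomic position, they appear as dips. The Python sketch below applies a widely used heuristic definition: six or more consecutive mutations with a mean intermutation distance of at most 1 kb. The CLCA's own calling criteria are referred to Supplementary Note 5 and may differ. The greedy extension, the single-chromosome input and the example positions are assumptions made for illustration.

```python
import numpy as np


def intervariant_distances(positions):
    """Distances between consecutive sorted mutations (rainfall-plot y values)."""
    pos = np.sort(np.asarray(positions, dtype=np.int64))
    return pos[1:] - pos[:-1]


def find_kataegis(positions, min_mutations=6, max_mean_distance=1000):
    """Return (start, end, n_mutations) for mutation clusters on one chromosome."""
    pos = np.sort(np.asarray(positions, dtype=np.int64))
    n = len(pos)
    events = []
    i = 0
    while i <= n - min_mutations:
        j = i + min_mutations - 1
        # Mean intermutation distance of the run pos[i..j] is span / (count - 1).
        if (pos[j] - pos[i]) / (j - i) <= max_mean_distance:
            # Extend greedily while the running mean stays within the threshold.
            while j + 1 < n and (pos[j + 1] - pos[i]) / (j + 1 - i) <= max_mean_distance:
                j += 1
            events.append((int(pos[i]), int(pos[j]), j - i + 1))
            i = j + 1  # continue scanning after this cluster
        else:
            i += 1
    return events


# Hypothetical positions: 8 mutations 200 bp apart plus 3 scattered ones.
cluster = [10_000 + 200 * k for k in range(8)]
scattered = [500_000, 2_000_000, 5_000_000]
print(find_kataegis(cluster + scattered))
print(np.log10(intervariant_distances(cluster + scattered)))
```

Because each cluster is defined by the positions of its mutations, subclonal clusters with low allele fractions are detected only if their variants are called in the first place. This is consistent with the depth dependence shown in Fig. 4e.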

[Fig. 5 image, panels a–n (see caption). FGA variants shown: P289L, P362R and D793H; knockdown comparison shCtrl versus shFGA; signalling readouts pTYK2 (Y1054), TYK2, pSTAT3 (Y705) and STAT3; n, FGA protein versus IL-6 concentration in tissue (R = −0.262, P = 0.0273).]
PL
Fig. 5 | FGA dysfunction facilitates HCC progression. a,b, FGA expression between altered and WT tumours (a) and between paired tumour and normal tissues (b). c,d, FGA protein in paired tumour and normal samples compared using western blot (WB; c) and immunohistochemistry (IHC; d) analysis. e, Sanger sequencing plots of edited sites in the FGA coding region. f, Quantitative PCR with reverse transcription (RT–qPCR) analysis of FGA mRNA across HepG2 WT and mutated cell lines. n = 3 per group. g, Western blot analysis of FGA. h,i, Comparison of the proliferation (h) and the migration, invasion and self-renewal (i) abilities of FGA-edited cell lines. Each assay was repeated three times independently and representative images are shown. For i, scale bars, 100 μm (top and middle) and 3 mm (bottom). j, In vivo cell proliferation assay comparing xenograft tumours of shCtrl (n = 6) and shFGA (n = 7) PLC/PRF/5 cells. Growth curves are shown. k, Representative haematoxylin and eosin (H&E) and immunohistochemistry staining of tumour samples in j. Scale bars, 200 μm (main images) and 25 μm (magnified images). l, Subcellular localization of pTYK2 and pSTAT3, with GAPDH (cytoplasmic reference) and lamin A/C (nuclear reference). m, IL-6 concentration in the supernatant. n = 3 per group. n, Two-tailed Pearson correlation analysis of FGA protein and IL-6 concentration (n = 71). For all panels, n denotes biologically independent samples. For the box plots in a–c, the centre line shows the median, the box limits indicate the upper and lower quartiles, and the whiskers extend to 1.5× the interquartile range; data beyond the whiskers are outlying points. For f, h, j and m, data are mean ± s.e.m. Statistical analysis was performed using two-sided Student's t-tests (a, f, i and m), two-sided paired t-tests (b–d) and two-way analysis of variance (h and j). Gel source data are provided in Supplementary Figs. 3–5.

These alterations affected various metabolic programs, including hepatic metabolism (APOB, ALB and HNF1A), oxidative stress (KEAP1 and NFE2L2), urea metabolism (CPS1), alcohol metabolism (ADH1B and ADH4), fatty acid metabolism (SERPINA1 and SERBP1) and hypoxia (ARNT). FGA, in the JAK–STAT pathway, also has a role in hepatic metabolism. Given that the liver is a key metabolic organ and that metabolic dysregulation is an important feature of liver cancer20,28, this result underlined the need to weigh the contribution of non-coding alterations when investigating the metabolic status of HCC.

KCNJ12 and PPP1R12B
To investigate whether the candidate non-coding drivers have tumorigenic functions, we selected three representative drivers for functional assays: KCNJ12 (potassium inwardly rectifying channel subfamily J member 12), PPP1R12B (protein phosphatase 1 regulatory subunit 12B) (Extended Data Fig. 9, Supplementary Table 7 and Supplementary Note 8) and FGA (Fig. 5, Extended Data Fig. 10 and Supplementary Figs. 3–5). During the evolutionary history of HCC, PPP1R12B alterations are among the earliest driver events, whereas KCNJ12 alterations are among the latest. Low expression of PPP1R12B significantly enhanced tumour migration, invasion, self-renewal and cell proliferation (Extended Data Fig. 9a). Using prime editing, we showed that point mutations of PPP1R12B identified in the CLCA led to lower mRNA expression and were sufficient to cause phenotypic changes (Extended Data Fig. 9b–e). KCNJ12 disruption significantly impaired tumour migration, invasion, self-renewal and cell proliferation (Extended Data Fig. 9f). Point mutations in KCNJ12 led to higher mRNA expression and subsequent phenotypic changes (Extended Data Fig. 9g–j). These data validated PPP1R12B and KCNJ12 as non-coding drivers of HCC.

FGA dysfunction promotes HCC
Next, we investigated the biological functions of FGA, which was independently identified as both a candidate coding driver and a candidate non-coding driver (Fig. 5 and Extended Data Fig. 10a). In the CLCA, FGA alterations, including point mutations, loss of heterozygosity and copy-number loss, could all result in reduced expression (Fig. 5a). FGA mRNA and protein levels were also lower in tumours than in normal tissues (Fig. 5b–d and Extended Data Fig. 10b–d). Furthermore, the rate of biallelic inactivation of FGA was comparable to that of other recurrently mutated tumour suppressor genes of HCC in the CLCA (Supplementary Table 1). We therefore speculated that FGA is a tumour suppressor gene and explored the potential role of FGA dysfunction in HCC progression.
Introducing FGA point mutations led to lower mRNA and protein expression and enhanced tumour progression (Fig. 5e–i and

Extended Data Fig. 10e–h). Consistent phenotypes were confirmed in FGA-disrupted cell lines (Extended Data Fig. 10i,j). Furthermore, in an in vivo assay, subcutaneous injection of PLC/PRF/5 cells expressing short hairpin RNA against FGA (shFGA) into BALB/c nude mice produced larger and more aggressive tumours than injection of shCtrl cells (Fig. 5j,k and Extended Data Fig. 10k). Phosphorylated tyrosine kinase 2 (pTYK2) and its target, signal transducer and activator of transcription 3 phosphorylated at Tyr705 (pSTAT3), were identified as the top downstream signals of FGA (Extended Data Fig. 10l–n). We also found that pTYK2 accumulated more in the cytoplasm than in the nucleus (Fig. 5l). A specific TYK2 inhibitor (BMS-986165), but not AKT inhibitors, attenuated the migration of shFGA cells (Extended Data Fig. 10o), suggesting that FGA dysfunction might not activate AKT signalling in HCC. We further examined the expression of interleukin-6 (IL-6), a downstream target of STAT3. IL6 mRNA and supernatant IL-6 protein levels were significantly higher in shFGA cells than in shCtrl cells (Fig. 5m and Extended Data Fig. 10p,q). Significant negative correlations between FGA and TYK2 phosphorylation, and between FGA and IL-6 concentration, were confirmed in an independent HCC cohort (Fig. 5n and Extended Data Fig. 10r). Taken together, our results support FGA as a tumour suppressor whose mutations could promote hepatocarcinogenesis by activating the TYK2–STAT3–IL-6 circuit, a potential target for HCC intervention and clinical treatment (Extended Data Fig. 10s).

Discussion
Here we depict a comprehensive whole-genome landscape of HBV-enriched HCC in Chinese individuals. Our high-depth WGS data enabled the identification of previously undescribed candidate non-coding drivers, mutational signatures and subclonal catastrophic events, and revealed the pervasive contribution of non-coding events during HCC evolution. Many of our findings, including the SBS_H8 signature, HBV-ecDNA and distinct aetiology-related evolutionary histories, largely reflect differences between tumours of Chinese and non-Chinese individuals with HCC. These findings shed light on the genomic alterations and processes that are enriched in tumours of Chinese individuals with HCC. On the other hand, many potential driver events, including candidate driver genes, mutational processes and clustered alterations, were shared among the CLCA, PCAWG-HCC and TCGA-HCC cohorts, suggesting universal processes of HCC pathogenesis. In this regard, our findings of previously undescribed non-coding candidates, signatures related to AA and aflatoxin, and subclonal clustered alterations are largely attributable to the higher sequencing depth of the CLCA compared with other HCC WGS studies (around 30–40×). These findings should therefore also apply to other HCC cohorts. Notably, 28 non-coding drivers identified in our cohort were previously unreported for HCC, suggesting that our understanding of the HCC genome is still very limited.
Although the PCAWG project characterized 81 mutational signatures across human cancers6, we identified five additional, previously undescribed signatures in the CLCA cohort. This result suggested that Chinese patients with HCC have a distinct mutational background compared with Japanese and European HCC cohorts. Although SBS_H8 is distinct from AA-related SBS_H2, significant co-occurrence of SBS_H8 and SBS_H2 across the CLCA suggested that the underlying aetiological factors might often co-exist. Future experiments are needed to identify the aetiological factors of SBS_H8.
The high-depth data enabled us to accurately determine the clonal composition of 494 tumours, resulting in the identification of a series of subclonal events. Five of the eight non-coding drivers showed significant enrichment of subclonal mutations. Mutational signatures also exhibited clonality preferences, providing important clues to the relative timing of diverse underlying aetiological factors. The identification of subclonal kataegis, chromothripsis and chromoplexy showed that these catastrophic genomic alterations could occur at variable times during HCC evolution, consistent with the reported combination of punctuated and gradual clonal evolution in HCC29. Furthermore, multiple non-coding drivers were mapped onto the evolutionary history of CLCA tumours, whereas PCAWG reported only one non-coding driver. Our results reconstructed a high-resolution evolutionary history of HCC.
HBV integration has been extensively reported in HBV-positive tumours of Chinese patients with liver cancer, with hotspots identified in TERT and KMT2B (refs. 12,30). However, how these integrations are organized in the genome has not been comprehensively assessed. Here we showed that these HBV integrations could be circularized as ecDNA. ecDNA amplifications lead to higher oncogene transcription than copy-number-matched linear DNA21 and are characterized by enhanced chromatin accessibility31. We identified HBV–oncogene–ecDNA structures and observed consistently elevated copy numbers and expression of HBV together with the targeted oncogenes. These results revealed a mechanism of HBV integration in HCC tumorigenesis.
We report a comprehensive genomic landscape of HCC in Chinese individuals covering multiple classes of somatic alterations. How these different genetic alterations cooperate with the diverse immune and stromal cell types in the tumour microenvironment32 warrants in-depth investigation. Collectively, our CLCA study is a valuable resource that provides important biological insights into HCC carcinogenesis and has clinical implications for HCC diagnosis and treatment.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-024-07054-3.

1. Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
2. Llovet, J. M. et al. Hepatocellular carcinoma. Nat. Rev. Dis. Primers 7, 6 (2021).
3. Villanueva, A. Hepatocellular carcinoma. N. Engl. J. Med. 380, 1450–1462 (2019).
4. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
5. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).
6. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
7. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).
8. Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).
9. Fujimoto, A. et al. Whole-genome mutational landscape and characterization of noncoding and structural mutations in liver cancer. Nat. Genet. 48, 500–509 (2016).
10. Letouze, E. et al. Mutational signatures reveal the dynamic interplay of risk factors and cellular processes during liver tumorigenesis. Nat. Commun. 8, 1315 (2017).
11. Gao, Q. et al. Integrated proteogenomic characterization of HBV-related hepatocellular carcinoma. Cell 179, 561–577 (2019).
12. Sung, W. K. et al. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma. Nat. Genet. 44, 765–769 (2012).
13. Kan, Z. et al. Whole-genome sequencing identifies recurrent mutations in hepatocellular carcinoma. Genome Res. 23, 1422–1433 (2013).
14. Xue, R. et al. Variable intra-tumor genomic heterogeneity of multiple lesions in patients with hepatocellular carcinoma. Gastroenterology 150, 998–1008 (2016).
15. Schulze, K. et al. Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets. Nat. Genet. 47, 505–511 (2015).
16. Imielinski, M., Guo, G. & Meyerson, M. Insertions and deletions target lineage-defining genes in human cancers. Cell 168, 460–472 (2017).
17. Dentro, S. C. et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 184, 2239–2254 (2021).
18. Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
19. Tarabichi, M. et al. Neutral tumor evolution? Nat. Genet. 50, 1630–1633 (2018).

20. Ng, S. W. K. et al. Convergent somatic mutations in metabolism genes in chronic liver disease. Nature 598, 473–478 (2021).
21. Kim, H. et al. Extrachromosomal DNA is associated with oncogene amplification and poor outcome across multiple cancers. Nat. Genet. 52, 891–897 (2020).
22. Deshpande, V. et al. Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat. Commun. 10, 392 (2019).
23. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
24. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
25. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
26. Cortes-Ciriano, I. et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. 52, 331–341 (2020).
27. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
28. Satriano, L., Lewinska, M., Rodrigues, P. M., Banales, J. M. & Andersen, J. B. Metabolic rearrangements in primary liver cancers: cause and consequences. Nat. Rev. Gastroenterol. Hepatol. 16, 748–766 (2019).
29. Guo, L. et al. Single-cell DNA sequencing reveals punctuated and gradual clonal evolution in hepatocellular carcinoma. Gastroenterology 162, 238–252 (2022).
30. Xue, R. et al. Genomic and transcriptomic profiling of combined hepatocellular and intrahepatic cholangiocarcinoma reveals distinct molecular subtypes. Cancer Cell 35, 932–947 (2019).
31. Wu, S. et al. Circular ecDNA promotes accessible chromatin and high oncogene expression. Nature 575, 699–703 (2019).
32. Xue, R. et al. Liver tumour immune microenvironment subtypes and neutrophil heterogeneity. Nature 612, 141–147 (2022).

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© The Author(s), under exclusive licence to Springer Nature Limited 2024

Methods

Patient cohort of CLCA
Patients with HCC were enrolled from Eastern Hepatobiliary Surgery Hospital and Shanghai Zhongshan Hospital during 2017–2020. No patient received preoperative anti-cancer treatment. Each specimen was diagnosed by two senior pathologists. Patients whose tissue samples yielded sufficient good-quality DNA were selected. In total, samples from 494 patients with HCC were processed for sequencing analysis, including WGS (n = 494) and RNA-seq (n = 239). As this was an observational study, no statistical methods were used to predetermine sample size and no randomization was performed. The study was not an intervention study, so blinding was not required. Detailed clinical information is summarized in Supplementary Table 1. DNA from primary tumours and matched peripheral blood lymphocytes was obtained. The study protocol was reviewed and approved by the institutional review boards of Eastern Hepatobiliary Surgery Hospital and Shanghai Zhongshan Hospital. This study was performed in accordance with the principles of the Declaration of Helsinki. All participants provided written informed consent. All samples were anonymously coded in accordance with local ethical guidelines. All research participants consented to the publication of research results.

Cell lines
For the functional validation of three candidate drivers, the human liver cancer cell lines PLC/PRF/5, PVTT, HepG2, Huh7, SNU387 and SNU182, and the normal liver cell line HHL5 were obtained from the Shanghai Cell Bank of the Chinese Academy of Sciences. PLC/PRF/5, PVTT, HepG2 and Huh7 cells were cultured in high-glucose Dulbecco's modified Eagle medium (DMEM, Gibco), and SNU387, SNU182 and HHL5 cells in RPMI 1640 medium (basal medium) containing 10% fetal bovine serum (FBS, Gibco), supplemented with 100 U ml−1 penicillin and 100 μg ml−1 streptomycin.
For the validation of AA-related mutational signatures, MCF-10A and HepG2 cells were obtained from the American Type Culture Collection (ATCC). HepG2 cells were cultured as described above. MCF-10A cells were cultured in DMEM/F12 medium supplemented with 10% FBS, 10 ng ml−1 insulin, 20 ng ml−1 EGF, 0.5 µg ml−1 hydrocortisone, 50 ng µl−1 penicillin and 50 U ml−1 streptomycin. All cell lines used in this study were authenticated by short-tandem-repeat DNA profiling and tested negative for mycoplasma. All cell lines were maintained at 37 °C in a humidified incubator with an atmosphere containing 5% CO2.

WGS
Fresh frozen tumour tissues and matched peripheral blood were collected from each patient. DNA was isolated using the DNeasy Blood & Tissue Kit (Qiagen). RNA was extracted using the RNeasy Mini Kit (Qiagen). DNA concentration was measured using Qubit 3.0 (Invitrogen). DNA size was checked using the Fragment Analyzer (Advanced Analytical Technologies). DNA (200 ng to 1 μg) was sheared into fragments of approximately 300 bp using the Covaris S2 (Covaris) ultrasonicator. The library was constructed using the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs) according to the manufacturer's protocol. The library (2 × 150 bp paired-end reads) was quality-checked and sequenced on the Illumina NovaSeq (Illumina) system.

Mutation calling
Raw sequencing reads were processed for quality control by trimming adapter sequences and removing poly(N) and low-quality reads, after which they were preprocessed by FASTP (v.0.13.1) using the following parameters: "--cut_by_quality3 -l 50 --correction -g -x". The FASTQ files were aligned to the human reference genome (hg19/GRCh37) using the Burrows–Wheeler Aligner (BWA, v.0.7.12). Sambamba (v.0.6.8) was used to remove PCR duplicates from the mapped BAM files. Somatic mutations, including single-nucleotide variants (SNVs) and small insertions and deletions (indels), were called using two methods: Mutect2 (v.4.0.11.0)33 and Strelka2 (v.2.8.4)34.
For Mutect2, a panel of normals (PON) file was first created, and somatic mutations were called by comparing each tumour sample with its matched non-tumour counterpart and the PON file. We removed any mutation flagged by the 'fragment_length', 'mapping_quality', 'strand_artifact', 'base_quality' or 'read_position' filters. We retained mutations covered by ≥20 reads in the tumour and ≥10 reads in the normal sample, and excluded mutations located in the ENCODE Data Analysis Consortium blacklisted regions. For Strelka2, only somatic mutations with the 'PASS' flag were retained. We applied an additional quality filter to tighten filtering of low-allele-frequency variants: quality score × allele frequency > 1.3. We filtered out any variant supported by three or more reads in the reference sample in at least three patients. We also filtered out indels of three bases or longer when a PON-filtered indel of three bases or longer was present within ten bases in the same sample. The intersection of the Mutect2 and Strelka2 results was used as the final set of somatic mutations.

Identification of candidate drivers
We combined P values obtained from independent methods of driver discovery using the empirical Brown's method, as described in the PCAWG study of non-coding drivers5. Three methods of driver discovery were used for coding regions: MutSigCV35, dNdScv36 and OncodriveFML37. We explored potential non-coding drivers by combining four methods: MutSigCV-NC38, NBR (ref. 5), ActiveDriverWGS39 and OncodriveFML37. All drivers were manually checked to filter out false-positive 'driver' loci caused by sequencing and mapping artefacts, inaccurate background models or local increases in mutations due to mutational processes that were unaccounted for, as previously reported5.

dN/dS analysis
dN/dS is the ratio between the rates of nonsynonymous and synonymous substitutions and is used to assess selection in cancer genomes, as described previously17,18. In brief, dN/dS ratios can be calculated for different groups of mutations, such as clonal and subclonal mutations in known cancer genes, yielding insights into the density of driver mutations in each group. Using the dndscv R package36, dN/dS analysis was run on the clonal and subclonal mutations. A dN/dS ratio greater than 1 indicates positive selection, ratios below 1 indicate negative selection, and dN/dS ≈ 1 points towards neutral evolutionary dynamics.

TERT promoter mutation
To validate TERT promoter mutations, we performed targeted sequencing of the TERT promoter in tumour samples. The library was constructed by two rounds of PCR amplification. The first round used a barcoded primer targeting the TERT promoter and yielded a 239 bp product. The second round used universal indexed primers and yielded a 333 bp product. The sequencing library was then pooled by mixing PCR products with the same index but different barcodes. The library was then processed for quality control and sequenced as described for WGS. The average sequencing depth for the region was 378,535×. Data processing was the same as for WGS, except that PCR duplicates were not removed.

Mutational signature extraction and assignment
We used mSigHdp (v.1.1.2)40 and SigProfilerExtractor from the SigProfiler bioinformatics tool suite (v.1.1.0)6 to extract SBS, DBS and ID signatures. For SigProfiler signature extraction, 1,000 iterations were performed (nmf_replicates = 1000). We report only signatures supported by both mSigHdp and SigProfiler. A signature was considered to be supported by both programs if (1) the mSigHdp-extracted signature
has a cosine similarity ≥ 0.90 with a SigProfiler-extracted signature or (2) the mSigHdp-extracted signature can be reconstructed by multiple SigProfiler-extracted signatures (reconstruction cosine similarity ≥ 0.90).
Mutational signature assignment was performed using mSigAct::MAPAssignActivity (v.2.2.3). The prior proportion of each mutational signature was estimated from the preliminary assignment by mSigHdp. We then performed 'Ward.D' hierarchical clustering on the Euclidean distances between signature assignments. For simplicity, we combined SBS_H1, SBS_H4, SBS_H14, SBS_H16 and SBS_H17, which were similar to or splits of the ageing-related COSMIC signatures SBS1, SBS5 and SBS40, into a single 'Ageing' SBS signature.

Comparison of extracted signatures to COSMICv3.2 signatures
An extracted signature was confirmed as a known signature if (1) it was similar to a COSMICv3.2 signature (cosine similarity ≥ 0.90); (2) it could be reconstructed by multiple COSMICv3.2 signatures (reconstruction cosine similarity ≥ 0.90); or (3) it could be reconstructed into a COSMICv3.2 signature by combining it with other extracted signatures (reconstruction cosine similarity ≥ 0.90). Criteria (2) and (3) were evaluated using mSigAct::OptimizeExposureQP. Pentanucleotide context analysis included the 2 bp before and after the mutation. For SBS_H8, the low overall cosine similarity (0.71) between SBS_H8 and SBS22 led us to further compare their pentanucleotide contexts. The low pentanucleotide cosine similarity (0.61) further revealed substantial differences between SBS_H8 and COSMIC SBS22. Specifically, T>A mutations of SBS_H8 were enriched in the NCxGG context, whereas those of SBS22 showed a more dispersed NCxGN context.

Cell culture to validate the new mutational signatures
Exposure of HepG2 and MCF-10A cells was performed as previously described41. In brief, HepG2 cells were exposed to 20 µM aristolochic acid 1 (AA1, A5512, Sigma-Aldrich) for 2 months, whereas MCF-10A cells were exposed to 20 µM or 40 µM AA for the same length of time. After 2 weeks of recovery and expansion, single-cell cloning was performed using flow cytometry. Random clones were selected and expanded for DNA isolation and WGS. For MCF-10A cells, we sequenced two clones from the cells exposed to 20 µM AA (clones 1 and 2) and one clone from the cells exposed to 40 µM AA (clone 3). For HepG2 cells, we sequenced all three clones from the cells exposed to 20 µM AA.

CNA
Sequenza (v.2.1.1) was used to call CNAs, taking both ploidy and cellularity into account42. In brief, BAM files of tumour and paired normal samples were used as input to calculate the depth ratio, which was normalized for GC content bias and data ratio. To obtain segmented copy numbers and estimate cellularity and ploidy, the following parameters were used: breaks.method = 'full', gamma = 40, kmin = 5, gamma.pcf = 200, kmin.pcf = 200. For each tumour sample, the copy numbers of segments were divided by ploidy after log2 transformation. After filtering out segments smaller than 500 kb, copy-number states were determined for each segment. Copy-number gains and losses were defined as at least one copy more and at least one copy fewer than the estimated ploidy, respectively.
PURPLE (PURity & PLoidy Estimator; v.2.34) was also run on paired tumour–normal WGS data as described previously43. The PURPLE pipeline has five key steps: (1) calculating the tumour B-allele frequency at high-confidence heterozygous germline loci; (2) determining read-depth ratios for tumour and reference genomes; (3) segmentation; (4) purity fitting; and (5) smoothing. A number of rules were further applied to merge adjacent regions to create a smooth copy-number profile. GISTIC2.0 (Genomic Identification of Significant Targets in Cancer v.2.0.23) was used to identify focal gain and loss regions44.

SVs
SVs were called using LUMPY (v.0.2.13) with default parameters45. LUMPY integrates multiple SV detection signals simultaneously during SV discovery. Both read-pair and split-read signals were considered within the LUMPY framework, achieving relatively high detection sensitivity for SVs. The detected SVs were further used to analyse clustered mutational processes, including kataegis, chromothripsis and chromoplexy.

Kataegis
Kataegis is a focal hypermutation process that leads to locally clustered point mutations25. Kataegis events were defined as genomic segments containing six or more consecutive mutations with an average intermutation distance of ≤100 bp. Rainfall plots of kataegis were generated with the rainfallPlot function of the R package maftools (v.2.6.05)46, with detectChangePoints set to TRUE.

Chromothripsis
Chromothripsis is characterized by massive genomic rearrangements exhibiting oscillations between two copy-number states23. Chromothripsis was inferred using the R package ShatterSeek (v.0.4)26. In brief, ShatterSeek first uses intrachromosomal SVs to detect clusters of interleaved rearrangements and then evaluates a set of statistical criteria in each of these regions. The output is a data frame reporting, for each chromosome, the values of the statistical criteria used and additional information. Candidate chromothripsis regions were visually inspected together with the local SV and CN profiles. For the minimum number of oscillating CN segments, we used two thresholds: high-confidence calls display oscillations between two states in at least seven adjacent segments, whereas low-confidence calls involve between four and six segments.

Chromoplexy
Chromoplexy results from multiple simultaneous double-stranded DNA breaks across several chromosomes that are rejoined incorrectly, leading to balanced chains of rearrangements7,24. Chromoplexy was inferred using ChainFinder (v.1.0.1), an algorithm for identifying complex sets of DNA rearrangements and deletions in cancer genomes that may reflect coordinated chromosomal alterations24. In brief, ChainFinder first models the expected chromosomal distribution of breakpoints from independently arising rearrangements. It then searches the user-provided copy-number and SV data for sets of rearrangements and associated deletions that, given their deviation from the predicted distribution, are unlikely to have arisen independently.

HBV integration
We first aligned all reads against a comprehensive list of HBV reference sequences (n = 73) as described previously14. We next searched for human–virus chimeric reads, in which one end or part of the read mapped to the human genome and the other end or remaining part mapped to the viral reference genome, because such reads indicate HBV integration into the human genome. Adjacent or overlapping chimeric reads (within 500 bp) aligning to the human and viral genomes in the same orientation were merged into clusters, and clusters with at least two chimeric reads were retained. The integration sites were then compared with RefSeq gene boundaries to find genes that were directly disrupted by HBV integration (overlapping) or potentially affected by integration (within 15 kb of integration sites). HBV fusion transcripts were also detected from the RNA-seq data using STAR-Fusion47.

Detection of ecDNA
ecDNA-based amplification has been recognized as a way for tumour cells to increase the copy number of oncogenes22,48,49. AmpliconArchitect (v.1.3.r2) was used to detect ecDNA22. In brief, aligned reads of regions with CN greater than five were used as seeds. The default
parameters were used. Given mapped reads, AmpliconArchitect automatically searches for other intervals participating in the amplicon and then combines CNV and SV analysis. AmpliconArchitect uses structural variant signatures (for example, discordant paired-end reads and CNV boundaries) to partition all intervals into segments and build an amplicon graph, and it assigns CNs to the segments by optimizing a balanced flow on the graph. We then used the AmpliconArchitect-derived breakpoint graph to classify amplicons into four categories using AmpliconClassifier (v.0.2.5), as described21: (1) circular amplification; (2) breakage–fusion–bridge amplification; (3) heavily rearranged amplification; and (4) linear amplification. Circular amplicons were considered to be ecDNA.

Validating ecDNA with CIRCLE-seq
CIRCLE-seq is a sequencing library enrichment approach optimized for circular DNA detection50,51 and was performed on selected cases. A detailed protocol for circular DNA isolation is available on the Nature Protocol Exchange server (https://doi.org/10.1038/protex.2019.006). Amplified circular DNA was sheared to an average fragment size of 150–200 bp using the S220 focused ultrasonicator (Covaris). Libraries for next-generation sequencing were prepared using the NEBNext Ultra DNA Library Kit for Illumina (New England Biolabs) according to the manufacturer's protocol. Sequencing data generated by CIRCLE-seq were aligned and processed, and the aligned BAM files were then analysed in two ways. First, all read pairs and split reads with an outward-facing read orientation, indicating potential circles, were placed into a new 'circle only' BAM file. Second, genomic segments enriched for signal over background were detected in the 'all reads' BAM file using variable-width windows from Homer v.4.11 findPeaks, and the edges of these enriched regions were intersected with the 'circle only' BAM file to quantify the number of circle-supporting reads. To set significance thresholds separating real circles from background noise, matched WGS data were used to determine the background distribution of circle-oriented reads in non-circle-enriched regions matched for length and nucleotide composition. An empirical P value threshold of 0.01 was used to filter putative circles, and regions passing this filter were used for downstream analysis.

Inferring clonality and evolutionary history
The evolutionary history of our CLCA cohort was determined as previously described8. In basic terms, clonal mutations occurred before the emergence of the most recent common ancestor, whereas subclonal mutations occurred after this event. In regions with copy-number gains, molecular time can be further divided according to whether mutations preceded the copy-number gain (and were themselves duplicated) or occurred after the gain (and were therefore present on only one chromosomal copy). In brief, the variant allele frequencies (VAFs) of somatic point mutations cluster around the values imposed by the purity of the sample and the local copy-number state. On the basis of this information, subclonal populations were identified, the timing of copy-number gains and point mutations was inferred, and the relative timing of somatic driver events was deduced. We next inferred the timing of mutational signatures and reconstructed the mutational history of our CLCA cohort by integrating these timing data across 494 patients. We also divided these patients according to aetiological factors, such as smoking and drinking, and compared the evolutionary histories of the different groups. Key packages used for timing analysis, including PyClone (v.0.13.1), MutationTimeR (v.0.99.3) and PhylogicNDT (v.1.0), are available at the PCAWG GitHub repository (https://github.com/PCAWG-11/Evolution).

RNA-seq
RNA in the tumour samples was extracted using the RNeasy Mini Kit (Qiagen). DNA and RNA concentrations were measured using Qubit 3.0 (Invitrogen), and RNA size was checked using the Fragment Analyzer (Advanced Analytical Technologies). RNA-seq libraries were constructed using the TruSeq mRNA Library Prep Kit (Illumina) according to the manufacturer's protocol. The libraries (2 × 150 bp paired-end reads) were then quality checked and sequenced on the Illumina NovaSeq system. Clean reads were obtained by removing raw reads containing adapters or of low quality, and were then aligned to the human genome (hg19) using STAR (v.2.7.3c)52. Transcripts per million (TPM) values and gene counts were computed using RSEM (v.1.3.3). Fusion genes were detected using STAR-Fusion47.

Stable cell line construction
Three representative candidate drivers were selected for functional validation: the PPP1R12B 3′ UTR, the KCNJ12 promoter and FGA. The PPP1R12B 3′ UTR was among the earliest candidate driver events during HCC evolution, whereas the KCNJ12 promoter was among the latest. Notably, FGA was independently identified as both a coding and a non-coding (3′ UTR) candidate driver. Knockdown by short hairpin RNA (shRNA) or knockout by single guide RNA (sgRNA) of the three drivers was performed in a total of seven human cell lines: six liver cancer cell lines (PLC/PRF/5, PVTT, SNU387, SNU182, Huh7 and HepG2) and one normal liver cell line (HHL5). Lentiviruses carrying shRNAs or sgRNAs targeting the three genes were obtained (Supplementary Table 7) and transduced into the cell lines as indicated.
For PPP1R12B, disrupted cell lines were constructed using two independent shRNAs (1 and 2) in PLC/PRF/5, PVTT, SNU387 and HHL5 cells, and two independent sgRNAs (1 and 2) in HepG2 and Huh7 cells. For KCNJ12, disrupted cell lines were constructed using two independent shRNAs (1 and 2) in SNU182 and HepG2 cells, and two independent sgRNAs (1 and 2) in PLC/PRF/5, PVTT, SNU387 and HHL5 cells. For FGA, disrupted cell lines were constructed using two independent shRNAs (1 and 2) in PLC/PRF/5, PVTT, SNU387 and SNU182 cells, and two independent sgRNAs (1 and 2) in Huh7 cells. A scrambled shRNA was used as a control (shCtrl).
We were unable to knock in the detected non-coding FGA 3′ UTR mutations using prime editing or base editors such as the cytosine base editor (CBE) or the adenine base editor (ABE). To confirm the functional role of the FGA 3′ UTR mutations, we first knocked out endogenous FGA by sgRNA in the HepG2 cell line and then ectopically and stably expressed the mutant 3′ UTR, with the wild type as a control (Extended Data Fig. 10e). Overexpression lentiviruses containing the FGA 3′ UTR mutation, with the wild type as a control, were constructed by Ubigene Biosciences and stably expressed in HepG2 single-cell clones lacking endogenous FGA. In total, 2.5 × 10⁵ cells were plated into six-well plates, incubated overnight and transduced with lentiviral particles (multiplicity of infection of 10) the next day. At 12–24 h after transduction, the medium was replaced with complete culture medium for 72 h, and stable knockdown or knockout cell lines were then sorted by flow cytometry or selected with puromycin.

Prime editing
Endogenous point mutations (FGA, PPP1R12B and KCNJ12) were introduced into the HepG2 cell line using prime editing (PE). In brief, plasmids expressing prime editing guide RNAs (pegRNAs) or nicking sgRNAs were cloned using Golden Gate assembly as previously described53. pegRNAs were cloned into the pU6-tevopreq1-GG-acceptor plasmid (Addgene) with an inserted EF1α promoter and puromycin-resistance cassette. Nicking sgRNAs used for PE3 or PE3b were cloned into BPK1520 (Addgene). In total, 3 × 10⁵ cells were plated into a 24-well plate overnight and transfected at approximately 80–90% confluency with 1,000 ng pCMV-PEmax-P2A-BSD plasmid (Addgene), 500 ng pCMV-hMLH1dn plasmid (Addgene), 333 ng pegRNA plasmid and 111 ng nicking sgRNA plasmid using Lipofectamine 3000 (Invitrogen) according to the manufacturer's instructions. Cells were cultured and selected with puromycin and blasticidin. Genomic DNA from edited clones was extracted, and the targeting region was amplified by PCR
and sequenced on an ABI 3730XL (Thermo Fisher Scientific). A list of all of the pegRNAs, nicking sgRNAs and primer sequences is provided in Supplementary Table 7b,c.

RT–qPCR analysis
Total RNA from HCC cell lines was isolated using TRIzol Reagent (Invitrogen) according to the standard instructions. RNA was reverse transcribed into first-strand cDNA using 1 μl of random hexamers (Bio-Light Biotech), 1.25 μl Recombinant RNasin Ribonuclease Inhibitor (Promega), 1 μl 4× dNTP Mixture (Bio-Light Biotech), 1 μl M-MLV reverse transcriptase (Promega) and 5 μl M-MLV RT 5× buffer (Promega). qPCR was performed using ChamQ SYBR Colour qPCR Master Mix (Vazyme) on the LightCycler 96 PCR platform (Roche). The sequences of the specific RT–qPCR primers are provided in Supplementary Table 7c. The cycling conditions were as follows: 95 °C for 10 min, followed by 45 cycles of 95 °C for 10 s and 60 °C for 30 s. The results were normalized to ACTB (encoding β-actin) mRNA expression and analysed using the 2^(−ΔΔCt) method.

Western blotting
Total protein from frozen tissue samples and cell lines was extracted with RIPA lysis buffer (Strong) (Yesen) in the presence of 1% protease inhibitors and phosphatase inhibitors (Yesen). Protein concentration was assessed using the Pierce BCA Protein Assay Kit (Thermo Fisher Scientific) according to the manufacturer's protocol. Equal amounts of total protein were separated by 8% SDS–PAGE and transferred onto preactivated polyvinylidene fluoride membranes (Millipore). The blots were incubated (4 °C, overnight) with the appropriate primary antibodies against β-actin (AC004, AMC0001, 1:5,000; ABclonal), GAPDH (AC033, AMC0062, 1:5,000; ABclonal), TYK2 (9312, 1:1,000; Cell Signaling Technology), phosphorylated-TYK2 (Tyr1054/1055) (D7T8A) (68790, D7T8A, 1:1,000; Cell Signaling Technology), phosphorylated-STAT3 (Tyr705) (D3A7) XP (9145, D3A7, 1:2,000; Cell Signaling Technology), lamin A/C (4C11) (4777, 4C11, 1:2,000; Cell Signaling Technology), fibrinogen-α (C-7) (sc-398806, C-7, 1:500; Santa Cruz Biotechnology) and STAT3 (60199-1-Ig, 3G2D12, 1:2,000; Proteintech). The membranes were then incubated (room temperature, 2 h) with HRP-conjugated goat anti-rabbit IgG (H+L) (SA00001-2, 1:5,000; Proteintech) and goat anti-mouse IgG (H+L) (SA00001-1, 1:5,000; Proteintech), or with fluorescently labelled IRDye 800CW goat anti-rabbit IgG (H+L) (926-32211, 1:20,000; LI-COR) and IRDye 800CW goat anti-mouse IgG (H+L) (926-32210, 1:20,000; LI-COR) secondary antibodies, as listed in Supplementary Table 7d. Immunoreactive bands were detected using the Touch Imager XLi system (e-BLOT Life Science) or the Odyssey Sa Infrared Imaging System (LI-COR Biosciences). To detect different proteins on the same blot, membranes were stripped with western blot fast stripping buffer (EpiZyme), washed several times and reprobed as described above. Band intensities were quantified using ImageJ (v1.53a), with β-actin, GAPDH and lamin A/C used as references.

Subcellular fractionation
Nuclear and cytoplasmic fractions were prepared using the Nuclear and Cytoplasmic Extraction Reagents Kit (Beyotime) supplemented with protease inhibitors, phosphatase inhibitors and phenylmethanesulfonyl fluoride. In brief, cells were washed with precooled 1× phosphate-buffered saline (PBS), after which cytoplasmic proteins were collected using Cytoplasmic Protein Extraction Reagent, which disrupts the plasma membrane while leaving the nuclear membrane intact. Nuclear proteins were then isolated from the remaining pellet using Nuclear Protein Extraction Reagent, and the fractions were analysed by western blotting.

Cell proliferation assay
In vitro cell proliferation was assessed using the Cell Counting Kit-8 (CCK-8, Dojindo) according to the manufacturer's protocol. In brief, cells were seeded in 96-well plates (0.75–2 × 10³ cells per well). At the indicated time points, 10 μl of CCK-8 solution was added to each well, and the plates were incubated in the dark at 37 °C for 1–2 h. The absorbance of each well at 450 nm was measured using the Synergy Neo microplate reader (BioTek). Data were normalized to day 0, and the results are presented as the fold change over the control samples.

Colony-formation assay
The self-renewal ability of cells was determined using a colony-formation assay. Cells were plated in six-well plates at a density of 1.5–4 × 10³ cells per well and cultured in complete medium at 37 °C for 9–21 days, with the medium replaced every 3 days. At the end of the culture period, the cells were fixed with 4% paraformaldehyde for 15 min and stained with 0.1% (w/v) crystal violet for 15 min. Cell confluence in each well was quantified, and the results are presented as the fold change over the control samples.

Cell migration and invasion assay
Migration and invasion assays were performed as previously described using 8-μm-pore-size Transwell chambers (Greiner Bio-One; Falcon; Costar). For cell migration assays, 2–10 × 10⁴ cells in FBS-free medium were seeded into the upper chambers, while the lower chambers were filled with 750 μl conditioned medium containing 10–30% FBS. For cell invasion assays, Matrigel-coated Transwell chambers were either purchased from Corning or prepared in-house (inserts precoated with an appropriate dilution of Matrigel (Corning) for approximately 2 h in a 37 °C incubator). A cell suspension (2.5–10 × 10⁴ cells) in FBS-free medium was then seeded into the upper chamber, and conditioned medium with 10–30% FBS was added to the bottom chamber of the Transwell. After 24–96 h of incubation, cells on the upper surface of the membrane were removed with cotton swabs. Cells attached to the lower surface were fixed in 4% paraformaldehyde and stained with 0.1% crystal violet for 15 min. Excess dye was removed by washing with water, after which the cells were examined using an Olympus IX73 microscope equipped with a DP80 camera. For inhibitor treatment experiments, PLC/PRF/5-shCtrl and PLC/PRF/5-shFGA cells were pretreated with the pTYK2 inhibitor (BMS-986165, 10 μM) or two AKT inhibitors (MK-2206, 2 μM; AZD5363, 10 μM), and the inhibitors were added at the same concentrations to both the top and bottom chambers. The results are presented as the fold change over the control samples.

Subcutaneous xenograft
BALB/c nude mice (aged 5–7 weeks) were obtained from GemPharmatech. All of the mice were housed under specific-pathogen-free conditions at an ambient temperature of 20–26 °C and a humidity of 30–70% under a 12 h–12 h light–dark cycle before use, with unrestricted access to regular mouse chow and water. Body-weight-matched mice were randomized into treatment groups for subcutaneous injection. Blinding was not required. We subcutaneously injected 2 × 10⁶ PLC/PRF/5-shCtrl or PLC/PRF/5-shFGA cells in 100 μl of PBS/Matrigel (3:2) into the flanks of nude mice (shCtrl, n = 6; shFGA, n = 7). For sample size, a minimum of three mice per group (PLC/PRF/5-shCtrl and PLC/PRF/5-shFGA) was required to reach statistical significance. Preliminary subcutaneous xenograft experiments were performed separately in male and female mice, and similar trends were observed in both: shFGA cells gave rise to larger and more aggressive tumours than shCtrl cells. To exclude aggression and biting among male mice as potential confounding factors, only the female groups were retained and recorded. Tumour width (w) and length (l) were measured every 3 days using callipers, and the diameter of each tumour was <2 cm at the time of euthanasia. Tumour volume (V) was calculated individually using the formula V = (w² × l) × 0.52. Tumour tissues were embedded in paraffin wax and sectioned for immunohistochemistry (IHC). All of the mouse
experiments were approved by the Animal Care and Use Committee at Eastern Hepatobiliary Surgery Hospital.

Immunohistochemistry analysis
We collected HCC tissue microarrays (n = 39) and tumour samples from xenograft mouse models, which were fixed in 10% neutral formalin, embedded in paraffin and cut into 3–5 μm sections on charged glass slides. After deparaffinization, rehydration, blocking of endogenous peroxidase and heat-induced antigen retrieval, the sections were incubated overnight at 4 °C with primary antibodies, including anti-fibrinogen alpha chain (20645-1-AP, 1:100; Proteintech), anti-phosphorylated-TYK2 (Tyr1054/1055) (D7T8A) (68790, D7T8A, 1:100; Cell Signaling Technology) and anti-Ki-67 (ab15580, 1:500; Abcam). HRP-conjugated anti-rabbit secondary antibodies (D-3002; Supervision) were added for 30 min at 37 °C. The slides were visualized using the Liquid DAB+ Substrate Chromogen System (DAKO) and counterstained with haematoxylin. IHC slides were scanned using the Leica Aperio AT2 system, and images were analysed using Aperio ImageScope (v.12.4.6).

IL-6 concentration
The IL-6 concentration in the culture supernatants of PLC/PRF/5 and PVTT cells was quantified using the human IL-6 ELISA kit (RayBiotech) according to the manufacturer's instructions, with readout on a Synergy Neo microplate reader (BioTek). The IL-6 concentration in patient tissue was measured using the S-PLEX Human IL-6 Kit (Meso Scale Discovery) according to the manufacturer's protocol. In brief, all protein samples were prediluted 2–5×. The plates were then assembled, enhanced and read using the 1300 MESO QuickPlex SQ 120MM instrument, which recorded electrochemiluminescence; the signals were analysed with DISCOVERY WORKBENCH Desktop Analysis Software (v.4.0). IL-6 concentrations in the samples were interpolated against a standard curve.

FGA expression
To compare FGA protein levels between tumours and matched normal tissues, western blotting and immunohistochemistry were performed on 47 and 39 tumour–normal tissue pairs, respectively. To examine the associations among FGA, pTYK2 (Tyr1054) and IL-6 at the protein level, western blots of FGA and pTYK2 (Tyr1054) and electrochemiluminescence measurements of IL-6 concentration were assessed in 75 and 71 patients with HCC, respectively; the IL-6 concentration was not available for 4 of the 75 patients. The relative intensities of FGA and pTYK2 were normalized to β-actin.

Phospho-specific protein microarray
The Phospho Explorer Antibody Array (PEX100) was obtained from Full Moon Biosystems and used according to the manufacturer's protocol. Each antibody is printed on the coated glass microscope slide in two replicates, along with multiple positive and negative controls. The phospho-antibody array contained 1,318 site-specific antibody profiles, of which 584 were pairs of phosphoproteins and their unphosphorylated counterparts. Lysates of PLC/PRF/5-shCtrl and PLC/PRF/5-shFGA cells were applied to Phospho Explorer Antibody Arrays, which were processed and analysed by OE Biotechnology. The protein samples were biotinylated and hybridized according to the manufacturer's protocol. The fluorescence intensity of each antibody spot was obtained using the GenePix 4000B Microarray Scanner (Molecular Devices) and analysed using GenePix Pro v.6.0.
For data analysis, background signals were first removed from all measurements. Second, for each antibody, the respective negative control value was subtracted from each measurement. Third, if a phosphorylation site did not satisfy both requirements ((1) the two replicates showed the same pattern between shCtrl and shFGA cells; and (2) CV < 0.3 in the group with the higher phosphorylation level), that site was considered to be a discrete point and was discarded. Next, the relative phosphorylation ratio between groups was calculated as: phosphorylation ratio = (phosphorylated value/unphosphorylated value in the shFGA group)/(phosphorylated value/unphosphorylated value in the shCtrl group). Finally, candidate dysregulated phosphoproteins were selected as those with a phosphorylation ratio of >1.5 or <0.667 in shFGA versus shCtrl cells.

Statistical analysis
Statistical analyses were performed using R (v.3.6.0) and GraphPad Prism (v.9.0). n denotes biologically independent samples unless otherwise specified. Data are presented as the mean ± s.e.m. unless otherwise specified. For box plots in all panels, the centre line shows the median, the box limits indicate the upper and lower quartiles, the whiskers extend to 1.5× the interquartile range, and data beyond the end of the whiskers are outlying points that are plotted individually. All P values are two-sided. Unpaired Student's t-tests were used unless otherwise specified. P < 0.05 or Q < 0.1 was considered significant.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The raw sequencing data reported in this paper have been deposited in the Genome Sequence Archive at the BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under study accession number PRJCA002666. We also built an interactive website (http://lifeome.net/database/liver) for visualizing and analysing our CLCA data. The data deposited and made public comply with the regulations of the Ministry of Science and Technology of China. Other public data used in this study include the human reference genome hg19/GRCh37 (https://ftp.ensembl.org/pub/grch37/), PCAWG data (https://dcc.icgc.org/pcawg/#!), TCGA-HCC data (https://portal.gdc.cancer.gov/projects/TCGA-LIHC) and COSMIC signatures (https://cancer.sanger.ac.uk/signatures/). Source data are provided with this paper.

Code availability
The Linux working environment that we used is packaged into a Singularity container file and is available at Zenodo (https://doi.org/10.5281/zenodo.7260221). The detailed code and instructions for all software have been deposited at GitHub (https://github.com/ChongJenniferZhang/CLCA_WGS).

33. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
34. Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
35. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
36. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
37. Mularoni, L., Sabarinathan, R., Deu-Pons, J., Gonzalez-Perez, A. & López-Bigas, N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 17, 128 (2016).
38. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
39. Zhu, H. et al. Candidate cancer driver mutations in distal regulatory elements and long-range chromatin interaction networks. Mol. Cell 77, 1307–1321 (2020).
40. Liu, M., Wu, Y., Jiang, N., Boot, A. & Rozen, S. G. mSigHdp: hierarchical Dirichlet process mixture modeling for mutational signature discovery. NAR Genom. Bioinform. 5, lqad005 (2023).
41. Boot, A. et al. In-depth characterization of the cisplatin mutational signature in human cell lines and in esophageal and liver tumors. Genome Res. 28, 654–665 (2018).
42. Favero, F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 26, 64–70 (2015).
43. Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
44. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
45. Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
46. Mayakonda, A., Lin, D.-C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018).
47. Haas, B. J. et al. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 20, 213 (2019).
48. Turner, K. M. et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 543, 122–125 (2017).
49. deCarvalho, A. C. et al. Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma. Nat. Genet. 50, 708–717 (2018).
50. Tsai, S. Q. et al. CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR–Cas9 nuclease off-targets. Nat. Methods 14, 607–614 (2017).
51. Koche, R. P. et al. Extrachromosomal circular DNA drives oncogenic genome remodeling in neuroblastoma. Nat. Genet. 52, 29–34 (2020).
52. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
53. Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).

Acknowledgements We thank D. Li, S. Yin and C. Zhang for their support in gene editing and the members of the Shanghai Key Laboratory of Hepato-biliary Tumour Biology and the Key Laboratory of Signaling Regulation and Targeting Therapy of Liver Cancer (SMMU) for their technical support. This work was supported by the National Natural Science Foundation of China (81988101, T2125002, 82322047, 82241230, U21A20376, 81830054, 82173035, 82141103 and 82341007), the Innovation Program of Shanghai Municipal Education Commission (21JC1406600 and 22140901000), Beijing Natural Science Foundation (Z220014), Beijing Nova Program (20230434854), Program of Shanghai Academic/Technology Research Leader (21XD1404600), the National Key Research and Development Program of China (2022YFC3400902 and 2022YFC2504602), and the New Cornerstone Science Foundation through the XPLORER PRIZE. Figure 1a and Extended Data Fig. 10s were created using BioRender with an academic license.

Author contributions L.C., C.Z., R.X., L.W., F.B., S.G.R. and H.W. conceived and designed the project. L.C., Z.L., B.Z., K.L., Y. Zhu, S.Y. and Q.G. collected the clinical samples. C.Z., R.X., M.L., J. Bai, Yin Wang, R.W., A.Y. and Yan Wang analysed the WGS and RNA-seq data. S.G.R., M.L., N.J., C.Z. and R.X. performed mutational signature analysis. L.C., J. Bao, W.W., J.H., S.S., Y. Zhang and M.B. performed functional validation of candidate drivers. R.X., C.Z., J. Bai, L.C. and J.G. designed and built the CLCA website. C.Z., R.W. and N.J. built the Zenodo and GitHub pages. R.X., L.C., C.Z. and J. Bao integrated the sequencing and experimental data, drew the display items and wrote the manuscript. F.B., L.W., D.G., X.W.W., N.Z., H.N., S.G.R. and H.W. provided edits to the manuscript. L.C., L.W., F.B. and H.W. oversaw the ethical guidelines and data regulation. L.C., L.W., F.B., S.G.R. and H.W. supervised the project. All of the authors contributed to the final version of the paper.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41586-024-07054-3.
Correspondence and requests for materials should be addressed to Lei Chen, Lin Wu, Steven G. Rozen, Fan Bai or Hongyang Wang.
Peer review information Nature thanks Lewis Roberts and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
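The HBV integration annotation in the Methods separates genes directly disrupted by an integration (the site overlaps the gene) from genes potentially affected by it (the site lies within 15 kb). The minimal Python sketch below illustrates that rule. It makes three assumptions: a hypothetical `Gene` record, 1-based inclusive coordinates, and a distance of exactly 15,000 bp counting as "within". In practice the gene intervals would come from RefSeq, and the gene names and coordinates used here are invented.

```python
from dataclasses import dataclass

# 15 kb window from the Methods; treating the boundary as inclusive is an assumption.
WINDOW_BP = 15_000


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int


def classify_integration(chrom, pos, genes, window=WINDOW_BP):
    """Label genes near one HBV integration site.

    Returns {gene_name: "overlapping" | "nearby"}; genes farther than
    `window` bp from the site, or on another chromosome, are omitted.
    """
    hits = {}
    for gene in genes:
        if gene.chrom != chrom:
            continue
        if gene.start <= pos <= gene.end:
            hits[gene.name] = "overlapping"  # site lies inside the gene body
            continue
        # Distance from the site to the nearest gene boundary.
        distance = gene.start - pos if pos < gene.start else pos - gene.end
        if distance <= window:
            hits[gene.name] = "nearby"  # potentially affected by integration
    return hits


if __name__ == "__main__":
    genes = [
        Gene("GENE_A", "chr5", 1_000, 5_000),
        Gene("GENE_B", "chr5", 20_000, 30_000),
        Gene("GENE_C", "chr1", 1_000, 2_000),
    ]
    print(classify_integration("chr5", 12_000, genes))
    # {'GENE_A': 'nearby', 'GENE_B': 'nearby'}
```

The loop is linear in the number of genes. For genome-wide annotation, an interval index or a tool such as bedtools would normally replace it.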
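The CIRCLE-seq validation compares the number of circle-supporting reads at a candidate region against a background distribution of circle-oriented reads from length- and composition-matched regions in the matched WGS data. It retains regions at an empirical P value threshold of 0.01. The Methods do not state how that empirical P value was computed. The minimal Python sketch below therefore assumes the common add-one form, (r + 1)/(n + 1), together with a `<=` comparison against the threshold. The background counts in the example are hypothetical.

```python
def empirical_p(observed, background):
    """One-sided empirical P value: how often background >= observed.

    Uses the add-one form (r + 1) / (n + 1) so that P is never exactly 0.
    """
    if not background:
        raise ValueError("background distribution is empty")
    exceed = sum(1 for value in background if value >= observed)
    return (exceed + 1) / (len(background) + 1)


def passes_circle_filter(observed, background, alpha=0.01):
    """Keep a putative circle if its empirical P value is <= alpha."""
    return empirical_p(observed, background) <= alpha


if __name__ == "__main__":
    # Hypothetical circle-oriented read counts in 199 matched background regions.
    background = list(range(199))
    print(empirical_p(500, background))           # 0.005
    print(passes_circle_filter(500, background))  # True
    print(passes_circle_filter(100, background))  # False (P = 0.5)
```

With the add-one form, the smallest attainable P value is 1/(n + 1). At least 99 background regions are therefore needed before any candidate can reach P ≤ 0.01.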
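The clonality and timing analysis rests on the fact that the VAF of a somatic SNV is set by sample purity, local copy number and the number of chromosome copies carrying the mutation (its multiplicity). The sketch below uses the standard relationship VAF = p·m·CCF/(p·N_t + 2(1 − p)) for a diploid normal, where p is purity, N_t the tumour copy number, m the multiplicity and CCF the cancer cell fraction. It illustrates the principle only. It is not the PyClone, MutationTimeR or PhylogicNDT implementation, which model read counts probabilistically. The purity and VAF values are hypothetical.

```python
def expected_vaf(purity, tumour_cn, multiplicity, ccf=1.0, normal_cn=2):
    """Expected VAF of an SNV carried on `multiplicity` of `tumour_cn` copies
    in a fraction `ccf` of tumour cells, diluted by normal cells."""
    total_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return purity * ccf * multiplicity / total_copies


def ccf_from_vaf(vaf, purity, tumour_cn, multiplicity=1, normal_cn=2):
    """Invert expected_vaf to estimate the cancer cell fraction (CCF)."""
    total_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)


if __name__ == "__main__":
    purity = 0.6
    # Diploid region: a clonal heterozygous SNV is expected near VAF 0.30.
    print(round(expected_vaf(purity, tumour_cn=2, multiplicity=1), 3))  # 0.3
    # Single-copy gain (CN = 3): an SNV that preceded the gain sits on the
    # duplicated allele (m = 2); one arising after the gain has m = 1.
    print(round(expected_vaf(purity, 3, 2), 3))  # 0.462
    print(round(expected_vaf(purity, 3, 1), 3))  # 0.231
    # An SNV at VAF 0.15 in a diploid region suggests a subclone (CCF ~0.5).
    print(round(ccf_from_vaf(0.15, purity, 2), 3))  # 0.5
```

The two expected VAFs in the gained region show why mutations that precede a gain are distinguishable from those that follow it. This is the basis for splitting molecular time within copy-number gains.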
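TPM values in the RNA-seq analysis were computed with RSEM, which estimates expected read counts and effective transcript lengths by expectation–maximization. The simplified Python sketch below shows only the final normalization step, TPM_i = (c_i/l_i)/Σ_j(c_j/l_j) × 10⁶. It assumes the counts and lengths are already known and does not reproduce RSEM's estimation. The gene counts and lengths are hypothetical.

```python
def tpm(counts, lengths):
    """Transcripts per million from read counts and (effective) lengths.

    counts and lengths are parallel sequences; lengths must be > 0 and in a
    consistent unit (the unit cancels out).
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / l for c, l in zip(counts, lengths)]  # length-normalized reads
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


if __name__ == "__main__":
    # Hypothetical genes: three times the reads at three times the length
    # yields the same TPM, because TPM corrects for transcript length.
    print(tpm([100, 300, 50], [1000, 3000, 500]))
```

Because TPM values always sum to 10⁶ within a sample, they are comparable as proportions across samples, unlike raw gene counts.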
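In the RT–qPCR analysis, expression was normalized to ACTB and analysed with the 2^(−ΔΔCt) method. The minimal Python sketch below shows that calculation. Averaging technical-replicate Ct values before computing ΔCt is an assumption, and the Ct values are hypothetical. The method itself presumes close to 100% amplification efficiency for both the target and the reference gene.

```python
from statistics import mean


def fold_change_ddct(target_test, ref_test, target_ctrl, ref_ctrl):
    """Relative expression by the 2^(-ddCt) method.

    Each argument is a list of Ct values (technical replicates) for the
    target gene or the reference gene (e.g. ACTB), in the test sample
    (e.g. shFGA) or the control sample (e.g. shCtrl).
    """
    dct_test = mean(target_test) - mean(ref_test)  # normalize to reference
    dct_ctrl = mean(target_ctrl) - mean(ref_ctrl)
    ddct = dct_test - dct_ctrl                      # relative to control
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical knockdown: the target Ct rises by 2 cycles relative to ACTB.
    fc = fold_change_ddct([26.0, 26.2, 25.8], [18.0, 18.1, 17.9],
                          [24.0, 24.1, 23.9], [18.0, 18.0, 18.0])
    print(round(fc, 2))  # 0.25, i.e. ~75% knockdown
```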
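In the subcutaneous xenograft experiments, tumour volume was calculated from calliper measurements as V = (w² × l) × 0.52. The constant 0.52 approximates π/6 ≈ 0.5236, the volume factor for an ellipsoid whose two shorter axes both equal the measured width. The minimal Python sketch below assumes measurements in millimetres, which gives volumes in mm³. The example tumour is hypothetical.

```python
import math

# Constant used in the Methods; it approximates pi / 6 (~0.5236), the factor
# for an ellipsoid whose two shorter axes both equal the measured width.
VOLUME_FACTOR = 0.52


def tumour_volume(width, length, factor=VOLUME_FACTOR):
    """Calliper-based tumour volume, V = (w^2 * l) * factor.

    width and length must share a unit (e.g. mm, giving mm^3); by convention
    width is the shorter of the two measured axes.
    """
    return width ** 2 * length * factor


if __name__ == "__main__":
    print(round(tumour_volume(8.0, 12.0), 1))  # 399.4 (mm^3, hypothetical tumour)
    # The same tumour with the exact ellipsoid factor, for comparison.
    print(round(tumour_volume(8.0, 12.0, factor=math.pi / 6), 1))
```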
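The phosphosite filtering and ratio calculation from the phospho-specific protein microarray analysis can be sketched in Python as below. This is an illustration under stated assumptions, not OE Biotechnology's pipeline. The inputs are assumed to be background- and negative-control-corrected signals for the two printed replicates. "The same pattern" is interpreted as both replicates changing in the same direction. The ratio is computed from replicate means. All signal values in the example are hypothetical.

```python
from statistics import mean, stdev

UP_CUTOFF, DOWN_CUTOFF = 1.5, 0.667  # thresholds from the Methods
MAX_CV = 0.3


def passes_qc(phos_fga, phos_ctrl, max_cv=MAX_CV):
    """Replicate checks for one phosphosite (two printed replicates each).

    (1) both replicates change in the same direction (shFGA vs shCtrl);
    (2) CV < max_cv in the group with the higher mean phospho signal.
    """
    directions = {a > b for a, b in zip(phos_fga, phos_ctrl)}
    if len(directions) != 1:
        return False
    higher = phos_fga if mean(phos_fga) > mean(phos_ctrl) else phos_ctrl
    return stdev(higher) / mean(higher) < max_cv


def phospho_ratio(phos_fga, unphos_fga, phos_ctrl, unphos_ctrl):
    """(phospho/unphospho in shFGA) / (phospho/unphospho in shCtrl)."""
    return ((mean(phos_fga) / mean(unphos_fga))
            / (mean(phos_ctrl) / mean(unphos_ctrl)))


def call_site(phos_fga, unphos_fga, phos_ctrl, unphos_ctrl):
    """Return 'discarded', 'up', 'down' or 'unchanged' for one site."""
    if not passes_qc(phos_fga, phos_ctrl):
        return "discarded"
    ratio = phospho_ratio(phos_fga, unphos_fga, phos_ctrl, unphos_ctrl)
    if ratio > UP_CUTOFF:
        return "up"
    if ratio < DOWN_CUTOFF:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    print(call_site([200, 220], [100, 100], [100, 110], [100, 100]))  # up
    print(call_site([200, 90], [100, 100], [100, 110], [100, 100]))   # discarded
```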
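Several analyses in this study apply Benjamini–Hochberg correction and treat Q < 0.1 as significant. Examples include the driver-frequency comparisons in Extended Data Fig. 1 and the cluster-attribute tests in Extended Data Fig. 3. The minimal Python sketch below implements the Benjamini–Hochberg step-up adjustment. In R, `p.adjust(p, method = "BH")` returns the same values. The P values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (Q values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P down, enforcing monotonicity with a running min.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    q = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
    print([round(x, 4) for x in q])  # [0.04, 0.0533, 0.0533, 0.2]
    print([x < 0.1 for x in q])      # [True, True, True, False]
```

The running minimum is what separates this from simply multiplying each P value by m/rank. It guarantees that a smaller P value never receives a larger Q value.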
Extended Data Fig. 1 | See next page for caption.
Extended Data Fig. 1 | Comparison of CLCA with other HCC cohorts. a, Comparison of clinical information between CLCA and PCAWG-HCC. DP, double positive for HBV and HCV; DN, double negative for HBV and HCV. b, Sequencing depth of 494 tumours and their matched normal controls in CLCA. c, Relationships among driver genes using the DISCOVER mutual exclusivity test. d-e, Venn diagrams comparing the potential driver genes identified in TCGA-HCC, PCAWG-HCC and our CLCA cohort. *Potential non-true drivers curated by PCAWG-HCC. f-g, Comparison of the frequency of potential drivers between CLCA and PCAWG-HCC (f) and TCGA-HCC (g). Two-sided Fisher's exact test, with multiple-testing correction by the Benjamini–Hochberg method. A threshold of Q < 0.1 was used for significance and is denoted in blue. h, Two-sided Spearman correlation between the ratio of clonal drivers and tumour purity across all CLCA samples. The grey shaded area represents the 95% confidence interval. i, The dN/dS ratios for clonal and subclonal SNVs in 23 cancer coding drivers across our CLCA cohort. n denotes the total number of mutations for each category collected from 494 individual tumours. Centre points denote dN/dS values for missense, nonsense, splice site and all mutations. Error bars denote the 95% confidence intervals. The red dashed line denotes a dN/dS value of 1. j, Workflow for mutational signature analysis in CLCA.
Extended Data Fig. 2 | Profiles of all mutational signatures in CLCA. Mutational profiles of all signatures: SBS (single base substitution), DBS (doublet base substitution) and ID (small insertion and deletion). Magnified versions of signatures SBS_H1, DBS_H1 and ID_H1 are shown to illustrate the classification of each mutation subtype in each plot. The cosine similarity between each signature and its matched COSMICv3.2 signature is indicated. Novel signatures are labelled in red.

Extended Data Fig. 3 | See next page for caption.


Extended Data Fig. 3 | Analysis of mutational signatures. a, Signature profiles of SBS_H8, DBS_H2 and ID_H3 extracted by both mSigHdp and SigProfiler. b, Comparison of the pentanucleotide contexts of SBS_H8, SBS_H2 and AA-exposed cell lines. The red square highlights the pentanucleotide context of T > A mutations enriched in SBS_H8 compared with SBS_H2. c, Correlation between the numbers of mutations associated with SBS_H2, DBS_H2 and ID_H3. d, Mutational profile of DBS_H1. e, Mutational profile of ID_H8, related to aflatoxin. f, Correlation between the number of DBS_H1 mutations and age for involved patients. g, Correlation between the numbers of ID_H8 and SBS_H3 mutations for involved patients. h, Unsupervised clustering based on the proportions of SBS, DBS and ID mutations across tumours yields five subgroups. Selected clinical variables are also listed. The P values indicate significant nonrandom distributions for each attribute. Two-sided Fisher's exact tests with Benjamini–Hochberg correction for multiple comparisons. A threshold of Q < 0.1 was used for significance. i, Bar plots comparing selected variables that differed significantly between groups. Blue denotes mutation or yes; grey denotes wild type or no. Two-sided chi-square test. j, Boxplots comparing the contributions of SBS_H8 across the five subgroups. n denotes biologically independent samples. For boxplots, the centre line shows the median, box limits indicate the upper and lower quartiles, and whiskers extend 1.5 times the interquartile range, while data beyond the end of the whiskers are outlying points that are plotted individually. Two-tailed Student's t-test. k, OS and DFS of CLCA cases stratified into SBS_H8-high and SBS_H8-low groups by the median value. Log-rank test. For c, f and g, ρ and P values are from a two-sided Spearman correlation test.

Extended Data Fig. 4 | See next page for caption.


Extended Data Fig. 4 | Mutational signature attributions. a, mSigHdp splits COSMIC SBS5 into three components: SBS_H1, SBS_H14 and SBS_H17, which together recapitulate the pattern of SBS5. SigProfiler splits COSMIC SBS5 into four components: SBS96C, SBS96E, SBS96I and SBS96M, which together recapitulate the pattern of SBS5. b, Correlation between patient age and the numbers of mutations attributed to the mSigHdp-extracted SBS5-split signatures (SBS_H1, SBS_H14 and SBS_H17) or to the SigProfiler-extracted SBS5-split signatures (SBS96C, SBS96E, SBS96I and SBS96M). c, mSigHdp splits COSMIC SBS40 into three components: SBS_H4, SBS_H16 and SBS_H17, which together recapitulate the pattern of SBS40. SigProfiler splits COSMIC SBS40 into three components: SBS96E, SBS96M and SBS96I, which together recapitulate the pattern of SBS40. d, Correlation between patient age and the numbers of mutations attributed to the mSigHdp-extracted SBS40-split signatures (SBS_H4, SBS_H16 and SBS_H17) or to the SigProfiler-extracted SBS40-split signatures (SBS96E, SBS96M and SBS96I). e-f, Stacked bar plots showing the contributions of SBS mutational processes, coloured as in Extended Data Fig. 3h, to coding driver mutations (e) and noncoding driver mutations (f). g, Stacked bar plot showing the contribution of mutational processes to hotspot mutations (chromosome: position: the total number of patients with mutations at this particular genomic hotspot). Gene names are given with amino acid alterations for protein-coding genes. h, Enrichment of mutational signatures with clonal status. Potential aetiology and related COSMIC signatures are annotated for each signature. Two-sided Chi-square test. For b and d, ρ and P values are from a two-sided Spearman correlation test.
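Several panels (Extended Data Fig. 4b,d, for example) report ρ from a two-sided Spearman correlation test between patient age and signature-attributed mutation counts. As a reminder of what ρ measures, the sketch below computes it as the Pearson correlation of tie-averaged ranks. The helper names and example numbers are hypothetical, and the P value is not computed here.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` across the run of values tied with order[start].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ages and signature-attributed mutation counts for four patients.
ages = [45, 52, 60, 71]
counts = [120, 150, 149, 300]
print(spearman_rho(ages, counts))  # approximately 0.8
```

Because ρ depends only on ranks, it is insensitive to the skewed distribution typical of per-tumour mutation counts. In practice, a library routine such as SciPy's `scipy.stats.spearmanr` returns both ρ and a P value (two-sided by default).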
Extended Data Fig. 5 | Survival, CNAs, HBV integrations and ecDNA. a, Multivariate analysis for OS and DFS. Multivariate Cox analysis was performed. Hazard ratios with 95% confidence intervals are shown for each predictor and are plotted on a natural log scale. b, Significant CNAs identified by GISTIC analysis. Red for amplification and blue for deletion. Green lines denote the threshold of Q value = 0.001. c, Hotspots of HBV integrations across CLCA. d, Top frequently amplified genes detected in ecDNA. e, Boxplots comparing the copy number of genes detected in ecDNA with that of other genes. f, Higher expression of oncogenes in ecDNA compared with those not in ecDNA. In e-f, n denotes biologically independent samples. Two-sided Wilcoxon rank-sum test. For boxplots, centre line shows median, box limits indicate upper and lower quartiles, and whiskers extend 1.5 times the interquartile range, while data beyond the end of the whiskers are outlying points that are plotted individually. g, Comparison of the frequency of cases with kataegis events (denoted in blue) between patients with or without APOBEC signatures. Two-sided Chi-square test.
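The boxplot convention repeated across these legends (median centre line, quartile box limits, whiskers reaching 1.5 times the interquartile range, points beyond the whiskers drawn individually) matches the usual Tukey-style summary. A minimal sketch of that summary follows. It assumes linear-interpolation quartiles (the `inclusive` method of Python's `statistics.quantiles`) and whiskers that stop at the most extreme observation inside the 1.5 × IQR fences; the legends do not say which quartile method the plotting software used, and the function name and data are hypothetical.

```python
from statistics import quantiles


def tukey_box(values: list[float]) -> dict:
    """Boxplot summary: median, quartile box, 1.5 x IQR whiskers, outliers.

    Assumptions: quartiles by linear interpolation ('inclusive' method);
    each whisker ends at the most extreme data point within its fence.
    """
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "median": med,
        "box": (q1, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


# Hypothetical values with one extreme observation.
summary = tukey_box([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
# {'median': 5.5, 'box': (3.25, 7.75), 'whiskers': (1, 9), 'outliers': [100]}
```

Other quartile definitions (for example the default `exclusive` method) can shift the box limits slightly on small samples, which is why the method is stated as an assumption.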
Extended Data Fig. 6 | Patterns of SVs and clustered mutational processes. a, The number of SV events, focal CN segments, kataegis events, chromoplexy events, and chromothripsis events in the CLCA. b, Proportions of different categories for each type of alteration. c, The density of breakpoints genome-wide (top) and 2D density of juxtapositions (bottom) of SV, chromothripsis, and chromoplexy. d-e, Examples of chromothripsis (d) and chromoplexy (e) events involving oncogenes.
Extended Data Fig. 7 | Kataegis and evolutionary history. a, Rainfall plots of kataegis events. n denotes the total number of kataegis events detected in the tumour, marked with arrows below. b, Distribution of point mutations over different mutation periods. c, Distribution of mutations across early clonal, late clonal and subclonal stages for drivers in CLCA. Barplots comparing the distribution of coding and noncoding mutations are shown. Two-sided Chi-square test. d, Relative ordering of CN events and driver mutations across all samples. e, Relative timing of signatures across all patients. n denotes the total number of individual tumours with the presence of the noted signature. For boxplots, centre line shows median, box limits indicate upper and lower quartiles, and whiskers extend 1.5 times the interquartile range, while data beyond the end of the whiskers are outlying points that are plotted individually. Boxplots are ordered by the median and no statistical test is used. f, Preferential ordering diagrams for patients stratified based on Cluster V, alcohol, and smoking. The relative ordering of candidate drivers was compared.
Extended Data Fig. 8 | Dysregulated pathways. Each gene box includes the frequency of patients affected by different types of somatic alterations in the corresponding gene. A total of eight forms of somatic alterations are listed and colour-coded, including coding SNVs, noncoding SNVs (further divided into promoters, lncRNAs and UTRs), CNAs, ecDNA, SVs and HBV integrations. Solid rectangles enclose genes in eight major signalling pathways. Dashed rectangles enclose genes in specific signalling pathways. Interactions between genes are indicated. For each pathway, the frequencies of patients altered by coding mutations only, noncoding mutations only, and both coding and noncoding mutations are denoted, as shown in the Venn diagram.
Extended Data Fig. 9 | Functional validation of PPP1R12B and KCNJ12. a, Comparison of tumour migration, invasion, self-renewal and cell proliferation capacities across cell lines with PPP1R12B disruption. b, Edited sites in PPP1R12B by Prime Editing. c, RT-qPCR analysis of PPP1R12B mRNA expression across wild-type (WT) and point-mutated HepG2 cell lines. d-e, Comparison of the proliferation (d), migration, invasion, and self-renewal (e) capacities across cell lines of indicated genotypes. Representative images of each assay are shown for each cell line. f, Comparison of tumour migration, invasion, self-renewal and cell proliferation capacities across cell lines with KCNJ12 disruption. g, Edited sites in KCNJ12. h, RT-qPCR analysis of KCNJ12 mRNA expression across wild-type (WT) and point-mutated HepG2 cell lines. i-j, Comparison of the proliferation (i), migration, invasion, and self-renewal (j) capacities across cell lines of indicated genotypes. For all panels, each experimental condition was independently repeated three times. Representative images of each assay are shown. Data are presented as mean ± s.e.m. In a, e, f, j, P values for the comparison between each group and the control group are denoted on the top of images. Two-way ANOVA test is used for proliferation analysis in a, d, f, i. For other plots, P values were derived with two-tailed Student's t-test.
Extended Data Fig. 10 | See next page for caption.
Extended Data Fig. 10 | Functional validation of FGA. a, Lollipop plot of FGA mutations in CLCA. b, Overall survival of TCGA-HCC patients (n = 364) classified by FGA expression levels. Log-rank test. c, Comparison of FGA mRNA expression between tumour and normal tissues in the TCGA-HCC cohort. For boxplots, centre line shows median, box limits indicate upper and lower quartiles, and whiskers extend 1.5 times the interquartile range, while data beyond the end of the whiskers are outlying points that are plotted individually. d, Representative FGA IHC images of paired tumour and normal tissues. Quantitative results are shown in Fig. 5d. e, Schematic of the edited site in the FGA noncoding region. f, Western blot analysis of FGA levels across wild-type and mutated HepG2 cell lines. Source gels in Supplementary Fig. 3. g-h, Comparison of the proliferation (g), migration, invasion, and self-renewal (h) capacities across cell lines of indicated genotypes. i-j, Comparison of tumour migration (i), invasion and self-renewal, and cell proliferation (j) capacities across cell lines with FGA disruption. k, Resected xenograft tumours derived from PLC/PRF/5 sh-Ctrl (n = 6) and sh-FGA (n = 7) cells. l, Specific phospho-antibody array analysis between PLC/PRF/5-sh-Ctrl and sh-FGA cell lines. Top significantly altered phosphorylation sites among 156 phosphoproteins are listed. m, TYK2 phosphorylation and its unphosphorylated counterpart in PLC/PRF/5-sh-Ctrl and sh-FGA cell lines, determined with Cy3-labelled streptavidin via a specific phospho-antibody array (n = 2 for each phosphorylated site or unphosphorylated protein). n, Western blot analysis of p-TYK2 (Y1054) and p-STAT3 (Y705) protein levels after FGA knockdown in PLC/PRF/5 and PVTT cell lines. p-, phosphorylated. Source gels in Supplementary Fig. 4. o, Representative images of cell migration assay following inhibitor treatment. p, IL6 mRNA expression of sh-Ctrl and sh-FGA cells. q, IL6 mRNA levels in PLC/PRF/5-sh-Ctrl and sh-FGA cell lines following FBS stimulation. After overnight incubation in FBS-free medium, cells were incubated in DMEM supplemented with 10% FBS for the indicated time intervals. r, Two-tailed Pearson correlation analysis of FGA protein and TYK2 phosphorylation (n = 75) in an independent HCC patient cohort. The relative intensities of FGA and p-TYK2 were normalized to β-actin. Source gels in Supplementary Fig. 5. s, A proposed model illustrating the role of the FGA/TYK2/STAT3 axis during HCC tumorigenesis. Wild-type and mutated forms of FGA are shown. The diagram was created using BioRender. For all panels, n denotes biologically independent samples. Each experimental condition was independently repeated three to five times. Data are presented as mean ± s.e.m. In h and i, P values for the comparison between each group and the control group are denoted on the top of images. Two-tailed Student's t-test is used in c, h, i, p, and q. Two-way ANOVA test is used in g and j.
nature portfolio | reporting summary
Corresponding author(s): Lei Chen, Lin Wu, Steven G. Rozen, Fan Bai, and Hongyang Wang
Last updated by author(s): Jan 2, 2024

Reporting Summary
Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.

A description of all covariates tested


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.

Software and code


Policy information about availability of computer code
Data collection WGS (n=494) and RNA sequencing (n=239) libraries were sequenced on an Illumina NovaSeq platform (Illumina) using its built-in software.

For the experimental section, the CCK8 signals for the cell proliferation assay were detected with a Synergy Neo microplate reader (BioTek). The crystal-violet-stained images for colony formation, cell migration, and invasion assays were scanned with an Olympus IX73 microscope equipped with a DP80 camera (Olympus). IHC slides were scanned with a Leica Aperio AT2. Immunoreactive bands were detected using an e-BLOT Touch Imager XLi or an Odyssey Sa Infrared Imaging System (LI-COR Biosciences). Electrochemiluminescence (ECL) signals for IL-6 concentration in tumor tissues were recorded on a 1300 MESO QuickPlex SQ 120MM instrument (Meso Scale Discovery). RT-qPCR was performed on a LightCycler 96 PCR platform (Roche). Phospho-specific protein microarray data were obtained with an Axon Instruments GenePix 4000B Microarray Scanner.

Data analysis The Linux working environment we used for sequencing data analysis is packed into a Singularity container file and published at Zenodo (https://doi.org/10.5281/zenodo.7260221). The detailed code for all the software has been deposited at GitHub (https://github.com/ChongJenniferZhang/CLCA_WGS). Statistical analyses were performed using R (version 3.6.0) and GraphPad Prism (version 9.0).

March 2021

FASTP (v0.13.1) https://github.com/OpenGene/fastp
BWA (v0.7.12) http://bio-bwa.sourceforge.net/
Sambamba (v0.6.8) https://github.com/biod/sambamba
Mutect2 (v4.0.11.0) https://gatk.broadinstitute.org/hc/en-us
Strelka (v2.8.4) https://github.com/Illumina/strelka
MutSigCV (v1.4) https://www.genepattern.org/modules/docs/MutSigCV
dndscv (v0.1.0) https://github.com/im3sanger/dndscv
OncodriveFML (v2.3.0) https://oncodrivefml.readthedocs.io/en/latest/index.html
MutSig2CV_NC (v1.0) https://github.com/broadinstitute/getzlab-PCAWG-MutSig2CV_NC
ActiveDriverWGS (v1.1.1) https://cran.r-project.org/web/packages/ActiveDriverWGS/index.html
pvalue_combination (v1.0) https://github.com/broadinstitute/getzlab-PCAWG-pvalue_combination
SigProfilerExtractor (v1.1.0) https://github.com/AlexandrovLab/SigProfilerExtractor and https://github.com/AlexandrovLab/SigProfilerSingleSample
mSigHdp (v1.1.2) https://github.com/steverozen/mSigHdp/
maftools (v2.6.05) http://www.bioconductor.org/packages/release/bioc/html/maftools.html
Shatterseek (v0.4) https://github.com/parklab/ShatterSeek
ChainFinder (v1.0.1) https://software.broadinstitute.org/cancer/cga/chainfinder
lumpy (v0.2.13) https://github.com/arq5x/lumpy-sv
Sequenza (v2.1.1) https://cran.r-project.org/web/packages/sequenza/
GISTIC (v2.0.23) https://www.genepattern.org/modules/docs/GISTIC_2.0
purple (v2.34) https://github.com/hartwigmedical/hmftools/blob/master/purity-ploidy-estimator/README.md
AmpliconArchitect (v1.3.r2) https://github.com/virajbdeshpande/AmpliconArchitect
AmpliconClassifier (v0.2.5) https://github.com/jluebeck/AmpliconClassifier
PyClone (v0.13.1) https://github.com/Roth-Lab/pyclone
MutationTimeR (v0.99.3) https://github.com/gerstung-lab/MutationTimeR
PhylogicNDT (v1.0) https://github.com/broadinstitute/PhylogicNDT
Timing_and_Signatures (v1.0) https://github.com/clemencyjolly/PCAWG11-Timing_and_Signatures
STAR (v2.7.3a) https://github.com/alexdobin/STAR
RSEM (v1.3.3) https://github.com/deweylab/RSEM

IHC images were analyzed with Aperio ImageScope v12.4.6 (Leica). Band intensities of western blots were assessed with ImageJ 1.53a. RT-qPCR data were analyzed with LightCycler 96 SW 1.1 (Roche). Electrochemiluminescence (ECL) signals were analyzed with DISCOVERY WORKBENCH Desktop Analysis Software version 4.0 (Meso Scale Discovery). Phospho-specific protein microarray data were analyzed with GenePix Pro 6.0.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy

The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive in BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under the study accession number PRJCA002666 (https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA002666). We also built an interactive website (http://lifeome.net/database/liver) for visualizing and analyzing our CLCA data. The data deposited and made public comply with the regulations of the Ministry of Science and Technology of China. Other public data used in this study include the human reference genome hg19/GRCh37 (https://ftp.ensembl.org/pub/grch37/), PCAWG data (https://dcc.icgc.org/pcawg/#!), TCGA-HCC data (https://portal.gdc.cancer.gov/projects/TCGA-LIHC), and COSMIC signatures (https://cancer.sanger.ac.uk/signatures/).

Human research participants


Policy information about studies involving human research participants and Sex and Gender in Research.

Reporting on sex and gender CLCA cohort comprised 427 men (86.4%) and 67 women (13.6%), with a mean age of 56 years (range, 23–84 years). All
patients were enrolled from Eastern Hepatobiliary Surgery Hospital and Shanghai Zhongshan Hospital during 2017-2020.

Population characteristics 94.5% of patients had HBV infection. 85.6% of patients had Edmondson-Steiner grade 3 or 4 tumours. 26.7% and 36.8% of patients had a history of alcohol drinking and smoking, respectively. Detailed clinical information is summarized in Supplementary Table 1.

Recruitment All included patients were diagnosed with hepatocellular carcinoma. No patients received any pre-operative anti-cancer treatment. Each specimen was diagnosed by two senior pathologists. Patients whose tissue samples yielded sufficient, good-quality DNA were selected.

Ethics oversight The study protocol was reviewed and approved by the institutional review boards of all participating hospitals. This study was performed in accordance with the principles of the Declaration of Helsinki. All participants provided written informed consent. All samples were anonymously coded in accordance with local ethical guidelines. All research participants consented to the publication of research results.

Note that full information on the approval of the study protocol must also be provided in the manuscript.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design


All studies must disclose on these points even when the disclosure is negative.
Sample size The Chinese Liver Cancer Atlas (CLCA) cohort described in the paper comprises 494 HCC patients enrolled at Eastern Hepatobiliary Surgery Hospital and Shanghai Zhongshan Hospital during 2017-2020. All 494 patients were subjected to sequencing, including WGS (n=494) and messenger RNA sequencing (n=239). Detailed clinical information is summarized in Supplementary Table 1. No sample size calculations were performed for the human cohort because the main aim of the study was to build a resource. For xenograft mouse experiments, a minimum of 3 mice for each group of the PLC/PRF/5-sh-Ctrl and PLC/PRF/5-sh-FGA cells is required to reach statistical significance.

Data exclusions No data were excluded from the WGS and RNA-seq analyses.

Replication No replication was needed for the WGS and RNA-seq samples in our study because they are all clinical samples.
For experimental validation of the potential drivers PPP1R12B, KCNJ12, and FGA, dysfunctional cell lines were constructed either by knockdown with two independent short hairpin RNAs (shRNA, #1, #2) or by knockout with two independent single guide RNAs (sgRNA, #1, #2). These cell lines were then assessed for proliferation, migration, invasion, and self-renewal capacities. Each assay was repeated three times independently, and representative images are shown.

Randomization No randomization was performed for the human tumor samples because this is an observational study. For xenograft models,
body weight-matched mice were randomized for subcutaneous injection of PLC/PRF/5-sh-Ctrl and PLC/PRF/5-sh-FGA cells.

Blinding Our study was not an intervention study and therefore blinding was not required.

Reporting for specific materials, systems and methods


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material,
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Materials & experimental systems (n/a / Involved in the study)
Antibodies
Eukaryotic cell lines
Palaeontology and archaeology
Animals and other organisms
Clinical data
Dual use research of concern

Methods (n/a / Involved in the study)
ChIP-seq
Flow cytometry
MRI-based neuroimaging

Antibodies
Antibodies used Mouse monoclonal anti-β-actin (Cat# AC004, clone AMC0001, RRID:AB_2737399, blot 1:5000; ABclonal)
Mouse monoclonal anti-GAPDH (Cat# AC033, clone AMC0062, RRID:AB_2769570, blot 1:5000; ABclonal)
Rabbit polyclonal anti-TYK2 (Cat# 9312, RRID:AB_2256719, blot 1:1000; Cell Signaling Technology)
Rabbit monoclonal anti-Phospho-Tyk2 (Tyr1054/1055) (D7T8A) (Cat# 68790, clone D7T8A, RRID:AB_2799752, blot 1:1000, staining
1:100; Cell Signaling Technology)
Rabbit monoclonal anti-Phospho-Stat3 (Tyr705) (D3A7) XP (Cat# 9145, clone D3A7, RRID:AB_2491009, blot 1:2000; Cell Signaling
Technology)
Mouse monoclonal anti-Lamin A/C (4C11) (Cat# 4777, clone 4C11, blot 1:2000; Cell Signaling Technology)
Mouse monoclonal anti-Fibrinogen α (C-7) (Cat# sc-398806, clone C-7, blot 1:500; Santa Cruz Biotechnology)
Rabbit polyclonal anti-Fibrinogen Alpha Chain (Cat# 20645-1-AP, RRID:AB_2878715, staining 1:100; Proteintech)
Mouse monoclonal anti-STAT3 (Cat# 60199-1-Ig, clone 3G2D12, RRID:AB_10913811, blot 1:2000; Proteintech)
Rabbit polyclonal anti-KI67 (Cat# ab15580, RRID:AB_443209, staining 1:500; Abcam)
HRP-conjugated anti-Rabbit (Cat# D-3002; staining 1:1; Supervision)
HRP-conjugated Affinipure Goat Anti-Rabbit IgG (H+L) (Cat# SA00001-2, RRID:AB_2722564, blot 1:5000; Proteintech)
HRP-conjugated Affinipure Goat Anti-Mouse IgG (H+L) (Cat# SA00001-1, RRID:AB_2722565, blot 1:5000; Proteintech)
IRDye 800CW Goat anti-Rabbit IgG (H + L) (Cat# 926-32211, RRID:AB_621843, blot 1:20000; LI-COR)
IRDye 800CW Goat anti-Mouse IgG (H + L) (Cat# 926-32210, RRID:AB_621842, blot 1:20000; LI-COR)



Validation All antibodies used in this study are commercially available. They have been validated by the vendors for the specific assays and species used, and the validation reports are available on the vendors' websites. All antibodies were titrated to determine the optimal working concentration.

Eukaryotic cell lines


Policy information about cell lines and Sex and Gender in Research
Cell line source(s) For the functional validation of three candidate drivers, the human liver cancer cell lines PLC/PRF/5, PVTT, HepG2, Huh7, SNU387 and SNU182 and the normal liver cell line HHL5 were obtained from the Shanghai Cell Bank of the Chinese Academy of Sciences. For the validation of AA-related mutational signatures, MCF-10A and HepG2 cells were obtained from the American Type Culture Collection (ATCC).

Authentication All cell lines used in this study were authenticated by applying short tandem-repeat DNA profiling.

Mycoplasma contamination We confirm that all cell lines tested negative for mycoplasma.

Commonly misidentified lines No commonly misidentified cell lines were used.


(See ICLAC register)

Animals and other research organisms


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research, and Sex and Gender in
Research

Laboratory animals BALB/c nude mice (5-7 weeks old) were obtained from GemPharmatech LLC (Jiangsu, China) and used for subcutaneous xenografts. All mice were housed under pathogen-free conditions at an ambient temperature of 20-26°C and a humidity of 30-70%, with a 12:12 hour light:dark cycle prior to use. Mice had unrestricted access to regular mouse chow and water. Tumour width (w) and length (l) were measured with a caliper every 3 days, and the diameter of any single tumour was < 2 cm when the mice were sacrificed.

Wild animals This study did not involve wild animals.

Reporting on sex Preliminary subcutaneous xenograft experiments were performed on male and female mice separately. A similar trend was observed in both sexes: sh-FGA cells gave rise to larger and more aggressive tumours than sh-Ctrl cells. To exclude potential confounding from aggression and biting in the male groups, only the female groups were kept and recorded. The tumorigenic role of FGA dysfunction in HCC applies to both sexes.

Field-collected samples This study did not involve field-collected samples.

Ethics oversight All mouse experiments were approved by the Animal Care and Use Committee at Eastern Hepatobiliary Surgery Hospital.

Note that full information on the approval of the study protocol must also be provided in the manuscript.
