Correspondence
[email protected] (X.Y.),
[email protected] (Z.Y.),
[email protected] (Z.H.)
In brief
Yin et al. utilize 4D-DIA proteomics and machine learning to identify key biomarkers PF4 and AACT in serum extracellular vesicles for colorectal cancer (CRC) diagnosis. Their random forest model demonstrates superior diagnostic performance for early-stage CRC and distinguishing CRC from benign colorectal diseases, offering a promising tool for clinical application.
Highlights
• 4D-DIA proteomic profiles of serum EVs in CRC patients and healthy controls
Article
Machine learning-based analysis identifies
and validates serum exosomal proteomic signatures
for the diagnosis of colorectal cancer
Haofan Yin,2,3,8 Jinye Xie,4,8 Shan Xing,5,8 Xiaofang Lu,1 Yu Yu,6 Yong Ren,7 Jian Tao,3 Guirong He,3 Lijun Zhang,3
Xiaopeng Yuan,3,* Zheng Yang,1,* and Zhijian Huang1,2,9,*
1Department of Pathology, The Seventh Affiliated Hospital of Sun Yat-Sen University, Shenzhen, Guangdong, China
2Digestive Diseases Center, The Seventh Affiliated Hospital of Sun Yat-Sen University, Shenzhen, Guangdong, China
3Department of Laboratory Medicine, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated
Guangdong, China
6Department of Breast Surgery, Shen Shan Medical Center, Memorial Hospital of Sun Yat-Sen University, Shanwei, Guangdong, China
7Guangdong Artificial Intelligence and Digital Economy Laboratory (Guangzhou), PAZHOU LAB, No. 70 Yuean Road, Haizhu District,
SUMMARY
The potential of serum extracellular vesicles (EVs) as non-invasive biomarkers for diagnosing colorectal cancer (CRC) remains elusive. We employed an in-depth 4D-DIA proteomics and machine learning (ML) pipeline to identify key proteins, PF4 and AACT, for CRC diagnosis in serum EV samples from a discovery cohort of 37 cases. PF4 and AACT outperform traditional biomarkers, CEA and CA19-9, detected by ELISA in 912 individuals. Furthermore, we developed an EV-related random forest (RF) model with the highest diagnostic efficiency, achieving AUC values of 0.960 and 0.963 in the train and test sets, respectively. Notably, this model demonstrated reliable diagnostic performance for early-stage CRC and distinguishing CRC from benign colorectal diseases. Additionally, multi-omics approaches were employed to predict the functions and potential sources of serum EV-derived proteins. Collectively, our study identified the crucial proteomic signatures in serum EVs and established a promising EV-related RF model for CRC diagnosis in the clinic.
Cell Reports Medicine 5, 101689, August 20, 2024 © 2024 The Authors. Published by Elsevier Inc.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Table 1. Clinicopathologic characteristics of patients included in the study
Set sizes: discovery set (n = 37), train set (n = 338), test set (n = 328), external set (n = 246).
| Characteristics | Discovery HC (n = 12) | Discovery CRC (n = 25) | Train HC (n = 96) | Train BCD (n = 47) | Train CRC (n = 195) | Test HC (n = 112) | Test BCD (n = 55) | Test CRC (n = 161) | External HC (n = 60) | External Enteritis (n = 42) | External Hepatitis B (n = 46) | External CRC (n = 98) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age, y, mean ± SD | 50.0 ± 9.3 | 56.5 ± 13.5 | 57.3 ± 8.4 | 51.4 ± 11.5 | 58.2 ± 13.5 | 56.6 ± 8.5 | 50.1 ± 11.4 | 60.1 ± 13.7 | 63.2 ± 10.5 | 54.8 ± 9.9 | 54.7 ± 9.7 | 62.2 ± 13.4 |
| Gender, n (%) | | | | | | | | | | | | |
| Male | 4 (33.3) | 13 (52.0) | 52 (54.2) | 26 (55.3) | 111 (56.9) | 60 (53.6) | 38 (69.1) | 98 (60.9) | 42 (70.0) | 31 (73.8) | 31 (67.4) | 47 (48.0) |
| Female | 8 (66.7) | 12 (48.0) | 44 (45.8) | 21 (44.7) | 84 (43.1) | 52 (46.4) | 17 (30.9) | 63 (39.1) | 18 (30.0) | 11 (26.2) | 15 (32.6) | 51 (52.0) |
| Clinical stage, n (%) | | | | | | | | | | | | |
| I | – | 3 (12.0) | – | – | 22 (11.3) | – | – | 19 (11.8) | – | – | – | 17 (17.3) |
| II | – | 6 (24.0) | – | – | 48 (24.6) | – | – | 31 (19.3) | – | – | – | 22 (22.4) |
| III | – | 8 (32.0) | – | – | 83 (42.6) | – | – | 47 (29.2) | – | – | – | 47 (48.0) |
| IV | – | 4 (16.0) | – | – | 42 (21.5) | – | – | 64 (39.8) | – | – | – | 12 (12.2) |
| Unknown | – | 4 (16.0) | – | – | 0 (0.0) | – | – | 0 (0.0) | – | – | – | 0 (0.0) |
| CEA, ng/mL, n (%) | | | | | | | | | | | | |
| <5 | 12 (100.0) | 14 (56.0) | 94 (97.9) | 47 (100.0) | 129 (66.2) | 108 (96.4) | 53 (96.4) | 90 (55.9) | 54 (90.0) | 41 (97.6) | 45 (97.8) | 65 (66.3) |
| ≥5 | 0 (0.0) | 11 (44.0) | 2 (2.1) | 0 (0.0) | 66 (33.8) | 4 (3.6) | 2 (3.6) | 71 (44.1) | 6 (10.0) | 1 (2.4) | 1 (2.2) | 33 (33.7) |
| CA19-9, ng/mL, n (%) | | | | | | | | | | | | |
| <35 | 12 (100.0) | 18 (72.0) | 94 (97.9) | 45 (95.7) | 153 (78.5) | 111 (99.1) | 54 (98.2) | 112 (69.6) | 59 (98.3) | 40 (95.2) | 46 (100.0) | 82 (83.7) |
| ≥35 | 0 (0.0) | 7 (28.0) | 2 (2.1) | 2 (4.3) | 42 (21.5) | 1 (0.9) | 1 (1.8) | 49 (30.4) | 1 (1.7) | 2 (4.8) | 0 (0.0) | 16 (16.3) |
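As a worked reading of Table 1, the share of CRC patients at or above the CEA cutoff of 5 ng/mL can be read as the sensitivity of CEA alone, and the share of healthy controls below the cutoff as its specificity. The sketch below recomputes both for the train set from the counts in the table. The function and variable names are illustrative assumptions and are not part of the study's analysis.

```python
# Counts taken from Table 1 (train set, CEA cutoff of 5 ng/mL).
# Names are illustrative; this is not the authors' code.

def rate(part: int, total: int) -> float:
    """Return part/total as a fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total

crc_total, crc_cea_positive = 195, 66   # CRC patients with CEA >= 5
hc_total, hc_cea_negative = 96, 94      # healthy controls with CEA < 5

cea_sensitivity = rate(crc_cea_positive, crc_total)
cea_specificity = rate(hc_cea_negative, hc_total)

print(f"CEA sensitivity (train CRC): {cea_sensitivity:.1%}")
print(f"CEA specificity (train HC):  {cea_specificity:.1%}")
```

The two fractions match the percentages printed in the table (33.8% and 97.9%), which illustrates why a marker with high specificity but low sensitivity misses most CRC cases when used on its own.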
data acquisition (4D-DIA) to uncover key biomarkers packaged in blood-derived EVs holds important implications for screening CRC.
Machine learning (ML), as an essential branch of artificial intelligence, has garnered increasing attention in tumor diagnosis and treatment management recently.12 Compared to conventional diagnostic models, ML approaches are flexible and more suitable for capturing non-linear associations and integrating vast amounts of medical data including medical imaging and multi-omics data.13 Multiple ML methods can be employed to robustly extract crucial features from liquid biopsy analytes and build diagnostic models, thereby achieving superior specificity and sensitivity for cancer diagnosis.14 For instance, random forest (RF) algorithm-based diagnostic models exhibit remarkable performances across more than twenty types of cancer by utilizing microbes from tissues and blood.15 Thus, an ML model based on the optimal algorithm has the potential to enhance the accuracy of cancer diagnosis by profiling tissue and blood materials.
In this study, the 4D-DIA technology was employed to perform in-depth profiling of serum EV-proteomics data. Subsequently, the most valuable protein signatures were identified using an ML-based pipeline and validated by ELISA detection. The objective of our study was to develop a reliable EV-related RF model based on the identified protein signatures for clinical CRC diagnosis.
RESULTS
Identification and characterization of serum EVs in HC and CRC patients
The serum EVs from the discovery set (25 CRC cases, 12 healthy control [HC] cases) were collected for candidate biomarker screening. An expansion cohort comprising 338 cases in the train set and 328 cases in the test set was recruited for RF diagnostic model construction and validation (Table 1). Separated EVs from serum were subjected to subsequent experiments for validation (Figures 1A–1C). Nanoparticle tracking analysis (NTA) showed that the average diameter and size distribution of the particles were consistent with EVs (Figure 1A). In western blot assay, the EV markers CD63 and TSG101 were present in isolated EVs, but not in protein lysate of CRC cell lines SW480, SW620, and HCT116. As negative control markers expressed intracellularly, GRP94 and calnexin were not detected in the separated EVs (Figure 1B). In addition, the vesicle-like particles were confirmed by transmission electron microscopy (TEM) (Figure 1C). In further in-depth EV proteome analysis, 4D-DIA technology identified a total of 5,851 peptides and 854 proteins (Figure S1A), which included 75 upregulated and 91 downregulated proteins in the CRC group compared with the HC group (Figure S1B). Aberrant expression profiles of EVs were illustrated by volcano plot and heatmap (Figures 1D and 1E). Further, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were employed to characterize the potential function of differentially expressed proteins (DEPs). Molecular function of GO analysis revealed upregulated EV proteins related to protein binding (Figure 1F), while downregulated EV proteins were associated with RNA and DNA binding (Figure S1C). Cellular components analysis showed that DEPs were localized in both the extracellular space and exosome (Figures 1G and S1D). Additionally, biological processes and KEGG pathway analysis indicated that upregulated proteins were enriched in inflammatory response, immune response, blood coagulation, and platelet activation (Figures 1H and 1I). Downregulated proteins were also related to certain immune response pathways including adaptive immune response (Figures S1E and S1F).
Screening proteomic biomarkers of serum EVs for CRC diagnosis via ML
To further identify the critical biomarkers of serum EVs for CRC diagnosis, we performed orthogonal partial least squares discriminant analysis (OPLS-DA) to clarify the contribution of different variables in distinguishing CRC patients from HC. The score plot and scatterplot displayed significant discrimination between CRC and HC subjects based on 4D-DIA proteomics (Figures 2A and 2B). Twelve candidate EV proteins, including IGG1, A2MG, AACT, PF4, KLD8B, APOB, IC1, A1AT, IGHM, KV315, CP135, and IGKC, were identified as the core contributors to distinguishing CRC patients from HC by using predictive variable importance in projection (VIPpred) analysis (Figure 2C). Subsequently, ML diagnostic models based on the 12 candidate EV proteins were constructed to screen for the most valuable variables. Among 5 different ML algorithms, the RF model yielded the best results in terms of classification error (CE), area under the ROC curve (AUC), and area under the precision-recall curve (PRAUC) (Figures 2D, S1G, and S1H; Table S1). Hence, we opted to use the RF algorithm for subsequent ML model construction. In the RF variable importance analysis, 5 variables, including PF4, AACT, KLD8B, CP135, and KV315, exhibited the highest rankings (Figure 2E). Further analysis using least absolute shrinkage and selection operator (Lasso) logistic regression identified PF4 and AACT as the top two ranking variables, which were determined to be the most valuable EV proteins for CRC diagnosis (Figures 2F–2H; Table S2). The combined evaluation of PF4 and AACT exhibited superior diagnostic performance in the RF diagnostic model (Figures 2I and 2J).
Validation of aberrant PF4 and AACT levels in expansion cohorts
To validate the aberrant elevation of PF4 and AACT identified in the discovery set, EVs from an expansion of 912 individuals,
Figure 1. Identification and characterization of serum EVs in HC and CRC patients by 4D-DIA proteomics analysis
(A) NTA showed the mode size and particle concentration of separated EVs by Flow NanoAnalyzer.
(B) Western blot detected EV markers CD63 and TSG101 in serum EVs. GRP94 and calnexin were used as negative control proteins.
(C) TEM image displayed the morphology of isolated EVs.
(D and E) DEPs from EVs between the CRC and HC groups were illustrated by volcano plot (D, p < 0.05, log2 fold change > 0.5, n = 37) and heatmap (E).
(F–I) Upregulated protein enrichment analysis revealed the potential molecular function (F), cellular component (G), biological process (H), and KEGG pathways
(I) enriched in the CRC group compared to the HC group.
comprising 338 cases in the train set, 328 cases in the test set, and 246 cases in the external set were collected for ELISA detection (Table 1). Notably, compared with HC or patients with benign colorectal disease (BCD) or inflammatory disease, the levels of EV-derived PF4 were significantly increased in CRC patients in the train, test, and external sets (Figures 3A, 3B, and S2A). Consistently, elevated AACT levels were also observed in CRC patients compared to HC or patients with BCD or inflammatory disease (Figures 3C, 3D, and S2B). Additionally, statistical analysis of PF4 and clinicopathological characteristics indicated that the levels of PF4 were significantly associated with clinical stage, tumor-node-metastasis (TNM) classification, and differentiation (Table S3). AACT levels were also related to clinical stage and TNM classification (Table S4). Impressively, the levels of serum EV-derived PF4 gradually rose with clinical staging (Figures 3E, 3F, and S2C). In line with PF4, AACT levels also incrementally increased with the progression of clinical stages (Figures 3G, 3H, and S2D). Intriguingly, EV-derived PF4 and AACT levels were notably reduced in CRC patients after treatment (Figures S2E–S2H). Taken together, these results confirmed the potential of EV-derived PF4 and AACT as important biomarkers for CRC diagnosis and post-treatment monitoring.
Development and validation of the EV-related RF diagnostic model for CRC diagnosis
Subsequently, RF diagnostic models were constructed to evaluate the diagnostic efficiency of EV-derived PF4 and AACT compared with traditional CRC biomarkers CEA and CA19-9. In the train set, receiver operating characteristic (ROC) curves displayed significantly higher AUC values for both PF4 (AUC = 0.926) and AACT (AUC = 0.770) compared to CEA (AUC = 0.623) and CA19-9 (AUC = 0.676). Moreover, the combination of PF4 and AACT yielded an impressive AUC of 0.950 (Figure 4A). Consistently, in precision-recall (PR) curve analysis, PF4 and AACT demonstrated superior PRAUC compared with CEA and CA19-9, and the combined PF4 and AACT model achieved an even higher PRAUC of 0.969 (Figure 4B). Accumulated local effects (ALE) analysis confirmed that both PF4 and AACT had a more pronounced effect on predicting CRC compared to CEA and CA19-9 (Figure 4C). The Shapley value also showed that higher PF4 (≥3870.74 pg/mL) and AACT (≥515.4 ng/mL) levels made the largest contribution in discriminating CRC from HC (Figure 4D), which aligns with the results from the importance analysis (Figure 4E).
To achieve the best combination of the EV-derived and traditional CRC biomarkers, RF diagnostic models were developed using different combinations of variables. As shown in Figure 4F, the combination of PF4, AACT, CEA, and CA19-9 achieved the best diagnostic performance with an AUC of 0.960, PRAUC of 0.979, and CE of 0.08 (Figures S3A and S3B; Table S5). Thus, the optimal diagnostic model based on the 4 variables was defined as the EV-related model and subsequently validated in the test set (Table S6). The confusion matrix illustrated that the EV-related model exhibited superior accuracy of 0.883 in the test set and 0.810 in the external set (Figures 4G and S3C; Table S6). Furthermore, the excellent diagnostic performance was also confirmed by other metrics including ROC and PR curves with an AUC of 0.963 and 0.895 and a PRAUC of 0.975 and 0.921 in the test set and external set, respectively (Figures 4H and 4I; Table S6).
The diagnosis of early-stage tumors poses greater challenges in comparison to advanced-stage tumors due to the scarcity of suitable tumor markers. To evaluate the diagnostic efficacy of the EV-related model for early-stage CRC, we extracted data from patients with stage I and stage II CRC in the test set for further validation. In confusion matrix analysis, the EV-related model demonstrated reliable accuracy in discriminating patients with stage I and stage II tumors from HC subjects in the test set (Figures 4J and S3D). Consistently, ROC and PR curves further validated the excellent diagnostic efficacy of the EV-related model (Figures 4K and 4L; Table S6). Moreover, the ability of the EV-related model was also tested in distinguishing patients with CRC from patients with BCD or inflammatory disease. Notably, the model presented outstanding diagnostic performance in discriminating between CRC and other patients (Figures S3E–S3J; Table S6). Collectively, EV-derived PF4 and AACT outperformed CEA and CA19-9 as biomarkers for CRC diagnosis. The EV-related model exhibited superior diagnostic performance for CRC, including early-stage diagnosis and differential diagnosis from patients with BCD or inflammatory disease.
Functional enrichment analysis of EV-derived PF4 and AACT
To gain insights into the potential functions of EV-derived PF4 and AACT in CRC, we performed gene set enrichment analysis (GSEA). The results showed that EV-derived PF4 in the discovery set was enriched in pathways related to cell differentiation, cell development, and transmembrane transport. Particularly, lipid localization and cholesterol efflux pathways were negatively correlated with PF4, and similar pathway enrichment results were also obtained in The Cancer Genome Atlas (TCGA) database (Figures 5A, 5B, and S4A). EnrichmentMap analysis was utilized to examine the associations between these enriched terms. Likewise, the EV-derived PF4 low-expressed phenotype exhibited strong associations between the localization and
Figure 2. Screening EV-derived biomarkers for CRC diagnosis via the ML pipeline
(A and B) Score plot (A) and scatterplot (B) exhibited significant discrimination between CRC and HC subjects via OPLS-DA analysis.
(C) Twelve candidate proteins selected based on their VIPpred scores >4.
(D) Bar plot showed the value of CE in ML diagnostic models based on different algorithms.
(E) Variable importance score plot showed the contribution of twelve candidate proteins in the RF diagnostic model.
(F and G) The Lasso regression analysis based on 4D-DIA proteomics and partial likelihood deviance on the prognostic genes. The minimum criteria and the
1-standard error (1SE) criteria were used to draw the dotted vertical lines at the optimal values of variables.
(H) Venn plot displayed the intersection of candidate proteins from the RF model and the Lasso regression models based on minimum and 1SE criteria.
(I and J) ROC curve (I) and PR curve (J) of RF diagnostic models based on PF4, AACT, and combined PF4 and AACT levels of 4D-DIA proteomics.
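Two of the metrics named in this caption, classification error (CE) and the area under the ROC curve (AUC), can be illustrated with a short sketch. The labels and scores below are made-up toy values rather than study data, the 0.5 decision threshold is an assumption, and the function names are illustrative. The AUC is computed with the rank-based (Mann-Whitney) formulation instead of tracing the curve.

```python
# Toy illustration of classification error (CE) and ROC AUC.
# Labels and scores are hypothetical and are not study data.

def classification_error(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score disagrees with the label."""
    wrong = sum((s >= threshold) != bool(y) for y, s in zip(labels, scores))
    return wrong / len(labels)

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 1, 0, 0, 0, 0]            # 1 = CRC, 0 = HC (hypothetical)
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

ce = classification_error(labels, scores)
auc = roc_auc(labels, scores)
print("CE :", ce)
print("AUC:", auc)
```

CE depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are reported side by side.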
homeostasis of lipid, cholesterol efflux, and sterol transport pathways in the corresponding association network (Figure 5C). Furthermore, core genes in the EnrichmentMap network were extracted using leading edge analysis and integrated with PF4 to construct the protein-protein interaction (PPI) networks in the STRING database (Figures S4C and 5D). The PPI network indicated that PF4 might interact with the core genes including APOA1, APOA2, and APOE (Figure 5D).
The same analysis pipeline was also applied to EV-derived AACT. GSEA results displayed that the AACT-high phenotype was enriched in several pathways, including "Acute inflammatory response," "Negative regulation of proteolysis," and "Negative regulation of peptidase activity" (Figures 5E and 5F). The same enriched pathways were also validated in the TCGA database (Figure S4B). Consistently, EV-derived AACT was negatively associated with the metabolic process and metabolites pathways involved in proteolysis (Figures 5E and 5F), whose association was also reflected in the EnrichmentMap network (Figure 5G). Moreover, leading edge analysis was performed to obtain hub genes in the EnrichmentMap network (Figure S4D). The PPI network comprising AACT and hub genes revealed that AACT might interact with transforming growth factor (TGF)-β1, ACTB, and PTPRC, suggesting its potential role in inflammatory, cytoskeleton, and protein metabolic pathways (Figure 5H).
Deciphering specific cell types releasing EV-derived PF4 and AACT
Next, single-cell transcriptome analysis of the GEO: GSE132465 and GEO: GSE132257 datasets was employed to identify the specific cell types responsible for releasing PF4 and AACT packaged in EVs. When analyzing the GEO: GSE132465 dataset comprising normal and CRC tissues (Figures 6A and S4E), PF4 exhibited a dramatic elevation in CRC epithelial cells compared to normal epithelial cells. Additionally, PF4 was also slightly upregulated in myeloid cells, stromal cells, and T cells (Figures 6B and 6C). Consistently, in the GEO: GSE132257 dataset (Figures 6D and S4F), PF4 expression was markedly elevated in CRC epithelial cells and slightly elevated in stromal cells, myeloid cells, and T cells compared to normal tissues (Figures 6E and 6F). As for AACT, its expression was significantly higher in CRC epithelial cells compared to normal epithelial cells in both the GEO: GSE132465 (Figures 6B and 6C) and GEO: GSE132257 datasets (Figures 6E and 6F). Furthermore, immunohistochemistry (IHC) was performed to detect PF4 and AACT expression in 50 paired CRC and adjacent tissues. Consistent with single-cell transcriptome analysis, the expression levels of PF4 and AACT were abnormally elevated in CRC epithelial cells compared to adjacent normal epithelial cells (Figures 6G and 6H). Taken together, aberrantly elevated EV-derived PF4 and AACT might be released from CRC epithelial cells, and PF4 might also originate from myeloid cells, stromal cells, and T cells.
DISCUSSION
CRC is a common malignant tumor with high incidence and mortality rates, ranking third among different types of cancers.16 Due to the lack of effective approaches for early diagnosis, the 5-year survival rate of CRC patients is approximately 50%–60%, which even plunges to 14% for patients with metastasis.17 Non-invasive approaches that are accurate and repeatable are in high demand for early detection to improve patient outcome. In this context, our study identified two biomarkers, PF4 and AACT, by deeply profiling 4D-DIA proteomics data of EVs with an ML pipeline. Subsequently, the optimal RF model based on PF4 and AACT was constructed and yielded superior diagnostic performance. The identified EV-proteomic signatures and developed RF model provide valuable tools for enhancing early detection and management of CRC in clinical settings.
As an emerging liquid biopsy technology, EVs hold significant potential for clinical applications in drug delivery therapy and cancer diagnosis.18 Analyzing EVs from serum offers several advantages compared to direct serum testing.9 Primarily, the lipid bilayer structure of EVs shields their cargo from degradation, offering a more accurate representation of the body's state. In addition, the proteins in the serum of CRC patients are enriched by EVs, which substantially augments detection efficacy. Consequently, EVs have garnered increasing attention in the realm of liquid biopsy. The ultracentrifugation we employed for EV extraction is widely recognized as a robust extraction method.19 To verify the reproducibility of our experiments, we utilized a commercial extraction kit based on size exclusion chromatography (SEC) principles for EV isolation. Correlation analysis demonstrated a strong correlation between biomarkers isolated by ultracentrifugation and SEC methods (Figures S5A and S5B). Additionally, aberrant levels of PF4 and AACT isolated by SEC were also observed in the CRC group compared to the HC group (Figures S5C and S5D). RF models based on both EV extraction methods exhibited robust diagnostic performance (Figures S5E and S5F). These results demonstrate the reproducibility of our experiments and the reliability of the proteomic signatures we identified.
A previous study on using serum EVs for CRC diagnosis showed significant limitations, including testing mixed samples and employing TMT-tagged mass spectrometry, which suffers from instability and limited proteome coverage.20 In contrast, our
study conducted separate testing for multiple samples and utilized the most recent mass spectrometry methods, resulting in significantly different protein markers compared to theirs. Moreover, we also examined two biomarkers proposed in that study, namely FN1 and HSP90AA1. The results indicated that FN1 levels were slightly elevated in the CRC group compared to the HC group (Figure S6A), while HSP90AA1 showed no difference between the two groups (Figure S6B). ROC and PR curves demonstrated that the diagnostic efficacy of RF models based on FN1, HSP90AA1, and their combination was markedly inferior compared to that of the models based on PF4, AACT, and their combination (Figures S6C and S6D). Consistent results were also obtained from ALE (Figure S6E) and variable importance analysis (Figure S6F). These data further validated the proteomic signatures we identified as robust biomarkers for CRC diagnosis.
Recently, the application of ML in the field of oncology has been increasingly highlighted. By performing robust feature selection from analytes of liquid and tissue biopsy, we can use ML approaches to improve the accuracy and efficiency of cancer diagnosis, treatment decision, and prognostic prediction.13,21 In our research, various ML algorithms, including support vector machine, k-nearest neighbor, decision tree (Rpart), RF, and logistic regression, were employed to construct diagnostic models. Compared to conventional linear analysis of logistic regression, models based on other ML algorithms such as RF demonstrated a significant improvement in terms of AUC from 0.887 to 0.993 (Figure S2A). Consequently, ML is capable of profiling crucial proteomic features from EVs and developing more robust and reliable models compared to traditional linear regression models.
PF4, also known as CXCL4, is a chemokine mainly produced by activated platelets that participates in numerous biological processes, including host inflammatory response promotion, hematopoiesis, and angiogenesis inhibition.22 In addition to platelets, PF4 is also produced and secreted by other cells, such as somatic cells and cancer cells.23,24 Our IHC results indicate that PF4 is highly expressed in CRC epithelial cells compared to adjacent tissues (Figures 6E and 6J). Moreover, single-cell transcriptome analysis revealed that the elevated EV-derived PF4 in the serum of CRC patients may originate from CRC epithelial cells, with a slight increase also observed in myeloid, stromal, and T cells (Figure 6). Our findings suggest that the abnormally elevated PF4 in serum EVs may originate from both tumor cells and the tumor microenvironment. A growing number of studies focus on the role of PF4 in reshaping the immune microenvironment.25 PF4 has been shown to not only drive macrophage migration during tumor progression but also induce differentiation of recruited monocytes into myeloid-derived suppressor cells, thereby suppressing CD8+ T cell function.26,27 Furthermore, PF4 promoted regulatory T cell (Treg) production in a mouse model of sepsis through activation of the STAT5/FOXP3 pathway.28 In addition, PF4 reduced the proliferation of cytotoxic T lymphocytes and promoted the proliferation of Tregs, thereby suppressing the immune response to CRC in transplanted tumor-bearing mice.23 Besides, PF4 deletion abrogated SPP1+ macrophage differentiation and improved fibrosis after cardiac and renal injury.29 PF4 impaired the phagocytic capacity of macrophages by reducing CD36 levels, leading to the development of cardiovascular disease.30 CD36 is a key transporter protein in maintaining lipid homeostasis, and several studies suggested that CD36 was involved in reprogramming lipid metabolism of the tumor microenvironment in CRC.31–33 Our bioinformatics results also suggested that EV-derived PF4 was responsible for regulating lipid homeostasis and lipid localization (Figures 5A–5D). It would be intriguing to further explore whether EV-derived PF4 regulates lipid metabolism homeostasis through CD36 to reshape the tumor microenvironment. Additionally, the PPI network indicated that PF4 might interact with several apolipoproteins, including APOA1, APOA2, and APOE, suggesting a potential mechanism by which PF4 participates in lipid homeostasis and cholesterol efflux (Figure 5D). Moreover, in our single-cell transcriptome analysis, the elevated EV-derived PF4 in the serum of CRC patients may be released from CRC epithelial cells, with a slight increase also observed in myeloid, stromal, and T cells (Figure 6). However, PF4 has not yet become a therapeutic target due to the absence of a defined receptor to explain its regulatory function on immune cells. A recent study revealed that PF4 bound to glycosaminoglycan sugars on proteoglycans in the endothelial extracellular matrix, leading to increased adhesion of leukocytes to blood vessels and causing non-specific recruitment of leukocytes.34 Further studies are needed to fully understand the mechanism of PF4's regulatory effects on immune cells.
Glycoprotein AACT is a serine protease inhibitor synthesized primarily in the liver and secreted into the blood.35 However, increasing evidence suggested that AACT could also serve as a tumor biomarker and played a crucial role in tumor progression.
Figure 4. Construction and validation of the EV-related RF diagnostic model for CRC detection
(A and B) ROC curve (A) and PR curve (B) of RF diagnostic models based on indicated variables in the train set.
(C) The ALE curve depicts the accumulated local effects of PF4, AACT, CEA, and CA19-9. The x axis represents the feature values, and the y axis represents the
accumulated local effects.
(D) Shapley values bar plot illustrates the Shapley values for each feature in the RF diagnostic model. Each bar represents the average contribution on
discriminating CRC patients from HC.
(E) Variable importance score plot showed the contribution of 4 variables in the RF diagnostic model.
(F) CE, AUC, and PRAUC values of the RF diagnostic models with different variable combinations.
(G) Confusion matrix displayed the prediction results for 273 test set samples (161 CRC and 112 HC) and 158 external set samples (98 CRC and 60 HC) through the EV-related diagnostic model.
(H and I) ROC curve (H) and PR curve (I) were plotted for the EV-related diagnostic model using the train and test sets.
(J) Confusion matrix displayed the prediction results for 162 untrained test samples comprising 50 individuals of stages I and II CRC patients and 112 individuals of
HC through the EV-related diagnostic model.
(K and L) ROC curve (K) and PR curve (L) were plotted for the EV-related diagnostic model using the train and test sets.
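The Shapley values in panel (D) attribute a model's prediction to its individual features. As a rough illustration of the idea only, the sketch below computes exact Shapley values for a hypothetical two-feature scoring function by averaging each feature's marginal contribution over all feature orderings. The `model` function, its coefficients, and the background samples are assumptions made for demonstration. They are not the study's random forest, its data, or its software.

```python
from itertools import permutations

# Hypothetical two-feature risk score; NOT the study's random forest.
def model(pf4, aact):
    return 0.6 * pf4 + 0.3 * aact + 0.1 * pf4 * aact

# Hypothetical background samples (feature values scaled to 0..1).
background = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def coalition_value(x, known):
    """Mean model output when features in `known` are fixed to x and the
    remaining features are taken from the background samples."""
    total = 0.0
    for b in background:
        mixed = [x[i] if i in known else b[i] for i in range(len(x))]
        total += model(*mixed)
    return total / len(background)

def shapley_values(x):
    """Exact Shapley values by averaging marginal contributions over all
    feature orderings (feasible only for a handful of features)."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        known = set()
        for i in order:
            before = coalition_value(x, known)
            known.add(i)
            phi[i] += coalition_value(x, known) - before
    return [p / len(orders) for p in phi]

x = (1.0, 1.0)  # one hypothetical sample with high PF4 and AACT
phi = shapley_values(x)
baseline = coalition_value(x, set())
print("Shapley values (PF4, AACT):", phi)
print("baseline + sum:", baseline + sum(phi), "model output:", model(*x))
```

By construction the contributions add up to the difference between the model output for the sample and the average output over the background, which is the property that makes a bar plot of per-feature contributions interpretable.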
development of CRC requires further investigation. Previous studies showed that AACT regulated cytokine secretion by activating the nuclear factor κB (NF-κB) signaling pathway, which also promoted the growth and migration of CRC cells.41,42 Interestingly, our bioinformatics results also indicated that EV-derived AACT was most associated with the acute inflammatory response pathway, which could activate NF-κB signaling (Figures 5E and 5F). Moreover, the STRING database analysis suggested that AACT might be involved in inflammatory and NF-κB signaling pathways through the important inflammatory regulator TGF-β (Figure 5H). Nevertheless, AACT was also able to enter the nucleus and establish a strong link with chromatin, leading to the inhibition of liver cancer cell proliferation.43,44 It remains unclear whether AACT in EVs also exhibits these contradictory effects. Moreover, AACT exhibited aberrant elevation in CRC epithelial cells and negative correlations with the proteolysis pathway. The potential role of AACT in cytoskeleton and protein metabolic pathways remains to be further investigated (Figures 5E–5H and 6). Taken together, AACT may hold great promise as a diagnostic and therapeutic target for CRC, although further studies are needed to fully understand its role in tumor progression.
Limitations of the study
A limitation of our research is that the sample size of our cohorts was not large enough. Nevertheless, we were able to replicate our findings in the proteomics cohort using ELISA in two independent cohorts. In addition, we need to expand the enrolled population range in order to verify the effect of cardiovascular disease, inflammation, and other confounding factors on the identified markers. Meanwhile, the specificity and sensitivity of the combination of PF4 and AACT for other gastrointestinal tumors also require more samples to be evaluated.
• QUANTIFICATION AND STATISTICAL ANALYSIS
SUPPLEMENTAL INFORMATION
Supplemental information can be found online at https://doi.org/10.1016/j.xcrm.2024.101689.
ACKNOWLEDGMENTS
This study was supported by funds from the National Natural Science Foundation of China (82103346, 82202829, 82202985, and 82203661), Guangdong Basic and Applied Basic Research Foundation (2021A1515110094, 2022A1515111199, 2022A1515111062, and 2023A1515220107), General Project of Shenzhen Science and Technology Innovation Commission (JCYJ20220530145003008 and JCYJ20220530145001002), Guangdong Natural Science Fund (2023A1515010214), and Five-Three Project for Training Clinician-Scientists of Shenzhen People's Hospital (no. SYWGSLCYJ202201).
AUTHOR CONTRIBUTIONS
Conceptualization, Z.H., H.Y., and Z.Y.; methodology, Z.H., H.Y., and Y.R.; data collection, X.L., Y.Y., J.T., G.H., and L.Z.; statistical analysis, H.Y., J.X., Z.H., and S.X.; funding acquisition, J.X., Z.H., H.Y., Z.Y., and X.Y.; study supervision, Z.Y. and X.Y. All authors reviewed the manuscript and approved the final revision.
DECLARATION OF INTERESTS
The authors declare no competing interests.
Received: September 27, 2023
Revised: June 28, 2024
Accepted: July 24, 2024
Published: August 20, 2024
Figure 6. scRNA-seq analysis reveals CRC epithelial cells as the major source of EV-derived PF4 and AACT production
(A) Uniform manifold approximation and projection (UMAP) plot showed different cell types in CRC (n = 23) and normal (n = 10) tissues via single-cell RNA
sequencing (scRNA-seq) analysis from the GEO: GSE132465 dataset.
(B) Dot plot showed the expression of PF4 and AACT in normal and CRC tissues from the GEO: GSE132465 dataset.
(C) Violin plot exhibited the expression of PF4 and AACT in normal and CRC tissues from the GEO: GSE132465 dataset.
(D) UMAP plot showed different cell types in CRC (n = 5) and normal (n = 5) tissues via scRNA-seq analysis from the GEO: GSE132257 dataset.
(E) Dot plot showed the expression of PF4 and AACT from the GEO: GSE132257 dataset.
(F) Violin plot exhibited the expression of PF4 and AACT from the GEO: GSE132257 dataset.
(G and H) Representative images and statistical analysis of PF4 (G) and AACT (H) IHC staining in 50 paired adjacent and CRC specimens (400× magnification).
Scale bar: 50 μm.
et al. (2019). Exogenous CXCL4 infusion inhibits macrophage phagocytosis by limiting CD36 signalling to enhance post-myocardial infarction cardiac dilation and mortality. Cardiovasc. Res. 115, 395–408.
31. Gong, J., Lin, Y., Zhang, H., Liu, C., Cheng, Z., Yang, X., Zhang, J., Xiao, Y., Sang, N., Qian, X., et al. (2020). Reprogramming of lipid metabolism in cancer-associated fibroblasts potentiates migration of colorectal cancer cells. Cell Death Dis. 11, 267.
32. Yang, P., Qin, H., Li, Y., Xiao, A., Zheng, E., Zeng, H., Su, C., Luo, X., Lu, Q., Liao, M., et al. (2022). CD36-mediated metabolic crosstalk between tumor cells and macrophages affects liver metastasis. Nat. Commun. 13, 5782.
33. Xu, S., Chaudhary, O., Rodríguez-Morales, P., Sun, X., Chen, D., Zappasodi, R., Xu, Z., Pinto, A.F., Williams, A., Schulze, I., and Farsakoglu, Y. (2021). Uptake of oxidized lipids by the scavenger receptor CD36 promotes lipid peroxidation and dysfunction in CD8(+) T cells in tumors. Immunity 54, 1561–15677.
34. Gray, A.L., Karlsson, R., Roberts, A.R.E., Ridley, A.J.L., Pun, N., Khan, B., Lawless, C., Luís, R., Szpakowska, M., Chevigné, A., et al. (2023). Chemokine CXCL4 interactions with extracellular matrix proteoglycans mediate widespread immune cell recruitment independent of chemokine receptors. Cell Rep. 42, 111930.
35. Jin, Y., Wang, W., Wang, Q., Zhang, Y., Zahid, K.R., Raza, U., and Gong, Y. (2022). Alpha-1-antichymotrypsin as a novel biomarker for diagnosis, prognosis, and therapy prediction in human diseases. Cancer Cell Int. 22, 156.
36. Miyauchi, E., Furuta, T., Ohtsuki, S., Tachikawa, M., Uchida, Y., Sabit, H., Obuchi, W., Baba, T., Watanabe, M., Terasaki, T., and Nakada, M. (2018). Identification of blood biomarkers in glioblastoma by SWATH mass spectrometry and quantitative targeted absolute proteomics. PLoS One 13, e0193799.
37. Zhang, Y., Li, Y., Qiu, F., and Qiu, Z. (2010). Comparative analysis of the human urinary proteome by 1D SDS-PAGE and chip-HPLC-MS/MS identification of the AACT putative urinary biomarker. J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 878, 3395–3401.
38. Nie, S., Yin, H., Tan, Z., Anderson, M.A., Ruffin, M.T., Simeone, D.M., and Lubman, D.M. (2014). Quantitative analysis of single amino acid variant peptides associated with pancreatic cancer in serum by an isobaric labeling quantitative method. J. Proteome Res. 13, 6058–6066.
39. Zhu, L., Jäämaa, S., Af Hällström, T.M., Laiho, M., Sankila, A., Nordling, S., Stenman, U.H., and Koistinen, H. (2013). PSA forms complexes with alpha1-antichymotrypsin in prostate. Prostate 73, 219–226.
40. Dimberg, J., Ström, K., Löfgren, S., Zar, N., Hugander, A., and Matussek, A. (2011). Expression of the serine protease inhibitor serpinA3 in human colorectal adenocarcinomas. Oncol. Lett. 2, 413–418.
41. Cao, L.L., Pei, X.F., Qiao, X., Yu, J., Ye, H., Xi, C.L., Wang, P.Y., and Gong, Z.L. (2018). SERPINA3 Silencing Inhibits the Migration, Invasion, and Liver Metastasis of Colon Cancer Cells. Dig. Dis. Sci. 63, 2309–2319.
42. Alfadda, A.A., Benabdelkamel, H., Masood, A., Jammah, A.A., and Ekhzaimy, A.A. (2018). Differences in the Plasma Proteome of Patients with Hypothyroidism before and after Thyroid Hormone Replacement: A Proteomic Analysis. Int. J. Mol. Sci. 19, 88.
43. Santamaria, M., Pardo-Saganta, A., Alvarez-Asiain, L., Di Scala, M., Qian, C., Prieto, J., and Avila, M.A. (2013). Nuclear alpha1-antichymotrypsin promotes chromatin condensation and inhibits proliferation of human hepatocellular carcinoma cells. Gastroenterology 144, 818–828.
44. Ko, E., Kim, J.S., Bae, J.W., Kim, J., Park, S.G., and Jung, G. (2019). SERPINA3 is a key modulator of HNRNP-K transcriptional activity against oxidative stress in HCC. Redox Biol. 24, 101217.
STAR★METHODS
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Dr. Zhijian
Huang ([email protected]).
Materials availability
This study did not generate new unique reagents.
The discovery cohort comprised 12 healthy controls (HC) and 25 colorectal cancer (CRC) patients before treatment. Serum samples from HC and CRC patients were collected between May 2022 and June 2022 at Sun Yat-sen University Cancer Center.
Expansion cohorts for ELISA detection, model development, and validation included the three cohorts listed below.
• The train set was composed of 96 HC, 47 patients with benign colorectal diseases (BCD), and 195 CRC patients before treatment. Serum samples from the train set were collected between August 2020 and October 2022 at Sun Yat-sen University Cancer Center.
• The test set consisted of 112 HC, 55 BCD patients, and 161 CRC patients. Serum samples from the test set were collected between September 2020 and December 2022 at The Seventh Affiliated Hospital of Sun Yat-Sen University. The CRC group included 161 CRC patients before treatment and 39 CRC patients after treatment.
• The external set consisted of 60 HC, 42 enteritis patients, 46 hepatitis B patients, and 98 CRC patients. Serum samples from the external set were collected between October 2023 and March 2024 at Shenzhen People’s Hospital.
The enrolled HC individuals had no history of intestinal diseases, inflammatory diseases, or other diseases. The diagnosis of CRC
was confirmed through histopathological examination, and serum samples were collected at the time of diagnosis prior to tumor
resection or chemoradiotherapy, except for the 39 post-treatment CRC patients in the test set. The diagnosis of BCD was based
on standard endoscopic, histologic, and radiographic criteria. Informed consent was obtained from all participants, and the study
was approved by the Ethics Committee of The Seventh Affiliated Hospital of Sun Yat-Sen University (KY-2020-039-01), Sun Yat-sen
University Cancer Center (B2022-475-01), and Shenzhen People’s Hospital (LL-KY-2022478). The clinical and biological characteristics of the individuals from the four cohorts are described in Table 1.
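As a quick consistency check, the expansion-cohort sizes given in the cohort description can be tallied. The sketch below is illustrative only: the dictionary layout and names are assumptions, and the 39 post-treatment CRC patients in the test set are deliberately left out of the tally. Under that assumption the three cohorts sum to 912, which agrees with the number of individuals quoted in the Summary.

```python
# Expansion-cohort sizes as stated in the cohort description.
# Assumption: the 39 post-treatment CRC patients of the test set are excluded.
cohorts = {
    "train": {"HC": 96, "BCD": 47, "CRC": 195},
    "test": {"HC": 112, "BCD": 55, "CRC": 161},
    "external": {"HC": 60, "Enteritis": 42, "Hepatitis B": 46, "CRC": 98},
}


def cohort_total(groups):
    """Return the number of participants in one cohort."""
    return sum(groups.values())


# Per-cohort totals and the overall number of ELISA-tested individuals.
totals = {name: cohort_total(groups) for name, groups in cohorts.items()}
grand_total = sum(totals.values())

if __name__ == "__main__":
    for name, n in totals.items():
        print(f"{name}: {n}")
    print(f"all expansion cohorts: {grand_total}")
```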
METHOD DETAILS
Serum EV identification
For western blotting, 90 μL of RIPA lysis buffer was mixed with 10 μL of EV samples and incubated on ice for 30 min. Afterward, the
mixtures were centrifuged at 12,000 × g for 5 min at 4 °C, and the supernatant was collected. Protein quantification was performed
using the BCA assay kit. After SDS-PAGE, EV-derived protein samples were transferred onto a PVDF
membrane and blocked with 7% skim milk at room temperature for 1–2 h. The membrane was incubated with primary antibodies overnight at 4 °C,
followed by incubation with secondary antibodies at room temperature. The chemiluminescence signals
were captured using the ChemiDoc Touch imaging system (Bio-Rad, USA).
For transmission electron microscopy (TEM), EV suspensions were fixed using 0.1% (v/v) paraformaldehyde at a 1:1 volume ratio
for 30 min. A drop of 10 μL of fixed EVs was placed on a carbon-coated copper grid for 3 min. Excess liquid was absorbed using filter
paper. Subsequently, 2% phosphotungstic acid was added to the grid for staining, and excess liquid was again absorbed using filter
paper. The copper grids were finally examined and photographed using TEM HT7800 (Hitachi, Tokyo, Japan).
For nanoparticle tracking analysis (NTA), the EV samples were diluted 1:1000 with PBS. The diluted samples were directly analyzed
using a nanoparticle tracking analyzer ZetaVIEW S/N 21–734 (Particle Metrix, Munich, Germany).
ELISA detection
ELISA kits were applied to detect the levels of PF4 (Neobioscience, EHC135.96) and AACT (FineTest, EH0570) derived from serum
EVs. A total of 10 μL of EV samples was mixed with 90 μL of RIPA lysate on ice for 60 min and then diluted with 200 μL of PBS.
Next, 100 μL of the diluted samples was added to a 96-well plate for ELISA detection, following the manufacturer’s protocol. Finally,
the absorbance at 450 nm was measured using a Synergy H1 multi-mode reader (BioTek, Vermont, USA).
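The sample preparation above implies a fixed overall dilution of the EV sample before it reaches the plate. The following minimal sketch works through that arithmetic; the function names are hypothetical, and whether the reported concentrations were back-calculated to the undiluted EV sample is an assumption that the text does not state.

```python
from fractions import Fraction


def dilution_factor(sample_volume, *diluent_volumes):
    """Overall dilution factor: total volume divided by original sample volume.

    All volumes must be given in the same unit.
    """
    total = sample_volume + sum(diluent_volumes)
    return Fraction(total, sample_volume)


def back_calculate(measured_concentration, factor):
    """Concentration in the undiluted sample (assumed convention, see lead-in)."""
    return measured_concentration * factor


# 10 parts EV sample + 90 parts RIPA lysate + 200 parts PBS.
factor = dilution_factor(10, 90, 200)

if __name__ == "__main__":
    print(f"overall dilution: 1:{factor}")
    # Hypothetical plate reading of 2.5 (arbitrary concentration units).
    print(back_calculate(2.5, factor))
```

With these volumes the lysis step is a 1:10 dilution and the PBS step a further 1:3 dilution, for 1:30 overall.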
dynamic exclusion time of 0.4 min, ion target intensity of 10,000, ion intensity threshold of 2500, and collision-induced dissociation
energy of 20–59 eV.
DIA mass spectrometry analysis: The peptide samples were diluted to 10 ng/μL with 0.1% formic acid and supplemented with
iRT peptide mixture. A 200 ng peptide sample mixed with iRT was analyzed on an Evosep One system (Evosep, Denmark) coupled
to a timsTOF Pro (Bruker, Bremen, Germany) equipped with a CaptiveSpray source. Peptides were separated on a 15 cm ×
150 μm analytical column packed with 1.9 μm C18 beads and fitted with a packed emitter tip (Evosep, Denmark). The column temperature was maintained
at 50 °C using an integrated column oven (Bruker, Germany). The LC-separation method was provided by Evosep One at 30 samples
per day. For diaPASEF, we adapted the instrument firmware to perform data-independent isolation of multiple precursor windows
within a single TIMS separation (100 ms). We used a method with two windows in each 100 ms diaPASEF scan. 100 of these scans
covered the diagonal scan line for doubly and triply charged peptides in the m/z – ion mobility plane with narrow 25 m/z precursor
windows.
Raw data of DDA and DIA were processed and analyzed by Spectronaut (Biognosys AG, Switzerland) with default settings. Spec-
tronaut was set up to search the database assuming trypsin as the digestion enzyme. Carbamidomethyl (C) was specified as the fixed
modification. Oxidation (M) and acetyl (Protein N-term) were specified as the variable modifications. Retention time prediction type
was set to dynamic iRT. Spectronaut determines the ideal extraction window dynamically depending on iRT calibration and
gradient stability. A Q value cutoff of 1% was applied at the precursor and protein levels.
Bioinformatics analysis
RNA-seq and clinical data for CRC were obtained from The Cancer Genome Atlas (TCGA) database (https://genome-cancer.ucsc.edu).
Gene Set Enrichment Analysis (GSEA) was used to identify enriched GO molecular function, cellular
component, and biological process gene sets and Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets from the Molecular Signatures
Database v7.4 (http://www.broadinstitute.org/gsea/msigdb), based on PF4 or AACT high- and low-expression phenotypes. The EnrichmentMap
plugin in Cytoscape 3.8.2 was used to map the associations among the enriched pathways. Leading edge analysis was
performed with GSEA 4.1.0 to identify key genes involved in the EnrichmentMap pathway network. The protein-protein interaction
(PPI) networks were constructed using the Search Tool for the Retrieval of Interacting Genes (STRING) database (https://string-db.org/).
Immunohistochemistry staining
Sections of 50 paired CRC and adjacent tissues were collected at The Seventh Affiliated Hospital of Sun Yat-Sen University. The
tissue sections were deparaffinized, rehydrated in a graded ethanol series, and pretreated with 0.01 M citrate
buffer (pH 6.0) using a high-pressure method. Subsequently, the sections were immersed in 3% H2O2 for 20 min to quench endogenous
peroxidase, and goat serum was applied to block nonspecific background staining. Next, primary antibodies against PF4 (Servicebio,
GB113482) and AACT (ZSGB-BIO, ZA0006) were applied. After an overnight incubation with the primary antibodies at 4 °C, the sections
were treated with HRP-conjugated secondary antibody. The antigen-antibody complex was visualized by incubation with the
DAB kit. The stained sections were captured using a slide scanner (Axio Scan.Z1, ZEISS). Protein expression levels were determined
using the staining index (SI), calculated by multiplying the score for stained cell proportions by the staining intensity score. Stained
tumor cell proportions were graded as follows: 0, <5% positive tumor cells; 1, 5%–25% positive tumor cells; 2, 26%–50% positive
tumor cells; 3, 51%–75% positive tumor cells; 4, >75% positive cells. Staining intensity was scored as follows: 0, negative staining
(no staining); 1, weak staining (light yellow); 2, moderate staining intensity (brown); 3, positive staining (yellow).
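The staining index described above is the product of a proportion score (0 to 4) and an intensity score (0 to 3), so it ranges from 0 to 12. A minimal sketch of that scoring rule follows; the function names are hypothetical, and assigning non-integer percentages between the stated bins (for example 25.5%) to the lower bin is an assumption, since the text only defines integer boundaries.

```python
def proportion_score(percent_positive):
    """Map the percentage of positive tumor cells to a proportion score (0-4)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive < 5:
        return 0
    # Assumption: each upper bound is inclusive, so 25.5% still scores 1.
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 75:
        return 3
    return 4


def staining_index(percent_positive, intensity_score):
    """Staining index (SI) = proportion score x intensity score (0-12)."""
    if intensity_score not in (0, 1, 2, 3):
        raise ValueError("intensity_score must be 0, 1, 2, or 3")
    return proportion_score(percent_positive) * intensity_score


if __name__ == "__main__":
    # Hypothetical section: 60% positive tumor cells with moderate staining.
    print(staining_index(60, 2))
```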
Statistical analyses were performed with IBM SPSS Statistics 26.0 and R version 4.2.3. Data are presented as
mean ± SD. Normally distributed data were compared between two groups using the unpaired Student’s t test; otherwise, the data
were analyzed using the nonparametric Mann-Whitney test. The diagnostic performance in terms of AUC, PRAUC, classification error (CE),
sensitivity, specificity, precision, recall, accuracy, and F1 score was calculated using the mlr3 R package. p < 0.05 was considered
statistically significant.
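For readers unfamiliar with the threshold-based metrics listed above, the sketch below shows how they follow from a binary confusion matrix. It is not the authors' mlr3 pipeline: it is a plain-Python illustration using the standard definitions, the function name is hypothetical, and the counts in the usage example are invented rather than taken from the study. AUC and PRAUC are omitted because they require the full set of predicted scores rather than a single confusion matrix.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive case,
    one negative case, and one positive prediction).
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "classification_error": 1 - accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Invented counts for illustration only (not study data).
    for name, value in diagnostic_metrics(tp=80, fp=10, tn=90, fn=20).items():
        print(f"{name}: {value:.3f}")
```

Classification error is simply one minus accuracy, and sensitivity and recall are the same quantity under two names, which is why both appear with identical values.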