1 s2.0 S2405844024056366 Main
The predictive value of serum tumor markers for EGFR mutation in non-small cell
lung cancer patients with non-stage IA
Wenxing Du, Tong Qiu, Hanqun Liu, Ao Liu, Zhe Wu, Xiao Sun, Yi Qin, Wenhao Su,
Zhangfeng Huang, Tianxiang Yun, Wenjie Jiao
PII: S2405-8440(24)05636-6
DOI: https://doi.org/10.1016/j.heliyon.2024.e29605
Reference: HLY 29605
Please cite this article as: The predictive value of serum tumor markers for EGFR mutation in non-small
cell lung cancer patients with non-stage IA, HELIYON, https://doi.org/10.1016/j.heliyon.2024.e29605.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.
Wenxing Du a, Tong Qiu a, Hanqun Liu a, Ao Liu a, Zhe Wu a, Xiao Sun a, Yi Qin a,
Wenhao Su a, Zhangfeng Huang a, Tianxiang Yun b, Wenjie Jiao a, *
a Department of Thoracic Surgery, Affiliated Hospital of Qingdao University, Qingdao, China;
b Department of Thoracic Surgery, the Second Affiliated Hospital, Shandong First Medical
University, Taian, China.
* Corresponding author:
University, NO.16 Jiangsu road, Qingdao, Shandong Province, 266071, China. E-mail
address: [email protected].
Highlights:
1. None of the STMs was a significant predictor of EGFR mutation in stage IA.
2. CYFRA21-1 and CEA were crucial factors for EGFR mutation in non-stage IA.
3. The nomogram and 8 machine learning models could effectively predict EGFR mutations.
Abstract:
Objective: The predictive value of serum tumor markers (STMs) in assessing
epidermal growth factor receptor (EGFR) mutations among patients with non-small
cell lung cancer (NSCLC), particularly those with non-stage IA, remains poorly
understood. The objective of this study is to construct a predictive model comprising
STMs and additional clinical characteristics, aiming to achieve precise prediction of
EGFR mutations through noninvasive means.
Materials and methods: We retrospectively collected data from 6711 NSCLC patients who
underwent EGFR gene testing. Ultimately, 3221 stage IA patients and 1442 non-stage
IA patients were analyzed to evaluate the potential predictive value of several clinical
characteristics and STMs for EGFR mutations.
Results: EGFR mutations were detected in 3866 (57.9%) of all NSCLC
patients. None of the STMs emerged as a significant predictor of EGFR
mutations in stage IA patients. Patients with non-stage IA were divided into the study
group (n = 1043) and validation group (n = 399). In the study group, univariate
analysis revealed significant associations between EGFR mutations and the STMs
(carcinoembryonic antigen (CEA), squamous cell carcinoma antigen (SCC), and
CYFRA21-1). A nomogram incorporating CEA, CYFRA21-1, pathology, gender, and
smoking history for predicting EGFR mutations in non-stage IA patients was
constructed using the results of multivariate analysis. The area
under the curve (AUC = 0.780) and decision curve analysis demonstrated favorable
predictive performance and clinical utility of the nomogram. Additionally, the Random
Forest model demonstrated the highest average C-index (0.793) among the eight
machine learning algorithms, showcasing superior predictive efficiency.
Conclusion: CYFRA21-1 and CEA have been identified as crucial factors for
predicting EGFR mutations in non-stage IA NSCLC patients. The nomogram and 8
machine learning models that combined STMs with other clinical factors could
effectively predict the probability of EGFR mutations.
Keywords: Lung cancer; Epidermal growth factor receptor; Serum tumor markers;
Nomogram model; Machine learning.
1. Introduction
Lung cancer is a prevalent malignant tumor and a leading cause of cancer-related
mortality worldwide[1, 2]. Non-small cell lung cancer (NSCLC) accounts for
approximately 85% of all lung cancer cases[3], with epidermal growth factor receptor
(EGFR) mutations being the most common driver mutation in NSCLC[4]. These
mutations occur in about 50% of NSCLC patients in the Asia-Pacific region and 15%
of patients in Western countries[5, 6]. Multiple clinical studies have unequivocally
demonstrated that advanced NSCLC patients with EGFR mutations are more sensitive
to treatment with EGFR tyrosine kinase inhibitors (EGFR-TKIs) as compared to
traditional chemotherapy, especially those with advanced lung adenocarcinoma
(ADC) [7-10]. EGFR-TKI monotherapy shows a higher overall response rate, longer
median progression-free survival, and longer median overall survival. Furthermore, the
incidence of treatment-related adverse events is significantly lower than that of
chemotherapy[11].
However, due to the low detection rate of EGFR mutations in the real-world
setting[12, 13], many lung cancer patients are unable to benefit from this treatment
approach, resulting in limited improvement in survival and quality of life. A
global systematic review of EGFR
mutation testing in routine care revealed that fewer than one-third of more than 50,000
patients from 18 eligible studies were tested for EGFR mutations[14]. Despite
advances in genetic mutation testing methods, the main reasons for the lower-than-
expected EGFR mutation detection rate are the lack of tumor tissue and the high cost
of EGFR testing[12, 15]. Therefore, there is an urgent need to develop a simple and
non-invasive testing method to predict the EGFR mutation status and improve the
detection rate of EGFR mutations.
Previous research has shown that serum tumor markers (STMs) can aid in the
diagnosis of suspected clinical cancer and cancers of unknown primary origin. They
may also play a significant role in cancer prognosis, treatment, and subsequent
monitoring. The repertoire of currently employed biomarkers for primary lung cancer
encompasses carcinoembryonic antigen (CEA), neuron-specific enolase (NSE),
soluble fragment of cytokeratin 19 (CYFRA21-1), progastrin-releasing peptide
(proGRP), squamous cell carcinoma antigen (SCC), and carbohydrate antigen 125
(CA125)[16-18]. Although the singular utility of individual tumor markers in terms of
specificity and sensitivity remains somewhat limited, their combined application has
emerged as a strategy to bolster diagnostic accuracy. Moreover, the association
between lung cancer biomarkers and clinical staging is noteworthy, as lower levels or
positive rates of SCC, CEA, NSE and CYFRA21-1 have been observed in patients
diagnosed with early-stage NSCLC[19]. Previous studies have revealed the value of
different serum markers in predicting the EGFR mutation status in NSCLC
patients[20-25]. However, these research outcomes have exhibited inconsistencies and
the impact of tumor staging has not been adequately addressed. The development of
predictive models through traditional regression analysis or machine learning methods
has enabled the integration of a multitude of parameters to provide individualized
diagnostic predictions[26]. Limited reports exist regarding the prediction of EGFR
of the study.
2.2. Histopathology examination
Hematoxylin and eosin (HE)-stained tumor slides derived from formalin-fixed
paraffin-embedded tissues of tumor specimens were subjected to microscopic
examination and assessment by two pathologists. Any discrepancies were resolved
The tumor stage was determined using the tumor-node-metastasis (TNM) staging
system based on the 8th edition of the International Union against Cancer staging
system[27]. The histological subtype was evaluated according to the 2015 World
refractory mutation system (ARMS) of the Human EGFR Mutation Detection Kit was
used to determine the EGFR mutation status. If any exon mutation was detected, the
tumor was identified as "EGFR mutation"; otherwise, the tumor was identified as
"EGFR wild-type".
2.5. Statistical analysis
using the chi-square test and Fisher's exact test. Factors that showed statistical
significance in the univariate analysis were further analyzed using multiple logistic
regression analysis. The effect measure of each variable on EGFR mutations was
presented as odds ratios (OR) and corresponding 95% confidence intervals (CI).
Subsequently, the nomogram prediction model was developed utilizing the results of
the multivariable analysis. The area under the curve (AUC) was calculated to assess
the predictive performance of the model. The comparison of ROC curves was
performed using the DeLong test. The clinical utility of the model was evaluated
using Decision Curve Analysis[29]. Internal and external validation of the model was
conducted through measures such as the concordance index (C-index), calibration
curve, and Hosmer-Lemeshow test. Bootstrap resampling (1000 iterations) was
employed to generate the calibration curve. In order to enhance the accuracy of
predicting EGFR mutations, we employed 8 machine learning algorithms, including
Random Forest (RF), Gradient Boosting Machine (GBM), Neural Network (NNET),
Support Vector Machines (SVM), Lasso Regression algorithm (LASSO), Generalized
Linear Model (GLM), K-Nearest Neighbor (KNN), and Logistic Regression (LR). For
each model, C-index was computed separately for the training and test cohorts, and
the model with the highest average C-index was deemed optimal. All p values were
two-sided, and a p value less than 0.05 was considered to be statistically significant.
Statistical analyses were performed with IBM SPSS Statistics version 25.0 (IBM
Corp. New York, USA) and R (version 4.2.2, R Development Core Team), including
the “pROC”, “regplot”, “rms” and “ResourceSelection” packages.
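For a binary outcome such as EGFR mutation status, the C-index coincides with the area under the ROC curve: it is the proportion of mutant/wild-type patient pairs in which the mutant patient receives the higher predicted probability, with ties counted as one half. The Python sketch below illustrates that definition only. It is not the pROC or rms implementation used in this study, and the function name c_index and the toy data are hypothetical.

```python
from itertools import product


def c_index(y_true, y_score):
    """Concordance index for a binary outcome (equals the ROC AUC).

    y_true  : iterable of 0/1 labels (1 = EGFR mutation, 0 = wild-type)
    y_score : iterable of predicted probabilities, same length as y_true
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one mutant and one wild-type case")

    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:        # mutant case ranked above wild-type case
            concordant += 1.0
        elif p == n:     # tied predictions count as half
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study cohorts.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = c_index(labels, scores)
print(f"C-index = {auc:.3f}")
```

The pairwise loop is quadratic in cohort size, which is acceptable for illustration; rank-based formulas give the same value more efficiently on large cohorts.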
3. Results
3.1. Patient clinical characteristics
Among a total of 6,711 NSCLC patients who underwent EGFR gene testing,
EGFR mutations were detected in 3,866 patients (57.9%) (Fig. 1). Among patients
with EGFR mutations, common mutations were observed in 3,370 cases (87.1%), rare
mutations in 330 cases (8.6%), and complex mutations in 166 cases (4.3%). The
rare mutation was exon 20 insertion mutations. Among complex mutations, those with
co-occurring exon 20 T790M and exon 18 G719X mutations were the most prevalent,
representing 44.6% and 44% of complex mutations, respectively. The specific numbers and
frequencies of each EGFR mutation subtype are presented in Fig. 2.
A total of 3,221 NSCLC patients with stage IA were included in the analysis
(Table 1). Patients with stage IA exhibited relatively low
positivity rates of serum tumor markers, with CYFRA 21-1 having the highest positive
rate at merely 20.9%. EGFR mutations were more frequently detected in females
(80% vs. 57.6%, p < 0.001), non-smokers (77.5% vs. 53.6%, p < 0.001), patients with ADC (73.8%
vs. 10%, p < 0.001), and those with negative CEA (72.6% vs. 66.9%, p = 0.021) or negative SCC
(72.5% vs. 63.6%, p = 0.011). In addition, out of 1442 patients with non-stage IA,
1043 patients were included in the study group. The OCF values for CEA and
CYFRA 21-1 in our research exceeded the assay kit's positive thresholds. Therefore,
for non-stage IA patients, CEA positivity was defined as CEA levels above 11.38
ng/mL, and CYFRA 21-1 positivity was defined as CYFRA 21-1 levels above 4.18
ng/mL when analyzing their relationship with EGFR mutations. Clinical
characteristics of NSCLC patients, stratified by the EGFR mutation status in the study
and validation cohorts, are summarized in Table 2. EGFR mutations were observed in
576 patients (55.2%) in the study cohort and 206 patients (51.6%) in the validation
cohort. In the study cohort, EGFR mutations were more commonly found in females
(77.7% vs. 37.3%, p < 0.001), non-smokers (69.7% vs. 34.1%, p < 0.001), ADC
(62.2% vs. 11.7%, p < 0.001), positive CEA (65% vs. 50.2%, p = 0.01), negative
CYFRA21-1 (62.5% vs. 44.9%, p < 0.001) and negative SCC (59.2% vs. 28.4%, p <
0.001). Furthermore, the validation group demonstrated clinical characteristics
associated with EGFR mutations that were largely similar to those observed in the
study group.
3.2. Exploration of risk factors for EGFR mutation
The results of both univariate and multivariate logistic regression analyses for
predicting EGFR mutations in NSCLC patients with stage IA are presented in Table
3. In the multivariate analysis incorporating gender, smoking history, pathology, CEA,
and SCC, female gender (OR, 2.001; p < 0.001), non-smoking status (OR, 1.558; p < 0.001), and
ADC (OR, 15.433; p < 0.001) were identified as independent risk factors for
predicting EGFR mutations, while none of the STMs emerged as a significant predictor.
However, in the study cohort (Table 4), the multivariate analysis incorporated the
factors that were significant in the univariate analysis: gender, smoking history,
pathology, and STMs (CEA, SCC, and CYFRA 21-1). It revealed that being female
(OR, 3.318; p < 0.001), non-smoking (OR, 1.770; p = 0.002), ADC (OR, 6.767; p <
0.001), positive CEA (OR, 1.709; p = 0.001), and negative CYFRA 21-1 (OR, 0.541;
p < 0.001) were independent risk factors for EGFR mutation; these factors were
incorporated into the nomogram predictive model (Fig. 3A). Additionally, the predictive efficacy
and clinical utility of these factors in predicting EGFR mutations were evaluated
through ROC curve and decision curve analyses. The results revealed that the
predictive model exhibited higher predictive efficacy for EGFR mutations
than individual factors, with an AUC of 0.780 (Fig. 3B). Moreover,
decision curve analysis demonstrated that the net benefit of predicting EGFR
mutations with the model surpassed that of individual factors (Fig. 4C). The
threshold probability range of the model was 0-83%, indicating a wider range and
better clinical utility. Furthermore, for common EGFR mutation subtypes, CEA
positivity and CYFRA 21-1 negativity were independent risk factors for both exon 19
deletion and exon 21 L858R mutations. Nevertheless, negative SCC (OR, 0.446; p =
0.044) was an independent predictor of the exon 19 deletion mutation but did not
predict the L858R mutation (Table 5).
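Decision curve analysis[29] compares prediction strategies by their net benefit at a chosen threshold probability pt: net benefit = TP/n - (FP/n) x pt/(1 - pt), where TP and FP are the numbers of true and false positives when every patient with a predicted probability of at least pt is classified as mutation-positive. The Python sketch below shows this calculation for a model and for the "treat all" reference strategy. The function names and toy data are hypothetical and are not the study's code or cohorts.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of classifying patients with y_prob >= threshold as positive."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that labels every patient positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data, not taken from the study cohorts.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}")
```

A decision curve is obtained by repeating this calculation over a grid of thresholds; a model is considered useful over the range of thresholds where its net benefit exceeds both the "treat all" and "treat none" (net benefit 0) strategies.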
3.3. Nomograms of the predictive model in the study cohort
A nomogram of the model was established based on the results of the multivariate
analysis for predicting EGFR mutations (Fig. 3A). The probability of EGFR mutation
can be assessed by assigning “Points” to each variable and summing them to obtain
the total points. This total is then located on the “Total Points” axis, and a vertical line
is drawn from the total points axis to the “Pr (EGFR)” axis. For instance, the
probability of EGFR mutation was predicted for a non-smoking female patient with ADC, positive
CYFRA 21-1, and positive CEA. The prediction scores were as follows:
ADC scored 100, positive CYFRA 21-1 scored 18, positive CEA scored 55,
non-smoking scored 38, and female scored 38. Summing these scores gave a total
of 249, corresponding to a probability of about 0.8 (80%) for the presence of the EGFR
mutation.
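The point-summing step for the worked example can be sketched in Python as follows. Only the five point values quoted in the text are used. The dictionary name and keys are hypothetical, and the mapping from total points to probability must still be read from the nomogram axis (Fig. 3A), because the underlying regression coefficients are not reported here.

```python
# Point values quoted in the text for the example patient only; the points
# for other variable levels are not given in the text and are not assumed.
EXAMPLE_PATIENT_POINTS = {
    "pathology: ADC": 100,
    "CYFRA 21-1: positive": 18,
    "CEA: positive": 55,
    "smoking history: non-smoking": 38,
    "gender: female": 38,
}

total_points = sum(EXAMPLE_PATIENT_POINTS.values())
print(f"Total points = {total_points}")
```

The total of 249 is the value located on the “Total Points” axis in the example above.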
3.4. Performances of discrimination and calibration
The study and validation cohorts were utilized for the internal and external
evaluation of model performance. The nomogram model achieved AUCs of 0.780 and
0.774 (P = 0.815) in the study and validation groups, respectively (Fig. 3D),
indicating favorable discrimination of EGFR mutations. The newly developed
nomogram model was validated internally (Fig. 3E) and externally
(Fig. 3F) using the bootstrap method with 1000 repetitions, and the
resulting calibration curves demonstrated strong consistency between the predicted
values and the actual values. Furthermore, the calibrated C-index of 0.777 in the study
cohort and 0.766 in the validation cohort, similar to the uncalibrated C-index (Table
6), indicated excellent predictive accuracy of the proposed nomogram model.
Moreover, the Hosmer-Lemeshow goodness-of-fit test yielded non-significant results
in both the study and validation groups, indicating no significant discrepancies
between the predicted values and the actual values.
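The Hosmer-Lemeshow test groups patients by predicted probability and compares the observed and expected numbers of events and non-events in each group. The resulting statistic is referred to a chi-square distribution with g - 2 degrees of freedom for g groups, and a non-significant result indicates no detectable lack of fit. The Python sketch below computes the statistic only. It is not the ResourceSelection implementation used in this study, the equal-size grouping rule is an assumption, and the function name and toy data are hypothetical.

```python
def hosmer_lemeshow_statistic(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic using equal-size risk groups."""
    # Sort patients by predicted probability, then cut into `groups` chunks.
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events in the group
        if 0 < expected < size:
            # Contribution of events plus contribution of non-events.
            stat += (observed - expected) ** 2 / expected
            stat += (observed - expected) ** 2 / (size - expected)
    return stat


# Hypothetical toy data, not taken from the study cohorts.
probs = [0.25] * 4 + [0.75] * 4
well_calibrated = [0, 0, 0, 1, 0, 1, 1, 1]   # 1 of 4 and 3 of 4 events
miscalibrated = [0, 0, 0, 0, 0, 1, 1, 1]     # 0 of 4 and 3 of 4 events

hl_good = hosmer_lemeshow_statistic(probs, well_calibrated[:0] or probs, 2) if False else \
    hosmer_lemeshow_statistic(well_calibrated, probs, groups=2)
hl_bad = hosmer_lemeshow_statistic(miscalibrated, probs, groups=2)
print(f"well calibrated: {hl_good:.3f}, miscalibrated: {hl_bad:.3f}")
```

Larger values indicate a greater gap between predicted and observed event counts; the p-value would be obtained from the chi-square distribution, which is omitted here to keep the sketch dependency-free.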
3.5 Prediction of EGFR mutations using 8 machine learning algorithms
The prediction of EGFR mutations in non-stage IA NSCLC patients was
conducted using 8 machine learning algorithms, and C-index values were calculated
for each model across the entire dataset. Results indicated that, in both the
training (Fig. 4A) and validation cohorts (Fig. 4B), the RF model exhibited
the highest C-index values, at 0.838 and 0.749, respectively. The RF model also
demonstrated the highest average C-index of 0.793 among the eight machine learning
algorithms, showcasing superior predictive efficiency (Fig. 4C). When compared to
the nomogram model, the RF model exhibited even better predictive performance,
with C-index values of 0.838 and 0.780, respectively, in the training cohort. Furthermore, the LR
model, with the lowest average C-index among the machine learning models, still
surpassed 0.74. This suggests that the 8 machine learning models, leveraging serum
tumor markers and other clinical features, exhibit robust predictive efficacy for
inconsistencies in the research findings. Cho et al.[21] found no statistically
significant differences in CEA, CYFRA21-1, or SCC levels between EGFR mutant
and wild-type patients. Conversely, Jin et al.[20] reported an increase in the EGFR
mutation rate with elevated CEA levels in non-smoking lung cancer patients. Wang et
al.[24], in a study including 1089 patients, demonstrated an association between
NSCLC patients. We speculate that the discrepancies in these results could be attributed
to the small sample sizes and the lack of consideration for the impact of tumor staging
on STMs. Jiang et al. reported an association between lung cancer biomarkers and
tumor staging, with lower levels or positive rates of tumor markers observed in early-
stage NSCLC patients[19]. Based on these considerations, our study collected a large
amount of patient data and performed analyses based on whether the tumor staging
was stage IA. The results revealed that the highest positive rate among the six STMs
in stage IA NSCLC patients was only 20.9%, and most rates were below 10%. In
NSCLC patients with stage IA, multivariate analysis results showed that ADC,
females, and non-smokers had a higher EGFR mutation rate, which is consistent with
the non-stage IA patients in our study and the majority of research conclusions[31-
33]. None of the six STMs was an independent risk factor for predicting EGFR
mutations, likely due to their low clinical value in the early stages of lung cancer.
However, in patients beyond stage IA, multivariate analysis results indicated that
ADC, female gender, non-smoking status, positive CEA and negative CYFRA21-1 were
independent risk factors for predicting EGFR mutations. Subsequently, the predictive
model was constructed using multivariate analysis results. ROC curves and decision
curve analysis demonstrated that the model exhibited good predictive efficacy (AUC
= 0.78) and clinical utility for predicting EGFR mutations. Furthermore, the
predictive efficacy and clinical utility of the model were significantly superior to those
of individual clinical features. Additionally, CEA positivity and CYFRA 21-1
negativity were independent risk factors for both exon 19 deletion and exon 21 L858R
mutations. Considering that common EGFR mutations account for over 85% of EGFR
mutations in clinical practice, it is understandable that the independent risk factors for
predicting the exon 19 deletion mutation and the exon 21 L858R mutation are
essentially the same as those for predicting EGFR mutations. Nevertheless, negative
SCC was an independent predictor of the exon 19 deletion mutation, while it did not
predict the L858R mutation, which is similar to the findings of Wang et al.[24]. In
summary, STMs can predict EGFR mutations in non-stage IA NSCLC patients, and
the combination of STMs with other clinical factors can enhance the predictive
STMs with other clinical features to provide personalized risk assessment of EGFR
mutations for non-IA NSCLC patients who were unable to undergo genetic testing.
The nomogram model incorporating CEA, CYFRA21-1, pathology, gender, and
smoking history for predicting EGFR mutations was constructed using the results of
multivariate analysis. CYFRA 21-1 is a tumor marker with high
sensitivity for NSCLC, particularly squamous cell carcinoma[34]. CEA exhibits
relatively high sensitivity in lung cancer, with the highest serum concentrations
observed in ADC and large cell carcinoma[35]. EGFR mutations predominantly occur
in patients with ADC, and approximately 40% of lung ADC patients demonstrate
elevated levels of CEA. Conversely, positive CYFRA 21-1 results are frequently
associated with the presence of squamous cell carcinoma. This observation offers a
potential explanation for the independent risk factors of positive CEA and negative
CYFRA 21-1 in predicting EGFR mutations. In both internal and external validations,
the calibration curves of the nomogram model clearly demonstrated a high degree of
consistency between the predictions and observations. Furthermore, the calibrated C-
index of 0.777 in the study cohort and 0.766 in the validation cohort, similar to the
uncalibrated C-index, indicated excellent predictive accuracy of the proposed
nomogram model. The results of the Hosmer-Lemeshow goodness-of-fit test were
also non-significant (P > 0.05), indicating no significant differences between the
predicted and actual values. Therefore, use of the nomogram is recommended, as it
predicts the probability of EGFR mutations in non-stage IA NSCLC patients with
reasonable accuracy. Furthermore, in our study, regardless of
tumor stage, NSE, CA125, and proGRP showed no significant value in
predicting EGFR mutations.
Random Forest (RF), a machine learning algorithm, has been applied
in clinical outcome predictions[36]. RF constructs numerous decision trees through
log-rank tests to identify different states and generates individual probabilities based
valuable insights for personalized treatment.
Abbreviations:
STMs: Serum tumor markers; EGFR: Epidermal growth factor receptor; NSCLC:
Non-small cell lung cancer; CEA: Carcinoembryonic antigen; SCC: Squamous cell
Author Contributions:
Wenxing Du: Conceptualization, Formal analysis, Software, Visualization, Writing –
original draft, Writing – review & editing. Tong Qiu and Hanqun Liu: Conceptualization,
Investigation, Methodology, Writing – original draft. Zhe Wu, Wenhao Su, Zhangfeng
Huang, and Tianxiang Yun: Conceptualization, Investigation, Writing – original draft.
Ao Liu, Xiao Sun, and Yi Qin: Conceptualization, Methodology, Writing – original
draft. Wenjie Jiao: Supervision, Project administration, Conceptualization, Writing –
review & editing.
Data Availability Statement: The data that support the findings of this study are
available on request from the corresponding author. The data are not publicly
available due to privacy or ethical restrictions.
Declaration of Interest Statement: None.
This study was approved by the Institutional Review Board of the Affiliated Hospital
of Qingdao University (QYFY WZLL 27853) and waived the need for informed
References
[1] R.L. Siegel, K.D. Miller, N.S. Wagle, A. Jemal, Cancer statistics, 2023, CA Cancer J Clin 73(1)
(2023) 17-48.
[2] H. Sung, J. Ferlay, R.L. Siegel, M. Laversanne, I. Soerjomataram, A. Jemal, F. Bray, Global
Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36
Cancers in 185 Countries, CA Cancer J Clin 71(3) (2021) 209-249.
[3] R.S. Herbst, J.V. Heymach, S.M. Lippman, Lung cancer, N Engl J Med 359(13) (2008) 1367-80.
[4] J.A. Marin-Acevedo, B. Pellini, E.O. Kimbrough, J.K. Hicks, A. Chiappori, Treatment Strategies
for Non-Small Cell Lung Cancer with Common EGFR Mutations: A Review of the History of EGFR
TKIs Approval and Emerging Data, Cancers (Basel) 15(3) (2023).
[5] Y. Shi, J.S. Au, S. Thongprasert, S. Srinivasan, C.M. Tsai, M.T. Khoa, K. Heeroma, Y. Itoh, G.
Cornelio, P.C. Yang, A prospective, molecular epidemiology study of EGFR mutations in Asian
patients with advanced non-small-cell lung cancer of adenocarcinoma histology (PIONEER), J
Thorac Oncol 9(2) (2014) 154-62.
[6] A. Passaro, A. Prelaj, L. Bonanno, M. Tiseo, A. Tuzi, C. Proto, R. Chiari, D. Rocco, C. Genova, C.
Sini, D. Cortinovis, S. Pilotto, L. Landi, C. Bennati, A. Camerini, L. Toschi, C. Putzu, G. Cerea, G.
Spitaleri, F. Cappuzzo, F. de Marinis, Activity of EGFR TKIs in Caucasian Patients With NSCLC
Harboring Potentially Sensitive Uncommon EGFR Mutations, Clin Lung Cancer 20(2) (2019)
e186-e194.
[7] Y.L. Wu, C.R. Xu, C.P. Hu, J. Feng, S. Lu, Y. Huang, W. Li, M. Hou, J.H. Shi, A. Marten, J. Fan, B.
Peil, C. Zhou, Afatinib versus gemcitabine/cisplatin for first-line treatment of Chinese patients
with advanced non-small-cell lung cancer harboring EGFR mutations: subgroup analysis of the
LUX-Lung 6 trial, Onco Targets Ther 11 (2018) 8575-8587.
[8] C. Zhou, Y.L. Wu, G. Chen, J. Feng, X.Q. Liu, C. Wang, S. Zhang, J. Wang, S. Zhou, S. Ren, S. Lu,
L. Zhang, C. Hu, C. Hu, Y. Luo, L. Chen, M. Ye, J. Huang, X. Zhi, Y. Zhang, Q. Xiu, J. Ma, L. Zhang,
C. You, Final overall survival results from a randomised, phase III study of erlotinib versus
chemotherapy as first-line treatment of EGFR mutation-positive advanced non-small-cell lung
cancer (OPTIMAL, CTONG-0802), Ann Oncol 26(9) (2015) 1877-1883.
[9] J.C. Soria, Y. Ohe, J. Vansteenkiste, T. Reungwetwattana, B. Chewaskulyong, K.H. Lee, A.
Dechaphunkul, F. Imamura, N. Nogami, T. Kurata, I. Okamoto, C. Zhou, B.C. Cho, Y. Cheng, E.K.
Cho, P.J. Voon, D. Planchard, W.C. Su, J.E. Gray, S.M. Lee, R. Hodge, M. Marotti, Y. Rukazenkov,
S.S. Ramalingam, F. Investigators, Osimertinib in Untreated EGFR-Mutated Advanced Non-
Small-Cell Lung Cancer, N Engl J Med 378(2) (2018) 113-125.
[10] T.S. Mok, Y. Cheng, X. Zhou, K.H. Lee, K. Nakagawa, S. Niho, M. Lee, R. Linke, R. Rosell, J.
Corral, M.R. Migliorino, A. Pluzanski, E.I. Sbar, T. Wang, J.L. White, Y.L. Wu, Improvement in
Overall Survival in a Randomized Study That Compared Dacomitinib With Gefitinib in Patients
With Advanced Non-Small-Cell Lung Cancer and EGFR-Activating Mutations, J Clin Oncol 36(22)
(2018) 2244-2250.
[11] T.S. Mok, Y.L. Wu, M.J. Ahn, M.C. Garassino, H.R. Kim, S.S. Ramalingam, F.A. Shepherd, Y. He,
H. Akamatsu, W.S. Theelen, C.K. Lee, M. Sebastian, A. Templeton, H. Mann, M. Marotti, S.
Ghiorghiu, V.A. Papadimitrakopoulou, A. Investigators, Osimertinib or Platinum-Pemetrexed in
[12] Y. Cheng, Y. Wang, J. Zhao, Y. Liu, H. Gao, K. Ma, S. Zhang, H. Xin, J. Liu, C. Han, Z. Zhu, Y.
Wang, J. Chen, F. Wen, J. Li, J. Zhang, Z. Zheng, Z. Dai, H. Piao, X. Li, Y. Li, M. Zhong, R. Ma, Y.
Zhuang, Y. Xu, Z. Qu, H. Yang, C. Pan, F. Yang, D. Zhang, B. Li, Real-world EGFR testing in
patients with stage IIIB/IV non-small-cell lung cancer in North China: A multicenter, non-
interventional study, Thorac Cancer 9(11) (2018) 1461-1469.
[13] P.S. Aye, S. Tin Tin, M.J. McKeage, P. Khwaounjoo, A. Cavadino, J.M. Elwood, Development
and validation of a predictive model for estimating EGFR mutation probabilities in patients with
non-squamous non-small cell lung cancer in New Zealand, BMC Cancer 20(1) (2020) 658.
[14] A.M. Thi, S. Tin Tin, M. McKeage, J.M. Elwood, Utilisation and Determinants of Epidermal
Growth Factor Receptor Mutation Testing in Patients with Non-small Cell Lung Cancer in Routine
Clinical Practice: A Global Systematic Review, Target Oncol 15(3) (2020) 279-299.
[15] Z. Lv, J. Fan, J. Xu, F. Wu, Q. Huang, M. Guo, T. Liao, S. Liu, X. Lan, S. Liao, W. Geng, Y. Jin,
Value of (18)F-FDG PET/CT for predicting EGFR mutations and positive ALK expression in
patients with non-small cell lung cancer: a retrospective analysis of 849 Chinese patients, Eur J
Nucl Med Mol Imaging 45(5) (2018) 735-750.
[16] N. Vinolas, R. Molina, R. Fuentes, I. Bover, J. Rifa, V. Moreno, E. Canals, A. Marquez, E.
Barreiro, J. Borras, X. Filella, J. Jo, X. Navarro, P. Viladiu, A.M. Ballesta, Tumor markers (CEA, CA
125, CYFRA 21.1, SCC and NSE) in non small cell lung cancer (NSCLC) patients as an aid in
histological diagnosis and prognosis: Comparison with the main clinical and pathological
prognostic factors, Lung Cancer 29(1, Supplement 1) (2000) 195.
[17] W. Qi, X. Li, J. Kang, Advances in the study of serum tumor markers of lung cancer, J Cancer
Res Ther 10 Suppl (2014) C95-C101.
[18] S. Cedres, I. Nunez, M. Longo, P. Martinez, E. Checa, D. Torrejon, E. Felip, Serum tumor
markers CEA, CYFRA21-1, and CA-125 are associated with worse prognosis in advanced non-
small-cell lung cancer (NSCLC), Clin Lung Cancer 12(3) (2011) 172-9.
[19] C. Jiang, M. Zhao, S. Hou, X. Hu, J. Huang, H. Wang, C. Ren, X. Pan, T. Zhang, S. Wu, S.
Zhang, B. Sun, The Indicative Value of Serum Tumor Markers for Metastasis and Stage of Non-
Small Cell Lung Cancer, Cancers (Basel) 14(20) (2022).
[20] B. Jin, Y. Dong, H.M. Wang, J.S. Huang, B.H. Han, Correlation between serum CEA levels and
EGFR mutations in Chinese nonsmokers with lung adenocarcinoma, Acta Pharmacol Sin 35(3)
(2014) 373-80.
[21] A. Cho, J. Hur, Y.W. Moon, S.R. Hong, Y.J. Suh, Y.J. Kim, D.J. Im, Y.J. Hong, H.J. Lee, Y.J. Kim,
H.S. Shim, J.S. Lee, J.H. Kim, B.W. Choi, Correlation between EGFR gene mutation, cytologic
tumor markers, 18F-FDG uptake in non-small cell lung cancer, BMC Cancer 16 (2016) 224.
[22] M. Jiang, P. Chen, X. Guo, X. Zhang, Q. Gao, J. Zhang, G. Zhao, J. Zheng, Identification of
EGFR mutation status in male patients with non-small-cell lung cancer: role of (18)F-FDG PET/CT
and serum tumor markers CYFRA21-1 and SCC-Ag, EJNMMI Res 13(1) (2023) 27.
[23] H. Zhang, M. He, R. Wan, L. Zhu, X. Chu, Establishment and Evaluation of EGFR Mutation
Prediction Model Based on Tumor Markers and CT Features in NSCLC, J Healthc Eng 2022 (2022)
8089750.
[24] S. Wang, P. Ma, G. Ma, Z. Lv, F. Wu, M. Guo, Y. Li, Q. Tan, S. Song, E. Zhou, W. Geng, Y.
Duan, Y. Li, Y. Jin, Value of serum tumor markers for predicting EGFR mutations and positive ALK
expression in 1089 Chinese non-small-cell lung cancer patients: A retrospective analysis, Eur J
[25] X. Tan, Y. Li, S. Wang, H. Xia, R. Meng, J. Xu, Y. Duan, Y. Li, G. Yang, Y. Ma, Y. Jin, Predicting
EGFR mutation, ALK rearrangement, and uncommon EGFR mutation in NSCLC patients by
driverless artificial intelligence: a cohort study, Respir Res 23(1) (2022) 132.
[26] A. Rajkomar, J. Dean, I. Kohane, Machine Learning in Medicine, N Engl J Med 380(14) (2019)
1347-1358.
[27] P. Goldstraw, K. Chansky, J. Crowley, R. Rami-Porta, H. Asamura, W.E. Eberhardt, A.G.
Nicholson, P. Groome, A. Mitchell, V. Bolejack, S. International Association for the Study of Lung
Cancer, A.B. Prognostic Factors Committee, I. Participating, S. International Association for the
Study of Lung Cancer, B. Prognostic Factors Committee Advisory, I. Participating, The IASLC Lung
Cancer Staging Project: Proposals for Revision of the TNM Stage Groupings in the Forthcoming
(Eighth) Edition of the TNM Classification for Lung Cancer, J Thorac Oncol 11(1) (2016) 39-51.
[28] W.D. Travis, E. Brambilla, A.P. Burke, A. Marx, A.G. Nicholson, Introduction to The 2015
World Health Organization Classification of Tumors of the Lung, Pleura, Thymus, and Heart, J
Thorac Oncol 10(9) (2015) 1240-1242.
[29] A.J. Vickers, E.B. Elkin, Decision curve analysis: a novel method for evaluating prediction
models, Med Decis Making 26(6) (2006) 565-74.
[30] A. Russo, T. Franchina, G. Ricciardi, A. Battaglia, M. Picciotto, V. Adamo, Heterogeneous
Responses to Epidermal Growth Factor Receptor (EGFR) Tyrosine Kinase Inhibitors (TKIs) in
Patients with Uncommon EGFR Mutations: New Insights and Future Perspectives in this Complex
Clinical Scenario, Int J Mol Sci 20(6) (2019).
[31] S.P. D'Angelo, M.C. Pietanza, M.L. Johnson, G.J. Riely, V.A. Miller, C.S. Sima, M.F. Zakowski,
V.W. Rusch, M. Ladanyi, M.G. Kris, Incidence of EGFR exon 19 deletions and L858R in tumor
specimens from men and cigarette smokers with lung adenocarcinomas, J Clin Oncol 29(15)
(2011) 2066-70.
[32] R. Rosell, T. Moran, C. Queralt, R. Porta, F. Cardenal, C. Camps, M. Majem, G. Lopez-Vivanco,
D. Isla, M. Provencio, A. Insa, B. Massuti, J.L. Gonzalez-Larriba, L. Paz-Ares, I. Bover, R. Garcia-
Campelo, M.A. Moreno, S. Catot, C. Rolfo, N. Reguart, R. Palmero, J.M. Sanchez, R. Bastus, C.
Mayo, J. Bertran-Alamillo, M.A. Molina, J.J. Sanchez, M. Taron, Spanish Lung Cancer Group,
Screening for epidermal growth factor receptor mutations in lung cancer, N Engl J Med 361(10)
(2009) 958-67.
[33] M. Fukuoka, Y.L. Wu, S. Thongprasert, P. Sunpaweravong, S.S. Leong, V. Sriuranpong, T.Y.
Chao, K. Nakagawa, D.T. Chu, N. Saijo, E.L. Duffield, Y. Rukazenkov, G. Speake, H. Jiang, A.A.
Armour, K.F. To, J.C. Yang, T.S. Mok, Biomarker analyses and final overall survival results from a
phase III, randomized, open-label, first-line study of gefitinib versus carboplatin/paclitaxel in
clinically selected patients with advanced non-small-cell lung cancer in Asia (IPASS), J Clin Oncol
29(21) (2011) 2866-74.
[34] A. Jafari-Kashi, H.A. Rafiee-Pour, M. Shabani-Nooshabadi, A new strategy to design label-
free electrochemical biosensor for ultrasensitive diagnosis of CYFRA 21-1 as a biomarker for
of non-small cell lung cancer with brain metastasis and the role of risk score as a survival
predictor, Eur J Cardiothorac Surg 26(3) (2004) 488-93.
[36] D. Tian, H.J. Yan, H. Huang, Y.J. Zuo, M.Z. Liu, J. Zhao, B. Wu, L.Z. Shi, J.Y. Chen, Machine
Learning-Based Prognostic Model for Patients After Lung Transplantation, JAMA Netw Open
Fig. 3. Construction and validation of the nomogram predictive model. (A) The nomogram
model for predicting EGFR mutations in the study cohort; (B) ROC curves for the nomogram
model in differentiating EGFR mutation status; (C) DCA curves to evaluate the clinical utility
of the nomogram model for predicting EGFR mutations; (D) ROC curves for the
discrimination of the nomogram; (E) The calibration plot in the study cohort; (F) The
calibration plot in the validation cohort. Pr (EGFR), probability of EGFR mutation; ADC,
adenocarcinoma; ROC, receiver operating characteristic; DCA, decision curve analysis; AUC,
area under the curve. ** means P < 0.01; *** means P < 0.001.
Fig. 4. ROC curves for 8 machine learning models in predicting EGFR mutations. (A) ROC
curves in the study cohort; (B) ROC curves in the validation cohort; (C) C-index of each of the
8 prediction models. ROC, receiver operating characteristic; AUC, area under the curve; RF,
Random Forest; GBM, Gradient Boosting Machine; NNET, Neural Network; SVM, Support Vector
Machines; LASSO, Lasso Regression algorithm; GLM, Generalized Linear Model; KNN, K-Nearest
Neighbor; LR, Logistic Regression.
Tables
Table 1
Clinical characteristics according to EGFR mutation in NSCLC patients with stage IA.
Characteristics    All Patients   EGFR Wild-type   EGFR Mutation   P value
                   (n = 3221)     (n = 902)        (n = 2319)
Gender                                                             <0.001
  Female           2069(64.2)     413(20)          1656(80)
  Male             1152(35.8)     489(42.4)        663(57.6)
Age, year                                                          0.159
  Median (IQR)     60(54-66)      60(53-65)        60(54-66)
Smoking history                                                    <0.001
  Never            2476(76.9)     556(22.5)        1920(77.5)
  Former/current   745(23.1)      346(46.4)        399(53.6)
CEA                                                                0.021
  Negative         2862(88.9)     783(27.4)        2079(72.6)
  Positive         359(11.1)      119(33.1)        240(66.9)
CYFRA 21-1                                                         0.201
  Negative         2547(79.1)     700(27.5)        1847(72.5)
  Positive         674(20.9)      202(30)          472(70)
SCC                                                                0.011
  Negative         3048(94.6)     839(27.5)        2209(72.5)
Characteristics    All Patients   EGFR Wild-type     EGFR Mutation      P value   All Patients   EGFR Wild-type     EGFR Mutation      P value
                   (n = 1043)     (n = 467, 44.8%)   (n = 576, 55.2%)             (n = 399)      (n = 193, 48.4%)   (n = 206, 51.6%)
Median (IQR)       63(55-68)      63(56-68)          62(55-67)                    63(55-68)      64(57-68)          62(53-68)
Smoking history                                                         <0.001                                                         <0.001
  Never            618(59.3)      187(30.3)          431(69.7)                    268(67.2)      95(35.4)           173(64.6)
  Former/current   425(40.7)      280(65.9)          145(34.1)                    131(32.8)      98(74.8)           33(25.2)
CEA#                                                                    <0.001                                                         0.027
  Negative         689(66.1)      343(49.8)          346(50.2)                    349(87.5)      151(51.7)          141(48.3)
ProGRP, Positive 1.19 (0.731-1.938) 0.484
CA125, Positive 0.812 (0.522-1.263) 0.354
Pathology, ADC 25.323 (12.66-50.651) <0.001 15.433 (7.639-31.178) <0.001
The positive thresholds for serum tumor markers are provided by the assay kits unless otherwise noted.
Abbreviations: OR, odds ratio; 95% CI, 95% confidence interval; CEA, Carcinoembryonic antigen; CYFRA21-1, Cytokeratin-19
fragment; SCC, Squamous cell carcinoma antigen; NSE, Neuron-specific enolase; ProGRP, Progastrin-releasing peptide; CA125,
Carbohydrate antigen 125; ADC, adenocarcinoma.
* Items were included in the multivariate analysis only when the P value was <0.05 in univariate analysis.
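The univariate odds ratios in these tables come from logistic regression. For a single binary factor, that estimate coincides with the cross-product ratio of the 2x2 table, and the Wald interval reduces to Woolf's formula. The following is a minimal sketch, not the authors' code, using the gender counts from Table 1 (stage IA); the function name and its argument names are illustrative assumptions.

```python
import math


def odds_ratio_wald_ci(exposed_pos, exposed_neg, unexposed_pos, unexposed_neg, z=1.96):
    """Cross-product odds ratio with a Wald (Woolf) confidence interval.

    Hypothetical helper for illustration only; z=1.96 gives a 95% interval.
    """
    odds_ratio = (exposed_pos * unexposed_neg) / (exposed_neg * unexposed_pos)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exposed_pos + 1 / exposed_neg + 1 / unexposed_pos + 1 / unexposed_neg
    )
    log_or = math.log(odds_ratio)
    return (
        odds_ratio,
        math.exp(log_or - z * se_log_or),
        math.exp(log_or + z * se_log_or),
    )


# Counts taken from Table 1 (stage IA), gender rows:
# female: 1656 EGFR-mutant, 413 wild-type; male: 663 EGFR-mutant, 489 wild-type.
or_female, ci_low, ci_high = odds_ratio_wald_ci(1656, 413, 663, 489)
print(f"Female vs male: OR = {or_female:.3f}, 95% CI {ci_low:.3f}-{ci_high:.3f}")
```

This only reproduces an unadjusted estimate; the multivariate odds ratios in the tables are adjusted for the other included factors and cannot be recovered from marginal counts.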
Table 4
Univariate and multivariate analyses of various predictive factors for EGFR mutation in the study cohort.
Characteristics, Factor   Univariate analysis OR (95% CI)   P value   Multivariate analysis* OR (95% CI)   P value
Gender, Female            5.847 (4.436-7.706)               <0.001    3.318 (2.307-4.771)                  <0.001
The positive thresholds for serum tumor markers are provided by the assay kits unless otherwise noted.
Abbreviations: OR, odds ratio; 95% CI, 95% confidence interval; CEA, Carcinoembryonic antigen; CYFRA21-1,
Cytokeratin-19 fragment; SCC, Squamous cell carcinoma antigen; NSE, Neuron-specific enolase; ProGRP,
Progastrin-releasing peptide; CA125, Carbohydrate antigen 125; ADC, adenocarcinoma.
* Items were included in the multivariate analysis only when the P value was <0.05 in univariate analysis.
# The optimal cut-off values for CEA and CYFRA 21-1 were established at 11.38 ng/mL and 4.18 ng/mL, respectively;
because these exceeded the positivity thresholds of the assay, they were chosen as the final positivity thresholds.
Table 6
Performance of discrimination and calibration of models in the study and validation cohorts.
Characteristics     Study cohort   Validation cohort
ROC analysis
  AUC / C-index     0.780          0.774
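For a binary outcome such as EGFR mutation status, the C-index coincides with the AUC: both equal the proportion of (mutant, wild-type) patient pairs in which the mutant patient receives the higher predicted probability, with ties counted as one half. The sketch below illustrates that definition on hypothetical toy scores. It does not use the study data, and the function name is an assumption made for illustration.

```python
def concordance_index(scores, labels):
    """AUC / C-index for a binary outcome via pairwise concordance.

    scores: predicted probabilities; labels: 1 = event (e.g. mutation), 0 = no event.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both outcome classes must be present.")
    concordant = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                concordant += 1.0   # correctly ranked pair
            elif pos_score == neg_score:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the study.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.5]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = concordance_index(toy_scores, toy_labels)
print(f"AUC / C-index on toy data: {toy_auc:.3f}")
```

The pairwise loop is quadratic in sample size, which is fine for illustration; rank-based formulas give the same value more efficiently on larger cohorts.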
☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.