
Lee et al. Radiation Oncology (2024) 19:5
https://doi.org/10.1186/s13014-023-02381-7

RESEARCH Open Access

Using meta-analysis and CNN-NLP to review and classify the medical literature
for normal tissue complication probability in head and neck cancer
Tsair‑Fwu Lee1,2,3, Yang‑Wei Hsieh1, Pei‑Ying Yang1, Chi‑Hung Tseng1, Shen‑Hao Lee1,4, Jack Yang5,
Liyun Chang6, Jia‑Ming Wu7,8, Chin‑Dar Tseng1 and Pei‑Ju Chao1*

Abstract
Purpose The study aims to enhance the efficiency and accuracy of literature reviews on normal tissue complication probability (NTCP) in head and neck cancer patients treated with radiation therapy. It employs meta-analysis (MA) and natural language processing (NLP).

Material and methods The study consists of two parts. First, it employs MA to assess NTCP models for xerostomia, dysphagia, and mucositis after radiation therapy, using Python 3.10.5 for statistical analysis. Second, it integrates NLP with convolutional neural networks (CNN) to optimize the literature search, reducing 3256 articles to 12. CNN settings include a batch size of 50, an epoch range of 50–200, and a learning rate of 0.001.

Results The study's CNN-NLP model achieved a notable accuracy of 0.94 after 200 epochs with Adamax optimization. MA showed an AUC of 0.67 for early-effect xerostomia and 0.74 for late-effect xerostomia, indicating moderate to high predictive accuracy but with high variability across studies. Initial CNN accuracy of 66.70% improved to 94.87% after tuning the optimizer and hyperparameters.

Conclusion The study successfully merges MA and NLP, confirming high predictive accuracy for specific model-feature combinations. It introduces a time-based metric, words per minute (WPM), for efficiency evaluation and highlights the utility of MA and NLP in clinical research.

Keywords Meta-analysis, Natural language processing, Head and neck cancer, Squamous cell carcinoma of the head and neck, Normal tissue complication probability prediction, Convolutional neural networks, Artificial intelligence, Radiation therapy

*Correspondence:
Pei‑Ju Chao
[email protected]
Full list of author information is available at the end of the article

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Introduction

Advancements in radiation therapy techniques for head and neck cancer have significantly improved patients' quality of life [1]. However, potential complications such as dysphagia, xerostomia, and mucositis can hinder recovery and amplify adverse effects. Specifically, radiation-induced xerostomia substantially diminishes patients' well-being, leading to oral health issues and communication barriers [2].

To enhance the welfare of head and neck cancer patients, researchers are exploring innovative approaches, including artificial intelligence (AI) and predictive algorithms, to investigate potential risk factors for complications. This multidisciplinary research has produced a vast body of publications. For instance, a literature search using the terms "artificial intelligence" and "head and neck cancer" between 2013 and May 2022 yielded 734,207 related articles on WOS, indicating a marked upward trend.

Given the sheer volume of published literature, comprehensive understanding through traditional literature reviews becomes increasingly challenging. Therefore, systematic search and filtering methods are crucial. Optimized strategies involve meta-analysis (MA) for synthesizing literature information, quantitatively integrating high-quality data to create valuable annotated datasets and thereby providing robust quantitative evidence for clinical decision-making.

However, conducting an integrated MA is time-consuming and labor-intensive, particularly in literature screening [3]. Reviewers face the daunting task of sifting through a plethora of articles of varying expertise and clinical relevance. To enhance the efficiency and accuracy of MA, this study employs natural language processing (NLP) techniques. As a significant branch of artificial intelligence, NLP enables computers to understand human language and has proven its applicability across various domains [4]. Utilizing NLP can augment the quantitative capabilities of MA, minimize human error, and automate the screening process. The primary aim of this approach is to improve analytical efficiency while reducing human error.

NLP accelerates literature reviews by adeptly categorizing pertinent articles. Numerous studies have improved machine learning methods using publicly accessible literature from 15 systematic reviews [5–8]. For instance, Yujia et al. employed various machine learning models to classify abstracts into two categories related to cancer risk in genetic mutation carriers (penetrance) or the prevalence of genetic mutations [3]. Impressively, they achieved over 88% accuracy in both models. Zhengyi et al. demonstrated that NLP-based methods could substantially reduce the review workload while maintaining the ability to identify relevant research [3]. However, to date, no NLP techniques have been specifically tailored for literature on complications following head and neck cancer radiation therapy or normal tissue complication probability (NTCP). Furthermore, there is a conspicuous lack of an annotated dataset for crafting a machine learning model dedicated to discerning relevant articles in this domain.

Our research aims to fill this gap by creating an annotated abstract dataset focusing on the likelihood of three common complications after radiation therapy for head and neck cancer: mucositis, xerostomia, and dysphagia. We will employ machine learning-based NLP methods to classify abstracts into this annotated dataset. The ultimate goal is to minimize human error and enhance analytical efficiency.

Materials and methods
Research framework
Our research process, based on MA, is divided into two parts, as depicted in Fig. 1. The first part employs MA to investigate NTCP predictive models for three common complications after radiation therapy in head and neck cancer patients: xerostomia, dysphagia, and mucositis. The study encompasses patient demographics, methodologies, and outcomes, hypothesizing that significant variations may arise from different complication types, model choices, and predictive factors. By comparing various models and feature combinations, we aim to identify those with superior predictive capabilities, offering more effective prediction methods for clinical use. Statistical analyses are conducted using Python 3.10.5, with the null hypothesis stating that all model-feature combinations perform equally well in predicting complications and the alternative hypothesis positing that at least one combination significantly outperforms the others.

The second part integrates natural language processing with convolutional neural networks (CNN) to enhance literature retrieval efficiency and result reliability. This approach aims to reduce the time required for research on the NTCP of complications in head and neck cancer, offering quicker and more reliable insights for future studies and clinical applications.

Fig. 1 Research workflow diagram. CNN Convolutional neural networks, NLP Natural language processing, WOS Web of science, PICOS Patient
characteristics, Intervention measures, Control group, Outcome

Eligibility criteria, information sources, and search strategy
This study outlines the research content on head and neck cancer patients using the PICOS framework [9] (patient characteristics, intervention measures, control group, outcome), as shown in Fig. 2. Patient characteristics focus on head and neck cancer patients; interventions encompass all radiation therapy techniques for treating this cancer; control groups are categorized into machine learning and deep learning model types and feature factors; and the outcome metric targets the AUC of multivariate NTCP models. Given its non-RCT or CCT nature, the study falls under the category of prospective trials.

After formulating the research theme, database searches are conducted using relevant keywords, covering both titles and abstracts. Primary search keywords are organized into three layers (patient, method, and outcome) and are explored in conjunction with the PICOS framework. To ensure completeness, Boolean "AND" searches are specifically performed for combinations of complications with AI and NTCP. Beyond the PICOS framework, the study also employs PubMed's MeSH terms and related literature to broaden its scope. Boolean logic and faceted search techniques are used to break down the indexing problem into multiple thematic layers, establish inter-layer relationships, and employ Boolean "OR" for union operations, ensuring the comprehensiveness of the search results (detailed keywords are provided in Additional file 1: Table S1) [10–12]; a minimal sketch of this layered query construction is given below.
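As an illustration of this layered strategy, the sketch below assembles a faceted Boolean query in Python. The keyword lists are hypothetical placeholders, not the exact search strings of Additional file 1: Table S1.

```python
# Illustrative sketch only: composes a faceted Boolean query in the spirit of the
# PICOS-based strategy described above. Keyword lists are hypothetical placeholders.
patient_terms = ['"head and neck cancer"', '"head and neck squamous cell carcinoma"']
method_terms = ['"machine learning"', '"deep learning"', '"artificial intelligence"']
outcome_terms = ['"normal tissue complication probability"', 'NTCP',
                 'xerostomia', 'dysphagia', 'mucositis']

def facet(terms):
    """Join synonyms within one thematic layer with Boolean OR."""
    return "(" + " OR ".join(terms) + ")"

# Layers are combined with Boolean AND so every record must match each facet.
query = " AND ".join(facet(t) for t in (patient_terms, method_terms, outcome_terms))
print(query)
```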

Fig. 2 Search framework. AI Artificial intelligence, HNC Head and neck cancer, NTCP Normal tissue complication probability, WOS Web of science

Selection process
The data extracted from each included study are determined through collaborative discussions among reviewers. One reviewer is responsible for data collection, while another performs cross-validation. The data encompass authorship, publication year, types of complications, radiation therapy methods, employed models, features (prognostic factors), performance evaluation, and each study's contributions and conclusions.

Data extraction and risk of bias (RoB) assessment
In our study, when evaluating the quality and potential biases of the literature for MA, we opted for the PROBAST tool (Prediction model Risk Of Bias ASsessment Tool) over the commonly used Cochrane risk of bias (RoB) tool. This choice reflects the fact that a significant portion of the included studies did not align well with the criteria of the Cochrane tool because of their unique characteristics.

PROBAST evaluates four domains: participants, predictors, outcome, and analysis. The participants domain assesses the representativeness of the target population and selection bias; the predictors domain evaluates the selection, relevance, reliability, and handling of predictive factors; the outcome domain focuses on the measurement and definition of outcomes, assessing their accuracy and consistency; and the analysis domain reviews methods for model development and validation, including sample size, missing data handling, model calibration, and discriminative ability.

Bias risk assessment is conducted using the PROBAST Excel interface developed by Borja M. Fernandez-Felix [13], with risk determinations (low, high, or unclear) derived from responses to signaling questions. An overall low risk is assigned only if all domains are low-risk; a single high-risk domain results in an overall high risk; and an unclear risk in one domain with low risk in the others leads to an overall unclear risk. If all model domains are low-risk but external validation is lacking, the risk is elevated to high; however, if the model is based on extensive data with internal validation, it can be considered overall low-risk. A minimal sketch of these aggregation rules is given below.
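The sketch below illustrates the overall-judgement rules just described. The helper function and its arguments are our own illustration, not part of the PROBAST tool or its Excel interface; domain ratings use "+", "-", and "?" as in Table 4.

```python
# Minimal sketch of the overall PROBAST judgement rules summarized above.
# The function name and signature are illustrative assumptions, not PROBAST API.
def overall_risk(domains: list[str], externally_validated: bool = True,
                 large_internal_validation: bool = False) -> str:
    if "-" in domains:
        return "-"          # any high-risk domain -> overall high risk
    if "?" in domains:
        return "?"          # otherwise any unclear domain -> overall unclear risk
    if not externally_validated and not large_internal_validation:
        return "-"          # all low risk but no external validation -> elevated to high
    return "+"              # all domains low risk -> overall low risk

# Example: low risk in three domains, unclear analysis domain -> overall unclear.
print(overall_risk(["+", "+", "+", "?"]))  # "?"
```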

Statistical methods
The MA in this study primarily contains the following key components (a minimal numerical sketch of steps II–IV follows the list):

I. Study selection and features: provides an overview of the included sample sizes, time frames, model characteristics, and predictive factors used.
II. Combined effect size results: calculates the aggregated AUC, confidence intervals, and ANOVA p-values for the included studies, visually represented through forest plots to facilitate understanding of the MA conclusions and their statistical significance.
III. Heterogeneity test results: uses Cochran's Q statistic [14] and I² values [15] to assess study heterogeneity. A low p-value for the Q statistic indicates the presence of heterogeneity, while a higher I² value quantifies greater inter-study variability.
IV. ANOVA analysis under a random effects model: calculates the effect size and variance for each study, using AUC as the benchmark; determines the weight of each study as the reciprocal of its variance; computes the overall effect size and variance; and calculates the Q statistic, degrees of freedom, and I² statistic. If the Q statistic exceeds the degrees of freedom, significant inter-study differences exist, and F-values and p-values are calculated to assess the null hypothesis.
V. Together, these steps cover study selection, effect size aggregation, heterogeneity testing, and variance analysis under a random effects model, offering a comprehensive evaluation of the predictive models' ability to forecast the incidence of complications.
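The following is a minimal numerical sketch of steps II–IV, assuming study-level AUC effect sizes with known within-study variances; the per-study values are hypothetical placeholders, not data from the included studies, and I² follows Higgins [15] as I² = max(0, (Q − df)/Q) × 100.

```python
# Illustrative sketch of the pooled effect size, Cochran's Q [14] and I^2 [15];
# it is not the authors' analysis script. AUCs and variances are placeholders.
import numpy as np

auc = np.array([0.84, 0.75, 0.68, 0.88])      # hypothetical per-study AUCs
var = np.array([0.004, 0.006, 0.009, 0.005])  # hypothetical per-study variances

w = 1.0 / var                                  # weight = reciprocal of the variance
pooled = np.sum(w * auc) / np.sum(w)           # combined effect size
pooled_se = np.sqrt(1.0 / np.sum(w))
ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

Q = np.sum(w * (auc - pooled) ** 2)            # Cochran's Q statistic
df = len(auc) - 1                              # degrees of freedom
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

print(f"pooled AUC = {pooled:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
print(f"Q = {Q:.2f}, df = {df}, I^2 = {I2:.1f}%")
```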

Natural language processing (NLP) program design
To expedite the identification and retrieval of relevant literature while ensuring result reliability and accuracy, this study adopts a CNN for NLP, drawing inspiration from Yujia Bao's MA NLP model design [3]. This choice not only suits the nature of the data but also facilitates platform development, paving the way for the future integration of more deep learning models to enhance the classifier's accuracy and generalizability. For abstract identification, the CNN model employed is capable of automatically learning language features from extensive text and achieving results across various tasks. Through word vector transformation and feature extraction, the CNN model effectively performs text classification and sentiment analysis. Key parameters used in this study include a batch size of 50, an epoch range of 50–200, and a learning rate of 0.001; a minimal architecture sketch is given below.
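Below is a minimal sketch, in the spirit of the design described above, of a 1D-CNN abstract classifier using the reported hyperparameters (batch size 50, 50–200 epochs, learning rate 0.001) and the Adamax optimizer selected in the Results. The vocabulary size, embedding dimension, filter settings, sequence length, and the use of tensorflow.keras are our own illustrative assumptions; the paper does not state its implementation library.

```python
# Minimal 1D-CNN text-classifier sketch consistent with the reported settings;
# architecture details are assumptions, not values reported by the study.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 20000, 300, 128   # assumed preprocessing settings

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),        # word-vector embedding
    layers.Conv1D(128, 5, activation="relu"),       # n-gram feature extraction
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),          # relevant vs. irrelevant abstract
])

model.compile(
    optimizer=tf.keras.optimizers.Adamax(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)

# x_train / y_train would hold tokenized abstracts and their labels, e.g.:
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=50, epochs=200)
```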
Data preprocessing
The data preprocessing in this study is divided into two main phases. First, abstracts and titles that have undergone manual retrieval and initial screening are allocated into training, validation, and test sets. The positive and negative samples in the training and validation sets are distributed at a 2:8 ratio, while the test set is further adjusted to a more realistic 15:85 ratio to better reflect the prevalence of irrelevant samples. Second, for word vector embedding, the text is converted into jsonl format and manually annotated and cleaned, including the removal of potentially misleading punctuation and special characters. These preprocessing steps optimize the text for word vector embedding input to the CNN model, facilitating subsequent NLP and analysis. An illustrative preprocessing sketch follows.
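The sketch below illustrates the two preprocessing phases just described: cleaning jsonl-formatted annotated records and tallying the class balance. The record layout and the two fabricated sample lines are hypothetical, not the study's dataset.

```python
# Illustrative preprocessing sketch; record fields and sample lines are hypothetical.
import json
import re

sample_jsonl = [
    '{"text": "NTCP model for xerostomia after IMRT!", "label": 1}',
    '{"text": "Unrelated surgical case report...", "label": 0}',
]

def clean(text: str) -> str:
    """Remove potentially misleading punctuation/special characters, normalize spacing."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()

records = [json.loads(line) for line in sample_jsonl]
for rec in records:
    rec["text"] = clean(rec["text"])

# Training/validation sets keep a 2:8 positive-to-negative ratio, while the test
# set is rebalanced to roughly 15:85 to mimic the real prevalence of irrelevant
# abstracts; the splitting logic itself is omitted here.
positives = [r for r in records if r["label"] == 1]
negatives = [r for r in records if r["label"] == 0]
print(records, len(positives), len(negatives))
```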
Results
Literature review and research selection
After searching the WOS and PubMed databases, this study initially identified 3,256 potentially relevant articles, as illustrated in Fig. 3. The first round of screening, based on titles, eliminated studies unrelated to head and neck cancer or radiation therapy, leaving 87 articles for the second round. The second round, focused on abstracts, further excluded studies not involving head and neck or squamous cell cancer patients, or those not utilizing machine learning or deep learning as evaluation tools, resulting in 36 articles for full-text review. During this phase, articles not addressing predictions, not focusing on complications, or lacking AUC-related outcomes for multivariate NTCP models were also excluded, along with duplicates. Ultimately, 12 articles were included for review [16–27].

Fig. 3 Article Selection flowchart. WOS Web of science

Performance of the CNN-NLP model
After comparing nine different optimizers, our study opted for Adamax (see Additional file 1: Table S2). With 50 epochs, Adamax achieved a loss value of 0.51, an accuracy of 0.85, and an F1-score of 0.75, along with a precision of 0.71. When the epochs were increased to 100, the accuracy and F1-score improved to 0.87 and 0.79, respectively, while the precision reached 0.84. At 200 epochs, both accuracy and F1-score peaked at approximately 0.94, clearly demonstrating the superior performance of the Adamax optimizer in this model.

After optimizer fine-tuning, as shown in Table 1, we evaluated coverage performance, which measures the overlap of identified studies under specific search subset conditions and assesses the efficacy of automated processing. We conducted tests on four different subsets, from WOS T1 to Pubmed T4, and compared the coverage rates when using the Adam and Adamax optimizers across training cycles of 200, 100, and 50 epochs. In WOS T1, coverage was generally 0/9 regardless of the optimizer or training cycle, with Adam reaching a peak of 1/9 (Table 1) at a low identification frequency. In Pubmed T2, coverage was mostly 0/7, but a few articles were identified at epochs 100 and 50, not exceeding two in total. In WOS T3, Adam achieved a 3/4 coverage rate at 50 epochs, similar to Adamax. For Pubmed T4, Adam reached a 3/4 coverage rate at 100 epochs, while Adamax showed more stable performance across all training cycles, peaking at 2/4.

Regarding words per minute (WPM) for literature review, our study introduces a more objective method of time quantification. Beyond providing a standardized metric for future research, we also employ unit conversion and a deep learning-based natural language text classifier for temporal comparisons. In Table 2, we calculated and compared the time spent on the alternative tasks, converting WPM results to seconds; the details of the screening speed measured in WPM can be found in Additional file 1: Table S3. We then contrasted this with the average time needed for text recognition during preprocessing of the T1–T4 test sets using an Adamax-optimized CNN-NLP model. As shown in Table 1, even after accounting for differences in text recognition capability, the time efficiency gained through NLP shows a significant, intuitive difference. (The code for the WPM calculation algorithm, captured from the monitor, is shown in Additional file 1: Figure S1; a generic sketch of the metric is given below.)
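The sketch below expresses the WPM metric and the manual-versus-model time ratio of Table 2 in generic form. It is not the authors' WPM code from Additional file 1: Figure S1, and the input numbers are placeholders rather than the Table 2 measurements.

```python
# Generic sketch of the words-per-minute (WPM) metric and the 1:N time ratio.
def words_per_minute(word_count: int, seconds: float) -> float:
    """Screening speed in words per minute."""
    return word_count / (seconds / 60.0)

def time_ratio(model_seconds: float, manual_seconds: float) -> str:
    """Express model screening time relative to manual screening time as 1:N."""
    return f"1:{manual_seconds / model_seconds:.0f}"

# Placeholder inputs (not the Table 2 measurements).
word_count, manual_seconds, model_seconds = 30_000, 16_000, 160
print(f"manual screening speed ~ {words_per_minute(word_count, manual_seconds):.1f} wpm")
print(f"model vs. manual time = {time_ratio(model_seconds, manual_seconds)}")
```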

Table 1 Coverage results

Test set (total samples)  Optimizer  Epochs  Minimum computation time (s)  Highest coverage rate (selected/total)  Identification frequency (%)
WOS T1 (301)              Adam       200     343.000                       1/9                                     20
                                     100     76.516
                                     50      78.596
                          Adamax     200     351.103                       0/9                                     0
                                     100     106.240
                                     50      91.683
Pubmed T2 (98)            Adam       200     337.565                       1/7                                     40
                                     100     90.814
                                     50      81.079
                          Adamax     200     331.719                       2/7                                     40
                                     100     129.784
                                     50      83.186
WOS T3 (53)               Adam       200     351.416                       3/4                                     80
                                     100     75.448
                                     50      76.222
                          Adamax     200     345.064                       2/4                                     80
                                     100     143.646
                                     50      87.936
Pubmed T4 (60)            Adam       200     334.702                       3/4                                     40
                                     100     106.955
                                     50      85.363
                          Adamax     200     336.015                       2/4                                     80
                                     100     173.420
                                     50      88.835

WOS Web of science

Table 2 Time difference comparison between manual and NLP classifier approaches

Test set ID  Data source  Number of entries covered  Word count  Manual time spent (s)  Average CNN-NLP time spent (s)  CNN-NLP time relative to manual time (ratio)
T1           WOS          301                        88,861      48,376                 164                             1:294
T2           Pubmed       98                         36,991      20,102                 160                             1:126
T3           WOS          53                         13,510      7,349                  167                             1:44
T4           Pubmed       60                         22,804      12,404                 172                             1:72

CNN Convolutional neural networks, NLP Natural language processing, WOS Web of science

Features and model methods: systematic review
As shown in Table 3, the feature table for the included studies aligns with the three dimensions of the MA issue discussed in the Materials and methods section. In addition to the authors and publication years, the table encompasses demographic characteristics, complications, types of radiation therapy techniques, algorithmic combinations in the predictive models, predictive performance, and the selected predictive factors. The systematic review ultimately included a total of 12 studies [16–27].

The forest plot is illustrated in Fig. 4. The present study undertakes a comprehensive and rigorous meta-analysis focusing specifically on predictive models for xerostomia. Using the feature table, we integrated the models employed across the various studies and further stratified them into early and late phases for sub-group analysis. The combined effect sizes for these sub-groups are visually represented through forest plots (the funnel plot is included in Additional file 1: Figure S2); a purely illustrative plotting sketch is given below. The temporal demarcation between the phases was set at six months, based on the seminal work of Hubert S. Gabryś [16].

Statistically, the overall effect size for the area under the curve (AUC) of the early-effect xerostomia models (Fig. 4a) was 0.67, with a 95% confidence interval (CI) ranging from 0.40 to 0.91. This indicates that these models possess moderate predictive accuracy for early-effect xerostomia. However, the high heterogeneity, as evidenced by an I² value of 80.32% and a Q-statistic of 5.34, suggests significant variability across the different studies. For late-effect xerostomia (Fig. 4b), the overall AUC effect size was 0.74, with a 95% CI of 0.46 to 0.98. This result further corroborates the models' relatively high predictive efficacy for late-effect xerostomia. Nevertheless, the exceedingly high heterogeneity (I² = 97.99%, Q-statistic = 52.48) implies that the applicability of these models may be limited across different research settings or patient populations.
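As a purely illustrative companion to the forest plots in Fig. 4, the matplotlib sketch below draws study-level AUCs with their confidence intervals and a pooled random-effects estimate. All numbers are hypothetical placeholders, not values from the included studies.

```python
# Illustrative forest-plot sketch; study values and the pooled estimate are hypothetical.
import matplotlib.pyplot as plt

studies = ["Study A", "Study B", "Study C", "Study D"]
auc     = [0.84, 0.75, 0.68, 0.88]
ci_low  = [0.78, 0.69, 0.60, 0.83]
ci_high = [0.90, 0.81, 0.76, 0.93]
pooled, pooled_lo, pooled_hi = 0.79, 0.70, 0.88      # hypothetical pooled estimate

fig, ax = plt.subplots(figsize=(6, 3))
y = list(range(len(studies), 0, -1))                  # one row per study, top to bottom
ax.errorbar(auc, y,
            xerr=[[a - lo for a, lo in zip(auc, ci_low)],
                  [hi - a for a, hi in zip(auc, ci_high)]],
            fmt="s", color="black", capsize=3)
ax.errorbar([pooled], [0], xerr=[[pooled - pooled_lo], [pooled_hi - pooled]],
            fmt="D", color="black", capsize=3)        # pooled effect at the bottom
ax.set_yticks(y + [0])
ax.set_yticklabels(studies + ["Pooled (random effects)"])
ax.axvline(pooled, linestyle="--", linewidth=0.8)
ax.set_xlabel("AUC")
plt.tight_layout()
plt.show()
```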
Table 3 Features for the included studies

Hubert S. Gabryś et al. [16]
  Complication: Xerostomia; Sample size: 153; Treatment: IMRT
  Models/algorithms: LR-L1, LR-L2, LR-EN, kNN, SVM, ET, GTB
  AUC (validation): Early stage (0–6 months): LR-L1 0.56, LR-L2 0.46, LR-EN 0.54, kNN 0.65, SVM 0.57, ET 0.44, GTB 0.55. Late stage (6–15 months): LR-L1 0.63, LR-L2 0.60, LR-EN 0.56, kNN 0.62, SVM 0.52, ET 0.55, GTB 0.65. Long-term (15–24 months): LR-L1 0.86, LR-L2 0.86, LR-EN 0.83, kNN 0.74, SVM 0.79, ET 0.88, GTB 0.77. Longitudinal long-term (15–24 months): LR-L1 0.52, LR-L2 0.39, LR-EN 0.52, kNN 0.58, SVM 0.57, ET 0.51, GTB 0.63
  Prognostic factors and feature variables: demographics (age, gender); salivary gland shape (volume, sphericity, eccentricity); volume dose histogram (mean, distribution, skewness); spatial dose gradient (gradient x, gradient y, gradient z); spatial dose distribution (η200, η020, η002); spatial dose correlation (η110, η101, η011); spatial dose skewness (η300, η030, η003); spatial dose co-skewness (η012, η021, η120, η102, η210, η201)
  Significant contributions and findings: 1. The integration of organ and dose shape descriptors has a positive impact on predicting xerostomia. 2. The prediction of xerostomia depends on patient-specific and non-dosimetric factors, emphasizing the importance of personalized data for risk assessment. 3. These insights offer detailed machine learning methodologies that are valuable for future radiomics and dosiomics work in establishing NTCP (normal tissue complication probability) models.

Tsair-Fwu Lee et al. (2014) [17]
  Complication: Xerostomia; Sample size: 206; Treatment: IMRT
  Models/algorithms: LASSO and logistic regression
  AUC: XER3m (LASSO-suboptimal) model: 3 factors, AUC 0.84. XER3m (LASSO-optimal) model: 8 factors, AUC 0.86. XER3m (likelihood) model: 9 factors, AUC 0.85. XER12m (LASSO-suboptimal) model: 5 factors, AUC 0.84. XER12m (LASSO-optimal) model: 9 factors, AUC 0.87. XER12m (likelihood) model: 11 factors, AUC 0.86
  Prognostic factors and feature variables: XER3m-related factors: Dmean-c, Dmean-i, age, economic status, T stage, AJCC stage, smoking, education level, chemotherapy (C/T), node classification, baseline xerostomia, SIB or SQM, gender, family history, marital status. XER12m-related factors: Dmean-i, Dmean-c, smoking, T stage, baseline xerostomia, alcohol issues, family history, node classification, gender, age, economic status, chemotherapy (C/T), AJCC stage, marital status, SIB or SQM
  Significant contributions and findings: 1. Using the least absolute shrinkage and selection operator (LASSO) to construct a multivariate logistic regression model effectively predicts the incidence of moderate to severe xerostomia in head and neck cancer patients undergoing intensity-modulated radiation therapy (IMRT). 2. Through LASSO, eight prognostic factors were identified for the 3-month time point (Dmean-c, Dmean-i, age, financial status, T-stage, AJCC stage, smoking, and education) and nine for the 12-month time point (Dmean-i, education, Dmean-c, smoking, T-stage, baseline xerostomia, alcohol consumption, family medical history, and lymph node classification). 3. During selection of the optimal number of prognostic factors via LASSO, fine-tuning was performed using the Hosmer–Lemeshow test and AUC; three optimal factors were selected for the 3-month time point (Dmean-c, Dmean-i, and age) and five for the 12-month time point (Dmean-i, education, Dmean-c, smoking, and T-stage). 4. The overall performance of the NTCP model at both time points, as indicated by scaled Brier scores, Omnibus, and Nagelkerke R2 metrics, met the expected standards. 5. The multivariate NTCP model using LASSO was confirmed to be effective for predicting xerostomia in patients evaluated after IMRT.

Tsair-Fwu Lee et al. (2014) [18]
  Complication: Xerostomia; Sample size: 152 (HNSCC) and 84 (NPC); Treatment: 3D-CRT and IMRT
  Models/algorithms: LASSO and logistic regression
  AUC: XER HNSCC-3m model: 3 factors, AUC 0.88 (range 0.86–0.91). XER HNSCC-12m model: 3 factors, AUC 0.98 (range 0.97–0.98). XER NPC-3m model: 4 factors, AUC 0.87 (range 0.83–0.90). XER NPC-12m model: 3 factors, AUC 0.96 (range 0.95–0.97)
  Prognostic factors and feature variables: Dmean-c, Dmean-i, age, economic status, T-stage, education level
  Significant contributions and findings: The multivariate NTCP model developed using LASSO effectively predicts the incidence of moderate to severe xerostomia in patients with head and neck squamous cell carcinoma (HNSCC) and nasopharyngeal carcinoma (NPC) undergoing IMRT. Through LASSO, high AUC performance was retained while selecting the fewest predictive factors, resulting in four predictive models. In all models, the average dose to the contralateral and ipsilateral salivary glands was the most important predictive factor; other selected clinical and socio-economic factors include age, financial status, T-stage, and educational level. The multivariate logistic regression model using LASSO techniques can improve prediction of the incidence of xerostomia in HNSCC and NPC patients. The predictive model developed for HNSCC cannot be directly applied to the NPC population undergoing IMRT, and vice versa, without further validation.

Lisanne V. van Dijk et al. (2016) [19]
  Complication: Xerostomia; Sample size: 249; Treatment: 3D-CRT, IMRT, and VMAT
  Models/algorithms: LASSO and logistic regression
  AUC: XER12m model without IBMs, discrimination: AUC 0.75 (0.69–0.81); with IBMs: AUC 0.77 (0.71–0.82). XER12m model without IBMs, validation: AUC_boot 0.74; with IBMs: AUC_boot 0.76
  Prognostic factors and feature variables: CT image biomarkers (IBMs), including Short Run Emphasis (SRE), an IBM measuring the heterogeneity of parotid gland tissue; additional parameters: mean contralateral parotid gland dose, maximum CT intensity of the submandibular gland, and mean dose to the submandibular glands
  Significant contributions and findings: Existing models for predicting patient-rated moderate to severe xerostomia (XER12m) and sticky saliva (STIC12m) after radiation therapy are primarily based on dose-volume parameters and baseline xerostomia (XERbase) or sticky saliva (STICbase) scores. The aim of the study was to improve these predictions using patient-specific features based on CT image biomarkers (IBMs). The research team prospectively collected planning CT scans and patient-rated outcome measurements for 249 head and neck cancer patients undergoing definitive radiation therapy (with or without systemic therapy). The candidate IBMs represent geometric features, CT intensity, and textural characteristics of the parotid and submandibular glands. LASSO regularization was used to create multivariate logistic regression models, and internal validation was performed through bootstrapping. Adding the IBM Short Run Emphasis (SRE), which quantifies the heterogeneity of salivary gland tissue, to the model based on mean contralateral parotid gland dose and baseline xerostomia significantly improved prediction of xerostomia at 12 months. For predicting sticky saliva at 12 months, the maximum CT intensity of the submandibular gland was selected as an additional IBM, alongside baseline sticky saliva and the mean dose to the submandibular glands. Introducing IBMs representing the heterogeneity and density of the salivary glands improved predictions of xerostomia and sticky saliva at 12 months; such biomarkers can further guide research on the patient-specific response of healthy tissue to radiation dose.

Stefano Ursino et al. (2021) [20]
  Complication: Dysphagia; Sample size: 38; Treatment: RT (IMRT)
  Models/algorithms: LRC, SVC, RFC
  AUC: Predicting dysphagia at 6 months: SVC 0.82, LRC 0.80, RFC 0.83. Predicting dysphagia at 12 months: SVC 0.85, LRC 0.82, RFC 0.94
  Prognostic factors and feature variables: dose-volume histogram (DVH) features of the swallowing organs at risk (SWOARs); dose to the SWOARs; baseline and post-radiation (6- and 12-month) videofluoroscopy Penetration-Aspiration score (P/A-VF)
  Significant contributions and findings: The researchers developed a predictive model for radiation-induced dysphagia (RID) based on videofluoroscopy (VF) by incorporating DVH parameters of the swallowing organs at risk (SWOARs) into machine learning analysis. The RID predictive model was developed using the dose to nine swallowing risk organs and the Penetration-Aspiration (P/A) score from VF data at 6 and 12 months post-treatment. Seventy-two dose features were extracted for each patient from the DVH and analyzed using linear support vector classification (SVC), logistic regression classification (LRC), and random forest classification (RFC). Among the 38 patients, the DVH features of the SWOARs showed relevance at both 6 months (SVC AUC 0.82; LRC AUC 0.80; RFC AUC 0.83) and 12 months (SVC AUC 0.85; LRC AUC 0.82; RFC AUC 0.94). At 6 months, the SWOARs with the highest relevance and their corresponding features included the base of the tongue (V65 and Dmean), the superior and middle constrictor muscles (V45, V55, V65, Dmp, Dmean, Dmax, and Dmin), and the salivary glands (Dmean and Dmp). At 12 months, the most relevant features included the middle and inferior constrictor muscles (V55, Dmin, and Dmean; and V55, V65, Dmin, and Dmax), the glottis (V55 and Dmax), the laryngeal muscles (Dmax), and the cervical esophagus (Dmax). A RID predictive model was trained and cross-validated, demonstrating high discriminative ability at both 6 and 12 months after radiation therapy.

Jamie A. Dean et al. (2018) [21]
  Complication: Dysphagia; Sample size: 263; Treatment: 3D-CRT and IMRT
  Models/algorithms: PLR, SVC, RFC
  AUC (6 months following RT): PLRstandard 0.82 ± 0.04; SVCstandard 0.82 ± 0.04; RFCstandard 0.78 ± 0.05; PLRspatial 0.75 ± 0.08; SVCspatial 0.74 ± 0.08; RFCspatial 0.75 ± 0.05
  Prognostic factors and feature variables: volume of pharyngeal mucosa (PM) receiving > 1 Gy/fraction
  Significant contributions and findings: The researchers proposed a model capable of predicting the severity of acute dysphagia in individual patients, which can be used to guide clinical decisions. The goal of the study was to establish a model incorporating spatial dose metrics that can offer guidance for radiation therapy planning, aiming to reduce the incidence of severe swallowing difficulties. Radiation therapy doses to the pharyngeal mucosa (PM), including dose-volume and spatial dose metrics, along with clinical data, were used to develop a model for severe acute dysphagia. Penalized logistic regression (PLR), support vector classification (SVC), and random forest classification (RFC) models were generated and validated internally (173 patients) and externally (90 patients). The volume of the pharyngeal mucosa receiving moderate and high doses (greater than 1 Gy/fraction) was most strongly correlated with severe acute dysphagia; in radiation therapy planning, these volumes should be minimized as much as possible to reduce the occurrence of severe acute dysphagia. The performance of the penalized logistic regression model using dose-volume metrics (PLRstandard) was comparable to that of more complex models and demonstrated excellent discriminative ability in external validation (AUC = 0.82).

Jamie A. Dean et al. (2016) [22]
  Complication: Mucositis; Sample size: 351; Treatment: RT (technique not specifically stated)
  Models/algorithms: PLR, SVC, RFC
  AUC: PLRstandard 0.72 ± 0.09; SVCstandard 0.72 ± 0.09; RFCstandard 0.71 ± 0.09; PLRspatial 0.72 ± 0.09; SVCspatial 0.71 ± 0.09; RFCspatial 0.70 ± 0.09
  Prognostic factors and feature variables: volumes of the oral cavity receiving intermediate to high doses
  Significant contributions and findings: The aim of this study was to generate a predictive model for severe acute oral mucositis using spatial dose metrics and machine learning, which can guide clinical decision-making and inform treatment planning. Radiation therapy dosages (dose-volume and spatial dose metrics) and clinical data were used to generate predictive models, and the performance of penalized logistic regression, support vector classification, and random forest classification models was compared. The performance of the standard dose-volume-based models was not significantly different from that of models including spatial information. Discriminative ability was similar across all models, but the standard random forest classification model had the best calibration, with an average AUC of 0.71 (SD = 0.09) and a calibration slope of 3.9 (SD = 2.2). The volume of the oral cavity receiving moderate and high doses is correlated with severe oral mucositis; reducing this volume may reduce the incidence of oral mucositis.

Ivo Beetz et al. (2012) [23]
  Complication: Xerostomia; Sample size: 178; Treatment: IMRT
  Models/algorithms: M-LR
  AUC: XER6m model AUC 0.68 (0.60–0.76)
  Prognostic factors and feature variables: moderate to severe dry mouth (XER M6) and sticky saliva (STIC M6) were assessed before and 6 months after treatment using the EORTC QLQ-H&N35 questionnaire (a 4-point Likert scale was used for all questions, including those related to dry mouth and sticky saliva); the main predictive factors for dry mouth were the mean dose to the contralateral salivary gland and baseline dry mouth, and for sticky saliva the mean dose to the contralateral submandibular gland, the sublingual gland, and the minor salivary glands of the soft palate
  Significant contributions and findings: This multi-center prospective study aimed to develop a multivariate logistic regression model to predict the risk of xerostomia and sticky saliva in head and neck cancer patients 6 months after receiving IMRT. The study covered 178 patients; 51.6% experienced xerostomia after treatment and 35.6% reported issues with sticky saliva. The main predictive factors for xerostomia were the mean dose to the contralateral salivary gland and baseline xerostomia; the main predictive factors for sticky saliva were the mean dose to the contralateral submandibular gland, the sublingual gland, and the minor salivary glands of the soft palate. The model proposed in this study can serve as a reference for optimizing future IMRT treatments.

Ivo Beetz et al. [24]
  Complication: Xerostomia; Sample size: 165; Treatment: IMRT and 3D-CRT
  Models/algorithms: M-LR
  AUC: XER6m model AUC 0.82 (0.76–0.89)
  Prognostic factors and feature variables: moderate to severe dry mouth (XER M6) and sticky saliva (STIC M6) were assessed before and 6 months after treatment using the EORTC QLQ-H&N35 questionnaire (a 4-point Likert scale was used for all questions)
  Significant contributions and findings: Dose distributions in the minor salivary glands during 3D-CRT have limited impact on patient-rated salivary dysfunction symptoms. Beyond the parotid and submandibular glands, only the sublingual glands showed a significant association with sticky saliva. Reliable risk estimation requires additional factors such as age and baseline subjective scores; including these selected factors in predictive models significantly enhances model performance over dose-volume histogram parameters alone.

Kuo Men et al. [25]
  Complication: Xerostomia; Sample size: 784; Treatment: IMRT
  Models/algorithms: 3D rCNN
  AUC: XER12m model AUC 0.84 (0.74–0.91); without contours: AUC 0.82 (0.72–0.90); without CT: AUC 0.78 (0.67–0.88)
  Prognostic factors and feature variables: features automatically extracted through deep learning from a subset of 40 images from the RTOG 0522 clinical trial
  Significant contributions and findings: A toxicity prediction model using a 3D rCNN was developed and evaluated. The model extracted low- and high-level spatial features from CT planning images, radiation therapy dose distributions, and contours with 3D filters. The proposed model showed promising results in predicting xerostomia; future studies focusing on more accurate definitions of xerostomia-associated regions could further enhance its performance.

Benjamin S. Rosen et al. [26]
  Complication: Xerostomia; Sample size: 105; Treatment: VMAT
  Models/algorithms: PLR
  AUC: Prediction of XER12m for grade ≥ 1 xerostomia using the dose/clinical model (DVH/clinical): AUC 0.709 (95% CI 0.603–0.815); with the added radiomics model (DVH/clinical + radiomics): AUC 0.719 (95% CI 0.603–0.830). Prediction of XER12m for grade ≥ 2 xerostomia using the dose/clinical model (DVH/clinical): AUC 0.692 (95% CI 0.615–0.770); adding contralateral salivary gland changes slightly improved predictive performance (DVH/clinical + radiomics): AUC 0.776 (95% CI 0.643–0.912)
  Prognostic factors and feature variables: CBCT image features, patient demographics, follow-up and clinical outcomes
  Significant contributions and findings: 1. A methodology was introduced for using on-board CBCT to measure treatment-related parotid gland (PG) changes during HNC radiotherapy. 2. Early-treatment CBCT measurements of PG density changes were linked to long-term xerostomia. 3. These CBCT-measured changes offer better predictions than PG dose alone. 4. The CBCT analysis can be conducted with minimal additional cost, making it a viable option for an adaptive radiotherapy platform.

Khadija Sheikh et al. [27]
  Complication: Xerostomia; Sample size: 266; Treatment: IMRT, VMAT, TomoTherapy
  Models/algorithms: LASSO + generalized linear models (multiple LR)
  AUC (XER3m): DVH 0.63 (0.51–0.81); CT 0.57 (0.45–0.71); MR 0.66 (0.54–0.82); CT + MR 0.70 (0.57–0.82); DVH + CT 0.56 (0.40–0.68); DVH + CT + MR 0.60 (0.50–0.73); clinical + CT + MR 0.73 (0.62–0.86); clinical + DVH + CT + MR 0.68 (0.52–0.80)
  Prognostic factors and feature variables: image biomarkers (IBMs) from CT and MR imaging; dose-volume histogram (DVH) parameters
  Significant contributions and findings: 1. Baseline image features from both the parotid and submandibular glands can potentially serve as clinical surrogates for baseline function. 2. Features from the submandibular glands might offer insights into unstimulated salivary function, enhancing predictions of post-RT xerostomia susceptibility. 3. While combining all data showed a trend towards better prediction, further research is needed to ascertain the advantages of merging imaging modalities for xerostomia prediction. 4. Prediction models based on these features can deepen our comprehension of radiation-induced xerostomia and aid in tailoring radiation treatment plans to reduce toxicity.

Abbreviations: XER3m xerostomia around the 3-month time point; XER6m xerostomia around the 6-month time point; XER12m xerostomia around the 12-month time point; Dmean-i mean dose to the ipsilateral parotid gland; Dmean-c mean dose to the contralateral parotid gland; LR-L1 logistic regression with L1 penalty; LR-L2 logistic regression with L2 penalty; LR-EN logistic regression with elastic net penalty; kNN k-nearest neighbors; SVM support vector machine; ET extra-trees; GTB gradient tree boosting; LRC logistic regression classification; SVC support vector classification; RFC random forest classification; M-LR multivariate logistic regression; 3D rCNN 3-dimensional residual convolutional neural network; LR logistic regression; MR magnetic resonance

Fig. 4 Forest plots of the overall effect size for the area under the curve (AUC): a early-effect xerostomia models; b late-effect xerostomia models

In Table 4, titled "Prediction model risk of bias in included studies," the output for each question represents a distinct focal point of the assessment, encompassing all critical stages in the development and application of the prediction models as evaluated by PROBAST. The assessment content is divided into four domains: 1. participants, 2. predictive factors, 3. outcomes, and 4. analysis. These domains are further categorized based on three assessment outcomes, primarily labeled "high risk," "low risk," and "unclear or ambiguous."

Table 4 Prediction model risk of bias in included studies

Columns: risk of bias (1. participants, 2. predictors, 3. outcome, 4. analysis); applicability (1. participants, 2. predictors, 3. outcome); overall (risk of bias, applicability)

Author, year                      Risk of bias   Applicability   Overall
Hubert S. Gabrys et al. [16]      + + + −        + + +           − +
Tsair-Fwu Lee et al. [17]         + + + +        + + +           + +
Tsair-Fwu Lee et al. [18]         + + + +        + + +           + +
Lisanne V. van Dijk et al. [19]   + + + −        + + +           − +
Stefano Ursino et al. [20]        + + + −        + + +           − +
Jamie A. Dean et al. [21]         + + ? ?        + + ?           ? ?
Jamie A. Dean et al. [22]         + + − −        + + −           − −
Ivo Beetz et al. [23]             + + ? +        + + ?           ? ?
Ivo Beetz et al. [24]             + + − +        + + −           − −
Khadija Sheikh et al. [27]        + + + +        + + +           + +
Benjamin S. Rosen et al. [26]     + + + +        + + +           + +
Kuo Men et al. [25]               + + + −        + + +           − +

High risk is denoted by "−"; low risk is denoted by "+"; unclear or ambiguous is denoted by "?"

Although the overall assessment reveals that only four studies exhibited a low risk of bias in their data, with the remainder falling into the high-risk or unclear categories, it is noteworthy that in terms of applicability only two included studies were assessed as having a higher risk, while two were categorized as unclear or ambiguous. This suggests that while there may be a pervasive issue of data bias, the applicability of these studies is less frequently compromised, indicating a need for more rigorous methodological scrutiny to enhance the reliability and utility of future prediction models.

Discussion
Results of the MA study
In our study, we conducted a comprehensive retrospective analysis to evaluate AI-based predictive models for forecasting post-radiation complications such as xerostomia in head and neck cancer patients. Our data revealed significant effect sizes of 0.67 and 0.74 for early- and late-stage xerostomia, respectively, with p-values below 0.05, highlighting the distinctiveness of AI-based models in this context.

Interestingly, our findings contrast with earlier research by our team (Lee et al. [17, 18]) and van Dijk et al. [19]. We observed that incorporating image biomarkers, such as pre-processed CT data, did not necessarily enhance predictive accuracy compared with models based solely on traditional clinical factors and machine learning algorithms. This discrepancy may stem from variations in dataset composition and algorithmic parameters during model training and validation.

Further, research by Gabryś et al. [16] identified key features such as dosimetric shapes and salivary gland volume through algorithmic comparisons, reiterating the significant divergence between AI-based and traditional clinical models in xerostomia prediction.

However, our study also revealed certain limitations and challenges. First, the limited scope of databases for the literature search led to incomplete data and insufficient literature, restricting our ability to perform comprehensive meta-analyses and forest plot illustrations. Second, some studies lacked complete data, such as predictive confidence intervals, which further affected our analysis. As with any other anatomical site, the CNS NTCP literature suffers the same limitations, and no AI model has yet been successfully implemented there [28]. Overall, while our study made progress in predicting normal tissue complications after radiotherapy for head and neck cancer, further research and validation are needed. Our findings align with Chulmin Bang's 2023 literature review, emphasizing that the clinical application of AI models still requires more in-depth exploration and validation [29].

Performance of the CNN-NLP model, optimizer optimization, and coverage
In this study, we presented an analysis focusing on the coverage rate of imbalanced datasets. Despite optimizing the algorithmic parameters, we abstained from employing data augmentation techniques such as oversampling or undersampling to bolster the model's predictive accuracy. Our text classification model was conceptualized based on the MA research framework proposed by Yujia Bao [3]. It is worth noting that this CNN-based model predominantly relies on abstracts rather than full texts for analysis. Consequently, the conversion rate of the included literature could be susceptible to variations in research themes and inclusion criteria, a limitation also acknowledged in Yujia Bao's work [3]. Nevertheless, recent advancements in large-scale language models such as GPT-3 and GPT-4 have shown capabilities in recognizing diverse file formats, including PDFs [30], and have exhibited remarkable precision in medical text identification [30, 31]. Progress has also been made in deep learning for medical text analysis, exemplified by CNN-based medical report retrieval studies [32]. These technological strides open new avenues for medical text identification, potentially mitigating the aforementioned limitations. We are currently exploring the development of models designed for automated full-text reviews to further enhance the comprehensiveness and accuracy of literature analyses.

Conclusion
In this study, we employ an integrative approach combining MA and NLP to explore feature factors for NTCP in head and neck cancer. Our results reject the null hypothesis H0, confirming that specific model-feature combinations yield high predictive accuracy for identical complications. Utilizing CNNs in NLP, we streamline the meta-analytical process and introduce a time-based metric, words per minute (WPM) [33], for efficiency evaluation. This study underscores the utility of meta-analysis and NLP in clinical research, offering a methodological advancement for future studies aiming to optimize predictive models and operational efficiency.

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s13014-023-02381-7.

Additional file 1: Table S1. Database retrieval detail sheet. Table S2. Optimizer test set performance comparison table. Table S3. Screening speed measured in words per minute (WPM). Figure S1. Code for the WPM calculation algorithm captured from the monitor. Figure S2. Bias funnel charts for a early-effect and b late-effect xerostomia.

Acknowledgements
This study was supported financially, in part, by grants from the National Science and Technology Council (NSTC) of the Executive Yuan of the Republic of China (110-2221-E-992-005-MY2, 111-2221-E-992-016-MY2). Part of this study has been presented as a thesis in Chinese.

Author contributions
Conceptualization: P-J.C., T-F.L. Data curation: Y-W.H., P-Y.Y., C-H.T., S-H.L., L.C., C-D.T. Methodology: P-J.C., J.Y., J-M.W. Project administration: T-F.L. Writing - original draft: T-F.L. All authors reviewed the manuscript.

Funding
Grants from the National Science and Technology Council (NSTC) of the Executive Yuan of the Republic of China (110-2221-E-992-005-MY2, 111-2221-E-992-016-MY2).

Availability of data and materials
Not applicable.

Declarations

Ethical approval and consent to participate
Institutional review board approval was not needed as this study did not involve human participants.

Consent for publication
We hereby confirm that all authors have seen and agree with the contents of the manuscript being submitted. We warrant that the article is the authors' original work, has not received prior publication, and is not under consideration for publication elsewhere. We give our consent for the publication of identifiable details, which can include figure(s) and/or table(s) and the details within them, in Radiation Oncology.

Competing interests
All authors have declared that no competing interests exist.

Author details
1 Medical Physics and Informatics Laboratory of Electronics Engineering, National Kaohsiung University of Science and Technology, No. 415, Jiangong Rd., Sanmin Dist., Kaohsiung 80778, Taiwan, ROC. 2 Graduate Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan, ROC. 3 Department of Medical Imaging and Radiological Sciences, Kaohsiung Medical University, Kaohsiung 80708, Taiwan, ROC. 4 Department of Radiation Oncology, Linkou Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Linkou, Taiwan, ROC. 5 Medical Physics at Monmouth Medical Center, Barnabas Health Care at Long Branch, Long Branch, NJ, USA. 6 Department of Medical Imaging and Radiological Sciences, I-Shou University, Kaohsiung 840, Taiwan, ROC. 7 Heavy Ion Center of Wuwei Cancer Hospital, Gansu Wuwei Academy of Medical Sciences, Gansu Wuwei Tumor Hospital, Wuwei, Gansu Province, China. 8 Department of Medical Physics, Chengde Medical University, Chengde, Hebei Province, China.

Received: 30 October 2023   Accepted: 20 November 2023

References
1. Chen AM, et al. Quality of life among long-term survivors of head and neck cancer treated by intensity-modulated radiotherapy. JAMA Otolaryngol Head Neck Surg. 2014;140(2):129–33.
2. Gueiros LA, Soares MSM, Leao JC. Impact of ageing and drug consumption on oral health. Gerodontology. 2009;26(4):297–301.
3. Deng Z, et al. Validation of a semiautomated natural language processing-based procedure for meta-analysis of cancer susceptibility gene penetrance. JCO Clin Cancer Inform. 2019;3:1–9.
4. Takeshita M, Rzepka R, Araki K. Speciesist language and nonhuman animal bias in English masked language models. Inf Process Manag. 2022;59(5):103050.

5. Jonnalagadda S, Petitti D. A new iterative method to reduce workload in systematic review process. Int J Comput Biol Drug Des. 2013;6(1–2):5–17.
6. Matwin S, et al. A new algorithm for reducing the workload of experts in performing systematic reviews. J Am Med Inform Assoc. 2010;17(4):446–53.
7. Ji X, Ritter A, Yen P-Y. Using ontology-based semantic similarity to facilitate the article screening process for systematic reviews. J Biomed Inform. 2017;69:33–42.
8. Cohen AM, et al. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13(2):206–19.
9. Moher D, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151(4):264–9.
10. Booth A. "Brimful of STARLITE": toward standards for reporting literature searches. J Med Libr Assoc. 2006;94(4):421.
11. Hoffmann TC, et al. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ. 2014;7:348.
12. Spiteri L. A simplified model for facet analysis: Ranganathan 101. Can J Inf Libr Sci. 1998;23(1–2):1–30.
13. Fernandez-Felix BM, et al. CHARMS and PROBAST at your fingertips: a template for data extraction and risk of bias assessment in systematic reviews of predictive models. BMC Med Res Methodol. 2023;23(1):1–8.
14. Cochran WG. The comparison of percentages in matched samples. Biometrika. 1950;37(3/4):256–66.
15. Higgins JP, et al. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60.
16. Gabrys HS, et al. Design and selection of machine learning methods using radiomics and dosiomics for normal tissue complication probability modeling of xerostomia. Front Oncol. 2018;8:35.
17. Lee TF, et al. Using multivariate regression model with least absolute shrinkage and selection operator (LASSO) to predict the incidence of xerostomia after intensity-modulated radiotherapy for head and neck cancer. PLoS One. 2014;9(2):89700.
18. Lee TF, Liou MH, Huang YJ, Chao PJ, Ting HM, Lee HY, Fang FM. LASSO NTCP predictors for the incidence of xerostomia in patients with head and neck squamous cell carcinoma and nasopharyngeal carcinoma. Sci Rep. 2014;4(1):6217.
19. van Dijk LV, et al. CT image biomarkers to improve patient-specific prediction of radiation-induced xerostomia and sticky saliva. Radiother Oncol. 2017;122(2):185–91.
20. Ursino S, et al. Incorporating dose-volume histogram parameters of swallowing organs at risk in a videofluoroscopy-based predictive model of radiation-induced dysphagia after head and neck cancer intensity-modulated radiation therapy. Strahlenther Onkol. 2021;197:209–18.
21. Dean J, et al. Incorporating spatial dose metrics in machine learning-based normal tissue complication probability (NTCP) models of severe acute dysphagia resulting from head and neck radiotherapy. Clin Transl Radiat Oncol. 2018;8:27–39.
22. Dean JA, et al. Normal tissue complication probability (NTCP) modelling using spatial dose metrics and machine learning methods for severe acute oral mucositis resulting from head and neck radiotherapy. Radiother Oncol. 2016;120(1):21–7.
23. Beetz I, et al. NTCP models for patient-rated xerostomia and sticky saliva after treatment with intensity modulated radiotherapy for head and neck cancer: the role of dosimetric and clinical factors. Radiother Oncol. 2012;105(1):101–6.
24. Beetz I, et al. Development of NTCP models for head and neck cancer patients treated with three-dimensional conformal radiotherapy for xerostomia and sticky saliva: the role of dosimetric and clinical factors. Radiother Oncol. 2012;105(1):86–93.
25. Men K, et al. A deep learning model for predicting xerostomia due to radiation therapy for head and neck squamous cell carcinoma in the RTOG 0522 clinical trial. Int J Radiat Oncol Biol Phys. 2019;105(2):440–7.
26. Rosen BS, et al. Early changes in serial CBCT-measured parotid gland biomarkers predict chronic xerostomia after head and neck radiation therapy. Int J Radiat Oncol Biol Phys. 2018;102(4):1319–29.
27. Sheikh K, et al. Predicting acute radiation induced xerostomia in head and neck cancer using MR and CT radiomics of parotid and submandibular glands. Radiat Oncol. 2019;14(1):1–11.
28. Gaito S, et al. Normal tissue complication probability modelling for toxicity prediction and patient selection in proton beam therapy to the central nervous system: a literature review. Clin Oncol. 2022;34(6):e225–37.
29. Bang C, et al. Artificial intelligence to predict outcomes of head and neck radiotherapy. Clin Transl Radiat Oncol. 2023;39:100590.
30. Brown T, et al. Language models are few-shot learners. Adv Neural Inf Process Syst. 2020;33:1877–901.
31. Esteva A, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–8.
32. Zheng T, et al. Detection of medical text semantic similarity based on convolutional neural network. BMC Med Inform Decis Mak. 2019;19:1–11.
33. Ntonti P, et al. A systematic review of reading tests. Int J Ophthalmol. 2023;16(1):121.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
