Using Meta-Analysis and CNN-NLP To Review and Classify The Medical Literature For Normal Tissue Complication Probability in Head and Neck Cancer
Purpose This study aims to improve the efficiency and accuracy of literature reviews on normal tissue complication probability (NTCP) in head and neck cancer patients receiving radiation therapy. It employs meta-analysis (MA) and natural language processing (NLP).
Material and methods The study consists of two parts. First, it uses MA to assess NTCP models for xerostomia, dysphagia, and mucositis after radiation therapy, with Python 3.10.5 for statistical analysis. Second, it integrates NLP with convolutional neural networks (CNN) to optimize the literature search, reducing 3256 articles to 12. CNN settings include a batch size of 50, an epoch range of 50–200, and a learning rate of 0.001.
Results The CNN-NLP model achieved an accuracy of 0.94 after 200 epochs with Adamax optimization. MA showed an AUC of 0.67 for early-effect xerostomia and 0.74 for late-effect xerostomia, indicating moderate to high predictive accuracy but with high variability across studies. Initial CNN accuracy of 66.70% improved to 94.87% after tuning the optimizer and hyperparameters.
Conclusion The study successfully merges MA and NLP, confirming high predictive accuracy for specific model-feature combinations. It introduces a time-based metric, words per minute (WPM), for efficiency and highlights the utility of MA and NLP in clinical research.
Keywords Meta-analysis, Natural language processing, Head and neck cancer, Squamous cell carcinoma of the head and neck, Normal tissue complication probability prediction, Convolutional neural networks, Artificial intelligence, Radiation therapy
Pei‑Ju Chao
Full list of author information is available at the end of the article
© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Lee et al. Radiation Oncology (2024) 19:5 Page 2 of 21
Fig. 1 Research workflow diagram. CNN Convolutional neural networks, NLP Natural language processing, WOS Web of science, PICOS Patient
characteristics, Intervention measures, Control group, Outcome
factors; and the outcome metric targets the AUC of multivariate NTCP models. Given its non-RCT or CCT nature, the study falls under the category of prospective trials.
After formulating the research theme, database searches are conducted using relevant keywords, covering both titles and abstracts. Primary search keywords are organized into three layers: patient, method, and outcome, and are explored in conjunction with the PICOS framework. To ensure completeness, Boolean "AND" searches are specifically performed for combinations of complications with AI and NTCP. Beyond the PICOS framework, the study also employs PubMed’s MeSH terms and related literature to broaden its scope. Boolean logic and faceted search techniques are used to break down the indexing problem into multiple thematic layers, establish inter-layer relationships, and employ Boolean "OR" for union operations, ensuring the comprehensiveness of the search results (detailed keywords are provided in Additional file 1: Table S1) [10–12].
Fig. 2 Search framework. AI Artificial intelligence, HNC Head and neck cancer, NTCP Normal tissue complication probability, WOS Web of science
Selection process
Data extracted from each included study are determined through collaborative discussions among reviewers. One reviewer is responsible for data collection, while another performs cross-validation. The data encompass authorship, publication year, types of complications, radiation therapy methods, employed models, features (prognostic factors), performance evaluation, as well as the study’s contributions and conclusions.
Data extraction and risk of bias (RoB) assessment
When evaluating the quality and potential biases of the literature for MA, we opted for the PROBAST tool (Prediction model Risk Of Bias ASsessment Tool) over the commonly used Cochrane risk of bias (RoB) assessment tool. This choice was made because a significant portion of the included studies did not align well with the criteria of the Cochrane tool due to their unique characteristics.
PROBAST evaluates four domains: participants, predictors, outcome, and analysis. The participants domain assesses the representativeness of the target population and selection bias; the predictors domain evaluates the selection, relevance, reliability, and handling of predictive factors; the outcome domain focuses on the measurement and definition of outcomes, assessing their accuracy and consistency; and the analysis domain reviews methods for model development and validation, including sample size, missing data handling, model calibration, and discriminative ability.
Bias risk assessment is conducted using the PROBAST Excel interface developed by Borja M. Fernandez-Felix [13], with risk determinations (low, high, or unclear) derived from responses to signaling questions. An overall low risk is assigned only if all domains are low-risk; a single high-risk domain results in an overall high risk; and an unclear risk in one domain with low risk in others leads to an overall unclear risk. If all model domains are low-risk but lack external validation, the risk is elevated to high; however, if based on extensive data with internal validation, it can be considered overall low-risk.
Statistical methods
The MA in this study primarily contains the following key components:
I. Study Selection and Features: Provides an overview of the included sample size, time frame, model characteristics, and predictive factors used.
II. Combined Effect Size Results: Calculates the aggregated AUC, confidence intervals, and ANOVA p-values for the included studies, visually represented through forest plots to facilitate understanding of MA conclusions and statistical significance.
III. Heterogeneity Test Results: Utilizes Cochran’s Q statistic [14] and I² values [15] to assess study heterogeneity. A low p-value in the Q statistic indicates the presence of heterogeneity, while a higher I² value quantifies greater inter-study variability.
IV. ANOVA analysis under random effects model: Calculates the effect size and variance for each study, using AUC as the benchmark. Determines the weight for each study, which is the reciprocal of the variance. Computes the overall effect size and variance. Calculates Q statistic, degrees of freedom, and I² statistic. Conducts ANOVA analysis; if the Q statistic exceeds the degrees of freedom, significant inter-study differences exist, and F-values and p-values are calculated to assess the null hypothesis.
V. This process covers study selection, effect size aggregation, heterogeneity testing, and variance analysis under a random effects model, offering a comprehensive evaluation of the predictive models’ ability to forecast the incidence of complications.
undergone manual retrieval and initial screening are allocated into training, validation, and test sets. The positive and negative samples in the training and validation sets are distributed at a 2:8 ratio, while the test set is further fine-tuned to a more realistic 15:85 ratio to better reflect the prevalence of irrelevant samples. Second, for word vector embedding, the text is converted into jsonl format and manually annotated and cleaned, including the removal of potentially misleading punctuation and special characters. These preprocessing steps optimize the text for word vector embedding input in the CNN model, facilitating subsequent NLP and analysis.
Results
Literature review and research selection
After searching the WOS and PubMed databases, this study initially identified 3,256 potentially relevant articles, as illustrated in Fig. 3. The first round of screening, based on titles, eliminated studies unrelated to head and neck cancer or radiation therapy, leaving 87 articles for the second round. The second round, focused on abstracts, further excluded studies not involving head and neck or squamous cell cancer patients, or those not utilizing machine learning or deep learning as evaluation tools, resulting in 36 articles for full-text review. During this phase, articles not addressing predictions, not focusing on complications, or lacking AUC-related outcomes for multivariate NTCP models were also excluded, along with duplicates. Ultimately, 12 articles were included for review [16–27].
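The pooling steps listed under Statistical methods (inverse-variance weights, Cochran's Q, degrees of freedom, I², and a random-effects overall effect) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the source does not state which between-study variance estimator was used, so the DerSimonian-Laird moment estimator is assumed here, the function name `random_effects_pool` is hypothetical, and the input values are invented for demonstration rather than taken from the included studies. The ANOVA F-test step mentioned in the text is not reproduced.

```python
import math


def random_effects_pool(aucs, variances):
    """Pool per-study AUCs with a DerSimonian-Laird random-effects model (assumed estimator)."""
    k = len(aucs)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect step: each weight is the reciprocal of the study variance.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * a for wi, a in zip(w, aucs)) / sum_w

    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (a - fixed) ** 2 for wi, a in zip(w, aucs))
    df = k - 1

    # I^2: share of total variability attributed to between-study differences (percent).
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Between-study variance tau^2 (DerSimonian-Laird moment estimator).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * a for wi, a in zip(w_re, aucs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return {"pooled": pooled, "ci": ci, "Q": q, "df": df, "I2": i2, "tau2": tau2}


# Hypothetical inputs for illustration only; NOT values from the included studies.
example = random_effects_pool(aucs=[0.5, 0.7, 0.9], variances=[0.01, 0.01, 0.01])
```

With these hypothetical inputs, Q is 8 on 2 degrees of freedom, which gives an I² of 75% and a pooled AUC of 0.70. Because Q exceeds its degrees of freedom, the between-study variance is positive and the random-effects interval is wider than the fixed-effect one.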
T2, coverage was mostly 0/7, but a few articles were identified at epochs 100 and 50, not exceeding two in total. In WOS T3, Adam achieved a 3/4 coverage rate at 50 epochs, similar to Adamax. For PubMed T4, Adam reached a 3/4 coverage rate at 100 epochs, while Adamax showed more stable performance across all training cycles, peaking at 2/4.
For words per minute (WPM) in literature review, our study introduces a more objective method for time quantification. Beyond providing a standardized metric for future research, we also employ unit conversion and a deep learning-based natural language text classifier for temporal comparisons. In Table 2, we calculated and compared the time spent on alternative tasks, converting WPM results to seconds; the details of the screening speed measured in WPM can be seen in Additional file 1: Table S3. We then contrasted this with the average time needed for text recognition during preprocessing in the T1–T4 test sets using an Adamax-optimized CNN-NLP model. As shown in Table 1, despite considerations like text recognition capabilities, the time efficiency gained through NLP shows a significant, intuitive difference. (Code for the WPM calculation algorithm, captured from the monitor, is shown in Additional file 1: Figure S1.)
Table 2 Time difference comparison between manual and NLP classifier approaches
Columns: Test set ID; Data source; Number of entries; Word count covered; Manual time spent (seconds); Average time spent by CNN-NLP (seconds); CNN-NLP relative to manual time spent
Features and model methods: systematic review
As shown in Table 3, the "studies-included" feature table aligns with the three dimensions of the MA issue discussed in our Materials and Methods section. In addition to the authors and publication years, the table also encompasses demographic characteristics, complications, types of radiation therapy techniques, algorithmic combinations in predictive models, predictive performance, and selected predictive factors. The systematic review ultimately included a total of 12 studies [16–27].
The forest plot is illustrated in Fig. 4. The present study undertakes a meta-analysis focusing specifically on predictive models for xerostomia. Utilizing a feature table, we integrated the models employed across various studies and further stratified them into early and late phases for sub-group analysis. The combined effect sizes for these sub-groups are visually represented through forest plots (the funnel plot is included in Additional file 1: Figure S2). The temporal demarcation for these phases was set at six months, based on the work of Hubert S. Gabryś [16].
The overall effect size for the Area Under the Curve (AUC) of early-effect xerostomia models (Fig. 4a) was 0.67, with a 95% confidence interval (CI) ranging from 0.40 to 0.91. This indicates that these models possess moderate predictive accuracy for early-effect xerostomia. However, the high heterogeneity, as evidenced by an I² value of 80.32% and a Q-statistic of 5.34, suggests significant variability across different studies. For late-effect xerostomia (Fig. 4b), the overall AUC effect size was 0.74, with a 95% CI of 0.46 to 0.98. This result further corroborates the models’ relatively high predictive efficacy for late-effect xerostomia. Nevertheless, the exceedingly high heterogeneity (I² = 97.99%, Q-statistic = 52.48) implies that the applicability of these models may be limited across different research settings or patient populations.
In Table 4, titled "Prediction model Risk of Bias in Included Studies," the output for each question represents distinct focal points of work, encompassing a comprehensive evaluation of all critical stages in the development and application of prediction models as assessed by PROBAST. The assessment content is divided into four domains: 1. Participants, 2. Predictive Factors, 3. Outcomes, and 4. Analysis. These domains are further
Table 3 Features for the included studies
Author (Year): Hubert S. Gabryś et al. [16]
Complications: Xerostomia
Sample size: 153
Treatment: IMRT
Model/algorithm: LR-L1, LR-L2, LR-EN, kNN, SVM, ET, GTB
AUC (CI), validation AUC per model:
  Early Stage (0–6 months): LR-L1 0.56; LR-L2 0.46; LR-EN 0.54; kNN 0.65; SVM 0.57; ET 0.44; GTB 0.55
  Late Stage (6–15 months): LR-L1 0.63; LR-L2 0.60; LR-EN 0.56; kNN 0.62; SVM 0.52; ET 0.55; GTB 0.65
  Long-term (15–24 months): LR-L1 0.86; LR-L2 0.86; LR-EN 0.83; kNN 0.74; SVM 0.79; ET 0.88; GTB 0.77
  Longitudinal Long-term (15–24 months): LR-L1 0.52; LR-L2 0.39; LR-EN 0.52; kNN 0.58; SVM 0.57; ET 0.51; GTB 0.63
Prognostic factors (feature variables):
  Demographics: Age, Gender, Salivary Gland Shape, Volume, Sphericity, Eccentricity
  Volume Dose Histogram: Mean, Distribution, Skewness
  Spatial Dose Gradient: Gradient x, Gradient y, Gradient z
  Spatial Dose Distribution: η200, η020, η002
  Spatial Dose Correlation: η110, η101, η011
  Spatial Dose Skewness: η300, η030, η003
  Spatial Dose Co-skewness: η012, η021, η120, η102, η210, η201
Significant contributions and findings:
  1. The integration of organ and dose shape descriptors has a positive impact on predicting xerostomia
  2. The prediction of xerostomia is dependent on patient-specific and non-dosimetric factors, emphasizing the importance of personalized data for risk assessment
  3. These insights offer detailed machine learning methodologies that are valuable for future radiomics and dosiomics in the establishment of NTCP (Normal Tissue Complication Probability) models
Author (Year): Tsair-Fwu Lee et al. (2014)
Complications: Xerostomia
Sample size: 206
Treatment: IMRT
Model/algorithm: LASSO & Logistic Regression
AUC (CI):
  XER3m (LASSO-Suboptimal) Model: Number of factors is 3; AUC is 0.84
  XER3m (LASSO-Optimal) Model: Number of factors is 8; AUC is 0.86
  XER3m (Likelihood) Model: Number of factors is 9; AUC is 0.85
  XER12m (LASSO-Suboptimal) Model: Number of factors is 5; AUC is 0.84
  XER12m (LASSO-Optimal) Model: Number of factors is 9; AUC is 0.87
  XER12m (Likelihood) Model: Number of factors is 11; AUC is 0.86
Prognostic factors (feature variables):
  XER3m Related Factors: Dmean-c, Dmean-i, Age, Economic Status, T Stage, AJCC Stage, Smoking, Education Level, Chemotherapy (C/T), Node Classification, Baseline Xerostomia, SIB or SQM, Gender, Family History, Marital Status
  XER12m Related Factors: Dmean-i, Dmean-c, Smoking, T Stage, Baseline Xerostomia, Alcohol Issues, Family History, Node Classification, Gender, Age, Economic Status, Chemotherapy (C/T), AJCC Stage, Marital Status, SIB or SQM
Significant contributions and findings:
  1. Utilizing the Least Absolute Shrinkage and Selection Operator (LASSO) to construct a multivariate logistic regression model effectively predicts the incidence of moderate to severe xerostomia in head and neck cancer patients undergoing Intensity-Modulated Radiation Therapy (IMRT)
  2. Through LASSO, eight prognostic factors were identified for the 3-month time point: Dmean-c, Dmean-i, age, financial status, T-stage, AJCC stage, smoking, and education. For the 12-month time point, nine prognostic factors were identified: Dmean-i, education, Dmean-c, smoking, T-stage, baseline xerostomia, alcohol consumption, family medical history, and lymph node classification
  3. During the process of selecting the optimal number of prognostic factors via LASSO, fine-tuning was performed using the Hosmer–Lemeshow test and AUC. For the 3-month time point, three optimal prognostic factors were selected: Dmean-c, Dmean-i, and age. For the 12-month time point, five optimal prognostic factors were selected: Dmean-i, education, Dmean-c, smoking, and T-stage
  4. The overall performance of the NTCP model at both time points, as indicated by scaled Brier scores, Omnibus, and Nagelkerke R² metrics, met certain standards and aligned with expected values
  5. The multivariate NTCP model using LASSO was confirmed to be effective for predicting xerostomia in patients evaluated post-IMRT
Author (Year): Tsair-Fwu Lee et al. (2014)
Complications: Xerostomia
Sample size: 152 (HNSCC); 84 (NPC)
Treatment: 3D-CRT, IMRT
Model/algorithm: LASSO & Logistic Regression
AUC (CI):
  XER HNSCC-3m Model: Number of Factors = 3; AUC = 0.88 (Range: 0.86–0.91)
  XER HNSCC-12m Model: Number of Factors = 3; AUC = 0.98 (Range: 0.97–0.98)
  XER NPC-3m Model: Number of Factors = 4; AUC = 0.87 (Range: 0.83–0.90)
  XER NPC-12m Model: Number of Factors = 3; AUC = 0.96 (Range: 0.95–0.97)
Prognostic factors (feature variables): Dmean-c, Dmean-i, Age, Economic Status, T-Stage, Education Level
Significant contributions and findings:
  The multivariate Normal Tissue Complication Probability (NTCP) model developed using the Least Absolute Shrinkage and Selection Operator (LASSO) effectively predicts the incidence of moderate to severe xerostomia in patients with Head and Neck Squamous Cell Carcinoma (HNSCC) and Nasopharyngeal Carcinoma (NPC) undergoing Intensity-Modulated Radiation Therapy (IMRT)
  Through LASSO, higher AUC performance was retained while selecting the fewest predictive factors, resulting in the establishment of four predictive models
  In all models, the average dose to the contralateral and ipsilateral salivary glands was chosen as the most important predictive factor. Other selected clinical and socio-economic factors include age, financial status, T-stage, and educational level
  The multivariate logistic regression model using LASSO techniques can improve the prediction of the incidence of xerostomia in HNSCC and NPC patients
  The predictive model developed for HNSCC cannot be directly applied to the NPC population undergoing IMRT and vice versa, necessitating validation
Author (Year): Lisanne V. van Dijk et al. (2016)
Complications: Xerostomia
Sample size: 249
Treatment: 3D-CRT, IMRT, VMAT
Model/algorithm: LASSO & Logistic Regression
AUC (CI):
  XER12m Model without IBM, Discrimination: AUC = 0.75 (0.69–0.81)
  XER12m Model with IBM, Discrimination: AUC = 0.77 (0.71–0.82)
  XER12m Model without IBM, Validation: AUC boot = 0.74
  XER12m Model with IBM, Validation: AUC boot = 0.76
Prognostic factors (feature variables):
  CT Image Biomarkers (IBMs)
  Short Run Emphasis (SRE): An image biomarker (IBM) that measures the heterogeneity of the parotid gland tissue
  Additional Parameters:
  Mean Contra-lateral Parotid Gland Dose: The average radiation dose received by the contra-lateral parotid gland during treatment
  Maximum CT Intensity of the Submandibular Gland: The highest computed tomography (CT) intensity value recorded for the submandibular gland
  Mean Dose to Submandibular Glands: The average radiation dose received by the submandibular glands during treatment
Significant contributions and findings:
  Existing models for predicting late-stage patient assessment of moderate to severe xerostomia (XER12m) and oral mucosal hypersecretion (STIC12m) after radiation therapy are primarily based on dose-volume parameters and baseline xerostomia (XERbase) or oral mucosal hypersecretion (STICbase) scores. However, the aim of the study is to improve these predictions by using patient-specific features based on CT image biomarkers (IBM)
  The research team prospectively collected planning CT scans and patient assessment outcome measurements for 249 head and neck cancer patients undergoing definitive radiation therapy (with or without systemic therapy)
  These potential image biomarkers (IBM) represent the geometric features, CT intensity, and textural characteristics of the salivary glands and submandibular glands
  Lasso regularization was used to create multivariate logistic regression models, and internal validation was performed through bootstrapping
  By adding the image biomarker "Short Run Emphasis" (SRE), which quantifies the heterogeneity of salivary gland tissue, to the average contralateral salivary gland dose and baseline xerostomia model, significant improvements were made in predicting xerostomia at 12 months
  For predicting oral mucosal hypersecretion at 12 months, researchers selected the maximum CT intensity of the submandibular gland as another image biomarker, in addition to baseline hypersecretion and the average dose to the submandibular
  By introducing image biomarkers representing the heterogeneity and density of the salivary glands, researchers improved predictions for xerostomia and oral mucosal hypersecretion at 12 months
  Providing image biomarkers can further guide the patient-specific response of healthy tissue to radiation doses in research
Author (Year): Stefano Ursino et al. (2021)
Complications: Dysphagia
Sample size: 38
Treatment: RT, IMRT
Model/algorithm: LRC, SVC, RFC
AUC (CI):
  Predicting Dysphagia at 6 months: SVC: AUC = 0.82; LRC: AUC = 0.80; RFC: AUC = 0.83
  Predicting Dysphagia at 12 months: SVC: AUC = 0.85; LRC: AUC = 0.82
Prognostic factors (feature variables):
  Dose-Volume Histogram (DVH) features of the throat (SWOARs)
  Dose of Swallowing Risk Organs (SWOARs)
  Baseline and Post-Radiation 6 and 12 Months Penetration-Aspiration Score (P/A-VF)
Significant contributions and findings:
  Researchers developed a predictive model for Radiation-Induced Dysphagia (RID) based on Videofluoroscopy (VF) by incorporating Dose-Volume Histogram (DVH) parameters of Swallowing Risk Organs at Risk (SWOARs) into machine learning analysis
  The RID predictive model was developed using the dose of nine swallow-
Author (Year): Jamie A. Dean et al. (2018)
Complications: Dysphagia
Sample size: 263
Treatment: 3D-CRT, IMRT
Model/algorithm: PLR, SVC, RFC
AUC (CI):
  6 months following RT: PLRstandard: AUC = 0.82 ± 0.04; SVCstandard: AUC = 0.82 ± 0.04
Prognostic factors (feature variables): PM receiving > 1 Gy/fraction
Significant contributions and findings:
  Researchers have proposed a model capable of predicting the severity of acute dysphagia in individual patients, which can be used to guide clinical decisions
  The goal of the study is to establish a model
Author (Year): Jamie A. Dean et al. (2016)
Complications: Mucositis
Sample size: 351
Treatment: RT (Not Specifically Stated)
Model/algorithm: PLR, SVC, RFC
AUC (CI):
  PLRstandard: AUC = 0.72 ± 0.09
  SVCstandard: AUC = 0.72 ± 0.09
  RFCstandard: AUC = 0.71 ± 0.09
  PLRspatial: AUC = 0.72 ± 0.09
  SVCspatial: AUC = 0.71 ± 0.09
  RFCspatial: AUC = 0.70 ± 0.09
Prognostic factors (feature variables): Volumes of oral cavity receiving intermediate-high dose
Significant contributions and findings:
  The aim of this study is to generate a predictive model for severe acute oral mucositis using spatial dose metrics and machine learning, which can guide clinical decision-making and inform treatment planning
  Researchers used radiation therapy dosages (dose-volume and spatial dose metrics) and clinical data to generate predictive models. They compared the performance of penalized logistic regression, support vector classification, and random forest classification models
  The performance of the standard dose-volume-based model was not significantly different from models that included spatial information. The discriminative ability was similar across all models, but the standard random forest classification model had the best calibration
  The average AUC and calibration slope for this model were 0.71 (SD = 0.09) and 3.9 (SD = 2.2), respectively
  The volume of the oral cavity receiving moderate and high doses is correlated with severe oral mucositis
  Reducing the volume of the oral cavity receiving moderate and high doses may potentially reduce the incidence of oral
Author (Year): Ivo Beetz et al. (2012)
Complications: Xerostomia
Sample size: 178
Treatment: IMRT
Model/algorithm: M-LR
AUC (CI): XER6m Model: AUC = 0.68 (0.60–0.76)
Prognostic factors (feature variables): Moderate to severe dry mouth (XER M6) and sticky saliva
Significant contributions and findings:
  This is a multi-center prospective study aimed at developing a multivariate logistic regression model
Author (Year): Kuo Men et al. [19]
Complications: Xerostomia
Sample size: 784
Treatment: IMRT
Model/algorithm: 3D rCNN
AUC (CI):
  XER12m Model: AUC = 0.84 (0.74–0.91)
  No contour: AUC = 0.82 (0.72–0.90)
  No CT: AUC = 0.78
Prognostic factors (feature variables): A subset of 40 images from the RTOG 0522 clinical trial had their features automatically extracted through deep learning
Significant contributions and findings:
  A toxicity prediction model using 3D rCNN was developed and evaluated
  The model extracted low- and high-level spatial features from CT planning images, radiation therapy dose distributions,
Author (Year): Khadija Sheikh et al. [27]
Complications: Xerostomia
Sample size: 266
Treatment: IMRT, VMAT, TomoTherapy
Model/algorithm: LASSO + Generalized linear models (multiple LR)
AUC (CI), XER3m:
  DVH: AUC = 0.63 (0.51–0.81)
  CT: AUC = 0.57 (0.45–0.71)
  MR: AUC = 0.66 (0.54–0.82)
  CT + MR: AUC = 0.70 (0.57–0.82)
  DVH + CT: AUC = 0.56 (0.40–0.68)
  DVH + CT + MR: AUC = 0.60 (0.50–0.73)
  Clinical + CT + MR: AUC = 0.73 (0.62–0.86)
  Clinical + DVH + CT + MR: AUC = 0.68 (0.52–0.80)
Prognostic factors (feature variables): IBMs (Image Biomarkers); CT and MR Imaging; Dose-Volume Histogram (DVH) Parameters
Significant contributions and findings:
  1. Baseline image features from both parotid and submandibular glands can potentially serve as clinical surrogates for baseline function
  2. Features from the submandibular glands might offer insights into unstimulated salivary function, enhancing predictions of post-RT xerostomia susceptibility
  3. While combining all data showed a trend towards better prediction, further research is needed to ascertain the advantages of merging imaging modalities for xerostomia prediction
  4. Prediction models based on these features can deepen our comprehension of radiation-induced xerostomia and aid in tailoring radiation treatment plans to reduce toxicity
XER3m Xerostomia around the 3-month time point, XER6m Xerostomia around the 6-month time point, XER12m Xerostomia around the 12-month time point, Dmean-i Average dose to the ipsilateral parotid gland, Dmean-c Average dose to the contralateral parotid gland, LR-L1 Logistic regression with L1 penalty, LR-L2 Logistic regression with L2 penalty, LR-EN Logistic regression with elastic net penalty, kNN k-Nearest neighbors, SVM Support vector machine, ET Extra-trees, GTB Gradient tree boosting, LRC Logistic regression classification, SVC Support vector classification, RFC Random forest classification, M-LR Multivariate logistic regression, 3D rCNN 3-dimensional residual convolutional neural network, LR Logistic regression, MR Magnetic resonance
Fig. 4 Forest plots of the overall effect size for the Area Under the Curve (AUC): a early-effect xerostomia models; b late-effect xerostomia models
categorized based on three assessment outcomes, primarily labeled as "High Risk," "Low Risk," and "Unclear or Ambiguous."
Although the overall assessment reveals that only four studies exhibited low risk of bias in their data, with the remainder falling under high risk or unclear categories, it is noteworthy that in terms of applicability, only two included studies were assessed as having a higher risk, while two were categorized as unclear or ambiguous. This suggests that while there may be a pervasive issue of data bias, the applicability of these studies is less frequently compromised, thereby indicating a need for more rigorous methodological scrutiny to enhance the reliability and utility of future prediction models.
Table 4 Prediction model Risk of Bias in Included Studies
Hubert S. Gabryś et al. [16]      + + + − + + + − +
Tsair-Fwu Lee et al. [17]         + + + + + + + + +
Tsair-Fwu Lee et al. [18]         + + + + + + + + +
Lisanne V. van Dijk et al. [19]   + + + − + + + − +
Stefano Ursino et al. [20]        + + + − + + + − +
Jamie A. Dean et al. [21]         + + ? ? + + ? ? ?
Jamie A. Dean et al. [22]         + + − − + + − − −
Ivo Beetz et al.                  + + ? + + + ? ? ?
Ivo Beetz et al.                  + + − + + + − − −
Khadija Sheikh et al. [27]        + + + + + + + + +
Benjamin S. Rosen et al., 2018    + + + + + + + + +
Kuo Men et al.                    + + + − + + + − +
High risk is denoted by "−"; low risk is denoted by "+"; unclear or ambiguous is denoted by "?"
Discussion
Results of the MA study
In our study, we conducted a comprehensive retrospective analysis to evaluate AI-based predictive models for forecasting post-radiation complications like xerostomia in head and neck cancer patients. Our data revealed significant effect sizes of 0.67 and 0.74 for early and late-stage xerostomia, respectively, with p-values below 0.05, highlighting the distinctiveness of AI-based models in this context.
Interestingly, our findings contrast with earlier research by our team (Lee et al. [17, 18]) and Van Dijk et al. [19]. We observed that incorporating image biomarkers, such as pre-processed CT data, did not necessarily enhance predictive accuracy compared to models solely based on traditional clinical factors and machine learning algorithms. This discrepancy may stem from variations in dataset composition and algorithmic parameters during model training and validation.
Further, research by Gabryś et al. [16] identified key features like dosimetric shapes and salivary gland volume through algorithmic comparisons, reiterating the significant divergence between AI-based and traditional clinical models in xerostomia prediction.
However, our study also revealed certain limitations and challenges. Firstly, the limited scope of databases for literature search led to incomplete data and insufficient literature, restricting our ability to perform comprehensive meta-analyses and forest plot illustrations. Secondly, some studies lacked complete data, such as predictive confidence intervals, which further impacted our analysis. As with any other site, CNS NTCP literature suffers the same limitations, and no AI has been successfully implemented as yet [28]. Overall, while our study made progress in predicting normal tissue complications after radiotherapy for head and neck cancer, further research and validation are needed. Our findings align with Chulmin Bang’s 2023 literature review, emphasizing that the clinical application of AI models still requires more in-depth exploration and validation [29].
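The limitation about missing confidence intervals matters because inverse-variance weighting needs a variance for every study. The sketch below shows one common way to approximate a standard error from a reported 95% CI, and how a study without an interval drops out of the weighting. It is an illustration under stated assumptions, not the procedure reported in this study: the source does not say how per-study variances were obtained, the interval is assumed to be a symmetric normal-approximation 95% CI, and the helper names `se_from_ci` and `inverse_variance_weight` are hypothetical. The AUC and CI values are those listed in Table 3.

```python
from typing import Optional, Tuple

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def se_from_ci(ci: Optional[Tuple[float, float]], z: float = Z_95) -> Optional[float]:
    """Approximate a standard error from a symmetric 95% CI; None if no CI was reported."""
    if ci is None:
        return None
    lower, upper = ci
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (upper - lower) / (2.0 * z)


def inverse_variance_weight(ci: Optional[Tuple[float, float]]) -> Optional[float]:
    """Weight = 1 / variance; undefined (None) when the interval is missing."""
    se = se_from_ci(ci)
    return None if se is None else 1.0 / se ** 2


# AUC (CI) entries as listed in Table 3; None marks an entry reported without an interval.
reported = {
    "van Dijk XER12m without IBM (AUC 0.75)": (0.69, 0.81),
    "van Dijk XER12m with IBM (AUC 0.77)": (0.71, 0.82),
    "Gabryś early-stage kNN (AUC 0.65)": None,
}
weights = {name: inverse_variance_weight(ci) for name, ci in reported.items()}
usable = [name for name, w in weights.items() if w is not None]
```

Here only the two van Dijk entries receive a weight. The Gabryś entry cannot be weighted without an interval or some other variance estimate, which is the practical effect of the limitation noted above.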
