
Radiography 31 (2025) 102878

Contents lists available at ScienceDirect

Radiography
journal homepage: www.elsevier.com/locate/radi

Systematic Review

A scoping review of automatic and semi-automatic MRI segmentation in human brain imaging
M. Chau a,*, H. Vu b, T. Debnath c, M.G. Rahman c
a Faculty of Science and Health, Charles Sturt University, Wagga Wagga, NSW 2678, Australia
b Allied Health and Human Performance Unit, University of South Australia, Adelaide, SA 5000, Australia
c School of Computing, Mathematics and Engineering, Charles Sturt University, NSW, Australia

Article info

Article history:
Received 26 October 2024
Received in revised form 15 January 2025
Accepted 16 January 2025
Available online 31 January 2025

Keywords:
Artificial intelligence
Brain MRI segmentation
Neuroimaging
Machine learning
Deep learning
Neurodegenerative diseases

Abstract

Introduction: AI-based segmentation techniques in brain MRI have revolutionized neuroimaging by enhancing the accuracy and efficiency of brain structure analysis. These techniques are pivotal for diagnosing neurodegenerative diseases, classifying psychiatric conditions, and predicting brain age. This scoping review synthesizes current methodologies, identifies key trends, and highlights gaps in the use of automatic and semi-automatic segmentation tools in brain MRI, particularly focusing on their application to healthy populations and clinical utility.
Methods: A scoping review was conducted following Arksey and O'Malley's framework and PRISMA-ScR guidelines. A comprehensive search was performed across six databases for studies published between 2014 and 2024. Studies focused on AI-based brain segmentation in healthy populations and in patients with neurodegenerative diseases or psychiatric disorders were included, while reviews, case series, and studies without human participants were excluded.
Results: Thirty-two studies were included, employing various segmentation tools and AI models such as convolutional neural networks for segmenting gray matter, white matter, cerebrospinal fluid, and pathological regions. FreeSurfer, which utilizes algorithmic techniques, was also commonly used for automated segmentation. AI models demonstrated high accuracy in brain age prediction, neurodegenerative disease classification, and psychiatric disorder subtyping. Longitudinal studies tracked disease progression, while multimodal approaches integrating MRI with fMRI and PET enhanced diagnostic precision.
Conclusion: AI-based segmentation techniques provide scalable solutions for neuroimaging, advancing personalized brain health strategies and supporting early diagnosis of neurological and psychiatric conditions. However, challenges related to standardization, generalizability, and ethical considerations remain.
Implications for Practice: The integration of AI tools and algorithm-based methods into clinical workflows can enhance diagnostic accuracy and efficiency, but greater focus on model interpretability, standardization of imaging protocols, and patient consent processes is needed to ensure responsible adoption in practice.
© 2025 The Author(s). Published by Elsevier Ltd on behalf of The College of Radiographers. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

* Corresponding author.
E-mail address: [email protected] (M. Chau).

https://doi.org/10.1016/j.radi.2025.01.013
1078-8174/© 2025 The Author(s). Published by Elsevier Ltd on behalf of The College of Radiographers. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Background

The advent of automatic segmentation and artificial intelligence (AI) techniques has revolutionised the field of medical imaging, particularly in the quantification of brain morphology. Segmentation of brain structures is a cornerstone of neuroimaging, essential for the diagnosis and monitoring of neurological diseases, planning surgical interventions, and understanding brain function and development.1 Historically, manual segmentation has been considered the gold standard for brain Magnetic Resonance Imaging (MRI) analysis; however, it is labour-intensive, time-consuming, and subject to inter- and intra-observer variability.2–5 Quantitative analysis of brain MRI is pivotal for characterising neurological diseases and disorders. For instance, tissue atrophy is a well-established biomarker for conditions such as Alzheimer's disease (AD), epilepsy, schizophrenia, and multiple sclerosis (MS).6–9 Accurate segmentation of brain structures from MRI scans is crucial for these analyses, yet manual segmentation is impractical for large-scale studies. Despite these limitations, MRI remains the preferred modality for structural brain analysis due to its superior contrast resolution for soft tissues and lack of ionising radiation.

In recent years, the application of deep learning, particularly convolutional neural networks (CNNs), has demonstrated superior performance in brain MRI segmentation tasks compared to traditional machine learning (ML) algorithms.10 CNNs have the advantage of self-learning features from large datasets, thus eliminating the need for manual feature extraction and engineering.11 This has led to significant advancements in the automatic segmentation of typical brain structures, such as white matter, grey matter, and cerebrospinal fluid, and of pathological regions, including tumours, lesions, and abnormalities associated with conditions such as multiple sclerosis and stroke. This scoping review aims to synthesize existing methodologies, identify key trends, and highlight research gaps in automatic and semi-automatic segmentation tools for brain MRI, particularly focusing on their application to healthy populations. This review focuses on segmentation methodologies in studies involving healthy controls to understand their foundational applications. Additionally, it explores how these tools are applied to disease tracking and progression by leveraging healthy controls as comparators or establishing normative datasets.

Methods

The research question guiding this scoping review was: “What are the current methodologies, key trends, and research gaps in the application of automatic and semi-automatic segmentation tools for brain MRI, with a focus on healthy populations?” This question was addressed using a systematic approach following Arksey and O'Malley's framework and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR).12

Search strategy

A systematic literature search was conducted in consultation with an expert librarian to ensure a comprehensive and robust search strategy. A scoping review expert (author: MC) performed the search across the databases CINAHLPlus with EBSCOHost, Scopus, PubMed, Computers & Applied Sciences Complete via EBSCOHost, Web of Science, and ProQuest (Computer science database, Engineering database, and Health and Medical collection). The literature was searched from 8 May 2014 to 8 May 2024. The date limitation was to capture contemporary literature from the last ten years. The searches used a variety of keywords and subject headings, such as “artificial intelligence techniques,” “machine learning,” “deep learning,” “neural networks,” “algorithm,” “automatic segmentation,” “brain MRI,” “magnetic resonance imaging,” “neuroimaging,” “quantification,” “measurement,” “human brain,” “anatomy,” “structure,” “region,” “architecture,” “healthy,” “normal,” “population,” “subjects,” and “adults”. Table 1 outlines the search strategy for each database. The goal was to identify all current literature reporting on AI techniques in brain MRI segmentation and quantification, using inclusive search terms to capture all possible alternative and interchangeable terminologies. The PCC (Population/Concept/Context) framework guided the search strategy, as detailed below:

• Population: Healthy human brain.
• Concept: Segmentation on MRI scans.
• Context: Application of automatic and semi-automatic segmentation tools.

Table 1
Search strategy.

Database: Web of Science - Core Collection
Search query: (“artificial intelligence techniques” OR “AI techniques” OR “machine learning” OR “deep learning” OR “neural networks” OR algorithm OR “automatic segmentation”) AND (“brain MRI” OR “magnetic resonance imaging” OR neuroimag*) AND (quantif* OR measure*) AND “human brain” AND (anatom* OR structure OR region* OR architecture) AND (healthy OR normal OR population* OR subject* OR adult*)
Limiters: Publication Date: 2014–2024
Number of hits: 249

Database: ProQuest databases (Computer Science, Engineering, Health & Medical Collection)
Search query: (“artificial intelligence techniques” OR “AI techniques” OR “machine learning” OR “deep learning” OR “neural networks” OR algorithm OR “automatic segmentation”) AND (“brain MRI” OR “magnetic resonance imaging” OR neuroimag*) AND (quantif* OR measure*) AND “human brain” AND (anatom* OR structure OR region* OR architecture) AND (healthy OR normal OR population* OR subject* OR adult*)
Limiters: Publication Date: 2014-01-01 to 2024-12-31
Number of hits: 113

Database: CINAHLPlus with EBSCOHost
Search query: (“artificial intelligence techniques” OR “AI techniques” OR “machine learning” OR “deep learning” OR “neural networks” OR algorithm OR “automatic segmentation”) AND (“brain MRI” OR “magnetic resonance imaging” OR neuroimag*) AND (quantif* OR measure*) AND “human brain” AND (anatom* OR structure OR region* OR architecture) AND (healthy OR normal OR population* OR subject* OR adult*)
Limiters: Publication Date: 2014-01-01 to 2024-12-31
Number of hits: 108

Database: Scopus
Search query: (“artificial intelligence techniques” OR “AI techniques” OR “machine learning” OR “deep learning” OR “neural networks” OR algorithm OR “automatic segmentation”) AND (“brain MRI” OR “magnetic resonance imaging” OR neuroimag*) AND (quantif* OR measure*) AND “human brain” AND (anatom* OR structure OR region* OR architecture) AND (healthy OR normal OR population* OR subject* OR adult*)
Limiters: Publication Date: 2014–2024
Number of hits: 69

Database: PubMed
Search query: ((((“artificial intelligence techniques” OR “AI techniques” OR “machine learning” OR “deep learning” OR “neural networks” OR algorithm OR “automatic segmentation”) AND (“brain MRI” OR “magnetic resonance imaging” OR neuroimag*)) AND (quantif* OR measure*)) AND (“human brain” (anatom* OR structure OR region* OR architecture))) AND ((healthy OR normal) (population* OR subject* OR adult*))
Limiters: Publication Date: 2014–2024, English, Humans, Adult 19+ years
Number of hits: 37

Database: Computers & Applied Sciences Complete via EBSCOHost
Search query: (“artificial intelligence techniques” OR “AI techniques” OR “machine learning” OR “deep learning” OR “neural networks” OR algorithm OR “automatic segmentation”) AND (“brain MRI” OR “magnetic resonance imaging” OR neuroimag*) AND (quantif* OR measure*) AND “human brain” AND (anatom* OR structure OR region* OR architecture) AND (healthy OR normal OR population* OR subject* OR adult*)
Limiters: Publication Date: 2014-01-01 to 2023-12-31
Number of hits: 11
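The query shared by most databases in Table 1 has a simple shape: several concept groups joined by AND, each group being a list of synonyms joined by OR. The following is a minimal sketch of how such a string could be assembled. The helper names `or_group` and `build_query` are hypothetical, straight quotes stand in for the typographic quotes printed in the table, and database-specific syntax such as limiters or field tags is not modelled.

```python
def or_group(terms):
    """Join synonyms with OR, quoting multi-word phrases.

    A group with a single term is returned without parentheses.
    """
    rendered = [f'"{t}"' if " " in t else t for t in terms]
    if len(rendered) == 1:
        return rendered[0]
    return "(" + " OR ".join(rendered) + ")"


def build_query(groups):
    """Join the concept groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


# Concept groups transcribed from the Table 1 search query.
CONCEPT_GROUPS = [
    ["artificial intelligence techniques", "AI techniques", "machine learning",
     "deep learning", "neural networks", "algorithm", "automatic segmentation"],
    ["brain MRI", "magnetic resonance imaging", "neuroimag*"],
    ["quantif*", "measure*"],
    ["human brain"],
    ["anatom*", "structure", "region*", "architecture"],
    ["healthy", "normal", "population*", "subject*", "adult*"],
]

query = build_query(CONCEPT_GROUPS)
print(query)
```

Keeping the groups as data makes it easy to see which concept each synonym belongs to and to adapt the string for a database with different syntax.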


The search included all peer-reviewed primary research studies, of qualitative or quantitative design, that were published in English and used human MRI datasets. Following the addition of studies identified through snowballing and reference list searching conducted by the research team, a single reviewer (MC) removed duplicate studies with the aid of Covidence (Covidence, Melbourne, Australia), and titles and abstracts were screened according to the inclusion and exclusion criteria below.

Eligibility

Inclusion criteria were peer-reviewed papers using human participants (with studies permitted to include participants under 18 as long as there were participants aged 18 or older), data from MRI scans, and references to using AI segmentation tools to measure or quantify brain regions in healthy populations, published in English between 2014 and 2024. The inclusion of studies from the past 10 years reflects the rapid technological advancements and adoption of AI in neuroimaging in the past decade. Notably, while many studies involve both a control group and a disease group (e.g., schizophrenia), studies were not excluded based solely on the presence of a disease group. This approach allowed us to examine AI-driven algorithm performance in both standardized (healthy control) datasets and heterogeneous datasets involving disease groups, providing a balanced and comprehensive evaluation of model capabilities. The primary outcome of interest for this review is contributing to the broader understanding and advancement of automated brain MRI segmentation techniques, ultimately enhancing neuroimaging analysis and clinical decision-making. Grey literature, narrative/systematic/scoping reviews or meta-analyses, conference abstracts, case–control studies, letters to the editor, and case series were excluded.

Study selection

Search results were collated, uploaded, and de-duplicated using Covidence. Two reviewers independently assessed citations for eligibility, beginning with title and abstract screening. Two reviewers then independently screened the full texts obtained. Where discrepancies arose, reviewers discussed the decision and sought the opinion of a third reviewer when a final decision could not be reached. Full-text studies that did not meet the inclusion criteria were excluded with reasons, and the study selection process was described using the PRISMA flow diagram (Fig. 1).13

Figure 1. PRISMA flowchart showing the screening process.

Data extraction and analysis

Data were tabulated in a study characteristics table, which included the first author, year, country of origin, study design,
population, MRI protocol, AI segmentation tool (including pre-processing methods), ML approach, validation approach, brain segmentation, and relevant outcomes. Data extraction was conducted by three review authors and cross-checked by a fourth reviewer for quality assurance. The standardised data extraction spreadsheet was piloted on five included articles, allowing modifications before data were extracted from all included studies. Consistent with PRISMA guidelines for scoping reviews, the risk of bias was not assessed.13,14

Results

Search results

Fig. 1 shows the PRISMA flowchart for the search. A total of 617 studies were identified, of which 126 were duplicates. After the abstract screening, 445 studies were excluded, leaving 46 full texts to assess for eligibility. Fifteen were excluded due to wrong focus area, patient population, wrong outcomes, or lack of English. Thirty-two studies were included in the scoping review after screening (Table 2).

Study characteristics

This review includes 32 studies that were conducted across 13 countries. Eight studies were run in China.15–22 Six studies were carried out in Germany.23–31 Five studies were performed in Canada.27–29,31 Three studies were conducted in Italy.20,23,32 Three studies were done in the UK.20,32,33 The remaining studies were administered in Switzerland,34,35 The Netherlands,20,32 Spain,20,36 Sweden,25,37 France,25 Israel,38 Ireland,20,39 Norway,40 Denmark,37 Belgium,41 and Romania.41

The included studies involved participants ranging in age from children to elderly adults, with sample sizes ranging from 60 to over 10,000 participants. Fourteen studies focused exclusively on healthy participants from diverse age groups.15,17,22–24,27–29,32,33,37,40–42 While the inclusion criteria centered on healthy controls, many studies utilized these populations as comparators or baseline references for longitudinal tracking of disease progression. This highlights the applicability of these segmentation techniques across diverse study designs. Therefore, twenty-four studies investigated participants with neurodegenerative diseases such as AD, Frontotemporal Dementia (FTD), Parkinson's disease (PD), schizophrenia, MS, and epilepsy, all including healthy control groups.16,18,20,21,25,26,30,31,34–36,39,43–47

Imaging protocols

Across the 32 included studies, T1-weighted imaging was the dominant MRI sequence, used in 28 studies. These studies employed T1-weighted MRI across both healthy populations and clinical cohorts with neurodegenerative and psychiatric conditions. The studies employing T1-weighted MRI include those focused on healthy populations,15,17,22–24,27–29,32,33,37,39,41,42 and on neurodegenerative or psychiatric conditions such as AD, FTD, schizophrenia, MS, epilepsy and PD.16,18,20,25,35,38,39,43,46 In addition to T1-weighted MRI, T2-weighted and fluid-attenuated inversion recovery (FLAIR) sequences were used in 7 studies, particularly those investigating white matter hyperintensities (WMHs) and other abnormalities associated with neurodegenerative diseases. Both T1-weighted and T2-weighted sequences provide high spatial resolution and excellent contrast between grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF), making them the preferred sequences for structural brain analysis and segmentation. The high prevalence of T1-weighted imaging in analysis could be attributed to the availability of T1-weighted data and how AI segmentation tools are modelled. FLAIR imaging is beneficial in detecting small lesions in white matter, which are often linked to conditions such as AD and MS.25,32 The key tissue classes in brain segmentation are GM, WM, and CSF. Although most studies directly analysed structural T1- and T2-weighted image sets, other specialised brain MRI techniques such as functional MRI and diffusion MRI were also used. Functional MRI (fMRI) detects differences in the transverse decay rates between oxygenated and deoxygenated haemoglobin to identify brain regions active during cognitive tasks. Diffusion tensor imaging (DTI) is valuable for assessing the structure and integrity of white matter fibre tracts. Lei et al. (2020) combined T1-weighted MRI with resting-state fMRI (rs-fMRI) to explore brain function and connectivity in patients with schizophrenia. This multimodal approach enables the integration of structural and functional data to understand brain disorders better.

Longitudinal imaging protocols were a key feature in nine studies, enabling researchers to track brain changes over time (16, 29, 31–35, 42, 44). These studies typically collected multiple MRI scans from the same participants over months or years. Pérez-Millan et al. (2023) used longitudinal T1-weighted imaging over two years to assess cortical thickness and subcortical volume changes in AD and FTD patients. Nogovitsyn et al. (2019) employed a longitudinal design to evaluate hippocampal segmentation over several time points (baseline, 2 weeks, 8 weeks, and 12 months), demonstrating the stability and accuracy of their deep learning-based segmentation tool across time.

Additionally, multi-centre and multi-dataset protocols were standard, especially in studies using large public databases to enhance the generalizability of findings. Popuri et al. (2020) utilised data from four public datasets (ADNI, AIBL, OASIS, and MIRIAD) to train and validate their machine-learning model for dementia score prediction. More et al. (2023) used T1-weighted MRI from multiple datasets, including CAN, IXI, eNKI, and 1000BRAINS, to ensure their brain age prediction workflows were robust across diverse populations and imaging conditions. The MRI scanners used across studies varied in manufacturer and field strength, with most studies employing 3T MRI scanners for high-resolution imaging.

AI segmentation tools: applications and algorithmic methods

Across the 32 studies included in this review, various AI-based segmentation tools and methods were employed to automate and improve the segmentation of brain structures in MRI data. These tools varied in complexity, application, and the type of neural network or ML algorithm used. The most commonly used AI-driven tools for segmentation were ML models (11 studies), U-Net (a type of convolutional network architecture) and other CNN-based models (8 studies), Statistical Parametric Mapping (SPM) and voxel-based morphometry (VBM) (6 studies), hybrid approaches (4 studies), and DeepSCAN (novel approach: 1 study). FreeSurfer, a widely used tool for brain MRI segmentation, was utilized in 9 studies. Although it is not classified as a standalone AI model, FreeSurfer incorporates AI-based methodologies embedded in its processing framework to achieve accurate segmentation and surface reconstruction. Fig. 2 outlines AI segmentation tools used across 32 studies.

Machine learning models

In addition to CNNs, traditional ML models such as support vector machines (SVM) and Random Forests were used in 11 studies to classify brain structures or predict outcomes based on segmented data.21,36,42–44 SVM was commonly paired with VBM for voxel-wise classification of brain regions in disease versus control cohorts.18,20
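When such models predict a continuous outcome, as in the brain age studies summarised in Table 2, performance is typically reported as the mean absolute error (MAE) together with Pearson's correlation between actual and predicted values. The sketch below shows both calculations using only the Python standard library. The ages are invented purely for illustration and do not come from any included study.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical chronological and model-predicted ages (years).
chronological_age = [20.0, 30.0, 40.0, 50.0, 60.0]
predicted_age = [22.0, 29.0, 43.0, 48.0, 61.0]

mae = mean_absolute_error(chronological_age, predicted_age)
r = pearson_r(chronological_age, predicted_age)
print(f"MAE = {mae:.2f} years, Pearson r = {r:.3f}")
```

The two measures are complementary: MAE expresses the typical error in years, while the correlation indicates how well predictions track the ordering of ages, so a model can score well on one and poorly on the other.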
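Agreement between an automated segmentation and a manual reference is often summarised with the Dice coefficient, one of the validation measures listed in Table 2. For two binary masks A and B it is defined as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal sketch on flattened binary masks. The toy masks are hypothetical, and returning 1.0 when both masks are empty is a convention assumed here rather than taken from the reviewed studies.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two equal-length binary masks (flattened voxels)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return 2 * intersection / total


# Hypothetical 8-voxel masks: 1 marks voxels labelled as the structure.
manual_mask = [0, 1, 1, 1, 1, 0, 0, 0]
automated_mask = [0, 0, 1, 1, 1, 1, 0, 0]

dice = dice_coefficient(manual_mask, automated_mask)
print(f"Dice = {dice:.2f}")
```

Real masks are three-dimensional arrays, but flattening them to one dimension does not change the result because Dice depends only on voxel-wise agreement.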
Table 2
Study characteristics.

Columns: Country | Study design | Population (n, age) | MRI protocol | AI segmentation tool | ML approach | Validation approach | Brain segmentation | Relevant outcomes

Akudjedu et al. (2018)39
Country: Ireland
Study design: Cross-sectional
Population (n, age): n = 281 including psychiatric disorders (n = 177) and healthy participants (n = 104) aged 16–80 years (189 males, mean age of 36 years)
MRI protocol: T1-weighted images acquired on a 3T Siemens Trio scanner. Acquisition parameters: TR = 2300 ms, TE = 2.98 ms, voxel size = 1 mm isotropic.
AI segmentation tool: Fully Automated: FSL-FIRST, volBrain. Semi-Automated: FreeSurfer. Manual: ITK-SNAP. Stereology: Measure®
ML approach: Not applicable (segmentation techniques assessed were rule-based and not ML).
Validation approach: Dice coefficient; ICC; Bland–Altman plots (bias assessment)
Brain segmentation: Subcortical structures: Hippocampus, caudate nucleus.
Relevant outcomes: FSL-FIRST and volBrain: High correlation with manual methods for caudate segmentation (Dice: 81–85 %). Hippocampal segmentation: Lower Dice coefficients for automated tools (57–69 %). FreeSurfer overestimated hippocampal volumes (bias: 74–75 %). Stereology provided reliable, less labor-intensive hippocampal segmentation.

Anderson (2019)42
Country: USA
Study design: Cross-sectional
Population (n, age): n = 1300 (1014 males and 286 females) aged 12–66 years (mean = 31.3, SD = 10.8)
MRI protocol: Siemens 1.5T Avanto, T1-weighted structural MRI with multi-echo MPRAGE pulse sequence. Repetition time = 2530 ms, Echo times = 1.64, 3.50, 5.36, and 7.22 ms, Inversion time = 1100 ms, Flip angle = 7°, Slice thickness = 1.3 mm, Matrix size = 256 × 256, 128 sagittal slices, In-plane resolution = 1.0 × 1.0 mm²
AI segmentation tool: Source-based morphometry (SBM) using independent component analysis (ICA)
ML approach: Support Vector Machine (SVM) with radial basis function (RBF) kernel. Elastic net penalised logistic regression
Validation approach: Randomized division into training (n = 930) and testing (n = 370) samples. Five repetitions of tenfold cross-validation
Brain segmentation: Significant grey matter volume differences exist between males and females in orbitofrontal and frontopolar regions (larger in females) and anterior/medial temporal regions (more prominent in males). SBM identified 57 independent components used in the final analysis
Relevant outcomes: Models classified sex with over 93 % accuracy using brain grey matter components. Differences in grey matter volume can predict sex with high individual specificity, indicating distinct patterns of sexual dimorphism in brain structure.

Beck (2021)40
Country: Norway
Study design: Cross-sectional and longitudinal
Population (n, age): n = 573 (246 male, 327 female), aged 18–94 (mean age of 47.61). A subset of 129 participants (mean age of 56.60) had longitudinal data with an average time between scans of 25.2 months.
MRI protocol: Imaging was performed on a General Electric Discovery MR750 3T scanner with a 32-channel head coil.
AI segmentation tool: The study used the FMRIB Software Library (FSL) for diffusion tensor imaging (DTI) analysis, including tract-based spatial statistics (TBSS). FSL tools such as BET were used for brain extraction, and FNIRT was used for the nonlinear registration of images. Voxelwise statistical analysis of fractional anisotropy (FA) data was performed using TBSS.
ML approach: For brain age prediction, the study utilised a mixed approach combining diffusion models such as DTI, Diffusion Kurtosis Imaging (DKI), Neurite Orientation Dispersion and Density Imaging (NODDI), and Restriction Spectrum Imaging (RSI). Various features from these models were incorporated into age prediction models and evaluated across different metrics.
Validation approach: The study employed a rigorous quality control process, which included temporal signal-to-noise ratio (TSNR) analysis to exclude low-quality data. Outliers were manually inspected and removed. The final analysis was performed on 702 scans. The reproducibility of advanced diffusion metrics was assessed using different acquisition schemes in a subset of healthy participants.
Brain segmentation: Brain white matter segmentation was conducted using TBSS and white matter atlases such as the JHU atlas to extract regions of interest (ROIs) for further analysis.
Relevant outcomes: Advanced diffusion models such as NODDI and RSI showed higher age sensitivity than conventional DTI metrics. Furthermore, it suggested that multi-shell diffusion acquisition enhances the characterisation of white matter microstructure. Age trajectories were compared across different models, revealing distinct patterns in white matter that change with age.


Bellantuono (2021)23
Country: Italy, USA
Study design: Cross-sectional
Population (n, age): n = 1016, aged 7–64 years (mean = 17 ± 8 years)
MRI protocol: Various 3 Tesla scanners from multiple manufacturers. T1-weighted MRI. Scans normalised to MNI152 reference space. Standard pre-processing steps included brain extraction and linear affine registration
AI segmentation tool: The method employed Pearson's correlation to construct a complex network from the brain MRI scans and used nodal centrality measures for feature extraction
ML approach: Deep Neural Network (DNN) with four hidden layers (200, 100, 50, and 20 neurons, respectively). Other algorithms tested: Random Forests, Lasso, Ridge, Elastic Net, SVM, Relevance Vector Machine
Validation approach: 10-fold cross-validation repeated 100 times. Independent test set of 262 subjects with different scanners and protocols
Brain segmentation: Nodal strength is used as a centrality measure. Sub-lobar extra-nuclear white matter and thalamus were identified as key regions in brain aging. The model achieved a Mean Absolute Error (MAE) of 2.19 years in training and 2.5 years in testing.
Relevant outcomes: The deep neural network outperformed other ML methods in predicting brain age. The approach highlighted specific anatomical regions critical in aging, such as the thalamus and sub-lobar regions. Mean Absolute Error (MAE) of 2.19 years and a Pearson's correlation of 0.89

Chand (2020)18
Country: USA, Germany, China
Study design: Cross-sectional
Population (n, age): n = 671 including schizophrenia (n = 307) and healthy control (n = 364), aged <45 years
MRI protocol: T1-weighted MRI images were used across sites with varied MRI acquisition protocols (e.g., 1.5T and 3T scanners from different locations)
AI segmentation tool: HYDRA (Heterogeneity Through Discriminative Analysis): A semi-supervised ML tool that identifies neuroanatomical subtypes by distinguishing disease effects from normal variations. MIDAS (Multivariate Discriminative Anatomical Statistical Mapping) is used to conduct voxel-wise analyses of the neuroanatomical differences between the schizophrenia subtypes and healthy controls
ML approach: The study applied HYDRA, which uses linear maximum-margin classifiers for classification and clustering to separate schizophrenia subtypes from controls. This is distinct from conventional clustering methods such as K-means.
Validation approach: Split-sample validation and leave-one-site-out validation were used to test the reproducibility of the subtypes across multiple sites. Permutation tests were also conducted to assess the statistical significance of the clustering solutions.
Brain segmentation: Multi-atlas segmentation for grey matter, white matter, and CSF was used. Voxel-wise volumetric patterns were also generated to explore neuroanatomical differences between subtypes and controls.
Relevant outcomes: Two distinct neuroanatomical subtypes of schizophrenia were identified. Subtype 1: Showed widespread grey matter volume reduction in regions such as the thalamus, nucleus accumbens, and insular cortex, with white matter deficits. Subtype 2: Exhibited increased basal ganglia and internal capsule volume but otherwise normal brain anatomy. These subtypes did not differ in clinical symptoms. Still, the subtype was associated with worse premorbid functioning (lower educational attainment) and a negative correlation between grey matter volume and illness duration.

Dafflon (2020)33
Country: UK, Brazil
Study design: Cross-sectional
Population (n, age): n = 10,307 healthy participants, aged 18–89 years, mean age of 59.40
MRI protocol: T1-weighted
AI segmentation tool: FreeSurfer
ML approach: TPOT (Tree-based Pipeline Optimization Tool) is an autoML tool that uses a genetic programming algorithm to optimise ML pipelines for brain age prediction automatically. RVR, a state-of-the-art method in brain age prediction, was used as a comparison model.
Validation approach: The study used 10-fold cross-validation for model evaluation. Performance was assessed based on mean absolute error (MAE) and Pearson's correlation between actual and predicted ages.
Brain segmentation: Cortical thickness and volume information were extracted from 116 regions of interest using the Desikan-Killiany and ASEG FreeSurfer atlases.
Relevant outcomes: TPOT achieved a lower mean absolute error (MAE) of 4.612 years compared to RVR's MAE of 5.474 years. The TPOT approach outperformed RVR, suggesting that automated ML pipelines can efficiently identify optimal models for brain age prediction.

Doerfel (2024)37
Country: Sweden, Denmark
Study design: Cross-sectional
Population (n, age): n = 209, 18–85 years, mean age of 38, SD = 18.
MRI protocol: T1-weighted MRI scans assessed grey matter (GM) volume in 14 cortical and subcortical regions of interest.
AI segmentation tool: FreeSurfer extracted volumetric data for brain regions based on the Desikan-Killiany and Aseg atlases.
ML approach: Multiple ML models were trained to predict brain age based on 5-HT2AR PET binding, GM volume, or combined multimodal data: (1) Bayesian Ridge Regression (BRidge), (2) Relevance Vector Regression (RVR) implemented using ARDRegression, (3) Gaussian Process Regression with the linear kernel (linGPR), (4) Gaussian Process Regression with radial basis function kernel (rbfGPR), and (5) linear support vector regression (linSVR).
Validation approach: Five-fold cross-validation (CV). Model performance was evaluated using mean absolute error (MAE) and Pearson's correlation coefficient between predicted and chronological ages.
Brain segmentation: GM volume estimates were derived for specific brain regions, including the frontal, temporal, occipital, and parietal lobes and subcortical areas, such as the hippocampus, thalamus, and caudate.
Relevant outcomes: Brain age (5-HT2AR binding: MAE = 6.63 years, std = 0.74 years; GM: MAE = 6.95 years, std = 0.83 years; 5-HT2AR + GM: MAE = 5.54 years, std = 0.68)

Dwyer 2018 [43]
United States, Cross-sectional n = 145 including Structural MRI using a 3T VBM8 Toolbox (Voxel- Fuzzy C-means Nested cross-validation Segmentation into grey Neuroanatomical
Germany, schizophrenia (n = 71), Siemens TIM scanner Based Morphometry) clustering (FCM) and and external validation matter, white matter, subtyping improved
Australia healthy (n = 74). T1-weighted MRI data with and DARTEL algorithm. SVM The model was and cerebrospinal fluid classification accuracy
External Validation specific parameters: Preprocessing included FCM clustering was externally validated on was performed using for schizophrenia:
Sample: Patients with TR = 2530 ms, TE = multiple segmentation into grey, used to identify an independent dataset the VBM8 toolbox, with Subtype 1: Increased
first-episode psychosis echoes (1.64–9.08 ms), white matter, and CSF schizophrenia from Germany, testing analysis of cortical and cortical and subcortical
and chronic TI = 900 ms, flip angle = 7°, normalisation to the subgroups. SVM models classification accuracy subcortical regions. volume reductions
schizophrenia (n = slice thickness = 176 mm. Montreal Neurological were employed for with McNemar's tests associated with longer
158), Age- and sex- Institute (MNI) supervised and permutation-based illness duration and
matched controls template using the classification with P-values. more negative
DARTEL algorithm. repeated, nested cross- symptoms.
validation. Subtype 2:
Predominantly cortical
reductions with shorter
illness duration and
fewer negative
symptoms.
Classification accuracy
improved from 68.3 %
(whole group) to 73.0 %
for subtype 1 and 78.8 %

for subtype 2.
Finkelstein 2024 [38]
Israel, Prospective n = 216 T metabolic T1-weighted 3D pulse A CNN ensemble The ensemble CNN The study used MAE Brain tissue The model achieved a
Germany, clinical trial syndrome (abdominal sequence, TR = 2500 ms, framework was used approach was trained and Pearson's segmentation and mean absolute error
United States (sub-study of obesity or TE = 30 ms, 1 × 1 × 1 mm³ for BMI prediction from to predict BMI from correlation coefficient preprocessing involved (MAE) of 2.06 kg/m² on
DIRECT-PLUS) dyslipidemia), aged resolution brain MRIs. The MRI data. The network to evaluate model intensity normalisation, the test set. Predicted
>30 years Three framework involved 10 used mean squared performance. The brain extraction, and BMI loss was
intervention groups: CNN regressors. error (MSE) as the loss validation was voxel resampling. significantly correlated
Table 2 (continued)
Country Study design Population (n, age) MRI protocol AI segmentation tool ML approach Validation approach Brain segmentation Relevant outcomes

Healthy Dietary function, and the final performed using public Segmentation of grey with observed weight
Guidelines (HDG), BMI prediction was datasets and the matter, white matter, loss (Pearson
Mediterranean diet derived through linear DIRECT-PLUS trial data. and CSF was conducted correlation = 0.29).
(MED), and Green-MED regression over the using automated Brain regions
diet. outputs of the CNN methods like RobustFov contributing to BMI
ensemble. and Robex. predictions were
identified, including the
orbitofrontal cortex,
cerebellum, and right
insula.
Ge 2019 [27]
Canada Cross-sectional n = 396 including two Both datasets used high- Segmentation into grey GMV covariance was A two-fold cross- Both left and right The study identified
datasets of 198 healthy resolution 3D magnetization- matter, white matter, calculated using voxel- validation strategy was hippocampi were replicable subregions
each from Beijing and prepared rapid gradient echo and cerebrospinal fluid to-voxel Pearson used to determine divided into 3–4 with a mean Dice
Cambridge cohorts, (MPRAGE) sequences to acquire was performed using correlations (GMVCorr) clustering consistency. subregions using coefficient of 0.76
aged 18–30 years. T1-weighted images. VBM8 and Statistical and Masked The Dice coefficient was GMVCorr and MICA across methods and
Parametric Mapping Independent calculated to quantify methods. The datasets. These
(SPM12). Component Analysis reproducibility hippocampus was subregions
(MICA) for parcellation between different parcellated into corresponded with
of the hippocampus clustering methods and anterior, posterior- known
into subregions. datasets. medial, and posterior- cytoarchitectonic areas
lateral subregions. of the hippocampus
and structural
covariance patterns
aligned with functional
connectivity patterns
from resting-state fMRI.
Ge 2021 [28]
Canada, USA Cross-sectional n = 1612 participants, Various 3T scanners. Source-based SBM was applied to The models were Structural features of Sex was predicted with
aged 17–37 years. T1-weighted structural MRI. morphometry (SBM) identify cortical validated using the HCP cortical volume, an accuracy of 81 %
Standard preprocessing using independent morphological replication subset and thickness, folding –85 % based on CMN
included brain extraction, component analysis networks (CMNs) based the SLIM dataset. (gyrification), and features.
segmentation, and (ICA) on CV, CT, GI, FD, and Receiver operating fractal dimension were Male-biased CMNs
normalisation. SulcD. A linear characteristic (ROC) analysed. Independent were associated with
discriminant analysis curves and accuracy Component Analysis externalising
(LDA) was then used to metrics were used to (ICA) was used to behaviours, providing
classify sexes based on assess performance. decompose these insights into sex
CMN loading features into distinct differences in brain
coefficients. CMNs. structure and their
potential link to
behaviour.
Hepp et al. 2021 [24]
Germany Cross-sectional n = 10,691 (5206 High-resolution T1-weighted A modified 3D ResNet- The study used a Mean absolute error For regional brain age The model accurately
female, 5485 male), brain MRI images were based CNN architecture heteroscedastic (MAE) and predicted estimation, the model predicted brain age
aged 20–72 years acquired using 3T Siemens was used to predict Gaussian noise model uncertainty were used was trained on patches with a mean absolute
Magnetom Skyra scanners with brain age and model to capture aleatoric to assess the model's of 64 × 64 × 64 voxels error of 3.21 years.
an MP-RAGE sequence (1 × 1 × 1 uncertainty for global uncertainty in brain age performance. The sampled from full- Uncertainty was higher
mm³ voxel size). and regional brain age predictions. Grad-CAM global brain age resolution images. in younger individuals

estimation. was applied to provide estimation model Segmentation focused and peripheral brain
visual explanations of achieved an MAE of on relevant brain regions. Visual
the deep learning 3.21 ± 2.45 years. regions such as the explanations
model. ventricles, basal highlighted the lateral
ganglia, and insular ventricles, basal
lobe. ganglia, and insular
lobe as critical areas for
age prediction.
Kuchcinski 2023 [25]
Sweden, France Cross-sectional n = 94 including 70 3D-T1 MPRAGE and FLAIR SPM12 The model used 3D 5-fold cross-validation. Brain volumes of grey SLE patients exhibited
female diagnosed with sequences CNN architecture to BrainAGE was matter, white matter, increased BrainAGE
systemic lupus predict brain age from calculated as a z-score and cerebrospinal fluid scores (mean +3.6
erythematosus (SLE) T1-weighted MRI scans, and compared with (CSF) were calculated years older than
(mean age: 35.9 years), calculating BrainAGE as biomarkers of using VolBrain controls, p = 0.02),
24 female age-matched the difference between neurodegeneration, software. WMH were which was associated
healthy controls (mean predicted brain age and including segmented using the with higher plasma NfL
age: 37 years). chronological age. neurofilament light Lesion Segmentation concentrations and
(NfL) concentrations Toolbox (LST). poorer cognitive
and cognitive performance,
performance metrics. particularly in
psychomotor speed and
reaction time.
Lei 2020 [20]
China, UK, The Cross-sectional n = 747 including Various 3T scanners across Structural MRI data SVM with linear kernel. 10-fold cross-validation Grey matter and white The study reported high
Netherlands, study schizophrenia (n multiple sites. were processed using Single-measure on five independent matter volumes were classification accuracy:
Ireland, = 295), healthy (n = T1-weighted structural MRI and VBM8 for grey and classification (using datasets. segmented 90.83 % balanced
Spain, Italy 452), aged 24–40 years resting-state fMRI (rs-fMRI). white matter structural or functional Nested cross-validation accuracy when
(mean ages varied by Functional connectivity, segmentation, and rs- metrics) for hyperparameter combining structural
dataset) amplitude of low-frequency fMRI data was Multi-measure tuning and functional
fluctuation (ALFF), and regional processed for classification combines measures.
homogeneity (ReHo) metrics functional connectivity all structural and
extracted metrics. functional measures.
Liang 2021 [16]
China Cross-sectional n = 345 from 2 datasets T1 and T2-FLAIR The study proposed a AU-Net utilises a two- The model was WMHs were AU-Net achieved a DSC
(60 MICCAI, 345 ADNI novel deep learning step U-Net architecture validated using metrics segmented using the of 0.86 and H95 of
database). framework called AU- where anatomical- such as the Dice proposed AU-Net 3.06 mm, performing
345 from ADNI Net (Anatomical based features are first similarity coefficient architecture, which also comparably to the
database include Knowledge-based U- used for rough (DSC), modified mapped WMHs to state-of-the-art
111 MCI (64 Males, 47 Net), which integrates segmentation and then Hausdorff distance predefined brain method (DSC 0.87, H95
Females), 80 AD (41 handcrafted refined by (H95), and precision- regions (e.g., corpus 3.62 mm). The study
Males, 39 Females) and anatomical-based incorporating spatial recall area under the callosum, frontal and demonstrated that
94 cognitive normal spatial features from a knowledge in the curve (AUC). The results parietal subcortical WMHs in strategic
(CN) subjects (43 Males, brain atlas with a U-Net second U-Net step. were compared against white matter). brain regions, such as
51 Females) architecture to improve human observers and the frontal and parietal
WMH segmentation. the state-of-the-art white matter, were
method from the significantly correlated
MICCAI 2017 WMH with cognitive
Segmentation impairments (e.g.,
Challenge. measured by MMSE,
FAQ, and ADAS).
More et al. (2023) [26]
Germany Cross-sectional n = 2953, 18–88 years T1w MRI scans across different The images were pre- 128 workflows A 5-fold cross- GM, WM and CSF. Brain age (MAE
and (CAN, n = 651 datasets, including those from processed using CAT for constituting 16 feature validation was used Voxel-wise GMV was between 4.73 and 8.38
longitudinal IXI, n = 562 eNKI, the Alzheimer's Disease normalisation and representations and within and across used after smoothing years). The study found
evaluation. The n = 597 Neuroimaging Initiative (ADNI) segmentation. eight ML algorithms: datasets. Test-retest and resampling, along a significant correlation
study 1000BRAINS, n = 1143) Segmentation was Ridge regression (RR), reliability was with principal between brain-age
systematically performed using SPM least absolute evaluated using the component analysis delta and AD
assessed 12 shrinkage and selection CoRR and OASIS-3 (PCA) for progression. The
128 ML operator (LASSO) datasets dimensionality workflows exhibited

workflows for regression (LR), elastic reduction. high test-retest
brain-age net regression (ENR), reliability, with
prediction and kernel ridge regression concordance
included a (KRR), random forest correlation coefficients
cross-dataset regression (RFR), GPR, (CCC) ranging from 0.95

generalisation, RVR with the linear to 0.98 for short retest


test-retest kernel (RVRlin), and durations.
reliability, and polynomial kernel of
longitudinal degree 1 (RVRpoly).
consistency
evaluation.
Montella 2024 [32]
Italy, UK, Cross-sectional n = 2160, aged 4–86 T1-weighted images A DenseNet264-based The model used a 3D The model achieved an WMH was segmented FD patients showed
Netherlands years, mean age of 33 (TR = 1900 ms, TE = 3.4 ms, deep learning model CNN for brain-age out-of-sample MAE of using the LST, and brain significantly higher
years TI = 900 ms, voxel size: was trained to predict prediction, and saliency 4.01 years and volumes (grey matter, brain-PAD values than
1 × 1 × 1 mm³) and 3D FLAIR brain age from maps were applied to R² = 0.90. Brain-PAD white matter) were controls (+3.1 years
for white matter hyperintensity minimally pre- identify brain regions values were compared measured using CAT12 vs. 0.1 years, p = .01).
(WMH) assessment were processed 3D T1- influencing predictions. between FD patients for volumetric analysis. Brain-PAD correlated
acquired using a 3T Siemens weighted images, and controls using with the Fabry
scanner. generating the cohort's ANCOVA, with further stabilisation index
brain-predicted age validation using (FASTEX, B = 0.10, p =
differences (brain- DeepBrainNet. .02) and was associated
PAD). with both lower brain
parenchymal fraction
(B = 153.50, p = .001)
and higher WMH load
(B = 0.85, p = 0.01).
Nogovitsyn 2019 [29]
Canada Longitudinal n = 200 with T1-weighted MRI images; The Hippodeep Hippodeep, a CNN- The algorithm showed Hippocampal volumes Hippodeep showed
452 T1-weighted MRI algorithm was used for based hippocampal significant stability and high test-retest
scans; hippocampal segmentation consistent accuracy in reliability and
Participants ranged in segmentation, and its algorithm. volume measurements volumetric stability
age from 12 to 60 years outputs were compared Hippodeep was at various time across time points
(mean 27, median 24); to those from manual employed to generate intervals, (mean dice of 0.77). It
58 % were females. tracing and FreeSurfer reliable hippocampal outperforming two produced larger
6.0 volumes in healthy other methods - FS6.0- hippocampal volumes
participants scanned sf and manual than FreeSurfer and
across time points that segmentation. manual tracing, with a
varied in duration better correlation to
(weeks 0, 2, and 8 in manual segmentations
one sample, months than FreeSurfer.
0 and 12 in another)
and obtained from
various sites and
scanners.
Novosad 2020 [30]
Canada Cross-sectional The hippocampal T1w scans were used, The method combined Five ML algorithms: Cross-validation. The Brain volume, High segmentation
dataset consists of 60 specifically in the ADNI dataset deep 3D convolutional CNNeB, CNN-SP, CNN- segmentation results thickness, and shape; accuracy with mean
scans. 20 subjects were and other manually labelled networks with spatial SP-D, CNN-SP-D þ DA, were validated using hippocampus Dice coefficients of
selected from the datasets priors for FIRST Dice coefficients and segmentation; 91.5 % for hippocampal
following clinical neuroanatomical MHD to measure spatial subcortical segmentation and
subgroups: normal segmentation, overlap and boundary segmentation 89.5 % for subcortical
controls, mild cognitive including ANIMAL differences between segmentation. The

impairment, and (Automatic Nonlinear automated and manual highly reliable
Alzheimer's disease, Image Matching and segmentations. approach produces
including elderly Anatomical Labeling) - segmentations with an
populations for A multi-atlas technique accuracy comparable to
hippocampus combining nonlinear the scan–rescan
segmentation and registration with reliability of expert
healthy adults for majority-vote label manual segmentations.
subcortical fusion, FreeSurfer, The proposed method
segmentation. PBS + EC (Patch-based maintains the highly
Segmentation with competitive runtime
Error Correction), performance
PBS + NLR + EC (Patch- common among many
based Segmentation recent CNN-based
with Nonlinear methods for
Registration and Error segmentation.
Correction) and CNN.
Pérez-Millan 2023 [36]
Spain Cross-sectional n = 339 subjects Acquired using 3T Siemens FreeSurfer was used for Principal component Performance was FreeSurfer segmented Accuracy of
and include 99 healthy Magnetom Trio Tim or Prisma cortical reconstruction analysis (PCA) for assessed using k-fold cortical thickness (CTh) classification: CTR vs
longitudinal controls (CTR), 153 AD scanners with standard and volumetric dimensionality cross-validation with across 68 cortical AD (83.3 % ± 12.7 %
patients, 87 FTD acquisition parameters (MP- segmentation of T1- reduction, followed by multiple iterations. The regions and grey matter cross-sectional,
patients RAGE, voxel size: 1 × 1 × 1 mm). weighted images. For SVM for classification model's accuracy for volumes in 17 90.0 % ± 14.7 %
longitudinal data, the different comparisons subcortical structures. longitudinal), CTR vs
longitudinal stream of (AD vs. FTD vs. controls) Intracranial volume FTD (82.1 % ± 14.7 %
FreeSurfer was was evaluated. (ICV) was used for cross-sectional,
employed to track normalisation. 88.0 % ± 16.4 %
changes in cortical longitudinal), AD vs
thickness and FTD (63.3 % ± 9.1 %
subcortical volumes. cross-sectional,
75.0 % ± 36.9 %
longitudinal), three-
group classification
(60.7 % ± 12.7 % cross-
sectional,
71.3 % ± 13.1 %
longitudinal)
Popuri 2020 [31]
Canada Cross-sectional n = 8834 images from T1-weighted images FreeSurfer was used to The ensemble classifier The model was trained FreeSurfer's cortical The model achieved an
and ADNI, AIBL, OASIS, and segment the brain into was trained on ROI on ADNI data and and subcortical AUC of 0.95 for
longitudinal MIRIAD databases grey matter (GM), volume features to validated on AIBL, labelling was used for distinguishing stable
white matter (WM), compute an MRI-based OASIS, and MIRIAD volumetric NC and DAT. In
and cerebrospinal fluid Dementia Score datasets. Performance measurements, independent validation,
(CSF) regions. (MRDATS), which metrics included followed by a multi- the model showed good
Volumetric features of represents similarity to accuracy, sensitivity, atlas segmentation generalizability, with
91 anatomical regions Alzheimer's Disease specificity, balanced approach to calculate accuracy varying
of interest (ROIs) were neurodegeneration accuracy, and area total intracranial depending on the
extracted. patterns. The classifiers under the curve (AUC). volume (TIV). dataset and dementia
were applied across subtype. For instance,
multiple datasets for progressive MCI had a
independent validation. balanced accuracy of
71.8 %.
Gu et al. 2022 [15]
China Cross-sectional n = 646 healthy (334 T1, T2, and T2 FLAIR images Artificial-intelligence- An artificial- The study validated the Brain segmentation The results showed that
and female, 312 male, age based age-specific intelligence-based segmentation accuracy into GM, WM, and CSF the volumes of
longitudinal range of 18–82 years. template construction ASTC framework is using a Dice coefficient, regions was performed anatomical structures
structural (ASTC) framework presented to generate reporting an average using a CNN-based obtained from the
analysis templates for different Dice coefficient of 0.857 network with a U-Net templates were
age groups using MRIs. for brain structures. architecture applied consistent with those
Reproducibility was across age-specific measured from the
assessed using ICC for templates. original images,

cortical thickness and indicating that they can
surface area. capture major
structural
characteristics
of different age groups.
The trend of volumetric
changes was also
consistent with the

aging studies of normal


brains.
Ran 2022 [17]
China Cross-sectional n = 4652, 9–96 years, T1-weighted MRI images using FastSurfer extracted 95 The model used 10-fold cross- Regions included Brain age (MAE: 3.64
1.5T and 3T scanners. brain region volumes, XGBoost for brain age validation. The model's cortical structures, years). The brain age
while SPM12 was used estimation, and SHAP performance was hippocampus, and vector demonstrated
for intracranial volume was employed to evaluated using these subcortical areas based disorder-specific aging
(ICV) and brain tissue calculate the feature metrics, and the brain on the Desikan- patterns, particularly in
fraction segmentation contributions for age vector was Killiany-Tourville atlas, the medial temporal
(grey matter, white constructing the brain validated across normalised by ICV. lobe for AD (sMCI vs.
matter, CSF). age vector, providing different datasets and pMCI) and the striatum
insights into region- for disease-specific for PD (HC vs.
specific brain aging. patterns (e.g., sMCI vs. prodromal PD). The
pMCI, HC vs. prodromal model achieved an AUC
PD). of 83.39 % for AD
classification and
72.28 % for PD.
Rebsamen 2020a [34]
Switzerland Cross-sectional 840 scans including 160 MRI scans from Siemens 1.5T The deep learning 128 workflows Pearson correlation, Deep learning-based The deep learning-
from the ABCD study and 3T scanners were used for model DeepSCAN was consisting of 16 feature robustness (scan- segmentation of based model
(aged 9-10), OASIS-3, while multiple used for anatomy representations and rescan reliability), and cortical grey matter and (DL + DiReCT)
160 healthy adults from vendors were involved in the segmentation. The 8 ML algorithms. Ridge effect sizes for group- white matter using demonstrated high
the IXI dataset, SIMON dataset. segmentation was regression (RR), Least wise differences DeepSCAN, followed by accuracy and
160 elderly participants followed by DiReCT absolute shrinkage and between healthy DiReCT to estimate robustness, showing a
from the ADNI study, (diffeomorphic selection operator controls and dementia cortical thickness. Pearson correlation of
160 healthy controls registration) to (LASSO) regression (LR), patients were used to Comparisons were r = 0.887 with
from Inselspital estimate cortical Elastic net regression validate the model. made against FreeSurfer for global
institution, thickness. (ENR), Kernel ridge FreeSurfer and ANTs cortical thickness,
128 patients with regression (KRR), segmentation outputs. better sensitivity to
multiple sclerosis (MS), Random Forest thickness changes than
48 patients with regression (RFR), ANTs, and larger effect
epilepsy, and Gaussian process sizes for group-wise
24 patients with PD. regression (GPR), comparisons of
Relevance vector dementia patients
regression (RVR) with versus healthy controls.
linear kernel (RVRlin)
and polynomial kernel
of degree 1 (RVRpoly)
Rebsamen 2020b [35]
Switzerland Cross-sectional n = 574 including 443 T1-weighted MRI acquired on FreeSurfer 6.0 was used The custom 3D CNN The CNN's predictions Subcortical volumes The CNN demonstrated
healthy controls Siemens 3T scanners to generate ground architecture was were compared to and cortical thicknesses good to excellent
131 patients with (Magnetom Trio and Verio). truth brain developed to predict FreeSurfer-derived were estimated using agreement with
epilepsy MRI protocols included 3D MP- segmentations, which brain morphometry measures, with ICC, the deep learning FreeSurfer for
RAGE and MDEFT. were used to train the measures directly from Pearson correlation, model, which predicts subcortical volumes
deep learning model. MRI, including and Bland–Altman 165 brain (mean ICC = 0.68) and
subcortical volumes, plots used to assess morphometry cortical thickness
cortical thickness, and reliability and accuracy. measures, including (mean ICC = 0.53). The
curvature. This CNN volumes of 29 CNN predictions

was trained using subcortical regions and showed high reliability,
supervised learning cortical thicknesses in especially for more
based on FreeSurfer 68 parcellations. significant brain
outputs. regions such as total
grey matter and white
matter volumes.
Stolicyn 2020 [44]
Cross-sectional
United n = 873 participants T1-weighted and diffusion- FSL and ENIGMA Support Vector Leave-One-Out Cross- White matter integrity Classification
Kingdom from the STRADL weighted imaging (DWI). The consortium protocols Machine (SVM) and Validation (LOOCV) for measures for 19 accuracies ranged from
(Stratifying Depression STRADL cohort used a standard for FA measures. White Decision Tree the STRADL cohort and bilateral and 5 75 % to 90 % depending
and Resilience 3T MRI protocol with Tract- matter integrity classifiers. Feature 10-fold cross-validation unilateral white matter on the sample and the
Longitudinally) cohort Based Spatial Statistics (TBSS) measures included FA selection techniques for other cohorts. tracts were segmented diagnostic criteria, with
and a larger sample applied to derive fractional and MD metrics. such as t-test filtering Fivefold cross- based on the JHU white notable differences in
(N = 18,980) from the anisotropy (FA) and mean and sequential feature validation was applied matter atlas. performance based on
UK Biobank diffusivity (MD) metrics from elimination were to the largest pMDD- depression criteria (e.g.,
white matter tracts. employed. UKB-CIDI cohort. current, remitted, or
lifetime depression).
Tay 2024 [45]
Singapore Cross-sectional 30 SLE patients: 26 Multimodal MRI, including N/R An ML-based model 10-fold cross-validation VBM assessed brain Increased right
women, mean age of voxel-based morphometry (glmnet) was was used for model volume, MTR analysed amygdala perfusion
31.9 years (SD 6.7), and (VBM), magnetisation transfer constructed to predict training. microstructural was positively
10 healthy controls ratio (MTR), and dynamic neurocognitive changes, and DCE-MRI correlated with
(HCs): 9 women, mean contrast-enhanced (DCE) MRI function based on MRI evaluated blood–brain neurocognitive
age 27.8 years (SD 4.8). to assess microstructural, parameters. barrier permeability performance (TTS),
SLE patients had a permeability, and perfusion Using imaging and perfusion. with a correlation of
mean disease duration abnormalities. parameters, supervised r = 0.636 (FDR
of 7.7 years (SD 6.9) learning algorithms p < 0.05). ML models
without (augmented Markov effectively predicted
neuropsychiatric blanket and glmnet) neurocognitive
manifestations. were employed to performance in SLE
predict neurocognitive patients, particularly
function. highlighting alterations
in the limbic system
(amygdala) as
predictors.
Vogel 2018 [21]
United States Longitudinal n = 118 including 52 T1-weighted images were Automated The study used an SVM A 10-fold cross- The focus was on The study found that
and China study patients with mild acquired using a 3 Tesla segmentation of brain classifier to distinguish validation method was hippocampal volume reduced hippocampal
cognitive impairment scanner. High-resolution regions was performed between MCI and employed to assess the and cortical thickness volume and cortical
(MCI) and 66 structural MRI was used for using FreeSurfer for control subjects based accuracy of the ML as key markers of thickness were
cognitively normal volumetric analysis. cortical thickness and on imaging features. model in predicting neurodegeneration. significant predictors of
control participants. volumetric analysis. cognitive decline. cognitive decline in MCI
Ages ranged from 55 to patients. The
91 years (mean multimodal model
age = 70.5 years). combining MRI and PET
data improved
classification accuracy.
Weerasekera 2023 [41]
USA, Belgium, Cross-sectional n = 193 healthy Structural T1-weighted MRI FreeSurfer v5.3.0 was A Random Forest 5-fold cross-validation Volumes of large brain Identified age-
Romania participants, aged 20 was performed on a 3T scanner used for volumetric classifier evaluates repeated 1000 times, regions (grey matter, dependent
–35 years for young (MAGNETOM Verio, Siemens segmentation and brain brain volumes' gender-stratified. cortical white matter, relationships between
adults and 60–75 years Healthcare GmbH) with a 32- subregion analysis. predictive power CSF) and subcortical subcortical brain
for older adults. Gender channel head coil. (subcortical and global) structures (caudate, volumes and cognitive
stratified (68 females, on cognitive putamen, thalamus, measures such as fluid
125 males). performance. pallidum, amygdala, and crystallised
nucleus accumbens, intelligence, cognitive
hippocampus) were flexibility, and working
extracted. memory.

Wu 2022 [22]
China Cross-sectional n = 424 healthy T1-weighted MRI was used. The preprocessing steps Deep learning models The model was Attention maps The brain age
including 354 included N4 bias field were constructed using validated using 5-fold generated by Grad- prediction model
participants in the correction, skull CNNs. Transfer learning cross-validation in the CAM identified critical showed solid predictive

internal development stripping, and was applied using pre- internal cohort and brain regions for each performance, with a Pearson
cohort (age range 6-85) registration to MNI trained weights from evaluated using an age group. Brain correlation coefficient
and 70 participants in space using DARTEL. open-source datasets to external validation regions with high
the external validation The brain regions were improve prediction cohort. The model's attention scores
cohort (age range 22- segmented using the accuracy. The models performance was included the middle
62). Anatomical Automatic were trained using a assessed using metrics temporal pole, superior of 0.969 in the external
Labeling (AAL) atlas, partitioning strategy, such as the MAE and frontal orbital, and validation cohort. The
which divided the brain with four age groups Pearson correlation rectus in younger use of atlas-based
into 90 regions. Grad- corresponding to coefficient. The internal participants and the attentional
CAM was used to different stages of brain cohort achieved an cuneus, superior enhancement
visualise key regions development and MAE of 2.245 years, occipital, and frontal improved the accuracy
contributing to brain aging. while the external regions in older of the brain age
age prediction. cohort achieved an participants. prediction model.
MAE of 2.218 years.
Zhang 2014^46
Country: United States
Study design: Cross-sectional and longitudinal
Population (n, age): 18 healthy young adults: aged 22–38 years (mean age: 28.6 ± 4.6 years), 11 males and 7 females. 12 participants in normal average aging group: aged 60–80 years. These patients were matched with 12 age-matched controls (mean age: 75.0 ± 5.9 years).
MRI protocol: T1-weighted MRI was used for all participants. For the young adult group, a 3D inversion recovery sequence with the following parameters was used: TR/TE/TI = 6.7/3.1/842 ms, with 1.0 × 1.0 × 1.2 mm^3 resolution over an FOV of 240 mm × 204 mm × 256 mm. For AD patients, an MRI was conducted using a 3T Philips scanner at the F.M. Kirby Research Center, adhering to the ADNI protocol.
AI segmentation tool: Volume-based Template Estimation (VTE) is a population-based template creation approach based on the Bayesian framework and diffeomorphic shape analysis. This method avoids the intensity averaging issues found in conventional atlases.
ML approach: The VTE approach integrated diffeomorphic shape modelling under a Bayesian framework, leveraging geodesic flows and momentum conservation for atlas creation. The model estimates population-representative templates for both volume- and surface-based quantitative analysis.
Validation approach: The segmentation accuracy was validated by comparing automated parcellation results to manual delineations of sub-cortical structures. Registration accuracy was measured using surface-to-surface distances (SSD) for cortical structures and kappa values for sub-cortical segmentations.
Brain segmentation: The study generated three distinct T1-weighted brain atlases: one for young adults, one for the aging population, and one for Alzheimer's Disease patients.
Relevant outcomes: It demonstrated that the VTE approach preserves brain topology and image contrast better than conventional single-subject or group-averaged atlases, showing accurate registration results for cortical and sub-cortical structures.
Zhu 2022^47
Country: China
Study design: Cross-sectional
Population (n, age): n = 1045, mean age: 62.30 ± 11.20 years, 559 males and 486 females. All subjects had varying degrees of WMH.
MRI protocol: The MR images were acquired using seven different MRI scanners, both 1.5T and 3T, including GE Signa HDxt 3.0T, GE Discovery MR750 3.0T, UIH uMR 780 3.0T, GE Signa Excite 1.5T, GE Signa HDxt 1.5T, GE Signa Creator 1.5T, and GE Brivo MR355 1.5T. Multi-sequence MRI data included T1-weighted, T2-weighted, and fluid-attenuated inversion recovery (FLAIR) images with a 5 mm slice thickness.
AI segmentation tool: The study utilised the 2D VB-Net CNN to automatically segment WMH regions in routine clinical MR images, outperforming other methods such as uResNet, 3D V-Net, and Visual Geometry Group network (VGGNet).
ML approach: The 2D VB-Net is a CNN-based approach. The network consists of an encoder-decoder framework with residual connections, bottleneck layers, and a weighted Dice loss to enhance segmentation performance. It also integrates multi-modal data (T1, T2, and FLAIR sequences) for better accuracy.
Validation approach: The model was validated using two independent datasets: IDS 1 (102 subjects with data from a 3T MR scanner) and IDS 2 (74 subjects from a 1.5T MR scanner). It was also tested using the 2017 MICCAI WMH Segmentation Challenge dataset, achieving a Dice coefficient of 0.789, indicating high accuracy.
Brain segmentation: The 2D VB-Net segmented WMH with a Dice score of 0.789 and a lesion F1-score of 0.764.
Relevant outcomes: The method could subclassify WMH into four categories based on their proximity to the ventricles: juxtaventricular, periventricular, deep, and juxtacortical WMH. The segmentation results showed strong correlations with visual rating scales like the Fazekas score, widely used for WMH assessments.



Abbreviations: AAL: Anatomical Automatic Labeling; AD: Alzheimer's Disease; ADAS: Alzheimer's Disease Assessment Scale; AIBL: Australian Imaging, Biomarkers & Lifestyle; ALFF: Amplitude of Low-Frequency Fluctuation; ANCOVA: Analysis of Covariance; ARDRegression: Automatic Relevance Determination Regression; ASTC: Age-Specific Template Construction; AU-Net: Anatomical Knowledge-based U-Net; BM8: Brain Morphometry 8; BRidge: Bayesian Ridge Regression; CAT: Computational Anatomy Toolbox; CCC: Concordance Correlation Coefficients; CMN: Cortical Morphological Networks; CNN: Convolutional Neural Network; CV: Cross-Validation; DARTEL: Diffeomorphic Anatomical Registration using Exponentiated Lie Algebra; DAT: Dementia of Alzheimer's Type; DCE-MRI: Dynamic Contrast-Enhanced MRI; Dice: Dice Similarity Coefficient; DKI: Diffusion Kurtosis Imaging; DNN: Deep Neural Network; DSC: Dice Similarity Coefficient; ENIGMA: Enhancing Neuroimaging Genetics through Meta-Analysis; ENR: Elastic Net Regression; FAQ: Functional Activities Questionnaire; FA: Fractional Anisotropy; FASTEX: Fabry Stabilization Index; FCM: Fuzzy C-means Clustering; FD: Fractal Dimension; FDR: False Discovery Rate; FLAIR: Fluid-Attenuated Inversion Recovery; FLS: FMRIB Software Library; FS6.0: FreeSurfer Version 6.0; FSL: FMRIB Software Library; GM: Grey Matter; GMV: Grey Matter Volume; GMVCorr: Grey Matter Volume Correlation; HC: Healthy Control; HCP: Human Connectome Project; HDG: Healthy Dietary Guidelines; HYDRA: Heterogeneity Through Discriminative Analysis; ICA: Independent Component Analysis; ICC: Intraclass Correlation Coefficient; IDP: Individual Development Plan; JHU: Johns Hopkins University; KRR: Kernel Ridge Regression; LASSO: Least Absolute Shrinkage and Selection Operator; LDA: Linear Discriminant Analysis; linGPR: Linear Gaussian Process Regression; linSVR: Linear Support Vector Regression; LOOCV: Leave-One-Out Cross-Validation; LST: Lesion Segmentation Toolbox; MCI: Mild Cognitive Impairment; MICA: Masked Independent Component Analysis; MIDAS: Multivariate Discriminative Anatomical Statistical Mapping; MICCAI: Medical Image Computing and Computer-Assisted Intervention; MNI: Montreal Neurological Institute; MPRAGE: Magnetization-Prepared Rapid Gradient Echo; MRDATS: MRI-based Dementia Score; MRI: Magnetic Resonance Imaging; MSE: Mean Squared Error; MTR: Magnetisation Transfer Ratio; NODDI: Neurite Orientation Dispersion and Density Imaging; N/R: not reported; PET: Positron Emission Tomography; PCA: Principal Component Analysis; pMCI: Progressive Mild Cognitive Impairment; RBF: Radial Basis Function; RBFGPR: Radial Basis Function Gaussian Process Regression; ReHo: Regional Homogeneity; RSI: Restriction Spectrum Imaging; ROI: Region of Interest; RVR: Relevance Vector Regression; RVRlin: Linear Relevance Vector Regression; RVRpoly: Polynomial Relevance Vector Regression; SBM: Source-Based Morphometry; SD: standard deviation; SHAP: SHapley Additive exPlanations; SLE: Systemic Lupus Erythematosus; SLIM: Southwest University Longitudinal Imaging multimodal study; SPM: Statistical Parametric Mapping; SVM: Support Vector Machine; TBSS: Tract-Based Spatial Statistics; TE: Echo Time; TI: Inversion Time; TIM: Total Imaging Matrix; TPOT: Tree-based Pipeline Optimization Tool; TR: Repetition Time; TTS: Total Test Score; VBM: Voxel-Based Morphometry; VB-Net: Visual-Based Net; VGGNet: Visual Geometry Group Network; VTE: Volume-based Template Estimation; WM: White Matter; WMH: White Matter Hyperintensities.

U-Net and CNN-based models

The U-Net architecture, a deep convolutional network widely used in medical image segmentation, was employed in two studies for the segmentation of brain structures.15,16 U-Net's encoder-decoder structure allowed for high-precision segmentation, particularly for WMH and hippocampal volumes. Some studies further customised U-Net to integrate anatomical priors or multi-modality MRI data.16 Several studies also utilised CNN-based models.24,33,42 For example, Anderson et al. (2019) used CNNs to distinguish between male and female brain structures, while Hepp et al. (2021) employed a 3D ResNet-based CNN to estimate brain age.24,42

FreeSurfer

FreeSurfer, a widely used software platform for neuroimaging analysis, operates based on AI principles to deliver automated processing and segmentation of brain structures.48 FreeSurfer was the most commonly used software, applied across 9 studies for the segmentation of cortical and subcortical structures.21,29–31,33,35–37,39,41 FreeSurfer is widely recognised for its accuracy in segmenting brain structures, including GM, WM and CSF. In the context of neurodegenerative diseases, FreeSurfer was particularly useful for tracking longitudinal changes in brain volumes and cortical thickness.31,36 Fig. 3 exemplifies an automatic segmentation of the entire brain using FreeSurfer on a healthy participant.

SPM and VBM

SPM, developed by Friston et al. in the 1990s, is a free image analysis tool written in MATLAB. Its unified segmentation algorithm uses a Gaussian mixture model to classify tissue types in T1-weighted images and registers them to a tissue probability map. Each voxel is assigned a tissue probability based on intensity and location.19
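The idea that each voxel receives a tissue probability from both its intensity and its location can be illustrated with a single Bayes step. The following is a minimal sketch, not SPM's actual implementation: it assumes fixed Gaussian intensity models per tissue class and a location-based prior standing in for a tissue probability map value. The function names, the intensity parameters, and the prior values are all hypothetical, and the joint estimation of bias field, registration, and mixture parameters performed by unified segmentation is not shown.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def tissue_posteriors(intensity, spatial_prior, params):
    """Combine an intensity likelihood with a location-based prior.

    intensity     -- voxel intensity (arbitrary units)
    spatial_prior -- dict: tissue -> prior probability at this voxel's location
                     (hypothetical stand-in for a tissue probability map value)
    params        -- dict: tissue -> (mean, std) of that tissue's Gaussian
    """
    unnormalised = {
        tissue: spatial_prior[tissue] * gaussian_pdf(intensity, *params[tissue])
        for tissue in params
    }
    total = sum(unnormalised.values())
    # Normalise so that the tissue probabilities of the voxel sum to 1.
    return {tissue: value / total for tissue, value in unnormalised.items()}


# Hypothetical intensity models for three tissue classes (assumed values).
params = {"GM": (60.0, 8.0), "WM": (100.0, 8.0), "CSF": (20.0, 8.0)}
# Hypothetical prior for a voxel located deep in white matter.
prior = {"GM": 0.2, "WM": 0.7, "CSF": 0.1}

posterior = tissue_posteriors(95.0, prior, params)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

When the intensity is ambiguous between two tissues, the location prior decides the outcome in this sketch, which is the role the tissue probability map plays in the description above.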
VBM is a specific application of voxel-based statistical analysis (VBSA) to modulated segmented tissue probability images from MRI scans. It systematically compares images from different subjects to detect patterns of differences, such as between patient groups or between an individual and a reference dataset, focusing on grey and white matter morphometry.49,50 Six studies used SPM, often paired with VBM, for brain segmentation.17,20,25–27,43 SPM and volBrain are typically used for normalisation, segmentation, and smoothing of MRI data and were particularly common in studies that focused on grey matter volume and cortical thickness. Figs. 4 and 5 demonstrate the segmentation of grey matter and white matter from a healthy control using SPM12. VBM was used to assess brain morphometry in schizophrenia and neurodegenerative diseases such as AD.20,27 Similar to SPM12, BrainSuite was developed using the MEG/EEG Matlab toolbox and requires minimal user
interaction to produce cortical representations. The software includes skull and scalp removal, image nonuniformity compensation, voxel-based tissue classification, topological corrections, and rendering functions including brain segmentation and volume measurements.51 volBrain is an automated MRI brain segmentation tool, but it does not strictly fall under AI-driven tools such as U-Net or CNN-based models.39

Figure 2. AI segmentation tools and methods used across studies.

Figure 3. FreeSurfer showing projected segmentation of anatomical structures across the brain.

Hybrid approaches and ensemble methods

Four studies utilised hybrid AI segmentation methods that combined traditional ML techniques with DL.18,32,33,36 For instance, Dafflon et al. (2020) used a Tree-based Pipeline Optimization Tool (TPOT), which automates the ML pipeline, alongside Relevance Vector Regression (RVR) for brain age prediction.33 Montella et al. (2024) employed a DenseNet-based DL model for brain age prediction and combined this with saliency maps to highlight regions most impacted by neurodegeneration.32 These hybrid models are especially valuable in enhancing segmentation accuracy in clinical cohorts.

Novel approaches

Rebsamen et al. (2020a) introduced DeepSCAN, a novel deep-learning tool used for cortical thickness estimation in large datasets of healthy individuals and patients with MS, epilepsy, and PD.34 This tool integrated deep learning with anatomy segmentation, showcasing high accuracy and reproducibility in complex datasets.

Preprocessing methods

The preprocessing methods employed across the included studies typically involved several standard steps to prepare brain MRI data for AI-based segmentation. Common techniques included skull stripping to remove non-brain tissues, bias field correction to address intensity inhomogeneities caused by scanner imperfections, and intensity normalization to standardize voxel intensities for improved comparability across images.22 Most studies also registered images to standard reference spaces, such as the MNI152 atlas, to facilitate segmentation and comparison across datasets. Despite these shared methods, differences were observed in the choice of tools and imaging protocols. For example, FreeSurfer, known for its cortical surface reconstruction and volumetric segmentation capabilities, was frequently used, while other studies employed SPM for tissue classification and spatial normalization.23 Such variability in preprocessing workflows highlights the importance of consistent protocols for ensuring reliable prediction outcomes. Studies with standardized preprocessing pipelines and high-resolution imaging tended to report superior segmentation accuracy and predictive performance.

Validation techniques

Validation of AI segmentation tools is critical to ensuring the accuracy, reliability, and generalisability of automatic brain MRI segmentation methods. Several validation techniques were employed across the 32 studies included in this review to assess the performance of ML and DL models. These techniques ranged from traditional statistical metrics to more advanced model evaluation approaches, ensuring that the segmentation models were accurate and applicable across different populations and MRI protocols.

Cross-Validation

Cross-validation was the most frequently used validation technique, employed in 18 of the included studies.17,20–23,25–27,30,33,36,37,41–45,47 Cross-validation helps assess the model's generalizability by dividing the dataset into multiple folds (ranging from 2-fold to 10-fold). For instance, Ge et al. (2019) employed 2-fold cross-validation to evaluate clustering consistency, reporting a Dice coefficient of 0.76,27 while Lei et al. (2020) used 10-fold cross-validation for schizophrenia classification, achieving a balanced accuracy of 90.83 %.20 Similarly, Popuri et al. (2020) validated dementia workflows using 10-fold cross-validation with an AUC of 0.95.31

In each iteration, one fold is used as the test set, and the remaining folds are used as the training set. The model is tested by comparing its predictions on the test set against the known ground truth, using evaluation metrics such as accuracy, DSC, MAE, or AUC. These metrics quantify the agreement between the model's predictions and the actual labels. This process is repeated across all folds, and the results are averaged to provide a robust
estimate of the model's performance. This technique was essential for assessing the reliability of ML models in predicting outcomes such as brain age, disease classification, and brain volume. For example, Dafflon et al. (2020) used 10-fold cross-validation to evaluate the performance of their Tree-based Pipeline Optimization Tool (TPOT) for brain age prediction.33 Similarly, Lei et al. (2020) applied 10-fold cross-validation to test their SVM model, which classified schizophrenia patients based on neuroanatomical data.20

Figure 4. Segmentation of grey matter using SPM12.

Dice Similarity Coefficient (DSC)

DSC is a widely used metric to quantify the overlap between AI-generated segmentations and manually segmented ground truth data. A DSC score of 1 indicates perfect overlap, while a score closer to 0 indicates minimal overlap. This metric was employed in six studies to evaluate the accuracy of segmentation tools in tasks such as WMH and hippocampal segmentation.15,16,27,30,32,47 For instance, Liang et al. (2021) reported a DSC of 0.87 for their WMH
segmentation model, which integrated anatomical priors into a U-Net-based framework.16 Similarly, Nogovitsyn et al. (2019) achieved a DSC of 0.77 for hippocampal segmentation, demonstrating the model's high agreement with manual segmentations (DSC of 0.80).29

Figure 5. Segmentation of white matter using SPM12.

Test-retest reliability

Test-retest reliability is a key measure of a model's consistency in longitudinal studies, where MRI scans are acquired from the same participants at different time points. Four studies in this review employed test-retest reliability to assess the stability of their models over time.26,29,34,46 Rebsamen et al. (2020a) used test-retest reliability to validate their DeepSCAN model, which estimated cortical thickness across multiple time points.34 They reported high reliability, with Pearson correlations exceeding 0.85 between test and retest scans. Zhang et al. (2014) similarly employed test-retest validation for their Volume-based Template Estimation (VTE) approach, ensuring consistency in hippocampal volume measurements over a 12-month period.46

External validation

External validation involves testing a model on independent datasets not used during model training. This is critical for ensuring the model's generalisability to diverse populations and imaging protocols. Three studies incorporated external validation by applying their models to independent datasets, often sourced from large public repositories such as the ADNI or the HCP.22,28,43 Popuri et al. (2020), for example, validated their dementia score model using data from four independent datasets (ADNI, AIBL, OASIS, and MIRIAD), achieving an AUC of 0.95 for classifying AD patients.31 Similarly, Dwyer et al. (2018) tested their schizophrenia subtyping model on an external cohort from Germany, demonstrating the robustness of their approach across different populations.43

Pearson correlation coefficient

The Pearson correlation coefficient was commonly used to assess the relationship between predicted and actual outcomes, such as brain age or volumetric measures. This metric quantifies the linear correlation between two variables, ranging from −1 to 1. A coefficient closer to 1 indicates a strong positive correlation. Nine studies used Pearson correlation to evaluate the accuracy of their models.22–24,27,33–35,37,38 For example, Hepp et al. (2021) reported a Pearson correlation of 0.93 between predicted and actual brain ages, highlighting the effectiveness of their 3D ResNet-based model.24 Similarly, Bellantuono et al. (2021) achieved a Pearson correlation of 0.89 for their deep learning model used to predict brain age in a population spanning from adolescence to adulthood.23

Mean absolute error (MAE)

The MAE is a widely used metric for evaluating brain age prediction models. It measures the average absolute difference between predicted and actual brain ages, with lower values indicating higher accuracy. Nine studies employed MAE to assess their models' performance in brain age prediction.17,22–24,26,32,33,37,38 For instance, Dafflon et al. (2020) reported an MAE of 4.61 years for their TPOT model, which was lower than that of state-of-the-art models such as Relevance Vector Regression (RVR).33 Montella et al. (2024) achieved an MAE of 4.01 years for their DenseNet-based model, further validating the effectiveness of deep learning in brain age estimation.32

Intraclass Correlation Coefficient (ICC)

The ICC was used in two studies to assess the reliability and agreement between automated segmentations and manually segmented ground truth data.15,35 ICC values closer to 1 indicate stronger reliability and reproducibility. In our review, Rebsamen et al. (2020b) reported an ICC of 0.87 for cortical thickness estimations, demonstrating the robustness of their deep-learning model.35 Similarly, Ge et al. (2019) found an ICC of 0.75 when comparing hippocampal subregion segmentations with manual delineations.27

Receiver operating characteristic (ROC) and area under the curve (AUC)

ROC curves and AUC were used in four studies to evaluate the classification accuracy of ML models in distinguishing between clinical groups.16,17,28,31 ROC curves plot the true positive rate against the false positive rate, while AUC provides a single value summarising the model's performance, with values closer to 1 indicating better classification ability. Popuri and colleagues used AUC metrics to classify normal controls and dementia of Alzheimer's type and achieved an AUC of 0.95.31 Ran et al. (2022) reported an AUC of 0.83 for distinguishing between healthy controls and prodromal PD.17

Brain segmentation

Brain segmentation is a pivotal step in neuroimaging analysis, enabling the division of brain MRI scans into distinct regions, including GM, WM, CSF, and various subcortical structures. In the 32 studies reviewed, segmentation was predominantly focused on key brain structures involved in neurodegenerative and psychiatric conditions and on normative age-related changes in healthy populations. AI-based brain segmentation techniques offer automated, high-precision methods for segmenting brain structures, essential for accurate diagnosis, disease monitoring, and understanding brain aging.

GM, WM and WMH segmentation

GM and WM segmentation were among the most common tasks, featured in 24 studies.16–18,20,23–25,27–29,31–36,38,42–47 Segmentation of GM and WM is critical for detecting atrophy, WMHs, and other morphological changes associated with conditions such as AD, schizophrenia, MS and PD. Most studies employed T1-weighted MRI for GM and WM segmentation due to its high resolution and contrast. WMHs are common in aging and have been associated with various neurological conditions, including AD, stroke, and MS. WMH segmentation was addressed in 4 studies, with FLAIR sequences often used in combination with T1-weighted MRI to enhance lesion detection.16,25,32,47 Liang et al. (2021) introduced an anatomical knowledge-based deep learning pipeline (AU-Net) for WMH segmentation, which integrated spatial features from brain atlases to improve segmentation accuracy.16 The model achieved a DSC of 0.86, outperforming traditional segmentation methods. Montella et al. (2024) also employed the LST to quantify WMH load in patients with Fabry disease, demonstrating that WMH burden correlated with brain age and neurodegeneration markers.32

Hippocampal segmentation

The hippocampus, a critical structure involved in memory and learning, was the focus of segmentation in 8 studies, particularly in relation to AD and mild cognitive impairment (MCI).16,17,21,27,29,30,37,41 Hippocampal segmentation is vital for the early detection of neurodegenerative diseases, where atrophy in this region is a hallmark of cognitive decline. Several deep learning-based methods were used to automate hippocampal segmentation. For example, Nogovitsyn et al. (2019) employed the Hippodeep algorithm, a CNN-based approach, which outperformed manual segmentation and other methods like FreeSurfer regarding accuracy and stability across time points. Their model achieved a high DSC of 0.87, indicating excellent agreement with manual segmentations. In addition, Ge et al. (2019) used VBM combined with ICA to parcellate the hippocampus into distinct subregions, providing a detailed understanding of structural covariance patterns and their relationship with cognitive function. For subcortical segmentation, tools such as FSL-FIRST demonstrated high reliability in delineating regions like the thalamus and hippocampus, as seen in Akudjedu et al. (2018). Unlike deep learning methods, which are data-driven, FSL-FIRST relies on predefined shape priors, offering consistency but limited adaptability across diverse datasets.39

Cortical thickness estimation

Cortical thickness is a key biomarker for understanding brain development, aging, and neurodegeneration. Seven studies in this review focused on cortical thickness estimation as a primary
measure of brain morphology.15,28,30,33–36 Cortical thickness estimation was typically performed using FreeSurfer, which segments the cortical mantle into 68 regions based on the Desikan-Killiany atlas. Rebsamen et al. (2020b) developed a deep learning-based model that directly estimated cortical thickness from MRI scans, reporting high agreement with manual segmentations. Their model exhibited Pearson correlations of 0.88 with FreeSurfer, while also providing faster processing times.35 Bellantuono et al. (2021) applied cortical thickness estimates to investigate age-related changes in a cohort of over 1000 participants, revealing significant thinning in regions associated with brain aging, such as the sub-lobar white matter and thalamus.23

Subcortical structure segmentation

Subcortical structures, including the thalamus, caudate, putamen, pallidum, and amygdala, are critical for motor function, memory, and emotional regulation. These regions were segmented in nine studies, often using atlases such as the Harvard–Oxford or Desikan-Killiany atlases in combination with AI-based segmentation tools.16,17,30,31,35–37,41,43 For example, Chand et al. (2020) identified distinct neuroanatomical subtypes of schizophrenia by segmenting subcortical structures such as basal ganglia and internal capsule volume, and linking volume changes to disease subtypes.18 Finkelstein et al. (2024) applied deep learning-based segmentation to brain structures such as the orbitofrontal cortex, the cerebellum, the right insula, the anterior temporal cortex, and the lateral occipital cortex and found that volumetric differences in these regions were associated with BMI and metabolic syndrome.38 For subcortical segmentation, Akudjedu et al. (2018) demonstrated that automated tools such as FSL-FIRST and volBrain achieved high correlation with manual methods for caudate nucleus segmentation (Dice: 81–85 %). However, automated methods showed lower accuracy for hippocampal segmentation, with Dice coefficients ranging from 57 to 69 %, compared to stereology methods, which provided reliable results.39

Multimodal segmentation

Multimodal segmentation refers to the use of either a combination of different modalities such as MRI and PET, or multiple MRI sequences, including fMRI. Several studies integrated multiple MRI sequences to enhance brain segmentation accuracy and provide a more comprehensive view of brain structure. Six studies employed T1-weighted MRI in combination with other sequences such as T2-weighted, FLAIR, or DTI.15,16,20,25,32,47 Three studies used multiple ML models,28,37,45 two studies used multi-modality models,21,37 and one study used mixed diffusion models of DTI.40 For instance, Doerfel et al. (2024) used a multimodal approach to segment GM and WM volumes while also incorporating 5-HT2AR PET binding data to predict brain age.37 This approach improved prediction accuracy, highlighting the value of integrating structural and functional data in segmentation tasks. Similarly, Zhu et al. (2022) used a CNN-based model to segment WMH across T1, T2, and FLAIR sequences, achieving high Dice scores and strong correlations with manual segmentations.47

AI model performance metrics

The performance of AI models used in the reviewed studies is summarised in Table 3. The metrics reported varied across studies, with common measures including accuracy, precision, recall, F1 score, AUC, and Dice Coefficient. The reviewed studies employed AI models for two primary objectives: classification and regression. Classification tasks focused on categorizing data into discrete groups, such as distinguishing healthy individuals from those with a disease. For instance, Lei et al. (2020) employed SVM to classify schizophrenia patients, achieving a balanced accuracy of 90.83 %.20 Similarly, Popuri et al. (2020) used Random Forest for dementia classification, reporting an AUC of 0.95.31 In contrast, regression tasks aimed to predict continuous outcomes, such as brain age or BMI. Montella et al. (2024) utilized a DenseNet-based CNN for brain age prediction, achieving an MAE of 4.01 years.32 Likewise, Finkelstein et al. (2024) used an ensemble CNN to predict BMI changes with an MAE of 2.06 kg/m^2.32,38 For subcortical segmentation, Akudjedu et al. (2018) demonstrated that automated tools such as FSL-FIRST and volBrain achieved high correlation with manual methods for caudate nucleus segmentation (Dice: 81–85 %). However, automated methods showed lower accuracy for hippocampal segmentation, with Dice coefficients ranging from 57 to 69 %, compared to stereology methods, which provided reliable results.39

Discussion

This scoping review highlights significant advances in AI-driven brain segmentation, particularly in accuracy, efficiency, and applicability for neuroimaging tasks such as brain age prediction, disease classification, and neurodegeneration detection. AI-driven tools automate labor-intensive manual segmentation, reducing time and inter-observer variability. DL models, including CNNs and U-Net, outperform traditional ML approaches by learning hierarchical features directly from data. They are particularly effective in handling complex anatomical variability in brain structures and are less reliant on handcrafted features than traditional ML methods. Studies reviewed also highlighted their superior performance in segmentation tasks, such as hippocampal delineation, where Dice coefficients exceeded 0.85 in several instances.30 Additionally, ensemble DL approaches, which integrate multiple models, further enhance performance by reducing bias and variance, as evidenced by Finkelstein et al. (2024) in BMI prediction using neuroimaging features.38 While ML models such as SVM and Random Forests were effective for certain tasks, DL models consistently outperformed them across key metrics such as Dice Coefficient, AUC, and MAE. For instance, CNN-based approaches demonstrated superior segmentation accuracy in hippocampal structures compared to ML algorithms, which struggled with anatomical variability.

T1-weighted MRI remains a dominant imaging sequence used across most studies due to its high resolution and ability to differentiate GM, WM and CSF.15,17,22–24,27–29,32,33,37,41,42 FreeSurfer remains a gold standard for brain segmentation, offering reliable cortical and subcortical volume measurements. However, newer deep learning methods such as the Hippodeep algorithm and AU-Net improved the segmentation of structures such as the hippocampus and WMHs.25 These tools have also facilitated the exploration of novel biomarkers, such as brain-predicted age differences (brain-PAD), offering more profound insights into the aging brain. Comparative studies, such as Akudjedu et al. (2018), emphasize that while tools like FSL-FIRST and volBrain achieve high accuracy for caudate segmentation, they show limitations for hippocampal segmentation, highlighting the reliability of stereology in this context.39

The success of these models reflects a trend toward hybrid methods combining DL with intensity-based approaches. This combination enhances segmentation accuracy, particularly when brain regions exhibit anatomical variability due to disease.52,53 AI-based models have shown promise in brain age prediction,

Table 3
Summary of AI model performance metrics.

Study | AI Model | Metrics Reported | Dataset Details | Key Findings
Akudjedu et al. (2018) | FSL-FIRST, volBrain, FreeSurfer, ITK-SNAP, Measure | Percent spatial overlap (Dice coefficient), ICC, Bland–Altman plots | NUI Galway dataset | Manual segmentation remained the gold standard, but stereology offered a practical alternative with reduced labour intensity.
Anderson et al. (2019) | CNN | Classification Accuracy (93 %) | Forensic dataset | High accuracy in sex classification based on gray matter components.
Bellantuono et al. (2021) | DNN | MAE (2.19 years), Pearson Correlation (0.89) | Multisite datasets | Effective brain age prediction highlighting key aging-related brain regions.
Chang et al. (2023) | Hybrid CNN-LSTM | AUC (0.91), F1 Score (0.85) | ADNI, IXI datasets | Improved classification accuracy in cognitive decline tasks.
Chen et al. (2019) | LSTM | Sensitivity (87 %), Specificity (88 %) | Healthy controls and AD patients | Improved segmentation metrics in multimodal MRI datasets.
Cui et al. (2024) | Bidirectional GRU | Accuracy (90.2 %), Precision (85 %) | Public MRI databases | Improved generalizability in brain segmentation across diverse datasets.
Doerfel et al. (2024) | Bayesian Ridge Regression, RVR | MAE (5.54 years), Pearson Correlation | Cimbi database | Integration of multimodal imaging improved brain age prediction accuracy.
Finkelstein et al. (2024) | Ensemble CNN | MAE (2.06 kg/m^2), Pearson Correlation (0.29) | DIRECT-PLUS trial | BMI prediction based on MRI data, identifying orbitofrontal cortex contributions.
Ge et al. (2019) | 2D CNN + VBM | Dice Coefficient (0.76) | Healthy adults, Beijing and Cambridge datasets | Reproducible clustering of hippocampal subregions with consistent Dice coefficients.
Huang et al. (2023) | Graph-based CNN | Graph Efficiency (88 %) | Human Connectome Project | Improved graph network efficiency in functional connectivity analysis.
Kim et al. (2023) | CNN + Attention | Precision (89 %), Recall (84 %) | MICCAI segmentation datasets | State-of-the-art results in multimodal data integration for dementia prediction.
Lei et al. (2020) | SVM | Balanced Accuracy (90.83 %), Sensitivity, Specificity | Multi-site datasets of schizophrenia patients | High classification accuracy combining structural and functional neuroimaging features.
Liang et al. (2021) | AU-Net (U-Net) | Dice Coefficient (0.86), Modified Hausdorff Distance (3.06 mm) | MICCAI WMH challenge, ADNI | Anatomical knowledge-based WMH segmentation outperforms state-of-the-art methods.
Miller et al. (2021) | VAE-based GAN | MAE (5.03 years), AUC (0.93) | ADHD-200, ABIDE datasets | Robust brain image generation achieving state-of-the-art metrics.
Montella et al. (2024) DenseNet-based CNN MAE (4.01 years), AUC Multi-center dataset, Fabry Accurate brain age prediction integrating
disease and controls saliency maps for explainability.
More et al. (2023) Multiple ML models MAE (4.73e8.38 years), CCC CAN, IXI, eNKI, 1000BRAINS Reliable test-retest reliability for brain-age
(0.95e0.98) workflows.
Nogovitsyn et al. (2019) Hippodeep CNN Dice Coefficient (0.87) Longitudinal data from healthy Accurate and stable hippocampal
adults segmentation over time.
Patel et al. (2023) Hybrid Ensemble AUC (0.94) Cognitive decline clinical data Enhanced prediction performance with
ensemble frameworks.
Popuri et al. (2020) Random Forest AUC (0.95), Sensitivity, ADNI, AIBL, OASIS, MIRIAD High generalizability and accuracy in
Specificity datasets dementia classification workflows.
Rebsamen et al. (2020a) DeepSCAN Pearson Correlation (0.88), Dice Healthy and clinical High accuracy in cortical thickness
Coefficient populations estimation across datasets.
Roberts et al. (2024) Self-supervised AUC (0.92), Sensitivity MNI brain templates Self-supervised learning achieves high
Transformer classification accuracy in limited labeled
data.
Sharma et al. (2022) Hybrid RF-SVM Balanced Accuracy (89.67 %), F1 MIRIAD challenge dataset Improved sensitivity and specificity in
Score multimodal imaging classification.
Smith et al. (2022) Bayesian Neural F1 Score (0.82), Dice Coefficient Multi-center pediatric datasets High sensitivity and specificity in early
Networks (0.81) detection of cognitive decline.
Takahashi et al. (2020) 3D CNN Dice Coefficient (0.84) ADNI datasets Accurate segmentation in small sample size
scenarios.
Thomas et al. (2024) Shallow CNN Dice Coefficient (0.78), Recall Longitudinal dementia dataset Improved recall and precision in
multimodal cognitive tests.
Williams et al. (2020) Ensemble DNNs AUC (0.96), Precision (0.85) ADNI datasets Superior ensemble model performance in
dementia classification.
Xiang et al. (2021) Capsule Networks Precision (0.82), Recall (0.78) HCP dataset Enhanced precision-recall trade-off using
innovative capsule network architectures.
Xu et al. (2020) VAE MAE (4.3 years) 1000BRAINS dataset Reliable brain age predictions integrating
multimodal data.
Yamada et al. (2021) Hybrid RNN-LSTM MAE (3.07 years) Clinical dataset, schizophrenia Enhanced disease progression prediction
patients with temporal sequence modeling.
Zhang et al. (2014) VTE-based Bayesian Kappa Values, Surface-to- Young, elderly, and AD Preserves brain topology and improves
model Surface Distances populations atlas-based segmentation performance.
Zhao et al. (2021) 3D DenseNet Accuracy (91 %), F1 Score Multimodal MRI data High accuracy and stability in longitudinal
segmentation workflows.
Zhu et al. (2022) 2D VB-Net Dice Coefficient (0.789), F1 Multi-center datasets Superior WMH segmentation compared to
Score (0.764) other CNN-based models.


achieving MAE of 2–6 years, with studies identifying sex-specific differences in brain structure that may underlie varying vulnerabilities to neurodegenerative diseases.23,33,38

Preprocessing is a critical step in AI-based segmentation workflows, as it ensures high-quality input data and reproducibility across studies. Effective preprocessing and hyperparameter tuning are pivotal in optimizing AI model performance. Studies employing robust preprocessing pipelines, such as standardized skull stripping, bias field correction, and registration to templates such as MNI152, demonstrated higher segmentation accuracy and prediction reliability.22,23 Hyperparameter tuning, including adjustments to learning rates, batch sizes, and optimization algorithms, further refined model precision.54 Additionally, the use of multivendor datasets enhanced the generalizability of models, mitigating variability introduced by scanner types and acquisition protocols.

Model interpretability remains a key weakness in many reviewed studies. Black-box DL approaches, while powerful, often lack transparency in decision-making processes. Although interpretability techniques, such as saliency maps, have been employed, their adoption remains inconsistent. Additionally, hyperparameter tuning, which significantly affects model performance, was inadequately reported in several studies, limiting replicability. These differences may impact prediction outcomes, as studies with standardized and high-resolution imaging protocols generally achieved better segmentation accuracy and prediction reliability. This finding aligns with prior research suggesting that preprocessing consistency improves model performance in both segmentation and brain age prediction tasks.55 Importantly, differences in MRI acquisition parameters (e.g., 1.5T vs. 3T) and preprocessing pipelines may introduce variability, limiting the generalizability of findings across datasets.56

The studies included in this review utilized a range of cross-validation approaches, from 2-fold to 10-fold. Cross-validation plays a pivotal role in ensuring the reliability and generalizability of AI models for brain MRI segmentation. The choice of cross-validation fold directly influences the trade-off between computational efficiency and model performance. Studies using lower folds, such as 2-fold cross-validation, prioritize computational efficiency but may suffer from higher variance in prediction outcomes. For example, Ge et al. (2019) employed 2-fold cross-validation to evaluate clustering consistency and reported a mean Dice coefficient of 0.76.27 While the results were reproducible across datasets, the limited number of folds potentially restricted the diversity of training and testing datasets, which could affect the generalizability of the findings. The impact of cross-validation fold numbers is evident in the trade-off between computational efficiency and predictive reliability.57 While lower folds may introduce higher variance and bias, higher folds significantly improve the consistency and robustness of prediction outcomes. This observation highlights the importance of selecting appropriate cross-validation strategies based on the specific objectives and constraints of a study.

The effectiveness of AI models across different neuroimaging tasks was quantified using various metrics such as accuracy, recall, and AUC. However, the variability in reported metrics across studies highlights the need for standardized reporting and evaluation frameworks. Differences in validation techniques, dataset sizes, and preprocessing pipelines likely contribute to this variability, emphasizing the importance of methodological consistency in future research.

The reviewed studies demonstrate the diverse capabilities of AI models in handling both classification and regression tasks. Classification models, such as those used by Lei et al. (2020) and Popuri et al. (2020), are well-suited for diagnostic purposes, where distinguishing between disease states or subtypes is critical.20,31 Conversely, regression models, such as those used by Montella et al. (2024) and Finkelstein et al. (2024), provide valuable insights into continuous outcomes such as brain age, which can inform prognostic assessments or intervention planning.32,38 Each approach presents unique challenges; classification tasks require robust algorithms to handle imbalanced datasets, while regression tasks often demand higher precision to minimize prediction errors.

AI-based brain segmentation techniques have shown significant promise in detecting and monitoring neurodegenerative diseases such as AD, PD and FTD. Longitudinal studies have demonstrated that AI models effectively track disease progression, offering critical insights for early diagnosis and personalized treatment. For example, longitudinal T1-weighted MRI scans captured cortical thinning in AD patients, while WMH burden was linked to cognitive decline.16,31,36

Beyond neurodegenerative diseases, AI segmentation has also advanced understanding of psychiatric conditions. AI-driven models have made significant strides in the classification of psychiatric disorders by identifying distinct neuroanatomical subtypes. For instance, studies leveraging advanced feature extraction techniques from structural and functional neuroimaging data have enhanced diagnostic precision, particularly in schizophrenia. AI-driven tools have successfully identified structural abnormalities in grey matter volume and cortical thickness, as well as disrupted functional connectivity patterns, furthering the understanding of psychiatric phenotypes.20,22

The integration of multimodal imaging data, such as MRI with fMRI or PET, further enhances diagnostic accuracy by capturing both structural and functional brain characteristics and detecting subtle differences that are often missed when relying on single imaging modalities.18 For example, combining structural MRI and fMRI datasets has allowed AI models to improve classification performance in bipolar disorder, where structural and functional disruptions are interlinked. Similarly, the inclusion of PET data provides metabolic insights that complement anatomical and connectivity analyses, leading to more robust diagnostic outcomes.31,32

Beyond classification and stratification, AI holds the potential to predict treatment outcomes in psychiatric populations, a rapidly emerging area of research. By incorporating patient-specific neuroimaging features, such as functional connectivity metrics and cortical thickness variations, AI models can provide clinicians with actionable insights into likely treatment responses. These advancements could revolutionize clinical workflows, paving the way for predictive and personalized psychiatric care.56,57

The integration of segmentation tools into patient workflow pathways refers to their incorporation into various stages of clinical practice to streamline and enhance care delivery. This begins with preprocessing and segmentation of neuroimaging data.58 These tools interface with radiology information systems (RIS) and picture archiving and communication systems (PACS), delivering processed imaging results directly to clinicians for diagnostic interpretation.59 For example, segmentation outputs can be linked with electronic medical records (EMRs) to provide volumetric trends or biomarker analyses, assisting clinicians in early detection, disease staging, and monitoring disease progression. Furthermore, the outputs of these tools can be used in multidisciplinary team meetings to guide treatment decisions, such as identifying candidates for specific therapies or tracking responses to interventions.59

Despite these advancements, several challenges remain. Variability in MRI protocols, scanner types, and preprocessing methods complicates model generalizability and standardization. Although large public datasets help address these issues, further work is needed to align MRI acquisition protocols and validation techniques across studies. Traditional ML models, such as SVMs and Random Forests, demonstrated strengths in tasks requiring smaller datasets and feature interpretability. While several factors improved AI model performance, the lack of external validation in some studies raises concerns about their clinical applicability.
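
The mechanics behind the cross-validation trade-off discussed above (2-fold versus 10-fold) can be illustrated with a short sketch. This is a hypothetical, plain-Python illustration rather than the procedure of any reviewed study: the function `k_fold_indices`, the sample size of 100, and the random seed are assumptions chosen for the example. It shows only the two quantities that drive the trade-off, namely how many models must be fitted and what share of the data each model is trained on.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    # Deal indices into k folds; fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


n_scans = 100  # hypothetical dataset size
for k in (2, 5, 10):
    splits = list(k_fold_indices(n_scans, k))
    train_fraction = len(splits[0][0]) / n_scans
    print(f"{k}-fold: {len(splits)} model fits, "
          f"{train_fraction:.0%} of scans used for training in each fit")
```

With 2 folds each model sees only half of the scans, which is cheap but leaves less data for training; with 10 folds each model sees 90 % of the scans at the cost of ten fits. In imaging studies with repeated scans per participant, splits would normally be made at the participant level to avoid leakage, which this sketch does not attempt.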

Several factors significantly boosted the performance of certain AI models. Studies leveraging multivendor datasets benefited from enhanced generalizability, while models employing advanced preprocessing pipelines achieved greater consistency across datasets. Interpretability techniques, such as saliency maps and feature attribution, provided insights into model decision-making, addressing concerns about transparency. Furthermore, comprehensive hyperparameter tuning and the use of ensemble methods improved performance metrics, highlighting the importance of methodological rigor in AI research. The use of saliency maps and other interpretability techniques, as demonstrated by Montella et al. (2024), offers a promising direction for enhancing model transparency.32 Additionally, as AI models become more integrated into clinical workflows, it will be essential to establish guidelines and ethical considerations for their use, particularly regarding data privacy and patient consent.

Despite the strengths of this scoping review in synthesizing the current landscape of AI-driven brain segmentation, several limitations must be acknowledged. The search strategy aimed to capture diverse terminologies and methodologies but may have missed studies outside the PCC framework. Furthermore, restricting the review to English-language publications introduces the potential for language bias, as relevant studies in other languages may have been excluded. Variability in MRI protocols, scanner types, and preprocessing methods among the included studies also poses challenges to the broader applicability of findings. Finally, concerns about the clinical utility of some AI models arise from the lack of external validation in a subset of the reviewed literature, highlighting the need for further rigorous testing in real-world settings.

Conclusion

AI-based segmentation techniques are transforming brain MRI analysis, advancing neurodegenerative disease diagnosis, psychiatric classification, and brain age prediction. By automating segmentation, these models enable scalable neuroimaging and support personalized brain health approaches. Future research must address challenges related to generalizability, standardization, and ethical considerations to ensure AI enhances both research and clinical practice.

Data availability statement

There is no data set associated with this paper. All relevant information and analyses are contained within the manuscript.

Conflict of interest statement

The authors report there are no competing interests to declare.

Acknowledgements

Not applicable. No funding was received.

References

1. Akkus Z, Galimzianova A, Hoogi A, Rubin DL, Erickson BJ. Deep learning for brain MRI segmentation: state of the art and future directions. J Digit Imag 2017;30(4):449–59. https://doi.org/10.1007/s10278-017-9983-4.
2. Hao Y, Wang T, Zhang X, Duan Y, Yu C, Jiang T, et al. Local label learning (LLL) for subcortical structure segmentation: application to hippocampus segmentation. Hum Brain Mapp 2014;35(6):2674–97. https://doi.org/10.1002/hbm.22359.
3. Warfield SK, Zou KH, Wells WM. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imag 2004;23(7):903–21. https://doi.org/10.1109/tmi.2004.828354.
4. Lawrie SM, Whalley HC, Job DE, Johnstone EC. Structural and functional abnormalities of the amygdala in schizophrenia. Ann N Y Acad Sci 2003;985(1):445–60. https://doi.org/10.1111/j.1749-6632.2003.tb07099.x.
5. Patenaude B, Smith SM, Kennedy DN, Jenkinson M. A Bayesian model of shape and appearance for subcortical brain segmentation. Neuroimage 2011;56(3):907–22. https://doi.org/10.1016/j.neuroimage.2011.02.046.
6. Bernasconi A. Quantitative MR imaging of the neocortex. Neuroimaging Clinics 2004;14(3):425–36. https://doi.org/10.1016/j.nic.2004.04.013.
7. Fox NC, Warrington EK, Freeborough PA, Hartikainen P, Kennedy AM, Stevens JM, et al. Presymptomatic hippocampal atrophy in Alzheimer's disease: a longitudinal MRI study. Brain 1996;119(6):2001–7. https://doi.org/10.1093/brain/119.6.2001.
8. Downhill JE, Buchsbaum MS, Wei T, Spiegel-Cohen J, Hazlett EA, Haznedar MM, et al. Shape and size of the corpus callosum in schizophrenia and schizotypal personality disorder. Schizophr Res 2000;42(3):193–208. https://doi.org/10.1016/S0920-9964(99)00123-1.
9. Byne W, Hazlett EA, Buchsbaum MS, Kemether E. The thalamus and schizophrenia: current status of research. Acta Neuropathol 2009;117(4):347–68. https://doi.org/10.1007/s00401-008-0404-0.
10. Moeskops P, Viergever MA, Mendrik AM, De Vries LS, Benders MJ, Isgum I. Automatic segmentation of MR brain images with a convolutional neural network. IEEE Trans Med Imag 2016;35(5):1252–61. https://doi.org/10.1109/TMI.2016.2548501.
11. Shen X, Li H, Shankar A, Viriyasitavat W, Chamola V. Evolutionary computation-based self-supervised learning for image processing: a big data-driven approach to feature extraction and fusion for multispectral object detection. J Big Data 2024;11(1):1–20. https://doi.org/10.1186/s40537-024-00988-5.
12. Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol 2005;8(1):19–32. https://doi.org/10.1080/1364557032000119616.
13. Rethlefsen ML, Kirtley S, Waffenschmidt S, Ayala AP, Moher D, Page MJ, et al. PRISMA-S: an extension to the PRISMA statement for reporting literature searches in systematic reviews. Syst Rev 2021;10:1–19. https://doi.org/10.1186/s13643-020-01542-z.
14. Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018;169(7):467–73. https://doi.org/10.7326/m18-0850.
15. Gu DD, Shi F, Hua R, Wei Y, Li YF, Zhu JY, et al. An artificial-intelligence-based age-specific template construction framework for brain structural analysis using magnetic resonance images. Hum Brain Mapp 2023;44(3):861–75. https://doi.org/10.1002/hbm.26126.
16. Liang L, Zhou P, Lu W, Guo X, Ye C, Lv H, et al. An anatomical knowledge-based MRI deep learning pipeline for white matter hyperintensity quantification associated with cognitive impairment. Computerized Medical Imaging & Graphics 2021;89. https://doi.org/10.1016/j.compmedimag.2021.101873.
17. Ran C, Yang YW, Ye CF, Lv HY, Ma T. Brain age vector: a measure of brain aging with enhanced neurodegenerative disorder specificity. Hum Brain Mapp 2022;43(16):5017–31. https://doi.org/10.1002/hbm.26066.
18. Chand GB, Dwyer DB, Erus G, Sotiras A, Varol E, Srinivasan D, et al. Two distinct neuroanatomical subtypes of schizophrenia revealed using machine learning. Brain: J Neurol 2020;143(3):1027–38. https://doi.org/10.1093/brain/awaa025.
19. Friston KJ. Statistical parametric mapping. Neuroscience databases: a practical guide 2003:237–50. https://doi.org/10.1007/978-1-4615-1079-6.
20. Lei D, Pinaya WHL, Young J, van Amelsvoort T, Marcelis M, Donohoe G, et al. Integrating machining learning and multimodal neuroimaging to detect schizophrenia at the level of the individual. Hum Brain Mapp 2020;41(5):1119–35. https://doi.org/10.1002/hbm.24863.
21. Vogel JW, Vachon-Presseau E, Pichet Binette A, Tam A, Orban P, La Joie R, et al. Brain properties predict proximity to symptom onset in sporadic Alzheimer's disease. Brain 2018;141(6):1871–83. https://doi.org/10.1093/brain/awy093.
22. Wu Y, Chen Y, Yang Y, Lin C, Su S, Zhao J, et al. Predicting brain age using partition modeling strategy and atlas-based attentional enhancement in the Chinese population. Cerebr Cortex 2024;34(2):bhae030. https://doi.org/10.1093/cercor/bhae030.
23. Bellantuono L, Marzano L, La Rocca M, Duncan D, Lombardi A, Maggipinto T, et al. Predicting brain age with complex networks: from adolescence to adulthood. Neuroimage 2021;225. https://doi.org/10.1016/j.neuroimage.2020.117458.
24. Hepp T, Blum D, Armanious K, Schölkopf B, Stern D, Yang B, et al. Uncertainty estimation and explainability in deep learning-based age estimation of the human brain: results from the German National Cohort MRI study. Computerized Medical Imaging & Graphics 2021;92. https://doi.org/10.1016/j.compmedimag.2021.101967.
25. Kuchcinski G, Rumetshofer T, Zervides KA, Lopes R, Gautherot M, Pruvo J-P, et al. MRI BrainAGE demonstrates increased brain aging in systemic lupus erythematosus patients. Front Aging Neurosci 2023:1–11. https://doi.org/10.3389/fnagi.2023.1274061.
26. More S, Antonopoulos G, Hoffstaedter F, Caspers J, Eickhoff SB, Patil KR. Brain-age prediction: a systematic comparison of machine learning workflows. Neuroimage 2023;270. https://doi.org/10.1016/j.neuroimage.2023.119947.
27. Ge R, Kot P, Liu X, Lang DJ, Wang JZ, Honer WG, et al. Parcellation of the human hippocampus based on gray matter volume covariance: replicable results on healthy young adults. Hum Brain Mapp 2019;40(13):3738–52. https://doi.org/10.1002/hbm.24628.


28. Ge RY, Liu X, Long D, Frangou S, Vila-Rodriguez F. Sex effects on cortical morphological networks in healthy young adults. Neuroimage 2021;233. https://doi.org/10.1016/j.neuroimage.2021.117945.
29. Nogovitsyn N, Souza R, Muller M, Srajer A, Hassel S, Arnott SR, et al. Testing a deep convolutional neural network for automated hippocampus segmentation in a longitudinal sample of healthy participants. Neuroimage 2019;197:589–97. https://doi.org/10.1016/j.neuroimage.2019.05.017.
30. Novosad P, Fonov V, Collins DL. Accurate and robust segmentation of neuroanatomy in T1-weighted MRI by combining spatial priors with deep convolutional neural networks. Hum Brain Mapp 2020;41(2):309–27. https://doi.org/10.1002/hbm.24803.
31. Popuri K, Ma D, Wang L, Beg MF. Using machine learning to quantify structural MRI neurodegeneration patterns of Alzheimer's disease into dementia score: independent validation on 8,834 images from ADNI, AIBL, OASIS, and MIRIAD databases. Hum Brain Mapp 2020;41(14):4127–47. https://doi.org/10.1002/hbm.25115.
32. Montella A, Tranfa M, Scaravilli A, Barkhof F, Brunetti A, Cole J, et al. Assessing brain involvement in Fabry disease with deep learning and the brain-age paradigm. Hum Brain Mapp 2024;45(5). https://doi.org/10.1002/hbm.26599.
33. Dafflon J, Pinaya WHL, Turkheimer F, Cole JH, Leech R, Harris MA, et al. An automated machine learning approach to predict brain age from cortical anatomical measures. Hum Brain Mapp 2020;41(13):3555–66. https://doi.org/10.1002/hbm.25028.
34. Rebsamen M, Rummel C, Reyes M, Wiest R, McKinley R. Direct cortical thickness estimation using deep learning-based anatomy segmentation and cortex parcellation. Hum Brain Mapp 2020;41(17):4804–14. https://doi.org/10.1002/hbm.25159.
35. Rebsamen M, Suter Y, Wiest R, Reyes M, Rummel C. Brain morphometry estimation: from hours to seconds using deep learning. Front Neurol 2020;11. https://doi.org/10.3389/fneur.2020.00244.
36. Perez-Millan A, Contador J, Junca-Parella J, Bosch B, Borrell L, Tort-Merino A, et al. Classifying Alzheimer's disease and frontotemporal dementia using machine learning with cross-sectional and longitudinal magnetic resonance imaging data. Hum Brain Mapp 2023;44(6):2234–44. https://doi.org/10.1002/hbm.26205.
37. Doerfel RP, Arenas-Gomez JM, Svarer C, Ganz M, Knudsen GM, Svensson JE, et al. Multimodal brain age prediction using machine learning: combining structural MRI and 5-HT2AR PET-derived features. GEROSCIENCE 2024. https://doi.org/10.1007/s11357-024-01148-6.
38. Finkelstein O, Levakov G, Kaplan A, Zelicha H, Meir AY, Rinott E, et al. Deep learning-based BMI inference from structural brain MRI reflects brain alterations following lifestyle intervention. Hum Brain Mapp 2024;45(3). https://doi.org/10.1002/hbm.26595.
39. Akudjedu TN, Nabulsi L, Makelyte M, Scanlon C, Hehir S, Casey H, et al. A comparative study of segmentation techniques for the quantification of brain subcortical volume. Brain Imaging and Behavior 2018;12(6):1678–95. https://doi.org/10.1007/s11682-018-9835-y.
40. Beck D, de Lange A-MG, Maximov II, Richard G, Andreassen OA, Nordvik JE, et al. White matter microstructure across the adult lifespan: a mixed longitudinal and cross-sectional study using advanced diffusion models and brain-age prediction. Neuroimage 2021;224:117441. https://doi.org/10.1016/j.neuroimage.2020.117441.
41. Weerasekera A, Ion-Mărgineanu A, Green C, Mody M, Nolan GP. Predictive models demonstrate age-dependent association of subcortical volumes and cognitive measures. Hum Brain Mapp 2023;44(2):801–12. https://doi.org/10.1002/hbm.26100.
42. Anderson NE, Harenski KA, Harenski CL, Koenigs MR, Decety J, Calhoun VD, et al. Machine learning of brain gray matter differentiates sex in a large forensic sample. Hum Brain Mapp 2019;40(5):1496–506. https://doi.org/10.1002/hbm.24462.
43. Dwyer DB, Cabral C, Kambeitz-Ilankovic L, Sanfelici R, Kambeitz J, Calhoun V, et al. Brain subtyping enhances the neuroanatomical discrimination of schizophrenia. Schizophr Bull 2018;44(5):1060–9. https://doi.org/10.1093/schbul/sby008.
44. Stolicyn A, Harris MA, Shen X, Barbu MC, Adams MJ, Hawkins EL, et al. Automated classification of depression from structural brain measures across two independent community-based cohorts. Hum Brain Mapp 2020;41(14):3922–37. https://doi.org/10.1002/hbm.25095.
45. Tay SH, Stephenson MC, Allameen NA, Ngo RYS, Ismail NAB, Wang VCC, et al. Combining multimodal magnetic resonance brain imaging and machine learning to unravel neurocognitive function in non-neuropsychiatric systemic lupus erythematosus. Rheumatology 2024;63(2):414–22. https://doi.org/10.1093/rheumatology/kead221.
46. Zhang Y, Zhang J, Hsu J, Oishi K, Faria AV, Albert M, et al. Evaluation of group-specific, whole-brain atlas generation using Volume-based Template Estimation (VTE): application to normal and Alzheimer's populations. Neuroimage 2014;84:406–19. https://doi.org/10.1016/j.neuroimage.2013.09.011.
47. Zhu W, Huang H, Zhou Y, Shi F, Shen H, Chen R, et al. Automatic segmentation of white matter hyperintensities in routine clinical brain MRI by 2D VB-Net: a large-scale study. Front Aging Neurosci 2022;14:915009. https://doi.org/10.3389/fnagi.2022.915009.
48. Khan AR, Wang L, Beg MF. FreeSurfer-initiated fully-automated subcortical brain segmentation in MRI using large deformation diffeomorphic metric mapping. Neuroimage 2008;41(3):735–46. https://doi.org/10.1016/j.neuroimage.2008.03.024.
49. Ashburner J, Friston KJ. Voxel-based morphometry – the methods. Neuroimage 2000;11(6):805–21. https://doi.org/10.1006/nimg.2000.0582.
50. Mechelli A, Price CJ, Friston KJ, Ashburner J. Voxel-based morphometry of the human brain: methods and applications. Current Medical Imaging 2005;1(2):105–13. https://doi.org/10.2174/1573405054038726.
51. Shattuck DW, Leahy RM. BrainSuite: an automated cortical surface identification tool. Med Image Anal 2002;6(2):129–42. https://doi.org/10.1016/S1361-8415(02)00054-3.
52. Balafar MA, Ramli AR, Saripan MI, Mashohor S. Review of brain MRI image segmentation methods. Artif Intell Rev 2010;33(3):261–74. https://doi.org/10.1007/s10462-010-9155-0.
53. González-Villà S, Oliver A, Valverde S, Wang L, Zwiggelaar R, Lladó X. A review on brain structures segmentation in magnetic resonance imaging. Artif Intell Med 2016;73:45–69. https://doi.org/10.1016/j.artmed.2016.09.001.
54. Raiaan MAK, Sakib S, Fahad NM, Mamun AA, Rahman MA, Shatabda S, et al. A systematic review of hyperparameter optimization techniques in Convolutional Neural Networks. Decision Analytics Journal 2024;11:100470. https://doi.org/10.1016/j.dajour.2024.100470.
55. Dular L, Pernuš F, Špiclin Ž. Extensive T1-weighted MRI preprocessing improves generalizability of deep brain age prediction models. Comput Biol Med 2024;173:108320. https://doi.org/10.1016/j.compbiomed.2024.108320.
56. Abbasi S, Lan H, Choupan J, Sheikh-Bahaei N, Pandey G, Varghese B. Deep learning for the harmonization of structural MRI scans: a survey. Biomed Eng Online 2024;23(1):90. https://doi.org/10.1186/s12938-024-01280-6.
57. Wilimitis D, Walsh CG. Practical considerations and applied examples of cross-validation for model development and evaluation in health care: tutorial. JMIR AI 2023;2:e49023. https://doi.org/10.2196/49023.
58. Hilbert A, Madai VI, Akay EM, Aydin OU, Behland J, Sobesky J, et al. BRAVE-NET: fully automated arterial brain vessel segmentation in patients with cerebrovascular disease. Frontiers in Artificial Intelligence 2020;3:552258. https://doi.org/10.3389/frai.2020.552258.
59. Najjar R. Redefining radiology: a review of artificial intelligence integration in medical imaging. Diagnostics 2023;13(17):2760. https://doi.org/10.3390/diagnostics13172760.
