
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2021.3083187, IEEE Journal of Biomedical and Health Informatics

IEEE JOURNAL ON BIOMEDICAL AND HEALTH INFORMATICS, VOL. XX, NO. XX, XXXX 2021

Predicting brain age using machine learning algorithms: A comprehensive evaluation

Iman Beheshti, M.A. Ganaie, Vardhan Paliwal, Aryan Rastogi, Imran Razzak, M. Tanveer∗

Abstract— Machine learning (ML) algorithms play a vital role in brain age estimation frameworks. The impact of regression algorithms on prediction accuracy in brain age estimation frameworks has not been comprehensively evaluated. Here, we sought to assess the efficiency of different regression algorithms for brain age estimation. To this end, we built a brain age estimation framework based on a large set of cognitively healthy (CH) individuals (N = 788) as a training set, followed by different regression algorithms (22 different algorithms in total). We then quantified each regression algorithm on independent test sets composed of 88 CH individuals, 70 mild cognitive impairment patients, and 30 Alzheimer's disease patients. The prediction accuracy in the independent test set (i.e., the CH set) varied across regression algorithms, with mean absolute error (MAE) ranging from 4.63 to 7.14 yrs and R2 from 0.76 to 0.88. The highest and lowest prediction accuracies were achieved by the Quadratic Support Vector Regression algorithm (MAE = 4.63 yrs, R2 = 0.88, 95% CI = [−1.26, 1.42]) and the Binary Decision Tree algorithm (MAE = 7.14 yrs, R2 = 0.76, 95% CI = [−1.50, 2.62]), respectively. Our experimental results demonstrate that the prediction accuracy of brain age frameworks is affected by the choice of regression algorithm, indicating that advanced machine learning algorithms can lead to more accurate brain age predictions in clinical settings.

Index Terms— Brain age, estimation, machine learning, algorithms, regression, T1-weighted MRI.

∗ Corresponding author.
Iman Beheshti is with the Department of Human Anatomy and Cell Science, Rady Faculty of Health Sciences, Max Rady College of Medicine, University of Manitoba, Winnipeg, MB, Canada (e-mail: [email protected]).
M.A. Ganaie and M. Tanveer are with the Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, 453552, India (e-mail: [email protected]; [email protected]).
Vardhan Paliwal and Aryan Rastogi are with the Department of Electrical Engineering, Indian Institute of Technology Indore, Simrol, Indore, 453552, India (e-mail: [email protected]; [email protected]).
Imran Razzak is with the School of Information Technology, Deakin University, Geelong, Australia (e-mail: [email protected]).

I. INTRODUCTION

Recent times have witnessed an increased interest in the brain age-delta as a heritable metric for monitoring cognitively healthy (CH) aging and diagnosing various neurological disorders and co-morbidities [1]. The brain age-delta is defined as the difference between the age predicted by machine learning models trained on brain-imaging data and the chronological age. The brain shrinks with increasing age, and there are changes at all levels, from molecules to morphology. A brain age-delta equal to zero indicates a 'healthy aging trajectory', whereas a large brain age-delta is indicative of 'accelerative cognitive aging', pointing to a higher risk of age-related neurological diseases or abnormal brain changes for a given age [2]. To date, the brain age metric has been successfully used in the context of different neurological disorders such as Alzheimer's disease (AD) [3], [4], Parkinson's disease [5], epilepsy [6], and schizophrenia [7]. A summary of brain age estimation studies in the context of clinical application is presented in [1].

The prediction accuracy of a brain age estimation framework depends on several components, such as the feature extraction method, data reduction strategy, bias correction method, and regression algorithm. In the context of feature extraction, various neuroimaging modalities such as anatomical MRI [1], [8], [9], functional MRI [10], fluorodeoxyglucose positron emission tomography imaging [3], and diffusion tensor imaging [10] can be used to extract brain imaging features after the respective preprocessing stage. Among these modalities, anatomical MRI is the most frequently used in brain age studies because of its widespread availability, excellent spatial resolution, and good tissue contrast. When the number of extracted features is larger than the number of samples, a data reduction technique such as principal component analysis (PCA) can be used to avoid the curse of dimensionality [5]. In the prediction stage, a supervised learning technique (i.e., a regression algorithm) is used to predict the brain age values for the given input data.

The prediction model in a brain age estimation framework is vital for accurately predicting brain age values in clinical applications. The most widely used regression algorithms include Gaussian process regression [11], [12] and support vector regression [4], [6], [8]. When choosing a regression algorithm for brain age estimation, the following points should be considered:
• The algorithm should be accurate and sensitive to the various data points in the training data. Generally, the performance of such models is measured on the basis of the Mean Absolute Error (MAE) between the predicted age and the chronological age.
• The chosen algorithm should be able to capture naturally occurring variation, such as that caused by genetic factors. Many aspects of brain aging and susceptibility to age-related brain disease are thought to be under genetic influence; hence, the model should be capable of "learning" these variations.
• The algorithm should produce reliable results across different datasets and patient groups.
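The two quantities above, the brain age-delta and the MAE, can be made concrete with a small sketch. This is illustrative only, with made-up ages, and is not the study's code (which was written in MATLAB):

```python
# Illustrative sketch (hypothetical ages, not the study's data):
# brain age-delta = predicted brain age minus chronological age;
# MAE = average absolute gap between the two, in years.

def brain_age_delta(predicted, chronological):
    """Brain age-delta: predicted brain age minus chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]

def mean_absolute_error(predicted, chronological):
    """MAE between predicted and chronological ages, in years."""
    return sum(abs(p - c) for p, c in zip(predicted, chronological)) / len(predicted)

chronological = [25.0, 40.0, 63.0, 71.0]   # hypothetical chronological ages
predicted     = [27.5, 38.0, 66.0, 78.0]   # hypothetical model outputs

print(brain_age_delta(predicted, chronological))     # [2.5, -2.0, 3.0, 7.0]
print(mean_absolute_error(predicted, chronological))  # 3.625
```

A positive delta, as reported later for the MCI and AD groups, means the model judges a brain "older" than its chronological age.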

2168-2194 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

To date, few brain age studies have addressed the effects of regression algorithms on prediction accuracy in brain age estimation frameworks [4], [13]. For instance, Valizadeh and colleagues [13] investigated six statistical regression algorithms (random forest, multiple linear regression, neural network, ridge regression, k-nearest neighbors, and support vector machine) for brain age prediction based on brain anatomical measurements (e.g., thicknesses, volumes, and cortical surfaces) among CH individuals. They reported the best results for the Neural Network and Support Vector Machine (SVM) based algorithms (R2 = 0.84) over the entire dataset. The most significant issue raised in [13] was that the effects of different regression algorithms should be assessed at the clinical level (i.e., by testing on clinical populations). In order to address this issue, we conducted this study to comprehensively assess the brain age prediction results obtained with various salient regression techniques (22 different algorithms in total), not only on CH individuals but also in a clinical population (i.e., in the context of neurodegeneration, such as that due to AD). We also adjudge the best-performing regression technique for this task, and discuss future work needed in this direction.

II. SUBJECTS AND METHODS

A. Subjects
In total, 976 participants from the IXI dataset (http://www.brain-development.org/ixi-dataset/) and the OASIS dataset (http://www.oasis-brains.org/) were included in this study: 876 CH individuals, mild cognitive impairment (MCI) patients (N = 70, mean age ± SD: 76.21 ± 7.18 years, age range: 62–92 years), where SD is the standard deviation, and probable AD patients (N = 30, mean age ± SD: 78.03 ± 6.91 years, age range: 65–96 years). To build the brain age estimation framework, we randomly used 90% of the CH individuals (N = 788, mean age ± SD: 47.40 ± 19.69 years, age range: 18–94 years) as the training set. The independent test sets were composed of the remainder of the CH individuals (10%, N = 88, mean age ± SD: 48.17 ± 17.73 years, age range: 20–87 years), the MCI patients, and the AD patients. In terms of chronological age, there were no significant differences between CH individuals in the training set and the test set (t = 0.35, P = 0.72), or between MCI patients and AD patients (t = 1.17, P = 0.24).

B. MRI Processing
All T1-weighted brain MRI scans were preprocessed using the Statistical Parametric Mapping toolbox version 12 (http://www.fil.ion.ucl.ac.uk/spm/software/spm12/) and the CAT12 package (http://dbm.neuo.uni-jena.de) with the default set of parameters. In summary, MRI processing included bias-field distortion correction, removal of non-brain tissue, normalization to standard space, and brain tissue segmentation into gray matter (GM), white matter, and cerebrospinal fluid components. In this study, only the GM images were used. The GM images were smoothed using a 4-mm full-width-at-half-maximum Gaussian kernel and then re-sampled to 8-mm isotropic spatial resolution. For each subject, the GM signal intensities extracted from the whole brain (i.e., a total of 3,747 voxels) were considered as features for the regression models.

C. Regression Algorithms
The regression algorithms evaluated in this study for brain age estimation are explained below:
1) Linear Regression: Linear Regression models the relationship between a scalar (dependent variable) and one or more explanatory variables (independent variables). This relationship is modeled using a linear predictor function whose unknown model parameters are estimated from the data. Linear regression models are often fitted using the least-squares approach, but they can also be fitted in other ways.
2) Support Vector Regression: Drucker et al. [14] initially proposed Support Vector Regression (SVR), a supervised learning algorithm based on the support vector concept of Vapnik and co-workers [15]. SVR aims at curtailing the error by determining the hyperplane that minimizes the gap between predicted values and true labels. SVR performs better in many cases, is flexible in dealing with the geometry and generalization of the data, and provides additional kernel functionality. This added functionality enhances the model's capacity for prediction by considering the quality of the features.
3) Binary Decision Tree: A Binary Decision Tree [16] is a supervised machine learning method that operates by subjecting attributes to a series of binary decisions. Each decision leads to one of two possibilities: either another decision, or a prediction. In a regression tree, a regression model is fitted to the target variable using each of the independent variables. The data is then split at several points for each independent variable. At each such point, the error between the predicted and actual values is squared to obtain the Sum of Squared Errors (SSE). The SSE is compared across the variables, and the variable or point with the lowest SSE is chosen as the split point. This process continues recursively to finally predict the output value.
4) Ensemble Trees: Ensemble learning leads to better performance than individual learners [17], [18]. Ensemble Trees combine several decision trees to produce better predictive performance than a single decision tree. The ensemble model's main principle is that a group of weak learners come together to form a strong learner. Popular techniques used to build tree ensembles are:
• Bagging [19]: used when the goal is to reduce the variance of a decision tree. Several subsets of the data are created from the training sample, chosen randomly with replacement. Each subset is then used to train a decision tree. As a result, we end up with an ensemble of different models. The average of all the predictions from the different trees is used as the


final prediction. Ensembling leads to more robust predictions than a single decision tree.
• Least-Squares: Least-squares boosting (LSBoost) fits regression ensembles. At every step, the ensemble fits a new regression learner to the difference between the observed response and the aggregated prediction of all learners grown previously. The ensemble fits to minimize the mean-squared error.
5) Epsilon Twin Support Vector Regression (ETSVR): ETSVR is the ε-twin support vector regression (ε-TSVR) developed in [20], based on TSVR (Twin Support Vector Regression). ε-TSVR determines a pair of ε-insensitive proximal functions by solving two related SVM-type problems. The structural risk minimization principle is implemented in ε-TSVR by introducing a regularization term into the primal problems, yielding dual problems that are stable, positive-definite quadratic programming problems, which can improve regression performance. In addition, the successive over-relaxation technique is used to solve the optimization problems and speed up the training procedure. For more details about the variants of twin support vector machines (TSVM) for classification, regression and clustering, interested readers are referred to the recent TSVM survey [21].
6) Lasso Regression: The 'Lasso' method for estimating linear models, proposed by Robert Tibshirani [22], minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant. Owing to the nature of this constraint, it tends to produce some coefficients that are exactly 0 and hence gives interpretable models.
7) Ridge Regression: Ridge regression [23] is a model tuning method used to analyze data suffering from multi-collinearity. This method uses l2 regularization. When multi-collinearity occurs, the least-squares estimates are unbiased but their variances are large, so the predicted values lie far from the actual values. Ridge regression adds a small bias factor to the variables in order to alleviate this problem.
8) Gaussian Processes for Regression: Gaussian Process regression [24] is a non-parametric regression approach. Rather than calculating the probability distribution of the parameters of a specific function, it calculates a probability distribution over all admissible functions that fit the data. Owing to the large variety of kernel functions available for Gaussian processes, it can be applied to a wide range of datasets.
9) Kernel Ridge Regression: Kernel Ridge Regression (KRR) [25] combines ridge regression with the kernel trick. It thus learns a linear function in the space induced by the respective kernel and the data. For non-linear kernels, this corresponds to a non-linear function in the original space. KRR uses squared error loss combined with l2 regularization. Fitting a KRR model can be done in closed form and is typically fast for medium-sized datasets.
10) Nyström Ridge Regression: The Nyström method [26] is a technique for reducing the computational load of kernel methods by replacing the kernel matrix with a low-rank approximation. The approximation is achieved by projecting the data matrix on a subset of data points, resulting in a linear system that is cheaper to solve.
11) Fast Decorrelated Neural Network Ensembles (DNNE): DNNE [27] is an ensemble learning approach [18] that uses Random Vector Functional Link (RVFL) [28], [29] networks as base components, fitted in a negative correlation learning framework. Since RVFL networks do not require learning all their parameters (the basis functions are set randomly), DNNE computes a simple and fast solution for the base RVFL networks' output weights by considering the correlation among the base RVFL networks in the output space and ensuring that it is minimal. Although DNNE encourages diversity among ensemble components by reducing the correlation among their outputs, it still maintains a good overall ensemble accuracy.
12) k-Nearest Neighbors (kNN): The k-Nearest Neighbors algorithm is essentially a non-parametric classification method, which was later extended to regression [30]. Under this algorithm, the closest 'k' samples from the dataset are taken with respect to the object under consideration. The algorithm uses the Euclidean distance to find the nearest neighbors of an object. The output value is the average of the values of the 'k' nearest neighbors. Improvements such as the Weighted Mean rule [31] enhance the accuracy of the vanilla algorithm.
13) Neural Network (NN): Neural Networks (NN) [32] are comprised of node layers: an input layer, one or more hidden layers, and an output layer. In general, each node, or neuron, connects to nodes in the next layer and has an associated weight and threshold. If the output of an individual node is above the specified threshold value, that node is activated, sending information to the next layer of the network. If the output is below that threshold, no information passes to the next layer through that node.
14) Regularized K-Nearest Neighbor based Weighted Twin Support Vector Regression (RKNNWTSVR): RKNNWTSVR [33] is an extremely efficient and simple algorithm which implements the structural risk minimization principle by introducing extra regularization terms in each objective function. This not only alleviates overfitting and improves generalization performance, but also introduces invertibility in the dual formulation. The square of the 2-norm of the vector of slack variables is used to make the objective functions strongly convex. This ensures that only two systems of linear equations need to be solved, which results in low computational cost and improved generalization performance.
15) Lagrangian Twin Support Vector Regression (LTSVR): LTSVR [34] is a linearly convergent algorithm wherein the unknown regressor is obtained


without directly solving the twin QPPs as in SVR and TSVR. This algorithm ensures the strongly convex nature of the objective functions without any non-negativity constraints, and does not require any optimization packages. Compared to TSVR, this algorithm achieves similar or better generalization performance with a reduction in overall training time.

D. Validation, Experimental Setup and Statistical Analysis
The performance of each prediction algorithm was computed using a 10-fold cross-validation strategy on the training set (N = 788). To avoid the curse of dimensionality, data reduction was performed with PCA. The number of principal components was set at 100 per subject for all experiments. Age-dependent bias correction was done as described in [35]. The performance of each prediction model was reported based on the mean absolute error (MAE), root mean squared error (RMSE), coefficient of determination (R2), brain age-delta (∆) (i.e., chronological age subtracted from predicted brain age), and 95% confidence interval (CI). Thereafter, we built the final prediction model using the entire training set (i.e., N = 788) and then applied it to the independent test sets (i.e., 88 CH individuals, 70 MCI patients, 30 AD patients) to estimate the brain ages. The loss function for training was chosen as the MAE. Hyperparameter tuning (wherever applicable) was done by applying a grid search for each tunable parameter over the range specified in the corresponding literature. The statistical comparisons of groups were conducted by an analysis of variance (ANOVA) test followed by post-hoc analyses using Tukey's HSD at a significance level of 5%. All the code and statistical tests were run in MATLAB® (version R2020b) on a machine with an Intel i5-10210U processor and 8 GB of RAM.

III. RESULTS

A. Experimental results for the training set
Referring to Table I, we find that all of the prediction models gave a mean brain age-delta of 0 and a high coefficient of determination R2 (from 0.81 to 0.95), which is consistent with the metrics expected in the training process with the given dataset. The Gaussian SVR in particular returned the highest estimation accuracy (MAE = 3.04 yrs, RMSE = 4.45 yrs, R2 = 0.95, 95% CI = [-0.31, 0.31]), which can be attributed to the model fitting the training data points almost exactly. The models based on Lasso Regression and Ridge Regression also provided a good MAE (of about 4.70 years) on the training set. The Gaussian Regression with the "Squared Exponential" kernel exhibited markedly worse performance (MAE = 7.54 yrs, RMSE = 9.43 yrs, R2 = 0.81, 95% CI = [-0.66, 0.66]) on the training data than the other prediction models.

B. Experimental results on the independent test sets
1) Performance measurements on cognitively healthy individuals: Referring to Table II, the MAE in the CH group ranged from 4.63 to 7.14 yrs. Indeed, most prediction models showed very good prediction accuracy on the CH individuals as an independent test set. The highest prediction accuracy was achieved by the Quadratic SVR (MAE = 4.63 yrs, RMSE = 6.29 yrs, R2 = 0.88, 95% CI = [-1.26, 1.42]). The Binary Decision Tree showed poor performance on this testing set, with larger MAE and RMSE values and a low R2 value (MAE = 7.14 yrs, RMSE = 9.67 yrs, R2 = 0.76, 95% CI = [-1.50, 2.62]). The ideal value of the mean brain age-delta for this set is 0, and all the models under assessment returned values close to zero. Some models, such as the Gaussian SVR, deviated from this expectation, implying that these models underestimated the brain age, despite MAE and RMSE values comparable to those of the other models. The association between brain age-delta and chronological age in the CH set was insignificant for all prediction models (P > 0.05).
2) Performance measurements in the clinical sets: Given that the brain age-delta is the more important measure in the clinical sets, we report only the mean brain age-delta and the respective 95% CI values for these groups (i.e., MCI and AD). Since our models were trained on a dataset of CH individuals, relatively large brain age-delta values were expected for the MCI and AD sets, with AD patients expected to have a greater brain age-delta than MCI patients. The boxplots of the brain age-delta over the different test sets are plotted in Fig. 1. The model labels in the plots correspond to the serial numbers (S.No) of the discussed regression techniques in Tables II-IV. From the plots, we find that the brain age-delta is almost zero for the samples in the CH set, while samples in the clinical sets (i.e., MCI/AD patients) show a relatively positive brain age-delta compared with the CH individuals (MCI, from +2.53 to +11.68 years; AD, from +6.77 to +16.69 years) for all regression models. The median brain age-delta is larger for samples in the AD set than in the MCI set, indicating that the rate of morphological change in the AD group is higher than in the MCI group.
3) Statistical analysis on test sets: A summary of the statistical analysis on the independent test sets in terms of brain age-delta is presented in Table IV. An ANOVA test showed that the brain age-delta differed significantly among the three groups (i.e., CH vs. MCI vs. AD) for all regression models (P < 0.05). As for CH vs. AD, all regression models showed a significantly higher brain age-delta in AD patients than in CH individuals (post-hoc comparisons using the Tukey HSD test, P < 0.05). With the exception of Lasso Regression, Ridge Regression, and Binary Decision Tree, all other models yielded significantly higher brain age-delta values in MCI patients than in CH individuals (post-hoc comparisons using the Tukey HSD test, P < 0.05). Also, most regression models showed higher brain age-delta values in AD patients than in MCI patients (post-hoc comparisons using the Tukey HSD test, P < 0.05), except for the Ensemble Trees (Bag), Binary Decision Tree and DNNE models.

IV. DISCUSSION

In recent years, various machine learning models have been developed for the estimation of Brain Age, which is the hypothetical age of an organism. Brain Age can be estimated based


TABLE I: Summary of the performance results based on different prediction algorithms in the training set (cognitively healthy
individuals, N=788)
S.No Regression Model MAE (Years) RMSE (Years) Mean brain age-delta (Years) 95 % CI Values R2 Score
1. Linear SVR 5.40 6.92 0 [-0.48 , 0.48] 0.88
2. Quadratic SVR 5.36 6.84 0 [-0.48 , 0.48] 0.89
3. Gaussian SVR 3.04 4.45 0 [-0.31 , 0.31] 0.95
4. Ensemble Trees (Bag) 5.48 7.16 0 [-0.50 , 0.50] 0.88
5. Ensemble Trees (LSBoost) 6.71 8.62 0 [-0.60 , 0.60] 0.84
6. Linear Regression 5.40 6.91 0 [-0.48 , 0.48] 0.89
7. Lasso Regression 4.77 6.23 0 [-0.44 , 0.44] 0.91
8. Ridge Regression 4.74 6.21 0 [-0.43 , 0.43] 0.91
9. Binary Decision Tree 5.72 7.37 0 [-0.52 , 0.52] 0.88
10. Gaussian Regression (Kernel - Exponential) 5.29 6.78 0 [-0.47 , 0.47] 0.89
11. Gaussian Regression (Kernel - Squared Exponential) 7.54 9.43 0 [-0.66 , 0.66] 0.81
12. Gaussian Regression (Kernel - Matern 32) 5.33 6.81 0 [-0.48 , 0.48] 0.89
13. Gaussian Regression (Kernel - Matern 52) 5.34 6.82 0 [-0.48 , 0.48] 0.89
14. Gaussian Regression (Kernel - Rational-Quadratic) 5.35 6.85 0 [-0.48 , 0.48] 0.89
15. ETSVR (Kernel - Linear) 5.30 6.77 0 [-0.47 , 0.47] 0.89
16. Kernel Ridge Regression (Kernel - Linear) 5.34 6.81 0 [-0.48 , 0.48] 0.89
17. Nyström Kernel Ridge Regression (Kernel - Linear) 5.37 6.84 0 [-0.48 , 0.48] 0.89
18. DNNE 5.65 7.30 0 [-0.51 , 0.51] 0.88
19. kNN (Weighted Mean) 5.41 7.01 0 [-0.49 , 0.49] 0.89
20. Neural Network 6.65 8.88 0 [-0.62 , 0.62] 0.83
21. RKNNWTSVR (Kernel - Linear) 5.45 6.93 0 [-0.48 , 0.48] 0.89
22. LTSVR (Kernel - Linear) 5.33 6.77 0 [-0.47 , 0.47] 0.89
MAE: Mean Absolute Error, RMSE: Root Mean Square Error, R2 : Coefficient of Determination
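As a rough illustration of how the columns of the tables above are derived, the following sketch computes MAE, RMSE, R2, the mean brain age-delta, and a 95% CI for hypothetical predictions. It is not the authors' MATLAB code, and the normal-approximation CI (1.96 standard errors) is an assumption, since the paper does not state how its intervals were computed:

```python
import math

def evaluation_metrics(predicted, actual):
    """Compute the table metrics for one set of age predictions (sketch)."""
    n = len(actual)
    deltas = [p - a for p, a in zip(predicted, actual)]          # brain age-delta
    mae = sum(abs(d) for d in deltas) / n                         # mean absolute error
    rmse = math.sqrt(sum(d * d for d in deltas) / n)              # root mean squared error
    mean_actual = sum(actual) / n
    ss_res = sum(d * d for d in deltas)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                                      # coefficient of determination
    mean_delta = sum(deltas) / n
    # 95% CI for the mean delta under a normal approximation (assumption)
    sd = math.sqrt(sum((d - mean_delta) ** 2 for d in deltas) / (n - 1))
    half = 1.96 * sd / math.sqrt(n)
    return {"MAE": mae, "RMSE": rmse, "R2": r2,
            "mean_delta": mean_delta, "CI95": (mean_delta - half, mean_delta + half)}

# Hypothetical ages, for illustration only
m = evaluation_metrics(predicted=[22.0, 29.0, 41.0, 49.0],
                       actual=[20.0, 30.0, 40.0, 50.0])
print(m["MAE"])  # 1.25
```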

TABLE II: Summary of the performance results based on different prediction algorithms in the testing set (cognitively healthy
individuals, N=88)
S.No Regression Model MAE (Years) RMSE (Years) Mean brain age-delta (Years) 95% CI Values R2 Score
1. Linear SVR 5.15 6.54 0.30 [-1.09 , 1.70] 0.87
2. Quadratic SVR 4.63 6.29 0.08 [-1.26 , 1.42] 0.88
3. Gaussian SVR 5.90 7.70 -1.64 [-3.25 , -0.04] 0.85
4. Ensemble Trees (Bag) 5.63 7.41 0.64 [-0.93 , 2.22] 0.87
5. Ensemble Trees (LSBoost) 6.64 8.39 0.21 [-1.57 , 2.00] 0.82
6. Linear Regression 5.16 6.54 0.26 [-1.13 , 1.66] 0.87
7. Lasso Regression 5.38 6.67 -0.11 [-1.44 , 1.41] 0.86
8. Ridge Regression 5.27 6.59 -0.06 [-1.44 , 1.36] 0.86
9. Binary Decision Tree 7.14 9.67 0.56 [-1.50 , 2.62] 0.76
10. Gaussian Regression (Kernel - Exponential) 5.08 6.89 0.01 [-1.46 , 1.48] 0.87
11. Gaussian Regression (Kernel - Squared Exponential) 5.82 7.57 0.44 [-1.17 , 2.05] 0.90
12. Gaussian Regression (Kernel - Matern 32) 5.10 6.90 0.12 [-1.35 , 1.59] 0.86
13. Gaussian Regression (Kernel - Matern 52) 5.13 6.90 0.16 [-1.31 , 1.63] 0.86
14. Gaussian Regression (Kernel - Rational-Quadratic) 5.11 6.92 0.13 [-1.34 , 1.61] 0.86
15. ETSVR (Kernel - Linear) 4.98 6.31 -0.04 [-1.39 , 1.30] 0.88
16. Kernel Ridge Regression (Kernel - Linear) 5.09 6.41 0.2 [-1.17 , 1.55] 0.87
17. Nyström Kernel Ridge Regression (Kernel - Linear) 5.16 6.50 0.24 [-1.14 , 1.63] 0.87
18. DNNE 5.54 7.20 0.42 [-1.11 , 1.95] 0.85
19. kNN (Weighted Mean) 5.56 7.30 0.15 [-1.41 , 1.70] 0.86
20. Neural Network 6.46 8.35 -0.79 [-2.56 , 0.98] 0.82
21. RKNNWTSVR (Kernel - Linear) 5.21 6.54 0.28 [-1.12 , 1.67] 0.87
22. LTSVR (Kernel - Linear) 5.01 6.37 0.17 [-1.18 , 1.53] 0.88
MAE: Mean Absolute Error, RMSE: Root Mean Square Error, R2 : Coefficient of Determination
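The near-zero mean brain age-delta column in Table II reflects, in part, the age-dependent bias correction that Section II-D applies per [35]. A common form of that correction, assumed here since the paper does not spell it out, fits a linear trend of the delta against chronological age on the training set and removes that trend from new predictions:

```python
def fit_line(x, y):
    """Ordinary least squares for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx

def bias_correct(train_pred, train_age, test_pred, test_age):
    """Assumed form of age-dependent bias correction (sketch, not [35] verbatim):
    fit delta = a*age + b on the training set, then subtract the fitted trend."""
    deltas = [p - c for p, c in zip(train_pred, train_age)]
    a, b = fit_line(train_age, deltas)
    return [p - (a * c + b) for p, c in zip(test_pred, test_age)]

# Synthetic example: predictions carry a linear age bias, delta = 0.2*age - 4
train_age = [20.0, 40.0, 60.0, 80.0]
train_pred = [c + 0.2 * c - 4 for c in train_age]
corrected = bias_correct(train_pred, train_age, [56.0], [50.0])
print(corrected)  # the fitted bias (0.2*50 - 4 = 6) is removed: [50.0]
```

Because the correction is fitted on the CH training set only, genuine disease-related deltas in the clinical sets survive it, which is the behavior Tables II and III show.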

on many brain imaging bio-markers. The identification of the brain imaging biomarkers that have a prominent influence on Brain Age is of fundamental importance for developing machine learning based models that can monitor healthy aging and

TABLE III: Summary of brain age values based on different prediction algorithms in the clinical set (MCI: N = 70, AD: N = 30)
S.No Regression Model | MCI: Mean brain age-delta (Years) [95% CI] | AD: Mean brain age-delta (Years) [95% CI]
1. Linear SVR | 4.13 [2.01 , 6.27] | 10.40 [7.60 , 13.22]
2. Quadratic SVR | 7.79 [5.24 , 10.34] | 15.09 [11.51 , 18.67]
3. Gaussian SVR | 4.91 [2.67 , 7.15] | 9.98 [7.35 , 12.62]
4. Ensemble Trees (Bag) | 5.67 [3.74 , 7.59] | 8.00 [5.46 , 10.55]
5. Ensemble Trees (LSBoost) | 6.28 [4.09 , 8.47] | 11.27 [8.23 , 14.31]
6. Linear Regression | 4.05 [1.93 , 6.18] | 10.32 [7.52 , 13.12]
7. Lasso Regression | 2.60 [0.58 , 4.87] | 9.14 [6.21 , 11.76]
8. Ridge Regression | 2.53 [0.41 , 4.75] | 8.87 [6.03 , 11.82]
9. Binary Decision Tree | 2.58 [0.06 , 5.10] | 7.35 [4.26 , 10.43]
10. Gaussian Regression (Kernel - Exponential) | 6.14 [4.13 , 8.15] | 10.61 [8.02 , 13.12]
11. Gaussian Regression (Kernel - Squared Exponential) | 11.68 [9.66 , 13.70] | 16.69 [14.36 , 19.02]
12. Gaussian Regression (Kernel - Matern32) | 6.24 [4.20 , 8.27] | 10.80 [8.21 , 13.39]
13. Gaussian Regression (Kernel - Matern52) | 6.22 [4.17 , 8.26] | 10.81 [8.24 , 13.38]
14. Gaussian Regression (Kernel - Rational Quadratic) | 6.24 [4.20 , 8.27] | 10.74 [8.15 , 13.21]
15. ETSVR (Kernel - Linear) | 4.22 [2.13 , 6.31] | 10.56 [7.75 , 13.37]
16. Kernel Ridge Regression (Kernel - Linear) | 4.05 [2.31 , 6.44] | 10.22 [7.79 , 13.35]
17. Nyström Kernel Ridge Regression (Kernel - Linear) | 4.36 [2.31 , 6.40] | 10.35 [7.58 , 13.11]
18. DNNE | 3.66 [1.81 , 5.51] | 6.77 [4.37 , 9.18]
19. kNN (Weighted Mean) | 6.52 [4.37 , 8.67] | 10.47 [7.75 , 13.18]
20. Neural Network | 5.14 [2.86 , 7.42] | 8.26 [5.07 , 11.44]
21. RKNNWTSVR (Kernel - Linear) | 4.87 [2.82 , 6.93] | 10.80 [7.98 , 13.61]
22. LTSVR (Kernel - Linear) | 4.69 [2.64 , 6.73] | 10.65 [7.88 , 13.42]
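Each entry in Table III is a group mean of the brain age-delta (predicted minus chronological age) with a 95% confidence interval. A minimal stdlib sketch of such a summary, assuming a normal-approximation interval (the authors' exact CI procedure is not specified in this excerpt) and made-up data:

```python
import math
import statistics

def delta_summary(chron, pred, z=1.96):
    """Mean brain age-delta with a normal-approximation 95% CI."""
    deltas = [p - c for c, p in zip(chron, pred)]       # predicted - chronological
    mean_delta = statistics.fmean(deltas)
    sem = statistics.stdev(deltas) / math.sqrt(len(deltas))  # standard error of the mean
    return mean_delta, (mean_delta - z * sem, mean_delta + z * sem)

# Toy clinical group: predicted ages run ahead of chronological ages
# (illustrative values only, not the study's data)
chron = [70.0, 72.0, 68.0, 75.0, 71.0]
pred = [78.0, 80.0, 73.0, 84.0, 79.0]
mean_delta, (lo, hi) = delta_summary(chron, pred)
```

A positive mean delta with a CI excluding zero, as in Table III, indicates a systematically "older-looking" brain for that group.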

TABLE IV: Summary of statistical tests among independent test groups

S.No Regression Model | CH vs. MCI vs. AD: F-Value, log10(P) | CH vs. MCI: MD, log10(P) | MCI vs. AD: MD, log10(P) | CH vs. AD: MD, log10(P)
1. Linear SVR | 19.96, 7.85 | 3.83, 2.28 | 6.27, 3.27 | 10.10, 8.62
2. Quadratic SVR | 37.64, 13.71 | 7.71, 7.01 | 7.30, 3.44 | 15.00, 9.02
3. Gaussian SVR | 26.57, 10.14 | 6.56, 5.72 | 5.07, 1.88 | 11.63, 8.99
4. Ensemble Trees (Bag) | 14.34, 5.79 | 5.03, 3.98 | 2.34, 0.48 | 7.36, 4.88
5. Ensemble Trees (LSBoost) | 21.27, 8.32 | 6.07, 4.43 | 4.99, 1.64 | 11.06, 8.23
6. Linear Regression | 19.84, 7.81 | 3.79, 2.24 | 6.27, 3.28 | 10.06, 8.59
7. Lasso Regression | 14.91, 6.00 | 2.63, 1.05 | 6.38, 3.27 | 9.01, 6.83
8. Ridge Regression | 15.07, 6.06 | 2.68, 1.09 | 6.36, 3.26 | 9.03, 6.89
9. Binary Decision Tree | 5.35, 2.26 | 2.02, 0.39 | 4.77, 1.17 | 6.79, 2.50
10. Gaussian Regression (Kernel - Exponential) | 26.80, 10.22 | 6.13, 5.97 | 4.47, 1.75 | 10.60, 8.99
11. Gaussian Regression (Kernel - Squared Exponential) | 67.71, 22.07 | 11.24, 9.02 | 5.01, 2.07 | 16.25, 9.02
12. Gaussian Regression (Kernel - Matern32) | 26.68, 10.18 | 6.12, 5.86 | 4.57, 1.80 | 10.68, 8.99
13. Gaussian Regression (Kernel - Matern52) | 26.31, 10.06 | 6.06, 5.73 | 4.59, 1.81 | 10.65, 8.98
14. Gaussian Regression (Kernel - Rational Quadratic) | 26.37, 10.08 | 6.11, 5.84 | 4.50, 1.75 | 10.61, 8.97
15. ETSVR (Kernel - Linear) | 23.18, 8.98 | 4.25, 2.92 | 6.34, 3.49 | 10.59, 8.99
16. Kernel Ridge Regression (Kernel - Linear) | 22.45, 8.73 | 4.18, 2.86 | 6.19, 3.36 | 10.38, 8.95
17. Nyström Kernel Ridge Regression (Kernel - Linear) | 21.28, 8.32 | 4.11, 2.75 | 5.99, 3.14 | 10.10, 8.83
18. DNNE | 9.51, 3.93 | 3.24, 1.80 | 3.11, 0.90 | 6.36, 3.92
19. kNN (Weighted Mean) | 23.47, 9.08 | 6.37, 5.72 | 3.94, 1.21 | 10.32, 8.40
20. Neural Network | 15.36, 6.17 | 5.93, 4.07 | 3.12, 0.62 | 9.04, 5.39
21. RKNNWTSVR (Kernel - Linear) | 23.18, 8.98 | 4.60, 3.38 | 5.92, 3.02 | 10.52, 8.97
22. LTSVR (Kernel - Linear) | 23.53, 9.10 | 4.51, 3.35 | 5.96, 3.15 | 10.47, 8.98
MD: mean difference; statistical tests with a significant value (i.e., P < 0.05) are indicated in bold.
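The three-group comparison in Table IV is a one-way ANOVA. As a sketch of where the F-values come from (between-group mean square over within-group mean square), on made-up brain age-delta samples; the P-values and the Tukey HSD post-hoc tests would additionally require the F and studentized-range distributions (e.g. from scipy), which are omitted here:

```python
def one_way_anova_f(*groups):
    """F = MS_between / MS_within for k independent groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    # Between-group sum of squares, k - 1 degrees of freedom
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares, n - k degrees of freedom
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy brain age-delta samples for CH, MCI, and AD groups (illustrative only)
ch = [0.1, -0.5, 0.4, 0.0]
mci = [4.0, 5.0, 3.5, 4.5]
ad = [10.0, 11.0, 9.5, 10.5]
f = one_way_anova_f(ch, mci, ad)
```

Well-separated group means with small within-group spread, as in this toy example, yield a large F and hence a small P.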

can provide new tools to screen health status and the risk of clinical events in the general population. Hence, the purpose of this study was to judge which of the algorithms provide the best prediction for the Brain Age under clinical settings for


Fig. 1: Box plots of the brain-age delta obtained with each regression algorithm on the independent test sets. A) CH individuals, B) MCI patients, and C) AD patients. Model 1 = Linear SVR, Model 2 = Quadratic SVR, Model 3 = Gaussian SVR, Model 4 = Ensemble Trees (Bag), Model 5 = Ensemble Trees (LSBoost), Model 6 = Linear Regression, Model 7 = Lasso Regression, Model 8 = Ridge Regression, Model 9 = Binary Decision Tree, Model 10 = Gaussian Regression (Kernel-Exponential), Model 11 = Gaussian Regression (Kernel-Squared Exponential), Model 12 = Gaussian Regression (Kernel-Matern32), Model 13 = Gaussian Regression (Kernel-Matern52), Model 14 = Gaussian Regression (Kernel-Rational-Quadratic), Model 15 = ETSVR (Kernel-Linear), Model 16 = Kernel Ridge Regression (Kernel-Linear), Model 17 = Nyström Kernel Ridge Regression, Model 18 = DNNE, Model 19 = kNN (Weighted Mean), Model 20 = Neural Network (NN), Model 21 = RKNNWTSVR, Model 22 = LTSVR.
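Among the models in the legend above, kernel ridge regression (Model 16) admits a particularly compact closed form: the dual coefficients are α = (K + λI)⁻¹y, where K is the kernel Gram matrix. A stdlib-only sketch of this standard closed form with a linear kernel on toy data — an illustration, not the authors' MATLAB implementation; the data and λ are made up:

```python
def linear_kernel(u, v):
    # Inner product <u, v>: the linear kernel
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    # Gaussian elimination with partial pivoting (small dense systems only)
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def krr_fit(X, y, lam):
    # Dual coefficients: alpha = (K + lam * I)^(-1) y
    K = [[linear_kernel(xi, xj) for xj in X] for xi in X]
    for i in range(len(K)):
        K[i][i] += lam
    return solve(K, y)

def krr_predict(X, alpha, x_new):
    # f(x_new) = sum_i alpha_i * k(x_i, x_new)
    return sum(a * linear_kernel(xi, x_new) for a, xi in zip(alpha, X))

# Toy data: one feature, target = 2 * feature (illustrative only)
X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]
alpha = krr_fit(X, y, lam=0.1)
pred = krr_predict(X, alpha, [5.0])  # shrinks toward zero slightly because of lam
```

The ridge penalty λ trades training fit for stability, which is one plausible reason the linear-kernel KRR variants transfer comparatively well to the independent test sets reported above.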


different classes of patients, such as those suffering from Mild Cognitive Impairment or severe neurological impairments such as Alzheimer's Disease.

Twenty-two different linear and non-linear regression models were evaluated in this study. The "linear" regression models include Linear Regression, Lasso, Ridge, Linear ε-TSVR, SVR (linear kernel), Linear LTSVR, and Linear KNNWTSVR. The "non-linear" methods include SVR (Quadratic and Gaussian kernels), Ensemble Trees, Binary Decision Tree, KRR, Nyström-KRR, Gaussian Regression, DNNE, kNN, and Neural Network.

Considering Linear SVR as the baseline regression model, we find that for the dataset comprising CH individuals, the Quadratic SVR gave the best overall training and testing performance. The reason may be the nature of the quadratic kernel, which fits the dataset used here more accurately. Furthermore, Kernel Ridge Regression and its Nyström variant also performed well on the CH set, reporting low MAE scores on both the training and testing data. All Gaussian Regression models with different kernels (except the Rational Quadratic kernel) likewise returned low errors. The Linear ε-Twin SVR also performed well in the train and test cycles for the CH set. On the other hand, we found that Gaussian SVR, because of its high overfitting, gave the lowest MAE on the training set while reporting a high MAE on the test set (see Table II). Ensemble Trees (LSBoost), Binary Decision Tree, and Gaussian Regression (with Squared Exponential kernel) performed poorly, with all three reporting very high MAE values on both the training data and the CH independent test set. This may be attributed to the fact that the Binary Decision Tree simply tries to minimize the sum of squared errors, which results in its poor performance. On assessing the prediction results of the regression models on the clinical set, all prediction models yielded a positive brain age-delta (see Tables II and III), indicating that MCI/AD patients have markedly "older-looking" brains when compared to CH individuals, confirming earlier reports [1].

In the context of statistical tests among independent test sets, all prediction models showed significant differences among the three groups (see Table IV). However, some prediction models did not show a significant difference between subgroups. For instance, the brain-age delta in MCI patients compared to the CH group did not statistically differ for the Lasso Regression, Ridge Regression, and Binary Decision Tree models (post-hoc comparisons using the Tukey HSD test, P > 0.05). Also, the brain-age delta in AD patients compared to MCI patients was not statistically different for the Ensemble Trees (Bag), Binary Decision Tree, and DNNE models (post-hoc comparisons using the Tukey HSD test, P > 0.05). Indeed, caution must be taken in selecting the prediction model for brain age estimation for clinical purposes.

In addition to the most frequently used regression models in brain age estimation studies (i.e., Gaussian process regression and support vector regression), we used prediction algorithms for the first time (11, 5, 9, 4), which showed comparable performance to the current state-of-the-art models (see Tables I and II).

This comparative study also highlights that most of the current regression models are unable to produce accurate results in the clinical setting, which reveals their inadequate transferability to clinical data. This sheds light on the direction of future work in this domain, and highlights the need for a regression architecture specifically tailored to Brain Age estimation that retains high transferability to clinical settings after being trained on a set of healthy individuals. In this study, we assessed the effects of regression algorithms on T1-weighted voxel-wise metrics. Future studies are also needed to examine the effects of regression models on other neuroimaging data such as T1-weighted region-wise metrics, metabolic brain features, and diffusion tensor imaging data.

This study can be viewed as a comprehensive reference for researchers aiming to perform further studies using Brain Age as an efficient biomarker for various underlying diseases. A supplementary file containing the details of the regression algorithms, along with source MATLAB code for the different regression algorithms used in this study, is available at https://github.com/mtanveer1.

V. CONCLUSIONS

This study aimed to comprehensively evaluate various regression models for estimating Brain Age not only in CH individuals but also in a clinical population. We assessed 22 different regression models on a dataset comprising CH individuals as a training set. We then quantified each regression model on independent test sets composed of CH individuals, MCI subjects, and AD patients. Our comprehensive evaluation suggests that the type of regression algorithm affects downstream comparisons between groups, and caution should be taken when selecting the regression model in clinical settings.

ACKNOWLEDGMENT

This work is supported by the Science and Engineering Research Board (SERB), Government of India, under the Ramanujan Fellowship Scheme, Grant No. SB/S2/RJN-001/2016, and by the Council of Scientific & Industrial Research (CSIR), New Delhi, India, under Extra Mural Research (EMR) Scheme grant no. 22(0751)/17/EMR-II. We gratefully acknowledge the Indian Institute of Technology Indore for providing the required facilities and support for this work. Besides, this study was performed based on multiple samples of participants. We wish to acknowledge all participants and principal investigators who collected these datasets and agreed to make them accessible: the Open Access Series of Imaging Studies (OASIS), Cross-Sectional; Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382; see http://www.oasis-brains.org/ for more details. The IXI data were supported by the U.K. Engineering and Physical Sciences Research Council (EPSRC) GR/S21533/02; see http://www.brain-development.org/ for more details.

AUTHOR CONTRIBUTIONS

I.B. and M.T. designed the research; I.B. collected data and performed pre-processing. V.P., A.R., and M.A.G. performed numerical experiments, analyzed data, and wrote the paper. M.T. and I.R. edited the paper. M.T. supervised the study.


REFERENCES

[1] K. Franke and C. Gaser, "Ten years of brainage as a neuroimaging biomarker of brain aging: what insights have we gained?" Frontiers in Neurology, vol. 10, p. 789, 2019.
[2] J. H. Cole, S. J. Ritchie, M. E. Bastin, M. V. Hernández, S. M. Maniega, N. Royle, J. Corley, A. Pattie, S. E. Harris, Q. Zhang, et al., "Brain age predicts mortality," Molecular Psychiatry, vol. 23, no. 5, pp. 1385–1392, 2018.
[3] I. Beheshti, S. Nugent, O. Potvin, and S. Duchesne, "Disappearing metabolic youthfulness in the cognitively impaired female brain," Neurobiology of Aging, 2021.
[4] K. Franke, G. Ziegler, S. Klöppel, C. Gaser, and A. D. N. Initiative, "Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: exploring the influence of various parameters," Neuroimage, vol. 50, no. 3, pp. 883–892, 2010.
[5] I. Beheshti, S. Mishra, D. Sone, P. Khanna, and H. Matsuda, "T1-weighted MRI-driven brain age estimation in Alzheimer's disease and Parkinson's disease," Aging and Disease, vol. 11, no. 3, p. 618, 2020.
[6] D. Sone, I. Beheshti, N. Maikusa, M. Ota, Y. Kimura, M. Koepp, and H. Matsuda, "Neuroimaging-based brain-age prediction in diverse forms of epilepsy: a signature of psychosis and beyond," Molecular Psychiatry, 2019.
[7] I. Nenadić, M. Dietzek, K. Langbein, H. Sauer, and C. Gaser, "BrainAGE score indicates accelerated brain aging in schizophrenia, but not bipolar disorder," Psychiatry Research: Neuroimaging, vol. 266, pp. 86–89, 2017.
[8] I. Beheshti, P. Gravel, O. Potvin, L. Dieumegarde, and S. Duchesne, "A novel patch-based procedure for estimating brain age across adulthood," NeuroImage, vol. 197, pp. 618–624, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1053811919304173
[9] A. Cherubini, M. E. Caligiuri, P. Péran, U. Sabatini, C. Cosentino, and F. Amato, "Importance of multimodal MRI in characterizing brain tissue and its potential application for individual age prediction," IEEE Journal of Biomedical and Health Informatics, vol. 20, no. 5, pp. 1232–1239, 2016.
[10] J. H. Cole, "Multimodality neuroimaging brain-age in UK Biobank: relationship to biomedical, lifestyle, and cognitive factors," Neurobiology of Aging, vol. 92, pp. 34–42, 2020.
[11] J. H. Cole, R. Leech, D. J. Sharp, and A. D. N. Initiative, "Prediction of brain age suggests accelerated atrophy after traumatic brain injury," Annals of Neurology, vol. 77, no. 4, pp. 571–581, 2015.
[12] J. H. Cole, J. Underwood, M. W. Caan, D. De Francesco, R. A. van Zoest, R. Leech, F. W. Wit, P. Portegies, G. J. Geurtsen, B. A. Schmand, et al., "Increased brain-predicted aging in treated HIV disease," Neurology, vol. 88, no. 14, pp. 1349–1357, 2017.
[13] S. Valizadeh, J. Hänggi, S. Mérillat, and L. Jäncke, "Age prediction on the basis of brain anatomical measures," Human Brain Mapping, vol. 38, no. 2, pp. 997–1008, 2017.
[14] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support vector regression machines," Advances in Neural Information Processing Systems, vol. 9, pp. 155–161, 1997.
[15] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[16] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees. CRC Press, 1984.
[17] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41–53, 2016.
[18] M. A. Ganaie, M. Hu, M. Tanveer, and P. N. Suganthan, "Ensemble deep learning: A review," arXiv preprint arXiv:2104.02395, 2021.
[19] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[20] Y.-H. Shao, C.-H. Zhang, Z.-M. Yang, L. Jing, and N.-Y. Deng, "An ε-twin support vector machine for regression," Neural Computing and Applications, vol. 23, no. 1, pp. 175–185, 2013.
[21] M. Tanveer, T. Rajani, R. Rastogi, and Y. H. Shao, "Comprehensive review on twin support vector machines," arXiv preprint arXiv:2105.00336, 2021.
[22] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society: Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
[23] A. E. Hoerl and R. W. Kennard, "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
[24] C. E. Rasmussen, "Gaussian processes in machine learning," in Summer School on Machine Learning. Springer, 2003, pp. 63–71.
[25] C. Saunders, A. Gammerman, and V. Vovk, "Ridge regression learning algorithm in dual variables," 1998.
[26] C. Williams and M. Seeger, "Using the Nyström method to speed up kernel machines," in Proceedings of the 14th Annual Conference on Neural Information Processing Systems, 2001, pp. 682–688.
[27] M. Alhamdoosh and D. Wang, "Fast decorrelated neural network ensembles with random weights," Information Sciences, vol. 264, pp. 104–117, 2014. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0020025513008669
[28] L. Zhang and P. N. Suganthan, "A comprehensive evaluation of random vector functional link networks," Information Sciences, vol. 367, pp. 1094–1105, 2016.
[29] M. A. Ganaie, M. Tanveer, and P. N. Suganthan, "Minimum variance embedded random vector functional link network," in International Conference on Neural Information Processing. Springer, 2020, pp. 412–419.
[30] N. S. Altman, "An introduction to kernel and nearest-neighbor nonparametric regression," The American Statistician, vol. 46, no. 3, pp. 175–185, 1992. [Online]. Available: http://www.jstor.org/stable/2685209
[31] S. A. Dudani, "The distance-weighted k-nearest-neighbor rule," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-6, no. 4, pp. 325–327, 1976.
[32] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 2010, pp. 249–256.
[33] M. Tanveer, K. Shubham, M. Aldhaifallah, and S. S. Ho, "An efficient regularized k-nearest neighbor based weighted twin support vector regression," Knowledge-Based Systems, vol. 94, pp. 70–87, 2016.
[34] S. Balasundaram and M. Tanveer, "On Lagrangian twin support vector regression," Neural Computing and Applications, vol. 21, 2013.
[35] I. Beheshti, S. Nugent, O. Potvin, and S. Duchesne, "Bias-adjustment in neuroimaging-based brain age frameworks: A robust scheme," NeuroImage: Clinical, vol. 24, p. 102063, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2213158219304103

