Smith 2019
NeuroImage
journal homepage: www.elsevier.com/locate/neuroimage
ARTICLE INFO

Keywords: Brain aging; Brain imaging; UK biobank

ABSTRACT

It is of increasing interest to study “brain age” - the apparent age of a subject, as inferred from brain imaging data. The difference between brain age and actual age (the “delta”) is typically computed, reflecting deviation from the population norm. This therefore may reflect accelerated aging (positive delta) or resilience (negative delta) and has been found to be a useful correlate with factors such as disease and cognitive decline. However, although there has been a range of methods proposed for estimating brain age, there has been little study of the optimal ways of computing the delta. In this technical note we describe problems with the most common current approach, and present potential improvements. We evaluate different estimation methods on simulated and real data. We also find the strongest correlations of corrected brain age delta with 5,792 non-imaging variables (non-brain physical measures, life-factor measures, cognitive test scores, etc.), and also with 2,641 multimodal brain imaging-derived phenotypes, with data from 19,000 participants in UK Biobank.
1. Introduction

Brain imaging (and other sources of relevant data) can be used to predict “brain age” - the apparent age of individuals, when comparing their data against a population dataset spanning a range of ages. The difference between brain age and actual age (the “delta”) is often then computed, providing a measure of whether a subject's brain appears to have aged more or less than the population average for their actual chronological age. For example, looking at structural MRI data, a high degree of atrophy (e.g., caused by disease) would cause a subject's brain to appear older than a normal age-matched brain [Franke et al., 2010; Cole et al., 2017; Cole and Franke, 2017].

The approach typically taken is to use one or more imaging modalities, for example, acquiring a T1-weighted structural image from each subject. The data then receives some level of preprocessing, e.g., alignment to standard space and tissue type segmentation. The imaging data then becomes “features” for predicting brain age - for example, from voxelwise maps of grey matter partial volume estimates, the voxelwise values themselves can be the features. Alternatively, a smaller number of more highly-condensed features, such as volumes of grey matter within multiple distinct brain regions of interest, may be generated. The entire dataset of multiple subjects' features, and their true ages, are fed into a supervised-learning algorithm (e.g., regression, support vector machine, deep learning), which learns to predict the subjects' ages from their brain imaging features. The hope is that a given subject's predicted brain age will deviate from their true age according to a meaningful delta, as long as this training is not badly overfitting.

While there have been a range of methods proposed for estimating brain age (i.e., choice of imaging-derived features to use, and choice of supervised-learning approach), there has been very little study of the optimal ways of then computing the delta. Presumably this is in part because this seems like a very simple calculation (delta equals brain age minus age). However, there are frequently various sources of bias in estimating brain age delta, which can give rise to significant false positives and false negatives when looking for associations between delta and other measures [Le et al., 2018]. Here we describe some important problems with this most common approach, and present potential improvements via explicitly laid out (albeit simple) mathematical frameworks.

We study these effects in simulated and real data, and suggest simple models for removing bias and increasing the accuracy of delta estimation. This includes models for correcting underestimation of brain aging,
* Corresponding author.
E-mail address: [email protected] (S.M. Smith).
https://doi.org/10.1016/j.neuroimage.2019.06.017
Received 25 February 2019; Received in revised form 1 June 2019; Accepted 5 June 2019
Available online 12 June 2019
1053-8119/© 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
S.M. Smith et al. NeuroImage 200 (2019) 528–539
removing the resulting dependency of delta on age, modelling nonlinear dependence of brain aging (as a function of age), and studying non-additive brain age delta. One of the causes of bias (regression dilution, see below) has recently also been discussed in depth in [Le et al., 2018], where the same linear correction as Eq. (4) below was proposed. A closely-related linear correction was also recently proposed in [Liang et al., 2019], although these authors suggest that regression towards the mean is the only source of bias, and we find empirically that non-Gaussian distribution of subject ages can be a major cause of bias.

2. Methods and results

In order to present the clearest description of the gradual development of models throughout the paper, we do not separate out Methods and Results sections, but intermix discussions of model improvements with simulation and real-data results.

2.1. Common approach

We start with some basic definitions, and outline the typical approach for brain age delta estimation. Actual age is $Y$ (an $N_{subjects} \times 1$ vector), brain age is $Y_B$ and the delta is $\delta = Y_B - Y$. The imaging data matrix is $X$, which has $N_{subjects}$ rows and $D$ columns; the columns are features from the imaging data, and might be different voxels, or different IDPs (imaging-derived phenotypes - summary measures of brain structure and function).¹

It is common to try to predict $Y_B$ from $X$:

$$Y_B = Y + \delta = f(X). \tag{1}$$

Although many alternatives have been proposed for the form of $f()$ and the fitting of its model parameters, the issues explored in this paper generally hold, irrespective of these choices (e.g., most brain age literature shows underestimation of brain age for old subjects, and overestimation in young subjects). We start with a very simple linear model, with $f(X) = X\beta$, a multiple regression with parameters $\beta$ (a $D \times 1$ vector). We therefore have:

$$Y = X\beta_1 - \delta_1, \tag{2}$$

where we have added subscripts to differentiate some variables in this model from later variants. Clearly $\delta_1$ is the brain age delta being estimated, and the noise term in this formulation.

This multiple regression can be solved with standard simple methods,² for example, setting $\beta_1 = X^+ Y$ using the pseudo-inverse $X^+ = (X'X)^{-1}X'$. This gives the initially predicted brain age $Y_{B1} = XX^+Y$, and brain age delta:

$$\delta_1 = X\beta_1 - Y = XX^+Y - Y = (XX^+ - I)Y. \tag{3}$$

Two complications are immediately clear: $\delta_1$ will be orthogonal to the imaging data matrix $X$ (though there is no reason to assume that this is desirable), and it will not be orthogonal to age ($Y$). This latter issue is likely to be problematic, given the simplest conception of the delta as being the difference between brain age and age, as a useful (objective, non-changing) description of a subject's brain-health that is not dependent on their current age. In the extreme case of $X$ being an entirely useless model for brain aging, $X\beta_1 \approx 0$ and $\delta_1$ will just be $-1$ times the actual age.

In addition, there are several factors that almost always cause the estimated $\beta_1$ to be biased towards zero (Fig. 1), resulting in under-fitting to $Y$ and hence “moving” age dependence into $\delta_1$:

1. It is typical for more sophisticated fitting methods (e.g., sparse/regularised regression) to result in underestimation of $\beta_1$ [Grosenick et al., 2013].
2. It is common for datasets to not have a Gaussian distribution (across all subjects) for age, often because of hard limits on age in the study design. In the linear model framework, it does not generally matter what the distribution of the predictors is (here, $X$), but a non-Gaussian distribution in the dependent variable (here $Y$) can cause serious bias (typically underestimation) in the estimated $\beta_1$.
3. Errors in measurement of the predictors ($X$) also likely cause underestimation of $\beta_1$ (“regression dilution”) [Le et al., 2018].

Note that the above formulations, as they stand, model the entire set of subjects together, to estimate brain age and the delta, and these causes of bias often exist in such a scenario. In practice, researchers often use cross-validation (or a group of healthy subjects and a clinically distinct subject group), where model parameters are learned from training data and then applied to left-out data in order to estimate delta, in a way that is more robust against model over-fitting. Applying such cross-validation (as opposed to all-in-one fitting) can be yet another factor causing non-orthogonality between estimated delta and age (because, like with regularisation, the tendency would be to increase under-fitting).

Finally, with $\beta_1$ being underestimated, and age-dependence therefore moved into $\delta_1$, there is the danger that association tests against non-imaging variables (e.g., cognitive status, health outcomes) will be dominated by true age (i.e., be driven by the aging process), rather than the intended brain age delta. Obviously this can be eliminated through careful deconfounding of the non-imaging variables, but with age-dependence left in $\delta$, this still results in loss of statistical sensitivity. To re-iterate the danger here: if (as seems to happen frequently in the literature) estimated brain age delta is not orthogonal to age, and if other variables (cognitive status, health measures, etc.) have not been deconfounded with respect to age, then any apparent associations between “brain age delta” and the non-imaging measures might be more driven by age and not true delta.

2.2. Stage-2 correction of delta

One very simple approach to correct for all of the above issues is to remove age dependence in $\delta_1$ in a second step. For now we will assume only linear corrections, and will return to nonlinear correction later.

The method is easily motivated by considering the scatterplot of $\delta_1$ against $Y$ (Fig. 2). As discussed above, ideally we would want the data to be distributed around $\delta_1 = 0$, with no overall slope (dependence on $Y$). If an overall slope is present, we can simply fit a straight line to the full data cloud and subtract this:

$$\delta_1 = Y\beta_2 + \delta_2 \tag{4}$$

where we have defined $\delta_2$ (the residuals from this fitting), to be orthogonal to age, with biases (and modelling failures due to poor $X$) removed.

Although it is perfectly convenient (and easy to understand) to carry out this modelling in two separate steps, they can easily be combined into a single calculation:

$$\begin{aligned} \delta_2 &= \delta_1 - Y\beta_2 \\ &= \delta_1 - YY^+\delta_1 \\ &= (I - YY^+)\delta_1 \\ &= M_Y (XX^+ - I)Y \end{aligned} \tag{5}$$

$$\delta_2 = M_Y XX^+ Y \tag{6}$$

¹ We assume throughout that $Y$ and all columns in $X$ are demeaned (shifted to have zero mean), to simplify equations with no loss of generality. Also, for readability, we do not in general differentiate in our notation between true parameters and estimates of those parameters.

² Alternatively, if this is poorly conditioned, e.g., with the number of features $D$ too large, using penalised regression.
Fig. 1. Examples of different causes of biased estimation of the model fit. The black line shows the correct model fit ($B = A$), while the red line shows the fitted model ($B = A\beta$). A. The model A is Gaussian distributed, and simulated B is set to be equal to A, with Gaussian noise added to B. The true model fit ($\beta = 1$) is correctly estimated. B. Increased measurement noise on B increases the error in the model fit ($\sigma$), but does not cause bias (even though the data cloud appears rotated). C. Changing A to being non-Gaussian does not harm the model fitting. D. Applying regularised model-fitting (here Tikhonov regularisation) biases the model fit towards zero (even though the data cloud is not rotated). E. Adding measurement noise to A (i.e., after it has been used to generate B) causes regression dilution. F. Truncating B (a special case of non-Gaussianity in B) causes a biased model fit.
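The bias mechanisms listed in the Fig. 1 caption can be reproduced with a few lines of NumPy. The following is only a minimal sketch under assumed settings (sample size, noise levels, ridge strength and truncation threshold are illustrative choices, not values from the paper); the helper `slope` is a hypothetical name introduced here.

```python
import numpy as np

def slope(a, b, ridge=0.0):
    """Least-squares slope of b on a after demeaning; ridge > 0 adds Tikhonov shrinkage."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (a @ a + ridge))

rng = np.random.default_rng(0)
n = 20000                                   # assumed sample size
a = rng.standard_normal(n)                  # Gaussian "A"
b = a + 0.5 * rng.standard_normal(n)        # B = A plus noise on B, so the true beta is 1

beta_plain = slope(a, b)                    # noise on B only: no bias expected
beta_ridge = slope(a, b, ridge=0.5 * n)     # regularisation: shrunk towards zero
a_noisy = a + 0.5 * rng.standard_normal(n)  # measurement noise added to A afterwards
beta_dilute = slope(a_noisy, b)             # regression dilution
keep = np.abs(b) < 1.0                      # truncate B (non-Gaussian dependent variable)
beta_trunc = slope(a[keep], b[keep])

print(beta_plain, beta_ridge, beta_dilute, beta_trunc)
```

Under these assumptions only the first estimate should sit close to 1; the other three are pulled towards zero, which is the pattern the caption describes for panels D, E and F.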
Fig. 2. Examples of the two stages of age prediction. The black lines show the ideal unbiased model fits. A,D. The plots of predicted brain age from steps 1 and 2,
vs. actual age. B,E. The plots of estimated brain age delta from steps 1 and 2, vs. actual age. C,F. The plots of estimated brain age delta from steps 1 and 2, vs. true delta.
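The two stages illustrated in Fig. 2 (Eqs. (2)-(5)) can be sketched as follows. This is a toy illustration rather than the authors' simulation: the data-generating choices (uniform ages, 20 noisy features, noise level) and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 20                                  # assumed numbers of subjects and features
age = rng.uniform(45, 75, n)                     # hard age limits give a non-Gaussian age distribution
true_delta = 2.0 * rng.standard_normal(n)
brain_age = age + true_delta
# Hypothetical imaging features: noisy linear functions of brain age
X = np.outer(brain_age, rng.standard_normal(d)) + 10.0 * rng.standard_normal((n, d))

Y = age - age.mean()                             # demean Y and the columns of X (footnote 1)
X = X - X.mean(axis=0)

beta1 = np.linalg.pinv(X) @ Y                    # stage 1: beta_1 = X^+ Y
delta1 = X @ beta1 - Y                           # Eq. (3)
beta2 = (Y @ delta1) / (Y @ Y)                   # stage 2: slope of delta_1 on Y, Eq. (4)
delta2 = delta1 - Y * beta2                      # Eq. (5)

def r(u, v):
    """Pearson correlation between two vectors."""
    return float(np.corrcoef(u, v)[0, 1])

print("r(delta1, age) =", r(delta1, Y))
print("r(delta2, age) =", r(delta2, Y))
print("r(delta1, true delta) =", r(delta1, true_delta))
print("r(delta2, true delta) =", r(delta2, true_delta))
```

In this sketch the stage-1 delta should be negatively correlated with age, while the stage-2 delta is orthogonal to age by construction and tracks the simulated true delta somewhat better.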
where $M_Y = I - YY^+$ is the “residual-forming matrix”, which orthogonalises a vector with respect to $Y$.³ The second term in the brackets in Eqn. (5) disappears because $M_Y Y = 0$. A simple intuitive interpretation of this is that $XX^+Y$ is the predicted age from step 1, and therefore $\delta_2$ is just the orthogonalisation of this with respect to actual age. The predicted brain age from this second step is $Y_{B2} = Y + \delta_2$.

Fig. 2 uses a simple simulation to illustrate the two stages of the model fitting described above. In A and B the biases in the one-stage modelling are apparent, and these are removed in D and E. Furthermore, the correlation of estimated delta with the true delta is improved in the second step.

2.3. Switching predictor matrix and age

Alternatively, one could frame the modelling in an arguably more natural framework, with $X$ being dependent on $Y_B$:

$$\begin{aligned} X &= f(Y_B) \\ &= (Y + \delta_3)\gamma + \epsilon \quad \text{(for example)} \\ &= Y\gamma + \delta_3\gamma + \epsilon \end{aligned} \tag{7}$$

where $\gamma$ is a $1 \times D$ row vector, meaning that $Y\gamma$ is a rank-1 approximation to $X$. This approach has the advantage of looking “causally sensible” (brain aging affects what we measure of the brain). Also, if this is solved using $Y$ as the predictor variable, $\delta_3$ will be treated as being in the residuals and hence is defined as being orthogonal to age (and not to $X$). Finally, some problems such as regression dilution go away, as we can generally assume that there are no errors in $Y$.

However, one might (in general correctly) expect that this model is not as statistically powerful as the above approaches, given that each column in $X$ is being modelled separately from each other (with respect to estimating $\gamma$).

Again, this formulation can be easily solved:

$$\begin{aligned} \gamma &= Y^+ X \\ X - Y\gamma &= X - YY^+X = M_Y X = \delta_3\gamma + \epsilon \\ \delta_3 &= M_Y XX'Yk \end{aligned} \tag{8}$$

where scaling factor $k = (Y'Y)/(Y'XX'Y)$, as derived from multiplying by the right-pseudo-inverse of $\gamma$. This defines the residuals $\epsilon$ as being vertically orthogonal to age, and horizontally orthogonal to $\gamma$.

It can immediately be seen that $\delta_3$ becomes equal to $k\delta_2$ if $X$ is first orthonormalised before being used here (i.e., $X^+ = X'$). This makes intuitive sense as we would now be saying that the columns in $X$ do not depend on each other, so the potential for statistical insensitivity in the formulation of Eq. (7) disappears.

2.4. Improved prediction accuracy via singular value decomposition

It is straightforward (and typical) for the first stage of the common approach to use regularised modelling applied to $X$, particularly given that $X$ is frequently formed from (unwrapped) maps of voxels, resulting in there being too many variables in $X$ for trivial application of multiple regression. Even if this were not the case, regularised estimations are known to often perform better than non-regularised estimations in terms of statistical efficiency, for a sufficiently good choice of the regularisation parameter.

As one option for regularisation, $X$ can be pre-reduced using SVD (singular value decomposition): $X = USV'$, where the $J$ strongest eigen-subject-vectors from $U$ (those that explain the strongest variance in $X$) would be used in place of $X$. This procedure is sometimes referred to as principal-component-regression [Massy, 1965; Franke et al., 2010], and has the general effect of “denoising” $X$ and therefore improving overall modelling for an appropriate choice of $J$. By definition, $U$ is orthonormal, and hence, as discussed above, this causes $\delta_2$ and $\delta_3$ to be the same (apart from the overall scaling). It is important to note that, even when using effective regularisation, it remains the case (as demonstrated in examples below) that the initial estimate ($\delta_1$) most commonly used in brain age studies remains suboptimal and biased. In the following sections we illustrate this with simulated and real data.

2.5. Cross-validation

Although not a primary focus of this paper, we briefly discuss here the use of cross-validation. In general, when there is any risk of an analysis over-fitting the data (e.g., where there is some data-dependence in the processing, such as is the case with any supervised machine learning), it is important to use methods such as cross-validation to avoid inflated estimates of modelling success.

The models covered in this paper are sufficiently well-conditioned when run on large subject numbers that cross-validation is not expected to change results greatly. In practice, we found that results reported in the various simulations reported below were only significantly altered (when using cross-validation) where the models fitted had the greatest numbers of features - i.e., when using the full (no SVD) model for $X$, or when using the maximum number of SVD eigenvectors. Nevertheless, all results reported below were estimated using cross-validation, which we now describe briefly.

We applied 10-fold cross-validation, where the data samples are randomly assigned into 10 roughly equal-sized groups. For each group of left out data, the other 90% of samples (subjects) are used to “train”, i.e., estimate, model parameters. These parameters are then applied to the left-out subjects. In this case, the training refers to 4 analysis stages: a) confound removal (for real data), b) SVD reduction, c) $\delta_1$ initial estimation and d) $\delta_{2/3}$ correction.

Finally, note that if automated model tuning is to be carried out (e.g., optimal SVD dimensionality to be estimated from the data), then clearly this also should be done within a cross-validation framework.

2.6. Evaluations with simulations and real data

We now present results from simple but realistic simulations and real data.

2.6.1. Simulation 1

For Simulation 1, we set the number of samples (subjects) to 20000, set a fairly sharply truncated (non-Gaussian) distribution for the age range (total range approximately 45-75y), and added Gaussian $\delta$ with standard deviation 2y to form the gold-standard brain age. We then defined 100 underlying components (processes) of subject variation in “brain imaging” measures, the first being brain age, and the other 99 being random. We then mixed these ground truth population modes by a ($100 \times 3000$) sparse mixing matrix (random Gaussian noise to the fifth power) to form 3000 imaging variables, resulting in an $X$ of size $20000 \times 3000$. Finally, we standardised all columns in $X$ to 1, and added measurement noise with a standard deviation of 0.5. (Reducing this noise to the kinder level of 0.1 does not make a large qualitative difference to the results.) We ran the simulation 20 times, and show the mean and standard deviation results across the 20.

The results can be seen in Table 1. The most important results are the 3 right-most columns, where (in the simulated datasets) Q shows how well the different estimates of $\delta$ correlate with the true $\delta$. The first row shows results when the “raw” $X$ is used (i.e., without SVD). This is outperformed by the best SVD-based analyses. However, even in this case it is clear that $\delta_1$ is not as accurate at recovering the true $\delta$ as $\delta_2$.

The best estimates of brain age delta are when using $\delta_2$ or $\delta_3$ with an SVD reduction to 100 components. Reducing the number of subjects to 500 gives qualitatively similar results, with the best option again being

³ Here we are using pseudo-inverse notation even though $Y$ is just a column vector.
Table 1
Results from quantitative evaluations of brain age estimation. We show results for two different simulations and one real dataset. Different rows are for different SVD dimensionality reductions, with the first in each experiment being with no use of SVD at all. Columns 2–5 show the correlations of various estimated measures with true age. The next three show the mean absolute value of three different brain age $\delta$ estimates (for the true $\delta$ in the simulations, it is 1.6y). The final three columns (“Q”) show the most important aspect of the results: in the case of the simulations, these are the correlation between the true delta and the different estimations of delta. In the case of the real data from UK Biobank, where we do not know the true delta, we are using as a surrogate a summary measure of the significance of the correlation between estimated delta and 5792 non-imaging variables.
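The $\delta_2$ and $\delta_3$ estimates compared in Table 1, together with the SVD pre-reduction, can be sketched in NumPy as below. The function names (`svd_reduce`, `delta2`, `delta3`) and the toy data are assumptions for illustration only; the formulas follow Eqs. (6) and (8), and the final check illustrates that $\delta_3 = k\delta_2$ once the predictor matrix has orthonormal columns. No cross-validation is done here.

```python
import numpy as np

def svd_reduce(X, J):
    """Return the J strongest left singular vectors of demeaned X (orthonormal columns)."""
    U, _, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :J]

def delta2(X, Y):
    """Eq. (6): stage-1 prediction X X^+ Y, orthogonalised with respect to Y."""
    pred = X @ (np.linalg.pinv(X) @ Y)
    return pred - Y * (Y @ pred) / (Y @ Y)

def delta3(X, Y):
    """Eq. (8): M_Y X X' Y k, with k = (Y'Y) / (Y'XX'Y). Returns (delta3, k)."""
    XtY = X.T @ Y
    k = (Y @ Y) / (XtY @ XtY)
    v = X @ XtY
    return (v - Y * (Y @ v) / (Y @ Y)) * k, k

rng = np.random.default_rng(2)
n, d, J = 2000, 50, 10                          # assumed sizes for a quick demonstration
Y = rng.uniform(45, 75, n)
Y = Y - Y.mean()
brain_age = Y + 2.0 * rng.standard_normal(n)
X = np.outer(brain_age, rng.standard_normal(d)) + 5.0 * rng.standard_normal((n, d))

U = svd_reduce(X, J)                            # used in place of X
d2 = delta2(U, Y)
d3, k = delta3(U, Y)
print("k =", k, " max |d3 - k*d2| =", np.max(np.abs(d3 - k * d2)))
```

Because the two estimates differ only by the scalar $k$ in this setting, their correlations with any other variable are identical, while the brain ages formed by adding each to $Y$ are scaled differently.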
SVD reduction to 100 components (and best $\delta_{2/3}$ prediction of true delta having a correlation of 0.82).

In all cases there is strong (negative) correlation between estimated brain age delta $\delta_1$ and actual age (column 2), an undesirable feature of the most simple brain age modelling. This is at its worst for poor models (very low SVD dimensionality), which in effect is like Fig. 1E (regression dilution). We do not show these correlations for $\delta_2$ and $\delta_3$ as these are zero by design. While many of these analyses show very good correlation between actual age and predicted age (columns 3–5), it is clear (from the final 3 columns) that this is not a good indicator of successful modelling of the brain age $\delta$.

When $X$ is orthogonalised via SVD, $\delta_2$ and $\delta_3$ become identical (apart from scaling factor $k$), and results improve for sufficiently large values of $J$. The best results are achieved when setting the SVD dimensionality $J$ to be the “correct” number (here 100). Underestimating dimensionality (e.g., here $J = 50$) damages estimation much more than overestimation (e.g., $J = 1000$), an important result to keep in mind when choosing $J$.

The estimates of mean absolute values of $\delta$ largely show the expected pattern of results: smaller $\delta$ in general indicates more successful modelling (as judged by the gold standard metric, the accuracy of recovering the true $\delta$). This is expected because successful modelling of age implies relatively small modelling residuals, upon which delta is based. However, there are clearly exceptions to this, e.g., with the smallest (across different models) $|\delta_2|$ corresponding to total modelling failure (SVD $J = 1$).⁴ More importantly (in practice), the optimal modelling choices can show slightly “suboptimal” mean absolute $\delta$ (see UK Biobank results discussed below). Hence it is important to realise that simply choosing (or tuning) a method that minimises estimated $\delta$ does not guarantee to give the best overall results.

While $\delta_2$ and $\delta_3$ are identical⁵ (up to a scaling factor) when using SVD, their overall scaling performs quite differently, with the scaling of $\delta_2$ being more stable and accurate overall. This means that correlations between $\delta_2$ and $\delta_3$ with true $\delta$ are the same, but, because of the difference in overall scaling, once these are added to true age to form estimated brain age, that will differ between the two models (as seen in $r(Y, Y_{B1})$ vs. $r(Y, Y_{B2})$).

2.6.2. Simulation 2

Simulation 2 reflects a much simpler scenario (probably unrealistically simple), in order to illustrate what happens if there are no other structured effects in the data apart from age dependency. In this case the 99 structured effects were removed, and noise standard deviation raised to 10. When using the original $X$ (i.e., no SVD), $\delta_2$ performs significantly worse than $\delta_3$. Using SVD gives results that are as good as when using $\delta_3$ with no SVD (and using $J > 1$ gives almost identical results; using $J = 1$ works well here because of the true data rank being 1). Most importantly, the simple common approach ($\delta_1$) still results in suboptimal estimation of true delta, and strong incorrect correlation with age.

2.6.3. Real data

The real data evaluation used IDPs from 19000 subjects in UK Biobank. We used 2641 IDPs spanning a range of structural, diffusion and fMRI phenotypes, to create $X$. Confounds were removed from the data as done in [Miller et al., 2016; Elliott et al., 2018] (although of course age-dependent confounds were not removed from $X$). While it is common (and generally sensible) to derive brain-aging models from healthy subjects only, and then apply those models to all subjects (including those

⁴ In the case where $X$ is a useless model, $\delta_2$ tends to zero (because $X$ explains none of $Y$), whereas $\delta_3$ tends to infinity (because $Y$ explains none of $X$: the fit becomes horizontal and the horizontal errors large).

⁵ In fact, as can be seen in the “Q” columns, in practice here they are not exactly identical, simply because these simulated datasets are fit in a cross-validated way, with randomly assigned fold memberships causing small variations in outcomes.
with disease), we did not remove individual subjects from the modelling here using UK Biobank data, given that the fractions of imaged subjects having specific existent diagnoses are low (less than 10% having mental health or neurological diagnoses).

The methods described above were applied to estimate brain age delta. As this is real data we do not know the “true” delta, and therefore as a surrogate for this, for the “Q” columns, we instead estimated the significance of the correlation between estimated deltas and 5792 non-imaging variables, after applying the same deconfounding (though now including age-dependent confounds) to those variables. In general we would hope that “higher correlation is better”: accurately estimated brain age delta should have stronger correlations with interesting non-imaging variables such as health outcomes and biophysical markers. In order to turn the 5792 correlations into a single summary statistic, with emphasis on stronger associations rather than weakest (null) associations, we took the 99th percentile of $-\log_{10}(P)$ over all non-imaging variables as our measure of success (“Q”).

Doing SVD reduction with $J \approx 50$ components gave the best results, and $\delta_{2/3}$ gave much stronger associations with non-imaging variables than $\delta_1$. Encouragingly, a relatively wide range of $J$ (e.g., 20–100) all gave similar strong results. The “symmetry” of Q (as a function of $J$, looking either side of the optimal dimensionality) for real data is different than the pattern found in the simulated data, where success fell off very quickly as dimensionality was reduced to being lower than the true dimensionality. We hypothesised that this was because the simulated data had a true strong cutoff in the eigenspectrum, whereas real data has a much more gradually falling eigenspectrum, as structured signals are of varying strength. We re-ran Simulation 1, this time with the strengths of the added 99 dimensions of structured subject covariation being modulated by a random number uniformly drawn from the range 0 to 1. The results were unchanged with respect to the optimal dimensionality (still being 100, and having Q = 0.97). However, now the results were more symmetric about this, i.e., lower dimensionalities did not perform as badly, with even a low $J$ of 25 giving Q = 0.8.

The real data results are illustrated further in Fig. 3, showing clearly the value in correcting the estimate of brain age delta, and the danger of computing associations between biased brain age delta and non-imaging variables that have not been deconfounded for age.

2.7. Nonlinear relationships between age and imaging data

One cannot assume that the effect of aging on imaging measures is a linear function of age; indeed, acceleration of the effects of aging (in older age) seems quite likely, particularly in disease. The above models are simple to adapt to include an additive nonlinear term in $Y$, the most natural extension being to add a quadratic term.⁶

$$Y_B = Y + Y^2\alpha + \delta = X\beta \tag{9}$$

To adapt the first model described above, we can integrate the quadratic correction into the second step (that also corrects for bias in estimating the linear effect in step 1). Hence we subsume the quadratic term into the initially estimated $\delta$ (which is particularly straightforward

⁶ Without loss of generality, and to simplify notation and interpretation, in our equations and in the simulations, by $Y^2$ we mean a quadratic term that has been orthogonalised with respect to $Y$ and demeaned. If $Y$ has already been demeaned, then $Y^2$ will already be close to being orthogonal to $Y$.
if we assume that the quadratic term has been orthogonalised with where γ becomes two rows of parameters (and hence incorporating α),
respect to Y): and for the final line we have post-multiplied by just the first row of γ þ ,
i.e., X'Yk. According to the model in Eq. (13), the two rows of γ should be
δ1 ¼ Y 2 α þ δ ¼ Xβ1 Y ¼ XX þ Y Y ¼ ðXX þ IÞY ðstep 1Þ (10)
identical (up to a scale factor), but in practice there will normally be
much less variance in the data associated with the quadratic term, and we
δ1 ¼ Y Y 2 β2q þ δ2q ¼ Y2 β2q þ δ2q ðstep 2Þ (11) find indeed that results are improved by only using the linear term (i.e.,
using the first row of γ). (Again, in some populations, it may be better to
where β2q has two free parameters, covering the linear and quadratic
use the quadratic part of the model rather than the linear, for the estimate
regressors. Note that the first step is identical to what we had before; of γ.)
hence the unchanged subscript in δ1 . This can all be combined: Table 2 shows quantitative results from simulations of nonlinear
brain aging, and from the UK Biobank real data. For the simulations, we
δ2q ¼ δ1 Y2 β2q
¼ MY2 XX þ Y;
(12) start with the first linear simulation above, and add quadratic aging ef-
fects. For these 3 simulations we set true quadratic-term α values (Eq. (9))
where MY2 ¼ I Y2 Y þ 2.
of 0, 0.01 and 0.025 respectively, corresponding to total deviations away from linear brain aging of 0, 4y and 10y.

Fig. 4 uses a simple simulation to illustrate the two stages of the model fitting described above. In A and B the biases in the one-stage modelling are apparent, and these are removed in D and E. Furthermore, the correlation of estimated delta with the true delta is improved in the second step.

Note that step 1 above assumes that it is most useful to group the quadratic term into the residuals of the first model fitting. It may be that some study populations have a linear age effect that is smaller than the quadratic, for example with young healthy adults around the peak of the lifespan “growth-aging inverted-U curve”. In such cases, it might be possible that step 1 should fit to the quadratic term rather than the linear, with step 2 being unchanged.

For the second model, that switches predictor matrix and age, again the extension to a quadratic term in Y is straightforward:

$$X = Y\gamma + Y^2\alpha + \delta_{3q}\gamma + \varepsilon \tag{13}$$

$$\gamma = Y_2^{+} X \tag{14}$$

$$X - Y_2\gamma = X - Y_2 Y_2^{+} X = M_{Y_2} X = \delta_{3q}\gamma + \varepsilon \tag{15}$$

$$\delta_{3q} = M_{Y_2} X X' Y k \tag{16}$$

With no simulated nonlinear effect (Sim1), the quadratic models give identical results to the linear models (i.e., there is no noticeable penalty being paid for the additional model flexibility, presumably because, given the simplicity of the corrections to delta, overfitting is negligible). When a quadratic effect is included in the simulation, the quadratic-correction models (i.e., generating $\delta_{2q}$ and $\delta_{3q}$) provide a major improvement in accuracy over the linear models. Accuracy of recovering the true $\delta$ remains high even for large amounts of nonlinear behaviour, providing J is optimal. As before, optimal SVD data reduction outperforms using the original data matrix X.

The quadratic modelling in the UKB real data accounts approximately for a 2y total deviation from the linear fit. Including this quadratic modelling results in improvements in the strength of associations with non-imaging variables, although these improvements are here very small. In disease populations this might be expected to be much greater. Where the nonlinear effect is greater, it is straightforward (particularly for the first, two-stage, model) to extend the above formulations to higher powers, or, possibly more well-conditioned, nonlinear modelling such as with splines.

In all cases the “common” approach ($\delta_1$) performs significantly worse than these models that correct for bias and nonlinearity in brain age prediction.
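As an illustration of Eqs. (13)–(16), the following is a minimal NumPy sketch, not the authors' implementation. It assumes that $Y_2 = [Y \; Y^2_o]$ is built from demeaned age and a demeaned, orthogonalised squared-age term, and it sets the scalar k to 1 because the scaling is not defined in this section, so the returned vector is only proportional to $\delta_{3q}$. The function name `delta_3q_unscaled` is hypothetical.

```python
import numpy as np


def delta_3q_unscaled(X, Y):
    """Return M_{Y2} X X' Y, i.e. Eq. (16) with the scalar k set to 1 (assumption)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    # Demean age and every imaging feature.
    Y = Y - Y.mean()
    X = X - X.mean(axis=0)

    # Squared age, demeaned and orthogonalised with respect to Y.
    Ysq = Y ** 2
    Ysq = Ysq - Ysq.mean()
    Y2o = Ysq - Y * (Y @ Ysq) / (Y @ Y)
    Y2 = np.column_stack([Y, Y2o])

    # Eq. (14): regress every feature on [Y, Y2o]; gamma has shape (2, features).
    gamma = np.linalg.pinv(Y2) @ X

    # Eq. (15): residuals M_{Y2} X after removing the linear and quadratic age fit.
    resid = X - Y2 @ gamma

    # Eq. (16) without k: project the residuals onto the direction X'Y.
    return resid @ (X.T @ Y)
```

Because the residual-forming step removes both age terms, the returned vector is orthogonal to Y and to the squared-age regressor by construction; only its overall scale is left undetermined here.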
Fig. 4. Examples of the two stages of quadratic-fit age prediction. The black lines show the ideal linear model fits. A,D. The plots of predicted brain age from steps
1 and 2, vs. actual age. B,E. The plots of estimated brain age delta from steps 1 and 2, vs. actual age. C,F. The plots of estimated brain age delta from steps 1 and 2, vs.
true delta.
S.M. Smith et al. NeuroImage 200 (2019) 528–539
Table 2
Results from quantitative evaluations of brain age estimation in the presence of nonlinear brain aging. We show results for 3 realistic simulations and a real dataset. The 3 simulations have different strengths of nonlinearity in brain age $Y^2\alpha$, with $\alpha = 0, 0.01, 0.025$ respectively. Different rows are for different SVD dimensionality reductions, with the first in each experiment being with no use of SVD. We show mean absolute value of different brain age $\delta$ estimates (for the true $\delta$ in the simulations, it is 1.6y) and then correlations between estimated and true $\delta$ (for the simulations) and significance of associations with non-imaging variables (for real data), as above.
2.8. Non-additive brain age delta

A different model for brain aging would be for each person to be aging at a different rate, meaning that their delta (according to the above models) is changing with age, as opposed to being fixed. Unfortunately, this cannot trivially be distinguished from having constant delta, at the level of the individual subject, if there is only one measurement (timepoint) per subject.

This ambiguity arises because subject-specific rate of aging would be considered to create a spread of trajectories around the population mean aging curve. By this definition, the effect of interest is orthogonal to the population mean aging effect, and therefore gets included in estimates of delta given above. Hence, the above aging models do not explicitly identify multiplicative brain aging, yet readily fit data from individual subjects at a single timepoint as an additive offset term regardless of the cause (constant offset vs. multiplicative).

However, while additive and multiplicative brain aging cannot easily be disambiguated at the subject level, it is possible to apply the above models, and then use the resulting delta values to estimate how scaling of the size of delta is changing with age in the population as a whole:

$$\delta = \delta_0 (1 + \lambda Y_0), \tag{17}$$

where we form a temporary version of age $Y_0$, which is a linear mapping of Y into the range [0, 1], and hence $|\delta_0|$ relates to the brain age delta distribution for the youngest subjects in the data. The product on the right is a pointwise product between the two column vectors $\delta_0$ and $(1 + \lambda Y_0)$. The reason for formulating things this way will now become clear, where we solve by squaring, taking the natural log, approximating the log function on the right using a power series expansion, and solving with linear regression:

$$\begin{aligned}
\log \delta^2 &= \log \delta_0^2 + \log (1 + \lambda Y_0)^2 \\
&= \log \delta_0^2 + 2 \log (1 + \lambda Y_0) \\
\log \delta^2 / 2 &= \log(|\delta|) \approx D_0 + \lambda Y_0,
\end{aligned} \tag{18}$$

where $D_0 = \log(\delta_0^2)/2 = \log(|\delta_0|)$ is the noise in this model and this assumes that $\lambda$ is not very large and negative. Hence fitting the model $Y_0$ to data $\log(|\delta|)$ gives us an initial $\lambda$, which we can then adjust for the expansion approximation error:

$$\lambda_0 = Y_0^{+} \log(|\delta|) \tag{19}$$

$$\lambda_1 = e^{\lambda_0} - 1. \tag{20}$$

To evaluate this with simulated data, we took the first linear simulation described above, and added age-dependence by setting true lambda to 0.5 according to Eq. (17). We then ran the simulation 10 times, with J = 100, estimating $\lambda$ from $\delta_2$. This resulted in estimated $\lambda = 0.46 \pm 0.09$.

On real data from UK Biobank, using $\delta_{2q}$ estimated using optimal SVD setting of J = 50, we find that $\lambda = 0.13$, with the regression-based fitting
also estimating this as being significantly greater than zero (P = 0.001). This corresponds to an increase in “spread” of brain age delta of about $|\delta|\lambda = 0.4$y when moving from the youngest to oldest subjects in UKB (45–80y). This provides evidence for a modest non-additive brain aging effect in this largely healthy, aging population.

2.9. Further results from UK Biobank brain age estimation

We carried out brain age delta estimation on UK Biobank data as described above, and also for females and males separately. Below we report results using $\delta_{2q}$.

The delta estimations for females were almost identical when comparing delta estimated purely for females vs. estimated from the all-in-one delta estimation from females and males together (r = 0.98). The same was true for males (r = 0.97). From the all-in-one analysis, females had a mean brain age delta that was 0.7y higher than in males.

We then correlated these various versions of brain age delta against 5792 non-imaging variables from UK Biobank, and converted these uncorrected P-values into $-\log_{10} P$. As expected from the above results, these vectors of correlation significance (with one entry in the vector for each non-imaging variable) were highly similar when comparing sex-separated delta estimation for females against when taking the deltas for females from the all-in-one estimation (r = 0.98). The same was found for males (r = 0.98). Hence, for sex-specific results reported below, we only list those from the sex-separated delta estimations.

Comparing $-\log_{10} P$ derived from all subjects (both sexes) from the all-in-one analysis against the sex-separated estimates of $-\log_{10} P$ showed greater differences (all subjects vs. female: r = 0.87, vs. male: r = 0.73).
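Returning to the non-additive model of Section 2.8, the estimation in Eqs. (17)–(20) can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions rather than the authors' code: the regression of log(|δ|) on $Y_0$ includes an intercept (done here by demeaning both variables), the simulation draws the true delta directly instead of estimating $\delta_2$ from imaging data, and the names `estimate_lambda` and `simulate`, the sample size, and the age range of 45–80y used for the simulated subjects are all hypothetical choices.

```python
import math
import random


def estimate_lambda(delta, age):
    """Estimate lambda in delta = delta0 * (1 + lambda * Y0), following Eqs. (17)-(20)."""
    lo, hi = min(age), max(age)
    # Y0: linear mapping of age into the range [0, 1].
    pairs = [((a - lo) / (hi - lo), math.log(abs(d)))
             for a, d in zip(age, delta) if d != 0.0]  # skip exact zeros (log undefined)
    n = len(pairs)
    mean_y0 = sum(y for y, _ in pairs) / n
    mean_ld = sum(l for _, l in pairs) / n
    # Eq. (19): least-squares slope of log|delta| on Y0 (intercept handled by demeaning).
    sxy = sum((y - mean_y0) * (l - mean_ld) for y, l in pairs)
    sxx = sum((y - mean_y0) ** 2 for y, _ in pairs)
    lambda0 = sxy / sxx
    # Eq. (20): adjust for the log(1 + x) ~ x approximation error.
    return math.exp(lambda0) - 1.0


def simulate(n=50000, true_lambda=0.5, seed=0):
    """Simulate ages and deltas whose spread scales with age as in Eq. (17)."""
    rng = random.Random(seed)
    age = [rng.uniform(45.0, 80.0) for _ in range(n)]
    lo, hi = min(age), max(age)
    delta = [rng.gauss(0.0, 2.0) * (1.0 + true_lambda * (a - lo) / (hi - lo))
             for a in age]
    return delta, age


delta, age = simulate()
lambda_hat = estimate_lambda(delta, age)
```

Because log(|δ|) is very noisy for Gaussian deltas, the estimate only stabilises with many subjects, which is consistent with the spread of ±0.09 reported above for repeated simulations.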
Table 3
The strongest associations between brain age delta and non-imaging variables in UK Biobank, 19038
subjects, males and females combined. Positive correlations (red italics) imply accelerated brain aging.
Table 4
The strongest associations between brain age delta and non-imaging variables in UK Biobank, for just
the 10112 females, and just the 8926 males. Positive correlations (red italics) imply accelerated brain aging.
Table 5
The strongest associations between brain age delta and imaging-derived phenotypes (IDPs) in UK Biobank, for all subjects, and for just the 10112 females, and just the 8926 males. IDP names listed in blue italics denote where the correlations differ between females and males by more than 0.05.

Comparing $-\log_{10} P$ derived from female-only analysis against male-only gave r = 0.36.

The strongest associations with non-imaging variables are shown in Tables 3 and 4.⁷ Bonferroni correction, for the number of non-imaging variables, gives a threshold for $-\log_{10} P$ of 5.1, but to limit the number of reported associations to a reasonable degree (and because effect sizes become tiny at this threshold, the weakest passing this being 0.1% variance explained!), we report results for $-\log_{10} P > 8$. The non-imaging variables are denoted via unique variable codes and brief descriptions. For more information on any variable, one can take the initial integer from the ID, and search for the variable on the UK Biobank website: https://biobank.ctsu.ox.ac.uk/showcase/search.cgi.

⁷ Note that there is not a monotonic relationship between r and P, because of different missing-data patterns in different variables resulting in varying N. Also, unlike with the results correlating delta against the imaging IDPs below, here we sort on P rather than r, given that the r values are rather small, leading to increased relevance of P.

For these lists we have manually removed largely-redundant (highly similar) variables for purposes of readability, listing just the strongest associated result in each case. If a variable is positively correlated with brain age delta, this implies that accelerated brain aging is associated with larger values of that variable, i.e., it is a “bad” life factor or biological measure. Of course, the results do not allow one to infer causality.
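The Bonferroni threshold quoted above, and the footnote's point that r and P are not monotonically related when N varies, can be checked with a short sketch. Two things here are assumptions rather than statements from the text: a family-wise alpha of 0.05 (which reproduces the quoted value of 5.1 for 5792 variables), and the use of the Fisher z approximation for the P-value of a correlation, chosen only because it needs nothing beyond the standard library. Both function names are hypothetical.

```python
import math


def bonferroni_neglog10_threshold(n_tests, alpha=0.05):
    """-log10 of the Bonferroni-corrected P threshold (alpha = 0.05 is an assumption)."""
    return -math.log10(alpha / n_tests)


def approx_neglog10_p(r, n):
    """Approximate two-sided -log10 P for a Pearson r with n subjects (Fisher z)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return -math.log10(p)


threshold = bonferroni_neglog10_threshold(5792)

# The same correlation is more significant when fewer values are missing:
# a hypothetical variable measured in 19000 subjects vs. one measured in 5000.
full_n = approx_neglog10_p(0.05, 19000)
partial_n = approx_neglog10_p(0.05, 5000)
```

With these inputs `threshold` rounds to 5.1, and `full_n` exceeds `partial_n` even though r is identical, which is why the tables are sorted on P rather than r.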
There are several strong patterns that emerge. Higher body weight,
body fat and bone density are all associated with reduced brain aging,
largely in females. Higher blood pressure, heart rate and blood haemoglobin
are all associated with accelerated brain aging, in both females
and males, but to a slightly stronger extent in males.
Smoking and alcohol are associated with accelerated brain aging.
There are several life-factor/life-style measures associated with delta,
presumably reflecting socio-economic status, where the biological causality
is likely complex (e.g., household income, number of vehicles in
household, and possibly also related, time spent outdoors in summer).
Several measures from the cognitive testing are associated with brain age
delta, in all cases in the direction one might expect; measures of success/
accuracy correlate negatively, and measures of “time taken” (to complete
a cognitive task) correlate positively.
Finally, two associations related to clinical diagnosis/treatment, both
found in males, are that the number of treatments/medication taken, and
diagnoses of diabetes, are both associated with accelerated brain aging.
We also looked at the correlations between delta and the imaging
features (IDPs). This is expected to largely reflect which IDPs most
strongly contribute to the modelling of the brain age delta, but of course,
being a univariate analysis, is straightforward to interpret and does not
take into account redundancy across IDPs. Table 5 lists these, sorted
according to decreasing strength of correlations computed from all subjects
(females and males), but also showing correlations for just females
and just males. We do not report P here, as, given the relatively strong
correlations, the P values are too strongly significant to be differentially
informative. IDPs are included in the table if any of the 3 correlations
(from females only, males only, or all subjects) are stronger than 0.3.
We highlight in blue the IDPs for which the correlation with brain age
delta is different between females and males by more than 0.05. More
detailed descriptions of the IDPs (including expansions of some of the
anatomical acronyms) can be found here: http://www.fmrib.ox.ac.uk/ukbiobank/IDPinfo_Jan2018.txt.
The strongest associations with delta are for grey and white total
volumes, both normalised for head size and unnormalised. (Note also
that we have used fully deconfounded versions of the IDPs for these
correlations with delta, which included regressing out head size). There
are then many measures of white matter microstructure (derived from
the diffusion MRI) that correlate with delta, a minority of which are
reasonably strongly different between the sexes.
3. Conclusions
studies that has carried out linear age adjustment of delta is [Cole et al., 2017]. Interestingly, in this case, the adjustment reduced the strength of associations with external information (in that case, heritability). While this goes against most of our findings above, it may relate to non-additive aging, in which case, combining the modelling (for non-additive aging) presented here with the corrected delta models presented above might optimise a delta estimation. Alternatively, it may simply be that the bias (present in delta before correction) contributed to apparent heritability, given that twins have the same age and so experience related bias.

One simple, effective and computationally cheap tool for modelling brain age (before delta estimation) is to reduce the input data features (from the brain imaging) via SVD. We have shown that there is likely to be an optimal SVD dimensionality for the data reduction, and that it is safer to slightly overestimate rather than underestimate this dimensionality.

To apply the recommended approach for estimating brain age delta, apply the following.

1. Your vector of ages is $Y$ (subjects × 1).
2. Your matrix of brain imaging measures is $X$ (subjects × features/voxels).
3. Subtract the means from $Y$ and all columns in $X$.
4. Use SVD to replace $X$ with its top 10–25% vertical eigenvectors.
5. Compute $Y^2$, demean it and orthogonalise it with respect to $Y$ to give $Y^2_o$.
6. Create matrix $Y_2 = [Y \; Y^2_o]$.
7. The initial model is $Y_{B1} = X\beta_1 + \delta_1$. Do:
   (a) Compute initial age prediction $\beta_1 = X^{+} Y$ giving $Y_{B1} = X\beta_1$ (where $X^{+}$ is the pseudo-inverse of $X$).
   (b) Compute initial brain age delta $\delta_1 = Y_{B1} - Y$.
8. The corrected model is $\delta_1 = Y_2\beta_2 + \delta_{2q}$. Do:
   (a) Compute corrected model fit $\beta_2 = Y_2^{+} \delta_1$ (correcting for bias in the initial fit and quadratic brain aging).
   (b) Compute final brain age delta $\delta_{2q} = \delta_1 - Y_2\beta_2$.

All associations of brain-age delta with UK Biobank non-imaging variables are listed in a supplementary spreadsheet. Example Matlab code for the delta computations and all simulations can be found at: http://www.fmrib.ox.ac.uk/BrainAgeDelta.

Acknowledgments

We are grateful for funding from the Wellcome Trust (grants 098369/Z/12/Z and 203139/Z/16/Z). This research has been conducted in part using the UK Biobank Resource under Application Number 8107. We are grateful to UK Biobank for making the data available, and to all UK Biobank study participants, who generously donated their time to make this resource possible. Thanks to Saad Jbabdi, Matteo Bastiani, Elliot Tucker-Drob and Simon Cox for discussions on this work.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.neuroimage.2019.06.017.

References

Cole, J.H., Franke, K., 2017. Predicting age using neuroimaging: innovative brain ageing biomarkers. Trends Neurosci. 40, 681–690.
Cole, J.H., Poudel, R.P., Tsagkrasoulis, D., Caan, M.W., Steves, C., Spector, T.D., Montana, G., 2017. Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. Neuroimage 163, 115–124.
Elliott, L., Sharp, K., Alfaro-Almagro, F., Shi, S., Miller, K., Douaud, G., Marchini, J., Smith, S., 2018. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562, 210–216.
Franke, K., Ziegler, G., Klöppel, S., Gaser, C., ADNI, 2010. Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: exploring the influence of various parameters. Neuroimage 50, 883–892.
Grosenick, L., Klingenberg, B., Katovich, K., Knutson, B., Taylor, J.E., 2013. Interpretable whole-brain prediction analysis with GraphNet. Neuroimage 72, 304–321.
Le, T., Kuplicki, R., McKinney, B., Yeh, H.-W., Thompson, W., Paulus, M., Tulsa1000, 2018. A nonlinear simulation framework supports adjusting for age when analyzing BrainAGE. Front. Aging Neurosci. 10, 317. https://www.frontiersin.org/article/10.3389/fnagi.2018.00317.
Liang, H., Zhang, F., Niu, X., August 1, 2019. Investigating systematic bias in brain age estimation with application to post-traumatic stress disorders. Hum. Brain Mapp. 40 (11), 3143–3152.
Massy, W., 1965. Principal components regression in exploratory statistical research. J. Am. Statistical Ass. 60, 234–256.
Miller, K., Alfaro-Almagro, F., Bangerter, N., Thomas, D., Yacoub, E., Xu, J., Bartsch, A., Jbabdi, S., Sotiropoulos, S., Andersson, J., Griffanti, L., Douaud, G., Okell, T., Weale, P., Dragonu, I., Garratt, S., Hudson, S., Collins, R., Jenkinson, M., Matthews, P., Smith, S., 2016. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci. 19, 1523–1536.
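The eight-step recipe given in the Conclusions can be sketched in NumPy as follows. This is an independent illustration and not the Matlab code referred to there. Assumptions: “vertical eigenvectors” is read as the left singular vectors of the demeaned X, the 10–25% is taken as a fraction of the number of singular values (the hypothetical parameter `frac`), and the function name `brain_age_delta` is likewise hypothetical.

```python
import numpy as np


def brain_age_delta(X, Y, frac=0.25):
    """Return (delta1, delta2q) following steps 1-8 of the recommended approach."""
    X = np.asarray(X, dtype=float)   # step 2: subjects x features
    Y = np.asarray(Y, dtype=float)   # step 1: subjects

    # Step 3: subtract the means from Y and from all columns of X.
    Y = Y - Y.mean()
    X = X - X.mean(axis=0)

    # Step 4: SVD data reduction, keeping the top `frac` of components.
    # Assumption: the kept "vertical eigenvectors" are columns of U.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    J = max(1, int(round(frac * len(s))))
    Xr = U[:, :J]

    # Step 5: squared age, demeaned and orthogonalised with respect to Y.
    Ysq = Y ** 2
    Ysq = Ysq - Ysq.mean()
    Y2o = Ysq - Y * (Y @ Ysq) / (Y @ Y)

    # Step 6: two-column age design matrix.
    Y2 = np.column_stack([Y, Y2o])

    # Step 7: initial age prediction and initial (biased) delta.
    beta1 = np.linalg.pinv(Xr) @ Y
    delta1 = Xr @ beta1 - Y

    # Step 8: remove linear and quadratic age dependence from delta1.
    beta2 = np.linalg.pinv(Y2) @ delta1
    delta2q = delta1 - Y2 @ beta2
    return delta1, delta2q
```

A typical call would be `delta1, delta2q = brain_age_delta(X, ages, frac=0.2)`. By construction `delta1` is negatively correlated with age (the bias discussed in the text), whereas `delta2q` is orthogonal to both age and squared age.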